CLI Guide
vidbgm is the scriptable surface for the same adaptive scoring pipeline used by the desktop app. Use it when you need repeatable runs, detailed artifacts, provider experiments, smoke checks, or video renders from a shell.
Build And Inspect
```shell
cargo build
target/debug/vidbgm --help
```
For a local command on your PATH, install from the repo root:
```shell
cargo install --path .
```
After that, examples can use vidbgm ... instead of target/debug/vidbgm .... The CLI is not published to crates.io or Homebrew yet.
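If you switch between an installed binary and a debug build, a short preamble lets your own scripts pick whichever is available. This is a sketch: VIDBGM is a hypothetical variable name for your scripts, not something vidbgm itself reads.
```shell
# Prefer an installed vidbgm on PATH; otherwise fall back to the debug build.
# VIDBGM is a convention for your own scripts; vidbgm does not read it.
if command -v vidbgm >/dev/null 2>&1; then
  VIDBGM=vidbgm
else
  VIDBGM=target/debug/vidbgm
fi
# Warn (without aborting) if neither binary is runnable yet.
"$VIDBGM" --help >/dev/null 2>&1 || echo "vidbgm not found; run cargo build first" >&2
```
Later lines can then call "$VIDBGM" analyze ... regardless of how the binary was installed.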
The CLI currently exposes:
| Command | Use when |
|---|---|
| analyze | You want frame analysis, extracted frames, contact sheets, and timeline inputs without generating audio. |
| generate | You want a generated WAV for a video. |
| render | You want a video export with generated music replacing or mixing with source audio. |
| eval | You already have audio and a timeline and want an evaluation report. |
| vision-smoke | You want to verify frame extraction and vision-provider behavior before a full run. |
| music-smoke | You want to verify music generation without analyzing a video. |
| sample-pack | You want a review folder with multiple style outputs for one video slice. |
| magenta-setup | You want to download or prepare Magenta model/resources. |
| magenta-status | You want a readiness check for Magenta assets and bridge support. |
Common Video And Vision Options
analyze, generate, render, and vision-smoke share the main video-analysis options:
| Option | Values or default | Purpose |
|---|---|---|
| --video PATH | required | Source video. Streams are selected by media type, not index. |
| --prompt TEXT | required | Initial musical direction. Keep it short and musical. |
| --frame-interval-seconds N | default: 5 | Sampling interval for frame analysis. |
| --max-frames N | optional | Cap sampled frames for cheap tests. |
| --duration SECONDS | default: full video | Limit analysis/generation/render duration. |
| --workdir PATH | command-specific | Directory for frames, manifests, timelines, and reports. |
| --vision-provider NAME | default: openai-compatible | openai-compatible, openai, or anthropic. |
| --vision-base-url URL | provider default | Local gateway or hosted provider root URL. |
| --vision-model MODEL | required for hosted providers | Vision model name. |
| --vision-api-key-env NAME | provider default | Environment variable holding an API key. |
| --vision-extra-header KEY=VALUE | repeatable | Extra headers for gateways or routing. |
| --vision-disable-reasoning BOOL | default: true; values: true, false | Disable reasoning fields for compatibility. |
| --vision-profile PROFILE | default: balanced; values: fast, balanced, quality | Prompt/detail profile. |
| --vision-frame-width PX | optional | Resize extracted JPEGs before sending to vision. |
| --vision-jpeg-quality N | default: 75 | JPEG quality for vision frames. |
| --vision-max-tokens N | default: 180 | Vision response token cap. |
| --vision-timeout-seconds N | default: 60 | Per-request timeout. |
Provider defaults:
| Provider | Default URL | Default key env |
|---|---|---|
| openai-compatible | http://localhost:1234/v1 | none |
| openai | https://api.openai.com/v1 | OPENAI_API_KEY |
| anthropic | https://api.anthropic.com | ANTHROPIC_API_KEY |
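
Because openai and anthropic read their key from an environment variable by default, a pre-flight check can catch a missing key before the first vision request. The sketch below only mirrors the provider-defaults table; vision_defaults and require_vision_key are hypothetical helper names, and the check looks only at the default variable, so adjust it if you pass --vision-api-key-env.
```shell
# Hypothetical helper mirroring the provider-defaults table.
# Prints "<default URL> <default key env>"; "none" means no key is expected.
vision_defaults() {
  case "$1" in
    openai-compatible) echo "http://localhost:1234/v1 none" ;;
    openai)            echo "https://api.openai.com/v1 OPENAI_API_KEY" ;;
    anthropic)         echo "https://api.anthropic.com ANTHROPIC_API_KEY" ;;
    *) echo "unknown vision provider: $1" >&2; return 1 ;;
  esac
}

# Fail early if the provider's default key variable is not exported.
# Only checks the default name; adapt it if you pass --vision-api-key-env.
require_vision_key() {
  local defaults key_env
  defaults=$(vision_defaults "$1") || return 1
  key_env=${defaults#* }          # second field: the key variable name
  if [ "$key_env" = none ]; then
    return 0
  fi
  if [ -z "$(printenv "$key_env")" ]; then
    echo "$1 expects \$$key_env to be exported" >&2
    return 1
  fi
}
```
For example, call require_vision_key anthropic before a vision-smoke run that passes --vision-provider anthropic.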
Analyze
Use analyze to confirm frame sampling, vision responses, and prompt-timeline inputs before spending time on audio generation.
```shell
target/debug/vidbgm analyze \
  --video sample-video.mov \
  --prompt "uplifting motorik synth pulse for road cycling" \
  --frame-interval-seconds 30 \
  --max-frames 8 \
  --vision-timeout-seconds 120 \
  --workdir output/cli/analyze-sample \
  --out output/cli/analyze-sample/analysis.json
```
Typical artifacts:
- frames_manifest.json
- extracted JPEG frames
- contact_sheet.jpg
- analysis.json
- timeline.json
Generate
Use generate when the desired output is a WAV.
```shell
target/debug/vidbgm generate \
  --video sample-video.mov \
  --prompt "uplifting motorik synth pulse for road cycling" \
  --duration 60 \
  --frame-interval-seconds 15 \
  --workdir output/cli/generate-sample \
  --out-audio output/cli/generate-sample/music.wav \
  --magenta-backend auto \
  --magenta-model mrt2_small \
  --prompt-update-mode continuous
```
Music options:
| Option | Values or default | Purpose |
|---|---|---|
| --out-audio PATH | required | Generated WAV path. |
| --magenta-backend BACKEND | default: auto; values: auto, bridge, synth, cli | Backend selection. |
| --magenta-model MODEL | default: mrt2_small | Magenta model id. |
| --magenta-runtime RUNTIME | default: mlx; values: mlx, jax | Runtime for bridge/CLI backends. |
| --prompt-update-mode MODE | default: segment-stitch; values: segment-stitch, continuous | Timeline update behavior. |
Use --magenta-backend bridge when a missing native setup should fail loudly. Use --magenta-backend synth for deterministic local validation.
Render
Use render when the desired output is a video file.
```shell
target/debug/vidbgm render \
  --video sample-video.mov \
  --prompt "uplifting motorik synth pulse for road cycling" \
  --duration 60 \
  --frame-interval-seconds 15 \
  --audio-mode mix \
  --music-volume-db -6 \
  --original-volume-db -9 \
  --workdir output/cli/render-mix \
  --out-video output/cli/render-mix/sample-video-mix.mov
```
Render options:
| Option | Values or default | Purpose |
|---|---|---|
| --audio-mode MODE | required; values: replace, mix | Replace source audio or mix with it. |
| --out-video PATH | required | Rendered movie path. |
| --music-volume-db DB | default: -3 | Generated-music gain in mix mode. |
| --original-volume-db DB | default: -18 | Source-audio gain in mix mode. |
For quick audible review, --music-volume-db -6 --original-volume-db -9 keeps source audio present while still making the generated score easy to judge.
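If the two volume options are ordinary amplitude gains in decibels (an assumption; this guide does not define the scale), converting them to linear multipliers makes the balance easier to compare. db_to_linear is a hypothetical helper that uses awk for the arithmetic 10^(dB/20).
```shell
# Hypothetical helper: convert a dB gain to a linear amplitude factor, 10^(dB/20).
# ASSUMPTION: --music-volume-db and --original-volume-db are amplitude gains.
db_to_linear() {
  awk -v db="$1" 'BEGIN { printf "%.3f\n", 10 ^ (db / 20) }'
}

echo "review:  music $(db_to_linear -6), source $(db_to_linear -9)"
echo "default: music $(db_to_linear -3), source $(db_to_linear -18)"
```
Under that assumption the defaults put the music 15 dB above the source (about 5.6 times its amplitude), while the review setting narrows the gap to 3 dB (about 1.4 times), which keeps the source clearly audible.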
Eval
Use eval when you already have a generated WAV and timeline.
```shell
target/debug/vidbgm eval \
  --video sample-video.mov \
  --audio output/cli/generate-sample/music.wav \
  --timeline output/cli/generate-sample/timeline.json \
  --out output/cli/generate-sample/eval.md
```
The eval report is a lightweight project artifact for checking whether the music changes over the video as expected.
Smoke Checks
Use vision-smoke before full analysis when changing providers, models, image width, or prompt profile:
```shell
target/debug/vidbgm vision-smoke \
  --video sample-video.mov \
  --prompt "uplifting motorik synth pulse for road cycling" \
  --max-frames 3 \
  --workdir output/cli/vision-smoke
```
Use music-smoke before video runs when changing Magenta setup:
```shell
target/debug/vidbgm music-smoke \
  --prompt "steady motorik synth pulse" \
  --duration 10 \
  --out-audio output/cli/music-smoke.wav \
  --magenta-backend auto
```
Sample Pack
sample-pack creates a review folder for a short slice. By default it writes to output/style-samples/initial-review, covers 30 seconds, samples frames every 30 seconds, uses the bridge backend, and renders in replace mode.
```shell
target/debug/vidbgm sample-pack \
  --video sample-video.mov \
  --duration 30 \
  --workdir output/style-samples/initial-review \
  --magenta-backend synth
```
Pass --no-render when you only want analysis and music outputs.
Magenta Setup Commands
Check readiness:
```shell
target/debug/vidbgm magenta-status --magenta-model mrt2_small
```
Prepare assets:
```shell
target/debug/vidbgm magenta-setup \
  --magenta-model mrt2_small \
  --magenta-download-source hf \
  --yes
```
Setup options:
| Option | Values or default | Purpose |
|---|---|---|
| --magenta-model MODEL | default: mrt2_small | Model to verify or download. |
| --magenta-download-source SOURCE | default: hf; values: hf, gcs | Asset source. |
| --magenta-home PATH | default: workspace .magenta-home or MAGENTA_HOME | Model/resource root. |
| --yes | default: false | Allow unattended first-time downloads. |
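
On unattended machines the two commands can be chained so that setup runs only when the readiness check fails. This sketch assumes magenta-status exits with a non-zero status when assets are missing, which this guide does not state, so confirm that before relying on it; ensure_magenta and the VIDBGM variable are hypothetical names.
```shell
# Hypothetical helper: prepare Magenta assets only if the readiness check fails.
# ASSUMPTION: magenta-status returns a non-zero exit code when not ready.
ensure_magenta() {
  local bin="${VIDBGM:-target/debug/vidbgm}"
  local model="${1:-mrt2_small}"
  if "$bin" magenta-status --magenta-model "$model"; then
    return 0
  fi
  # --yes allows the first-time download to proceed without prompting;
  # the final status call re-checks readiness after setup.
  "$bin" magenta-setup --magenta-model "$model" --magenta-download-source hf --yes &&
    "$bin" magenta-status --magenta-model "$model"
}
```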
Practical Run Order
For a new video or provider, work from cheapest to heaviest:
1. magenta-status
2. vision-smoke --max-frames 3
3. analyze --max-frames 8
4. music-smoke --duration 10
5. generate --duration 60
6. render --duration 60 --audio-mode mix
7. eval against the generated music.wav and timeline.json
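The cheapest-to-heaviest order can also be scripted so that each step runs only if the previous one succeeded. This is a sketch that reuses only flags shown in this guide; run_ladder, the VIDBGM variable, and the output/cli/ladder layout are hypothetical names, and it assumes generate leaves timeline.json in its --workdir, as the Eval example suggests.
```shell
# Hypothetical helper: run the ladder, stopping at the first failing step.
# Usage: run_ladder VIDEO PROMPT [OUTDIR]
# ASSUMPTION: generate writes timeline.json into its --workdir.
run_ladder() {
  local video="$1" prompt="$2" out="${3:-output/cli/ladder}"
  local bin="${VIDBGM:-target/debug/vidbgm}"

  "$bin" magenta-status --magenta-model mrt2_small &&
  "$bin" vision-smoke --video "$video" --prompt "$prompt" \
    --max-frames 3 --workdir "$out/vision-smoke" &&
  "$bin" analyze --video "$video" --prompt "$prompt" --max-frames 8 \
    --workdir "$out/analyze" --out "$out/analyze/analysis.json" &&
  "$bin" music-smoke --prompt "$prompt" --duration 10 \
    --out-audio "$out/music-smoke.wav" --magenta-backend auto &&
  "$bin" generate --video "$video" --prompt "$prompt" --duration 60 \
    --workdir "$out/generate" --out-audio "$out/generate/music.wav" &&
  "$bin" render --video "$video" --prompt "$prompt" --duration 60 \
    --audio-mode mix --workdir "$out/render" \
    --out-video "$out/render/mix.mov" &&
  "$bin" eval --video "$video" --audio "$out/generate/music.wav" \
    --timeline "$out/generate/timeline.json" --out "$out/generate/eval.md"
}
```
For example, run_ladder sample-video.mov "uplifting motorik synth pulse for road cycling" walks the whole ladder with the debug build; chaining with && means a failed smoke check never reaches the slower generate and render steps.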