
CLI Guide

vidbgm is the scriptable surface for the same adaptive scoring pipeline used by the desktop app. Use it when you need repeatable runs, detailed artifacts, provider experiments, smoke checks, or video renders from a shell.

Build And Inspect

bash
cargo build

bash
target/debug/vidbgm --help

For a local command on your PATH, install from the repo root:

bash
cargo install --path .

After that, examples can use vidbgm ... instead of target/debug/vidbgm .... The CLI is not published to crates.io or Homebrew yet.
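
If a script has to work both before and after `cargo install`, it can resolve the binary at run time. The following is a minimal sketch, not part of the CLI: the function name `vidbgm_bin` is hypothetical, and the fallback path is simply the debug build location used throughout this guide.

```bash
# Hypothetical helper: print the vidbgm command to use.
# Prefers an installed vidbgm on PATH, otherwise falls back to the debug build path.
vidbgm_bin() {
  if command -v vidbgm >/dev/null 2>&1; then
    echo vidbgm
  else
    echo target/debug/vidbgm
  fi
}

# Usage (not run here):
#   "$(vidbgm_bin)" --help
```

The fallback is a relative path, so the helper assumes the script runs from the repo root.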

The CLI currently exposes:

| Command | Use when |
| --- | --- |
| analyze | You want frame analysis, extracted frames, contact sheets, and timeline inputs without generating audio. |
| generate | You want a generated WAV for a video. |
| render | You want a video export with generated music replacing or mixing with source audio. |
| eval | You already have audio and a timeline and want an evaluation report. |
| vision-smoke | You want to verify frame extraction and vision-provider behavior before a full run. |
| music-smoke | You want to verify music generation without analyzing a video. |
| sample-pack | You want a review folder with multiple style outputs for one video slice. |
| magenta-setup | You want to download or prepare Magenta model/resources. |
| magenta-status | You want a readiness check for Magenta assets and bridge support. |

Common Video And Vision Options

analyze, generate, render, and vision-smoke share the main video-analysis options:

| Option | Values or default | Purpose |
| --- | --- | --- |
| --video PATH | required | Source video. Streams are selected by media type, not index. |
| --prompt TEXT | required | Initial musical direction. Keep it short and musical. |
| --frame-interval-seconds N | default: 5 | Sampling interval for frame analysis. |
| --max-frames N | optional | Cap sampled frames for cheap tests. |
| --duration SECONDS | default: full video | Limit analysis/generation/render duration. |
| --workdir PATH | command-specific | Directory for frames, manifests, timelines, and reports. |
| --vision-provider NAME | default: openai-compatible | openai-compatible, openai, or anthropic. |
| --vision-base-url URL | provider default | Local gateway or hosted provider root URL. |
| --vision-model MODEL | required for hosted providers | Vision model name. |
| --vision-api-key-env NAME | provider default | Environment variable holding an API key. |
| --vision-extra-header KEY=VALUE | repeatable | Extra headers for gateways or routing. |
| --vision-disable-reasoning BOOL | default: true; values: true, false | Disable reasoning fields for compatibility. |
| --vision-profile PROFILE | default: balanced; values: fast, balanced, quality | Prompt/detail profile. |
| --vision-frame-width PX | optional | Resize extracted JPEGs before sending to vision. |
| --vision-jpeg-quality N | default: 75 | JPEG quality for vision frames. |
| --vision-max-tokens N | default: 180 | Vision response token cap. |
| --vision-timeout-seconds N | default: 60 | Per-request timeout. |

Provider defaults:

| Provider | Default URL | Default key env |
| --- | --- | --- |
| openai-compatible | http://localhost:1234/v1 | none |
| openai | https://api.openai.com/v1 | OPENAI_API_KEY |
| anthropic | https://api.anthropic.com | ANTHROPIC_API_KEY |
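
A wrapper script can check that the expected key variable is set before starting a run against a hosted provider. The sketch below only mirrors the provider-defaults table in shell; it is not how vidbgm resolves defaults internally, and the function names `vision_defaults` and `vision_key_ready` are hypothetical.

```bash
# Hypothetical lookup mirroring the provider-defaults table.
# Prints "<default URL> <key env var>", with "none" when no key variable is listed.
vision_defaults() {
  case "$1" in
    openai-compatible) echo "http://localhost:1234/v1 none" ;;
    openai)            echo "https://api.openai.com/v1 OPENAI_API_KEY" ;;
    anthropic)         echo "https://api.anthropic.com ANTHROPIC_API_KEY" ;;
    *) echo "unknown provider: $1" >&2; return 1 ;;
  esac
}

# Succeeds when the provider needs no key, or when its default key variable
# is set and non-empty. Runs in a subshell so it leaves no variables behind.
vision_key_ready() (
  defaults=$(vision_defaults "$1") || exit 1
  key_env=${defaults#* }
  [ "$key_env" = "none" ] && exit 0
  # key_env comes from the fixed table above, so eval is safe here.
  eval "value=\${$key_env:-}"
  [ -n "$value" ]
)

# Usage (not run here):
#   vision_key_ready anthropic || echo "set ANTHROPIC_API_KEY first" >&2
```

If you pass a different variable name with --vision-api-key-env, check that variable instead of the default.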

Analyze

Use analyze to confirm frame sampling, vision responses, and prompt-timeline inputs before spending time on audio generation.

bash
target/debug/vidbgm analyze \
  --video sample-video.mov \
  --prompt "uplifting motorik synth pulse for road cycling" \
  --frame-interval-seconds 30 \
  --max-frames 8 \
  --vision-timeout-seconds 120 \
  --workdir output/cli/analyze-sample \
  --out output/cli/analyze-sample/analysis.json

Typical artifacts:

  • frames_manifest.json
  • extracted JPEG frames
  • contact_sheet.jpg
  • analysis.json
  • timeline.json
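
After an analyze run, a quick file check can confirm that the named artifacts were written. This is a minimal sketch under one assumption: that the JSON files and the contact sheet land directly in the directory you pass in. The function name `missing_artifacts` is hypothetical; adjust the paths if your run writes files elsewhere (for example via --out).

```bash
# Hypothetical post-run check: print the name of each expected artifact
# that is missing from the given directory. Prints nothing when all exist.
missing_artifacts() {
  for name in frames_manifest.json contact_sheet.jpg analysis.json timeline.json; do
    [ -f "$1/$name" ] || echo "$name"
  done
}

# Usage (not run here):
#   missing=$(missing_artifacts output/cli/analyze-sample)
#   [ -z "$missing" ] || { echo "missing: $missing" >&2; exit 1; }
```

The extracted JPEG frames are left out because this guide does not state their file names.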

Generate

Use generate when the desired output is a WAV.

bash
target/debug/vidbgm generate \
  --video sample-video.mov \
  --prompt "uplifting motorik synth pulse for road cycling" \
  --duration 60 \
  --frame-interval-seconds 15 \
  --workdir output/cli/generate-sample \
  --out-audio output/cli/generate-sample/music.wav \
  --magenta-backend auto \
  --magenta-model mrt2_small \
  --prompt-update-mode continuous

Music options:

| Option | Values or default | Purpose |
| --- | --- | --- |
| --out-audio PATH | required | Generated WAV path. |
| --magenta-backend BACKEND | default: auto; values: auto, bridge, synth, cli | Backend selection. |
| --magenta-model MODEL | default: mrt2_small | Magenta model id. |
| --magenta-runtime RUNTIME | default: mlx; values: mlx, jax | Runtime for bridge/CLI backends. |
| --prompt-update-mode MODE | default: segment-stitch; values: segment-stitch, continuous | Timeline update behavior. |

Use --magenta-backend bridge when you want a missing native setup to fail loudly. Use --magenta-backend synth for deterministic local validation.

Render

Use render when the desired output is a video file.

bash
target/debug/vidbgm render \
  --video sample-video.mov \
  --prompt "uplifting motorik synth pulse for road cycling" \
  --duration 60 \
  --frame-interval-seconds 15 \
  --audio-mode mix \
  --music-volume-db -6 \
  --original-volume-db -9 \
  --workdir output/cli/render-mix \
  --out-video output/cli/render-mix/sample-video-mix.mov

Render options:

| Option | Values or default | Purpose |
| --- | --- | --- |
| --audio-mode MODE | required; values: replace, mix | Replace source audio or mix with it. |
| --out-video PATH | required | Rendered movie path. |
| --music-volume-db DB | default: -3 | Generated-music gain in mix mode. |
| --original-volume-db DB | default: -18 | Source-audio gain in mix mode. |

For quick audible review, --music-volume-db -6 --original-volume-db -9 keeps source audio present while still making the generated score easy to judge.
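
To get a feel for these decibel values, it can help to convert them to linear amplitude factors. The sketch below uses the common amplitude convention gain = 10^(dB/20). Whether vidbgm applies exactly this convention is an assumption, and `db_to_gain` is a hypothetical helper name.

```bash
# Convert a decibel value to a linear amplitude factor,
# assuming the amplitude convention gain = 10^(dB/20).
db_to_gain() {
  LC_ALL=C awk -v db="$1" 'BEGIN { printf "%.3f\n", exp(db / 20 * log(10)) }'
}

db_to_gain -6    # prints 0.501
db_to_gain -9    # prints 0.355
db_to_gain -3    # prints 0.708
db_to_gain -18   # prints 0.126
```

Under that convention, the review settings put the music at roughly half amplitude and the source audio at about a third, while the defaults leave the source audio much quieter relative to the music.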

Eval

Use eval when you already have a generated WAV and timeline.

bash
target/debug/vidbgm eval \
  --video sample-video.mov \
  --audio output/cli/generate-sample/music.wav \
  --timeline output/cli/generate-sample/timeline.json \
  --out output/cli/generate-sample/eval.md

The eval report is a lightweight project artifact for checking whether the music changes over the video as expected.

Smoke Checks

Use vision-smoke before full analysis when changing providers, models, image width, or prompt profile:

bash
target/debug/vidbgm vision-smoke \
  --video sample-video.mov \
  --prompt "uplifting motorik synth pulse for road cycling" \
  --max-frames 3 \
  --workdir output/cli/vision-smoke

Use music-smoke before video runs when changing Magenta setup:

bash
target/debug/vidbgm music-smoke \
  --prompt "steady motorik synth pulse" \
  --duration 10 \
  --out-audio output/cli/music-smoke.wav \
  --magenta-backend auto

Sample Pack

sample-pack creates a review folder for a short slice of a video. Its defaults are: workdir output/style-samples/initial-review, 30-second duration, 30-second frame interval, bridge backend, and replace render mode.

bash
target/debug/vidbgm sample-pack \
  --video sample-video.mov \
  --duration 30 \
  --workdir output/style-samples/initial-review \
  --magenta-backend synth

Pass --no-render when you only want analysis and music outputs.

Magenta Setup Commands

Check readiness:

bash
target/debug/vidbgm magenta-status --magenta-model mrt2_small

Prepare assets:

bash
target/debug/vidbgm magenta-setup \
  --magenta-model mrt2_small \
  --magenta-download-source hf \
  --yes

Setup options:

| Option | Values or default | Purpose |
| --- | --- | --- |
| --magenta-model MODEL | default: mrt2_small | Model to verify or download. |
| --magenta-download-source SOURCE | default: hf; values: hf, gcs | Asset source. |
| --magenta-home PATH | workspace .magenta-home or MAGENTA_HOME | Model/resource root. |
| --yes | default: false | Allow unattended first-time downloads. |
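
The two commands can be chained so that setup only runs when the status check fails. This is a sketch under an assumption this guide does not confirm: that magenta-status exits non-zero when assets are not ready. The function name `ensure_magenta` and the `VIDBGM` variable are hypothetical.

```bash
# Hypothetical wrapper: run magenta-status first and fall back to magenta-setup
# only if the status check fails.
# ASSUMPTION: magenta-status exits non-zero when assets are not ready.
ensure_magenta() {
  bin=${VIDBGM:-target/debug/vidbgm}
  model=${1:-mrt2_small}
  if "$bin" magenta-status --magenta-model "$model"; then
    return 0
  fi
  "$bin" magenta-setup --magenta-model "$model" --magenta-download-source hf --yes
}

# Usage (not run here):
#   ensure_magenta mrt2_small
```

Because the sketch passes --yes, it allows an unattended first-time download; drop that flag if you want to confirm downloads by hand.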

Practical Run Order

For a new video or provider, work from cheapest to heaviest:

  1. magenta-status
  2. vision-smoke --max-frames 3
  3. analyze --max-frames 8
  4. music-smoke --duration 10
  5. generate --duration 60
  6. render --duration 60 --audio-mode mix
  7. eval against the generated music.wav and timeline.json
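
The run order above can be sketched as one driver that stops at the first failing step. The flags are copied from the examples in this guide. The names `step` and `run_order`, the `DRY_RUN` switch, and the `OUT` layout are hypothetical, and the sketch assumes that each command exits non-zero on failure and that generate writes timeline.json into its workdir.

```bash
# Hypothetical driver for the cheapest-to-heaviest run order.
# VIDEO, PROMPT, and OUT are placeholders to adjust.
# Set DRY_RUN=1 to print each command instead of running it.
VIDBGM=${VIDBGM:-target/debug/vidbgm}
VIDEO=${VIDEO:-sample-video.mov}
PROMPT=${PROMPT:-uplifting motorik synth pulse for road cycling}
OUT=${OUT:-output/cli/run-order}

step() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$VIDBGM $*"
  else
    "$VIDBGM" "$@"
  fi
}

# Each step runs only if the previous one succeeded.
run_order() {
  step magenta-status --magenta-model mrt2_small &&
  step vision-smoke --video "$VIDEO" --prompt "$PROMPT" --max-frames 3 \
    --workdir "$OUT/vision-smoke" &&
  step analyze --video "$VIDEO" --prompt "$PROMPT" --max-frames 8 \
    --workdir "$OUT/analyze" --out "$OUT/analyze/analysis.json" &&
  step music-smoke --prompt "$PROMPT" --duration 10 \
    --out-audio "$OUT/music-smoke.wav" &&
  step generate --video "$VIDEO" --prompt "$PROMPT" --duration 60 \
    --workdir "$OUT/generate" --out-audio "$OUT/generate/music.wav" &&
  step render --video "$VIDEO" --prompt "$PROMPT" --duration 60 --audio-mode mix \
    --workdir "$OUT/render" --out-video "$OUT/render/mix.mov" &&
  step eval --video "$VIDEO" --audio "$OUT/generate/music.wav" \
    --timeline "$OUT/generate/timeline.json" --out "$OUT/generate/eval.md"
}

# Usage (not run here):
#   DRY_RUN=1 run_order   # preview the seven commands
#   run_order             # run them for real
```

A dry run first is a cheap way to confirm the paths before any provider or model work starts.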
