Overview

vidbgm is a local-first Rust application for generating adaptive background music from video with Magenta. Media processing and music generation run on your machine, so the core workflow is free to use after setup. Hosted vision providers are optional and may have their own costs.

It has two user surfaces:

  • a CLI named vidbgm for analysis, generation, rendering, Magenta setup, status checks, and evals
  • a Tauri desktop app in desktop/ for the Video Music Studio workflow

The core pipeline is:

  1. Probe the video with ffprobe.
  2. Sample frames with ffmpeg.
  3. Analyze frames through a local or hosted vision provider.
  4. Convert frame observations into short weighted music prompt slots.
  5. Generate a local 48 kHz stereo WAV with Magenta RealTime or a fallback backend.
  6. Export audio-only, replace-video-audio, or mixed-original-audio output.
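
Steps 1 and 2 can be sketched in Rust as shown here. This is a minimal illustration, not vidbgm's actual code: the function names, the one-frame-every-N-seconds sampling policy, and the output file pattern are assumptions made for the example. Only standard ffprobe and ffmpeg options are used.

```rust
use std::process::Command;

/// Step 1 (sketch): build an ffprobe call that prints only the container
/// duration in seconds, for example "12.500000".
fn probe_duration_cmd(video: &str) -> Command {
    let mut cmd = Command::new("ffprobe");
    cmd.args([
        "-v", "error",
        "-show_entries", "format=duration",
        "-of", "default=noprint_wrappers=1:nokey=1",
        video,
    ]);
    cmd
}

/// Parse ffprobe's stdout into a positive, finite duration.
/// Returns None for output such as "N/A" or an empty string.
fn parse_duration(stdout: &str) -> Option<f64> {
    stdout
        .trim()
        .parse::<f64>()
        .ok()
        .filter(|d| d.is_finite() && *d > 0.0)
}

/// Step 2 (sketch): build an ffmpeg call that writes one frame every
/// `every_secs` seconds. The sampling interval is an assumed policy.
fn sample_frames_cmd(video: &str, every_secs: u32, out_pattern: &str) -> Command {
    let mut cmd = Command::new("ffmpeg");
    cmd.args(["-i", video, "-vf"])
        // Clamp to 1 so the filter never becomes "fps=1/0".
        .arg(format!("fps=1/{}", every_secs.max(1)))
        .arg(out_pattern);
    cmd
}
```

Running either command needs ffprobe and ffmpeg on your PATH. Calling `.output()` on the probe command and passing its stdout (as text) to `parse_duration` gives the duration used to plan frame sampling.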

Mental Model

The app does not send long scene prose directly to Magenta. Frame analysis produces structured visual cues. The timeline layer turns those cues into short musical phrases, and the Magenta bridge receives weighted prompt slots such as "Open road melodic techno" or "Driving motorik synth pulse".
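
The idea of short weighted prompt slots can be sketched as follows. Everything in this example is hypothetical: the `CuePhrase` and `PromptSlot` types, the salience score, and the slot and word limits are illustrative assumptions, not vidbgm's real data model or its bridge format.

```rust
/// Hypothetical input: a short musical phrase already derived from a
/// visual cue, with a score for how strongly the cue was observed.
#[derive(Debug, Clone, PartialEq)]
struct CuePhrase {
    phrase: String,
    salience: f32,
}

/// Hypothetical output: one weighted prompt slot for the music backend.
#[derive(Debug, Clone, PartialEq)]
struct PromptSlot {
    text: String,
    weight: f32,
}

/// Keep the strongest cues, cap each phrase at `max_words` words, and
/// normalize the weights of the kept slots so they sum to 1.0.
fn to_prompt_slots(cues: &[CuePhrase], max_slots: usize, max_words: usize) -> Vec<PromptSlot> {
    // Drop empty phrases and non-positive (or NaN) scores.
    let mut kept: Vec<&CuePhrase> = cues
        .iter()
        .filter(|c| c.salience > 0.0 && !c.phrase.trim().is_empty())
        .collect();

    // Strongest cue first, then keep at most `max_slots` of them.
    kept.sort_by(|a, b| b.salience.total_cmp(&a.salience));
    kept.truncate(max_slots);

    let total: f32 = kept.iter().map(|c| c.salience).sum();

    kept.iter()
        .map(|c| PromptSlot {
            text: c
                .phrase
                .split_whitespace()
                .take(max_words)
                .collect::<Vec<_>>()
                .join(" "),
            weight: c.salience / total,
        })
        .collect()
}
```

With scores of 3.0 and 1.0 for the two example phrases and a limit of two slots, this sketch produces weights of 0.75 and 0.25. Capping phrase length keeps long scene prose out of the prompt, which matches the intent described above.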

CLI examples use sample-video.mov as a placeholder for your own local media. Large media files are ignored by git, so sample videos stay local.

Main Screens

  • Video import and sample preview
  • Scene moments used for timeline changes
