Stems Separation

Split a song into vocals, drums, bass, and instruments via Demucs AI — in your browser.

Drop files anywhere or click to browse

MP3, WAV, FLAC — works best on full mixes

WebGPU not detected.

Stems separation will fall back to WASM, which works but is roughly 5–10× slower. For best performance, use Chrome 113+, Edge 113+, or Safari 18+.
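The detect-and-fall-back decision can be sketched as a feature check. This is a minimal sketch, assuming the page probes `navigator.gpu` and hands the result to an inference runtime such as ONNX Runtime Web (whose execution providers are named `"webgpu"` and `"wasm"`); `pickBackend` is a hypothetical helper, not Dropvert's actual code:

```typescript
type Backend = "webgpu" | "wasm";

// Choose the inference backend. navigator.gpu only exists in
// WebGPU-capable browsers (Chrome/Edge 113+, Safari 18+), and even
// there requestAdapter() can resolve to null, e.g. on blocklisted GPUs.
async function pickBackend(): Promise<Backend> {
  const gpu = (globalThis as any).navigator?.gpu;
  if (!gpu) return "wasm"; // API missing entirely: use the slow path
  const adapter = await gpu.requestAdapter();
  return adapter ? "webgpu" : "wasm";
}
```

Checking the adapter, not just the `navigator.gpu` property, matters: some browsers expose the API but refuse to hand out an adapter on unsupported hardware.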

Split a song into four separate stems — vocals, drums, bass, and other instruments — using Demucs, the open-source AI model that powers most professional stem-separation services. The model runs entirely in your browser via WebGPU. Your audio never leaves your device, which is a privacy story most stem-separation services can't match.

How it works

4-step walkthrough

  1. Drop a song

     MP3, WAV, FLAC, AAC, OGG, M4A. Works best on full mixes (vocals + instruments together).

  2. Wait for the model

     On first run, Demucs (~250 MB) downloads from Hugging Face's CDN. It's cached afterward, so subsequent runs skip the download.

  3. Watch the chunked inference

     Demucs processes audio in 7-second chunks with 0.5-second overlap. Each chunk produces 4 stems × 2 channels of output. The progress indicator shows the current chunk.

  4. Preview, download, repeat

     Each stem appears with its own audio player and a download button. Download all four as a zip, or pick the ones you want.
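The chunk schedule in step 3 comes down to simple arithmetic over sample indices. The 7-second window and 0.5-second overlap are the numbers from the walkthrough; the scheduling code itself is an illustrative sketch, not Dropvert's implementation:

```typescript
const SAMPLE_RATE = 44100;
const CHUNK = 7 * SAMPLE_RATE;      // 7-second window, in samples
const OVERLAP = 0.5 * SAMPLE_RATE;  // 0.5-second overlap between windows
const HOP = CHUNK - OVERLAP;        // stride between window starts

// Compute the [start, end) sample range of every chunk for a track
// of the given length. The last chunk is clamped to the track end.
function chunkRanges(totalSamples: number): Array<[number, number]> {
  const ranges: Array<[number, number]> = [];
  for (let start = 0; start < totalSamples; start += HOP) {
    ranges.push([start, Math.min(start + CHUNK, totalSamples)]);
    if (start + CHUNK >= totalSamples) break; // this chunk reached the end
  }
  return ranges;
}
```

The overlapping half-second gives the separator context at chunk edges; the overlapped regions are typically crossfaded when the per-chunk outputs are stitched back together.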

Why use Dropvert

Local-first, free, no upload required

  • Browser-side AI — your music never gets uploaded to a third-party server. The standard for stem separation has always been "upload, wait, download"; this is the privacy-first alternative.
  • The same model (Hybrid Transformer Demucs) used by LALAL.AI, Moises, and others. Quality is comparable to the paid services.
  • Four-stem output: vocals, drums, bass, and "other" (typically guitars, keys, leads).
  • Free, no signup, no watermark on the output.
  • Output is full-quality WAV — lossless and uncompressed, ready for further editing in any DAW.

Frequently asked questions

6 answered

Why does the first run take 10+ minutes?
Two compounding reasons: (1) the Demucs model is ~250 MB and downloads on first use; (2) the inference itself is heavy — even with WebGPU, expect 2–4× the song duration. A 3-minute song typically takes 6–12 minutes total on first run, 4–8 minutes after the model is cached.
How clean are the stems?
For most modern pop, rock, and electronic music: very clean. Vocals are typically 90%+ isolated; drums and bass similarly. Older recordings, very dense mixes, classical music, and bootleg-quality audio produce more bleed between stems. The quality is comparable to LALAL.AI and Moises, which use the same Demucs model under the hood.
Can I get just vocals (or just instrumental)?
You always get four stems. To get an instrumental, mix the drums + bass + other tracks together in any DAW. To get just vocals, use the vocals stem directly. We could add a "vocals only / instrumental only" preset that does this combination automatically — let us know if there's demand.
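Mixing stems back into an instrumental is just sample-wise addition, which is what a DAW does when you play the three tracks together. A hypothetical helper illustrating that combination; it assumes all stems share the same length and sample rate, which holds for stems separated from one song:

```typescript
// Sum any number of stems (e.g. drums + bass + other) into one buffer.
// Separation is additive, so summing the non-vocal stems reconstructs
// the instrumental.
function mixStems(stems: Float32Array[]): Float32Array {
  const out = new Float32Array(stems[0].length);
  for (const stem of stems) {
    for (let i = 0; i < stem.length; i++) out[i] += stem[i];
  }
  return out; // clip or normalize only on export, if the sum exceeds ±1.0
}
```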
What's the maximum song length?
Practically, 5–6 minutes is comfortable. Beyond that, browser memory becomes the bottleneck — the model needs to hold both the input and an output roughly four times its size (four stems) in memory. Splitting longer songs in half and processing each half separately is a workaround.
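That workaround can be as simple as slicing the decoded sample buffer before inference. `splitInHalf` is a hypothetical helper; `subarray` returns views into the same buffer, so the split itself allocates no extra memory:

```typescript
// Split a decoded sample buffer at its midpoint so each half can be
// run through the separator independently.
function splitInHalf(samples: Float32Array): [Float32Array, Float32Array] {
  const mid = Math.floor(samples.length / 2);
  return [samples.subarray(0, mid), samples.subarray(mid)];
}
```

In practice you would want to cut at a quiet moment (or overlap the halves slightly and crossfade) so the seam isn't audible in the recombined stems.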
Will WebGPU work on my browser?
Chrome 113+, Edge 113+, Safari 18+, and Firefox Nightly with the flag enabled. Without WebGPU, the tool falls back to WASM execution — works but 5–10× slower.
Is my song uploaded?
No. The audio decoding (FFmpeg) and model inference (ONNX Runtime) both run entirely in your browser. The Demucs model file is fetched from Hugging Face's public CDN — that's the only network request, and it doesn't carry your audio.

Related tools

4 suggestions