Serafin Sanchez

Music Source Separation CLI with Python, PyTorch, and Batch Processing

I modified and extended an open-source music source separation tool so that users can split whole folders of audio files into stems from a CLI. The underlying tool is widely used by musicians, producers, and researchers.

Python
PyTorch
AI
Audio Processing
Open Source

Project Overview

I worked on extending Demucs, an open-source tool that uses AI to split songs into their individual parts: vocals, drums, bass, and other instruments. It's pretty cool tech that musicians and producers use all the time for remixes, karaoke tracks, or just messing around with their favorite songs. I added batch processing so you can throw a whole folder of tracks at it and let it do its thing, which saves a ton of time if you're working with lots of files.
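
One way the folder workflow described above could be wired up is sketched below. It is a minimal illustration, not the project's actual code: the helper names `find_tracks` and `build_command` and the extension list are assumptions. The `-n`, `-o`, and `--two-stems` options belong to the upstream `demucs` command, which also accepts several track paths in a single call.

```python
from pathlib import Path

# Assumed set of extensions to pick up; adjust to what your FFmpeg build can decode.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg"}


def find_tracks(folder):
    """Return every audio file under `folder` (recursively), sorted for stable ordering."""
    return sorted(
        p for p in Path(folder).rglob("*")
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )


def build_command(tracks, out_dir, model="htdemucs", two_stems=None):
    """Build one `demucs` invocation covering all tracks (hypothetical helper)."""
    cmd = ["demucs", "-n", model, "-o", str(out_dir)]
    if two_stems:
        # e.g. two_stems="vocals" yields a vocals / no_vocals pair per track
        cmd += ["--two-stems", two_stems]
    return cmd + [str(t) for t in tracks]
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`. Building a single command for the whole folder means the model is loaded once rather than once per file.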

Key Features

🎵 Advanced Music Source Separation

  • Multi-Stem Extraction: Separates vocals, drums, bass, and other stems from any audio file
  • Multiple Model Architectures: Original Demucs, Hybrid Demucs (v3), and Hybrid Transformer Demucs (v4)
  • State-of-the-Art Quality: Upstream Demucs models report strong separation results on the MUSDB HQ benchmark
  • Pretrained & Custom Models: High-quality pretrained models with option for custom training

🔧 Flexible Processing Options

  • Batch & Programmatic Processing: Separate multiple tracks via CLI or Python API
  • Multiple Audio Formats: Supports wav, mp3, flac, ogg/vorbis input and output
  • Advanced Options: Segment splitting, multi-shift prediction, and model bagging
  • Memory Efficiency: Optimized for various hardware configurations

💻 Developer-Friendly Architecture

  • Python API: Robust Separator class for programmatic integration
  • Command-Line Interface: Comprehensive command-line tools with progress reporting
  • Cross-Platform: CPU, CUDA GPU, and Apple MPS support
  • Extensible: Support for custom model training and fine-tuning

🏗️ Production-Ready Features

  • Distributed Training: Multi-GPU and distributed training support
  • Docker Integration: Containerized deployment options
  • Third-Party GUIs: Community-developed graphical interfaces
  • Real-Time Processing: Optimized for live audio processing scenarios

Architecture and Backend Design

Core Framework

  • Python 3.8+ with comprehensive library ecosystem
  • PyTorch for deep learning model implementation and training
  • torchaudio for audio I/O and preprocessing
  • FFmpeg integration for broad format support

Model Architectures

  • U-Net Based Models: Original Demucs architecture for efficient separation
  • Hybrid U-Net: v3 architecture combining time and frequency domain processing
  • Hybrid Transformer: v4 with cross-domain attention between waveform and spectrogram
  • Ensemble Support: Model bagging for improved separation quality

API Design

  • Separator Class: Clean programmatic interface for integration
  • File & Tensor Processing: Support for both file-based and tensor-based workflows
  • Parameter Management: Dynamic model and processing parameter updates
  • Memory Management: Efficient handling of large audio files
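
A rough sketch of how the Separator interface listed above might drive a folder loop follows. It assumes a Demucs build that ships the `demucs.api` module (it is not part of every released version) and relies on `Separator.separate_audio_file`, `Separator.samplerate`, and `demucs.api.save_audio` as described in the upstream API docs. The function names `separate_folder` and `run_with_demucs`, the `*.mp3` pattern, and the output layout are illustrative assumptions.

```python
from pathlib import Path


def separate_folder(separator, save_audio, in_dir, out_dir, pattern="*.mp3"):
    """Separate each matching file and write one WAV per stem.

    `separator` is expected to behave like demucs.api.Separator and
    `save_audio` like demucs.api.save_audio. Both are passed in so the
    loop can be exercised without loading a model.
    """
    written = []
    for track in sorted(Path(in_dir).glob(pattern)):
        # separate_audio_file returns (original_waveform, {stem_name: waveform})
        _, stems = separator.separate_audio_file(track)
        track_dir = Path(out_dir) / track.stem
        track_dir.mkdir(parents=True, exist_ok=True)
        for stem_name, waveform in stems.items():
            target = track_dir / f"{stem_name}.wav"
            save_audio(waveform, str(target), samplerate=separator.samplerate)
            written.append(target)
    return written


def run_with_demucs(in_dir, out_dir):
    """Wire the loop to the real library (needs a Demucs build with demucs.api)."""
    import demucs.api  # imported lazily so the helper above stays dependency-free

    separator = demucs.api.Separator(model="htdemucs")
    return separate_folder(separator, demucs.api.save_audio, in_dir, out_dir)
```

Passing the separator in as an argument keeps one loaded model alive across the whole folder and makes the loop easy to test with a stand-in object.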

Training Infrastructure

  • Dora Integration: Experiment management and tracking
  • Hydra Configuration: Flexible configuration management system
  • Distributed Support: Multi-GPU and multi-node training capabilities
  • Model Zoo: Centralized repository of pretrained models

Technical Challenges Overcome

Efficient Long-File Processing

  • Segment Splitting: Intelligent audio segmentation for memory-constrained environments
  • Overlap Strategies: Sophisticated overlap handling for seamless reconstruction
  • Memory Optimization: Dynamic memory management for various hardware configurations
  • Quality Preservation: Maintained separation quality across segmentation boundaries
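
The segment-and-overlap idea above can be illustrated with a small, dependency-free sketch. This is a simplified stand-in operating on plain Python lists, not the code Demucs runs: the names `triangle_weights` and `apply_in_segments`, the default overlap of 0.25, and the per-chunk weighting are assumptions chosen for clarity. Each segment is processed on its own, then overlapping regions are blended with weights that peak at the segment centre, so no hard seam appears at a boundary.

```python
def triangle_weights(length):
    """Weights that ramp up to the segment centre and back down (never zero)."""
    half = (length + 1) // 2
    ramp = list(range(1, half + 1))
    # For odd lengths the peak value appears once, for even lengths twice.
    return ramp + ramp[::-1][length % 2:]


def apply_in_segments(signal, process, segment, overlap=0.25):
    """Run `process` on overlapping segments and blend the results back together."""
    stride = max(1, int(segment * (1 - overlap)))
    out = [0.0] * len(signal)
    weight_sum = [0.0] * len(signal)
    for start in range(0, len(signal), stride):
        chunk = signal[start:start + segment]  # the last chunks may be shorter
        result = process(chunk)
        weights = triangle_weights(len(chunk))
        for i, (value, w) in enumerate(zip(result, weights)):
            out[start + i] += value * w
            weight_sum[start + i] += w
    # Normalise by the accumulated weight so overlapping regions are averaged.
    return [o / w for o, w in zip(out, weight_sum)]
```

Only one segment is held by `process` at a time, which is what bounds peak memory. A larger `overlap` gives smoother transitions at the cost of more computation.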

High-Quality Separation

  • Hybrid Architecture: Combined time and frequency domain processing
  • Transformer Integration: Cross-domain attention mechanisms for improved accuracy
  • Multi-Scale Processing: Different resolution processing for various audio components
  • Quality Metrics: Comprehensive evaluation against industry benchmarks

Cross-Platform Audio Support

  • Format Compatibility: Robust handling of diverse audio formats and codecs
  • Hardware Abstraction: Seamless operation across CPU, CUDA, and MPS devices
  • Performance Optimization: Hardware-specific optimizations for maximum efficiency
  • Dependency Management: Minimal dependencies with optional advanced features
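
As an illustration of the hardware abstraction point above, device selection could look roughly like the following. The function names and the CUDA, then MPS, then CPU preference order are assumptions made for this sketch. `torch.cuda.is_available()` and `torch.backends.mps.is_available()` are real PyTorch calls, and the import is guarded so the selection logic still works where PyTorch is absent.

```python
def choose_device(cuda_available, mps_available, preferred=None):
    """Pick a device name: an explicit request first, then CUDA, then MPS, then CPU."""
    available = {"cpu": True, "cuda": cuda_available, "mps": mps_available}
    if preferred is not None:
        if not available.get(preferred, False):
            raise ValueError(f"Requested device {preferred!r} is not available")
        return preferred
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device(preferred=None):
    """Query PyTorch when it is installed; fall back to CPU otherwise."""
    try:
        import torch
    except ImportError:
        return choose_device(False, False, preferred)
    mps_backend = getattr(torch.backends, "mps", None)  # absent on older PyTorch builds
    return choose_device(
        torch.cuda.is_available(),
        bool(mps_backend and mps_backend.is_available()),
        preferred,
    )
```

The returned string can be passed wherever a device name is expected, for example to the `-d` option of the upstream `demucs` command. This sketch only handles the plain names `cpu`, `cuda`, and `mps`, not indexed forms such as `cuda:1`.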

Scalability Solutions

  • Batch Processing: Efficient processing of multiple files simultaneously
  • Parallel Processing: Multi-core and multi-GPU utilization
  • Resource Management: Dynamic resource allocation based on available hardware
  • Production Deployment: Docker containers and cloud deployment support

UI/UX Details

Command-Line Interface

  • Intuitive Commands: Clear, well-documented CLI with comprehensive help
  • Progress Feedback: Real-time progress bars and processing status
  • Error Handling: Informative error messages with troubleshooting guidance
  • Flexible Options: Extensive configuration options for advanced users
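
The progress and error-handling behaviour listed above might be structured along these lines. This is a hedged sketch: the program name `batch-separate`, the argument layout, and the helper names are hypothetical and are not taken from the project's actual command line.

```python
import argparse


def build_parser():
    """Hypothetical argument layout for a folder-based separation command."""
    parser = argparse.ArgumentParser(
        prog="batch-separate",
        description="Separate every audio file in a folder into stems.",
    )
    parser.add_argument("input_dir", help="folder containing the tracks to separate")
    parser.add_argument("-o", "--out", default="separated", help="output folder")
    parser.add_argument("-n", "--name", default="htdemucs", help="pretrained model name")
    parser.add_argument("--two-stems", metavar="STEM", help="only split STEM from the rest")
    return parser


def process_all(tracks, separate_one, report=print):
    """Run `separate_one` on each track, reporting progress and collecting failures."""
    failures = []
    total = len(tracks)
    for index, track in enumerate(tracks, start=1):
        report(f"[{index}/{total}] {track}")
        try:
            separate_one(track)
        except Exception as exc:  # keep going so one bad file does not stop the batch
            failures.append((track, str(exc)))
            report(f"  failed: {exc}")
    return failures
```

Returning the list of failures lets the caller print a summary at the end and exit with a non-zero status if any track could not be processed.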

Python API

  • Clean Integration: Simple import and initialization for developers
  • Comprehensive Documentation: Detailed API documentation and examples
  • Type Hints: Full type annotation for better development experience
  • Error Handling: Robust exception handling with clear error messages

Third-Party Interfaces

  • GUI Applications: Community-developed graphical user interfaces
  • Web Demos: Browser-based demonstrations and testing platforms
  • Colab Integration: Google Colab notebooks for easy experimentation
  • Hugging Face Spaces: Online demos and model sharing platform

Extending Demucs was a really rewarding project that brought together serious AI research with practical engineering and a thriving open-source community. It's been cool to see it make a real difference for musicians, researchers, and audio professionals who need to work with separated audio tracks.
