- Add segmented generation with crossfading for long audio (>30s)
- Add Bark integration for speech/vocal generation (--speech flag)
- Add full song generation with vocals over instrumental (--song flag)
- Auto-select MusicGen model size based on available VRAM
- Enforce CUDA for NVIDIA GPUs (no CPU fallback)
- Update README with new features and examples
- Fail fast if NVIDIA GPU detected but CUDA unavailable (no CPU fallback)
- Auto-select largest model based on VRAM (large=12GB+, medium=8GB+)
- Remove torchaudio dependency (scipy handles audio I/O)
- Use safetensors format to avoid torch.load security issues
Features:
- Generate music from text prompts using open-source MusicGen model
- Support for small/medium/large models (500MB to 6.5GB)
- CUDA, Apple Silicon MPS, and CPU support
- Interactive mode with example prompts
- Setup script that handles venv and GPU detection
Usage:
cd python_pkg/music_gen && ./setup.sh
python music_generator.py 'upbeat electronic dance music'