testsAndMisc-archive/python_pkg/music_gen/README.md
Krzysztof kuhy Rudnicki 15bc7bf34f music_gen: add segmented generation, Bark vocals, and song mixing
- Add segmented generation with crossfading for long audio (>30s)
- Add Bark integration for speech/vocal generation (--speech flag)
- Add full song generation with vocals over instrumental (--song flag)
- Auto-select MusicGen model size based on available VRAM
- Enforce CUDA for NVIDIA GPUs (no CPU fallback)
- Update README with new features and examples
2025-12-04 21:26:52 +01:00

146 lines
3.8 KiB
Markdown

# MusicGen - Local AI Music & Speech Generator
Generate music and speech/vocals from text prompts using Meta's MusicGen and Suno's Bark.
## Features
- **Music Generation**: Create instrumental music from text descriptions (MusicGen)
- **Long Audio Support**: Generate music of any length via automatic segmentation with crossfading
- **Speech/Vocals**: Generate speech and singing with Bark (optional)
- **CUDA Optimized**: Auto-detects GPU and selects best model for your VRAM
- **No API Keys**: Runs 100% locally on your hardware
## Quick Start
```bash
# 1. Run the setup script (creates venv, installs dependencies)
cd python_pkg/music_gen
./setup.sh
# 2. Activate the virtual environment
source venv/bin/activate
# 3. Generate music!
python music_generator.py "upbeat electronic dance music with synths"
```
## Usage
### Music Generation (MusicGen)
```bash
# Basic usage
python music_generator.py "jazz piano with soft drums"
# Set duration (any length supported via segmentation)
python music_generator.py --duration 60 "epic orchestral soundtrack"
# Generate a full 3-minute track
python music_generator.py --duration 180 "ambient electronic music"
# Use smaller/faster model
python music_generator.py --model small "rock guitar riff"
# Use larger/better quality model (needs 12GB+ VRAM)
python music_generator.py --model large "ambient electronic"
```
### Speech/Vocals Generation (Bark)
```bash
# First install Bark (not included in base setup)
pip install git+https://github.com/suno-ai/bark.git
# Generate speech
python music_generator.py --speech "Hello, how are you today?"
# Use different voice
python music_generator.py --speech --voice v2/en_speaker_3 "Welcome!"
# Generate singing
python music_generator.py --speech "♪ La la la, I love to sing ♪"
# With laughter and expression
python music_generator.py --speech "That's so funny! [laughter] I can't believe it."
```
**Bark special tokens:**
- `[laughter]`, `[laughs]`, `[sighs]`, `[gasps]` - expressions
- `[music]`, `[clears throat]` - sounds
- `♪` - singing
- `...` or `—` - hesitations
**Available voices:** `v2/en_speaker_0` through `v2/en_speaker_9`
### Interactive Mode
```bash
python music_generator.py --interactive
```
In interactive mode:
- Type prompts to generate music
- `:d 15` - Set duration to 15 seconds
- `:h` - Show example prompts
- `:q` - Quit
## Model Sizes (Auto-Selected by VRAM)
| Model | Size | VRAM | Quality | Speed |
| ------ | ------ | ----- | ------- | ------ |
| small | ~500MB | 3GB+ | Good | Fast |
| medium | ~3.3GB | 8GB+ | Better | Medium |
| large | ~6.5GB | 12GB+ | Best | Slow |
## Requirements
- Python 3.10+
- NVIDIA GPU with CUDA (required for NVIDIA systems)
- Apple Silicon supported via MPS
- 8GB+ VRAM recommended for best results
## Output
Generated audio files are saved to `./output/` as WAV files with timestamps.
## Example Prompts
- "upbeat electronic dance music with heavy bass"
- "calm acoustic guitar melody with soft percussion"
- "epic orchestral soundtrack with dramatic strings"
- "lo-fi hip hop beats for studying"
- "80s synthwave with retro vibes"
- "jazz piano trio with upright bass"
- "ambient electronic music for relaxation"
- "rock guitar riff with drums"
- "classical piano sonata in minor key"
## Troubleshooting
### Out of Memory
- Try `--model small` for lower VRAM usage
- Reduce duration with `--duration 10`
- Close other GPU applications
### Slow Generation
- Make sure GPU is detected (check output at startup)
- Use `--model small` for faster generation
- Reduce duration
### No Sound / Corrupted File
- Check if scipy is installed: `pip install scipy`
- Try a different audio player (VLC recommended)
### CUDA Not Available
If you see "NVIDIA GPU detected but CUDA is not available":
```bash
pip install torch --index-url https://download.pytorch.org/whl/cu121
```