mirror of https://github.com/kuhyx/testsAndMisc.git synced 2026-07-04 19:43:11 +02:00

Krzysztof kuhy Rudnicki 740726a3ae music_gen: add segmented generation, Bark vocals, and song mixing		2025-12-04 21:26:52 +01:00
- Add segmented generation with crossfading for long audio (>30s)
- Add Bark integration for speech/vocal generation (--speech flag)
- Add full song generation with vocals over instrumental (--song flag)
- Auto-select MusicGen model size based on available VRAM
- Enforce CUDA for NVIDIA GPUs (no CPU fallback)
- Update README with new features and examples
__init__.py	Add local AI music generator using Meta's MusicGen	2025-12-04 20:43:44 +01:00
music_generator.py	music_gen: add segmented generation, Bark vocals, and song mixing	2025-12-04 21:26:52 +01:00
README.md	music_gen: add segmented generation, Bark vocals, and song mixing	2025-12-04 21:26:52 +01:00
setup.sh	Add local AI music generator using Meta's MusicGen	2025-12-04 20:43:44 +01:00

README.md

MusicGen - Local AI Music & Speech Generator

Generate music and speech/vocals from text prompts using Meta's MusicGen and Suno's Bark.

Features

Music Generation: Create instrumental music from text descriptions (MusicGen)
Long Audio Support: Generate tracks longer than 30 seconds via automatic segmentation with crossfading
Speech/Vocals: Generate speech and singing with Bark (optional, installed separately)
CUDA Optimized: Auto-detects the GPU and selects a MusicGen model size based on available VRAM
No API Keys: Runs 100% locally on your hardware
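
The crossfading used for long audio can be illustrated with a minimal sketch. It assumes mono audio held as plain lists of floats and a linear fade; the name `crossfade_concat` and the overlap length are hypothetical and not taken from music_generator.py, which may use a different fade curve or tensor type.

```python
def crossfade_concat(a, b, overlap):
    """Join two mono sample lists, blending the last `overlap` samples of `a`
    with the first `overlap` samples of `b` using a linear fade."""
    # Never blend more samples than either segment actually has.
    overlap = min(overlap, len(a), len(b))
    if overlap <= 0:
        return list(a) + list(b)

    head = list(a[:-overlap])   # part of `a` that is left untouched
    tail = list(b[overlap:])    # part of `b` that is left untouched
    start = len(a) - overlap    # index in `a` where the blend begins

    blended = []
    for i in range(overlap):
        # Weight of `b` rises linearly; it stays strictly between 0 and 1.
        w = (i + 1) / (overlap + 1)
        blended.append(a[start + i] * (1.0 - w) + b[i] * w)
    return head + blended + tail


if __name__ == "__main__":
    loud = [1.0] * 4
    quiet = [0.0] * 4
    print(crossfade_concat(loud, quiet, overlap=2))
```

The joined result is shorter than the two segments combined by `overlap` samples, so a segmented generator would need to request slightly more audio per segment than the target duration divided by the segment count.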

Quick Start

# 1. Run the setup script (creates venv, installs dependencies)
cd python_pkg/music_gen
./setup.sh

# 2. Activate the virtual environment
source venv/bin/activate

# 3. Generate music!
python music_generator.py "upbeat electronic dance music with synths"

Usage

Music Generation (MusicGen)

# Basic usage
python music_generator.py "jazz piano with soft drums"

# Set duration (any length supported via segmentation)
python music_generator.py --duration 60 "epic orchestral soundtrack"

# Generate a full 3-minute track
python music_generator.py --duration 180 "ambient electronic music"

# Use smaller/faster model
python music_generator.py --model small "rock guitar riff"

# Use larger/better quality model (needs 12GB+ VRAM)
python music_generator.py --model large "ambient electronic"

Speech/Vocals Generation (Bark)

# First install Bark (not included in base setup)
pip install git+https://github.com/suno-ai/bark.git

# Generate speech
python music_generator.py --speech "Hello, how are you today?"

# Use a different voice
python music_generator.py --speech --voice v2/en_speaker_3 "Welcome!"

# Generate singing
python music_generator.py --speech "♪ La la la, I love to sing ♪"

# With laughter and expression
python music_generator.py --speech "That's so funny! [laughter] I can't believe it."

Bark special tokens:

[laughter], [laughs], [sighs], [gasps] - expressions
[music], [clears throat] - sounds
♪ - singing
"..." or "—" - hesitations

Available voices: v2/en_speaker_0 through v2/en_speaker_9
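
Because the voice names follow a fixed pattern, a --voice value can be checked before any model is loaded. The sketch below is an illustration only: `ENGLISH_VOICES` and `is_valid_voice` are hypothetical names, and it covers just the ten English presets listed above.

```python
# The ten English presets listed above: v2/en_speaker_0 ... v2/en_speaker_9.
ENGLISH_VOICES = [f"v2/en_speaker_{i}" for i in range(10)]


def is_valid_voice(name: str) -> bool:
    """Return True if `name` is one of the listed English voice presets."""
    return name in ENGLISH_VOICES


if __name__ == "__main__":
    for candidate in ("v2/en_speaker_3", "v2/en_speaker_12"):
        status = "ok" if is_valid_voice(candidate) else "unknown voice"
        print(f"{candidate}: {status}")
```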

Interactive Mode

python music_generator.py --interactive

In interactive mode:

Type prompts to generate music
:d 15 - Set duration to 15 seconds
:h - Show example prompts
:q - Quit
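
The commands above can be handled by a small dispatcher. This is a sketch under assumptions, not the code in music_generator.py: the function name `parse_line` and the action labels it returns are hypothetical, and the real script may treat malformed input differently.

```python
def parse_line(line, duration):
    """Interpret one line of interactive input.

    Returns a tuple (action, duration, prompt), where `action` is one of
    "quit", "help", "set_duration", "error", "noop", or "generate".
    """
    text = line.strip()
    if text == ":q":
        return ("quit", duration, None)
    if text == ":h":
        return ("help", duration, None)
    if text.startswith(":d"):
        parts = text.split()
        # Accept only ":d <positive integer>", e.g. ":d 15".
        if (len(parts) == 2 and parts[0] == ":d"
                and parts[1].isdigit() and int(parts[1]) > 0):
            return ("set_duration", int(parts[1]), None)
        return ("error", duration, None)
    if not text:
        return ("noop", duration, None)
    # Anything else is treated as a music prompt.
    return ("generate", duration, text)


if __name__ == "__main__":
    current = 10
    for raw in (":d 15", "jazz piano with soft drums", ":q"):
        action, current, prompt = parse_line(raw, current)
        print(action, current, prompt)
```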

Model Sizes (Auto-Selected by VRAM)

Model	Size	VRAM	Quality	Speed
small	~500MB	3GB+	Good	Fast
medium	~3.3GB	8GB+	Better	Medium
large	~6.5GB	12GB+	Best	Slow
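
The auto-selection can be read directly off the table. The sketch below assumes the VRAM column gives the cutoffs; `pick_model` is a hypothetical name, and the script's actual thresholds (for example, headroom reserved for other processes) may differ.

```python
def pick_model(vram_gb: float) -> str:
    """Choose a MusicGen size from available VRAM, using the table's cutoffs."""
    if vram_gb >= 12:
        return "large"
    if vram_gb >= 8:
        return "medium"
    # Assumption: anything below 8 GB falls back to the smallest model,
    # even under the 3 GB the table lists for it.
    return "small"


if __name__ == "__main__":
    for gb in (4, 8, 11.5, 24):
        print(f"{gb} GB -> {pick_model(gb)}")
```

With PyTorch installed, the total memory of the first CUDA device is available in bytes as `torch.cuda.get_device_properties(0).total_memory`; dividing by `1024 ** 3` gives a value suitable for `pick_model`.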

Requirements

Python 3.10+
NVIDIA GPU: a CUDA-enabled PyTorch build is required (no CPU fallback on NVIDIA systems)
Apple Silicon supported via MPS
8GB+ VRAM recommended for best results
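
The device rules above can be summarized as a small decision function. This is a sketch, not the script's code: `pick_device` is a hypothetical name, and the final CPU fallback for machines with neither an NVIDIA GPU nor MPS is an assumption.

```python
def pick_device(cuda_available: bool, mps_available: bool,
                nvidia_gpu_present: bool) -> str:
    """Return the compute device name implied by the requirements above."""
    if cuda_available:
        return "cuda"
    if nvidia_gpu_present:
        # An NVIDIA GPU without working CUDA is treated as an error
        # rather than silently running on the CPU.
        raise RuntimeError("NVIDIA GPU detected but CUDA is not available")
    if mps_available:
        return "mps"
    # Assumption: systems with no supported GPU run on the CPU.
    return "cpu"


if __name__ == "__main__":
    print(pick_device(cuda_available=False, mps_available=True,
                      nvidia_gpu_present=False))
```

With PyTorch, the first two flags would come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`. How the script detects an NVIDIA GPU is not stated here; checking for `nvidia-smi` on the PATH is one possibility.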

Output

Generated audio files are saved to ./output/ as WAV files with timestamps.
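
A sketch of timestamped WAV output follows. It uses only the standard library rather than scipy, assumes mono float samples in the range -1.0 to 1.0, and writes 16-bit PCM. The function name `save_wav`, the `music` prefix, and the timestamp format are hypothetical and may not match the files the script produces.

```python
import struct
import wave
from datetime import datetime
from pathlib import Path


def save_wav(samples, sample_rate, out_dir="output", prefix="music"):
    """Write mono float samples to <out_dir>/<prefix>_<timestamp>.wav."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)

    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    path = out_path / f"{prefix}_{stamp}.wav"

    # Clip to [-1, 1], then scale to signed 16-bit integers.
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    frames = struct.pack(f"<{len(ints)}h", *ints)  # little-endian PCM

    with wave.open(str(path), "wb") as wf:
        wf.setnchannels(1)        # mono
        wf.setsampwidth(2)        # 2 bytes = 16 bits per sample
        wf.setframerate(sample_rate)
        wf.writeframes(frames)
    return path


if __name__ == "__main__":
    # One second of silence at 32 kHz, the sample rate MusicGen produces.
    print(save_wav([0.0] * 32000, 32000))
```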

Example Prompts

"upbeat electronic dance music with heavy bass"
"calm acoustic guitar melody with soft percussion"
"epic orchestral soundtrack with dramatic strings"
"lo-fi hip hop beats for studying"
"80s synthwave with retro vibes"
"jazz piano trio with upright bass"
"ambient electronic music for relaxation"
"rock guitar riff with drums"
"classical piano sonata in minor key"

Troubleshooting

Out of Memory

Try --model small for lower VRAM usage
Reduce duration with --duration 10
Close other GPU applications

Slow Generation

Make sure GPU is detected (check output at startup)
Use --model small for faster generation
Reduce duration

No Sound / Corrupted File

Check if scipy is installed: pip install scipy
Try a different audio player (VLC recommended)

CUDA Not Available

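If you see "NVIDIA GPU detected but CUDA is not available", reinstall PyTorch from a CUDA wheel index (cu121 is the CUDA 12.1 build; choose the index that matches your driver):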

pip install torch --index-url https://download.pytorch.org/whl/cu121