MusicGen - Local AI Music & Speech Generator

Generate music and speech/vocals from text prompts using Meta's MusicGen and Suno's Bark.

Features

  • Music Generation: Create instrumental music from text descriptions (MusicGen)
  • Long Audio Support: Generate music of any length via automatic segmentation with crossfading
  • Speech/Vocals: Generate speech and singing with Bark (optional)
  • CUDA Optimized: Auto-detects GPU and selects best model for your VRAM
  • No API Keys: Runs 100% locally on your hardware
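
To illustrate what "segmentation with crossfading" means, here is a minimal sketch of a linear crossfade between two audio segments. It is not taken from music_generator.py; the function name crossfade and the plain-list representation of samples are assumptions made for illustration only.

```python
def crossfade(a: list[float], b: list[float], overlap: int) -> list[float]:
    """Join two sample sequences, blending the last `overlap` samples of `a`
    with the first `overlap` samples of `b` using a linear ramp."""
    if overlap <= 0:
        return a + b  # nothing to blend, plain concatenation
    if overlap > min(len(a), len(b)):
        raise ValueError("overlap is longer than one of the segments")

    head, tail = a[:-overlap], a[-overlap:]
    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of `b` rises across the overlap
        blended.append((1 - w) * tail[i] + w * b[i])
    return head + blended + b[overlap:]


# Hypothetical toy input: a constant "loud" segment fading into silence.
first = [1.0] * 6
second = [0.0] * 6
joined = crossfade(first, second, overlap=3)
```

The joined result is shorter than the two inputs combined by exactly the overlap length, because the overlapping samples are mixed into one region rather than played twice.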

Quick Start

# 1. Run the setup script (creates venv, installs dependencies)
cd python_pkg/music_gen
./setup.sh

# 2. Activate the virtual environment
source venv/bin/activate

# 3. Generate music!
python music_generator.py "upbeat electronic dance music with synths"

Usage

Music Generation (MusicGen)

# Basic usage
python music_generator.py "jazz piano with soft drums"

# Set duration (any length supported via segmentation)
python music_generator.py --duration 60 "epic orchestral soundtrack"

# Generate a full 3-minute track
python music_generator.py --duration 180 "ambient electronic music"

# Use smaller/faster model
python music_generator.py --model small "rock guitar riff"

# Use larger/better quality model (needs 12GB+ VRAM)
python music_generator.py --model large "ambient electronic"
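
As a rough illustration of how a long --duration could be split into segments, the sketch below counts how many overlapping segments are needed to cover a requested length. The 30-second segment length and 2-second overlap are assumed values, and count_segments is a hypothetical helper; the script's real parameters may differ.

```python
import math


def count_segments(duration_s: float, segment_s: float = 30.0,
                   overlap_s: float = 2.0) -> int:
    """Number of segments of `segment_s` seconds, each overlapping the
    previous one by `overlap_s` seconds, needed to cover `duration_s`."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    if not 0 <= overlap_s < segment_s:
        raise ValueError("overlap must be non-negative and shorter than a segment")
    if duration_s <= segment_s:
        return 1
    step = segment_s - overlap_s  # new audio contributed by each extra segment
    return 1 + math.ceil((duration_s - segment_s) / step)


one_minute = count_segments(60)      # --duration 60
three_minutes = count_segments(180)  # --duration 180
```

Under these assumptions each segment after the first adds 28 seconds of new audio, so generation time grows roughly linearly with the requested duration.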

Speech/Vocals Generation (Bark)

# First install Bark (not included in base setup)
pip install git+https://github.com/suno-ai/bark.git

# Generate speech
python music_generator.py --speech "Hello, how are you today?"

# Use different voice
python music_generator.py --speech --voice v2/en_speaker_3 "Welcome!"

# Generate singing
python music_generator.py --speech "♪ La la la, I love to sing ♪"

# With laughter and expression
python music_generator.py --speech "That's so funny! [laughter] I can't believe it."

Bark special tokens:

  • [laughter], [laughs], [sighs], [gasps] - expressions
  • [music], [clears throat] - sounds
  • ♪ - singing
  • ... or — - hesitations

Available voices: v2/en_speaker_0 through v2/en_speaker_9

Interactive Mode

python music_generator.py --interactive

In interactive mode:

  • Type prompts to generate music
  • :d 15 - Set duration to 15 seconds
  • :h - Show example prompts
  • :q - Quit
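
The command syntax above can be sketched as a small parser. This is an illustrative guess at the behaviour, not the script's actual code; parse_line and the returned action names are hypothetical.

```python
def parse_line(line: str) -> tuple[str, object]:
    """Classify one line of interactive input as (action, value)."""
    text = line.strip()
    if not text:
        return ("empty", None)
    if text == ":q":
        return ("quit", None)
    if text == ":h":
        return ("help", None)

    parts = text.split()
    if parts[0] == ":d":
        # ":d 15" sets the duration; anything else is reported as an error
        if len(parts) == 2 and parts[1].isdigit() and int(parts[1]) > 0:
            return ("duration", int(parts[1]))
        return ("error", "usage: :d <seconds>")

    # Everything else is treated as a text prompt for music generation.
    return ("prompt", text)


action, value = parse_line(":d 15")
```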

Model Sizes (Auto-Selected by VRAM)

Model    Size      VRAM     Quality   Speed
small    ~500MB    3GB+     Good      Fast
medium   ~3.3GB    8GB+     Better    Medium
large    ~6.5GB    12GB+    Best      Slow
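
A minimal sketch of how auto-selection could map available VRAM to a model, using the thresholds from the table. The names VRAM_TIERS and pick_model are hypothetical, and the fallback for GPUs under 3GB is an assumption; the script's real logic may differ.

```python
# (minimum VRAM in GB, model name), largest first; thresholds from the table
VRAM_TIERS = [(12.0, "large"), (8.0, "medium"), (3.0, "small")]


def pick_model(vram_gb: float) -> str:
    """Return the largest model whose VRAM requirement is satisfied."""
    for min_gb, name in VRAM_TIERS:
        if vram_gb >= min_gb:
            return name
    # Assumption: below 3GB, still try the smallest model.
    return "small"


choice = pick_model(10.0)  # e.g. a 10GB card
```

Passing --model on the command line overrides whatever would be selected automatically.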

Requirements

  • Python 3.10+
  • NVIDIA GPU with a CUDA-enabled PyTorch build (on systems with an NVIDIA GPU, CUDA must be available)
  • Apple Silicon supported via MPS
  • 8GB+ VRAM recommended (enough for the medium model; the large model needs 12GB+)

Output

Generated audio files are saved to ./output/ as WAV files with timestamps.
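
For illustration, the sketch below builds a timestamped WAV path and writes 16-bit mono audio using only the standard library. The file name pattern musicgen_YYYYMMDD_HHMMSS.wav and both helper names are assumptions, and the standard wave module is used here in place of whatever writer the script itself uses. The 32000 Hz rate matches MusicGen's output sample rate.

```python
import math
import struct
import tempfile
import wave
from datetime import datetime
from pathlib import Path


def timestamped_path(out_dir: Path, prefix: str = "musicgen") -> Path:
    """Build a path such as output/musicgen_20250101_120000.wav (assumed pattern)."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    return out_dir / f"{prefix}_{stamp}.wav"


def write_wav(path: Path, samples: list[float], sample_rate: int = 32000) -> None:
    """Write float samples in [-1.0, 1.0] as 16-bit mono PCM."""
    path.parent.mkdir(parents=True, exist_ok=True)
    frames = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(frames)


# Demo: one second of a 440 Hz tone, written to a temporary "output" folder
# so that the example does not touch a real ./output directory.
tone = [0.3 * math.sin(2 * math.pi * 440 * n / 32000) for n in range(32000)]
with tempfile.TemporaryDirectory() as tmp:
    target = timestamped_path(Path(tmp) / "output")
    write_wav(target, tone)
    with wave.open(str(target), "rb") as wav:
        frame_count = wav.getnframes()
        frame_rate = wav.getframerate()
```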

Example Prompts

  • "upbeat electronic dance music with heavy bass"
  • "calm acoustic guitar melody with soft percussion"
  • "epic orchestral soundtrack with dramatic strings"
  • "lo-fi hip hop beats for studying"
  • "80s synthwave with retro vibes"
  • "jazz piano trio with upright bass"
  • "ambient electronic music for relaxation"
  • "rock guitar riff with drums"
  • "classical piano sonata in minor key"

Troubleshooting

Out of Memory

  • Try --model small for lower VRAM usage
  • Reduce duration with --duration 10
  • Close other GPU applications

Slow Generation

  • Make sure GPU is detected (check output at startup)
  • Use --model small for faster generation
  • Reduce duration

No Sound / Corrupted File

  • Make sure scipy is installed: pip install scipy
  • Try a different audio player (VLC recommended)

CUDA Not Available

If you see "NVIDIA GPU detected but CUDA is not available":

pip install torch --index-url https://download.pytorch.org/whl/cu121
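
To check which device PyTorch can actually use before or after reinstalling, a small probe like the one below may help. detect_device is a hypothetical helper, not part of music_generator.py; it relies only on torch.cuda.is_available() and torch.backends.mps.is_available(), and falls back to "cpu" when PyTorch is not installed.

```python
def detect_device() -> str:
    """Return "cuda", "mps", or "cpu" depending on what PyTorch reports."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed in this environment

    if torch.cuda.is_available():
        return "cuda"

    # The MPS backend (Apple Silicon) is absent in older PyTorch builds.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = detect_device()
print(f"PyTorch would run on: {device}")
```

If this prints "cpu" on a machine with an NVIDIA GPU, the installed PyTorch build most likely lacks CUDA support, which is the situation the pip command above addresses.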