
Fragmenta
An All-in-One Pipeline for Training and Using Text-to-Audio Models
made for experimental music
User-Friendly Interface
A clean, intuitive interface designed for ease of use.

Why?
With AI seemingly everywhere, most tools are either locked behind subscription models designed for consumers, or they require deep technical knowledge, coding skills, and a huge time investment to use effectively. At the same time, the technology's narrative is being driven by big tech, turning access into a commodity. These technofeudal structures limit who gets to shape the future of AI and how it can be used creatively.
There are also ethical problems. Much of today’s AI is trained on vast amounts of data scraped from the internet without permission, often infringing intellectual property rights. Built on open research from Stability AI, Fragmenta shows that ethically trained, personalized models can empower musicians and audio creators without infringing copyrights or compromising artistic integrity.
Participating in how the narrative of technology is shaped can be a form of democratic intervention. Fragmenta exists to enable artists to train models on their own work, gain a transparent understanding of their AI carbon footprint, and use AI on their own terms rather than as a product. Most importantly, all of this stays local: your music or recordings never leave your device.
Sound Examples
Fine-tuned outputs generated with Fragmenta — trained on personal audio data.
“[weird drum beat, 130 bpm]”
“[arpegio, light, minor, 130 bpm]”
“[drum beat]”
“[noisy ambient, texture, full spectrum]”
Three Ways to Run
Choose the setup that is accessible to you.
Hugging Face Spaces
No installation needed — run Fragmenta directly in your browser. CPU inference is slow but ideal for exploring the interface. Limited GPU sessions are available on request.
Open Space ↗
Docker
Fastest local setup — no Python or Node.js required. Pull the image and run. Separate images available for NVIDIA GPU and CPU (Mac / Linux / Windows).
docker run --gpus all mazcode/fragmenta:gpu
Run Locally
Clone the repo and run the launcher script. Installs everything in an isolated folder — deleting it removes all dependencies. Supports Linux, Windows, and macOS.
./run.sh   # Linux / macOS
run.bat    # Windows
Three Modules
Data Processing
Add your audio files and let Fragmenta handle the dataset creation:
- Drag & drop audio file upload
- Annotation field for tagging
- Automatic dataset creation
- Automated metadata creation
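The dataset and metadata creation steps above can be sketched in plain Python. This is a minimal illustration, not Fragmenta’s actual code: it assumes a folder of audio files and a single annotation string, and the `metadata.jsonl` filename and the `path`/`prompt` field names are hypothetical.

```python
import json
from pathlib import Path

def build_metadata(audio_dir: str, annotation: str, out_file: str = "metadata.jsonl") -> int:
    """Pair every audio file in audio_dir with a text annotation.

    Writes one JSON object per line: {"path": ..., "prompt": ...}.
    Returns the number of entries written. Field names are illustrative.
    """
    audio_exts = {".wav", ".flac", ".mp3", ".ogg"}
    entries = [
        {"path": p.name, "prompt": annotation}
        for p in sorted(Path(audio_dir).iterdir())
        if p.suffix.lower() in audio_exts
    ]
    out_path = Path(audio_dir) / out_file
    with out_path.open("w") as f:
        for entry in entries:
            f.write(json.dumps(entry) + "\n")
    return len(entries)
```

In a real pipeline each file would get its own annotation; a single shared tag is the simplest case a drag-and-drop upload plus one annotation field supports.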
Model Training
Fine-tune text-to-audio models with advanced configuration options:
- Stable Audio Open integration
- Custom training parameters
- Real-time loss monitoring
- One-click checkpoint unwrap
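“Checkpoint unwrap” here means extracting the bare model weights from a training-framework checkpoint: Lightning-style checkpoints (as used by Stability AI’s training tools) store weights under a `state_dict` whose keys carry a `model.` prefix. A minimal sketch of that idea with plain dictionaries, assuming that key layout; real checkpoints hold tensors and are loaded with `torch.load`.

```python
def unwrap_state_dict(checkpoint: dict, prefix: str = "model.") -> dict:
    """Strip a training wrapper's key prefix so the weights load into the bare model.

    Assumes a Lightning-style layout: {"state_dict": {"model.layer.weight": ...}}.
    Keys without the prefix (e.g. training-only buffers) are dropped.
    """
    state = checkpoint.get("state_dict", checkpoint)
    return {
        key[len(prefix):]: value
        for key, value in state.items()
        if key.startswith(prefix)
    }
```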
Audio Generation
Generate 44.1kHz stereo audio from text prompts using trained or base models:
- Ready to use even without fine-tuning
- Multiple model support
- Configurable duration
- Instant download
Real-time Monitoring
Training Monitor (Live)
Important Information
IMPORTANT: Fragmenta is built on open‑source research and models. It currently uses open‑source diffusion models from Stability AI, with support for additional architectures planned in the future. Fragmenta is intended for experimental music and does not create realistic audio. The project is in active development and not intended for production use. The beta is available now and free to use. Feedback will help shape its future development.
All models used in Fragmenta are subject to the terms and conditions of their respective licenses. By using Fragmenta—especially for commercial purposes—you agree to comply with these licensing terms. The creator of Fragmenta assumes no responsibility for how the software or models are used. Users are solely responsible for ensuring their own compliance with applicable laws and licenses.