Fragmenta Logo

Fragmenta

An All-in-One Pipeline for Training and Using Text-to-Audio Models

made for experimental music

User Friendly Interface

A clean, intuitive interface designed for ease of use.

Fragmenta App Interface

Why?

With AI seemingly everywhere, most tools are either locked behind subscription models designed for consumers, or they require deep technical knowledge, coding skills, and a huge time investment to use effectively. At the same time, the technology's narrative is being driven by big tech, turning access into a commodity. These technofeudal structures limit who gets to shape the future of AI and how it can be used creatively.

There are also ethical problems. Much of today's AI is trained on vast amounts of data scraped from the internet without permission, often infringing intellectual property rights. Built on open research from Stability AI, Fragmenta shows that ethically trained, personalized models can empower musicians and audio creators without infringing copyright or compromising artistic integrity.

Participating in how the narrative of technology is shaped can be a form of democratic intervention. Fragmenta exists to enable artists to train models on their own work, gain a transparent understanding of their AI carbon footprint, and use AI on their own terms rather than as a product. Most importantly, all of this stays local: your music or recordings never leave your device.

Sound Examples

Fine-tuned outputs generated with Fragmenta — trained on personal audio data.

“[weird drum beat, 130 bpm]”


“[arpegio, light, minor, 130 bpm]”


“[drum beat]”


“[noisy ambient, texture, full spectrum]”


Three Ways to Run

Choose the setup that works best for you.

Hugging Face Spaces

No installation needed: run Fragmenta directly in your browser. Inference runs on the CPU, which is slow but ideal for exploring the interface. Limited GPU sessions are available on request.

Open Space ↗
Docker

Fastest local setup: no Python or Node.js required. Pull the image and run it. Separate images are available for NVIDIA GPUs and for CPU-only systems (Mac / Linux / Windows).

docker run -d -p 5001:5001 \
  --gpus all mazcode/fragmenta:gpu
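
For machines without an NVIDIA GPU, the CPU image follows the same pattern. The `:cpu` tag below is an assumption based on the GPU/CPU image split described above; check Docker Hub for the exact tag name:

```shell
# CPU-only variant: same port mapping, no --gpus flag
# (the :cpu tag is assumed; verify on Docker Hub)
docker run -d -p 5001:5001 mazcode/fragmenta:cpu
```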
Docker Hub ↗

Run Locally

Clone the repo and run the launcher script. Everything installs into an isolated folder, so deleting that folder removes all dependencies. Supports Linux, Windows, and macOS.

./run.command  # macOS
./run.sh       # Linux
run.bat        # Windows
View on GitHub ↗

Three Modules

Data Processing

Add your audio files and let Fragmenta handle the dataset creation:

  • Drag & drop audio file upload
  • Annotation field for tagging
  • Automatic dataset creation
  • Automated metadata creation
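
Under the hood, dataset creation amounts to pairing each audio file with its annotation text. A minimal sketch of that idea in Python; the sidecar-JSON layout and the `prompt` field are illustrative assumptions, not Fragmenta's actual on-disk format:

```python
import json
from pathlib import Path

AUDIO_EXTS = {".wav", ".flac", ".mp3"}

def build_metadata(dataset_dir: str, annotation: str) -> list[Path]:
    """Write a sidecar JSON carrying the annotation for every audio file."""
    written = []
    for audio in sorted(Path(dataset_dir).iterdir()):
        if audio.suffix.lower() not in AUDIO_EXTS:
            continue  # skip non-audio files
        sidecar = audio.with_suffix(".json")
        # Assumed schema: one text prompt per audio file.
        sidecar.write_text(json.dumps({"prompt": annotation}, indent=2))
        written.append(sidecar)
    return written
```

In practice, a trainer would read these prompts back as the text conditioning for each clip.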

Model Training

Fine-tune text-to-audio models with advanced configuration options:

  • Stable Audio Open integration
  • Custom training parameters
  • Real-time loss monitoring
  • One-click checkpoint unwrap

Audio Generation

Generate 44.1kHz stereo audio from text prompts using trained or base models:

  • Ready to use even without fine-tuning
  • Multiple model support
  • Configurable duration
  • Instant download

Real-time Monitoring

Training Monitor (Live)

Current Epoch: 7 / 10
Current Step: 245 / 350
Current Loss: 0.0847
GPU Memory: 14.2 GB
Progress: 70%
Best Loss: 0.0821 | ETA: 12 minutes
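
The progress and ETA readouts above can be derived from step counts and elapsed time alone. A small illustrative sketch in Python (these helpers are not Fragmenta's internals):

```python
def progress_pct(step: int, total_steps: int) -> float:
    """Fraction of training steps completed, as a percentage."""
    return 100.0 * step / total_steps

def eta_seconds(step: int, total_steps: int, elapsed_s: float) -> float:
    """Estimate remaining time from the average seconds per step so far."""
    return (total_steps - step) * (elapsed_s / step)
```

For example, at step 245 of 350 a run is 70% complete; if those 245 steps took 490 seconds (2 s per step), roughly 210 seconds remain.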

Important Information

System Requirements

  • ~15 GB of storage space (including the large model)
  • Internet connection required for installation
  • CUDA GPU with >= 8 GB VRAM (recommended)
  • Non-CUDA systems supported (very limited performance)

Performance Reference

  • NVIDIA GPU: ~3 s for 10 s of audio
  • Apple Silicon (M1): ~9 min for 10 s of audio
  • Hugging Face Spaces (CPU): very slow, but no setup required

Prerequisites

  • Free Hugging Face account (required)
  • Acceptance of the Stable Audio Open terms and conditions
  • Basic understanding of AI (recommended)
  • A considerable amount of your own audio data (required for fine-tuning)

IMPORTANT: Fragmenta is built on open‑source research and models. It currently uses open‑source diffusion models from Stability AI, with support for additional architectures planned in the future. Fragmenta is intended for experimental music and does not create realistic audio. The project is in active development and not intended for production use. The beta is available now and free to use. Feedback will help shape its future development.

All models used in Fragmenta are subject to the terms and conditions of their respective licenses. By using Fragmenta—especially for commercial purposes—you agree to comply with these licensing terms. The creator of Fragmenta assumes no responsibility for how the software or models are used. Users are solely responsible for ensuring their own compliance with applicable laws and licenses.

This website uses cookies for functionality and analytics (only with your consent). I do not save or sell your personal data. External links may track you through their own services.