Store
Ultimate-TTS-Studio (Featured)
Kokoro, KittenTTS, Higgs Audio, Chatterbox/Multi, Fish-Speech, F5, index-tts, indextts2, VoxCPM, and VibeVoice in one app
StableDAW (Featured)
Browser-based AI audio DAW for Stable Audio 3 with text-to-audio, inpainting, LoRA training, FFmpeg effects, waveform editing, sequencer, piano roll, and persistent library. https://github.com/gantasmo/stabledaw
ComfyUI (Featured)
The most powerful and modular diffusion model GUI, API, and backend with a graph/nodes interface. https://github.com/comfyanonymous/ComfyUI
Voicebox (Featured)
Local-first voice synthesis studio powered by Qwen3-TTS.
ACE-Step 1.5 (Featured)
The most powerful local music generation model that outperforms most commercial alternatives.
audiocraft_plus (Featured)
AudioCraft Plus is an all-in-one WebUI for the original AudioCraft, adding many quality features on top. https://github.com/GrandaddyShmax/audiocraft_plus
RC Stable Audio Tools (Featured)
Advanced Gradio UI for Stable Audio. https://github.com/RoyalCities/RC-stable-audio-tools
Qwen3-TTS (Featured)
Qwen3-TTS is an open-source series of TTS models developed by the Qwen team.
MMAudio (Featured)
Generate synchronized audio from video and/or text inputs. https://github.com/hkchengrex/MMAudio
StableAudio (Featured)
An open-source model for audio samples and sound design. https://github.com/Stability-AI/stable-audio-tools
MAGNeT (Featured)
MAGNeT is a text-to-music and text-to-sound model capable of generating high-quality audio samples conditioned on text descriptions. https://github.com/facebookresearch/audiocraft/blob/main/docs/MAGNET.md
ZETA (Featured)
Zero-shot text-based audio editing using DDPM inversion. https://huggingface.co/spaces/hilamanor/audioEditing
A multi-voice AI audiobook generator built on Qwen3-TTS: annotate scripts with an LLM, assign a unique voice to each character, add per-line style instructions for delivery, clone voices from reference audio, design new voices from text descriptions, train custom voices with LoRA fine-tuning, and export to MP3 or Audacity multi-track projects.
X-Voice is a multilingual text-to-speech system that enables one speaker to speak 27 languages.
Native C++ AI music generation, no Python required.
