Store
Pinokio wrapper for LongCat-AudioDiT with selectable 1B / 3.5B model downloads.
Goose sidecar dashboard draft with Pinokio onboarding-focused launcher.
XTTS (Featured)
Clone voices into different languages using just a quick 3-second audio clip. (A local version of https://huggingface.co/spaces/coqui/xtts)
Run the Open-Hivemind multi-agent orchestrator locally with Pinokio.
YouTube to MP3, Cohere transcription, TranslateGemma translation, OmniVoice TTS. https://github.com/PierrunoYT/VidLingo-Pinokio
Local-first AI audiobook production with voice cloning and chapter repair tools. This is the easiest way to install locally, including an optional demo voice library so you can start exploring right away. Live demo: senigami.github.io/audiobook-studio

YouTube to MP3, Cohere transcription, TranslateGemma translation.
[NVIDIA ONLY] The most efficient way to run FLUX (Optimized to run even on low memory machines, as low as 3GB VRAM with 512x512 resolution) https://github.com/lllyasviel/stable-diffusion-webui-forge
🎨 FLUX.2 [klein] - Fast text-to-image generation with Black Forest Labs' FLUX.2 models. 6 variants available: 4B/9B (full precision) plus NVFP4/FP8 quantized versions. Runs on GPUs from consumer (~13GB) to high-end (~29GB) for sub-second image generation with outstanding quality.
Liquid Audio - LFM2.5-Audio-1.5B: speech-to-speech, ASR, and TTS powered by Liquid AI.
Ultra-lightweight text-to-speech (15M-80M params) — CPU optimized, 8 voices, ONNX-powered
Instant, Ultra-Realistic Text-to-Speech
A web interface for the Moondream3 vision-language model featuring image captioning, visual question answering, object detection, and object pointing.
High-quality Text-to-Speech powered by the VyvoTTS LFM2 model with an easy-to-use web interface
Gradio web interface for Photoroom's PRX-1024-t2i-beta text-to-image model
🎵 YouTube to MP3 downloader with a simple Gradio UI. Paste a YouTube link to download it as an MP3. Requires ffmpeg to be installed on your system.
🌍 TranslateGemma - Google's open-source multilingual translation AI. Translate text across 55+ languages and extract/translate text from images. Powered by Gemma 3 architecture.
Advanced text-to-speech with voice cloning, multi-speaker support, and background music generation using Higgs Audio V2
Advanced 3B parameter language model with Gradio web interface, GPU acceleration, and complete privacy
State-of-the-art open-source speech recognition model supporting 14 languages. 2B parameter ASR model from Cohere Labs.