AI-powered text-to-speech with voice cloning, using Chatterbox TTS and a Gradio interface. Includes Turbo, Multilingual (23+ languages), and Original models.
timoncool/VoxCPM2_portable-pinokio v6.0.0 updated 1mo ago
ElevenLabs at home. Multilingual TTS with Voice Design, Voice Cloning, and end-to-end LoRA fine-tuning straight from a video or podcast. Built on VoxCPM2 by OpenBMB. 30 languages incl. Russian.
One-click launcher for the original HiDream-O1-Image web UI using lazy-downloaded drbaph Dev or Full FP8 checkpoints through a root FP8 runner. Requires an NVIDIA CUDA GPU.
Robust automatic speech recognition for challenging real-world audio. Handles noise, far-field, echo, reverberation, and more using a foundation model trained on 2.6M samples across 54 acoustic scenarios.
A unified image generation model for a range of tasks, including text-to-image generation, subject-driven generation, identity-preserving generation, and image-conditioned generation. https://huggingface.co/spaces/Shitao/OmniGen
AI-powered tool that automatically sorts thousands of gymnastics competition photos into folders by team and individual gymnast. Uses YOLOv8, CLIP, InsightFace, ReID and FAISS – fully offline, CUDA-accelerated.