Wanted
1,616 projects
Non-launcher projects without a Pinokio launcher yet.
A lightweight yet powerful audio-to-MIDI converter with pitch bend detection.
igdev116/phoma-releases (no description provided)
Miso TTS is an 8-billion-parameter, highly emotive text-to-speech model.
Rust CLI for AI subtitle workflows: transcribe, segment, translate, evaluate, and burn or mux subtitles.
Financial Team (A-share edition): a multi-agent investment research and decision-making system, extensively adapted from virattt/ai-hedge-fund.
A WebUI to create song covers with any RVC v2 trained AI voice from YouTube videos or audio files.
Open-source voice AI platform and self-hosted alternative to Vapi and Retell. Supports on-prem deployment and bring-your-own-key (BYOK) for both speech-to-speech and LLM/STT/TTS pipelines, with a visual workflow builder, native MCP support, and telephony support.
Agentic looper
YunxuanMao/SAM2-GUI (no description provided)
📡 Your own AI-powered news radar. Generates daily briefings in English and Chinese.
🔊 Text-Prompted Generative Audio Model
A variety of custom ComfyUI nodes and workflows
Official implementation of the paper "MagicQuillV2: Precise and Interactive Image Editing with Layered Visual Cues."
mikecastrodemaria/RhythmBeatDetection (no description provided)
Track-Anything is a flexible and interactive tool for video object tracking and segmentation, based on Segment Anything, XMem, and E2FGVI.
Official implementation of the CVPR 2026 paper "SonoWorld: From One Image to a 3D Audio-Visual Scene."
Gradio WebUI for creators and developers, featuring popular TTS engines (Edge-TTS, Kokoro) and zero-shot voice cloning (E2-TTS, F5-TTS, CosyVoice), plus Whisper audio processing, YouTube download, Demucs vocal isolation, and multilingual translation.
Large-scale LLM inference engine
Stable Diffusion web UI
