Store
Project page for ICCV 2025 paper "Controllable and Expressive One-Shot Video Head Swapping"
Robust Speech Recognition via Large-Scale Weak Supervision - openai/whisper
Unified Image Understanding and Generation. Text-to-Image Generation, In-context Generation, Instruction-guided Image Editing, Visual Understanding (Minimum Requirements: 12GB VRAM / 48GB RAM, Recommended Requirements: 24GB VRAM / 32GB RAM)
MIDIfren is an Audio Stem & MIDI Processor in Python🎵. Convert audio to MIDI, extract stems, sonify MIDI files ...
A state-of-the-art open-source model for fast feedforward 3D reconstruction from a single image, developed in collaboration between Tripo AI and Stability AI. https://huggingface.co/spaces/stabilityai/TripoSR
Contribute to bmaltais/kohya_ss development by creating an account on GitHub.
Contribute to presenton/presenton_docker development by creating an account on GitHub.
Contribute to presenton/presenton_electron development by creating an account on GitHub.
[NVIDIA Only] Dead simple web UI for training FLUX LoRA with LOW VRAM support (From 12GB)
Full-stack AI video generation app with image/text input and premium NSFW toggle
This repository contains the code of "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For an HD commercial model, please try out Sync Labs
ICCV 2023 video inpainting/outpainting
Contribute to mannaandpoem/OpenManus development by creating an account on GitHub.
Image inpainting tool powered by SOTA AI models. Remove any unwanted object, defect, or even people from your pictures, and replace (powered by stable diffusion) anything in your pictures. https://www.iopaint.com/
Public repository of the slides2video app for pinokio
Text-to-Speech using IndicF5 for Indian languages
A GUI for the music separation AI demucs
[NVIDIA ONLY] Direct3D-S2 is a scalable 3D shape generation framework leveraging sparse volumetric representations for high-resolution outputs. It features Spatial Sparse Attention (SSA), a novel mechanism that accelerates Diffusion Transformer computations on sparse data, achieving up to 9.6× speedup in training. The unified Sparse VAE architecture maintains a consistent sparse volumetric format across input, latent, and output stages, significantly improving efficiency and stability.
GLM-4-Voice | End-to-end Chinese-English spoken dialogue model
