Store
[NVIDIA GPU ONLY] One-click installer for Intel's ldm3d
Dense Text-to-Image Generation with Attention Modulation
An open source implementation of Microsoft's VALL-E X zero-shot TTS model
Demo showcasing a ~real-time Latent Consistency Model pipeline with Diffusers and an MJPEG stream server (https://github.com/radames/Real-Time-Latent-Consistency-Model)
Text-to-Video (T2V) generation framework from Vchitect https://github.com/Vchitect/LaVie
An AI powered mirror
A Realtime Creation Engine
Vid2DensePose
Convert your videos to DensePose and use them with MagicAnimate https://github.com/Flode-Labs/vid2densepose
Integrates Florence2 and SAM2 models for detailed image captioning and object detection. Florence2 generates detailed captions that are then used to perform phrase grounding. The Segment Anything Model 2 (SAM2) converts these phrase-grounded boxes into masks. https://huggingface.co/spaces/SkalskiP/florence-sam
Stable Diffusion web UI UX: https://github.com/anapnoe/stable-diffusion-webui-ux
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation: https://github.com/Zejun-Yang/AniPortrait
Langflow is a dynamic graph where each node is an executable unit. Its modular and interactive design fosters rapid experimentation and prototyping, pushing hard on the limits of creativity: https://github.com/langflow-ai/langflow
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding: https://github.com/Tencent/HunyuanDiT
Your image is almost there!: https://github.com/lllyasviel/Omost
Drag & drop UI to build your customized LLM flow: https://github.com/FlowiseAI/Flowise
[Need 24GB VRAM] Cambrian-1 is a family of multimodal LLMs with a vision-centric design: https://github.com/cambrian-mllm/cambrian
Dough is an open source tool for steering AI animations with precision
moondream1 is a tiny (1.6B parameter) vision language model trained by @vikhyatk that performs on par with models twice its size. It is trained on the LLaVa training dataset, and initialized with SigLIP as the vision tower and Phi-1.5 as the text encoder. https://huggingface.co/spaces/vikhyatk/moondream1
