Pierre Bruno

@pierrunoyt

3 posts · 21 checkpoints · Joined 1/27/2026, 9:46:34 AM

Activity 24 · Posts 3 · Checkpoints 21 · Apps 8 · Creations 50 · Following 0 · Followers 4

Creations by @pierrunoyt

50 total

PRX-1024 Text-to-Image · Updated 4 weeks ago

https://github.com/PierrunoYT/Photoroom-PRX-Pinokio

Gradio web interface for Photoroom's PRX-1024-t2i-beta text-to-image model

VyvoTTS LFM2 · Updated 4 weeks ago

https://github.com/PierrunoYT/VyvoTTS-LFM2-Pinokio

High-quality Text-to-Speech powered by VyvoTTS LFM2 model with easy-to-use web interface

Moondream3 Gradio UI · Updated 4 weeks ago

https://github.com/PierrunoYT/moondream-3-pinokio

A web interface for the Moondream3 vision-language model featuring image captioning, visual question answering, object detection, and object pointing.

Soprano TTS · Updated 4 weeks ago

https://github.com/PierrunoYT/soprano-tts-pinokio

Instant, Ultra-Realistic Text-to-Speech

KittenTTS 😻 · Updated 4 weeks ago

https://github.com/PierrunoYT/KittenTTS-Pinokio

Ultra-lightweight text-to-speech (15M-80M params) — CPU optimized, 8 voices, ONNX-powered

Liquid Audio · Updated 4 weeks ago

https://github.com/PierrunoYT/liquid-audio-pinokio

Liquid Audio - LFM2.5-Audio-1.5B: speech-to-speech, ASR, and TTS powered by Liquid AI.

VoxCPM 2 · Updated 4 weeks ago

https://github.com/PierrunoYT/VoxCPM-2-Pinokio

Tokenizer-free TTS for context-aware speech, voice cloning, and voice design. 2B params, 48kHz, 30 languages (Gradio UI).

LFM2.5-450M-VL · Updated 4 weeks ago

https://github.com/PierrunoYT/LFM2.5-450M-VL-Pinokio

LFM2.5-VL-450M (Liquid AI): compact vision–language model for image understanding. Gradio UI with upload/URL, prompt, and generation sliders.

Z-Image-Turbo · Updated 4 weeks ago

https://github.com/PierrunoYT/Z-Image-Pinokio

⚡️ Efficient 6B parameter image generation model with sub-second inference. Generate high-quality, photorealistic images with only 8 inference steps. Features bilingual text rendering (Chinese & English) and Single-Stream Diffusion Transformer architecture.

Cohere Transcribe · Updated 4 weeks ago

https://github.com/PierrunoYT/cohere-transcribe-pinokio

State-of-the-art open-source speech recognition model supporting 14 languages. 2B parameter ASR model from Cohere Labs.

GLM-TTS · Updated 4 weeks ago

https://github.com/PierrunoYT/GLM-TTS-Pinokio

🎙️ Controllable & Emotion-Expressive Zero-shot TTS with Multi-Reward Reinforcement Learning. High-quality text-to-speech synthesis supporting zero-shot voice cloning and streaming inference with natural emotional expression.

OrpheusTTS · Updated 4 weeks ago

https://github.com/PierrunoYT/OrpheusTTS-Pinokio

Standalone Text-to-Speech using Orpheus TTS with a Gradio UI

LuxTTS 🎙️ · Updated 4 weeks ago

https://github.com/PierrunoYT/LuxTTS-Pinokio

High-quality rapid TTS voice cloning model (150x+ realtime) — 48kHz speech, voice cloning

Audio Flamingo 3 · Updated 4 weeks ago

https://github.com/PierrunoYT/Audio-Flamingo-3-Pinokio

NVIDIA's Audio Flamingo 3 - Large Audio-Language Model for speech, sound, and music understanding with Gradio web interface

Supertonic 3 · Updated 4 weeks ago

https://github.com/PierrunoYT/Supertonic-3-Pinokio

Lightning-Fast, On-Device, Multilingual TTS — Gradio, ONNX, 44.1kHz

Hy-MT2 · Updated 4 weeks ago

https://github.com/PierrunoYT/Tencent-HY-MT2-Pinokio

Hy-MT2 multilingual translation — Gradio UI for 33-language translation with Hy-MT2-1.8B, Hy-MT2-7B, and Hy-MT2-30B-A3B.

ChatterBox · Updated 4 weeks ago

https://github.com/PierrunoYT/chatterbox-tts-pinokio

AI-Powered Text-to-Speech with Voice Cloning using Chatterbox TTS and a Gradio interface. Includes Turbo, Multilingual (23+ languages), and Original models. Runs locally; CUDA GPU recommended, CPU supported. Windows, Mac, and Linux.

PocketTTS · Updated 4 weeks ago

https://github.com/PierrunoYT/pocket-tts-pinokio

Lightweight CPU text-to-speech with preset voices and optional Hugging Face-authenticated voice cloning.

ChatterBox · Updated last month

https://github.com/PierrunoYT/chatterbox-tts-app

AI-Powered Text-to-Speech with Voice Cloning using Chatterbox TTS and a Gradio interface. Includes Turbo, Multilingual (23+ languages), and Original models.

PiD · Updated last month

https://github.com/PierrunoYT/NVIDIA-PiD-Pinokio

NVIDIA PiD — Pixel Diffusion Decoder for high-resolution latent decoding. Gradio UI for Z-Image + 4× PiD upscale, plus CLI demos for Flux.

Creations

Transcribr

Bulk transcribe many YouTube videos, whole playlists, or your own uploaded audio/video files at once with faster-whisper. Outputs txt, srt, vtt, or json. · Updated 15 hours ago

PersonaPlex

🗣️ PersonaPlex - NVIDIA's real-time speech-to-speech conversational AI model. Natural full-duplex conversations with customizable personas and voices. · Updated 5 days ago

OmniVoice

Zero-shot multilingual TTS (600+ languages) with voice cloning and voice design - Gradio UI (app/app.py) · Updated 5 days ago

Higgs Audio v3 TTS

Pinokio launcher for Higgs Audio v3 TTS with Gradio UI, SGLang-Omni backend, and automatic model download. · Updated 5 days ago

DramaBox

Expressive TTS with voice cloning and prompt-driven speech synthesis, built on LTX-2.3 by Resemble AI · Updated 5 days ago

Sana

Fast Image Generation with Sana Diffusion Model · Updated 5 days ago

ScribeTube

Download and transcribe many YouTube videos or whole playlists at once with faster-whisper. Outputs txt, srt, vtt, or json. · Updated last week

MOSS-TTS

All-in-one Gradio UI for the MOSS-TTS Family: voice cloning, dialogue generation, voice design from text, and sound effects. · Updated last week

Ideogram 4 Studio

Ideogram 4 (nf4) open-weights text-to-image model (9.3B params, Qwen3-VL-8B text encoder, structured JSON prompting, native 2k resolution) · Updated 2 weeks ago

PRX Pixel

Pixel-space PRX text-to-image pipeline (~7B params, Qwen3-VL text encoder, no VAE) · Updated 2 weeks ago