Feedback & Feature Request: Local Voice Training & Export for Qwen3-TTS
Hello,
First of all, thank you for this excellent tool. I have been thoroughly testing Qwen3-TTS and the results are exceptional. The current workflow is remarkably efficient: it runs without needing Visual Studio or other complex dependencies. This stability and ease of use in a lightweight environment are exactly what set this tool apart from more complex alternatives.
To take this project from a great tool for one-off clips to an essential tool for creative production, I would like to offer some feedback and a feature proposal regarding voice management:
Proposal: Persistent Voice Training & Model Management
1. Local Training: It would be a game-changer to have an integrated "Voice Training" module. This would allow users to train custom voice models based on their own high-quality recordings, going beyond simple cloning.
2. Persistence & Export: Crucially, these trained models must be exportable as ordinary local files that are easy to back up. This would allow users to secure their work on external storage (like NAS systems) and move between setups without losing their custom character voices.
3. Inference-time Style Control: Combining these custom, trained models with the current "Style Prompting" system would allow for the perfect balance of emotional nuance and character consistency during generation.
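To make the export idea in point 2 more concrete, here is a rough sketch of the kind of backup-friendly package I have in mind. It is only an illustration under my own assumptions: the folder layout, the file names, the manifest format, and the functions export_voice and verify_voice are all hypothetical and not part of Qwen3-TTS or Pinokio. It only shows how a trained voice folder could be packed into a single archive with checksums, so that a copy on a NAS can be verified later.

```python
import hashlib
import json
import zipfile
from pathlib import Path


def export_voice(voice_dir: Path, archive_path: Path, name: str) -> dict:
    """Pack every file under voice_dir into one zip plus a checksum manifest.

    Hypothetical helper: the real tool may store voices differently.
    """
    # Collect the files first so the archive itself is never picked up.
    files = sorted(p for p in voice_dir.rglob("*") if p.is_file())
    manifest = {
        "name": name,
        "files": {
            p.relative_to(voice_dir).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
            for p in files
        },
    }
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for p in files:
            zf.write(p, p.relative_to(voice_dir).as_posix())
        # Assumed manifest name; it records one SHA-256 digest per packed file.
        zf.writestr("manifest.json", json.dumps(manifest, indent=2))
    return manifest


def verify_voice(archive_path: Path) -> bool:
    """Return True if every packed file still matches its manifest checksum."""
    with zipfile.ZipFile(archive_path) as zf:
        manifest = json.loads(zf.read("manifest.json"))
        return all(
            hashlib.sha256(zf.read(rel)).hexdigest() == digest
            for rel, digest in manifest["files"].items()
        )
```

With something like this, I could call export_voice on a character's folder after a training session, copy the resulting archive to external storage, and run verify_voice on the copy before relying on it on another machine.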
Use Case & Ethical Framework
I am currently putting a significant amount of effort into high-quality recording sessions to create consistent character voices. My goal is to use these for personal creative projects: producing high-quality, self-authored books and audio dramas to share with friends and family. I envision a local collaborative workflow where I can train and manage multiple unique voices to produce professional-grade audio plays, all within this clean and accessible environment.
By keeping this process local and user-controlled, you provide a secure way to work on original, ethical content without the need for complex developer environments. A built-in training and export feature would enable creators like me to build a sustainable, long-term library of characters that are safely backed up and managed.
Best regards,
Hisuinoi
P.S.: Thank you very much for the time you’ve invested in this excellent tool, which, in my opinion, is the best available on "Pinokio." If this feature were available, it would be absolutely perfect. I just started using it yesterday and ran it for about 1.5 hours to generate a short story; it ran smoothly and reliably. The "Voice Clone" feature works great, too. I’m using an audio clip of about 1 minute 30 seconds and plan to expand it and test again today to improve the results even further.
