Kyrgyz AI speech model extends language support

Gadget Staff

5 months ago

An advance in open-source generative AI has emerged from an unexpected source: the Central Asian republic of Kyrgyzstan.

The AI startup NineNineSix has released Kani TTS 2, a next-generation open-source text-to-speech (TTS) model that, it says, significantly extends generation length, improves stability, and reinforces its mission to bring high-quality speech AI to underrepresented languages.

The new version introduces stable generation of up to 40 seconds of continuous speech in a single pass, more than doubling the practical limit of the previous release. The model is trending on Hugging Face, where it currently ranks among the top TTS models on the platform.

A structural upgrade

The original Kani TTS gained attention for its lightweight architecture, efficient deployment, and multilingual adaptability. It was adopted by developers beyond its core team and has been used as a foundation for community-trained models in Urdu, Vietnamese, Turkish, and Creole, among others. Kani TTS 2 builds on that momentum.

NineNineSix says the expanded generation window enables:

Long-form responses for conversational AI agents
Multi-turn dialogue synthesis
Extended narration and content production
More natural prosodic flow in continuous speech

The architecture remains optimised for efficiency, requiring approximately 3 GB of GPU memory, making it suitable for both local and server deployments.
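For text that would run longer than the 40-second single-pass window, a caller would presumably split the input before synthesis. The following is a minimal sketch of that idea and is not taken from the Kani TTS 2 codebase: the function names are hypothetical, the 150 words-per-minute speaking rate is an assumed rule of thumb, and the duration estimate is only a rough word-count heuristic.

```python
import re

MAX_SECONDS = 40.0        # single-pass limit reported in the article
WORDS_PER_MINUTE = 150.0  # assumed speaking rate, not a Kani TTS figure


def estimate_seconds(text: str) -> float:
    """Rough duration estimate from word count at the assumed rate."""
    return len(text.split()) * 60.0 / WORDS_PER_MINUTE


def chunk_text(text: str, max_seconds: float = MAX_SECONDS) -> list[str]:
    """Greedily pack whole sentences into chunks within max_seconds.

    A single sentence longer than the budget is kept as its own chunk
    rather than being split mid-sentence.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        if current and estimate_seconds(candidate) > max_seconds:
            # Adding this sentence would exceed the budget: close the chunk.
            chunks.append(" ".join(current))
            current = [sentence]
        else:
            current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each returned chunk could then be passed to the model in its own generation pass. Splitting on sentence boundaries is a design choice intended to keep prosody natural at the seams; a real pipeline would likely measure generated audio length rather than rely on a word-count estimate.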

Zero-shot voice cloning

Kani TTS 2 supports zero-shot voice cloning, allowing developers to replicate a speaker’s tone and style from a short audio reference without additional fine-tuning.

One of the most consequential decisions by the team was releasing the full pretraining code. This enables organizations and research groups to train TTS systems from scratch for any language, dialect, or domain.

“Kani TTS 2 is the next step after our first release: we made speech generation more stable and enabled the model to produce longer audio segments,” says Nursultan Bakashov, co-founder of nineninesix.ai. “We focus on compact and open models – they are easier to deploy and adapt to different languages and accents, including low-resource ones.

“For us, it is important to demonstrate that world-class technologies can be built in Kyrgyzstan. That is why we released not only the model weights, but the entire pretraining code – so any team can train a TTS system from scratch for their own language.”

Language expansion

The model currently supports:

English
Spanish
Kyrgyz

Support for Kyrgyz is particularly notable, as it demonstrates the feasibility of building high-quality TTS for low-resource languages.

The previous version of Kani TTS already proved its adaptability. Community contributors independently trained new language models, including Urdu and Vietnamese, using the open architecture. In several cases, these community-driven extensions achieved production-level quality.

This adaptability suggests that Kani TTS is not just a single model but a flexible foundation for speech generation in languages often overlooked by large AI providers.

Zero-shot cloning

With Kani TTS 2, NineNineSix positions itself not merely as a model developer, but as a contributor to the global effort to democratize speech AI.

See the pretrained English model at: https://huggingface.co/nineninesix/kani-tts-2-en
