Startup NineNineSix has released an open-source text-to-speech (TTS) model that brings high-quality speech AI to underrepresented languages.

The AI startup NineNineSix has released Kani TTS 2, a next-generation open-source text-to-speech (TTS) model that, it says, significantly extends generation length, improves stability, and reinforces its mission to bring high-quality speech AI to underrepresented languages.

The new version introduces stable generation of up to 40 seconds of continuous speech in a single pass, more than doubling the practical limit of the previous release. The model is trending on Hugging Face, currently ranking among the top TTS models on the platform.

A structural upgrade

The original Kani TTS gained attention for its lightweight architecture, efficient deployment, and multilingual adaptability. It was adopted by developers beyond its core team and has been used as a foundation for community-trained models in Urdu, Vietnamese, Turkish, and Creole, among others. Kani TTS 2 builds on that momentum.

NineNineSix says the expanded generation window enables:

Long-form responses for conversational AI agents

Multi-turn dialogue synthesis

Extended narration and content production

More natural prosodic flow in continuous speech

The architecture remains optimised for efficiency: it requires approximately 3 GB of GPU memory, which makes it suitable for both local and server deployments.
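
For developers whose text runs longer than the 40-second window, one possible approach is to split the input at sentence boundaries before synthesis. The Python sketch below illustrates the idea only; it is not part of Kani TTS 2. The function names are hypothetical, and the speaking rate of 150 words per minute is an assumption used to estimate duration, not a figure published by NineNineSix.

```python
import re

MAX_SECONDS = 40.0        # single-pass limit quoted in the article
WORDS_PER_MINUTE = 150.0  # assumed speaking rate, not a Kani TTS 2 figure


def estimate_seconds(text: str) -> float:
    """Rough duration estimate from word count at the assumed speaking rate."""
    return len(text.split()) / WORDS_PER_MINUTE * 60.0


def split_for_tts(text: str, max_seconds: float = MAX_SECONDS) -> list[str]:
    """Greedily pack whole sentences into chunks under the estimated limit.

    A single sentence that exceeds the limit on its own is kept as one chunk.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and estimate_seconds(candidate) > max_seconds:
            # Adding this sentence would overshoot, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk could then be passed to the model as a separate request and the resulting audio joined afterwards. Because the duration is only an estimate, a real deployment would probably set the limit somewhat below 40 seconds to leave a margin.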

Zero-shot voice cloning

Kani TTS 2 supports zero-shot voice cloning, allowing developers to replicate a speaker’s tone and style from a short audio reference without additional fine-tuning.

One of the team's most consequential decisions was to release the full pretraining code. This enables organisations and research groups to train TTS systems from scratch for any language, dialect, or domain.

“Kani TTS 2 is the next step after our first release: we made speech generation more stable and enabled the model to produce longer audio segments,” says Nursultan Bakashov, co-founder of nineninesix.ai. “We focus on compact and open models – they are easier to deploy and adapt to different languages and accents, including low-resource ones.

“For us, it is important to demonstrate that world-class technologies can be built in Kyrgyzstan. That is why we released not only the model weights, but the entire pretraining code – so any team can train a TTS system from scratch for their own language.”

Language expansion

The model currently supports:

English

Spanish

Kyrgyz

Support for Kyrgyz is particularly notable, as it demonstrates the feasibility of building high-quality TTS for low-resource languages.

The previous version of Kani TTS had already proved its adaptability. Community contributors independently trained models for new languages, including Urdu and Vietnamese, using the open architecture. In several cases, these community-driven extensions achieved production-level quality.

This adaptability suggests that Kani TTS is not just a single model but a flexible foundation for speech generation in languages often overlooked by large AI providers.

