Kyrgyz Speech Synthesis Model Kani TTS 2 Ranks Among the Top Three TTS Models on Hugging Face

Виктор Сизов Society / Exclusive

A Kyrgyz team of developers has once again showcased its work on the international technology stage, the High Technology Park (HTP) of Kyrgyzstan reports.
The startup NineNineSix presented Kani TTS 2, an updated version of its speech synthesis model, which has already become one of the most sought-after TTS models on Hugging Face, one of the world's largest repositories of AI models.

Kani TTS 2 builds on the team's previous work. The model can now generate up to 40 seconds of speech in a single pass, more than double what its predecessor could produce.

The HTP noted that for an open model created in Kyrgyzstan, entering the top three TTS models on Hugging Face is a rare and significant achievement.

About the NineNineSix Team

NineNineSix is a Kyrgyz team of developers specializing in AI language technologies.

Previously, the team released the first version of Kani TTS and developed a voice-controlled speaker, as well as the AI assistant AkylAi, which became the first AI to speak Kyrgyz.

Voice for Low-Resource Languages

While many large AI companies focus on English and other widely spoken languages, low-resource languages often remain overlooked. NineNineSix chose a different path.

Kani TTS 2 supports English, Spanish, and Kyrgyz, and its architecture allows it to be adapted to virtually any language, accent, or dialect.

A key feature of the project is that the team published the complete pre-training code, so any country or research group can build its own voice model on top of Kani TTS 2.

As Nursultan Bakashov, co-founder of nineninesix.ai, noted: “Kani TTS 2 is the next step after the first version. We improved the stability of speech generation and taught the model to handle long segments. Our goal is to develop compact and open models that are easier to adapt to different languages and accents, including low-resource ones. We want to demonstrate that world-class technologies can be created in Kyrgyzstan, which is why we opened not only the model weights but also the entire pre-training code, so that any team can train TTS for their language.”

The main improvements of Kani TTS 2 include:

* Stable generation of up to 40 seconds of speech in a single pass;

* Support for zero-shot voice cloning, that is, cloning a voice from a short audio fragment;

* Fully open architecture and training code;

* Entry into the top 3 TTS models on Hugging Face.

According to the HTP, the model has about 400 million parameters, was pre-trained on approximately 10,000 hours of speech data, and can run on a GPU with 3 GB of video memory, making it suitable for both local and server use.
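
As a rough sanity check on these figures, the memory needed just to hold the weights can be estimated from the parameter count. The sketch below is a back-of-the-envelope calculation, not a measurement: the storage precisions (float32 and half precision) are assumptions, since the article does not say which one the model uses, and the function name `weight_memory_gib` is purely illustrative.

```python
def weight_memory_gib(num_params: int, bytes_per_param: int) -> float:
    """Return the memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / 1024**3


NUM_PARAMS = 400_000_000  # "about 400 million parameters", per the HTP
GPU_MEMORY_GIB = 3        # "3 GB of video memory", per the HTP (treated as approximate)

# Assumed storage precisions; the article does not state which one is used.
BYTES_PER_PARAM = {"float32": 4, "float16/bfloat16": 2}

for dtype, size in BYTES_PER_PARAM.items():
    weights = weight_memory_gib(NUM_PARAMS, size)
    headroom = GPU_MEMORY_GIB - weights
    print(f"{dtype}: {weights:.2f} GiB of weights, "
          f"{headroom:.2f} GiB left for activations and buffers")
```

Under these assumptions the weights alone take roughly 1.5 GiB in float32 or 0.75 GiB in half precision, which is consistent with the claim that the model fits on a 3 GB GPU. Actual usage also depends on activations, the audio codec, and framework overhead, none of which are estimated here.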

“Kani TTS 2 is not just another AI model. It is proof that Kyrgyz specialists are capable of developing world-class technologies and competing in the global AI market. NineNineSix demonstrates that Kyrgyzstan can be not only a consumer but also a creator of advanced AI solutions,” the HTP added.