Ukraine Seeks Content to Train Its National AI Language Model

Ukraine is actively seeking material to train its own AI language model. Photo: Мінцифра

Data Collection for a National Language Model

According to Мінцифра (Ukraine's Ministry of Digital Transformation), Ukraine has launched a data collection initiative in partnership with Kyivstar to train a national large language model (LLM). The model is intended for use in services expected to reach millions of users. Training it effectively requires high-quality Ukrainian-language texts from businesses, media outlets, and educational institutions. The effort is part of a broader global trend in which nations develop sovereign AI capabilities to preserve their linguistic and cultural digital footprint.

Content Required to Build the Language Model

Building a robust and capable model requires diverse content from multiple domains. Materials are needed from the following areas:

  • Media: news articles, interviews, blogs, and opinion pieces
  • Science and Education: course materials, textbooks, scientific papers, and dissertations
  • Literature: fiction, literary criticism, and reviews
  • History: digitized archival materials
  • Business Data: technical documentation, product descriptions, reviews from open sources, and corporate blog posts

Organizations wishing to participate in the data collection must complete a registration form. The project aims to strengthen Ukraine's national language infrastructure and increase the availability of Ukrainian-language content for a wide audience.

The creation of this national LLM is a significant step for the development of natural language processing technologies in Ukraine. It will not only improve access to information in Ukrainian but also foster the development of new services and products tailored to the needs of Ukrainian users. The involvement of businesses, media, and educational institutions underscores the importance of cross-sector collaboration in improving the quality and accessibility of Ukrainian-language content.
