AI and Indigenous Languages
A Path to Preserve Indigenous Languages
Over 7,000 documented languages are currently spoken across the world. According to an article from the United Nations, more than half of them will become extinct within 75 years, and that estimate may even be conservative. More than 4,000 languages are spoken by less than 6% of the world’s population.
AI is starting to play a role in preserving some of these at-risk languages through small language models (SLMs), small datasets, and input from specific Indigenous communities.
The Challenge
Many of these heritage languages are spoken by very few people. Many are also polysynthetic, meaning that a single long word can express a complex idea that might take a full sentence in English to communicate.
That complexity allows individual words and thoughts to carry cultural significance, and privacy is of key importance. Indigenous people need to be able to maintain sovereignty over their language and culture.
Because few speakers remain and the limited audio recordings frequently lack transcripts, the available datasets are small, which makes large language models trained on large datasets infeasible.
As computer scientists and researchers are learning, developers need to include the voices of the community concerned. Particularly with SLMs, input from community members helps preserve cultural nuances and fosters cooperation in implementing and adopting the models.
Artificial Intelligence’s Role
AI, with its ability to rapidly recognize patterns, is helping Indigenous groups document their languages. The use of automatic speech recognition (ASR) models is part of an initiative underway at the Quebec Artificial Intelligence Institute (Mila).
From a technology perspective, the use of SLMs is important, as they are designed for specific activities, leverage a smaller dataset, and can operate faster with less energy.
Highlights of a Solution
Mila researchers are working on a specific solution. Mila brings together a concentration of academic deep learning researchers focused on health, the environment, and climate change. It takes an open science approach to its research, which allows for broad collaboration and knowledge sharing.
Mila’s First Languages AI Reality (FLAIR) initiative is focusing first on establishing a solution for a specific set of North American Indigenous languages. The aim is to then scale the solution to communities worldwide. Ideally, this approach will accommodate Indigenous communities that are underserved or under-resourced. The vision of FLAIR is to give community members easier access to their native languages so they can learn, understand, and continue to share their culture and knowledge in their own language.
FLAIR’s technical director provided an overview and his vision at a TEDx event in Boston.
The Future
Other initiatives are underway, focusing on a variety of heritage languages and their communities. Brookings published a commentary in 2025 with a look at other such initiatives.
Future AI-powered uses may include interactive AI tutors in Indigenous languages, voice assistants, and virtual reality/augmented reality immersive language environments.



