Reviving Lost Languages through AI

Adam Conner-Simons introduces an MIT CSAIL system that aims to help linguists decipher lost languages without prior knowledge of how they relate to other languages.

Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have recently developed a new system that can automatically decipher a lost language without needing prior knowledge of its relation to other languages. They also showed that their system can itself determine relationships between languages, and they have used it to corroborate recent scholarship suggesting that the language Iberian is not actually related to Basque.

The team’s ultimate goal is for the system to be able to decipher lost languages that have eluded linguists for decades, using just a few thousand words.

Spearheaded by MIT professor Regina Barzilay, the system relies on several principles grounded in insights from historical linguistics, such as the fact that languages generally evolve only in certain predictable ways. For instance, while a given language rarely adds or deletes an entire sound, certain sound substitutions are likely to occur. A “p” in a word of the parent language may become a “b” in the descendant language, but a change to “k” is less likely because of the larger pronunciation gap.
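To make the constraint concrete, here is a minimal Python sketch of a weighted edit distance in which a “p” to “b” substitution is cheap, a “p” to “k” substitution costs more, and adding or deleting a sound costs the most. The cost values and the names `SUB_COST`, `INDEL`, and `weighted_edit_distance` are assumptions made for illustration; they are not taken from the CSAIL system, which learns its constraints rather than using a fixed table.

```python
# Hypothetical costs, chosen only for illustration.
SUB_COST = {
    frozenset("pb"): 0.2,  # same place of articulation, differs only in voicing
    frozenset("td"): 0.2,
    frozenset("kg"): 0.2,
}
DEFAULT_SUB = 1.0  # any other substitution, e.g. p -> k
INDEL = 1.5        # adding or deleting a whole sound is rare, so it costs more


def sub_cost(a, b):
    """Cost of replacing sound a with sound b."""
    if a == b:
        return 0.0
    return SUB_COST.get(frozenset((a, b)), DEFAULT_SUB)


def weighted_edit_distance(src, dst):
    """Dynamic-programming edit distance using the costs above."""
    prev = [j * INDEL for j in range(len(dst) + 1)]
    for i, a in enumerate(src, start=1):
        cur = [i * INDEL]
        for j, b in enumerate(dst, start=1):
            cur.append(min(
                prev[j] + INDEL,               # delete a
                cur[j - 1] + INDEL,            # insert b
                prev[j - 1] + sub_cost(a, b),  # substitute a with b
            ))
        prev = cur
    return prev[-1]
```

Under these toy costs, `weighted_edit_distance("pat", "bat")` is smaller than `weighted_edit_distance("pat", "kat")`, which in turn is smaller than the cost of dropping a sound, as in `weighted_edit_distance("pat", "pa")`.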

By incorporating these and other linguistic constraints, Barzilay and MIT PhD student Jiaming Luo developed a decipherment algorithm that can handle the vast space of possible transformations and the scarcity of a guiding signal in the input. The algorithm learns to embed language sounds into a multidimensional space where differences in pronunciation are reflected in the distance between corresponding vectors. This design enables them to capture pertinent patterns of language change and express them as computational constraints. The resulting model can segment words in an ancient language and map them to counterparts in a related language.
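The idea of placing sounds in a space where distance tracks pronunciation can be illustrated with a small Python sketch. The real system learns its embeddings; the two-dimensional vectors below are hand-set from rough articulatory features purely as an assumption for illustration, and the names `FEATURES` and `sound_distance` are hypothetical.

```python
import math

# Hypothetical hand-set vectors: (place of articulation, voicing).
# Place: 0.0 = lips, 1.0 = tongue tip, 2.0 = back of the mouth.
# Voicing: 0.0 = voiceless, 1.0 = voiced.
FEATURES = {
    "p": (0.0, 0.0),
    "b": (0.0, 1.0),
    "t": (1.0, 0.0),
    "d": (1.0, 1.0),
    "k": (2.0, 0.0),
    "g": (2.0, 1.0),
}


def sound_distance(a, b):
    """Euclidean distance between the vectors of two sounds."""
    return math.dist(FEATURES[a], FEATURES[b])


print(sound_distance("p", "b"))  # differs only in voicing
print(sound_distance("p", "k"))  # differs in place of articulation
```

In this toy space “p” sits closer to “b” than to “k”, so a model that penalizes long jumps would favor the “p” to “b” change.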

The project builds on a paper Barzilay and Luo wrote last year that deciphered the dead languages Ugaritic and Linear B, the latter of which had previously taken decades for humans to decode. A key difference is that, for that project, the team already knew that these languages were related to early forms of Hebrew and Greek, respectively.

With the new system, the relationship between languages is inferred by the algorithm. This question is one of the biggest challenges in decipherment. In the case of Linear B, it took several decades to discover the correct known descendant. For Iberian, scholars still cannot agree on the related language: some argue for Basque, while others reject this hypothesis and claim that Iberian doesn’t relate to any known language. The proposed algorithm can assess the proximity between two languages; in fact, when tested on known languages, it can even accurately identify language families. The team applied the algorithm to Iberian, considering Basque as well as less likely candidates from the Romance, Germanic, Turkic, and Uralic families. While Basque and Latin were closer to Iberian than other languages, they were still too different to be considered related.
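As a rough illustration of what a proximity score between two languages could look like, the Python sketch below averages, over the words of the undeciphered text, the similarity of each word to its best match in a candidate vocabulary. This is a simplification and not the team’s method: `difflib.SequenceMatcher` from the standard library stands in for the learned model, the function names are hypothetical, and the word lists are made-up toy strings rather than real Iberian, Basque, or Latin data.

```python
from difflib import SequenceMatcher


def best_match_score(word, vocabulary):
    """Similarity (0 to 1) of word to its closest candidate in vocabulary."""
    return max(SequenceMatcher(None, word, cand).ratio() for cand in vocabulary)


def proximity(lost_words, vocabulary):
    """Average best-match similarity; higher means the languages look closer."""
    scores = [best_match_score(w, vocabulary) for w in lost_words]
    return sum(scores) / len(scores)


# Made-up toy strings, not real linguistic data.
lost = ["bada", "tiko"]
candidate_a = ["pata", "diko", "luma"]
candidate_b = ["zzzz", "qqqq"]

print(proximity(lost, candidate_a))
print(proximity(lost, candidate_b))
```

Here `candidate_a` scores higher than `candidate_b`. Ranking candidates is only half the question, though: as with Basque and Latin in the Iberian case, the best-scoring candidate may still be too distant to count as related, so a score like this needs a threshold as well as a ranking.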

In future work, the team hopes to expand beyond the act of connecting texts to related words in a known language, an approach referred to as cognate-based decipherment. This paradigm assumes that such a known language exists, but the example of Iberian shows that this is not always the case. The team’s new approach would involve identifying the semantic meaning of the words, even if they don’t know how to read them.

“For instance, we may identify all the references to people or locations in the document, which can then be further investigated in light of the known historical evidence,” says Barzilay.

The project was supported, in part, by the Intelligence Advanced Research Projects Activity (IARPA).

This article was originally published by MIT News (https://news.mit.edu/2020/translating-lost-languages-using-machine-learning-1021).
