ISI Natural Language Experts to Attack Undeciphered Ancient Scripts

August 6, 2009

The National Science Foundation has awarded Kevin Knight and collaborators $1.6 million to develop computational tools to analyze now-unreadable written texts from antiquity.

Knight, a senior research scientist in ISI's Intelligent Systems Division and a Research Associate Professor in the Department of Computer Science at the USC Viterbi School of Engineering, will work with Regina Barzilay of the M.I.T. Computer Science and Artificial Intelligence Laboratory.

According to Knight, the recently approved project, called DECIPHER, takes on two problems:

(1) deciphering ancient texts using computers, and (2) training automated language translation systems without using parallel texts.

According to the proposal's abstract, "Statistical language processing software has played little role to date in the analysis of ancient texts, where data is limited and human intuition has so far ruled.

"Data for automated language translation is more plentiful, and research has made great strides in the 21st century. However, researchers are addicted to training on large parallel texts, which are limited for the diversity of languages and domains for which people need automated translation.

"The project develops unsupervised methods that compensate for the lack of parallel data, using alternative sources of linguistic knowledge. For ancient languages, these sources include known languages as decipherment targets, capitalizing on tight connections within a language family.

"In translation, large quantities of untranslated data are exploited to induce strong bilingual connections. Formulating these tasks in a decipherment framework brings powerful cryptographic theory and algorithms to bear.

"Such theory also helps estimate expected translation accuracy given fixed data resources, and gauge whether a lost language is decipherable, given a fixed amount of script.

"Computational analysis of ancient scripts offers a better understanding of ancient cultures, and unsupervised techniques construct language connections of great interest to historical linguists.

"Applying such techniques to automated language translation offers the chance to bring many more language pairs and domains to the population at large."

At the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT) 2009 conference, Knight gave a tutorial titled "Writing Systems, Transliteration and Decipherment."

DECIPHER is funded under the American Recovery and Reinvestment Act of 2009 (ARRA).