Artificial Intelligence


Picture the scene. After years of drought, flash flooding pummels Ethiopia's Oromia and Tigray regions, spreading cholera and exacerbating food shortages. Civil unrest follows and protests swell, killing hundreds and injuring thousands.

As international relief efforts ramp up, tweets, blog posts and news articles pour out from the affected regions in real time. But aid and military organizations are hampered by one major barrier: language.

Most of the world's languages are still largely unknown to computational linguistics. In fact, current online translation services, including Google Translate, support only about 100 of the world's 7,000 languages. As a result, in the wake of a crisis, vital information can easily get lost in translation.

A team of 10 researchers and students from USC's Information Sciences Institute (ISI), led by ISI research director Kevin Knight, hopes to change that by building machine-learning systems that can quickly decipher any language and automatically produce actionable information.

To hone their algorithms, Knight and the team from ISI's Natural Language Lab recently took part in a three-week assignment to translate Oromo and Tigrinya, two Ethiopian languages spoken by around 34 million people yet unknown to machine-translation systems.

The assignment was part of DARPA's Low Resource Languages for Emergent Incidents (LORELEI) project, which aims to create a rapid automated language toolkit for languages currently missing from the linguistic databases that feed online translation systems.

By creating platforms that could be used in any region where international disaster relief teams have little or no local language expertise, the team hopes to take a step closer to the holy grail of machine translation: a universal translator that supports all the world's languages.

"Let's say there's an earthquake in Armenia, the language they speak in that area is probably not covered with current technology," says Knight.

"We want to be able to look at these messages and say: these are the ones that are describing the earthquake, these are the ones that are asking for food and water. That way, the aid organization knows, for example, what food and supplies to put on trucks and where to send them."

Foraging for clues

When the assignment launched on August 7, DARPA gave participating teams the names of the languages, a humanitarian disaster scenario based on real events, and a data pack of text in the assigned languages.

Machine translation relies on huge annotated data sets: the bigger the data set, the better the system learns. But since massive data sets do not exist for low-resource languages, including Oromo and Tigrinya, the team had to forage for clues using its artificial intelligence (AI) toolkit.

Their tactics included:


Using a program, developed by an ISI team member, that converts any language's writing system into the Latin alphabet.
Applying a name-finding tool that highlights the names of people, places and organizations to quickly put a document in context and identify which supplies are needed most and where.
Using a known language from the same language family as the unknown language as a stepping stone for decoding similar words.
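
To make the first and third tactics concrete, here is a minimal Python sketch. It is not the ISI team's software: the three-character romanization table, the tiny related-language lexicon and the function names `romanize` and `guess_gloss` are all assumptions made up for illustration. The idea is simply to map unfamiliar script into Latin letters, then look for a similar-looking word in a related language that already has a dictionary.

```python
import difflib

# Toy romanization table covering only three Ethiopic characters.
# Illustrative assumption, not a complete or official transliteration scheme.
ROMANIZATION = {"ሰ": "se", "ላ": "la", "ም": "m"}

# Hypothetical lexicon for a related, better-resourced language:
# romanized word -> English gloss. Placeholder entries, not vetted data.
RELATED_LEXICON = {"selam": "peace", "bet": "house"}


def romanize(text):
    """Replace each known character with Latin letters; leave the rest as-is."""
    return "".join(ROMANIZATION.get(ch, ch) for ch in text)


def guess_gloss(word, lexicon=RELATED_LEXICON, cutoff=0.7):
    """Return the gloss of the closest-spelled lexicon entry, or None."""
    matches = difflib.get_close_matches(word, list(lexicon), n=1, cutoff=cutoff)
    return lexicon[matches[0]] if matches else None


if __name__ == "__main__":
    latin = romanize("ሰላም")
    print(latin, "->", guess_gloss(latin))
    # A slightly different spelling can still land on the same entry.
    print("salam", "->", guess_gloss("salam"))
```

String similarity is a crude stand-in for real cognate detection, so a match like this is only a guess to be confirmed, for example by a native speaker.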

The team was also permitted a one-hour Skype call with a native speaker, who could translate specific words and terms that proved tricky to decode using AI alone.

As the team assembled the linguistic puzzle, the newly translated data was fed back into the systems, allowing the computer to gradually "learn" the previously unknown languages.

On August 28, DARPA issued a final translation test to the ISI team and to the two other teams taking part in the assignment, from Raytheon BBN Technologies and Carnegie Mellon University.

The final score will be revealed at the DARPA principal investigator meeting on September 13 in Raleigh, North Carolina. Ultimately, however, it's a collaborative project: after the result announcement, the teams come together to share their experiences and lessons learned.

Knight says: "It really allows us to put our technology to the test. It's a way to drive the research and technology forwards and work on something that could make a huge impact."