ISI Researchers Devise Efficient and Comprehensive Strategy to Decrypt Ciphers

by Rene Van Steenbergen

Throughout history, ciphers have been used to relay secret information that only the intended recipient, or an expert codebreaker, can read. Deciphering hidden messages has contributed a great deal to the work of historians and authorities. A prominent example is the Enigma machine, which Nazi Germany used for top-secret military communication, a story later popularized by the blockbuster film The Imitation Game. However, despite the occasional genius codebreaker and substantial improvements in technology, many historical ciphers remain unsolved.

To tackle this problem, Jonathan May, research team leader at USC’s Information Sciences Institute and research assistant professor of computer science at USC Viterbi, teamed up with Nada Aldarrab, graduate student and research assistant at ISI, to devise an extremely efficient strategy for solving ciphers. Building upon existing methods, May and Aldarrab are improving the accuracy and efficiency of deciphering hidden messages.

Aldarrab and May have created a system that operates in two steps: 1) converting the ciphertext into frequency ranks and 2) feeding the frequency-encoded cipher into a neural network, which outputs plaintext, that is, the readable, unencrypted message. Frequency ranking replaces each cipher symbol with its rank when the symbols are ordered by how often they occur in the ciphertext.
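To make the first step concrete, here is a minimal Python sketch of frequency-rank encoding. It is an illustration under stated assumptions, not the researchers' code: the function name `frequency_encode`, the choice of 0 for the most frequent symbol, and the tie-breaking rule (earliest first appearance wins) are all hypothetical choices made for this example.

```python
from collections import Counter


def frequency_encode(symbols):
    """Replace each symbol with its frequency rank (0 = most frequent).

    Assumption for this sketch: ties between equally frequent symbols
    are broken by which symbol appears first in the input.
    """
    symbols = list(symbols)
    counts = Counter(symbols)

    # Record the position at which each symbol first appears.
    first_seen = {}
    for position, symbol in enumerate(symbols):
        first_seen.setdefault(symbol, position)

    # Order symbols by descending count, then by first appearance.
    ordered = sorted(counts, key=lambda s: (-counts[s], first_seen[s]))
    rank = {symbol: r for r, symbol in enumerate(ordered)}
    return [rank[symbol] for symbol in symbols]


# Two different one-to-one substitutions of the same text
# produce the same frequency-encoded sequence.
print(frequency_encode("hello"))
print(frequency_encode("xyzzw"))
```

Under these assumptions, the encoding depends only on the pattern of symbol frequencies and not on which particular symbols the cipher uses, so a network fed this representation does not need to learn each cipher alphabet separately.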

Improving the Status Quo

Existing decipherment methods involve a two-part process: first identifying the language of the text, then deciphering the text into plaintext using a language model for the identified language. This can lead to a domino effect: because the final result depends entirely on the accuracy of the first step, if language identification fails, the decipherment will fail as well.

To combat this, May and Aldarrab have devised an “end-to-end” strategy that lowers this risk. They built a single model that performs the two steps at the same time, thereby lowering the Symbol Error Rate (SER).

“SER is error rate, which is a measure of how accurate our decipherment is. If our model outputs the word ‘dofr’ instead of ‘door’, the SER is 0.25 (i.e. a quarter of the characters is wrong),” explained Aldarrab.
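Aldarrab's example can be checked with a short Python sketch. It assumes the output and the reference have the same length and are compared position by position, which holds when each cipher symbol maps to exactly one output character. The function name `symbol_error_rate` is hypothetical and chosen for this illustration.

```python
def symbol_error_rate(predicted, reference):
    """Fraction of positions where the predicted symbol is wrong.

    Assumption for this sketch: both sequences have the same,
    non-zero length and are aligned position by position.
    """
    if len(predicted) != len(reference):
        raise ValueError("sequences must have the same length")
    if len(reference) == 0:
        raise ValueError("reference must not be empty")

    errors = sum(p != r for p, r in zip(predicted, reference))
    return errors / len(reference)


print(symbol_error_rate("dofr", "door"))  # 0.25: one wrong symbol out of four
```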

What’s impressive about this work is that it is able to withstand the “noise” that often interferes with decryption. As described in their paper, “noise can come from the natural degradation of historical documents, human mistakes during a manual transcription process, or misspelled words by the author, as in the Zodiac-408 cipher. Noise can also come from automatically transcribing historical ciphers using Optical Character Recognition (OCR) techniques.”

Applications Beyond History

The applications of May and Aldarrab’s work are not limited to historical ciphers. As an illustration, Aldarrab cites the infamous ciphers that the Zodiac Killer, a serial killer active in the late 1960s, wrote and sent to the police. Decoding ciphers not only contributes to our understanding of history, but can also be essential to ensuring public safety.

Decipherment techniques have also been applied in machine translation, as Aldarrab explained. “For example, treating a foreign language as a cipher and trying to figure out what the foreign text says. It can be useful when we don’t have parallel data for some languages. It might turn out to be useful if we are invaded by aliens and need to decipher their language,” she joked.

Going Forward

When asked about future goals for the project, Aldarrab named automatic transcription of historical documents as a priority. While May and Aldarrab currently transcribe documents manually, automating this process to make it more efficient is one of their main focuses.

Beyond automation, Aldarrab and May are working towards several goals: more efficient decipherment models (i.e. lower computational cost), models that target other types of ciphers (homophonic ciphers, etc.), and coverage of more languages.