Artificial Intelligence

Fighting COVID-19 using Linear-Time Algorithms from Computational Linguistics

Thursday, May 21, 2020, 11:00am - 12:00pm PDT
VTC Only, see link below
This event is open to the public.
NL Seminar
Liang Huang (Oregon State University)


To defeat the current COVID-19 pandemic, which had already claimed more than 250,000 lives as of early May, a messenger RNA (mRNA) vaccine has emerged as a promising approach thanks to its rapid and scalable production and its non-infectious and non-integrating properties. However, designing an mRNA sequence that achieves high stability and protein yield remains a challenging problem because the search space is exponentially large (e.g., there are 10^632 possible mRNA sequence candidates for the spike protein of SARS-CoV-2).
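To make the size of that search space concrete, here is a minimal Python sketch of the underlying arithmetic: each amino acid can be encoded by one or more synonymous codons, so the number of candidate mRNA sequences is the product of those counts over the protein. The function name `count_candidates` and the four-residue toy peptide are illustrative assumptions, not part of the talk; the codon table is the standard genetic code.

```python
from itertools import product
from math import log10, prod

# Standard genetic code in the conventional U, C, A, G ordering; '*' marks stop codons.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid to the list of codons that encode it.
SYNONYMS = {}
for codon, aa in zip(product(BASES, repeat=3), AMINO):
    SYNONYMS.setdefault(aa, []).append("".join(codon))


def count_candidates(protein):
    """Number of distinct mRNA coding sequences that translate to `protein`."""
    return prod(len(SYNONYMS[aa]) for aa in protein)


# Hypothetical toy peptide (not the spike protein): Met, Leu, Ser, Arg.
toy = "MLSR"
n = count_candidates(toy)
print(n)          # 1 * 6 * 6 * 6 = 216
print(log10(n))   # order of magnitude of the search space
```

Because the count is a product over residues, it grows exponentially with protein length, which is why enumerating candidates is hopeless for a protein with more than a thousand residues.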
We describe two ongoing efforts at solving this problem, both using linear-time algorithms from my group inspired by my earlier work in parsing. On one hand, the Eterna OpenVaccine project from Stanford Medical School takes a crowd-sourcing approach, letting game players all over the world design stable sequences. To evaluate sequence stability (in terms of free energy), they use LinearFold from my group (2019), since it is the only linear-time RNA folding algorithm available (which makes it the only one fast enough for COVID-scale genomes). On the other hand, we take a computational approach and directly search for the optimal sequence in this exponentially large space via dynamic programming. It turns out that this problem can be reduced to a classical problem in formal language theory and computational linguistics (the intersection of a CFG and a DFA), which can be solved in O(n^3) time, just like lattice parsing for speech. In the end, we can design the optimal mRNA vaccine candidate for the SARS-CoV-2 spike protein in 1 hour with exact search, or in just 11 minutes with a beam of 1000 at the cost of only ~0.6% loss in energy.
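The reduction to CFG-DFA intersection can be illustrated with a toy Python sketch. It is not the group's implementation and makes strong simplifying assumptions: the "energy" is just the number of nested base pairs (a Nussinov-style grammar with no minimum hairpin length) rather than a thermodynamic free-energy model, and there is no beam search. All function names are hypothetical. The codon choices for a protein form a DFA (a lattice), and the dynamic program runs over pairs of lattice nodes instead of pairs of string positions, exactly as in lattice parsing.

```python
from functools import lru_cache
from itertools import product

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {}
for codon, aa in zip(product(BASES, repeat=3), AMINO):
    CODONS.setdefault(aa, []).append("".join(codon))

# Watson-Crick and wobble pairs allowed in this toy model.
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}


def build_lattice(protein):
    """DFA over nucleotides; a node is (residue index, codon prefix read so far)."""
    edges = {}
    for i, aa in enumerate(protein):
        for codon in CODONS[aa]:
            for k in range(3):
                src = (i, codon[:k])
                dst = (i, codon[:k + 1]) if k < 2 else (i + 1, "")
                edges.setdefault(src, set()).add((codon[k], dst))
    edges[(len(protein), "")] = set()  # final state has no outgoing edges
    return edges


def pos(node):
    """Nucleotide offset of a lattice node."""
    return 3 * node[0] + len(node[1])


def max_pairs_over_lattice(protein):
    """Max nested base pairs over all mRNA sequences encoding `protein`."""
    edges = build_lattice(protein)
    nodes = list(edges)
    unreachable = float("-inf")

    @lru_cache(maxsize=None)
    def best(u, v):
        if u == v:
            return 0  # empty span
        if pos(u) >= pos(v):
            return unreachable  # no path from u to v
        result = unreachable
        for a, u2 in edges[u]:
            # Case 1: nucleotide a is left unpaired.
            result = max(result, best(u2, v))
            # Case 2: a pairs with nucleotide b on a later edge w -> w2.
            for w in nodes:
                if pos(u2) <= pos(w) < pos(v):
                    for b, w2 in edges[w]:
                        if (a, b) in PAIRS:
                            result = max(result, best(u2, w) + 1 + best(w2, v))
        return result

    return best((0, ""), (len(protein), ""))


print(max_pairs_over_lattice("MW"))  # only AUGUGG encodes Met-Trp; best fold has 2 pairs
```

Since each residue contributes only a constant number of lattice nodes, the number of `(u, v)` states is quadratic in protein length and each state scans a linear number of split points, so this sketch stays cubic, in line with the O(n^3) bound mentioned above.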
Liang Huang is currently an Assistant Professor of EECS at Oregon State University and Distinguished Scientist (part-time) at Baidu Research USA. Before that he was Assistant Professor for three years at the City University of New York (CUNY) and a part-time Research Scientist with IBM's Watson Group. He graduated in 2008 from Penn and has worked as a Research Scientist at Google and a Research Assistant Professor at USC/ISI. Most of his work develops fast algorithms and provable theory to speed up large-scale natural language processing, structured machine learning, and computational structural biology. He has received a Best Paper Award at ACL 2008 (sole author), a Best Paper Honorable Mention at EMNLP 2016, several best paper nominations (ACL 2007, EMNLP 2008, and ACL 2010), two Google Faculty Research Awards (2010 and 2013), a Yahoo! Faculty Research Award (2015), and a University Teaching Prize at Penn (2005). He was a keynote speaker at ACL 2019. His recent interest is in applying computational linguistics to computational biology, where he works on RNA folding and design using his earlier work on incremental parsing.
