Seminars and Events

ISI Natural Language Seminar

NL Seminar: Event Extraction for Epidemic Prediction

Event Details

Speaker: Tanmay Parekh, UCLA

Conference Rm Location: ISI-MDR #689

REMINDER:

Meeting hosts admit only online guests they know to the Zoom meeting, so you are strongly encouraged to sign into Zoom with your USC account.

If you’re an outside visitor, please email us at nlg-seminar-host(at)isi.edu at least one business day before the event so that we can admit you. Specify whether you will attend remotely or in person, and provide your full name, job title, and professional affiliation. Please arrive at least 10 minutes before the seminar begins.

If you do not have access to the 6th Floor for in-person attendance, please check in at the 10th floor main reception desk to register as a visitor and someone will escort you to the conference room location.

Early warnings and effective control measures are among the most important tools that help policymakers prepare for the threat of an epidemic. Social media is an important information source here, as it is publicly accessible and more timely than alternatives such as news and public health sources. Given the sheer volume of daily social media posts, there is a need for an automated system that monitors social media to provide early and effective epidemic prediction. To this end, I introduce two works that aid the creation of such an automated system using information extraction. In my first work, we pioneer the use of Event Detection (ED) for better preparedness and early warnings of upcoming epidemics by developing a framework to extract and analyze epidemic-related events from social media posts. We curate an epidemic event ontology comprising seven disease-agnostic event types and construct SPEED, a Twitter dataset focused on the COVID-19 pandemic. Experiments show that ED models trained on COVID-based SPEED can effectively detect epidemic events for three unseen epidemics: Monkeypox, Zika, and Dengue. Furthermore, we show that reporting sharp increases in the events extracted by our framework can provide warnings 4-9 weeks earlier than the WHO epidemic declaration for Monkeypox.
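The abstract does not say how a "sharp increase" in extracted events is detected. As a rough illustration only, the sketch below flags a week when its event count rises well above the preceding weeks. The rule (trailing mean plus `k` standard deviations), the function name `spike_weeks`, its parameters, and the counts are all assumptions made for this example, not details from the talk.

```python
from statistics import mean, pstdev


def spike_weeks(weekly_counts, window=4, k=3.0, min_count=10):
    """Return indices of weeks whose count jumps well above the trailing window.

    Hypothetical rule (not from the talk): flag week t when
    count[t] >= min_count and count[t] > mean(history) + k * std(history),
    where history is the previous `window` weeks.
    """
    flagged = []
    for t in range(window, len(weekly_counts)):
        history = weekly_counts[t - window:t]
        threshold = mean(history) + k * pstdev(history)
        if weekly_counts[t] >= min_count and weekly_counts[t] > threshold:
            flagged.append(t)
    return flagged


# Synthetic weekly counts of detected epidemic events, invented for illustration.
counts = [3, 4, 2, 5, 4, 3, 40, 85, 120]
print(spike_weeks(counts))  # prints [6, 7]
```

With these made-up counts, weeks 6 and 7 are flagged. Week 8 is not, because by then the trailing window already contains the surge; a real system would need a more careful baseline.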

Since epidemics can originate anywhere in the world, social media posts discussing them can be in many languages. However, training supervised models for every language is tedious and resource-expensive. The alternative is to use zero-shot cross-lingual models. In my second work, we introduce a new approach for label projection that can be used to generate synthetic training data in any language using the translate-train paradigm. This novel approach, CLaP, translates text to the target language and performs contextual translation on the labels using the translated text as the context, ensuring better accuracy for the translated labels. We leverage instruction-tuned language models with multilingual capabilities as our contextual translator, using instructions to impose the constraint that the translated labels must appear in the translated text. We benchmark CLaP against other label projection techniques on zero-shot cross-lingual transfer across 39 languages on two representative structured prediction tasks, event argument extraction (EAE) and named entity recognition (NER), showing an improvement of over 2.4 F1 for EAE and 1.4 F1 for NER.
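The label projection flow described above (translate the text, translate each label with the translated text as context, and require the translated label to appear in that text) can be sketched roughly as follows. This is not the CLaP implementation: `project_labels` is a hypothetical helper, and the two toy translators are dictionary lookups standing in for the machine translation system and the instruction-tuned language model, purely so that the example runs on its own.

```python
from typing import Callable, Dict, List, Optional, Tuple

Span = Optional[Tuple[int, int]]


def project_labels(
    text: str,
    labels: List[str],
    translate: Callable[[str], str],
    contextual_translate: Callable[[str, str], str],
) -> Tuple[str, Dict[str, Span]]:
    """Translate `text`, then map each source label to a span in the translation.

    A label is kept only if its contextual translation occurs verbatim in the
    translated text (the presence constraint); otherwise its span is None.
    """
    translated_text = translate(text)
    spans: Dict[str, Span] = {}
    for label in labels:
        candidate = contextual_translate(label, translated_text)
        start = translated_text.find(candidate) if candidate else -1
        spans[label] = (start, start + len(candidate)) if start != -1 else None
    return translated_text, spans


# Toy stand-ins (assumptions for illustration, not real translation models).
SENTENCES = {"The doctor reported a fever": "El médico informó de fiebre"}
LEXICON = {"doctor": ["doctor", "médico"], "fever": ["fiebre"]}


def toy_translate(text: str) -> str:
    return SENTENCES[text]


def toy_contextual_translate(label: str, context: str) -> str:
    # Pick the first candidate translation that actually occurs in the context.
    for option in LEXICON.get(label, []):
        if option in context:
            return option
    return ""


translated, spans = project_labels(
    "The doctor reported a fever",
    ["doctor", "fever", "hospital"],
    toy_translate,
    toy_contextual_translate,
)
for label, span in spans.items():
    print(label, "->", translated[span[0]:span[1]] if span else None)
```

In the toy example, "doctor" has two candidate translations, and the context selects the one that actually occurs in the translated sentence. A label with no match in the translation ("hospital") is left unprojected instead of being assigned a guessed span.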


Speaker Bio

Tanmay Parekh is a third-year PhD student in Computer Science at the University of California, Los Angeles (UCLA). He is advised by Prof. Nanyun Peng and Prof. Kai-Wei Chang. Previously, he completed his Master's at the Language Technologies Institute at Carnegie Mellon University (CMU), where he worked with Prof. Alan Black and Prof. Graham Neubig. He completed his undergraduate studies at the Indian Institute of Technology Bombay (IITB). He has also worked in industry at Amazon and Microsoft. He has worked on a wide range of research topics in multilingual, code-switching, controlled generation, and speech technologies. His current research focuses on improving the utilization and generalizability of Large Language Models (LLMs) for applications in Information Extraction (specifically Event Extraction) across various languages and domains.

If the speaker agrees to be recorded, this NL Seminar talk will be posted on the USC/ISI YouTube page within 1-2 business days: https://www.youtube.com/user/USCISI.

Subscribe here to learn more about upcoming seminars: https://www.isi.edu/events/ 

For more information on the NL Seminar series and upcoming talks, please visit:

https://www.isi.edu/research-groups-nlg/nlg-seminars/

Hosts: Jonathan May and Justin Cho