Artificial Intelligence Seminar

Fixing the NLP Pipeline with Humans and Data

Event Details

NLP systems trained with standard machine learning pipelines suffer from various problems. For instance, datasets collected from crowd workers often contain annotation artifacts or repeating patterns, and once the systems are deployed to real-world users, they are difficult to control or interpret and do not interact well with those users. To address these problems, I propose human-centric and data-centric NLP pipelines. For the human-centric aspect, we collect human perceptions of linguistic styles and then train the model to mimic how humans perceive styles. We then develop interactive NLP systems that help scholars better read and write academic papers. For the data-centric aspect, we model data informativeness based on various training dynamics and then use it to find new important data points for data augmentation and annotation. We believe that greater involvement of humans and consideration of data dynamics can transform the traditional ML-driven NLP pipeline to be more robust, interactive, and information-effective.

Speaker Bio

Dongyeop Kang is an assistant professor in the Computer Science and Engineering department at the University of Minnesota, Twin Cities. He leads the Minnesota Natural Language Processing (NLP) group, which aims to develop human-centered language technologies. His group's research lies at the intersection of computational linguistics, machine learning, and human-computer interaction. He completed a postdoc at the University of California, Berkeley, and obtained a PhD from the Language Technologies Institute of the School of Computer Science at Carnegie Mellon University.

YOU ONLY NEED TO REGISTER ONCE TO ATTEND THE ENTIRE SERIES – We will send you email announcements with details of the upcoming speakers.

Register in advance for this webinar:

After registering, you will receive an email confirmation containing information about joining the Zoom webinar.

The recording for this AI Seminar talk will be posted on our USC/ISI YouTube page within 1-2 business days:

HOST: Muhao Chen

POC: Alma Nava