Seminars and Events

ISI Natural Language Seminar

Improving Moderation of Online Discussions via Nonviolent Communication and Linguistic Heritage-Aware Language Model Adaptation for Diasporic Languages

Event Details

This seminar will not be recorded; the presentations will be live only.

REMINDER:

Meeting hosts will only admit guests they know to the Zoom meeting, so you are highly encouraged to sign into Zoom with your USC account.

If you are an outside visitor, please inform us beforehand at nlg-seminar-host(at)isi.edu so we will be aware of your attendance and can let you in.

In-person attendance is limited to USC/ISI faculty, staff, and students. The talks are open to the public virtually via the Zoom registration link.

For more information on the NL Seminar series and upcoming talks, please visit:

https://nlg.isi.edu/nl-seminar/ 

1.) Abstract - Taiwei Shi

The growing volume of comments makes online discussions difficult to moderate with human moderators alone. A crucial limitation of current automated moderation is that its generations are repetitive, generic, and judgmental, which is ineffective at changing people's minds and behaviors. We seek to build dialogue models that can intervene in adversarial conversations whose participants have abandoned reasoned discussion and descended into personal attacks. While this is a difficult problem even among humans, we explore the effectiveness of Nonviolent Communication (NVC), an approach to repairing breakdowns in communication. In this talk, we discuss strategies for incorporating one aspect of NVC, observation without evaluation (O-vs-E), into dialogue models. First, we obtain a set of O-vs-E dialogue data large enough to train an O-vs-E classifier. We then expand this to a set large enough to fine-tune a dialogue model. We also explore text style transfer to rewrite moderation datasets, so that the model can actively intervene in toxic conversations while remaining less judgmental. Finally, we discuss strategies for evaluating the dialogue model and conclude with future directions.
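As a rough illustration of the first stage described above, the sketch below trains a binary O-vs-E classifier. The toy utterances, labels, and the TF-IDF plus logistic-regression baseline are all assumptions for illustration; they are not the speaker's actual data or model, which the abstract does not specify.

# Minimal sketch of an observation-vs-evaluation (O-vs-E) classifier.
# The examples and the TF-IDF + logistic-regression baseline are
# illustrative assumptions, not the speaker's actual model or data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labeled utterances: 1 = observation, 0 = evaluation.
texts = [
    "You spoke for ten minutes before anyone else commented.",  # observation
    "You always dominate every conversation.",                  # evaluation
    "This thread has 40 replies and 3 of them cite a source.",  # observation
    "This thread is a dumpster fire of bad takes.",             # evaluation
]
labels = [1, 0, 1, 0]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)

# Score a new comment; in the pipeline the abstract outlines, such a
# score could guide how judgmental the dialogue model's intervention is.
print(clf.predict_proba(["You're clearly not arguing in good faith."])[:, 1])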

2.) Abstract - Jonne Sälevä

Multilingual language models (MLLMs) have proven their effectiveness as cross-lingual representation learners that perform well on several downstream tasks and a variety of languages, including many lower-resourced and zero-shot ones. Although effective, MLLMs remain somewhat opaque and the nature of their cross-linguistic transfer is difficult to understand. While it seems plausible that higher- and lower-resourced languages should share information within the model, what is less clear is how such transfer is mediated by linguistic relatedness.

In this talk, we investigate this problem through the lens of diasporic languages, which can be (crudely) understood as a combination of a “co-cultural language” and a “co-territorial language”. Specifically, we ask whether augmenting MLLM adaptation with these ancestral languages, or some mixture of them, can improve MLLM performance on a lower-resourced diasporic language, both intrinsically in terms of perplexity and extrinsically on a named entity recognition task. We outline preliminary results on Yiddish, a Germanic language spoken by Ashkenazi Jews, and discuss the effectiveness of using German and Hebrew as ancestral languages. Finally, we contrast regular ancestral pretraining with recent lexicon-based adaptation approaches by Wang et al. (2022) and conclude with directions for future work.
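As a rough illustration of the adaptation recipe described above, the sketch below continues masked-language-model pretraining of a multilingual model on a mixture of Yiddish and ancestral-language (German and Hebrew) text. The base checkpoint, corpus file names, and 1:1 mixing ratio are illustrative assumptions; the talk's actual setup, including the lexicon-based variants, is not specified here.

# Minimal sketch: continued MLM pretraining of a multilingual model on a
# Yiddish + ancestral-language mixture. Checkpoint, file names, and the
# mixing ratio are assumptions, not the speaker's actual configuration.
import random
from datasets import Dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForMaskedLM.from_pretrained("xlm-roberta-base")

def load_lines(path):
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

# Hypothetical corpora; mix ancestral text into the adaptation data.
yiddish = load_lines("yiddish.txt")
ancestral = load_lines("german.txt") + load_lines("hebrew.txt")
random.seed(0)
mixture = yiddish + random.sample(ancestral, k=min(len(yiddish), len(ancestral)))

ds = Dataset.from_dict({"text": mixture}).map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="adapted-mlm", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer,
                                                  mlm_probability=0.15),
)
trainer.train()  # perplexity on held-out Yiddish would be the intrinsic metric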

Speaker Bios

1.) Bio - Taiwei Shi

Taiwei Shi is a summer intern with the Natural Language Group at USC ISI, working under Professors Jonathan May and Xuezhe Ma. He is also an undergraduate student at the Georgia Institute of Technology, majoring in Computer Science and Mathematics. He previously worked at Georgia Tech’s SALT lab under Professor Diyi Yang. He is working towards a career where he can pursue his interests and make an impact in natural language processing, especially in the fields of computational social science and philosophy.

2.) Bio - Jonne Sälevä

Jonne Sälevä is a summer intern in the Natural Language Group at USC ISI, working on language modeling for lower-resourced diasporic languages under Prof. Jonathan May. Jonne is also a Ph.D. student in Computer Science at Brandeis University, where he works on NLP for morphologically rich and lower-resourced languages as part of the Broadening Linguistic Technologies Lab led by Prof. Constantine Lignos. Prior to his doctoral studies, Jonne received his M.S. in Computer Science from Brandeis University and his A.B. in Statistics from Harvard College in 2017.
