Seminars and Events

ISI Natural Language Seminar

Harmful Speech Detection by Language Models Exhibits Gender-Queer Dialect Bias

Event Details

Speaker: Rebecca Dorn, USC/ISI

Conference Room Location: ISI-MDR CR#689

REMINDER:

Meeting hosts will only admit online guests they recognize to the Zoom meeting. You are therefore strongly encouraged to sign into Zoom with your USC account.

If you are an outside visitor, please email us at (nlg-seminar-host(at)isi.edu) at least one business day before the event so we can admit you. Specify whether you will attend remotely or in person, and provide your full name, job title, and professional affiliation. For in-person attendance, please arrive at least 10 minutes before the seminar begins.

If you do not have access to the 6th Floor for in-person attendance, please check in at the 10th floor main reception desk to register as a visitor and someone will escort you to the conference room location.

Join Zoom Meeting

https://usc.zoom.us/j/98709918457?pwd=sVnp7kgGtL42MLRYEPaGjofzrjJFHL.1

Meeting ID: 987 0991 8457

Passcode: 592675

Content moderation on social media platforms shapes the dynamics of online discourse, influencing whose voices are amplified and whose are suppressed. Recent studies have raised concerns about the fairness of content moderation practices, particularly for aggressively flagging posts from transgender and non-binary individuals as toxic. In this study, we investigate the presence of bias in harmful speech classification of gender-queer dialect online, focusing specifically on the treatment of reclaimed slurs. We introduce a novel dataset, QueerReclaimLex, based on 109 curated templates exemplifying non-derogatory uses of LGBTQ+ slurs. Dataset instances are scored by gender-queer annotators for potential harm depending on additional context about speaker identity. We systematically evaluate the performance of five off-the-shelf language models in assessing the harm of these texts and explore the effectiveness of chain-of-thought prompting to teach large language models (LLMs) to leverage author identity context. We reveal a tendency for these models to inaccurately flag texts authored by gender-queer individuals as harmful. Strikingly, across all LLMs the performance is poorest for texts that show signs of being written by individuals targeted by the featured slur (F1 ≤ 0.24). We highlight an urgent need for fairness and inclusivity in content moderation systems. By uncovering these biases, this work aims to inform the development of more equitable content moderation practices and contribute to the creation of inclusive online spaces for all users.

Speaker Bio

Rebecca Dorn is a PhD candidate at the University of Southern California's Information Sciences Institute, where they are co-advised by Kristina Lerman and Fred Morstatter. Previously, they earned their B.S. in Computer Science at UC Santa Cruz, advised by Lise Getoor. Their research focuses on the intersection of AI fairness, natural language processing, and computational social science. Recently, their work has centered on how NLP systems treat the dialects of historically marginalized communities.

If the speaker approves recording of this NL Seminar talk, it will be posted on the USC/ISI YouTube page within 1-2 business days: https://www.youtube.com/user/USCISI.

Subscribe here to learn more about upcoming seminars: https://www.isi.edu/events/ 

For more information on the NL Seminar series and upcoming talks, please visit:

https://www.isi.edu/research-groups-nlg/nlg-seminars/