The Surprising Ineffectiveness of Countering Hate Speech on Reddit

by Stephanie Lee


See a hateful post on social media? Say something back. This strategy, known as counterspeech, is a way to combat harmful online rhetoric by directly yet empathetically challenging it.

Counterspeech is meant to have a constructive effect, debunking misinformation or offering alternative viewpoints to extremism. But a new study from USC Viterbi’s Information Sciences Institute (ISI) suggests that neither polite nor hostile counterspeech works the way its advocates hope. The results could challenge our assumptions about the best way to fight online hate.

The study found that on Reddit, non-hostile, or polite, counterspeech is largely ineffective in reducing engagement with hate-focused subreddits. Surprisingly, hostile responses that attacked the user proved more impactful in discouraging participation in these harmful spaces — but with the unwanted effect of driving online negativity.

“Our work speaks mostly to the ineffectiveness of typical counterspeech,” said Keith Burghardt, a computer scientist at ISI who worked on the study. “As we see it, the typical, polite counterspeech that people tend to promote isn’t doing enough, and the impolite counterspeech has its own harms.”

In the study, researchers examined 25 hate-based subreddits, all of which have since been banned by Reddit, focusing on how counterspeech affected new users’ behavior. The team developed a novel, state-of-the-art method using AI and natural language processing to detect and categorize counterspeech in vast amounts of Reddit data.

Their analysis distinguished between two types of counterspeech: non-hostile (or polite) and hostile (or attacking). The researchers then investigated how these different approaches impacted newcomers to hate communities, who Burghardt noted are more susceptible to influence than established members. 

Specifically, the team analyzed whether non-hostile or hostile counterspeech led to reduced frequency of offensive posting or departures from the hate subreddit. This approach allowed the team to measure the effectiveness of different counterspeech strategies in discouraging participation in harmful online spaces.
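To make the measurement idea concrete, here is a minimal, purely illustrative Python sketch, not the ISI team’s actual pipeline. The Comment fields, the keyword-based classify_reply stub, and the 30-day window are hypothetical stand-ins for the study’s trained NLP classifier and more careful causal analysis.

```python
# Illustrative sketch: label replies as hostile or non-hostile counterspeech,
# then compare how often the targeted users keep posting in the subreddit.
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Comment:
    author: str                    # hypothetical fields standing in for a Reddit data dump
    parent_author: Optional[str]   # whom this comment replies to; None for top-level posts
    text: str
    timestamp: int                 # Unix seconds


def classify_reply(text: str) -> str:
    """Placeholder for a counterspeech classifier.
    Returns 'hostile', 'non_hostile', or 'none' (not counterspeech)."""
    lowered = text.lower()
    if any(w in lowered for w in ("pathetic", "idiot", "get lost")):       # toy heuristic only
        return "hostile"
    if any(w in lowered for w in ("evidence shows", "consider that", "actually")):
        return "non_hostile"
    return "none"


def kept_posting(comments, user, after_ts, window=30 * 86400):
    """True if `user` posted again within `window` seconds after `after_ts`."""
    return any(c.author == user and after_ts < c.timestamp <= after_ts + window
               for c in comments)


def retention_by_counterspeech(comments):
    """For each kind of counterspeech a user received, report the share of
    targeted users who continued posting afterward.
    Assumes `comments` all come from a single (banned) hate subreddit."""
    outcomes = defaultdict(list)
    for c in comments:
        if c.parent_author is None:
            continue
        kind = classify_reply(c.text)
        if kind == "none":
            continue
        outcomes[kind].append(kept_posting(comments, c.parent_author, c.timestamp))
    return {kind: sum(flags) / len(flags) for kind, flags in outcomes.items() if flags}
```

In this toy setup, a lower retention rate for users who received replies labeled “hostile” than for those who received “non_hostile” replies would mirror the pattern the study reports; the real analysis relies on trained language models and careful statistical controls rather than keyword rules.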

Contrary to expectations, the study found that polite counterspeech, typically recommended as an anti-hate tactic, had little to no effect on reducing users’ engagement in hate subreddits. In contrast, hostile counterspeech, which involved attacking the original poster, proved more effective, increasing the likelihood that a user would leave the hate group by about 10%. “It’s not a huge jump,” Burghardt said, “but there is a significant effect.”

Despite this impact, Burghardt stressed that hostile counterspeech is not a suitable method to combat online hate. While it might reduce engagement in the harmful group, it also increases overall negativity and toxicity online. “Hostility drives negativity, and could create flame wars,” he said.

Worse, hostile counterspeech could drive users to join even more extremist, unmoderated forums where they’re unlikely to receive counterspeech. While an aggressive response might push someone away from Reddit, it may entrench their underlying harmful views, said Burghardt.

The study comes at a crucial time, as the link between online radicalization and real-world violence is becoming more pronounced. A database of extremists in the United States, known as the Profiles of Individual Radicalization in the United States (PIRUS), shows that a significant number of individuals involved in violent activities up until 2022 reported being radicalized, at least in part, through social media.

Studies have also shown how platforms like Facebook contributed to xenophobic attacks in countries such as Germany. With social media playing such a central role in radicalization, understanding how to effectively counter hate speech online is vital.

Looking ahead, Burghardt and his team are exploring novel solutions to improve online discourse. Their current research focuses on integrating psychological methods with large language models to develop more effective and potentially automated counterspeech strategies. While counterspeech might not change users’ harmful beliefs, new strategies still could discourage them from interacting with other hate group members, which would reduce further online radicalization.

“We’re trying to prune what their future would turn into,” said Burghardt. “What we need now is more research into effective strategies that reduce the chances of people going down a deeper rabbit hole.”

Published on October 28th, 2024

Last updated on October 29th, 2024
