Like many people these days, Kristina Lerman finds herself thinking about how society has become increasingly divided.
“I’m fascinated by trying to understand why our society looks like it’s falling apart,” she said.
Lerman, a research professor of computer science and principal scientist at USC’s Information Sciences Institute (ISI), has turned her fascination into leading-edge research.
A recent paper she wrote with two current and two former ISI colleagues describes a natural language processing tool they developed that can quickly detect polarized topics in reams of articles from partisan (liberal vs. conservative) news sources.
This is relevant because the tool could be used to mitigate or de-amplify polarization, explained Lerman, an expert in mining social media sites such as Twitter, Digg, and Flickr for clues about human behavior.
“The media environment has grown increasingly polarized in recent years, creating social, cultural and political divisions,” Lerman and her co-authors wrote in the paper, “Detecting Polarized Topics Using Partisanship-aware Contextualized Topic Embeddings.”
Zihao He, a third-year computer science Ph.D. student at the USC Viterbi School of Engineering, presented the paper in early November at the 2021 Conference on Empirical Methods in Natural Language Processing.
“Unchecked polarization in the news media can lead to disagreements, conflict, and even violence,” He said.
Public media watchdogs and social media platforms could use such a simple-yet-effective tool to flag discussions that have grown divisive so that action could be taken to reduce partisan divisions and improve civil discourse, Lerman said.
ANALYZED COVID-19 COVERAGE
Lerman’s study looked at COVID-19 news coverage.
To discover polarization between politically divided news media, Lerman and her team used natural-language processing tools to analyze 66,368 articles from January 2020 to July 2020.
The articles came from six well-known U.S. news sources – three that lean to the left (CNN, Huffington Post, The New York Times), and three known for their conservative bent (Fox, Breitbart and the New York Post).
One example of a liberal take on the topic of COVID-19 quarantining is, “People should stay at home to practice social distancing.” A conservative media outlet would argue, “States should reopen.”
“We wanted to develop an automated tool that would allow us to very quickly find polarized topics from the cultural wars as they arose and grew,” Lerman said. “By breaking through these cultural wars, we hope to improve the health of democratic discourse.”
Lerman’s team first extracted a set of topics using latent Dirichlet allocation (LDA) topic modeling, a type of statistical modeling for discovering the abstract topics that occur in a collection of documents.
Next, the researchers fine-tuned a pretrained language model to recognize the partisanship of the news articles (rendering it “partisanship aware”).
For each article, Lerman and her colleagues then represented its ideology on a topic as a vector, called a document-contextualized topic embedding. This approach concentrated on the topic-oriented semantics in the context of the article, rather than on the global semantics of the article, which might contain irrelevant and “noisy” information.
Topics included, for example, work, the economy, and COVID case counts. Each topic was represented by a distribution of keywords, such as “state,” “order,” “reopen,” “governor” and “business.” The tool weeded out irrelevant information such as, “The pandemic has caused thousands of deaths across the world.”
Lerman’s team then aggregated the document-contextualized topic embeddings, so that the ideology of a news outlet on a topic could be represented by a single vector.
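As a rough illustration of this aggregation step, the sketch below collapses several per-article topic vectors into one vector per outlet. The article does not say how the team combined the embeddings, so the element-wise mean used here, the function name `aggregate_embeddings`, the outlet labels, and the toy three-dimensional vectors are all assumptions made for illustration only.

```python
from typing import Dict, List

Vector = List[float]


def aggregate_embeddings(doc_vectors: List[Vector]) -> Vector:
    """Combine per-article topic vectors into one vector.

    Assumption: a plain element-wise mean. The real tool may weight
    or combine articles differently.
    """
    if not doc_vectors:
        raise ValueError("need at least one document vector")
    dim = len(doc_vectors[0])
    if any(len(v) != dim for v in doc_vectors):
        raise ValueError("all vectors must have the same dimension")
    n = len(doc_vectors)
    return [sum(v[i] for v in doc_vectors) / n for i in range(dim)]


# Made-up embeddings for a single topic; real ones would come from
# the fine-tuned language model and have hundreds of dimensions.
articles_by_outlet: Dict[str, List[Vector]] = {
    "outlet_a": [[1.0, 0.0, 2.0], [3.0, 2.0, 0.0]],
    "outlet_b": [[0.0, 4.0, 2.0]],
}

# One vector per outlet for this topic.
outlet_vectors = {
    name: aggregate_embeddings(vectors)
    for name, vectors in articles_by_outlet.items()
}
```

Whatever the exact aggregation rule, the point is the same: many article-level vectors are reduced to one vector that stands in for an outlet's position on a topic.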
Using a trigonometry formula taught in high school, they then were able to measure the polarization between news sources by topic.
“Our paper describes a method we came up with to measure, for each topic, how much the two sides disagreed on the topic,” Lerman explained.
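The article does not name the trigonometry formula. A natural candidate is cosine similarity, the cosine of the angle between two vectors, which is widely used to compare embeddings. The sketch below assumes that reading and scores disagreement as one minus the cosine, so that vectors pointing the same way score near 0 and vectors pointing in opposite directions score 2. The function names and the toy vectors are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def polarization_score(left: Sequence[float], right: Sequence[float]) -> float:
    """Assumed disagreement measure: 1 - cosine similarity.

    Near 0 when the two sides' topic vectors align, 1 when they are
    orthogonal, 2 when they point in opposite directions.
    """
    return 1.0 - cosine_similarity(left, right)


# Made-up topic vectors for the two sides on one topic.
left_vector = [1.0, 0.0]
right_vector = [0.0, 1.0]
score = polarization_score(left_vector, right_vector)
```

Ranking topics by a score like this would surface those on which the two sides' coverage diverges most.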
A MORE PRECISE TOOL
Unlike other methods being used widely to study partisan polarization in the U.S. media, the tool Lerman and her team developed can more precisely and meaningfully capture topical polarization, she said.
“We hope that more natural language processing and researchers and contributors can contribute to this research area that is promising but receiving insufficient attention,” they wrote in the paper.
Lerman said the tool her team developed can be easily employed.
“It’s been well documented that adversaries try to manipulate social media to exploit culture wars,” she said. “This paper explains one tool to try to identify such opportunities for manipulation.”
Published on February 2nd, 2022
Last updated on February 2nd, 2022