‘That’s Just Common Sense’. USC researchers find bias in up to 38.6% of ‘facts’ used by AI

by Magali Gruet

Illustration/Michael Rogers

Water is wet. Dogs bark. There are 24 hours in a day. The Earth is round. (We checked.)

Those facts are what we call common knowledge: statements about the universe that are considered true, scientifically proven and known by everybody. Not stereotypes or biases toward any group or individual.

For those working on artificial intelligence algorithms meant to think like a human, commonsense knowledge databases are the starting point of their work. They feed the machine with this data so it can reason on its own the way a person would. AI built this way is used for auto-generated content in the media, for copywriting in marketing, by chatbots and by virtual assistants like Google Assistant, Siri or Alexa. The most popular and widely used database is called ConceptNET, a crowdsourced collection of “facts” that people contribute, as they would on Wikipedia.

This data has to be fair to generate fair results that treat people of all races, sexual orientations, genders or nationalities equally.

But what if this data was biased from the get-go, leading to unfair treatment of different groups of people? A team of researchers from the USC Information Sciences Institute (ISI) studied the ConceptNET and GenericsKB (a smaller player in the AI game) databases to see if their data was fair.

They found that it wasn’t.

More than a third of those “facts” are biased
“People have curated these large commonsense resources, and we tend to download those and include them in our AI systems. What we wanted to do is look at this data that is being edited by humans and see if it is going to reflect human biases. What biases are there? To what extent? And how do we characterize them?” explained Fred Morstatter, an ISI research team lead and USC Viterbi research assistant professor.

The USC team used a program called COMeT, a commonly used knowledge graph completion algorithm that takes in data, then spits out rules when prompted. This algorithm was created to think like a human by analyzing the information it is given and giving out answers.

Depending on the database studied and the type of metrics they looked at, the researchers found 3.4% (ConceptNET) to 38.6% (GenericsKB) of data was biased. Those biases were both positive and negative. “We studied different groups from categories like religion, gender, race and profession to see if the data was favoring or disfavoring them, and we found out that, yes, indeed, there are severe cases of prejudice and biases,” said Ninareh Mehrabi, a Ph.D. candidate at USC-ISI who worked on the project.

‘Shocking’ results
The results showed that women are seen more negatively than men, and even described with qualifiers that can’t be said on broadcast television before 10 p.m., like the “B” word. Muslims are associated with words like terrorism, Mexicans with poverty, policemen with death, priests with pedophilia, and lawyers with dishonesty. Performing artists, politicians, detectives, pharmacists and handymen are also discriminated against. So are British people. The list goes on.

“Some results were so shocking that we questioned putting them in our paper,” Mehrabi said. “It was that bad.”

The database, mostly sourced from people in the United States who volunteer to provide this information through surveys, also seemed Western-focused and not representative of the global population, despite being used all around the world. Overall, the data was not fair, but how does one define fairness to start with? Merriam-Webster’s dictionary mentions “lack of favoritism toward one side or another,” while IBM, with its AI Fairness 360 project, wants to “make the world more equitable for all.”

Those definitions are important because they tell different stories about the algorithm. In 2016, when ProPublica conducted its study on COMPAS, software that uses an algorithm to assess a defendant’s risk of committing future crimes, it looked at the distribution of scores by racial groups. It quickly became apparent that the algorithm was unfair. Black Americans were disproportionately given higher violent risk scores, and white defendants with similar criminal histories were disproportionately given lower ones, Morstatter said.

According to ProPublica, AI-generated scores or risk assessments are “increasingly common in courtrooms across the nation. They are used to inform decisions about who can be set free at every stage of the criminal justice system, from assigning bond amounts…to even more fundamental decisions about defendants’ freedom. In Arizona, Colorado, Delaware, Kentucky, Louisiana, Oklahoma, Virginia, Washington and Wisconsin, the results of such assessments are given to judges during criminal sentencing.”

Mainstream uses
With auto-generated content on the rise, unbiased data becomes increasingly important. There are an estimated 135 million users of voice assistants — like Amazon’s Alexa or Google Assistant — in the United States. E-commerce websites are using chatbots to replace human customer service, and marketers are rushing to adopt software that writes “copy that converts” at the click of a button.

Even the media use AI to save time — and time is money in an industry hurting for the latter.

“The Associated Press estimates that AI helps to free up about 20 percent of reporters’ time spent covering financial earnings for companies and can improve accuracy,” reports Forbes. “This gives reporters more time to concentrate on the content and storytelling behind an article rather than the fact-checking and research.”

Finding solutions
The USC team also discovered that the algorithms were regurgitating “information” even more biased than the data they were given.

“It was alarming to see that this biased data tends to be amplified, because the algorithm is trying to think like us and predict the intent behind the thought,” said Pei Zhou, a USC Viterbi Ph.D. candidate at ISI who participated in the research. “We are often concerned about our own data and say it’s not too bad and we can control it. But sometimes the bias is amplified downstream, and it is outside of our control.”

“It was disappointing to see that a little bit of bias can strongly affect predictive models,” said Jay Pujara, a USC Viterbi research assistant professor of computer science and director of the Center on Knowledge Graphs at ISI. “AI’s entire reason for existing is that they identify patterns and use them to make predictions. Sometimes it’s very helpful — like they see a change in atmospheric pressure and predict a tornado — but sometimes those predictions are prejudiced: they decide who to hire for the next job and overgeneralize from prejudice that is already in society.”

So how can developers eliminate bias from their databases?

“What we need to do is add an extra step between the moment we send the data and the moment that data is interpreted,” said Zhou. “During that step, we can identify biased data and remove it from the database so the information that is used is fair, like adding a filter with rules about what is wrong.”
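As a rough illustration of the filtering step Zhou describes, here is a minimal Python sketch. It is not the ISI team's method: the `Triple` layout, the two word lists and the rule itself are assumptions made up for this example, and a real filter would need far more careful rules than a keyword match.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Triple(NamedTuple):
    """One commonsense statement, e.g. (dog, CapableOf, bark). Hypothetical layout."""
    subject: str
    relation: str
    obj: str


# Illustrative word lists only; both are assumptions for this sketch.
GROUP_TERMS: Set[str] = {"lawyer", "politician"}
LOADED_TERMS: Set[str] = {"dishonest"}

# Toy data invented for the example, not taken from any real database.
SAMPLE_TRIPLES: List[Triple] = [
    Triple("dog", "CapableOf", "bark"),
    Triple("water", "HasProperty", "wet"),
    Triple("lawyer", "HasProperty", "dishonest"),
    Triple("lawyer", "CapableOf", "argue a case"),
]


def is_flagged(triple: Triple, group_terms: Set[str], loaded_terms: Set[str]) -> bool:
    """Flag a statement that attaches a loaded word to a group of people."""
    about_group = triple.subject.lower() in group_terms
    has_loaded_word = any(word in loaded_terms for word in triple.obj.lower().split())
    return about_group and has_loaded_word


def filter_triples(
    triples: Iterable[Triple], group_terms: Set[str], loaded_terms: Set[str]
) -> Tuple[List[Triple], List[Triple]]:
    """Split statements into those kept and those removed by the rule."""
    kept: List[Triple] = []
    removed: List[Triple] = []
    for triple in triples:
        if is_flagged(triple, group_terms, loaded_terms):
            removed.append(triple)
        else:
            kept.append(triple)
    return kept, removed


if __name__ == "__main__":
    kept, removed = filter_triples(SAMPLE_TRIPLES, GROUP_TERMS, LOADED_TERMS)
    print("kept:", kept)
    print("removed:", removed)
```

In this sketch the filter sits between the database and the model: only the `kept` statements would be passed on, while the `removed` list could be reviewed by a person rather than silently discarded.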

Pujara would go even further by creating an algorithm that could correct the biased data at its source. “This is a very exciting era for researchers,” he said. “What can we do that is better than throwing out the biased data? Is there something we can do to correct it? Is there some way that we can manipulate the data to make it fair?”

The ISI research team is working on those answers. Disclaimer: No algorithm was involved in the writing of this article.

The team’s research was conducted in 2020 and 2021 and supported in part by the Defense Advanced Research Projects Agency (DARPA) MCS program, and based upon work supported by DARPA and the Army Research Office (ARO).

Published on May 26th, 2022

Last updated on May 26th, 2022