Where does vaccine hesitancy exist? ISI researchers can predict it at the zip code level, in real time

by Maya Abu-Zahra


The term “vaccine hesitant” has attached itself to nearly every conversation about COVID-19, its latest variants, and the evolving news on vaccines. It is hard to imagine a day when these topics don't creep into casual lunch conversations and our waking thoughts.

With new variants continually emerging, vaccine hesitancy is only becoming more powerful and more present in communities across the U.S. But before public health officials can understand and engage these communities, they must first solve a basic problem: how to efficiently pinpoint where large communities of vaccine-hesitant individuals live.

In a new paper published in PLOS Digital Health, researchers at the USC Viterbi School of Engineering proposed natural language processing (NLP) software that learns, in real time, where skepticism about vaccines is concentrated.

Mayank Kejriwal, research assistant professor in industrial and systems engineering and research team lead at USC's Information Sciences Institute (ISI), was motivated by gaps in current methods for predicting vaccine hesitancy. The software advances existing NLP strategies, including word embedding algorithms that detect vaccine-related keywords. These advances make collecting data on zip code-level vaccine hesitancy simpler, faster, and more accurate.

Using publicly available Twitter data and existing machine learning algorithms to process it, the study's system outperforms local and national survey data as a reflection of public opinion on the COVID-19 vaccine.

Not all data is created equal

Sara Melotte, a master's student in computer science at the USC Viterbi School of Engineering and research assistant at ISI, explained which data the study relies on and how that choice furthers the goal of making predictions at the community level.

“We show that only the text tweet and hashtags are sufficient to predict zip code-level vaccine hesitancy with reasonable accuracy, even if the tweets are not all related to the COVID-19 pandemic,” said Melotte.

The approach also avoids a bias inherent to surveys, which inevitably surfaces when individuals know their personal information is being collected. The algorithm picks up on hashtags without requiring any personal information and without discouraging people from expressing their unfiltered opinions.

“Historically, a lot of things depend on surveys. When you see poll numbers, those are collected by surveys, which are expensive,” said Kejriwal. Cost is not the only limitation: timeliness and constantly evolving opinions make it even harder to obtain accurate, current data.

“What typically ends up happening is we have to wait for the survey to come out, and by then, you’d already be too late,” said Kejriwal. “But we showed that you can use publicly available Twitter data and scrape it out using a program,” and get results in real time.

Guided by real-world intuitions, the model also draws on external data sources, such as the number of hospitals or scientific establishments in a neighborhood. “We investigate the extent to which the use of these independent sets of features helps in improving the model,” said Kejriwal.
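For readers curious how tweet text and neighborhood data might be combined into model inputs, here is a minimal Python sketch. It is a hypothetical illustration, not the researchers' code: the function name zip_features, the whitespace tokenizer, the example hashtags, and the "hospitals" count are all assumptions. It turns tweets into per-zip-code keyword frequencies and can optionally append external counts, so that the two feature sets can be compared with and without the external data.

```python
from collections import Counter


def zip_features(tweets, external, vocab, include_external=True):
    """Build one feature vector per zip code (hypothetical illustration).

    tweets: list of (zip_code, text) pairs; hashtags stay as "#tokens"
    external: dict mapping zip_code -> {feature_name: count}, e.g. hospitals
    vocab: ordered list of words/hashtags whose frequencies become features
    include_external: toggle to compare text-only vs. text + external features
    """
    counts = {}
    for zip_code, text in tweets:
        # Naive lowercase whitespace tokenizer (an assumption for this sketch).
        counts.setdefault(zip_code, Counter()).update(text.lower().split())

    # Fixed, sorted order for external feature names across all zip codes.
    ext_names = sorted({name for row in external.values() for name in row})

    features = {}
    for zip_code, counter in counts.items():
        total = sum(counter.values()) or 1  # guard against empty tweets
        # Relative frequency of each vocabulary token in this zip code's tweets.
        vector = [counter[token] / total for token in vocab]
        if include_external:
            # Missing external values default to 0.
            vector += [external.get(zip_code, {}).get(name, 0) for name in ext_names]
        features[zip_code] = vector

    names = list(vocab) + (ext_names if include_external else [])
    return features, names


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    tweets = [
        ("90007", "Got my #vaccine today"),
        ("90007", "#novaccine for me"),
        ("10001", "lunch was great"),
    ]
    external = {"90007": {"hospitals": 2}, "10001": {"hospitals": 5}}
    features, names = zip_features(tweets, external, ["#vaccine", "#novaccine"])
    print(names)  # ['#vaccine', '#novaccine', 'hospitals']
    print(features)
```

A regression model trained against survey-based hesitancy estimates could then be fitted on these vectors. Comparing its accuracy with include_external set to True and to False is one simple way to gauge how much the external features help.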

However, one caveat of collecting such data is that state and city regulations vary and can limit the availability of public information. Still, the study provides reliable methods and data for predicting vaccine hesitancy in metropolitan areas, where Twitter traffic is heavy, and its predictions can be replicated and confirmed using independent survey data.

A tool for policymakers

The study provides local communities, public health experts and policymakers with a supplementary source for detecting and addressing reservations about vaccines. It is a tool for enacting policies that benefit the communities that need them most, before it is too late.

“We provide an early warning system,” said Melotte.

Historically, federal policies have often overlooked the nuances of each community's composition and history. This has led to distrust of federal institutions and the policies that originate from them. Kejriwal stressed the importance of using the study's methods to help restore that trust in a community-driven, bottom-up manner.

“We can help communities in designing local policies and in making their own decisions that will foster trust,” said Kejriwal. Mapping where vaccine reluctance exists highlights the need to rethink current, broad approaches to vaccine policy. Approaching the situation from a fresh perspective supports more organic solutions that meet the needs of each community.

If uncertainty about vaccines varies in intensity from one zip code to the next, policymakers can reassess how resources are allocated and adapt their approaches to vaccine administration and communication accordingly.

“For any public health crisis, there will always be signals in social media,” said Kejriwal. “This [study] is an opportunity because it’s a living record and can provide us with a blueprint for getting signals in any public health crisis.”

Published on May 16th, 2022

Last updated on May 16th, 2022