USC Researchers Release Public Coronavirus Twitter Set for Academics

by Amy Blumenthal

Published on April 30th, 2019

Last updated on March 26th, 2020

Researchers at the USC Viterbi School of Engineering’s Information Sciences Institute (ISI) and Department of Computer Science have released a public coronavirus Twitter dataset for scholars.

Emilio Ferrara and Kristina Lerman, the principal researchers on this project, have a history of studying social media and bots to understand how misinformation, fear and influence spread online.

Ferrara and Lerman, who worked on this project in collaboration with Emily Chen, a PhD student in computer science, hope that the dataset will help track public sentiment and help researchers understand the social dimensions of the pandemic.

Emilio Ferrara, the research team leader at the USC Information Sciences Institute, and the principal investigator at the USC ISI Machine Intelligence and Data Science (MINDS) group, says, “Data access is the first barrier to research. With this contribution, our goal is to enable researchers in computational and social sciences and provide them with a shared framework and dataset to study the important issues revolving around COVID-19 and social media discussions. As most of us have been spending time in isolation, more and more discussion occurs online. Social media are now more than ever a mirror to society and studying online platforms can be more informative than ever to understand how we are collectively coping with this unprecedented crisis. Issues such as misinformation and manipulation, and much more, could be studied thanks to our data collection.”

Kristina Lerman, a principal scientist at USC ISI and research associate professor in the USC Viterbi School of Engineering’s Computer Science Department, says, “The data represents the conversations about the novel coronavirus from mid-February through mid-March, including the period when it was declared a pandemic. These conversations capture the fears and the collective sensemaking of communities all over the world about this health emergency. We hope that many other researchers will interrogate the data to learn how people are responding to this unprecedented emergency.”

The dataset, which has been collected since January 22, 2020, includes over 50 million tweets and tracks a variety of keywords related to the pandemic.

The dataset is available at:

https://github.com/echen102/COVID-19-TweetIDs

For further information, please contact Emilio Ferrara or Kristina Lerman.

Published on March 25th, 2020

Last updated on November 15th, 2022
