Construction of Large-Scale Misinformation Labeled Datasets from Social Media Discourse using Label Refinement

Abstract

The spread of misinformation by malicious accounts has led to widespread false and misleading narratives in recent times, especially during the COVID-19 pandemic, and social media platforms struggle to eliminate this content rapidly. This is because adapting to new domains requires human-intensive fact-checking that is slow and difficult to scale. To address this challenge, we propose to leverage news-source credibility labels as weak labels for social media posts, and propose model-guided refinement of labels to construct large-scale, diverse misinformation-labeled datasets in new domains. The weak labels can be inaccurate at the article or social media post level, where the stance of the user does not align with the news source or article credibility. We propose a framework that uses a detection model self-trained on the initial weak labels, with uncertainty sampling based on the entropy of the model's predictions, to …
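
The abstract names uncertainty sampling based on the entropy of model predictions. The following is a minimal sketch of that one ingredient only, assuming the detection model outputs a probability per class for each post. The function names and the sample probabilities are hypothetical and are not taken from the paper, and the sketch does not attempt to show what the framework does with the selected posts.

```python
import math


def prediction_entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    # Terms with zero probability contribute nothing to the entropy.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def most_uncertain(pred_probs, k):
    """Return the indices of the k predictions with the highest entropy."""
    ranked = sorted(
        range(len(pred_probs)),
        key=lambda i: prediction_entropy(pred_probs[i]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical model outputs: [P(reliable), P(misinformation)] for four posts.
pred_probs = [[0.98, 0.02], [0.55, 0.45], [0.10, 0.90], [0.50, 0.50]]

# Posts whose predictions are closest to uniform rank first.
uncertain = most_uncertain(pred_probs, k=2)
```

In this sketch, posts with near-uniform predictions are ranked as the most uncertain, while confidently classified posts rank last.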

Date: April 25, 2022
Authors: Karishma Sharma, Emilio Ferrara, Yan Liu
Conference: WWW '22: Proceedings of the ACM Web Conference 2022
Pages: 3755-3764


Information Sciences Institute
