Publications
Construction of Large-Scale Misinformation Labeled Datasets from Social Media Discourse using Label Refinement
Abstract
Malicious accounts spreading misinformation have led to widespread false and misleading narratives in recent times, especially during the COVID-19 pandemic, and social media platforms struggle to eliminate such content rapidly. This is because adapting to new domains requires human-intensive fact-checking that is slow and difficult to scale. To address this challenge, we propose to leverage news-source credibility labels as weak labels for social media posts and propose model-guided refinement of labels to construct large-scale, diverse misinformation labeled datasets in new domains. The weak labels can be inaccurate at the article or social media post level where the stance of the user does not align with the news source or article credibility. We propose a framework to use a detection model self-trained on the initial weak labels with uncertainty sampling based on entropy in predictions of the model to …
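The abstract mentions uncertainty sampling based on the entropy of model predictions. As a rough illustration only, and not the authors' implementation, the sketch below scores each post by the Shannon entropy of its predicted class distribution and returns the most uncertain ones. The function names, the two-class layout, and the example probabilities are all assumptions made for illustration; what the framework does with the selected samples is not stated in the truncated abstract.

```python
import math


def prediction_entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    # Zero-probability classes contribute nothing to the sum.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def most_uncertain(prob_rows, k):
    """Return the indices of the k rows with the highest predictive entropy."""
    ranked = sorted(
        range(len(prob_rows)),
        key=lambda i: prediction_entropy(prob_rows[i]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical model outputs, one row per post:
# [P(reliable), P(misinformation)]. These numbers are made up.
predictions = [
    [0.98, 0.02],
    [0.55, 0.45],
    [0.10, 0.90],
    [0.50, 0.50],
]

uncertain = most_uncertain(predictions, k=2)
```

Entropy is largest when the predicted distribution is closest to uniform, so posts where the model is nearly undecided between the two classes rank first.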
- Date: April 25, 2022
- Authors: Karishma Sharma, Emilio Ferrara, Yan Liu
- Conference: WWW '22: Proceedings of the ACM Web Conference 2022
- Pages: 3755-3764