Synthetic Data Generation for Machine Learning Models with Cognitive Agent Simulations

Abstract

The use of synthetic data for training machine learning (ML) models in social media domains can address issues such as data availability and bias, but it poses challenges, including properly reflecting causal relationships and matching the consistency of real data. In this paper, we explore the benefits and limitations of using synthetic data generated by cognitive agent simulations. By simulating human interactions and social media dynamics, these simulation models can capture constraints and nuances of real-world scenarios. We report initial experiments showing that ML algorithms trained on real data augmented with synthetic data outperform those trained solely on the original data, achieving up to a 25% improvement in Kolmogorov-Smirnov (KS) distance and root-mean-square error (RMSE). This approach is applied to two domain problems: predicting code quality based on open-source code discussions and detecting and countering bot attacks on social media …
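The abstract reports results in terms of KS distance and RMSE. The following is a minimal sketch of how such metrics are commonly computed, not the authors' evaluation code. It assumes that KS distance means the standard two-sample statistic between the real and predicted value distributions, and that "improvement" means the relative reduction of a metric from the real-data-only baseline. Both are assumptions, since the abstract does not state the exact protocol. The function names `ks_distance`, `rmse`, and `relative_improvement` are hypothetical.

```python
import bisect
import math
from typing import Sequence


def ks_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: the largest absolute gap
    between the empirical CDFs of samples a and b."""
    xs, ys = sorted(a), sorted(b)
    max_gap = 0.0
    # Empirical CDFs only change at observed values, so the supremum of
    # |F_a(v) - F_b(v)| is attained at one of the pooled sample points.
    for v in xs + ys:
        cdf_a = bisect.bisect_right(xs, v) / len(xs)  # fraction of a <= v
        cdf_b = bisect.bisect_right(ys, v) / len(ys)  # fraction of b <= v
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root-mean-square error between paired predictions and ground truth."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - t) ** 2 for p, t in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def relative_improvement(baseline: float, augmented: float) -> float:
    """Assumed definition: fractional error reduction of the model trained on
    augmented data relative to the real-data-only baseline (lower is better)."""
    return (baseline - augmented) / baseline


if __name__ == "__main__":
    real = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.0, 2.0, 3.0, 5.0]
    print(ks_distance(real, predicted))  # 0.25
    print(rmse(predicted, real))         # 0.5
```

Under this reading, a baseline error of 0.4 that falls to 0.3 after adding synthetic data would count as a 25% improvement.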

Date: June 25, 2024
Authors: Jim Blythe, Alexey Tregubov
Book: International Conference on Practical Applications of Agents and Multi-Agent Systems
Pages: 73-83
Publisher: Springer Nature Switzerland

Information Sciences Institute
