Publications

Innovations in machine assessment of replicability

Abstract

Automated methods for assessing the replicability of scientific claims offer a scalable complement to replication studies and traditional peer review. Drawing on a large dataset of claims, human judgments, and a limited set of replication outcomes, we developed and evaluated three distinct artificial intelligence systems designed to predict human expert assessments of replicability. The systems draw on diverse methodologies, including synthetic prediction markets, interpretable feature-based modeling, knowledge graph reasoning, and semantic parsing with argument structures. While these systems achieved modest calibration to human judgment distributions, they failed to discriminate between replicable and non-replicable claims. Our findings suggest that although machine assessments of research replicability may complement human reasoning, their current performance limitations and potential for bias demand careful evaluation before real-world application.

Date
2026
Authors
Sarah Rajtmajer, Laxmaan Balaji, D. M. Benjamin, James Caverlee, Tatiana Chakravorti, Yiling Chen, T. M. Errington, Qizhang Feng, Fiona Fidler, R. D. Fraleigh, A. Frank, T. Fritton, J. Gentile, C. L. Giles, B. Goldfedder, C. Griffin, T. Gulden, X. B. Hu, Y. Huang, S. Koneru, A. M. Kwasnica, D. Lee, K. Lerman, Y. Liu, M. McLaughlin, A. Menon, F. Morstatter, N. S. Nakshatri, B. A. Nosek, D. Pennock, J. Pujara, A. Russell, V. Singh, A. M. Squicciarini, L. Tran, A. H. Tyner, J. Wang, Z. Wang, X. Wei, J. Wu