Resources for the Future Inc.

05/28/2026 | Press release | Archived content

AI-Assisted Teams Outperform AI-Led Teams but Not Human-Only Teams in Assessing Research Reproducibility in Quantitative Social Science

With large language models (LLMs) becoming widespread, this study tests whether artificial intelligence (AI) tools such as ChatGPT could help social scientists check whether published research findings can be reproduced.

Date

May 28, 2026

Authors

Lucija Muehlenbachs and Other Coauthors

Publication

Journal Article in Proceedings of the National Academy of Sciences of the United States of America

Reading time

1 minute

Abstract

Large Language Models (LLMs) such as ChatGPT are transforming how scientists conduct and validate research, offering promise as tools to improve scientific reproducibility. However, computational reproducibility and error detection remain expensive and labor-intensive. We experimentally test how collaboration between researchers and LLM assistants influences the reproduction of quantitative social science findings across different levels of AI autonomy. We randomly assigned 288 researchers to 103 teams working under three conditions: human-only, AI-assisted (using ChatGPT as a collaborative tool), or AI-led (ChatGPT operating with minimal human oversight). Teams reproduced published results from leading social science journals, detected coding errors, and proposed robustness checks. Human-only and AI-assisted teams achieved comparable reproduction rates (94% vs. 91%) and performed similarly on most outcomes, except human-only teams identified significantly more major coding errors. Both substantially outperformed AI-led teams, which achieved only a 37% reproduction rate, detected fewer errors across all categories, proposed weaker robustness checks, and required more time. This autonomous approach, however, likely represents only a lower bound of AI capabilities. Despite rapid model advances, expert human judgment currently remains indispensable for reliable empirical verification. While AI assistance did not degrade most outcomes, it provided no measurable advantages and was associated with reduced detection of major errors. However, the 37% autonomous reproduction rate indicates that AI could provide value in settings where scale or cost constraints preclude human review of papers, even though general-purpose LLMs offer no immediate advantages for human-supervised verification.

Authors

Lucija Anna Muehlenbachs

University Fellow

Lucija Muehlenbachs is a University Fellow at RFF. She researches the economic and environmental implications of fossil fuel production and consumption.

Other Coauthors