Science Friday
Ira Flatow·11/04/2026

Why so many studies can’t be replicated

This is an episode from podcasts.apple.com.
To find out more about the podcast, go to Why so many studies can’t be replicated.

Below is a short summary and detailed review of this podcast written by FutureFactual:

DARPA SCORE Replication Crisis: Open Science, Data Sharing and Reproducibility Across Economics, Education, and Psychology

Science Friday explores how replication of findings in the social sciences is evolving, focusing on the DARPA SCORE project and what it reveals about trust in research. The discussion highlights how large-scale checks across economics, education, and psychology found replication rates around fifty percent, underscoring the need for data and code sharing, robust methods, and new tools to assess confidence in findings.

  • The replication crisis is a process, not a binary truth; large-scale analyses reveal how much confidence we should place in published results.
  • SCORE examined thousands of papers across several fields, illustrating broad replication challenges.
  • Data and code sharing are essential to enable reproduction and scrutiny of findings.
  • AI may help automate replication and robustness checks while normative changes in journals, institutions, and funders shape incentives.

Introduction: The replication crisis and SCORE

The podcast frames the replication crisis as a reminder of why science relies on a method: testing assumptions to arrive at trustworthy results. Ira Flatow summarizes how SCORE, a DARPA-funded, multi-disciplinary project, sought to go beyond re-running experiments to develop AI-assisted tools for assessing confidence in research findings. SCORE analyzed thousands of papers in economics, education, and psychology spanning a decade, finding that only about half could be replicated, which underscores the scale of the challenge in the social and behavioral sciences.

"Science is a process. It's really easy to forget that." - Tim Arrington

SCORE’s scope and what makes it different

Tim Errington explains SCORE’s breadth, noting that it was designed to create a ground truth for testing automated confidence-assessment tools, not merely to repeat experiments. The project spans 62 journals across economics, education, and psychology, covering ten years of research. This breadth distinguishes SCORE from earlier replication efforts and positions it at the intersection of open science and AI-assisted research evaluation.

"There’s a lot of good research out there, and unfortunately there’s also research that is less good." - Abel Brodeur

Why replication matters for policy and daily life

The hosts and guests discuss the implications of replication for policy decisions and individual behavior. Examples cited include why public employees leave the civil service and whether crime victimization influences political participation, illustrating the tangible policy stakes of research reliability.

Common replication barriers and data sharing

Both Tim Errington and Abel Brodeur describe barriers to replication, especially around data access, data-collection methods, and analysis pipelines. They emphasize that many studies do not openly share their data or the precise methodology used, which makes exact reproduction difficult and sometimes impossible. The discussion points to the cultural and systemic shifts needed: sharing data, sharing code, and improving incentives for researchers to expose their data and methods to scrutiny.

"it’s hard to do that when you don’t share data." - Tim Arrington

New replication findings in economics and political science

A separate replication effort covering economics and political science shows improvements in data sharing since SCORE began in 2019. Data sharing is rising, but problems persist: coding errors appeared in 15-20% of papers, and results were robust about 75% of the time. The Institute for Replication notes significant variation in data-sharing norms across fields such as economics, political science, psychology, public health, and environmental research, indicating that progress is real but uneven.

"coding errors in 15, 20% of papers. Results are robust maybe 75% of the time." - Abel Brodeur

AI's promise and the path forward

The discussion pivots to AI’s role in replication, sketching two futures: one where AI-generated content clouds reproducibility, and another where AI helps automate reproductions, describe datasets, and explore plausible alternative analyses. The guests agree that AI could support more comprehensive robustness checks, but careful governance and clear norms are essential to ensure AI tools enhance, rather than undermine, scientific integrity.

Takeaways for trust in science and headlines

Abel Brodeur and Tim Errington stress that trusting science requires patience and a willingness to see results replicated by multiple independent researchers over time. The speakers advocate a stronger culture of replication, better sharing practices, and improved training for researchers in documenting data and code. The takeaway is to assess new results cautiously, waiting for independent replication before treating headlines as settled truth.

"I don’t know which result I can really trust versus those that I cannot trust." - Abel Brodeur

Conclusion

The podcast ends with a shared optimism that the replication crisis can drive meaningful improvements in how science is conducted and shared, highlighting opportunities to harness AI for reproducibility while strengthening norms, governance, and incentives across journals, institutions, and funders.