The Reproducibility Crisis

Key Moments
Psychology faces a reproducibility crisis driven by publication bias, p-hacking, low statistical power, and HARKing. Solutions include registered reports and open data.
Key Insights
The reproducibility crisis means many published results, particularly in psychology, cannot be replicated.
Key factors contributing to the crisis include publication bias (favoring positive results), p-hacking (manipulating data analysis), insufficient statistical power (small sample sizes), and HARKing (hypothesizing after the results are known).
Pharmaceutical companies are concerned as they cannot reproduce foundational scientific results for drug development.
Registered reports, where study methods are peer-reviewed and accepted before data collection, offer a promising solution by decoupling publication from results.
Social media has empowered early-career researchers to voice concerns and advocate for scientific reform.
Addressing the crisis requires changes in incentive structures within academia, greater awareness from funders, and public engagement that fosters trust rather than distrust.
IDENTIFYING THE REPRODUCIBILITY CRISIS
The reproducibility crisis refers to the growing concern that many scientific results, especially within psychology, cannot be replicated. Though some initially dismissed it as a minor statistical issue, systematic attempts to reproduce published work found that only 30-40% of results held up. These alarming findings prompted a deeper investigation into the causes and potential solutions across various scientific disciplines, including the biomedical sciences.
THE FOUR HORSEMEN OF THE REPRODUCIBILITY APOCALYPSE
Dorothy Bishop outlines four primary contributors to the crisis:

1. Publication bias: positive and exciting results are preferentially published, distorting the scientific literature and consigning null or negative findings to a 'big file drawer'.
2. P-hacking: analyzing data in multiple ways until a statistically significant result (p < 0.05) turns up, which can produce spurious correlations (see the simulation after this list).
3. Low statistical power: sample sizes too small to reliably detect the often-modest effect sizes common in psychology.
4. HARKing (hypothesizing after the results are known): presenting findings discovered through exploratory data analysis as if they were pre-planned hypotheses.
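To make the p-hacking mechanism concrete, here is a minimal simulation, not from the episode itself: it tests many outcome variables per study when no real effect exists and counts how often at least one comes out "significant". The parameters and variable names are illustrative assumptions.

```python
# A minimal sketch of p-hacking via multiple outcomes; parameters and names
# are illustrative assumptions, not from the episode. Requires numpy, scipy.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_studies = 10_000   # simulated null "studies"
n_per_group = 30     # participants per group
n_outcomes = 10      # outcome variables measured per study

false_positive_studies = 0
for _ in range(n_studies):
    # Both groups come from the SAME distribution: any "effect" is pure noise.
    group_a = rng.normal(size=(n_outcomes, n_per_group))
    group_b = rng.normal(size=(n_outcomes, n_per_group))
    # One t-test per outcome variable (tests run along axis 1).
    pvalues = stats.ttest_ind(group_a, group_b, axis=1).pvalue
    # A p-hacker reports the study if ANY outcome "works".
    if (pvalues < 0.05).any():
        false_positive_studies += 1

print(f"Studies with at least one p < 0.05: {false_positive_studies / n_studies:.2f}")
# Theory predicts 1 - 0.95**10, i.e. ~0.40: roughly 40% of these no-effect
# studies yield a publishable-looking "significant" result.
```

With ten independent outcomes, chance alone makes about 40% of null studies look significant, which is exactly how flexible analysis fills the literature with spurious findings.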
CHALLENGES IN SAMPLE SIZE AND DATA ANALYSIS
Obtaining large sample sizes can be difficult in fields that study specific populations or rely on expensive data collection methods, such as brain imaging or rare patient groups; collaboration is often needed to pool resources and increase sample sizes. Furthermore, flexibility in data analysis can produce spurious statistical significance when each analytical approach amounts to a fresh attempt to find a result, underscoring the importance of pre-specified analysis plans. The sketch below illustrates how sharply detection rates fall when samples are small.
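As a rough illustration of the power problem, the following simulation (a sketch under assumed parameters, not an analysis from the episode) estimates how often a two-group t-test detects a modest true effect at several sample sizes.

```python
# A rough power sketch under assumed parameters (d = 0.3, alpha = 0.05);
# illustrative only, not an analysis from the episode. Requires numpy, scipy.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
effect_size = 0.3    # a modest effect, typical of psychology
n_sims = 5_000       # simulated experiments per sample size

for n_per_group in (20, 50, 200):
    detections = 0
    for _ in range(n_sims):
        control = rng.normal(0.0, 1.0, n_per_group)
        treated = rng.normal(effect_size, 1.0, n_per_group)
        if stats.ttest_ind(treated, control).pvalue < 0.05:
            detections += 1
    print(f"n = {n_per_group:>3} per group -> power ~ {detections / n_sims:.2f}")
# Typical result: n = 20 detects the effect only ~15% of the time, and even
# n = 200 reaches only ~85%, which is why underpowered studies miss real effects.
```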
REGISTERED REPORTS AS A PUBLICATION SOLUTION
A promising solution gaining traction is the 'registered report' publication model. In this system, a study's introduction and methods are peer-reviewed and accepted in principle before data collection begins. This pre-review process specifies the analysis plan, and if the researchers adhere to it, the paper is guaranteed publication. This approach effectively mitigates publication bias, p-hacking, and harking by divorcing the publication decision from the study's outcomes.
THE EVOLVING ROLE OF SOCIAL MEDIA AND EDUCATION
Social media has become a crucial platform for early-career researchers to voice concerns about reproducibility issues, fostering a more militant push for scientific rigor. New educational methods, such as using simulated data to teach statistical concepts, are also being introduced to help researchers understand the implications of p-hacking and insufficient power; a minimal example of that teaching approach appears below. These educational shifts aim to equip future scientists with better tools and awareness from the outset.
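In the spirit of those simulation-based teaching methods, here is one possible classroom-style sketch (my construction, not the episode's actual materials): generate data with no true effect and observe that p-values below 0.05 still appear about 5% of the time.

```python
# A classroom-style sketch with simulated null data (my construction, not the
# episode's teaching materials). Requires numpy and scipy.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# 2,000 two-group comparisons where NO true effect exists.
pvalues = np.array([
    stats.ttest_ind(rng.normal(size=30), rng.normal(size=30)).pvalue
    for _ in range(2_000)
])

print(f"'Significant' results with no real effect: {(pvalues < 0.05).mean():.3f}")
# Under the null, p-values are uniform, so ~5% land below 0.05 by chance alone:
# a single p < 0.05 is a 1-in-20 fluke rate, not proof of a real effect.
```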
INCENTIVE STRUCTURES AND PUBLIC TRUST
Addressing the deep-seated issues requires reforming academic incentive structures, so that universities and funders prioritize research credibility over publication in high-impact journals. Funders, motivated by a desire for reliable research, are increasingly pushing for change. While openness about the crisis is vital, it must be balanced against the risk of it being 'weaponized' by those who wish to undermine scientific consensus. Ultimately, improving the self-correcting mechanisms of science will build public trust.
Common Questions
What is the reproducibility crisis?
The reproducibility crisis refers to the observation, made over roughly the last 10-15 years, that many published results in psychology (and other scientific fields) are difficult or impossible to replicate. A major replication study found that only 30-40% of published findings could be reproduced.
Mentioned in this video
●Nature: A scientific journal where Dorothy Bishop published a commentary on the reproducibility crisis.
●Medical Research Council (MRC): A UK research council that supported the Academy of Medical Sciences symposium and motivates change in research credibility.
●A journal that has agreed to offer Registered Reports as a publication option.
●Cardiff University: The university where Chris Chambers, a proponent of Registered Reports, is based.
●Biotechnology and Biological Sciences Research Council (BBSRC): A UK research council that supported the Academy of Medical Sciences symposium and motivates change in research credibility.
●Academy of Medical Sciences: The institution that held a symposium on the reproducibility crisis, particularly concerning the biomedical sciences and the pharmaceutical industry.
●A funder that supported the Academy of Medical Sciences symposium on reproducibility and motivates change by not wanting to fund non-credible research.
●Chris Chambers: Developed the Registered Reports publication model.
●Dorothy Bishop: Professor of psychology at the University of Oxford and author of a Nature commentary on the reproducibility crisis.
●A colleague of Dorothy Bishop at Oxford, in anthropology, who is heading a bid for university-wide funding to coordinate reproducibility initiatives across disciplines.
●A social media platform where junior scientists can quickly voice concerns and draw attention to issues in scientific publications, fostering a more proactive community.
●Pharmaceutical companies: Expressed concern over the reproducibility crisis as it limited their ability to build on fundamental research from experimental labs.
●P-hacking: Analyzing data in multiple ways, or collecting data on many variables and reporting only those yielding statistically significant results (p < 0.05), distorting the literature.
●Low statistical power: Conducting studies with sample sizes too small to reliably detect genuine effects, leading to missed findings or erroneous conclusions.
●Publication bias: The tendency to publish positive, exciting results more readily than null or negative results, distorting the literature.
●HARKing: Hypothesizing After the Results are Known; analyzing data, finding an interesting result, and then presenting it as if it were the original hypothesis.
●Registered Reports: A publication model where studies are reviewed and accepted in principle based on their introduction and methods before data collection, mitigating issues like p-hacking and publication bias.
●Statistical significance: The conventional threshold (p < 0.05) for deciding whether a result reflects a real effect rather than chance; widely used and often misunderstood.