Can psychology research be trusted?

Can we rely on results published by researchers in the field of psychology? How reproducible are the results? Turlough Heffernan describes the Reproducibility Project and its findings.

Can psychology research be trusted? That was the question that Brian Nosek and his team at the University of Virginia set out to answer when they established their Reproducibility Project. As the name suggests, this was an attempt to determine the extent to which major findings in the field of psychology could be replicated by independent researchers.

In total, the participants in the project looked at 100 findings drawn from three of the most prominent psychology journals. They discovered that only 39% of the replication attempts were successful. In other words, in over 60% of cases the results of the independent researchers did not match the results obtained by the original authors.

It might seem strange that over 270 researchers from five continents would devote their time and money to repeating studies that had already been done. After all, there are no prizes for doing something second. However, this focus on reproducibility is critical to the scientific method. It is not enough to trust a researcher’s claims based on their past successes. Instead, their results must be shown to hold true when the experiment is conducted by someone else. Indeed, a scientifically true effect was defined by the philosopher Karl Popper as that “which can be regularly reproduced by anyone who carries out the appropriate experiment in the way prescribed”.

Before the Reproducibility Project even began, there were plenty of reasons to be sceptical about the reliability of many findings in the field of psychology. For instance, a psychologist named Daryl Bem decided on a whim to conduct a scientific study of parapsychology, i.e. whether or not people have psychic powers. To the surprise of everyone, including himself, the results were positive! As one would hope, numerous other scientists were dubious about these findings and decided to try to replicate Bem’s experiments. It shouldn’t come as a surprise to learn that their replication attempts failed. Unfortunately, however, these researchers found it much more difficult to get their papers into a prestigious journal than Bem had. This is a common problem in science known as publication bias. Journals want to publish sexy, counter‐intuitive discoveries but are nowhere near as interested in studies with negative findings, particularly if they are replications. In fact, only 14% of published papers actually report negative findings.

We should note that publication bias and irreproducibility are issues in every discipline. In fact, when the drug company Amgen tried to replicate 53 cancer research studies, a mere six of their attempts were successful. Researchers in all areas are faced with perverse incentives that reward them for findings that are novel and eye‐catching, not necessarily those that are true. A good publication record is an absolute necessity for moving up the job ladder in academia, so it is easy to see why scientists find themselves under pressure to get positive results. These people aren’t frauds; they’re simply responding to the incentives placed in front of them.

Having said that, there is some evidence that suggests the problem is worse in psychology (or perhaps that psychologists are more honest about their field). A previous study of over 2,000 psychologists found that more than half were willing to admit that they had checked whether their results were statistically significant before deciding whether or not to collect more data. In other words, if they had already obtained the desired result then they would end the study early rather than risk ruining their chance of getting published. This might seem like a trivial issue, but when everybody does it the scientific literature becomes inundated with papers reporting findings that aren’t legitimate! This is what inspired John Ioannidis, a professor at Stanford University, to write his now classic 2005 paper “Why Most Published Research Findings Are False”.
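To see why this kind of “peeking” is such a problem, consider a rough simulation (not part of any study mentioned above; it assumes Python with NumPy and SciPy, and all numbers are purely illustrative). Two groups are drawn from identical distributions, so there is no real effect to find, yet a researcher who repeatedly tests the data as it comes in and stops the moment p dips below 0.05 will declare “significance” far more often than the nominal 5% of the time.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def run_study(peek, batch=10, max_n=100):
    # Two groups drawn from the same distribution, so any "effect" is pure noise.
    a, b = [], []
    p = 1.0
    while len(a) < max_n:
        a.extend(rng.normal(0, 1, batch))
        b.extend(rng.normal(0, 1, batch))
        p = stats.ttest_ind(a, b).pvalue
        if peek and p < 0.05:
            return True  # stop early and declare the result "significant"
    return p < 0.05

n_sims = 2000
peeking = sum(run_study(peek=True) for _ in range(n_sims)) / n_sims
fixed_n = sum(run_study(peek=False) for _ in range(n_sims)) / n_sims
print(f"False positives with peeking: {peeking:.1%}")  # typically well above 5%
print(f"False positives at fixed n:   {fixed_n:.1%}")  # close to the nominal 5%

The exact figures depend on how often the researcher checks, but the direction is always the same: the more opportunities there are to stop on a lucky fluctuation, the more spurious findings make it into print.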

What can we conclude from the findings of the Reproducibility Project? While some commentators have seized on this study as evidence of science being broken, others point out that we may have simply underestimated how difficult it is to replicate all of the features of the original papers.

Furthermore, the failure of a given study to replicate doesn’t necessarily imply that the original study was flawed. The participants in the Reproducibility Project did their utmost to use the same methodologies as the original authors but even minuscule differences such as location and time can sometimes skew results in an outsized manner. The difference in results could also be attributable simply to chance, especially if the effect size was small to begin with.

Thankfully, most researchers accept that a problem exists and are participating in efforts to solve it. One solution that has been mooted is for papers to be pre‐registered with journals, meaning that scientists would publish the hypothesis they mean to test and the methodology they intend to use before the experiment even begins. This should prevent them from fiddling with their data until they get the results they want, a practice known as p‐hacking. Researchers are also being encouraged to conduct studies with larger sample sizes in order to reduce the risk of getting a positive result purely by chance.
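As a rough illustration of the kind of p-hacking that pre-registration is designed to rule out, the sketch below (again purely illustrative, assuming Python with NumPy and SciPy; the number of outcomes and sample sizes are made up) measures two identical groups on ten unrelated outcomes and counts a “success” whenever any one of them happens to cross the p < 0.05 line by chance.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def hacked_study(n_outcomes=10, n_per_group=30):
    # Two identical groups measured on many unrelated outcomes;
    # "succeed" if any single outcome crosses p < 0.05 purely by chance.
    for _ in range(n_outcomes):
        a = rng.normal(0, 1, n_per_group)
        b = rng.normal(0, 1, n_per_group)
        if stats.ttest_ind(a, b).pvalue < 0.05:
            return True
    return False

n_sims = 2000
rate = sum(hacked_study() for _ in range(n_sims)) / n_sims
print(f"Chance of at least one 'significant' outcome: {rate:.1%}")
# Roughly 1 - 0.95**10, i.e. about 40%, even though nothing real is going on.

Pre-registration closes this loophole by forcing the researcher to say in advance which outcome they will test; anything else is then clearly labelled as exploratory.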

We have already seen how scientific journals fall victim to publication bias but the mainstream media must also share a portion of the blame. Newspapers are only too happy to publish dubious claims (such as those recently made by Rosanna Davison about gluten being responsible for autism) but are far less likely to ever issue a retraction.

While most people know not to take tabloids seriously on these issues (the list of things that the Daily Mail claims can cause cancer is a running joke at this stage), broadsheets can perhaps do even more damage because of the trust readers place in them. For instance, a poorly conducted study about trauma being inherited by descendants of Holocaust survivors was reported uncritically last month by The Guardian and subsequently taken as fact by large swathes of its readership. Of course, it is probably unrealistic to expect a critical appraisal of a paper from journalists forced to work according to the rules of the 21st century news cycle.

In conclusion, it might seem like psychologists would be in despair after learning of the results of the Reproducibility Project but for many, this is a triumph. They point out that the entire process has been one of self‐reflection, of science holding up a mirror to itself and understanding that changes must be made. For them, Nosek’s study is a welcome return to what science is supposed to be all about: remaining sceptical, asking for evidence and accepting that the search for truth is a cumulative process. As Nosek said himself, “The goal is to get less wrong over time.”

Illustration: Natalia Duda