About a week ago, I happened upon a number of stories about a study and project that demonstrate a key difference between science and pseudoscience. They had titles like, “Rigorous replication effort succeeds for just two of five cancer papers” (Science), “Cancer reproducibility project releases first results: An open-science effort to replicate dozens of cancer-biology studies is off to a confusing start” (Nature), and “What Does It Mean When Cancer Findings Can’t Be Reproduced?” (NPR). Basically, these stories all reference a review of the initial results of the Reproducibility Project in Cancer Biology. The studies are summed up in an overview by Brian A. Nosek and Timothy M. Errington (“Making sense of replications”) and an editorial published by eLife, the open-access journal reporting the results of various reproducibility projects (“Reproducibility in cancer biology: The challenges of replication”). After all the politically-charged topics I’ve been dealing with the last week, a post purely about science is just the break I need. So let’s dig in, starting with some background, noting that only two of the five papers could be rigorously replicated.
Reproducibility: One cornerstone of science
Reproducibility is key to science. If science is the best method that we have of figuring out how nature works, if our hypotheses and theories are to have any basis in reality, then the observations upon which those hypotheses and theories are based must be reproducible. To the average lay person without a background in science, this doesn’t sound like a particularly difficult issue. After an interesting scientific paper is published, why can’t other scientists just do what the scientists publishing the paper did? However, as any scientist knows, particularly biological scientists, it’s nowhere near that simple. First, there is little or no reward for just reproducing the work of other scientists. Certainly, a scientist is not going to get a grant to reproduce those results, and publications reporting reproduced results will not be published in high impact journals. As the Reproducibility Project: Cancer Biology puts it:
Despite being a defining feature of science, reproducibility is more an assumption than a practice in the present scientific ecosystem (Collins, 1985; Schmidt, 2009). Incentives for scientific achievement prioritize innovation over replication (Alberts et al., 2014; Nosek, et al., 2012). Peer review tends to favor manuscripts that contain new findings over those that improve our understanding of a previously published finding. Moreover, careers are made by producing exciting new results at the frontiers of knowledge, not by verifying prior discoveries.
Which is, of course, true. Scientists go into science in the first place to make new discoveries, and translational scientists go into cancer research to discover new understandings of what causes cancer and how to use those new understandings to find new and innovative treatments for cancer.
Usually, one of the only times it’s deemed worthwhile to reproduce another scientist’s results is as the first step to trying to expand on the observations of that scientist, and that in fact is probably how most scientific research is replicated when it is replicated. Basically, you have to know that you’re doing things the same way and getting the same results using the same materials and methods before you can build on those results. Even so, such replications are usually not direct or complete replications; usually scientists only replicate as little as they need to assure themselves they’re on the right track. Complete sets of experiments are rarely replicated, and the more expensive and time-consuming the experiment, the less frequently it is replicated.
Another aspect of reproducibility is how well scientists record their methods in scientific papers; i.e., the transparency of science. The standard should be to record the methods in sufficient detail that a scientist knowledgeable in the field could replicate the experiments using the published description alone, but that standard is rarely met. If you read a number of scientific papers, you will find that there is huge variability in the amount of detail provided in their Methods sections. For some journals, like Cell, the amount of detail is pretty high, although often still not high enough to easily reproduce an experiment. For other journals (like, ironically enough, very high impact journals like Science and Nature), the level of detail can be frustratingly low. For most journals, it’s somewhere in between. I, like any other scientist, know from personal experience, particularly during graduate school and my PhD studies, just how difficult it can be to look at the Methods section of a paper and figure out how to replicate an experiment as the first step towards asking additional questions. Not uncommonly, it was necessary to contact the lab that published the work I was trying to replicate. Sometimes we needed their reagents, such as plasmids or other recombinant DNA constructs. Sometimes we needed help troubleshooting when we didn’t get the same results.
Again, as the Reproducibility Project: Cancer Biology puts it:
Reproducing prior results is challenging because of insufficient, incomplete, or inaccurate reporting of methodologies (Hess, 2011; Prinz et al., 2011; Steward et al., 2012; Hackam and Redelmeier, 2006; Landis et al., 2011). Further, a lack of information about research resources makes it difficult or impossible to determine what was used in a published study (Vasilevsky et al., 2013). These challenges are compounded by the lack of funding support available from agencies and foundations to support replication research. When replications are performed, they are rarely published (Collins, 1985; Schmidt, 2009). A literature review in psychological science, for example, estimated that 0.15% of the published results were direct replications of prior published results (Makel et al., 2012). Finally, reproducing analyses with prior data is difficult because researchers are often reluctant to share data, even when required by funding bodies or scientific societies (Wicherts et al., 2006), and because data loss increases rapidly with time after publication (Vines et al., 2014).
Finally, although not really discussed that much, there are intangible reasons—or seemingly intangible reasons—why it can be difficult to reproduce research. Some experimental techniques, for example, require considerable skill to produce meaningful measurements. Immunofluorescence, for instance, is one, particularly when using multiple antibodies to label different proteins with different fluorescent colors. Techniques that depend on surgical skill on small animals (e.g., mice and other rodents) are another. I’ve known a few scientists over the years who suddenly had trouble reproducing their own work when a skilled technician or postdoc left the lab. The explanation was not fraud but rather because the remaining personnel didn’t know all the ins and outs of the experimental technique. It’s not uncommon for a lot of time to be wasted due to loss of skilled personnel as those left behind troubleshoot and figure out subtleties of an experimental technique that aren’t recorded in their lab protocol books, no matter how detailed. Basically, the “institutional” memory of a laboratory is difficult to maintain, given that, other than the principal investigator and (sometimes) a permanent technician and/or lab manager, most personnel in labs are only there for at most a few years to get their PhD or do a postdoctoral fellowship. Turnover is high by design. Often there are little “tricks” or nuances to various experimental techniques to get them to work well that are lost when someone leaves a lab. That’s why maintaining protocol notebooks is so important, but few labs do this as rigorously as they should, and even detailed protocol books aren’t always enough.
Is there a “reproducibility crisis”?
Part of the impetus to form the Reproducibility Project, a collaboration between Science Exchange and the Center for Open Science, was based on the perception that there is a “crisis” in reproducibility. Although I know there were papers and commentaries dating long before that, the first commentary that brought this topic to the public consciousness in a big way was written by C. Glenn Begley, a consultant for Amgen, and Lee M. Ellis, a cancer surgeon at the University of Texas M.D. Anderson Cancer Center, that concluded that 47 out of 53 “landmark” preclinical studies in cancer (i.e., basic science studies in cancer) couldn’t be replicated by Amgen sufficiently rigorously to proceed with using the results to design drugs to target the interventions. As I pointed out at the time, that was a very high bar for any finding in science, given that not all discoveries of molecular targets or mechanisms would necessarily be druggable or suitable for therapy. I also noted that the papers were from high impact journals which are known for publishing only the most “cutting edge” science, which tends to be the kind of science whose findings are most often later overturned or found to be incorrect.
Of course, there are other studies and other indications. Just last year, Nature published a survey that found that more than 70% of scientists had failed to reproduce another scientist’s experiment and that 50% had even failed to reproduce their own. Some 52% of the scientists surveyed thought that there was a reproducibility “crisis.” I agree that there is a problem, but I don’t believe it is a “crisis.”
Reproducibility: A personal anecdote
To illustrate the complexities of “reproducibility,” I often recount an incident from my early scientific career as a surgical oncology fellow working in a radiation oncology laboratory in the late 1990s. At the time, Dr. Judah Folkman had recently published papers describing the angiogenesis inhibitors angiostatin and endostatin and how strikingly they shrank tumors down to the point where they became dormant as a small clump of cells that didn’t grow. There were a lot of exaggerated headlines at the time along the lines of “Is this the cure for cancer?” (I shudder to think what reporting would have been like if Facebook, Twitter, and the like had existed at the time.) Angiogenesis inhibitors block the formation of blood vessels by blocking the action of factors secreted by the tumor to induce the growth of new blood vessels to feed its growth and thereby hijack the normal physiologic process of angiogenesis. Our laboratory wanted to combine angiostatin and radiation therapy in an animal model to see if the effects were additive or synergistic.
Our results were ultimately published in Nature, the only Nature paper on my CV, but the path to these results was not straight. It was widely known through the grapevine at the time that other laboratories were having difficulty reproducing Folkman’s striking results. In our case, we were not observing nearly as potent an antitumor effect as Folkman had described with angiostatin in our angiostatin alone group, which we wanted to compare with a group of mice treated with both angiostatin and radiation therapy. We wondered if it was something to do with the angiostatin itself, which was being made in bacteria from a plasmid by our collaborators. Given that Folkman was one of the best scientists I ever met, none of us doubted his results and assumed that it must be something we were doing.
It actually was. We contacted Folkman, who provided reagents, protocols, and advice, as well as some angiostatin made in his laboratory. It turns out that the peptide we were making was easily denatured (unfolded), which was why it was not as potent as Folkman had reported. Now here’s why I say we couldn’t replicate his results. It’s because we couldn’t fully replicate his results. Our angiostatin inhibited the growth of a wide variety of tumors, but, even after applying the tweaks to our angiostatin production suggested by Folkman, in our hands angiostatin never inhibited tumor growth as potently as Folkman had reported. So in other words, there could easily have been something else going on that we never figured out. Be that as it may, Folkman had the best attitude I’ve ever seen in a scientist regarding reproducibility, as we learned later when we heard of how he had done the same thing for several other labs, even to the point of dispatching one of his postdocs to help other investigators to get angiostatin and endostatin to work. Still, few investigators could ever quite replicate Folkman’s initial results, although many, including our lab, demonstrated that angiostatin and endostatin were potent angiogenesis inhibitors.
So why do I repeat this anecdote almost every time discussions of scientific reproducibility come up? Simple. It’s to illustrate that reproducibility falls on a spectrum. Did we fail to reproduce Folkman’s results? Yes and no. Yes, we reproduced the key result that angiostatin inhibits tumor growth by blocking angiogenesis, but, no, we didn’t reproduce the same very powerful effect size reported by Folkman. The point is that replication of any given scientific finding can range from total failure to replicate (e.g., if we had failed to show any antitumor effect of angiostatin at all) to partial failure to replicate (e.g., what actually happened) to success at replication (e.g., if we had shown angiostatin to block tumor growth in the angiostatin alone group as powerfully as Folkman had). It all becomes even more confusing when we consider that there is no standard definition of or clear consensus over what does and doesn’t constitute scientific reproducibility.
One of the strengths of the initial results of the Reproducibility Project: Cancer Biology is that this messiness is appreciated and explained.
Reproducibility in Cancer Biology Research: The setup
Before I discuss the results, I should explain a bit more about what the Reproducibility Project: Cancer Biology is and does. In brief, the Reproducibility Project: Cancer Biology, which was founded in 2014, established a core team to design, prepare, and monitor project operations dedicated to testing the reproducibility of results reported in 50 high-impact papers published between 2010 and 2012. The plan was to replicate a subset of experimental results from each article. For each chosen paper a Registered Report detailing the proposed experimental designs and protocols for each subset of experiments to be replicated was to be peer reviewed and published prior to data collection. Following completion of data collection, results were to be published as a Replication Study. The report that made the news last week represents the results of Replication Studies for the first five papers chosen.
Here are the five papers, with links to the Registered Report and the Replication Study for each:
1. BET Bromodomain Inhibition as a Therapeutic Strategy to Target c-Myc (Registered Report; Replication Study).
2. Coadministration of a Tumor-Penetrating Peptide Enhances the Efficacy of Cancer Drugs (Registered Report; Replication Study).
3. Discovery and Preclinical Validation of Drug Indications Using Compendia of Public Gene Expression Data (Registered Report; Replication Study).
4. The CD47-signal regulatory protein alpha (SIRPa) interaction is a therapeutic target for human solid tumors (Registered Report; Replication Study).
5. Melanoma genome sequencing reveals frequent PREX2 mutations (Registered Report; Replication Study).
There’s your reading assignment. No, just kidding. I only provide the links to make it easier for interested readers to check out the studies and replication studies if they are so inclined. Also, it’s easier to refer to each study by number. But how were these studies chosen? That’s a fair question.
The sampling frame was defined as the 400 most cited papers from both Scopus and Web of Science using the search terms (cancer, onco*, tumor*, metasta*, neoplas*, malignan*, carcino*) for 2010, 2011, and 2012. Citations were counted from all sources, which include primary research articles and reviews. This produced an initial sample of 501 articles from 2010, 444 from 2011, and 438 from 2012. Altmetrics scores from Mendeley and Altmetric.com were collected for the entire dataset and used to create a final impact score for each paper. Citation rates and altmetric scores were each standardized by dividing each metric by the highest in the dataset to give each paper a normalized metric score between 0 and 1, which was summed to create an aggregate impact score. Within each year, articles were reviewed for inclusion eligibility starting with the highest aggregate impact article. Articles were removed if they were clinical trials, case studies, reviews, or if they required specialized samples, techniques, or equipment that would be difficult or impossible to obtain. Also, articles reporting sequencing results, such as publications from The Cancer Genome Atlas project, were excluded. However, if sequencing or proteomic experiments were only part of an article, the other experiments in those papers could still be eligible. Review of articles continued until a total of 50 articles, about one-third from each year, were identified as eligible. The final set included 17 papers from 2010, 17 from 2011, and 16 from 2012. From each paper, a subset of experiments were identified for replication, prioritizing those that support the main conclusions of the paper while also attending to feasibility and resource constraints.
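To make the scoring step in that description concrete, here is a minimal sketch of how an aggregate impact score of that kind could be computed. It is not the project's actual code: the `Paper` class, the function names, and the numbers are all made up for illustration, and it collapses the several citation and altmetric sources into one value of each per paper. It also assumes at least one paper has a nonzero value for each metric.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    title: str        # hypothetical label, not a real paper
    citations: float  # single citation count (an assumption; the project drew on two databases)
    altmetric: float  # single altmetric score (an assumption; the project drew on two sources)


def aggregate_impact(papers):
    """Divide each metric by the highest value in the dataset, then sum the two."""
    max_citations = max(p.citations for p in papers)
    max_altmetric = max(p.altmetric for p in papers)
    return {
        p.title: p.citations / max_citations + p.altmetric / max_altmetric
        for p in papers
    }


def rank_by_impact(papers):
    """Return titles ordered from highest to lowest aggregate impact score."""
    scores = aggregate_impact(papers)
    return sorted(scores, key=scores.get, reverse=True)


# Made-up numbers, purely for illustration.
papers = [Paper("A", 400, 50), Paper("B", 300, 100), Paper("C", 100, 10)]
scores = aggregate_impact(papers)
ranking = rank_by_impact(papers)
```

Note that in this toy example the most cited paper (A) does not come out on top, because B's altmetric score carries as much weight as A's citation count. That is the practical consequence of summing two metrics that have each been scaled to a maximum of 1.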
So what we are seeing is just one-tenth of the original plan. Actually, it’s more than that, because in late 2015, the Reproducibility Project: Cancer Biology announced that it had to cut back its ambitions and now plans to do only 37 papers, largely because of budgetary constraints. As a bit of a reality check for reproducibility efforts, I note that the project had originally budgeted around $25,000 to $35,000 for each experiment, but this figure turned out to be too low. Thanks to time-consuming peer-reviews, material transfer agreements, and expensive animal experiments, the team came up with a new estimated cost of $40,000 per experiment on average. This is, of course, one reason why replications are not done nearly as often as they should be. In fact, only 29 will now be done thanks to both budgetary problems and the difficulties investigators had obtaining information and materials.
I also note an inherent bias in the choice of papers. Highly cited reports, or “high impact” reports, tend to be the most novel, interesting, or even controversial (i.e., again, “cutting edge”), which also means that, again, they are “frontier science,” whose results are more frequently overturned.
Before I discuss the actual results, here’s another wrinkle, thrown in to emphasize the complexity involved in these replication studies. I can’t help but briefly quote Nosek and Errington themselves:
There is no such thing as exact replication because there are always differences between the original study and the replication. These differences could be obvious (like the date, the location of the experiment, or the experimenters) or they could be more subtle (like small differences in reagents or the execution of experimental protocols). As a consequence, repeating the methodology does not mean an exact replication, but rather the repetition of what is presumed to matter for obtaining the original result.
Direct replication is defined as attempting to reproduce a previously observed result with a procedure that provides no a priori reason to expect a different outcome (Open Science Collaboration, 2015; Schmidt, 2009). In a direct replication, protocols from the original study are followed with different samples of the same or similar materials: as such, a direct replication reflects the current beliefs about what is needed to produce a finding. Conducting a direct replication tests those beliefs empirically. In a conceptual replication, on the other hand, a different methodology (such as a different experimental technique or a different model of a disease) is used to test the same hypothesis: as such, by employing multiple methodologies conceptual replications can provide evidence that enables researchers to converge on an explanation for a finding that is not dependent on any one methodology.
Most replication in science is conceptual replication.
Reproducibility in cancer research: The results
So here is how Nosek and Errington describe the results of the first five replication papers:
The first five Replication Studies have now been published. Two of the studies reproduced important parts of the original papers (Kandela et al., 2017; Aird et al., 2017), and one did not (Mantis et al., 2017). The other two Replication Studies were uninterpretable because the control tumors grew too quickly or too slowly (or exhibited spontaneous regressions) to reliably measure whether the experimental intervention had the predicted effect (Horrigan et al., 2017a; Horrigan et al., 2017b): however, in one of these two cases the original paper (Willingham et al., 2012) has led to clinical trials for anti-CD47 antibody therapy that will provide extensive additional data on the effectiveness of this approach. Three of the Replication Studies are also accompanied by Insight articles (Dang, 2017; Davis, 2017; Sun and Gao, 2017).
First, let’s look at the clear failure to replicate (#2). The original paper reported the effects of a tumor-penetrating peptide (short protein), iRGD peptide, which in the paper increased cellular uptake of the chemotherapy agent doxorubicin in a xenograft model of prostate cancer. A xenograft model involves injecting human tumor cells into immunosuppressed mice and measuring their growth and the ability of the intervention tested to inhibit or reverse that growth. Basically, the Replication Study failed to find statistically significant differences in the penetrance of doxorubicin into the tumor cells, the tumor weight for mice treated with DOX and iRGD compared to DOX alone, or a measure of programmed cell death (apoptosis).
Neither of the two studies reported to be replicated (#1 and #3) was exactly a resounding replication, either. For example, in #3, the results were mixed, as described in an accompanying commentary. In the original paper, the authors noted that cimetidine induced the death of human lung cancer cell line A549. So they tested three doses of the drug against A549 tumor xenografts (tumor cells implanted in mice) and noticed that cimetidine decreased the growth rate of these cells, an effect that was statistically significant and close to as strong as that of low-dose doxorubicin, a chemotherapy agent. In the Replication Study, cimetidine still resulted in decreases in tumor size, but they were not statistically significant when a Bonferroni correction for multiple comparisons was applied. However, a statistically significant effect was observed when the dataset from the Replication Study was combined with that from the original paper in a meta-analysis. What stood out to me was that the doxorubicin also failed to produce a statistically significant result in the Replication Study. Doxorubicin is a powerful chemotherapeutic agent; so I have to ask what was going on in the Replication Study. Be that as it may, as noted in the commentary, there can be many factors that influence the robustness of the xenograft models used in these experiments, including “batch effects on the efficacy of the drugs used; changes in the properties of cell lines over time; the strains of the mice used, and also their sex; factors related to microbiome and chow; circadian effects; temperature; and the antimicrobials that might be used in certain facilities.”
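For readers unfamiliar with the Bonferroni correction mentioned above, here is a minimal sketch of how it works: when several comparisons are made, each p-value is held to a threshold of alpha divided by the number of comparisons rather than to alpha itself. The function name and the p-values are hypothetical and are not taken from either the original paper or the Replication Study.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (p, significant) pairs using the Bonferroni-adjusted threshold alpha / m."""
    m = len(p_values)
    threshold = alpha / m  # stricter per-comparison cutoff
    return [(p, p <= threshold) for p in p_values]


# Hypothetical p-values for three comparisons; NOT data from either study.
p_values = [0.02, 0.04, 0.30]

# Without correction, each comparison is judged against alpha = 0.05.
uncorrected = [p <= 0.05 for p in p_values]

# With correction, each comparison is judged against 0.05 / 3.
corrected = [significant for _, significant in bonferroni(p_values)]
```

In this toy case, two results that look significant on their own no longer clear the bar once the correction is applied. That is how a decrease in tumor size can be visible in the data and still not count as statistically significant.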
The second study (#1) was also somewhat mixed. As noted in an accompanying commentary, the treatment tested in mice did work in decreasing the level of c-Myc (the gene targeted) in multiple myeloma cell lines and did increase the overall survival of the mice in the tumor model used, but the results as measured by bioluminescence were not statistically significant. It is speculated that it was because many of the control mice had to be euthanized early before their pre-specified endpoint because of disease progression and high tumor burden.
In fact, anyone who’s ever done tumor xenograft experiments in mice has encountered the problem of tumors either growing so fast that the mice have to be euthanized prior to the planned end of the experiment or not growing at all. Certainly I have. Indeed, that is the reason that the other two studies were uninterpretable. In #4, several tumors exhibited spontaneous regression, and in #5 melanoma xenografts in the control group grew much faster than they did in the original study, which made the accelerated tumor growth due to mutations in the PREX2 gene very difficult to detect compared to the original study. Even the author of the accompanying commentary seemed puzzled:
This Replication Study represents a cautionary tale concerning the impact of biological variability on experimental design. While strenuous efforts were made to precisely copy the experimental conditions employed in the original study, the xenografts in the Replication Study behaved in a fundamentally different way to those in the original study. The mechanistic basis for the observed differences is unclear. Presumably, there was a difference in the melanoma cells and/or the mice. Although the cells were obtained from the same source, small differences in culture conditions or passage history could have contributed to differences between the studies. Similarly, although the mice were obtained from the same source, housing the animals in a different facility may have contributed to differences between the studies.
My guess as someone who’s done quite a few xenograft experiments in a career dating back to the 1990s is that, for whatever reason, in the Replication Study the number of tumor cells injected was too high for the mice and conditions. There is a lot of biological variability in the behavior of tumors and a lot of factors that can affect their growth that might not be obvious. That’s why, if you’re doing experiments in a different institution, doing preliminary dose-response experiments to determine the optimal number of cells to inject is highly advisable. Also, what all of the Replication Studies suggest, whether they replicated key parts of the original studies or not, is that many of the animal experiments reported in the literature are underpowered to test the hypotheses under consideration.
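To give a rough feel for what “underpowered” means, here is a back-of-the-envelope sketch of statistical power for a simple two-group comparison. It uses a normal approximation rather than a proper t-test calculation and ignores the negligible opposite tail, and the effect size and group sizes are hypothetical values chosen for illustration, not numbers from any of the studies discussed here.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison.

    effect_size is the difference in group means divided by the common
    standard deviation. Normal approximation only; a sketch, not a
    substitute for a real power analysis.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    return z.cdf(effect_size * sqrt(n_per_group / 2) - z_crit)


# Hypothetical scenario: a large effect (one standard deviation) and
# small groups of the size often used in mouse experiments.
for n in (5, 10, 16):
    print(n, round(approx_power(1.0, n), 2))
```

Under these assumptions, even a large effect is more likely to be missed than detected with five animals per group, and the conventional 80% power target is only reached with substantially larger groups. Add the kind of biological variability described above and the odds get worse.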
Unsurprisingly, some scientists are not exactly fans of the Reproducibility Project. For example, Robert Weinberg, one of whose studies is the subject of a replication currently on hold, isn’t thrilled, having said, “It’s a naïveté that by simply embracing this ethic, which sounds eminently reasonable, that one can clean out the Augean stables of science.” Maybe, but he doesn’t say exactly what’s wrong with this ethic either.
Other scientists react this way:
This past January, the cancer reproducibility project published its protocol for replicating the experiments, and the waiting began for [Richard] Young to see whether his work will hold up in their hands. He says that if the project does match his results, it will be unsurprising —the paper’s findings have already been reproduced. If it doesn’t, a lack of expertise in the replicating lab may be responsible. Either way, the project seems a waste of time, Young says. “I am a huge fan of reproducibility. But this mechanism is not the way to test it.”
Almost every scientist targeted by the project who spoke with Science agrees that studies in cancer biology, as in many other fields, too often turn out to be irreproducible, for reasons such as problematic reagents and the fickleness of biological systems. But few feel comfortable with this particular effort, which plans to announce its findings in coming months. Their reactions range from annoyance to anxiety to outrage. “It’s an admirable, ambitious effort. I like the concept,” says cancer geneticist Todd Golub of the Broad Institute in Cambridge, who has a paper on the group’s list. But he is “concerned about a single group using scientists without deep expertise to reproduce decades of complicated, nuanced experiments.”
This is not an unreasonable concern. Nor is that of Erkki Ruoslahti, author of the one study that clearly failed to replicate:
Ruoslahti, a cancer biologist at the Sanford Burnham Prebys Medical Discovery Institute in La Jolla, California, disputes the verdict on his research. After all, at least ten laboratories in the United States, Europe, China, South Korea and Japan have validated the 2010 paper1 in which he first reported the value of the drug, a peptide designed to penetrate tumours and enhance the cancer-killing power of other chemotherapy agents. “Have three generations of postdocs in my lab fooled themselves, and all these other people done the same? I have a hard time believing that,” he says.
This is, of course, an argument in favor of conceptual replication. Yes, it is self-serving, but that doesn’t mean it’s not a valid argument.
Is there a crisis in cancer biology research reproducibility?
Traditionally, the way science is replicated is not usually through the direct replication of key experiments in papers, as the Reproducibility Project: Cancer Biology is doing. It is usually through other laboratories doing what Ruoslahti says and testing the same hypothesis using different methods and taking the next steps. Unsurprisingly, Ruoslahti is concerned that the recently published report will harm his ability to raise capital for DrugCendR, a company in La Jolla that he founded to develop his therapy. Is that fair? It might not be if his results have indeed been replicated by other groups. On the other hand, to me all science is fair game for attempts at replication.
Tim Errington, manager of the Reproducibility Project, emphasizes that a single failure to replicate is not proof that the initial findings were wrong and shouldn’t put a stain on individual papers, but surely he must know that scientists will interpret a failure to replicate in just that manner. After all, scientists have pride and ego as well. Not surprisingly, many are alarmed and defensive when informed that another scientist failed to replicate their findings. I’m sure even Judah Folkman was disturbed by the news that scientists were having a hard time replicating his results with angiostatin and endostatin. That’s where scientists need to strive to be more like Folkman. Most at least try to be, but far too many have a hard time separating their ego from their work.
Overall, I tend to look at the results of these five Replication Studies as two out of three being replicated, with the other two not counting because of flukes that introduced problems not seen in the original paper, meaning:
Such conflicts mean that the replication efforts are not very informative, says Levi Garraway, a cancer biologist at the Dana-Farber Cancer Institute in Boston, Massachusetts. “You can’t distinguish between a trivial reason for a result versus a profound result,” he says. In his study, which identified mutations that accelerate cancer formation, cells that did not carry the mutations grew much faster in the replication effort — perhaps because of changes in cell culture. This meant that the replication couldn’t be compared to the original.
Even so, whether one takes the optimistic interpretation, that more than 33% of studies are likely to be difficult to replicate, or the more conventional interpretation, that 60% of the studies were not replicated, with the remaining 40% having some problems, I’d conclude that there is definitely a problem. However, one of the greatest strengths of science (and science-based medicine) is that it is self-correcting. The process is, of course, messy and slow, but it is ongoing. Also remember that this is a small sample, only five studies, all of which were of the type considered “cutting edge” and therefore less “safe” than the average study. I look at research like the Reproducibility Project as the means to identify the problem. What we next need to do is to figure out what causes the problem and to focus on solutions. I’m not convinced that replicability problems in preclinical research are the major reason why so many drugs that make it to clinical trials fail to be approved by the FDA, given how much conceptual replication goes on before a drug ever makes it to clinical trials in the first place, but certainly it couldn’t hurt.
The NIH agrees. It has recently released guidelines meant to improve the reproducibility of cancer research, which recommend that journals ask for more thorough methods sections and more sharing of data, and the Reproducibility Project has produced a wiki describing how they went about their work and describing the changes they would like to see. Change is coming, and it appears to be for the better.
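To put the small-sample caveat above into rough numbers, here is a minimal sketch in Python. It is my own illustration, not anything the Reproducibility Project computed. It assumes each replication can be treated as an independent success-or-failure trial, and it uses the Wilson score interval, a standard approximation for a proportion estimated from few observations. The function name `wilson_interval` is made up for this example.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion.

    Illustrative helper only; z=1.96 corresponds to ~95% confidence.
    """
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# "Conventional" reading: 2 of 5 studies replicated.
low_5, high_5 = wilson_interval(2, 5)
# "Optimistic" reading: 2 of the 3 interpretable studies replicated.
low_3, high_3 = wilson_interval(2, 3)

print(f"2 of 5: {low_5:.2f} to {high_5:.2f}")
print(f"2 of 3: {low_3:.2f} to {high_3:.2f}")
```

Under these assumptions, the interval for two out of five runs from roughly 12% to 77%, which is another way of saying that five studies cannot pin down the true replication rate very tightly.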
27 replies on “Is there a reproducibility “crisis” in biomedical research? (2017 edition)”
Thanks for bringing this up! This is a huge problem with biological and pharmacological studies.
The possibilities for fraud are manifold. Once you know that the nature of your experiment isn’t expected to be reproducible, then you don’t really have to conform to any standards at all. Your work is largely uncheckable!
I don’t think you can have fraud in pure chemistry. Everything is entirely reproducible, even on a limited budget. You would be exposed within three years for any serious errors you’ve made.
Not so in biology! You always have plausible deniability. You can just play the “statistical outlier” card or the “idiosyncratic initial conditions” card or some other card.
Nice article. It’s nice to get a break from the vaccine business.
I would love a little more science, and a little less politics here.
As I have said before, biomedical research is much harder than physics research. You have a lot more variables that you can’t always fully control, and as you note with your efforts at reproducing Folkman’s results, details of how samples are produced can matter. Furthermore, you can rarely get enough subjects in your study to demand a more stringent significance test than p < 0.05, which means that inevitably there will be statistically spurious results. The only way to distinguish between genuine and spurious results is to get more data, i.e., attempt to reproduce the experiment.
There are situations in physics where details of sample preparation matter, but those physicists have the tools to figure out when that’s an issue. And sometimes a dodgy cable connection can throw off your results. But again, you can look closely at your experiment and find what’s wrong.
Well, Iggy, when politics tries to deny science, I am damned glad that someone in science speaks up.
You can’t have been around for very long, because except for this last fortnight the politics is pretty rare. Which begs the question, what brought you here?
Thank you for this piece, Orac. This is completely outside my normal range and items like it are one of the reasons I keep coming back.
Rich, it is just another Fendelsworth sockpuppet, as is the first comment above. Orac deleted three of them this morning.
The possibilities for fraud are manifold.
Was this intentional?
Some chemistry experiments are highly reproducible, others less so. Catalyzed reactions in organic chemistry are frequently technique-sensitive (there’s a reason that a journal called “Organic Syntheses” exists, to provide reproducible/reproduced methods of synthesis). But chemistry generally pales in comparison with biology – bring a live anything into the mix (especially an animal) and you add complexity and technique sensitivity up the wazoo.
Erm, biology is, at its very heart, pure chemistry. It’s just that proteins don’t always fold the way you want them to, peptides can denature very easily (with some so delicate that a stern glare may denature them), etc.
DNA replication, is that or is that not pure chemistry? Except when something crosslinks, gets duplicated or deleted.
It’s pure, bloody damnably complex chemistry, where anything from Brownian motion, through temperature, pH, through a miss by a shepherding protein can cause an error.
Gilbert @5, I remember that one well. The team itself stated outright that they didn’t trust the result and were examining their equipment in detail to see how they arrived at such an anomalous reading. Considering the complexity of the entire apparatus and ancillary equipment, I was surprised how quickly they found that dodgy fiber optic connection.
Eric Lund @3, I dunno. When one gets into the bleeding edge of the quantum mechanical realm, one’s experimental setup can become quite critical. Closely measuring the Planck distance or Planck time comes to mind, or the erroneous OPERA results, which the team itself didn’t trust and which, for a while, left them scratching their heads as they went over every component and subassembly.
Last I heard, the Planck scale is still 15-20 orders of magnitude beyond what our present experiments are capable of. The Planck length is of order 10^-35 meters, compared to a radius of order 10^-15 m for a nucleon.
But yes, details of your experimental setup do matter. Not just in quantum systems, either; my work is purely classical physics, but one of the things to watch out for in experimental design is logic race conditions. The speed of light in vacuum is about one foot per nanosecond (and in a medium it can be significantly slower). There are situations when this matters, like the OPERA anomalous result, where the bad connection produced enough of a signal delay to make the neutrinos appear to travel faster than light.
Of course in physics, when you get a result that is so contrary to the basic theory underlying your work, you know that you have to check every part of the apparatus and make sure that it works properly. And you do this because you can. In biomedical work, you often don’t have a deep underlying theory to work from, let alone one as well worked out as quantum electrodynamics. So it’s easy to overlook something in your setup. That’s honest error, not fraud, but it still holds back progress when other labs can’t reproduce your results.
@Eric Lund @12, I never did read the detailed OPERA failure analysis, but I’d not be surprised if it was short duration noise letting a Schmitt trigger toggle. I’ve experienced that failure mode many times.
Don’t even get me into logic race conditions, those are one of the banes of my existence in information assurance.
Still, if one gets results that are far beyond expectation or theory, one has either discovered a fundamental law of the universe or far more likely, something is wrong with the experiment.
Still, a lot easier in physics, such as unanticipated ringing caused HF feedback into a circuit at 10x the calculated fundamental frequency, which caused the output to behave like a voltage tripler. (Yeah, I spent a day calculating the effects of the breadboard I was using and the observed frequency, then working back what happened in that circuit.)
No, biology is not chemistry. Too many people confuse biology with biochemistry, this is a real problem.
And to illustrate, and to come back to the reproducibility problem, have a look at this:
Most medical students know that insulin is released from granules contained in pancreatic beta cells in response to hyperglycemia. Not so Nature editors, nor Nature reviewers. It took 9 years before it was realized that the results could not be reproduced. However, a limited knowledge of physiology was enough to understand that the results were impossible. Should we reproduce every experiment we see to compensate for the problems of peer review? It is not necessary to put 10 figures with the latest technology to be true, just one experiment with appropriate controls, testing a plausible hypothesis.
Actually, your insulin example rather supports my point that biology is, at its lowest level, chemistry. With a counterexample of Insulin glargine and its mechanism of action.*
Higher level systems, yeah, not so much biochemistry unless one is trying to understand systemic interactions and control loops.
Of course, that’d be physiology. 😉
*I’m also making a joke, biochemists vs biologists. You should see me around taxonomists (is it Pan sapiens or Homo troglodytes? Scamper aside to watch the riot ensue).
@Eric Lund #3
As I have said before, biomedical research is much harder than physics research.
Amen to that!
My favorite story along these lines comes from chemistry, in which the conditions are much cleaner than usually found in medicine or biochem.
A post-doc blundered into a completely unprecedented class of reactions – his technique was superb, and in working up the mess left behind from a failed synthesis he discovered a molecule that was not supposed to be there.
The group immediately leaped upon this new project, generating numerous examples of related reactions. They were easy and reliable, once you knew enough to try.
This constituted the “replication” of the original synthesis, rather than a literal repetition of the original reaction.
Which is important, because the post-doc himself was not able to make it go in subsequent attempts. To my personal knowledge, no one has succeeded on the particular starting material – even as the class as a whole remains valid and useful. Quite a mystery, that.
Years ago when I worked in an academic lab we had a collaboration with another lab on the other side of the country. For one series of experiments on the immune system in mice the other lab actually flew one of their post-docs out to do the immunization (the physical injections) because their lab felt that you had to do the injection *just so* or the study wouldn’t work.
Even my boss thought it was weird, but the other PI was a big deal so we just rolled with it.
You want reproducibility issues, ugh, work with human samples some time. What a nightmare.
Is I believe what Wzrd1 was getting at.
…a post purely about science is just the break I need.
Outstanding work, it’s a great read for a few highly educated people I’m sure.
In my opinion, when you write like this most of the minions, including myself, scatter like cockroaches when the lights are turned on.
I understand why you decided to quickly follow up this science-based masterpiece with a Holocaust story.
The Pied Piper of ScienceBlogs (aka. Orac) needs to keep the lights off if he’s going to maintain a significant and consistent following.
MJD, why do you feel the need to imply that everyone here is stupid?
Yes, on posts on more ‘scientific’ topics there are fewer comments, but that doesn’t mean people aren’t reading and understanding the post. Often it just means that they don’t have anything they want to contribute to the conversation.
Whereas you go out of your way to insult everyone and our host. What does that contribute to the conversation?
Good point, I’ll mellow out.
I’m completing ~1,400 references for a book and my mind is Jello®.
You missed my point: in reducing biology to biochemistry you lose the important information and consider irrelevant details, as if you wanted to understand a game of chess by knowing the wood the pieces are made of.
Daniel, as JustaTech said…
Thanks for such an insightful article!