One of the most frequent complaints about evidence-based medicine (EBM), in contrast to science-based medicine (SBM), is its elevation of the randomized clinical trial as the be-all and end-all for clinical evidence for an intervention for a particular disease or condition. Unknown but enormous quantities of “digital ink” have been spilled explaining this distinction right here on this blog, and I like to refer to this aspect of EBM as “methodolatry,” a term I originally learned from another ScienceBlogs blogger (now moved on) and defined as profane worship of the randomized controlled clinical trial (RCT) as the only valid method of clinical investigation. The problem, of course, with methodolatry is that it completely ignores prior plausibility, and when that prior plausibility is as close to zero as you can imagine (e.g., for clinical trials of homeopathy), then any positive results seen in such trials can reasonably be attributed to noise, shortcomings in trial design, and bias. Unfortunately, a failure to realize this has led to many pointless clinical trials and contributed to the rise of a whole new “specialty” known as integrative medicine, dedicated to “integrating” quackery and pseudoscience into science-based medicine.
So we know that practitioners of “complementary and alternative medicine” (CAM), now referred to more frequently as integrative medicine, don’t like RCTs. They love pragmatic trials, because such trials are usually unblinded, often not randomized, and generally face a lower bar of evidence. That pragmatic trials are intended to test the “real world” use of medical and surgical interventions that have already been shown to be safe and effective in RCTs and that the vast majority of CAM nostrums have not met that standard appears not to concern them in the least. However, CAM practitioners are not the only ones critical of RCTs, as I learned when, via Steve Novella, I came across an article in The New England Journal of Medicine entitled “Assessing the Gold Standard — Lessons from the History of RCTs” by Bothwell et al. Given that the article is three weeks old, I wonder how I missed it. Be that as it may, although Bothwell et al make some good points, I tend to agree with Steve that the overall gist of the article is overly critical, to the point of, as Steve put it, portraying the RCT as broken rather than flawed and advocating revolution rather than reform.
The history of RCTs
In 2016, the RCT is so ubiquitous and unquestioned that we frequently forget that what we now know as the modern RCT didn’t exist until the 1940s, when British epidemiologist Austin Bradford Hill formalized RCT methods. Even more unbelievable to someone trained since then, the FDA did not require RCTs before drug approval until 1970, eight years after the Kefauver–Harris Amendments to the Food, Drug, and Cosmetic Act were passed in the wake of the thalidomide disaster. That was when the FDA finally interpreted the clause in the Kefauver–Harris Amendments mandating that new drugs be proven safe and efficacious in “adequate and well-controlled investigations” to mean requiring RCTs before the approval of new pharmaceuticals. By the 1980s (which, coincidentally, encompassed the eight years when I was in college and medical school), Bothwell et al note that the RCT had been declared the “gold standard” of medical knowledge. The other relevant history that Bothwell et al recount is that by the 1990s, pharmaceutical companies had surpassed medical academia and the government as the primary producer of RCTs.
To this, I would add that the concept of “evidence-based medicine” wasn’t really formalized until the 1990s, either, complete with its now familiar hierarchies of evidence (with meta-analyses of RCTs and large RCTs themselves at the very top of the hierarchy), making EBM as a construct only a couple of decades old. That’s not to say that medicine hadn’t been evidence- and science-based before that. After all, the Flexner Report, published in 1910, was pivotal in making medicine much more scientific than it had been. It is, however, always humbling to remind oneself that for most of its history medicine hasn’t really been science-based.
I myself have discussed some of these issues, as have others. Mostly, we discuss the history of how RCTs, after being declared the “gold standard” of medical knowledge, became a tool with which advocates of unscientific medicine could ignore the incredibly low prior plausibility of a treatment and go straight to human studies based on a questionable rationale that “people are using them anyway.” As we’ve explained many times, the “blind spot” of EBM is that it relegates basic science knowledge, especially basic science knowledge that shows a particular treatment (e.g., homeopathy or reiki) to be so close to impossible as to be, for all intents and purposes, impossible, to the lowest level on its hierarchy of evidence. The whole idea of RCTs depends on the assumption that the investigation of a medical intervention won’t reach the level of RCTs without considerable preclinical scientific evidence supporting its plausibility.
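To make the prior plausibility argument concrete, here is a minimal sketch using Bayes' theorem. It is an illustration only: the function name, the 80% power, and the 5% false-positive rate are assumptions chosen for the example, not figures from Bothwell et al or from any particular trial, and real trials add bias and design flaws that this simple model ignores.

```python
def prob_true_given_positive(prior, power=0.8, alpha=0.05):
    """Probability that a treatment really works, given one 'positive' RCT.

    prior : assumed probability the treatment works before the trial
    power : assumed chance the trial detects a real effect (hypothetical 80%)
    alpha : assumed chance of a false positive when there is no effect
    """
    true_positives = prior * power
    false_positives = (1.0 - prior) * alpha
    return true_positives / (true_positives + false_positives)


# Compare a plausible drug, a long shot, and something like homeopathy.
for prior in (0.5, 0.1, 0.001):
    posterior = prob_true_given_positive(prior)
    print(f"prior = {prior:<6} -> probability it works after a positive trial = {posterior:.3f}")
```

Under these assumed numbers, a positive trial of a treatment with even odds of working leaves it very likely to be real, whereas a positive trial of a treatment with a prior of one in a thousand still leaves the probability that it works under 2%. That is the sense in which positive trials of highly implausible treatments are best explained by noise and bias.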
CAM practitioners aren’t alone, however, in not liking RCTs because they find them too constraining. Bothwell et al note that many clinicians were not exactly thrilled with RCTs when they were first introduced, in particular the possibility of withholding promising new interventions from control groups. Bothwell et al basically provide a rundown of some of these objections c. 2016.
The problem(s) with RCTs: They’re a religion!
My issues with this article don’t mean that it doesn’t make some valid observations, some of which I myself have made. Bothwell et al also concede that RCTs minimize bias and have strengthened the scientific rigor of clinical study. RCTs have also been important for identifying some interventions that don’t work, such as the classic example of internal mammary ligation, an operation commonly performed for angina from the 1930s to 1950s based on the rationale that ligating this artery would redirect more blood flow to the coronary arteries. It was an operation that surgeons widely believed to be effective because patients usually demonstrated an impressive improvement in symptoms. Unfortunately, when an RCT, complete with “sham surgery,” was carried out, investigators discovered that the patients undergoing sham surgery exhibited the same improvement in symptoms. We’ve since learned that surgery itself induces powerful placebo effects, explaining the previous apparent efficacy of the operation.
Unfortunately, Bothwell et al then devolve into a “special flower” sort of argument against RCTs of surgical interventions:
As more surgical RCTs appeared in the 1960s and 1970s, however, surgeons increasingly recognized their limitations: each patient had unique pathological findings, each surgeon had different skills, and each operation involved countless choices about anesthesia, premedication, surgical approach, instrumentation, and postoperative care, all of which defied the standardization that clinical trials required. Sham controls could not be used for major operations, which limited opportunities for blinded trials.
Such concerns played out in debates about RCTs for coronary-artery bypass grafting (CABG). When the first major RCT of CABG revealed that most patients with chronic stable angina received no survival benefit from CABG, critics pounced: the participants were too healthy, the surgeons too inexperienced, the operative mortality too high, and the statistical analysis suspect. Prominent surgeons argued that RCTs were inappropriate for surgery. René Favaloro, who had played a key role in developing CABG, argued that “randomized trials have developed such high scientific stature and acceptance that they are accorded an almost religious sanctification. . . . If relied on exclusively they may be dangerous.”
I can’t help but mention that, to my shame, surgeons not infrequently dismiss the results of RCTs on the rationale that the procedure is just too complicated and has too many different moving parts, an argument not unlike that of CAM advocates, who basically argue that RCTs can’t study their woo. Be that as it may, whenever anyone compares anything in science to a religion, my skeptical antennae start twitching fiercely. But, wait, you say. Isn’t my use of the term “methodolatry” exactly the same thing? Not exactly. My use of the term is clearly meant as ironic exaggeration, something most readers will immediately recognize, whereas the surgeon quoted, René Favaloro, was clearly dead serious when he compared RCTs to religion, and Bothwell et al quote him that way to bolster their criticisms of RCTs. If there’s one thing I’ve observed over the years, it’s how funny it is that you almost never hear anyone criticizing a positive result of an RCT by unironically referring to belief in RCTs as religious. In contrast, it’s not infrequent to hear such a charge made about negative RCTs, usually by members of the specialty that profits from the intervention found wanting. It is depressing that Bothwell et al fell into the trap of uncritically citing objections of the kind made by Dr. Favaloro.
Of course, as Dr. Prasad notes, it’s impossible to completely prove a negative with a clinical trial, and his retort was so good that I’ll quote it directly:
This strategy of saying negative data is not good enough is akin to asking someone to prove Santa Claus does not exist. After the person does a census of all people in the world, and finds no Santa Claus, believers note that he didn’t look under the ocean, in all of the deserts, etc. You cannot prove an intervention does not work under ANY circumstances, you must instead show that it can work under SOME circumstances. Proponents of stable coronary stenting have FAILED to do that, and now are sabotaging the ongoing ISCHEMIA trial from enrolling enough patients. After all, ISCHEMIA can only erode their 10-15 billion dollar market share.
Does this sound familiar? It’s the same thing CAM advocates do after a negative clinical trial of whatever woo they have tested. They slice and dice the data looking for a subgroup, no matter how small, in which the treatment works, and sometimes the fact that the positive finding was only in a subgroup gets lost in how the message is communicated. My favorite example of this is TACT (Trial to Assess Chelation Therapy), a fairly frequent topic of this blog, as an example of how integrative medicine embraces pseudoscience, does RCTs on it, and then either rejects the negative ones or spins them as being positive. TACT was basically a completely negative trial — except for one subgroup for whom the seeming positive result had a lot of red flags that suggested that it was probably spurious. Basically TACT showed that chelation therapy doesn’t work for coronary artery disease, with the possible exception of diabetics. Even if you accepted that result, taking it at face value, the honest thing to do would be to stop using chelation therapy in any patient with coronary artery disease other than diabetics. Of course, that’s not what happened. Believers still use chelation for everyone with heart disease.
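The danger of slicing and dicing a negative trial can be put in rough numbers. The sketch below is a simplification under stated assumptions: it treats the subgroups as independent, assumes the treatment truly does nothing, and uses the conventional 5% significance threshold. The function name is hypothetical and the subgroup counts are illustrative; none of this is taken from the TACT analysis itself.

```python
def prob_spurious_subgroup(n_subgroups, alpha=0.05):
    """Chance that at least one subgroup looks 'positive' purely by chance.

    Assumes the treatment has no real effect and that the subgroup
    tests are independent, each with false-positive rate alpha.
    """
    return 1.0 - (1.0 - alpha) ** n_subgroups


# The more subgroups examined, the likelier a chance "finding" becomes.
for k in (1, 5, 10, 20):
    print(f"{k:>2} subgroups -> chance of a spurious positive = {prob_spurious_subgroup(k):.2f}")
```

With ten subgroups examined, the chance of at least one spurious "positive" is already about 40% under these assumptions, which is why a lone positive subgroup in an otherwise negative trial deserves suspicion rather than celebration.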
In fairness, clinical trialists not studying woo can be prone to this very tendency. We’re all human, and when you’ve put a lot of work, time, and effort into a clinical trial, it can be very hard to accept a negative result and move on. Human nature demands that you dig as deep as you can to salvage something from all that work. Given that, my tendency is to counter Bothwell et al by suggesting that physicians’ faith in some medical treatments can be almost religious in nature, to the point where they lash out at negative clinical trials, no matter how well-designed.
The problem(s) with RCTs: More problems with doctors than RCTs?
Failure of adherents to pay attention to negative clinical trials is a problem, as Bothwell et al point out, which brings me to another complaint:
Even well-conducted RCTs sometimes failed to influence medical practice. In the late 1960s, the meticulously designed University Group Diabetes Program trial linked the antidiabetic drug tolbutamide with increased cardiovascular mortality. Yet tolbutamide prescriptions paradoxically increased as controversies over the trial’s conduct and interpretation persisted for more than a decade. A similar scenario occurred when the publicly funded ALLHAT (Antihypertensive and Lipid-Lowering Treatment to Prevent Heart Attack Trial) revealed in 2002 that generic thiazide diuretics were as effective as newer, expensive calcium-channel blockers and angiotensin-converting–enzyme inhibitors in treating hypertension. As these findings were contested by pharmaceutical manufacturers and skeptical physicians, sales of the newer antihypertensives grew faster than those of diuretics. Another 2002 RCT — a sham-surgery–controlled trial — defied conventional wisdom by showing no benefit of arthroscopic débridement for chronic osteoarthritis of the knee. Many orthopedic surgeons dismissed the results and continued performing the procedure, even as the findings were confirmed repeatedly.
I’m with Dr. Prasad again on this one, when he notes that failure to abandon treatments after large negative RCTs is more a failure to educate doctors how to evaluate evidence and a failure of third party payors to stop paying for such procedures. He’s particularly spot on here, noting that doctors’ refusal to abandon “debunked” treatments is:
…testimony to how hard it is to put the genie back inside the bottle once it is out, and if anything it should make us more worried about approving, using and paying for therapies that have not shown upfront benefit in RCT.
I’m also confused here by Bothwell et al. They just spent much of their article complaining about how RCTs have become the “gold standard” that trumps every other form of medical knowledge, a complaint that is not completely without validity but does tend to be overblown and, whether Bothwell et al realize it or not, used as a tool to argue for accepting less rigorous scientific evidence as the basis for adopting a treatment. Now they complain that physicians ignore the results of negative clinical trials, which is undeniably true more often than any of us would care to admit. I’ve even discussed such examples myself. My point here is not to deny that both can be true (RCTs are undeniably considered the “gold standard” of medical knowledge and doctors do from time to time reject RCT results they don’t like); rather, it is a prelude to addressing another complaint by Bothwell et al, which basically boils down to a “Well, duh!” observation:
Yet RCTs have never monopolized medical knowledge production. A quick scan of the medical literature reveals that older methods, including case series and even case reports, continue to be valuable. New methods of observational research continue to emerge — for instance, using large databases of patients to produce comparative effectiveness data on various treatment outcomes relatively efficiently in settings of routine care. Physicians also continue to rely on physiological rationales in addition to empirical data. Coronary angioplasty and the stents that followed rose to prominence thanks not to a successful RCT but to the intuitive logic of the techniques and the compelling visual evidence provided by angiography.
Yes, less rigorous forms of clinical study are indeed valuable and frequently published in the medical literature, but they are not nearly as valuable in the way implied by Bothwell et al, namely, as Vinay Prasad puts it, “MAKING A THERAPEUTIC RECOMMENDATION TO A PATIENT AND KNOWING YOU AREN’T SELLING THEM BULLSHIT.” (Capitalization his.) They can be valuable for hypothesis generation. They can be valuable to see if there is any evidence of an effect of a new treatment that would justify a full RCT. They can be valuable for looking for adverse reactions and complications. But for deciding definitively if a treatment works or not? Not so much. That’s not to say that they can’t be used for that. Sometimes we have no choice, for example, when for whatever reasons RCTs haven’t been done and/or can’t be done for ethical reasons to address a question. In those cases, which are not infrequent, less rigorous forms of evidence are all we have to fall back on until better evidence is available. It’s also hard not to mention here that if physicians can find reasons to justify rejecting the results of rigorous RCTs, relying on less rigorous studies will just give them a better excuse to ignore clinical evidence.
Also, doctors do continue to apply physiological rationales to justify treatments. Need I remind you that these physiological rationales are frequently incorrect? Doctors at the time thought they had a perfectly valid physiological rationale for using internal mammary artery ligation, for instance, to treat angina. When studied in an RCT, it didn’t work better than placebo. Dr. Prasad also notes another example, using stents to “open up” blockages causing stable angina (angina that comes on after a certain amount of exertion that isn’t changing). After all, the before-and-after pictures are so impressive! Look at how that blood vessel has been opened up! But:
If you want proof of this, see a recent Twitter string started by John Mandrola, where he questioned whether afib ablation requires a sham RCT to demonstrate QoL benefits beyond placebo. The response from EP doctors was fierce. Many tweets in reply showed images of afib that terminated after ablation, but of course, this is besides the point. No one doubts that you can terminate afib (a surrogate) in some patients, the question is whether this translates into an improvement in quality or quantity of life beyond the procedure itself. Same is true for stenting for stable coronary disease. No one doubt it opens the narrowed artery, but it doesn’t decrease MIs or mortality (COURAGE), and are gains in symptoms real, or simply an expensive placebo effect? (COURAGE did not have a sham control) A sham trial will tease it out. Sham controls aren’t needed for OBJECTIVE outcomes, but they are needed for SUBJECTIVE ones.
Which is what I’ve been saying about oncology trials. If the outcome you’re looking at is overall survival, that’s about as objective an outcome as you can imagine. Either you’re alive or you’re dead. Also, it’s unethical to use a sham control in many instances when the outcome being measured is so objective, which is why most oncology trials compare standard of care versus standard of care plus new intervention. Also, what Dr. Prasad doesn’t address is that some seemingly “objective” outcomes might not be entirely objective. For instance, relapse-free survival can be affected by how often and how carefully one screens for relapse, and interpretation of radiologic scans isn’t always 100% objective because human beings do it. That’s why blinding of radiologists reading scans for trial patients is important.
Unfortunately, Dr. Prasad also seems a little too blithe about the ethics of some RCTs for my taste:
Now what about sham trials of total knee arthroplasty for pain—in other words big surgeries. Well, I bet if we bucked up, wiped away surgeon tears, and ran them (control arm gets sedation and a long incision on the skin, and maybe some superficial hardware for them to palpate to simulate replacement), we would find that there may be a benefit for people with true joint instability, but maybe not for those merely with pain. Don’t get me wrong– I don’t know what such a trial will show, but it is entirely plausible that we will find that the procedure is no more than a sham operation for a large portion of customers. It will probably take another 10 years for orthopedic surgeons to muster the courage to do these trials, as it would threaten 30 billion a year in revenue. But such a trial would be a tremendous good for the public. Even if it confirmed benefit it would assure us that hundreds of billions and many complications were not in vain (and give us real $ per QALY figures not guesses). I however do not assume benefit here.
Which is reasonable, but greatly downplays the ethical problems of doing sham surgery-controlled RCTs of surgical interventions. Ethical issues in human subject experimentation are why, in a lot of cases, we don’t have evidence from RCTs with a sham control group to support surgical interventions. I also can’t help but note that for objective outcomes sham controls aren’t necessary anyway. Dr. Prasad tells me so. Of course, most outcomes from such procedures are not, strictly speaking, objective: pain, stiffness, etc.
The problem(s) with clinical trials: They’re too slow!
Not all of the criticisms leveled at RCTs by Bothwell et al are invalid. Indeed, they make a good point that RCTs have become more bureaucratic and, as a result, very expensive, now requiring costly infrastructure for research design, patient care, record keeping, ethical review, and statistical analysis, noting that a single phase 3 RCT can cost $30 million. (Remember how I mentioned that I could do a decent-sized RCT for the $25 million that it cost to do the rat study that allegedly showed a link between cell phone radiation and cancer?) This is undeniably true. Unfortunately, it’s not always clear how to decrease the bureaucracy and still protect patient safety and enforce ethical standards, particularly given how the high cost of clinical trials provides a huge incentive to investigators and stakeholders to produce positive results. Although they don’t make a judgment on the 21st Century Cures Act (I describe it as nothing more than a false promise that loosening regulatory standards will deliver cures faster), they do correctly note that it is in part a reaction to these problems with clinical trials and would curtail the use of RCTs in drug and device development in the name of speed and increased efficiency.
One consequence of how slowly clinical trials are conducted is this:
One long-standing, possibly intractable, concern has been the discrepancy between the time frame of RCTs and the fast pace of innovation. In debating how best to evaluate CABG in 1976, surgeons complained that “just when we have accumulated enough data over a sufficient time period, we find that surgical technique has improved or medical therapy changes, or both, and conclusions no longer apply.” Major RCTs have often required many years for patient enrollment, follow-up, and analysis. In cases of rapidly evolving therapies, RCT results have seemed outdated before they were published. When the COURAGE (Clinical Outcomes Utilizing Revascularization and Aggressive Drug Evaluation) trial showed disappointing efficacy results for coronary angioplasty in 2007, the procedure’s advocates argued that the results were no longer relevant because the bare-metal stents tested in the trial had been replaced by newer drug-eluting stents. This logic, which assumes the superiority of any innovation, has created a setting in which trialists struggle to keep up with continuous innovations, similar to the “Red Queen” effect in evolutionary biology.
No doubt this is a problem, and we do tend to assume, often without justification, that new is better, particularly when it impacts our livelihood. So when a trial of old stents shows they don’t work as well as believed, it’s not surprising that the counterargument would be that we have this nifty new and improved stent that really is better and must be working. Of course, by the time these new drug-eluting stents are fully tested, there’ll be something else.
Then there are patient expectations and demand:
The AIDS crisis brought many tensions into stark relief in the late 1980s. Patients, frustrated that RCTs would delay approvals of antiretroviral drugs, demanded access before trials had been completed. Clinicians felt conflicted between their roles as physicians and as scientists. Activists won support for more flexible approaches to clinical research, including the use of surrogate end points, conditional FDA approvals, and parallel tracks to provide access to drugs outside of trials. Critics worried that the loosened standards undermined scientific rigor and encouraged a risky deregulatory agenda championed by the drug industry.
This issue lives on in the “right to try” movement, which has resulted in laws in many states that purport to provide access to experimental drugs to terminally ill patients off trial. Jann Bellamy and I have addressed how deceptive the claims behind these laws are and how they won’t really do anything to help terminally ill patients, but clearly part of the impetus for these laws, exploited brilliantly by the free-market-worshiping and libertarian-leaning Goldwater Institute, is the understandable frustration at how long clinical trials take, just as the activism of AIDS activists and victims in the 1980s and 1990s derived from the same frustration.
Then, of course, there’s the development of “precision” medicine (formerly known as “personalized” medicine), which involves using various genomic markers and other biomarkers from individual patients to determine which treatment should be used. On the surface, this appears to be a major challenge to RCTs, but I’ve come to think that the challenge, while formidable, is perhaps not so insurmountable as critics of RCTs would like you to think. After all, it is quite possible to test standard-of-care against a precision medicine strategy designed to tailor each patient’s treatment based on a set of biomarkers. This was done in the SHIVA Trial, to which I alluded in another post, and guess what? No advantage was observed in the “precision oncology” arm of the trial. That doesn’t mean precision medicine doesn’t have promise (the strategy used just might not have been optimal), but it does suggest that we should be a bit more circumspect about the promise of “precision medicine” and perhaps less pessimistic about the ability of scientists to develop new RCT methodology to study precision medicine approaches. RCTs are evolving.
Is the proposed solution worse than the disease?
There is no doubt that developments in medicine, particularly the genomics revolution, are placing strain on the standard method of doing RCTs and the infrastructure that supports it. Couple that with patient demand for faster drug approval (although I wonder what they’ll say if that faster drug approval process leads to more issues like Vioxx or even worse), and it’s clear that RCTs must evolve. I’m just not sure I can get behind what Bothwell et al say about how they should evolve:
The idea that RCTs would be the only authoritative arbiter to resolve medical disputes has given way to more pragmatic approaches. Experimentalists continue to seek new methods of knowledge production, from meta-analyses to controlled registry studies that can easily include large numbers of diverse patients. Observational methods are seen as complementary to RCTs, and new forms of surveillance can embed RCTs into the structure of data collection within electronic health records. RCTs are now just a part — though perhaps the most critical part — of a broad arsenal of investigative tools used to adjudicate efficacy and regulate the therapeutic marketplace. This status may continue to evolve with the recent turn (back) to personalized or precision medicine. As medicine focuses on the unique pathophysiology and coexisting conditions of individual patients, the applicability of the generalized data produced by RCTs will come under intensified scrutiny.
Before I provide my take, I can’t help but quote Dr. Prasad again about the issue of meta-analyses:
Uh, meta-analysis is most often pooled RCTs—so this makes no sense. You can’t say we don’t need potatoes because we have French fries—what are fries made of?
But what about the rest? Controlled registry studies are useful for post-marketing surveillance but for determining what works and what doesn’t? Again, not so much. Observational studies, again, can be useful for tracking outcomes and adverse events, which can definitely complement RCTs, but for determining efficacy they aren’t that great. It’s hard not to interpret an underlying attitude in this article that the way to overcome the problems with RCTs is to start accepting less rigorous forms of evidence for efficacy, particularly in light of this introduction to one section:
RCTs have also unintentionally limited the producers of medical knowledge. When case reports constituted valid evidence of therapeutic efficacy, a single physician, drawing on clinical experience, could write an article that might change clinical practice.
They say that as though it’s a bad thing that doctors churning out case series don’t have nearly as much influence as they used to. It’s not. Most assuredly, it is not. Also, there are times when RCTs cannot be done, such as for rare diseases or for questions for which an RCT would be unethical, such as a randomized, controlled clinical trial of vaccinating versus not vaccinating. In those cases, we have to make do synthesizing lesser forms of evidence.
In the end, I can’t agree with Dr. Prasad that this article is some sort of subtle and dangerous threat to RCTs or that the NEJM is somehow intentionally trying to undermine RCTs as the primary method of clinical investigation. I do, however, agree with Steve that the answer is reform, not overthrow, and with Dr. Prasad that the solution to the problem of RCTs that no longer serve our medical needs as well as we require is to design better RCT methodology, not to start accepting less rigorous forms of evidence.