Last week, I argued that the case of ivermectin, the anthelmintic medication used in both animals and humans to prevent and treat diseases caused by roundworms, which became the focus of “miracle cure” claims for COVID-19 based on its in vitro antiviral activity, is an excellent illustration of how science-based medicine isn’t just for highly implausible treatments embraced by so-called “complementary and alternative medicine” (CAM, or, as it’s now more frequently called, “integrative medicine”). I won’t rehash the argument, given how long last week’s post was, other than to point out that ivermectin always had a very low plausibility and probability of working in a randomized controlled trial (RCT) in humans because the concentration required to produce antiviral activity in cell culture is at least 50-fold higher than the highest concentration that can safely be achieved in vivo in human beings. Unsurprisingly, high-quality trials increasingly find that ivermectin is indistinguishable from placebo when used to treat COVID-19. As great a medicine as it is for roundworm infestations, ivermectin just doesn’t work against COVID-19. There’s no good reason to have expected that it would, either, at least if you take basic science (specifically, basic pharmacology) into account. I even compared ivermectin to acupuncture, an even more improbable treatment for which high-quality RCTs increasingly find no effect distinguishable from placebo/sham treatments, but for which advocates have pivoted to lower-quality evidence.
Last week I came across an article that reinforces this point. Published in The BMJ by an “international panel including patients, clinicians, researchers, acupuncture and surgery trialists, statisticians, and experts in clinical epidemiology and methodology”, it is titled “How to design high quality acupuncture trials—a consensus informed by evidence“. Since I thought I might need another break from writing about COVID-19, this consensus article seemed like yet another example to illustrate what I mean regarding the differences between science-based medicine (SBM) and evidence-based medicine (EBM), particularly the importance of taking into account prior plausibility based on basic science and, more importantly, how advocates of pseudoscience (like acupuncture) and bad science (like the conspiracy theorists promoting ivermectin as a treatment for COVID-19) use the same sorts of arguments about evidence.
In fairness, it’s not all bad. Indeed, on the surface the article seems to advocate more rigorous science. However, as I read it I couldn’t help but notice that the authors, rather than focusing on studies that ask whether acupuncture actually has any specific effect on any condition and emphasizing the importance of double blinding (including the acupuncturists) and proper use of sham acupuncture (i.e., retractable needles or acupuncture at the “wrong” acupuncture points), dilute the discussion of scientific rigor with discussions of when and why to use less rigorous trial designs. I suspect that this betrays the belief among many acupuncture proponents that they need to test how or where acupuncture works, rather than whether it works.
The article begins:
The original and revised Standards for Reporting Interventions in Clinical Trials of Acupuncture (STRICTA)10 11 focus on reporting not optimal trial conduct. Other acupuncture guidance covers type of trial (efficacy,12 effectiveness13), design, and lack of representativeness (include an international panel14), but is not informed by comprehensive and systematic evidence synthesis.
To address the most prevalent design and methodological concerns in current acupuncture randomised trials, an international panel including patients, clinicians, researchers, and trialists in acupuncture, surgery, statistics, patient engagement, and clinical epidemiology developed guidance for research teams planning acupuncture trials. To inform the guidance, we conducted a systematic survey15 exploring characteristics associated with acupuncture treatment effects.
We established a steering committee including frontline acupuncture clinicians, acupuncture trialists, clinical trial methodologists, and a statistician (XHJ, JPL, LXL, CMW, LT, YQZ, RMJ, and GHG) with extensive experience in acupuncture and trial methodology. The steering committee recruited an international expert panel of 27 experts (from Asia, Europe, America, and Australia), including patients, frontline acupuncture clinicians, acupuncture and surgical trialists, clinical trial methodologists, and statisticians. Acupuncture trialists and methodologists were appointed to the panel on the basis of their h index and assessment of their research expertise, and frontline clinicians on the basis of their clinical experience and reputation.
If there’s one thing that I’ve been discussing for a very long time both here and elsewhere, it’s how purveyors of pseudoscientific, unscientific, and religion-based treatments love to set up pseudomedical organizations. (Retired colleague Dr. Kimball Atwood dubbed them “pseudomedical pseudoprofessional organizations” way back in 2008.) Naturopaths have done it, as have a number of other pseudomedical specialties, resulting in highly dubious “evidence-based” guidelines incorporating quackery, as was done for breast cancer several years ago (with two naturopaths as co-authors) and for cancer in general by naturopaths.
So let’s look at The Good, The Bad, and The Ugly.
The Good: Questions not specific to acupuncture
Let’s first concede what’s good about the guidelines. First, the authors emphasize points that are important for designing any clinical trial, starting with very basic questions:
Choosing the research question
Consideration 1: Is the question important?
Trialists should establish the rationale for the research question informed by a systematic review of the relevant literature.
Consideration 2: Who should the participants be?
Trialists with a primary explanatory or mechanistic objective could include people who are most responsive to the intervention. With a primary pragmatic or practical objective, trialists could include a broad population varying in age, severity, comorbidity, and exposure to other interventions.
Consideration 3: How can trialists address possible differences in effect across patient groups?
For pragmatically oriented trials include heterogeneous populations, with prior specification of subgroup analyses to consider hypothesised effect modification.
Of course, Consideration 1 is very important. Those of us who, based on acupuncture’s lack of a plausible mechanism and the observation that the more rigorous a clinical trial of acupuncture is, the more likely it is to be a negative study, view acupuncture as a theatrical placebo might argue that no question involving acupuncture is important any more. However, I accept that not all, even on “our side,” would agree with this assessment.
Let’s, for the sake of argument, postulate that we have an important question about acupuncture to address. Then the next consideration about patient selection becomes important. But what do the authors mean by “people who are most responsive to the intervention”? My point is simple: How does one know if someone is likely to be most “responsive” to acupuncture? One can’t, other than that maybe people more responsive to placebo effects in general might be more responsive to the ultimate in theatrical placebo.
Note, however, the mention of pragmatic trials. As I wrote last week and emphasize again this week, pragmatic trials are clinical trials that seek to assess the effectiveness of an intervention in the general population, outside of the strictly defined inclusion and exclusion criteria of a typical RCT and its strictly enforced protocols for monitoring outcomes. The difference is that pragmatic trials generally don’t include a placebo group, include a wider range of people, and sometimes are not even randomized. Sometimes there aren’t even control groups. Why? As Steve Novella and I have emphasized for years and years, pragmatic trials operate under a very important assumption: that the treatment being studied has already been validated as efficacious and safe in RCTs. Remember, the idea behind pragmatic trials is to assess how well a treatment validated in RCTs works “in the wild”, so to speak, not to determine whether it works in the first place. Doing a pragmatic trial of a treatment that has never been validated as efficacious for anything is, as I so frequently say, putting the proverbial cart before the horse. Basically, pragmatic trials are very prone to highly “positive” results due to placebo effects, because there is no placebo control.
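To make this concrete, here is a toy simulation (all numbers invented purely for illustration, not drawn from any real trial) of why a trial design with no placebo arm can make an inert intervention look effective: if simply receiving any intervention elicits a nonspecific (placebo) response, a single-arm comparison against baseline shows a large apparent “effect” that a sham-controlled comparison correctly shows to be zero.

```python
import random

random.seed(0)

def simulate_patient(received_intervention: bool) -> float:
    """Change in a pain score (negative = improvement). The 'treatment'
    here has NO specific effect: anyone who receives any intervention
    shows the same nonspecific (placebo) response."""
    natural_course = random.gauss(-0.5, 1.0)   # regression to the mean, etc.
    placebo_response = -1.5 if received_intervention else 0.0
    return natural_course + placebo_response

n = 500

# Single-arm, pragmatic-style design: everyone treated, no comparator.
pragmatic = [simulate_patient(True) for _ in range(n)]

# Sham-controlled RCT: the sham arm evokes the same nonspecific response.
rct_treated = [simulate_patient(True) for _ in range(n)]
rct_sham = [random.gauss(-0.5, 1.0) - 1.5 for _ in range(n)]

def mean(xs):
    return sum(xs) / len(xs)

print(f"Single-arm 'effect' (change vs baseline): {mean(pragmatic):+.2f}")
print(f"RCT effect (treated minus sham):          {mean(rct_treated) - mean(rct_sham):+.2f}")
```

The single-arm comparison reports roughly the full nonspecific response (about a two-point improvement in this toy model), while the sham-controlled difference hovers near zero, which is the true specific effect.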
Again, in fairness, I’ll note that it is reasonable in a pragmatic trial to include heterogeneous populations. That is, after all, rather the point of a pragmatic trial (or at least one of the main points). It is also mandatory to prespecify subgroups, in order to avoid post hoc analyses and “data mining” that find apparent effects that aren’t real. Such post hoc analyses are always meant—or should be intended—as hypothesis-generating exercises, not as hypothesis testing exercises.
Other examples of The Good (or at least the non-objectionable) in this paper include its discussion of engaging patients, deciding the duration of followup, and choice of outcomes (although that last one does skew towards “soft”—i.e., subjective—outcomes, like pain).
Now let’s move on to The Bad.
The Bad: Typical acupuncture tropes
Much of the rest of the paper demonstrates the biases of the acupuncture believers who wrote it. For example, let’s start with a rather telling omission. It’s not exactly an omission. It’s almost as though the authors acknowledge an issue in a rather sidelong way without actually addressing the most important analysis to avoid it. I’m referring to adherence to treatment:
Consideration 8: How should trialists deal with adherence?
When designing a trial that primarily takes the individual patient’s perspective, trialists might consider implementing strategies to achieve optimal adherence. When trialists take a public health perspective, there is no need to implement strategies to increase adherence.
“Intention-to-treat analysis”, anyone? It’s funny how acupuncturists accept that adherence might be a problem in RCTs, but not once do they mention an intention-to-treat analysis. I can’t help but note that the recent negative RCT of ivermectin for COVID-19 that I discussed included more than one variety of intention-to-treat analysis. Yet the authors of this paper can’t seem to bring themselves to type the words. It’s just an example of how acupuncturists mistake rigor for bias against them, and, just as the pandemic seemingly hasn’t stopped bad acupuncture studies, it also hasn’t stopped the promotion of acupuncture pseudoscience.
Instead, we get handwaving like this:
Thus, when designing a trial that primarily takes the individual patient’s perspective, trialists should consider implementing strategies to achieve optimal adherence. One strategy is to engage patients to help design the trial. Patients can identify and improve onerous trial design, therefore increasing participation. Another strategy is to include a run-in period before randomisation and only randomise patients who are highly compliant in the run-in.
Including a run-in period is highly feasible in drug trials when patients may be offered a placebo during the run-in or when the trial is focused on morbid or fatal events and short term exposure will have no influence on the outcome. The situation is more complicated in acupuncture trials focusing on symptoms in which initial exposure to treatment during a run-in may complicate inferences about what occurs after randomisation. This is probably why acupuncture trials rarely use run-in periods to enhance adherence.
While this is all well and good, it’s not enough. Again, I find it telling that nowhere in the paper is the importance of intention-to-treat analysis in an RCT emphasized. Another option is, as I’ve discussed elsewhere, the per-protocol analysis, in which only patients who complete the entire clinical trial according to the protocol are counted towards the final results. Best practice in RCTs generally includes both an intention-to-treat analysis and a per-protocol analysis.
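As a toy illustration of why intention-to-treat matters (all numbers invented, not modeled on any real trial): simulate a treatment with zero specific effect in which sicker patients preferentially drop out of the treated arm. A per-protocol comparison of completers then flatters the treatment, while intention-to-treat does not.

```python
import random

random.seed(1)

N = 1000

def outcome(base_severity: float) -> float:
    """Final symptom score (lower = better); the treatment has no specific effect."""
    return base_severity + random.gauss(0, 1)

treated, control, treated_completers = [], [], []

for _ in range(N):
    # Control arm: everyone is analysed.
    control.append(outcome(random.gauss(5, 1)))

    # Treated arm: sicker patients are more likely to quit the protocol.
    severity = random.gauss(5, 1)
    y = outcome(severity)
    treated.append(y)                  # intention-to-treat keeps everyone
    if severity <= 5.5:                # differential dropout of the sickest
        treated_completers.append(y)   # per-protocol keeps completers only

def mean(xs):
    return sum(xs) / len(xs)

print(f"ITT difference:          {mean(treated) - mean(control):+.2f}")
print(f"Per-protocol difference: {mean(treated_completers) - mean(control):+.2f}")
```

The intention-to-treat difference comes out near zero (the truth), while the per-protocol difference shows a spurious apparent benefit of roughly half a point, purely because the completers were healthier to begin with. This is exactly why best practice reports both analyses rather than cherry-picking completers.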
Next up is the actual intervention:
Selecting the intervention
Consideration 4: Who should perform the intervention?
Trialists should report the expertise of the acupuncturists. A trial that aims to show whether an acupuncture treatment can work under ideal conditions will choose the most expert practitioners available. Trialists aiming to establish the effect of treatment in ordinary practice will select clinicians with typical levels of expertise.
Consideration 5: What specific acupuncture technical features should be considered if aiming to design trials for maximum treatment effect?
If aiming for maximum treatment effect, trialists should select a high frequency of acupuncture treatment sessions and penetrating type of acupuncture (manual and electroacupuncture) over lower frequency or non-penetrating (transcutaneous electrical acupoint stimulation (TEAS), laser, acupressure) acupuncture.
I do find it rather amusing how the authors emphasize “expertise”. Obviously, no one would discount the importance of expertise in surgery, procedure-based interventions, and the like. Again, this makes me think of how pseudoscientific specialties like acupuncture love to mimic real medicine and claim that expertise matters, even though the entire rationale behind acupuncture is based on prescientific vitalism and acupuncture points have no anatomical or physiological correlates. Also interestingly, notice how the recommendation is for high-frequency over low-frequency treatment and for penetrating acupuncture over less invasive variants (e.g., transcutaneous acupoint stimulation or “laser acupuncture”).
Finally for this point, notice how the authors have mixed together electroacupuncture and acupuncture as though they were interchangeable. I like to point out how this is a “bait-and-switch” because, contrary to the “ancient practices” to which acupuncture enthusiasts refer to justify sticking needles into people to treat them, there was no understanding of, much less ability to harness, electricity for anything millennia ago, when acupuncture was supposedly devised. “Electroacupuncture”, in reality, is nothing more than a clever rebranding of acupuncture by hooking up a battery or generator to the needles to run low level electrical current between the needles. It is, in essence, transcutaneous electrical nerve stimulation, which might actually work for some indications. I’ve always suspected that the reason acupuncturists started hooking up acupuncture needles to a current source was because they realized that acupuncture really didn’t do much of anything.
It gets worse.
The Ugly: Twisting EBM to support quackery
The most obvious propaganda aspect of this set of “guidelines” comes in its discussion of the best comparator to use in a clinical trial of acupuncture:
Choosing the comparator
Consideration 6: Are trialists interested in the specific effect or the overall (specific and non-specific) effect of acupuncture?
Trialists should blind data collectors, outcome assessors, and data analysts and carefully consider the desirability of a sham that leads to underestimation of acupuncture’s treatment effects in clinical practice.
Fine tuning the flexibility of intervention and comparator
Consideration 7: To what extent should the trialist choose a flexible intervention and comparator?
Trialists with a primary explanatory or mechanistic focus might specify that practitioners administering both the intervention and the control use a highly standardised approach. Trialists with a primary pragmatic or practical focus might include clinicians with varied techniques reflecting practice in the community and instruct them to use their usual treatment approaches.
Consideration 6 is, of course, rather obvious, but notice the phrasing. The authors do advocate blinding the data collectors, outcome assessors, and data analysts, but what about the patients and the acupuncturists themselves? There, they only weakly advocate that a “sham” acupuncture intervention be “considered”. Also notice the part about a “sham that leads to underestimation of acupuncture’s treatment effects in clinical practice”. Let me rephrase this briefly. Imagine a new medication being tested as, say, a pain reliever. Imagine my arguing, “Trialists should blind data collectors, outcome assessors, and data analysts and carefully consider the desirability of a sham that leads to underestimation of the drug’s treatment effects in clinical practice.”
The whole point of a placebo control or a “sham” treatment is to control for nonspecific (i.e., placebo) effects. Yet the authors treat what is normally a desirable feature of a placebo, namely that it controls for nonspecific effects that might produce a false “positive” result, as a drawback. That’s because acupuncture is a placebo. Even if acupuncturists will never admit that, at least not directly, they do seem to realize it, even as they invoke all sorts of caveats regarding which sham to use or even whether to use a sham at all:
In drug trials, which are closer to a explanatory or mechanistic design, trialists often use a placebo to achieve blinding. Acupuncture trials, however, face multiple challenges when using placebo or sham acupuncture for blinding.15 It is nearly impossible to blind clinicians who deliver acupuncture, although one device exists that allows practitioner blinding but limits the acupuncture to a rather superficial version (maximum 5 mm insertion).45 Participants who have had previous acupuncture experience may also be hard to blind.47 48
Whether to attempt blinding in acupuncture trials depends on the hypotheses and objectives of a trial. Here, we will refer to a treatment’s biological effects as specific effects and placebo effects as acupuncture’s non-specific effects. Trialists examining explanatory questions should include sham acupuncture control with adequately blinded participants to differentiate acupuncture’s specific and non-specific effects.45
The results of such explanatory studies require careful interpretation, however. The clinical and basic science literature support the possibility of specific effects generated by sham acupuncture49 50— that is, a failure to show a difference between real and sham acupuncture may be because the sham has effects closely related to that of the intervention. Therefore, when using sham control, trialists must consider the possibility of a specific effect generated by the sham and thus underestimating the effects of acupuncture in clinical practice compared with no intervention or other interventions such as drugs.
Contrary to this claim, it is clearly not “nearly impossible” to blind acupuncturists to which intervention they’re delivering: “acupuncture” or “sham” acupuncture. I will certainly concede that it is not easy to do so (indeed, it is quite difficult), but “nearly impossible” betrays a very common attitude among acupuncture researchers: that it’s so difficult to blind the acupuncturists that it’s just not worth doing, particularly given that it will lead to an “underestimation” (in their deceptive words) of acupuncture’s treatment effects. I’ve written about just such studies going back to almost the very beginning of this blog. Certainly, for “electroacupuncture”, it’s almost trivially easy: you just have to make a current source that looks as though it’s working, whether it’s actually delivering current or not. Given that the current generally used is so low, chances are good that the subjects won’t be able to tell the difference.
Also, note how the authors try to claim that there are “specific” acupuncture-like effects from “sham” acupuncture, which (of course!) dilute any “true” differences between acupuncture and sham. So, even though “best practice” is to use sham/placebo-controlled acupuncture in RCTs, the authors tack so many caveats onto that recommendation that the message becomes that the use of sham acupuncture as a placebo control isn’t actually scientifically justifiable: supposedly, you can’t blind acupuncturists to the treatment group, “sham acupuncture” isn’t inert enough to be a good sham, and sham actually works like acupuncture anyway. So why bother?
Observe what I mean:
Previous studies of shams have focused on pain and chronic pain and conducted univariable analyses (web appendix 2, 2.2.2 (2)). We used multivariable analyses in our systematic survey, adjusting for other potential factors such as treatment frequency, flexibility of the acupuncture regimen, and sample size. Like a previous systematic review,51 we found the type of sham did not influence acupuncture’s effect.15 This is perhaps unsurprising considering the finding of meta-epidemiological studies in areas beyond acupuncture: results of such studies have proved inconsistent, with one recent thorough review showing no systematic effect of blinding.52
Considering all the evidence, the effect of the type of sham remains uncertain and might vary with the type of medical condition. Over 90% of acupuncture trials focus on pain, quality of life, function, and other symptoms. These are all subjective outcomes in which blinding may be particularly important.15 Trialists should therefore carefully consider the desirability of a sham in randomised trials investigating efficacy and, if desirable, the nature of the sham in relation to their trial’s specific objectives. Trialists should also be aware that the use of non-penetrating needle shams mandates the use of the same device in the real acupuncture group. These devices may impede real acupuncture treatment effects.
The authors cite a “meta-epidemiological study” (i.e., a study of meta-analyses) purporting to show that, in contrast to basically everything we know about the use of placebo controls in clinical trials with subjective outcomes (e.g., pain), the use or non-use of a sham control or of blinding of the patients, clinicians, or evaluators doesn’t matter. Interestingly, that study actually suggests that, as the subjectivity of the outcome being assessed increases, so do the chances that a lack of blinding contributes to a “positive” outcome, although the trend didn’t reach statistical significance. Of course, this sort of study has a number of problems, not the least of which is that few of the studies included in the meta-analyses actually assessed whether blinding was effective, with the authors also noting that “in all instances the credible intervals were wide, including both considerable difference and no difference.” In other words, there was lots of noise.
The authors also note:
We did not expect to find that our study does not firmly underpin standard methodological practice. Further, our results are coherent with other meta-epidemiological studies that have reported similar results. The implication seems to be that either blinding is less important (on average) than often believed, that the meta-epidemiological approach is less reliable, or that our findings can, to some extent, be explained by lack of precision.
I would tend to suspect the latter two explanations, given the generally poor quality of most acupuncture studies (even those ostensibly reported as RCTs) and the fact that meta-epidemiological studies are still rather ill-defined. It is true that blinding matters less for “hard” outcome measurements, such as death, than it does for more subjective outcomes, such as pain, but few are the acupuncture studies that look at such “hard” outcome measures.
And, again, the authors almost seem to realize that acupuncture is placebo:
Acupuncture therapies are multifaceted interventions that incorporate patient-practitioner interactions, treatment theories, tools to deliver the stimulations, manipulation techniques, and points selection. The trial objectives should determine the flexibility of the intervention or control.
“Patient-practitioner interactions” are one of the strongest influencers of placebo effects.
Acupuncture: The ultimate placebo
We’ve discussed how acupuncturists, as the more rigorous RCTs that incorporate best practices such as sham controls, placebos, and (at least) double blinding have tended to be negative, have come to rely on less rigorous pragmatic trials, which are more prone to bias and to placebo effects masquerading as specific effects. While placebo effects are not as relevant when considering ivermectin trials, the shift to citing lower-quality evidence as higher-quality evidence increasingly comes back negative is the same. So is the tendency of acupuncturists to make excuses for why they don’t want to do more rigorous studies and why rigorous clinical trials tend to be negative, such as casting doubt on the scientific legitimacy of sham interventions as the best comparators. In the case of ivermectin, the excuses are similarly endless.
What is most important to me, however, is that for both interventions (indeed for all interventions) the bar of evidence for a positive effect should be much higher when studying an incredibly implausible intervention, be it acupuncture, ivermectin for COVID-19, or even homeopathy, because basic science matters. What we find instead is that advocates of incredibly implausible treatments always cite lower quality evidence and even, while seeming to advocate guidelines for high quality RCTs, still try to tilt the field to increase the likelihood that their treatment will appear to be validated by science.
10 replies on “Is it even possible to design high quality acupuncture trials?”
Orac discusses how electro-acupuncture/ TENS might be blinded for the experimenter: they wouldn’t know if the current was actually applied or not although the set-up looked functional.
BUT how about the difficulties of blinding subjects?
“Real” acupuncture (that may be an oxymoron) is probably usually not felt,** whilst TENS may produce an obvious vibration or other apparent sensation: wouldn’t participants be able to discern that one was actual and report accordingly that they were improved? I might liken it to subjects in a medical trial (in the old days) tasting the drug’s contents to see whether it was bitter or not and then having that realisation affect their subjective reporting about whether they were affected.
** I have had personal experience with both
Are the TENS location-specific? If so, could you put them in not-quite the right spot for the placebo?
I don’t know about these particular studies but in general you can place the units where you choose (with a few obvious exceptions), usually where you have pain.
For studies that simulate acupuncture, they could probably place them at (non-existent) “acupuncture points” as dictated by lore vs at “non”-points.
As I mention above, subjects can tell if the current is on or not so that might affect how they respond.
The greater the size of the error bars, the greater the number of hypotheses that can be accommodated. When you have a favored hypothesis there is a motivation to design a study with error bars that encompasses the hypothesis.
Can you translate that into English?
That was a humble attempt to translate a statistical argument into English. I think I’ll just stop there.
I’ll try: if your measurement methods are so imprecise that your error bars are larger than the measurement, then you can say pretty much anything and still have it fit inside the error bars.
Imagine something where your output is a percentage from 0 to 100, and while mean is let’s say 45, if the error bars go from 0 to 100 then you haven’t really measured anything at all because all possible outcomes fit inside your error bars.
Or, to try a different way, let’s imagine that you’re measuring the length of acupuncture needles and your hypothesis is that they are all the same length. If you use a ruler that measures to the millimeter, you would see that the needles are not all the same length. But if you used a yardstick instead, then by the inch they would all be the same length.
Does that help?
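The yardstick analogy above can be run directly (needle lengths invented purely for illustration): the same set of needles looks different or identical depending entirely on the resolution of the measuring instrument.

```python
# Hypothetical needle lengths in millimetres.
needle_lengths_mm = [38.1, 39.4, 40.0, 40.9, 42.3]

# Read with a millimetre ruler: resolution of 1 mm.
mm_readings = [round(length) for length in needle_lengths_mm]

# Read with a yardstick to the nearest inch: resolution of 25.4 mm.
inch_readings = [round(length / 25.4) for length in needle_lengths_mm]

print(mm_readings)    # five distinct values: the needles clearly differ
print(inch_readings)  # one value repeated: the differences vanish inside the resolution
```

At millimetre resolution every needle is distinguishable; at inch resolution all five collapse to the same reading, so "they are all the same length" fits the data perfectly, which is exactly the error-bar point.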
RS, is that what you were getting at?
Well, it does a pretty good job of the basics. Of course it’s more complicated than that and I was running out the door so I wasn’t motivated to dive in, especially since I didn’t know if sadmar was just joking or asking a serious question.
However, we must keep in mind that we’re talking about real data, honestly measured. As Orac repeatedly points out, acupuncture studies that show a positive effect show unwarranted confidence in the data (often subjective measurement) if not outright misrepresentation or fraud.
Back to real data. Take any set of measurements and there is an infinity of curves that will include those points. With error bars, you get a larger infinity of curves 😉
In science you often want the simplest curve (say, a low order polynomial or periodic function), but not too simple. It is always amusing to read a paper where a straight line is drawn through a sea of data points with error bars as wide as a barn door.
This is where theory helps to justify the curve fitting. For example, a huge number of experiments have been done to measure the mass of a photon. All the data has error bars. Almost all the error bars include zero. Although there are good theoretical reasons to believe it is zero, that is not certain. That’s true of all science, and Orac is always careful regarding this point.
Curve fitting in general can be fun! Back in university I learned many ways to do analytical and numerical curve fitting. Even the professors got into the fun, showing bizarre high-order functions some have used to fit some very doubtful data and error bars. It isn’t easy to get away with that nonsense in the harder sciences, and it is seemingly much the same in the medical sciences, but it still seems to happen to some degree in the social sciences.
Which for some reason reminds me of a true story from decades ago. A sociology grad student walks up to a friend of mine (who was doing duty as a computer “consultant” answering questions from undergrads) and drops a box full of punched cards on his little desk (I told you it was long ago). My friend asks what the problem is. The student answers: it’s data, and I want to process it.
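The point that wide error bars accommodate more hypotheses can be made numerically (data and error bars invented for illustration): a simple chi-square comparison of a “no trend” model against a “linear trend” model discriminates between them only when the assumed error bars are tight; with barn-door error bars, both models fit comfortably.

```python
# Toy data, roughly y = x + 1, with two competing models.
xs = [0, 1, 2, 3, 4]
ys = [1.0, 2.2, 2.9, 4.1, 4.8]

def flat(x):      # "no trend": constant at the grand mean
    return 3.0

def linear(x):    # "trend": straight line through the data
    return x + 1.0

def chi_square(model, sigma):
    """Sum of squared residuals, in units of the error bar sigma."""
    return sum(((y - model(x)) / sigma) ** 2 for x, y in zip(xs, ys))

for sigma in (0.2, 3.0):  # tight vs. barn-door error bars
    print(f"sigma={sigma}: flat chi2={chi_square(flat, sigma):.1f}, "
          f"linear chi2={chi_square(linear, sigma):.1f}")
```

With sigma = 0.2, the flat model's chi-square is enormous while the linear model's is small, so the data reject "no trend". With sigma = 3.0, both chi-square values drop to order one per data point, and either hypothesis can be declared consistent with the data, which is why imprecise measurements can be made to "support" a favored hypothesis.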
Whether or not you can design a “high quality acupuncture trial” may depend on how acupuncture figures into the research question, before we even get to questions of methodology. It seems to me that acupuncture advocates – determined to seek evidence of some “unique” physiological benefit from the practice – tend to frame their questions in ways that adoption of the recommendations from the BMJ panel might yield an illusion of quality, but nothing more.