Another reason why I’m leery of meta-analyses

The single most necessary task for a physician practicing science- and evidence-based medicine is the evaluation of the biomedical literature to extract from it just what science and the evidence support as the best medical therapy for a given situation. It is rare for the literature to be so clear on a topic that different physicians won’t come to at least somewhat different conclusions. Far more common is the situation where the studies are conflicting, although usually with a preponderance of studies tending to support one or two interventions more than others, or where there are few or only low quality studies (usually for diseases and conditions that are not very common and thus not easy to study in large randomized clinical trials). The entire paradigm of evidence-based medicine (EBM) is one manner of ranking the quality of evidence supporting a therapy, but even that is not without its problems, not the least of which is the relatively low weight it gives to basic scientific principles and prior probability. In essence, the paradigm of EBM ranks equivocal clinical trial evidence over well-established basic science, even when that basic science demonstrates a proposed intervention to be utterly ridiculous on the basis of well-established physics and chemistry, as in the case of homeopathy (1, 2, 3). That is why I have become more interested in the concept of “science-based” medicine, in which medicine is informed not just by clinical trial data, but by basic science as well.

One increasingly common method of trying to make sense of the morass of data addressing various clinical questions is the medical literature phenomenon known as meta-analysis. A meta-analysis is different from a clinical trial in that it is a statistical reanalysis of data from existing trials. Generally the highest quality trials are chosen in accord with the principles of EBM, and the data from these trials are pooled and analyzed together in order to provide, in essence, a more rigorous treatment of existing data than a systematic review of the literature. To be subject to meta-analysis, a medical question must have multiple studies addressing it, and the results must be quantitative. Better still is if the studies analyzed are of high quality (randomized, placebo-controlled, double-blind). Of course, when this is the case, if the studies trend in the same direction, a meta-analysis is not necessary. Most of the time, meta-analyses are done when there are conflicting data, in the hope that amalgamating the various studies will result in a statistically significant trend one way or another that will allow one to draw inferences about whether an intervention works.
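To make the idea of "lumping together" concrete, here is a minimal sketch of one common pooling approach, inverse-variance (fixed effect) pooling of log odds ratios. The trial counts are invented for illustration and do not come from any real study; the function names are likewise hypothetical.

```python
import math

def log_odds_ratio(deaths_treated, n_treated, deaths_control, n_control):
    """Return (log odds ratio, variance) for one trial's 2x2 table."""
    a, b = deaths_treated, n_treated - deaths_treated
    c, d = deaths_control, n_control - deaths_control
    log_or = math.log((a * d) / (b * c))
    variance = 1 / a + 1 / b + 1 / c + 1 / d  # standard large-sample variance
    return log_or, variance

def ci95(estimate, std_error):
    """Approximate 95% confidence interval on the log scale."""
    half_width = 1.96 * std_error
    return estimate - half_width, estimate + half_width

def fixed_effect_pool(trials):
    """Inverse-variance weighted average of the trials' log odds ratios."""
    results = [log_odds_ratio(*t) for t in trials]
    weights = [1 / var for _, var in results]
    pooled = sum(w * y for w, (y, _) in zip(weights, results)) / sum(weights)
    return pooled, math.sqrt(1 / sum(weights))

# Hypothetical trials: (deaths treated, n treated, deaths control, n control)
hypothetical_trials = [
    (10, 100, 16, 100),
    (8, 80, 12, 80),
    (15, 150, 22, 150),
    (12, 120, 19, 120),
]

for trial in hypothetical_trials:
    y, var = log_odds_ratio(*trial)
    lo, hi = ci95(y, math.sqrt(var))
    print(f"trial OR {math.exp(y):.2f} (95% CI {math.exp(lo):.2f} to {math.exp(hi):.2f})")

pooled, se = fixed_effect_pool(hypothetical_trials)
pooled_lo, pooled_hi = ci95(pooled, se)
print(f"pooled OR {math.exp(pooled):.2f} "
      f"(95% CI {math.exp(pooled_lo):.2f} to {math.exp(pooled_hi):.2f})")
```

With these made-up numbers, every individual trial's interval includes an odds ratio of 1, while the pooled interval does not. That is the appeal of pooling, and also the reason the choice of which trials go into the pool matters so much.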

Unfortunately, meta-analyses are a favorite of mavens of so-called “complementary and alternative medicine” (CAM), where they are frequently used to take several weak studies and try to make them strong by lumping them together, based on the apparent belief that lumping a bunch of weak studies together will somehow produce a strong result. It seldom works that way. Worse, one weak study with a strong result can have inordinate influence on the results, which makes study selection important–another huge potential source of bias if selection criteria are not spelled out prospectively or are insufficiently rigorous. More often, we get dubious meta-analyses purporting to show that acupuncture helps fertility. Alternatively, ideologues sometimes use meta-analyses to push ideologically-motivated “science,” such as this meta-analysis claiming that oral contraceptive pills increase the risk of breast cancer.

For these reasons and others, I’ve always been leery of meta-analyses, preferring instead to rely on my own reading of the literature. However, that isn’t always possible, particularly for questions that are not part of my expertise. Unfortunately, I’ve just come across a study that provides quantitative data about how prone to bias meta-analyses are. The study is a few months old, but DB’s Medical Rants pointed it out.

The investigators from McGill University took a clever approach. From the abstract:

We searched the literature for all randomized clinical trials (RCT) and review articles on the efficacy of intravenous magnesium in the early post-myocardial infarction period. We organized the articles chronologically and grouped them in packages. The first package included the first RCT, and a summary of the review articles published prior to first RCT. The second package contained the second and third RCT, a meta-analysis based on the data, and a summary of all review articles published prior to the third RCT. Similar packages were created for the 5th RCT, 10th RCT, 20th RCT and 23rd RCT (all articles). We presented the packages one at a time to eight different reviewers and asked them to answer three clinical questions after each package based solely on the information provided. The clinical questions included whether 1) they believed magnesium is now proven beneficial, 2) they believed magnesium will eventually be proven to be beneficial, and 3) they would recommend its use at this time.

What makes this study interesting is that the reviewers to whom these packages were sent had all published meta-analyses themselves and were thus experienced in interpreting them. In addition, each package was constructed based only on what was known at the time of the most recent randomized clinical trial in that package. Moreover, the analyses of the articles in each package were performed according to strict criteria:

Data were abstracted from the articles by a trained research assistant using standardized data abstraction forms, and verified by a second trained person. Differences were resolved by consensus. We assessed the quality of original manuscripts using the Jadad scale [16,17] and included the information in the reports to the reviewers (there was no a priori exclusion criteria or subgroup analysis). We had initially also used the Chalmers scale [17,18] but abandoned it when the reliability between data abstractors was very poor. After data abstraction, we conducted separate meta-analyses (comparison treatment was always placebo) based on the first RCT, the first 3 RCTs, 5 RCTs, 10 RCTs, 20 RCTs and 23 RCTs. At each time point, the reviewer was given a meta-analysis for mortality, and a separate meta-analysis for arrhythmias. Each meta-analysis included random and fixed effects analyses, a forest plot [8], cumulative forest plot [8], Galbraith plot [19], L’Abbe plot [20] and publication bias statistics and/or plots [8].

All of these aspects are standard statistical measures of the quality of studies included in the meta-analysis. If meta-analyses are objective analyses of the studies included in them, then we would expect that skilled investigators who do meta-analyses routinely would look at the same data, the same papers, and the same analyses and come to fairly similar conclusions. That’s not what happened, though. In fact, there was considerable heterogeneity in the interpretations of the same data, with some subjects concluding that magnesium was effective and some that it wasn’t, with a couple calling the evidence equivocal. Moreover, as the number of studies increased, so did the heterogeneity in interpretation and conclusion:

The discrepancies increased after 20 RCTs, when heterogeneity increased and the OR from the fixed effects and random effects models diverged; 1 reviewer strongly agreed the effect was beneficial, 4 reviewers agreed it was beneficial, and 3 reviewers disagreed it was beneficial. Similar discrepancies were observed when the reviewers were asked if they believed the treatment would eventually be proven beneficial. Finally, when asked if they would recommend the treatment, 4 reviewers fairly consistently said yes (excluding the meta-analysis based on 1 RCT), and 4 reviewers fairly consistently said no.

In other words, given the same studies and the same extracted and abstracted data, different investigators came to very different conclusions. This is in contrast to the dogma that tells us that meta-analyses represent the most objective method of reviewing large bodies of studies. This study casts considerable doubt on this contention, as the authors point out:

Although systematic reviews with meta-analyses are considered more objective than other types of reviews, our results suggest that the interpretation of the data remains a highly subjective process even among reviewers with extensive experience conducting meta-analyses. The implications are important. The evidence-based movement has proposed that a systematic review with a meta-analysis of RCTs on a topic provides the strongest evidence of support and that widespread adoption of its results should lead to improved patient care. However, our results suggest that the interpretation of a meta-analysis (and therefore recommendations) are subjective and therefore depend on who conducts or interprets the meta-analysis.

The significance of this study is that it doesn’t look at differences in the selection of studies for the meta-analysis or in the interpretation or extraction of data from the included studies. Every reviewer was given the same package, the same data, and the same statistical analyses of the included studies, thus eliminating those sources of variation. Even given that, reviewers still interpreted the results of the meta-analyses very differently. Not surprisingly, the more studies there were and the more heterogeneity there was between them, the more divergent the reviewers’ interpretations became. The results of this clever exercise provide just one more bit of evidence that leads me to believe that meta-analyses are nothing more than systematic reviews of the literature with attitude. That’s not to say that meta-analyses aren’t often useful. They are, in the same way that systematic reviews are: They boil down a large number of studies and suggest an interpretation. Let’s just not pretend that meta-analyses are so much more objective than a systematic review as to be considered anything more.
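The authors note that the discrepancies grew when the odds ratios from the fixed effects and random effects models diverged. A minimal sketch of why that can happen follows, using the DerSimonian-Laird estimator, which is one common random effects method; the study does not say which estimator it used, so that choice is an assumption. The effect sizes and variances are invented (four small trials showing benefit and one very large trial showing none) and are not the magnesium data.

```python
import math

def fixed_effect(effects):
    """Inverse-variance pooled estimate; effects is a list of (log OR, variance)."""
    weights = [1.0 / var for _, var in effects]
    pooled = sum(w * y for w, (y, _) in zip(weights, effects)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))

def dersimonian_laird(effects):
    """Random effects pooled estimate with the DerSimonian-Laird tau^2."""
    weights = [1.0 / var for _, var in effects]
    fixed, _ = fixed_effect(effects)
    # Cochran's Q measures how much the trials disagree beyond chance
    q = sum(w * (y - fixed) ** 2 for w, (y, _) in zip(weights, effects))
    df = len(effects) - 1
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    tau2 = max(0.0, (q - df) / c)  # between-trial variance, floored at zero
    # Adding tau^2 to every variance flattens the weights across trials
    re_weights = [1.0 / (var + tau2) for _, var in effects]
    pooled = sum(w * y for w, (y, _) in zip(re_weights, effects)) / sum(re_weights)
    return pooled, math.sqrt(1.0 / sum(re_weights)), tau2

# Hypothetical (log odds ratio, variance) pairs: four small positive trials
# and one very large trial with essentially no effect.
hypothetical_effects = [
    (-0.9, 0.20),
    (-0.7, 0.15),
    (-1.1, 0.25),
    (-0.8, 0.18),
    (0.05, 0.002),
]

fixed_est, fixed_se = fixed_effect(hypothetical_effects)
random_est, random_se, tau2 = dersimonian_laird(hypothetical_effects)
print(f"fixed effects OR:  {math.exp(fixed_est):.2f}")
print(f"random effects OR: {math.exp(random_est):.2f} (tau^2 = {tau2:.2f})")
```

In this invented example the fixed effects estimate is dominated by the one large trial and sits close to an odds ratio of 1, while the random effects estimate gives the small trials far more say and points toward benefit. Both numbers are legitimate outputs from the same data, and which one a reviewer chooses to trust is a judgment call.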


Shrier, I., Boivin, J., Platt, R.W., Steele, R.J., Brophy, J.M., Carnevale, F., Eisenberg, M.J., Furlan, A., Kakuma, R., Macdonald, M., Pilote, L., Rossignol, M. (2008). The interpretation of systematic reviews with meta-analyses: an objective or subjective process? BMC Medical Informatics and Decision Making, 8(1), 19. DOI: 10.1186/1472-6947-8-19

By Orac

Orac is the nom de blog of a humble surgeon/scientist who has an ego just big enough to delude himself that someone, somewhere might actually give a rodent's posterior about his copious verbal meanderings, but just barely small enough to admit to himself that few probably will. That surgeon is otherwise known as David Gorski.

That this particular surgeon has chosen his nom de blog based on a rather cranky and arrogant computer shaped like a clear box of blinking lights that he originally encountered when he became a fan of a 35 year old British SF television show whose special effects were renowned for their BBC/Doctor Who-style low budget look, but whose stories nonetheless resulted in some of the best, most innovative science fiction ever televised, should tell you nearly all that you need to know about Orac. (That, and the length of the preceding sentence.)

DISCLAIMER: The various written meanderings here are the opinions of Orac and Orac alone, written on his own time. They should never be construed as representing the opinions of any other person or entity, especially Orac's cancer center, department of surgery, medical school, or university. Also note that Orac is nonpartisan; he is more than willing to criticize the statements of anyone, regardless of political leanings, if that anyone advocates pseudoscience or quackery. Finally, medical commentary is not to be construed in any way as medical advice.
