I not infrequently use the term “methodolatry” to refer to the seeming belief, on the part of certain dogmatic evidence-based medicine (EBM) advocates so in love with the “pyramid of evidence” image frequently used in EBM to rank the strength of clinical evidence, that double-blinded placebo-controlled clinical trials are the be-all and end-all of clinical research. Obviously, I didn’t coin the term, but rather learned it from a certain epidemiologist by the ‘nym of revere, who used to be a fellow ScienceBlogger back in the day and defined “methodolatry” as “profane worship of the randomized clinical trial as the only valid method of investigation.” Ironically, the first time I encountered the term, way, way back in the day (11 years ago now!), was in the context of a risibly bad article in The Atlantic about Tom Jefferson and his work with the Cochrane Collaboration on the effectiveness of the influenza vaccine during the H1N1 pandemic. (Remember that pandemic from 2009-2010? We thought that one was pretty bad, but it seems quaint next to this year’s COVID-19 pandemic and its massive—and growing—death toll.) The reason revere (and I) accused Tom Jefferson of methodolatry back then was in part his annoying tendency to treat lack of statistical significance as affirmation of the null hypothesis, which is a serious interpretive error. He also basically failed to put randomized controlled trials (RCTs) into proper context with the totality of evidence. In any event, the reason I mention methodolatry and Jefferson again is an article and study that I came across co-authored by—you guessed it!—Tom Jefferson on masks and slowing the transmission of COVID-19. You probably also guessed that he referenced a recently published negative randomized controlled trial (RCT) of mask wearing to prevent COVID-19 as the be-all and end-all of evidence because it didn’t achieve statistical significance.
In the age of COVID-19, it seems, everything old is new again, including methodolatry.
Before I discuss the article, Landmark Danish study shows face masks have no significant effect, by Carl Heneghan and Tom Jefferson, and the Danish study on which it is based, let me just preempt one criticism that cranks are likely to throw back at me. I fully expect that they’ll accuse me of methodolatry myself, given my extreme skepticism early on that hydroxychloroquine is an effective treatment for COVID-19. My doubt was first based on the lack of, yes, RCTs and the reliance of hydroxychloroquine advocates on anecdotal and poor-quality observational evidence. In fact, my doubt was justified and ultimately validated when RCT after RCT failed to find a therapeutic or preventative effect of hydroxychloroquine on COVID-19. The point is this: RCTs are indeed considered the “gold standard” for determining whether a specific treatment intervention works, but when it comes to complex public health interventions, such trials might well be impossible to do in such a way as to give a good answer. Similarly, in some cases, RCTs are unethical. For example, the classic “vaxxed/unvaxxed” RCT that antivaxxers frequently advocate to determine whether vaccines cause autism or whether unvaccinated children are “healthier” would be utterly unethical, because randomizing children to the unvaccinated (or saline placebo) control group would leave them unprotected against potentially deadly vaccine-preventable diseases. To study such questions, we have to rely on epidemiology.
But back to Heneghan and Jefferson:
Do face masks work? Earlier this year, the UK government decided that masks could play a significant role in stopping Covid-19 and made masks mandatory in a number of public places. But are these policies backed by the scientific evidence?
Yesterday marked the publication of a long-delayed trial in Denmark which hopes to answer that very question. The ‘Danmask-19 trial’ was conducted in the spring with over 3,000 participants, when the public were not being told to wear masks but other public health measures were in place. Unlike other studies looking at masks, the Danmask study was a randomised controlled trial – making it the highest quality scientific evidence.
Around half of those in the trial received 50 disposable surgical face masks, which they were told to change after eight hours of use. After one month, the trial participants were tested using both PCR, antibody and lateral flow tests and compared with the trial participants who did not wear a mask.
In the end, there was no statistically significant difference between those who wore masks and those who did not when it came to being infected by Covid-19. 1.8 per cent of those wearing masks caught Covid, compared to 2.1 per cent of the control group. As a result, it seems that any effect masks have on preventing the spread of the disease in the community is small.
They even conclude:
And now that we have properly rigorous scientific research we can rely on, the evidence shows that wearing masks in the community does not significantly reduce the rates of infection.
This is not what the trial showed, or even what the authors of the trial were trying to show. Before coming back to Heneghan and Jefferson’s polemics, let’s take a look at the actual Danish study, published on Wednesday in the Annals of Internal Medicine, shall we? The publication, from a number of researchers at various Danish hospitals, reports the results of the DANMASK-19 trial (Danish Study to Assess Face Masks for the Protection Against COVID-19 Infection, ClinicalTrials.gov: NCT04337541).
The hypothesis to be tested was that wearing surgical masks outside of the home reduces the wearers’ risk for contracting COVID-19 “in a setting where masks were uncommon and not among recommended public health measures.” That last part is important, as this study was carried out early in the pandemic (April and May), before mask mandates became widespread. (More on that later.) In any event, let’s look at the trial’s endpoints:
The primary outcome was SARS-CoV-2 infection, defined as a positive result on an oropharyngeal/nasal swab test for SARS-CoV-2, development of a positive SARS-CoV-2 antibody test result (IgM or IgG) during the study period, or a hospital-based diagnosis of SARS-CoV-2 infection or COVID-19. Secondary end points included PCR evidence of infection with other respiratory viruses (Supplement Table 2).
These are not unreasonable endpoints for such a study, nor was the definition of COVID-19 infection unreasonable given what was known at the time. Before I get more into the weeds, let’s look at what the study found:
A total of 3030 participants were randomly assigned to the recommendation to wear masks, and 2994 were assigned to control; 4862 completed the study. Infection with SARS-CoV-2 occurred in 42 participants recommended masks (1.8%) and 53 control participants (2.1%). The between-group difference was −0.3 percentage point (95% CI, −1.2 to 0.4 percentage point; P = 0.38) (odds ratio, 0.82 [CI, 0.54 to 1.23]; P = 0.33). Multiple imputation accounting for loss to follow-up yielded similar results. Although the difference observed was not statistically significant, the 95% CIs are compatible with a 46% reduction to a 23% increase in infection.
Oh, no! A 1.8% infection rate in those wearing masks versus 2.1% in those not wearing masks, and it wasn’t even statistically significant! This must mean that masks don’t work, and that Jefferson’s methodolatry might in this case be correct! Not quite, and not so fast, there, pardner.
Was the trial underpowered, as Dr. Topol says? Let’s go to the tape and look at the power calculations:
The sample size was determined to provide adequate power for assessment of the combined composite primary outcome in the intention-to-treat analysis. Authorities estimated an incidence of SARS-CoV-2 infection of at least 2% during the study period. Assuming that wearing a face mask halves risk for infection, we estimated that a sample of 4636 participants would provide the trial with 80% power at a significance level of 5% (2-sided α level). Anticipating 20% loss to follow-up in this community-based study, we aimed to assign at least 6000 participants.
So the trial was designed and powered to look for a 50% decrease in risk for infection for the wearers, with an 80% power to detect such a decrease in risk if it existed. Now, you might wonder why the authors didn’t look at mortality from COVID-19. The reason, I surmise, is that detecting a decline in deaths due to COVID-19 would have taken a much larger sample size, given that, even in the heat of the early part of the pandemic, infection fatality rates were in the low single-digit percentages. In any event, a 50% decrease in risk to the wearer (the study didn’t even look at whether masks decreased the risk of transmission to others) would have been “quite a lot”! Of course, looking for less than a 50% decrease in risk would have required a lot more participants, depending on what level of decline in risk (e.g., 25% or 10%) associated with mask wearing one wanted to detect.
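The paper’s sample size arithmetic is easy to reproduce. As a sanity check, here is a minimal Python sketch of the standard two-proportion sample size formula under the trial’s stated assumptions (2% baseline incidence, a halving of risk to 1%, 80% power, two-sided α = 0.05). The exact figure depends on which variance approximation you use, but it lands right around the 4,636 participants the authors report:

```python
from math import ceil, sqrt
from statistics import NormalDist

def two_proportion_sample_size(p1, p2, alpha=0.05, power=0.80):
    """Per-group n to detect p1 vs. p2 with a two-sided z-test
    (normal approximation, pooled variance under the null)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ≈1.96
    z_beta = NormalDist().inv_cdf(power)           # ≈0.84
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

# DANMASK-19 assumptions: 2% incidence in controls, halved to 1% by masks
n_per_group = two_proportion_sample_size(0.02, 0.01)
total = 2 * n_per_group
print(total)              # 4638, essentially the 4636 the authors report
# With the anticipated 20% loss to follow-up:
print(ceil(total / 0.8))  # 5798, hence the target of "at least 6000"
```

Note how unforgiving the denominator is: halving the detectable effect (say, from a 50% to a 25% risk reduction) roughly quadruples the required sample size.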
Also, the study participants were only followed for one month after enrollment, with antibody testing performed at the beginning and end of the one month:
Participants in the mask group were instructed to wear a mask when outside the home during the next month. They received 50 three-layer, disposable, surgical face masks with ear loops (TYPE II EN 14683 [Abena]; filtration rate, 98%; made in China). Participants in both groups received materials and instructions for antibody testing on receipt and at 1 month. They also received materials and instructions for collecting an oropharyngeal/nasal swab sample for polymerase chain reaction (PCR) testing at 1 month and whenever symptoms compatible with COVID-19 occurred during follow-up. If symptomatic, participants were strongly encouraged to seek medical care. They registered symptoms and results of the antibody test in the online REDCap system. Participants returned the test material by prepaid express courier.
Written instructions and instructional videos guided antibody testing, oropharyngeal/nasal swabbing, and proper use of masks (Part 8 of the Supplement), and a help line was available to participants. In accordance with WHO recommendations for health care settings at that time, participants were instructed to change the mask if outside the home for more than 8 hours. At baseline and in weekly follow-up e-mails, participants in both groups were encouraged to follow current COVID-19 recommendations from the Danish authorities.
Why is this important? Because one month is a very short period of time if the development of antibodies to SARS-CoV-2 was the most common method by which COVID-19 was diagnosed in the study population, and it was: 84% (80 of 95) of diagnoses were made through antibody testing.
As an accompanying editorial by Thomas Frieden and Shama Cash-Goldwasser noted:
Perhaps the most important limitation of this study was the use of antibody tests to diagnose COVID-19. Of COVID-19 diagnoses in this study, 84% (80 of 95) were made by antibody testing. The accuracy of anti–SARS-CoV-2 antibody tests varies widely (7). Although an internal validation study of the assay used in DANMASK-19 estimated a specificity of 99.5%, the manufacturer reported (www.accessdata.fda.gov/cdrh_docs/presentations/maf/maf3285-a001.pdf) a specificity of 97.5% (CI, 91.3% to 99.3). If test specificity was 98.5% and the 1.5% (1 − specificity) chance of a false-positive result was due to random laboratory variation, Bayes’ law implies that all of the antibody-positive results in both intervention and control groups could have been false positives. False positivity due to cross-reactive antibodies would have resulted in baseline exclusion, so the actual rate of false positives in participants after 1 month may be low. Nevertheless, given the very low (at most 2%) prevalence of infection, many of the follow-up positives may have been falsely positive and would be randomly distributed between intervention and control groups. This would bias the study’s findings toward the null.
To put it more simply, in a situation in which the disease being tested for is present at low prevalence (in this case, less than 2% of the population), even a test that is quite sensitive and specific can produce a lot of false positives. I won’t go into the gory details (if you want discussions of sensitivity, specificity, and positive and negative predictive values, go here, here, and here), but what we are interested in in this case is the positive predictive value (PPV), which is the likelihood, given a positive test, that disease really is present. The PPV depends on the prevalence of the disease in the population being tested. To put it simply (but hopefully not simplistically), the lower the prevalence of a disease, the lower the positive predictive value of a test for that disease, even a good one, is likely to be, because even a low rate of false positives can be large (half as large, equally large, or even larger) relative to the actual prevalence of the disease in the population. The authors of the paper did not even try to correct their estimates for the sensitivity and specificity of the tests used.
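To make that concrete, here’s a quick back-of-the-envelope PPV calculation via Bayes’ theorem. The numbers are illustrative, not the trial’s actual assay characteristics: I’ve assumed a sensitivity of 90%, paired with the 98.5% specificity figure that Frieden and Cash-Goldwasser entertain, at a 2% prevalence:

```python
def ppv(prevalence, sensitivity, specificity):
    """Positive predictive value: P(disease | positive test), via Bayes."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Illustrative numbers: 2% prevalence, 90% sensitivity (my assumption),
# 98.5% specificity (the hypothetical figure from the editorial)
print(round(ppv(0.02, 0.90, 0.985), 2))  # 0.55: nearly half of positives are false
# The same test in a high-prevalence (20%) setting, for contrast
print(round(ppv(0.20, 0.90, 0.985), 2))  # 0.94
```

Same test, wildly different reliability of a positive result, purely as a function of how common the disease is in the tested population.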
Another issue is that this study was not even blinded, much less double-blinded. Of course, it would be difficult to design a double-blinded trial of mask wearing, although one might imagine the use of “placebo” masks that filter very little for the control group. Moreover, it’s true that not every intervention can be subjected to a blinded comparison to a “placebo” intervention, but RCTs can still be done and still be useful. Even so, this lack of blinding concerns me. Given that the study relied on e-mail follow-up surveys to “collect information on antibody test results, adherence to recommendations on time spent outside the home among others, development of symptoms, COVID-19 diagnosis based on PCR testing done in public hospitals, and known COVID-19 exposures,” it’s easy to see that recall bias could also be a major factor in whatever results were obtained. Similarly, there could be all sorts of other confounders associated with wearing a mask, none of which were really examined, although adherence to mask wearing was. Guess what? The results weren’t great:
Based on the lowest adherence reported in the mask group during follow-up, 46% of participants wore the mask as recommended, 47% predominantly as recommended, and 7% not as recommended.
So less than half of the masked group actually wore their masks as recommended; of the rest, most either wore them only “predominantly as recommended” or didn’t wear them as recommended at all. When a study is already underpowered, it doesn’t help if less than half of the experimental group actually performs the preventive intervention correctly all the time; that just makes it far more likely that the study will fail to find a statistically significant result.
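To see how poor adherence dilutes an intention-to-treat estimate, consider a toy calculation. The per-stratum effect sizes here are my assumptions, not anything measured in the study: suppose masks truly halve the wearer’s risk when worn correctly, confer half that benefit when worn “predominantly as recommended,” and do nothing when not worn as recommended.

```python
# Toy intention-to-treat dilution, using DANMASK-19's reported adherence
# breakdown but hypothetical per-stratum risks, given a 2% baseline risk.
control_risk = 0.02
strata = [
    (0.46, 0.010),  # wore mask as recommended: assume full 50% risk reduction
    (0.47, 0.015),  # "predominantly as recommended": assume half the benefit
    (0.07, 0.020),  # not as recommended: assume no benefit
]
mask_arm_risk = sum(frac * risk for frac, risk in strata)
observed_reduction = 1 - mask_arm_risk / control_risk
print(mask_arm_risk)       # ≈0.013, vs. 0.02 in the control arm
print(observed_reduction)  # ≈0.35: well short of the 50% the trial was powered for
```

Under these (made-up but plausible) assumptions, a genuinely effective mask would show up as only about a 35% risk reduction in the intention-to-treat analysis, an effect the trial was not powered to detect.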
The bottom line is that this study does not show that masks don’t work to slow the spread of SARS-CoV-2! Even the authors hasten to point this out in the discussion:
In this community-based, randomized controlled trial conducted in a setting where mask wearing was uncommon and was not among other recommended public health measures related to COVID-19, a recommendation to wear a surgical mask when outside the home among others did not reduce, at conventional levels of statistical significance, incident SARS-CoV-2 infection compared with no mask recommendation. We designed the study to detect a reduction in infection rate from 2% to 1%. Although no statistically significant difference in SARS-CoV-2 incidence was observed, the 95% CIs are compatible with a possible 46% reduction to 23% increase in infection among mask wearers. These findings do offer evidence about the degree of protection mask wearers can anticipate in a setting where others are not wearing masks and where other public health measures, including social distancing, are in effect. The findings, however, should not be used to conclude that a recommendation for everyone to wear masks in the community would not be effective in reducing SARS-CoV-2 infections, because the trial did not test the role of masks in source control of SARS-CoV-2 infection. During the study period, authorities did not recommend face mask use outside hospital settings and mask use was rare in community settings (22). This means that study participants’ exposure was overwhelmingly to persons not wearing masks.
The most important limitation is that the findings are inconclusive, with CIs compatible with a 46% decrease to a 23% increase in infection. Other limitations include the following. Participants may have been more cautious and focused on hygiene than the general population; however, the observed infection rate was similar to findings of other studies in Denmark (26, 30). Loss to follow-up was 19%, but results of multiple imputation accounting for missing data were similar to the main results. In addition, we relied on patient-reported findings on home antibody tests, and blinding to the intervention was not possible. Finally, a randomized controlled trial provides high-level evidence for treatment effects but can be prone to reduced external validity.
Our results suggest that the recommendation to wear a surgical mask when outside the home among others did not reduce, at conventional levels of statistical significance, the incidence of SARS-CoV-2 infection in mask wearers in a setting where social distancing and other public health measures were in effect, mask recommendations were not among those measures, and community use of masks was uncommon. Yet, the findings were inconclusive and cannot definitively exclude a 46% reduction to a 23% increase in infection of mask wearers in such a setting. It is important to emphasize that this trial did not address the effects of masks as source control or as protection in settings where social distancing and other public health measures are not in effect.
Which brings me back to Heneghan and Jefferson. Given that they are both EBM aficionados, you’d think that they wouldn’t have referred to this study as the “highest level of evidence,” given its many shortcomings. Let’s just put it this way: there are times when a well-designed, adequately powered epidemiological study is more valuable than an underpowered RCT with a lot of holes in its design (if you’ll excuse my using the word “holes” in the context of debates about masks), particularly when it comes to an issue as complex as preventing the spread of COVID-19, which is a multifactorial process in which multiple interventions interact with each other and also depend on the prevalence and rate of spread of the virus, as Christine Laine, Steven Goodman, and Eliseo Guallar note in another accompanying editorial:
Two aspects are important to note. First, the study examined the effect of recommending mask use, not the effect of actually wearing them. Adherence to public health recommendations is always imperfect, as it was in this study, and can differ dramatically in communities with different attitudes toward such recommendations. Second, the effect of a mask recommendation also depends on many other factors, including the prevalence of the virus, social distancing behaviors, and the frequency and characteristics of gatherings. Mask wearing is just one of several interacting strategies to reduce viral transmission, with each reinforcing the others.
They somewhat drolly note that mask wearing “by a minority of persons—even with high-quality surgical masks like the ones provided to trial participants—does not make the wearers invulnerable to infection.” Masks can, however, decrease one’s risk of infection and act as source control to decrease the risk of infection of others by persons currently infected with COVID-19.
Community mask use can substantially reduce risk for SARS-CoV-2 transmission, especially when enough people use them and when mask use is combined with other effective public health and social measures. Multiple observational studies have documented an association between mask mandates and reduced COVID-19 incidence (9). Although randomized controlled trials are often presumed to provide the highest-quality data, observational studies may in some settings be more accurate and can overcome some limitations of other data sources (10).
The CDC recently summarized the evidence that community use of cloth masks is an effective means to control the spread of COVID-19, and the evidence is compelling. That’s why it is disheartening to see Tom Jefferson engaging in methodolatry again and, embarrassingly for him, doing it rather badly, given how he misrepresents what this study actually found, as well as its actual usefulness in determining whether mask mandates are good public health policy.
Finally, prepare yourself for the antimask cranks using this study to “prove” that masks don’t work and are fascism/socialism/communism/deep state conspiracy (pick one or more). It’s already happening.