Believe it or not, I frequently peruse Retraction Watch, the blog that does basically what its title says: It watches for retracted articles in the peer-reviewed scientific literature and reports on them. Rare is it that a retracted paper gets by the watchful eyes of the bloggers there. So it was that the other day I noticed a post entitled Journal temporarily removes paper linking HPV vaccine to behavioral issues. I noticed it mainly because it involves a paper by two antivaccine “researchers” whom we’ve met several times before, Christopher A. Shaw and Lucija Tomljenovic in the Department of Ophthalmology at the University of British Columbia. Both have a long history of publishing antivaccine “research,” mainly falsely blaming the aluminum adjuvants in vaccines for autism and, well, just about any health problem children have and blaming Gardasil for premature ovarian failure and all manner of woes up to and including death. Shaw was even prominently featured in the rabidly antivaccine movie The Greater Good.
Normally, Shaw and Tomljenovic tend to publish their antivaccine spew in bottom-feeding journals, but what also caught my attention was that this time they seemed to have managed to score a paper in a journal with a good reputation and a reasonable impact factor: Vaccine. Here’s the story from Retraction Watch:
The editor in chief of Vaccine has removed a paper suggesting a human papillomavirus (HPV) vaccine can trigger behavioral changes in mice.
The note doesn’t provide any reason for the withdrawal, although authors were told the editor asked for further review.
Two co-authors on the paper — about Gardasil, a vaccine against HPV — have previously suggested that aluminum in vaccines is linked to autism, in research a World Health Organization advisory body concluded was “seriously flawed.”
Approximately 80 million doses of Gardasil were administered in the U.S. between 2006 and 2015. Both the WHO and the Centers for Disease Control and Prevention have ruled the vaccine to be safe — the CDC, for instance, calls it “safe, effective, and recommended.”
The journal published an uncorrected proof of “Behavioral abnormalities in young female mice following administration of aluminum adjuvants and the human papillomavirus (HPV) vaccine Gardasil” online on January 9th, 2016. In its place now is a note that says:
The publisher regrets that this article has been temporarily removed. A replacement will appear as soon as possible in which the reason for the removal of the article will be specified, or the article will be reinstated.
Since the article had not yet been officially published in the journal, it’s not indexed by Thomson Scientific’s Web of Knowledge.
Curiouser and curiouser. Fortunately for me, an online bud happened to have downloaded the paper and supplied me with the PDF. I say this because, as noted above, the paper is no longer available on the Vaccine website. Its corresponding author, it turns out, is Yehuda Shoenfeld, whom we have also met before and have encountered speaking at antivaccine conferences. In antivaccine circles, Shoenfeld is best known for coining the term “ASIA” (“Autoimmune/Inflammatory Syndrome Induced by Adjuvants”). Did I say “coin the term”? Really, I should have said “pulled the term out of his nether regions, leaving a coating of what one’s nether regions generally expels all over it.” Because ASIA is a made-up syndrome with no compelling evidence that it’s real. Basically, as I’ve described before, its criteria are so vague as to be able to be applied to almost anything.
Oh, and he edited a journal very sympathetic to antivaccine “science.”
So right off the bat, I knew something fishy was going on here, and, thanks to the person who supplied me with a PDF of the article, I knew I had to take a look to see if I could figure out what happened. What I can say, having read the article, is that it is so shoddily done that its acceptance by a journal as good as Vaccine represents a massive failure of peer review. I don’t know which editor this manuscript was assigned to (clearly it wasn’t Vaccine‘s editor-in-chief Gregory Poland, as it was he who requested temporary removal of the paper), but whoever it was should hang his or her head in shame and resign in disgrace from the editorial board of the journal.
I really have to wonder if this travesty is the result of the problem of “fake peer review.” This is a problem that has come to light recently that takes advantage of the practice of some journals to allow authors to suggest peer reviewers for their manuscript. I perused Vaccine‘s Instructions for Authors and found that the journal encourages authors to suggest up to five potential reviewers. I’ve described the problem before, and Vaccine editors ought to stop this practice immediately. One can’t help but wonder who reviewed this manuscript. One almost has to wonder if it was Andrew Wakefield, given how bad the manuscript is.
I’ll show you what I mean. Skeptical Raptor has already discussed this study. He’s also done an excellent job of laying out the safety studies involving millions of doses of Gardasil that show that Gardasil is incredibly safe. His post is almost enough. Almost. However, in this discussion, I have the advantage of having the actual PDF of the paper in front of me. As good as the Raptor is, he lacked that.
Since the paper isn’t currently available, and I’m not about to violate copyright by making the whole PDF available to anyone who wants it, let’s look at the abstract, and then I’ll discuss the actual paper’s findings, such as they are:
Vaccine adjuvants and vaccines may induce autoimmune and inflammatory manifestations in susceptible individuals. To date most human vaccine trials utilize aluminum (Al) adjuvants as placebos despite much evidence showing that Al in vaccine-relevant exposures can be toxic to humans and animals. We sought to evaluate the effects of Al adjuvant and the HPV vaccine Gardasil versus the true placebo on behavioral and inflammatory parameters in young female mice. Six week old C57BL/6 female mice were injected with either Gardasil, Gardasil + pertussis toxin (Pt), Al hydroxide, or vehicle control in amounts equivalent to human exposure. At six months of age, Gardasil and Al-injected mice spent significantly more time floating in the forced swimming test (FST) in comparison to vehicle-injected mice (Al, p = 0.009; Gardasil, p = 0.025; Gardasil + Pt, p = 0.005). The increase in floating time was already highly significant at three months of age for the Gardasil and Gardasil + Pt group (p ≤ 0.0001). No significant differences were observed in the number of stairs climbed in the staircase test nor in rotarod performance, both of which measure locomotor activity. Since rotarod also measures muscular strength, collectively these results indicate that differences observed in the FST were not due to locomotor dysfunction, but likely due to depression. Additionally, at three months of age, compared to control mice, Al-injected mice showed a significantly decreased preference for the new arm in the Y maze test (p = 0.03), indicating short-term memory impairment. Moreover, anti-HPV antibodies from the sera of Gardasil and Gardasil + Pt-injected mice showed cross-reactivity with the mouse brain protein extract. Immunohistochemistry analysis revealed microglial activation in the CA1 area of the hippocampus of Gardasil-injected mice compared to the control.
It appears that Gardasil via its Al adjuvant and HPV antigens has the ability to trigger neuroinflammation and autoimmune reactions, further leading to behavioral changes.
I’m quite familiar with C57BL/6 mice. Indeed, I cut my research teeth, so to speak, studying mouse tumor models and the induction of angiogenesis (new blood vessel growth) back in the late 1990s. Now, right from the beginning, I have to wonder who the peer reviewers were to have accepted statements like those in the very first few sentences; i.e., claims that vaccine adjuvants and vaccines can cause autoimmune disease (basically the only people claiming adjuvants can cause autoimmunity are Shoenfeld and antivaccine “scientists” associated with him) and that aluminum adjuvants are toxic to humans at vaccine-relevant doses (no, they aren’t).
So what did the authors do? They basically tested whether Gardasil or its aluminum adjuvant cause behavioral issues in mice. Upon perusing the abstract, my first question was this: Why this hypothesis? Why do the authors think that HPV vaccine might cause behavioral changes? We don’t really learn why. Their reasoning seems to be “Because vaccines are bad” or “because aluminum is bad.” Certainly they don’t present any publications with convincing evidence supporting a potential link between vaccines or aluminum adjuvants and behavioral problems. Now, if all this study were doing were testing vaccines or adjuvants in cultured cells, this wouldn’t be such a big deal. However, if you’re going to subject animals to pain and distress and then kill them at the end to look at their brains, you need some compelling evidence to show (1) why your hypothesis is so compelling and (2) why only an animal model can answer the question you want to answer. Shoenfeld’s group utterly fails at this. I can only speculate that animal research regulations must be more lax in Israel than they are in the US.
So let’s look at the tests these poor mice were subjected to:
- The forced swimming test (FST): This is a test purported to model depression. How one models depression in a mouse, I don’t know, but there is literature to support this use. In any case, increased floating time (not swimming) is indicative of depressive behavior and can also indicate locomotor dysfunction. In these experiments: “mice were placed in individual glass beakers with water 15 cm deep at 25 °C. On the first day, mice were placed in the cylinder for a pretest session of 10 min, and later were removed from the cylinder, and then returned to their home cages. Twenty-four hours later (day 2), the mice were subjected to a test session for 6 min. The behavioral measure scored was the duration (in seconds) of immobility or floating, defined as the absence of escape-oriented behaviors, such as swimming, jumping, rearing, sniffing or diving, recorded during the 6 min test.”
- Staircase test: This test is supposed to measure anxiety, as more anxious mice won’t explore as much. In these experiments: “The staircase maze consisted of a polyvinyl chloride enclosure with five identical steps, 2.5 × 10 × 7.5 cm. The inner height of the walls was constant (12.5 cm) along the whole length of the staircase. The box was placed in a room with constant lighting and isolated from external noise. Each mouse was tested individually. The animal was placed on the floor of the staircase with its back to the staircase. The number of stairs climbed and the number of rears were recorded during a 3-min period. Climbing was defined as each stair on which the mouse placed all four paws; rearing was defined as each instance the mouse rose on hind legs (to sniff the air), either on the stair or against the wall. The number of stairs descended was not taken into account. Before each test, the animal was removed and the box cleaned with a diluted alcohol solution to eliminate smells.”
- Novel object recognition test: This is a visual recognition memory test that involves measuring the time spent exploring each object.
- Y maze test: This test is used to assess spatial short term memory and involves blocking one arm of the Y-maze in the first trial and then assessing the mouse’s memory on subsequent tests. Rationale: “A normal cognitively non-impaired mouse is expected to recognize the old arm as old and spend more time in the new arm.”
- Rotarod test: The rotarod tests general motor function and motor learning and measures the time that a mouse can remain walking on a rotating axle without either falling or clenching onto the axle.
So, basically, the investigators injected mice with either saline (negative control), adjuvant only, Gardasil, or Gardasil and pertussis toxin (presumably the positive control), and there were 19(!) animals per experimental group, which is a huge number for most animal studies. The various injections were scaled down from an estimated 40 kg teenaged girl to a 20 g mouse (a 2000-fold difference). The mice received three injections, spaced one day apart. The behavior of the mice, as measured by the parameters above, was then examined at three and six months.
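For concreteness, here's the scaling arithmetic as described, in a quick sketch (the 40 kg and 20 g figures are the paper's stated assumptions, and the final comparison to a single human dose is my own illustration):

```python
# Sanity check of the dose scaling described in the methods:
# human doses scaled down by body weight from an assumed 40 kg
# teenaged girl to a 20 g (0.02 kg) mouse.
human_weight_kg = 40.0
mouse_weight_kg = 0.020

scale_factor = human_weight_kg / mouse_weight_kg
print(scale_factor)  # 2000.0 -- each mouse injection is 1/2000 of a human dose

# Three injections, spaced one day apart, per the methods:
injections = 3
total_fraction_of_human_dose = injections / scale_factor
print(total_fraction_of_human_dose)  # 0.0015
```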
One thing I looked very carefully for in the methods section was something I always look very carefully for in any study of complementary and alternative medicine (CAM). Can you guess what it is? If you’re a regular, if you’ve been reading for a while, I hope that I’ve imparted enough of my skepticism for you to know right away what I’m talking about. That’s right; I looked for any evidence of blinding of observers. I found none. Why is blinding so important? Easy! These measurements are not entirely objective. An observer has to watch the mouse being tested and decide, for instance, what constitutes “immobility” versus “escape-oriented” behaviors in the FST. Without blinding, the observer knows which experimental group each mouse is in, and subtle biases in observation can creep in, without the observer even knowing it. If I had been a reviewer for this paper, I would have immediately noted the lack of any mention of blinding and demanded that the authors clarify whether the observers were blinded or not. If they were not, I would reject the paper.
There’s another fatal flaw in the paper as well. Take a look at the statistics section:
Results are expressed as the mean ± SEM. The differences in mean for average immobility time in the FST, the staircase test parameters (number of rearing and stair-climbing events), novel object recognition and Y maze tests were evaluated by t-test. Significant results were determined as p < 0.05.
Those of you unfamiliar with basic statistics (and, believe me, the problem with this passage is very, very basic) won’t recognize the problem, but those of you who’ve taken a basic statistics course will recognize immediately what the problem is here. What is the t-test? No doubt the authors are referring to Student’s t-test, which is a test designed to look for differences between two groups. Now, how many groups are being tested? Yes, indeed! It’s four. In other words, it’s a number of experimental groups for which Student’s t-test was never intended. What is the significance of this? Basically, it means that the authors must have compared, pairwise, all the groups. So what? you might ask. Here’s the problem. The more comparisons you make, the greater the chance of finding a “statistically significant result” by random chance alone. That’s why other statistical tests were developed, specifically the ANOVA test, which would have been the correct test to use to analyze these data. That’s another thing I would have insisted on the authors redoing, if I had been a reviewer.
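To make the multiple-comparisons problem concrete: four groups allow six possible pairwise comparisons, and if each is tested at α = 0.05, the chance of at least one false positive balloons. A minimal sketch (treating the comparisons as independent for illustration; in reality pairwise tests share data and are correlated, but the inflation is the point):

```python
from math import comb

alpha = 0.05
groups = 4
comparisons = comb(groups, 2)  # 6 pairwise comparisons among 4 groups

# Family-wise error rate if every comparison is tested at alpha = 0.05,
# treating the tests as independent for illustration.
fwer = 1 - (1 - alpha) ** comparisons
print(comparisons)     # 6
print(round(fwer, 3))  # 0.265 -- roughly a 1-in-4 chance of a spurious "hit"
```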
And I would have done it with extreme prejudice, as I have done before in papers I’ve been asked to review that didn’t grasp this very basic bit of statistics.
Now, this might not matter for comparisons for which the calculated p-value is very low, but there are a lot of p-values like 0.02, 0.03, etc., with the cutoff for statistical significance being 0.05. Combine the failure to use an appropriate statistical test with the appropriate post-test to correct for multiple comparisons with the lack of blinding for the behavioral tests, and these p-values are almost certainly not significant. The differences in the FST came with p-values in the range of 0.0001 to 0.025; so some of the differences are probably significant, even if the correct statistical test were used, while others (such as the control versus Gardasil at 6 months, for which p=0.025) are almost certainly not. Regardless, given the apparent lack of blinding of the observers for the behavioral measures, even the differences reported that would likely stand up to an actual correct statistical analysis are very much questionable.
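A crude way to see which of the reported p-values could even plausibly survive a correction for multiple comparisons is a Bonferroni adjustment (just a sketch; ANOVA with a proper post-hoc test would be the correct analysis, and none of this fixes the blinding problem):

```python
# Bonferroni-corrected significance threshold for the six pairwise
# comparisons possible among the four experimental groups.
alpha = 0.05
n_comparisons = 6
threshold = alpha / n_comparisons  # ~0.0083

# p-values reported in the abstract for the FST comparisons.
reported = {
    "Al vs vehicle (6 mo)": 0.009,
    "Gardasil vs vehicle (6 mo)": 0.025,
    "Gardasil+Pt vs vehicle (6 mo)": 0.005,
    "Gardasil vs vehicle (3 mo)": 0.0001,
}
for label, p in reported.items():
    verdict = "survives" if p < threshold else "fails"
    print(f"{label}: p = {p} -> {verdict} Bonferroni at {threshold:.4f}")
```

Note that the headline 6-month Gardasil result (p = 0.025) fails even this simple correction.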
The authors did additional experiments that demonstrated that—surprise! surprise!—Gardasil does induce antibodies to the HPV L1 protein. Amazing. The vaccine works, even in mice! The authors also looked at brain sections of one quarter of the mice in each experimental group, staining brain tissue sections for Iba-1, which is a microglia/macrophage-specific protein that participates in membrane ruffling and phagocytosis in activated microglia; presumably in this case it was being looked at as a measure of inflammation. Of course, if I were reviewing the paper I would have insisted on other measures of inflammation besides staining for just one protein. Basically, they harvested five mice out of each group every month and sectioned their brains. That means a comparison of four groups, each with five mice in them. That’s not a particularly robust sample to produce statistically significant results. I’m also a bit cynical about the quantitative claims made for the measurement of Iba-1, given that the selection of “areas of interest” was also manual and apparently also not blinded.
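For a sense of just how small n = 5 per group is, here's a back-of-the-envelope sample-size calculation using the standard normal approximation (a rough sketch, not the exact noncentral-t computation; the effect sizes are hypothetical):

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sample, two-sided comparison,
    using the standard normal approximation to the power calculation."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)

print(n_per_group(1.0))  # ~16 per group even for a "large" effect (d = 1.0)
print(n_per_group(1.8))  # n = 5 per group only reliably detects enormous effects
```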
Even given that, the authors found no statistically significant difference between control and Gardasil (p=0.06) but did find a difference between the aluminum adjuvant group and the Gardasil group (p=0.017), which doesn’t make a lot of sense if it is the adjuvant that is being blamed for the behavioral changes. Again, this is almost certainly a negative result, given that the authors didn’t do the correct statistical test. None of this is to say that there was intentional cooking of the data to give a desired result. However, no observer is entirely objective. That’s why blinding of observers is so important in behavioral experiments and experiments that involve human choice of areas to measure on images, even when software is being used to qualify the staining, as studies of immunohistochemistry often do.
Basically, this study is worthless, as it’s unblinded and doesn’t use the correct statistical analysis. Had I been a reviewer, I would have pointed these issues out and recommended rejecting the paper. I can see why Dr. Poland was probably horrified to discover that this paper was published in his journal. Perhaps he should ask himself how such a travesty could have been published in his journal.