Regular readers probably know that I’m into more than just science, skepticism, and promoting science-based medicine (SBM). (If they’re regular readers of my not-so-super-secret other project, they might also realize that they’ve seen this post elsewhere before. I had to stay out late for a work-related event and decided to tart it up and recycle. So sue me.) I’m also into science fiction (hence the very name of this blog, not to mention the pseudonym I use), computers, and baseball, not to mention politics (at least more than average). That’s why our recent election, coming as it did hot on the heels of the World Series in which my beloved Detroit Tigers, alas, utterly choked and got swept away by the San Francisco Giants, got me to thinking. Actually, it was more than just that. It was also an article that appeared a couple of weeks before the election in the New England Journal of Medicine entitled Moneyball and Medicine, by Christopher J. Phillips, PhD, Jeremy A. Greene, MD, PhD, and Scott H. Podolsky, MD. In it, they compare what they call “evidence-based” baseball to “evidence-based medicine,” something that is not as far-fetched as one might think.
“Moneyball,” as baseball fans know, refers to a book by Michael Lewis entitled Moneyball: The Art of Winning an Unfair Game. Published in 2003, Moneyball is the story of the Oakland Athletics and their general manager Billy Beane and how the A’s managed to field a competitive team even though the organization was, shall we say, “revenue challenged” compared to big market teams like the New York Yankees. The central premise of the book was that the collective wisdom of baseball leaders, such as managers, coaches, scouts, owners, and general managers, was flawed and too subjective. Using rigorous statistical analysis, the A’s front office determined various metrics that were better predictors of offensive success than previously used indicators. For example, conventional wisdom at the time valued stolen bases, runs batted in, and batting average, but the A’s determined that on-base percentage and slugging percentage were better predictors, and cheaper to obtain on the free market, to boot. As a result, the 2002 Athletics, with a payroll of $41 million (the third lowest in baseball), were able to compete against teams like the Yankees, which had a payroll of $125 million. The book also discussed the A’s farm system and how it determined which players were more likely to develop into solid major league players, as well as the history of sabermetric analysis, a term coined by one of its pioneers, Bill James, after SABR, the Society for American Baseball Research. Sabermetrics is basically concerned with determining the value of a player or team in current or past seasons and with predicting the value of a player or team in the future.
There are a lot of parallels between moneyball and “evidence-based medicine” (EBM), as you might imagine, as Phillips et al point out:
In both medicine and baseball, advocates of evidence-based approaches argued for the enhanced vision of statistical techniques, which revealed what tradition or habit had obscured. The difference between an all-star and an average hitter, for example, works out to about one hit every other week, a distinction that’s almost impossible for even a trained scout to recognize. Statistical power can be as relevant as opposite-field hitting power in the assessment of players. Early proponents of controlled medical trials similarly pointed to how difficult it was for an individual practitioner to determine a treatment’s efficacy or distinguish real effects from apparent ones after seeing only a small number of clinical cases. Mathematical measurements and calculations were meant to push practitioners away from naive visual biases — a player who “looks right” or a therapy that seems to work. Walks are far more important than they first appear in baseball; walking is more important than it first appears in medicine.
Moneyball has also entered politics in a big way over the election cycles of 2008, 2010, and 2012. In the run-up to the 2012 election, I, like many others, became hooked on FiveThirtyEight, a blog devoted to applying rigorous statistical analysis to the polls. (FiveThirtyEight refers to the number of votes in the Electoral College.) As political junkies (and even many casual observers) know, the man responsible for the blog, Nate Silver, got his start as a “moneyball”-style sabermetrics baseball analyst. In 2002, he developed a model to assess and predict a baseball player’s performance over time, known as PECOTA, which stands for “Player Empirical Comparison and Optimization Test Algorithm.” Silver brought his model to Baseball Prospectus. Several years later, he was applying his statistical methods to the 2008 election, and the rest is history. Indeed, in this year’s election, Silver correctly called all 50 states.
Silver’s model basically works by looking at polling data from a wide variety of polls, particularly state-level polls, and weighting each poll by how recent it is, its sample size, and the pollster’s history of accuracy. The polling data are then used to calculate an adjusted polling average subject to trend line adjustment, house effects adjustment, and likely voter adjustment, after which the average is adjusted further based on various factors, such as a state’s Partisan Voting Index. Ultimately, Silver takes his aggregated poll data and uses them to run simulations that estimate the likelihood of a given outcome. That’s how he came up with his famous estimates of how likely various scenarios were. For example, the day before the election Silver was estimating that Mitt Romney had only an 8% chance of winning the Electoral College. Republicans howled. We all know the results of the election and that President Obama won reelection more easily than the conventional wisdom had been predicting. Silver also had an excellent track record predicting the Republican landslide in the 2010 midterm elections. Back then, Democrats howled.
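For the code-inclined, here is a toy sketch of the two core ideas: a weighted polling average and a simulation of outcomes. To be clear, this is not Silver’s actual model. The weighting formula (square root of sample size, a 14-day recency half-life, a 0-to-1 pollster rating), the error sizes, and all of the poll and state numbers are assumptions I made up purely for illustration, and the sketch leaves out the trend line, house effects, and likely voter adjustments entirely.

```python
import math
import random


def poll_weight(sample_size, days_old, rating, half_life_days=14.0):
    """Hypothetical weight: bigger, newer, better-rated polls count for more."""
    recency = 0.5 ** (days_old / half_life_days)  # halves every half_life_days
    return math.sqrt(sample_size) * recency * rating


def weighted_margin(polls):
    """Weighted average of poll margins (candidate A minus candidate B, in points)."""
    weights = [poll_weight(p["n"], p["days_old"], p["rating"]) for p in polls]
    return sum(w * p["margin"] for w, p in zip(weights, polls)) / sum(weights)


def win_probability(states, needed, n_sims=10000, sigma=3.0, seed=0):
    """Fraction of simulated elections in which candidate A reaches `needed` votes.

    Each simulation adds one error shared by all states plus an independent
    error per state; both error sizes (sigma) are assumed, not estimated.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    wins = 0
    for _ in range(n_sims):
        shared_error = rng.gauss(0.0, sigma)
        votes = 0
        for state in states:
            simulated = state["margin"] + shared_error + rng.gauss(0.0, sigma)
            if simulated > 0:
                votes += state["votes"]
        if votes >= needed:
            wins += 1
    return wins / n_sims


if __name__ == "__main__":
    # Made-up polls for a single made-up state.
    polls = [
        {"margin": 3.0, "n": 900, "days_old": 2, "rating": 0.9},
        {"margin": -1.0, "n": 400, "days_old": 20, "rating": 0.6},
    ]
    # Made-up three-state "country" where 20 of 35 votes wins.
    states = [
        {"margin": weighted_margin(polls), "votes": 15},
        {"margin": 6.0, "votes": 10},
        {"margin": -4.0, "votes": 10},
    ]
    print(win_probability(states, needed=20))
```

The point of the simulation step is that a modest but consistent lead in the polling average can translate into a large probability of winning, which is why a race that pundits call a “toss-up” can look quite lopsided in a model like this.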
Not surprisingly, there was considerable resistance in baseball to “moneyball” at first. The “old guard” initially didn’t like the implication that statistical modeling could judge the value of a player better than scouts, managers, and the front office. Moreover, the statistical predictions made by “moneyball”-inspired sabermetric analysis often clashed with conventional wisdom in baseball. This year we were treated to a similar spectacle, in which Nate Silver unwittingly became a political flashpoint because his numbers at no point over the last couple of months favored Romney. This led to considerable resistance to the statistically-based methods of not just Nate Silver, but Sam Wang of the Princeton Election Consortium and others as well. The highest Romney’s estimated probability of winning the Electoral College according to Silver’s model ever got was 39% on October 12. That’s actually not bad, but Obama was the favorite in Silver’s model and remained the favorite up until the end. So different was Silver’s prediction from the conventional wisdom (a super-tight election that would not be called until very late on election night or might even end up with Obama winning the Electoral College but losing the popular vote) and Republican expectations (a Romney win), that on election night when the networks called Ohio for Obama not long after 11 PM, Karl Rove initially refused to accept the outcome, even though Fox News had also called it. Overall, predicting this election ended up being portrayed as a battle between data-driven nerds and ideological political pundits.
Of course, doctors are not baseball managers or ideologically-driven political pundits. Or, at least, so we would like to think. However, we are subject to the same sorts of biases as anyone else, and, unfortunately, many of us put more stock in our impressions than we do in data. Overcoming that tendency is the key challenge physicians face in embracing EBM, much less SBM. It doesn’t help that many of us are a bit too enamored of our own ability to analyze observations. As I’ve pointed out time and time again, personal clinical experience can be profoundly misleading, no matter how much it might be touted by misguided physicians like, for example, Dr. Jay Gordon, who thinks that his own personal observations, which lead him to believe that vaccines cause autism, trump the weight of multiple epidemiological studies that do not support that belief. The same sort of dynamic occurs when it comes to “alternative” medicine (or “complementary and alternative medicine” or “integrative medicine” or whatever CAM proponents like to call it these days). At the individual level, placebo effects, regression to the mean, confirmation bias, observation bias, confusing correlation with causation, and a number of other factors can easily mislead one.
Yet, in medicine there remains resistance to EBM. Indeed, there is even an organization of physicians that explicitly rejects EBM and frequently publishes screeds against it in its major publication. This organization is known as the Association of American Physicians and Surgeons, and I’ve written about it before. For example, a former president of the AAPS, Lawrence Huntoon, MD, once dismissed EBM, as represented by clinical care pathways, systems-based care, and other products of EBM, as “medical herdology.” Other articles carry titles such as Evidence-Based Guidelines: Not Recommended. In that article, its author, Dr. Norman Latov, writes:
On its face, evidence-based medicine is just what the doctor ordered. What rational person would argue that medical decisions should not be based on evidence?
Upon closer examination, however, the term is deceptive. Evidence-based guidelines (EBGs) in fact only use evidence from controlled trials, and deny other types of evidence or clinical judgment, thereby distorting the decision process.
Indeed, when you read various critiques of EBM, they are almost inevitably different from our critiques of EBM (namely that EBM doesn’t adequately take prior probability into account). There is also a difference between CAM practitioners and apologists and doctors who think they support evidence and science in medicine. For instance, when most doctors criticize EBM, they tend to attack it because they see it as limiting their “clinical judgment” or on the basis that there are things that “EBM can’t account for.” (See the parallel here between baseball and medicine?) While it is true that there are things that EBM can’t account for, that doesn’t necessarily mean that personal experience does better. Indeed, when it’s tested rigorously, it tends not to. As for CAM practitioners, I tend to liken them to political operatives: Driven by ideology more than science, rather than simply being used to old ways of doing things, like baseball leaders in the age before “moneyball.” For instance, take a look at Andrew Weil, who most definitely does not like EBM, going so far as to write:
RCTs have dominated decision making about efficacy in health care for almost 50 years. Many researchers have explored the difficulty of subjecting IM treatment approaches to RCTs. There are some characteristics of IM interventions that make RCTs particularly difficult to carry out, and perhaps even less relevant, than for conventional allopathic medicine. As Fønnebø pointed out, the gap between published studies of integrative approaches on the one hand, and the clinical reports by practitioners on the other hand, may partially result from the fact that placebo-controlled RCTs are designed to evaluate pharmaceutical interventions.
One can almost see Karl Rove railing against Fox News analysts about calling Ohio at 11:15 PM on election night 2012. (And what an amusing sight that was!) Whatever the reasons, be they ideological, a misguided belief in their own “clinical judgment,” or just plain cussedness, there are a lot of doctors who don’t like EBM, and there are a whole lot of CAM practitioners that don’t like SBM, because CAM exploits the blind spot in EBM. But let’s get back to the analogy between moneyball and EBM:
Critics of moneyball approaches have nonetheless been quick to emphasize the way in which perspective can be distorted, not enhanced, by statistics. One might overapply concepts such as Bayes’ theorem or develop a habit of plugging data into statistical software simply to gain a patina of precision, regardless of appropriateness (tendencies that cause medical practitioners, in Alvan Feinstein’s pithy phrase, to be blinded by the “haze of Bayes”).3 Critics have also pointed to what might be termed the “uncertainty principle” of statistical analysis: general data (How well does this player hit against left-handers? How well does this therapy work in myocardial infarction?) often fail to take into account consequential distinctions; but more specific data (How well does this player hit against hard-throwing left-handers on warm Sunday afternoons in late September? How well does this therapy work in right-sided myocardial infarction in postmenopausal women?) can involve too few cases to be broadly useful. Individuals, and individual scenarios, might always be idiosyncratic on some level — a truth perhaps borne out by long-standing efforts to appropriately apply the scientific results of clinical trials to individual patients in the clinic.
Here’s where Phillips et al go wrong. As we have pointed out here on this very blog many, many times before, the problem with EBM is not that it overapplies Bayes’ theorem. Rather, it’s that EBM favors frequentist statistics over Bayesian statistics, often massively underplaying the importance of prior plausibility, estimates of which Bayesian statistics demands. Indeed, not making Bayesian estimates of prior plausibility leads to EBM’s blind spot towards CAM, such that statistical noise and bias in trial design can lead to the appearance of efficacy for therapies whose rationale, to be true, would require that much of modern physics and chemistry be not just wrong but massively wrong. In EBM, “plausibility bias” (or, as I like to call it, reality bias) plays only a relatively minor role, and that allows noise and bias in clinical trials of, in essence, nothing (e.g., homeopathy or reiki) to appear to be “statistically significant” effects.
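To make the point about prior plausibility concrete, here is a minimal sketch of the standard Bayes’ theorem calculation for how likely it is that a “statistically significant” trial result reflects a real effect. The function name and the numbers (80% power, a 5% false positive rate, and the two priors) are my own illustrative assumptions, not figures from any study, and the calculation ignores bias in trial design, which only makes matters worse.

```python
def prob_real_given_significant(prior, power=0.8, alpha=0.05):
    """Bayes' theorem: P(effect is real | trial came out "significant").

    prior -- assumed probability, before the trial, that the therapy works
    power -- assumed chance the trial detects a real effect
    alpha -- assumed chance of a "significant" result when there is no effect
    """
    true_positives = power * prior
    false_positives = alpha * (1.0 - prior)
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # A plausible drug with a 50/50 prior versus a therapy whose mechanism
    # would require rewriting physics (prior assumed, generously, at 1 in 1,000).
    for label, prior in [("plausible drug", 0.5), ("implausible therapy", 0.001)]:
        print(label, round(prob_real_given_significant(prior), 3))
```

With these assumed numbers, a positive trial of the plausible drug is very likely to reflect a real effect, while a positive trial of the highly implausible therapy is still overwhelmingly likely to be a false positive. The same p-value means very different things depending on where you start.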
So, while “moneyball” does take Bayes into account somewhat, there is one big difference between moneyball and medicine, and that’s science. There are certain things in medicine that can be dismissed as so implausible that for all practical purposes they are impossible based on physical laws and well-established science (i.e., homeopathy). The result was hilariously illustrated in a very simple way in a recent XKCD cartoon that asked if the sun just exploded.
Where Phillips et al do better is in pointing out that, as the groups under study are chopped finer and finer with the ultimate goal of developing “personalized medicine,” EBM gets harder, as does SBM. It’s also where the critics miss the boat. CAM practitioners like to invoke this problem as evidence that EBM/SBM can never result in “personalized medicine,” but this criticism is disingenuous in the extreme. To CAM practitioners, “personalized medicine” means, in essence, “making it up as you go along,” without a scientific basis. In contrast, “personalized medicine” in EBM/SBM can and should mean science-based treatments plus clinical judgment plus patient values. It’s precisely in idiosyncratic cases that clinical judgment is still very important, and EBM doesn’t banish clinical judgment. Clinical judgment instead becomes the science and art of applying medical and scientific evidence to individual patients who don’t fit the mold. Phillips et al are also correct in pointing out that another area in which moneyball can help in medicine is to increase value. Remember, the very reason moneyball was used in the first place was to help a team at a financial disadvantage compete against teams with a lot more money to burn. On the other hand, in medicine EBM and SBM are not necessarily the cheapest. It’s all tradeoffs. For instance, EBM would tell us that for women above a certain risk level for breast cancer it is better to add breast MRIs to their screening regimen, which definitely costs more. What SBM should be able to do is to produce better outcomes.
Finally, one of the biggest impediments to data-driven approaches to almost anything, be it baseball, politics, or medicine, is the perception that such approaches take away the “human touch” or “human judgment.” The problem, of course, is that human judgment is often not that reliable, given how we are so prone to cognitive quirks that lead us astray. However, as Phillips et al point out, data-driven approaches need not be in conflict with recognizing the importance of contextualized judgment. After all, data-driven approaches depend on the assumptions behind the models, and we’ll never be able to take judgment out of developing the assumptions that shape them. What the “moneyball” revolution has shown us, at least in baseball and politics, is that the opinions of experts can no longer be viewed as sacrosanct, given how often they conflict with evidence. The same is likely to be true in medicine.