Clinical trials Medicine

How not to predict the outcome of clinical trials

Yesterday, I came across a concept that I had never considered (or even heard of) before. In fact, it took me by surprise: the application of a familiar idea to a problem that I had never considered applying it to, probably with very good reason. Basically, it’s a guy named Michael Slattery writing about The Wisdom Of The Crowds And Sangamo’s Phase II B Clinical Trial Results.

I’ve never been convinced that there is such a thing as the “wisdom of crowds.” Certainly, the numerous examples throughout history of mob behavior and downright idiotic actions of crowds don’t help. Neither does the example of Wikipedia, which is supposed to be based on the wisdom of crowds but whose editors and managers have found in recent years that perhaps crowds are not so wise at all, forcing them to lock articles to prevent tampering. Never mind the fact that, in Wikipedia and the wisdom of crowds generally, the opinion of an expert in the field being examined and that of a complete and utter crank can be of equal value if the rest of the participants can’t perceive the difference. So, no, I’m not a big believer in the wisdom of crowds; personally, I tend to think that the stupidity that people are capable of is only magnified in crowds, except under situations that are very artificial indeed. Indeed, here is one of the best brief summations of the so-called “wisdom of crowds” that I’ve seen:

Collecting people in crowds increases exactly three things with any certainty, and none of them are wisdom:

  • It increases the likelihood of a single individual member of the crowd surviving against predators.
  • It increases the effective conservatism of decision-making, not only when consensus (or even majority) decisions are made, but often even individual decisions.
  • It increases the tendency of individuals within the crowd to excuse their own actions and justify them regardless of how reprehensible those actions may be when contemplated in solitude.

Be that as it may, Mr. Slattery proposes a particularly bizarre use of the “wisdom of crowds”:

Would you like to be able to accurately predict the outcome of Sangamo’s (SGMO) Diabetic neuropathy, Phase II clinical trial of SB-509? Given the certainty that the stock will more than double if successful and will jump again shortly thereafter when a partnership deal is announced, I will assume that you, like me, would like this information well in advance of Sangamo’s announcement. No single individual can predict the outcome of a double blinded human clinical trial. But working as a crowd, we can get very close to knowing with a high probability of success, what the results of this trial will be, well in advance of any announcement.

Huh? Actually, Slattery is quite wrong. I’m not at all familiar with what SB-509 is or what the evidence is that it will alleviate the symptoms of diabetic neuropathy, nor do I need to know anything about it for purposes of this discussion. The reason I know that Slattery is quite wrong is simple. A single person can estimate the probability that a clinical trial will be positive if that person has a good set of studies and other data upon which to base an estimate of prior probability using Bayesian principles. True, the estimate might not be so precise as to be within ten percentage points of a specific probability, but it can certainly distinguish high, medium, and low, which is about all the precision that most investigators need. At the very least, there’s no evidence that any crowd can produce a better estimate of prior probability than a statistician or scientist applying the proper type of analysis.
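For what it’s worth, the sort of single-analyst estimate described above can be sketched in a few lines. This is a toy beta-binomial calculation under a uniform prior; the “historical data” below is invented purely for illustration and has nothing to do with SB-509 or Sangamo:

```python
# Toy Bayesian estimate of a trial's pre-trial probability of success.
# The counts (3 successes in 10 comparable programs) are entirely made up.

def success_probability(successes: int, failures: int) -> float:
    """Posterior mean of the success rate under a uniform Beta(1,1) prior."""
    return (successes + 1) / (successes + failures + 2)

def coarse_band(p: float) -> str:
    """High/medium/low is about all the precision most investigators need."""
    if p >= 0.6:
        return "high"
    if p >= 0.3:
        return "medium"
    return "low"

p = success_probability(3, 7)
print(round(p, 2), coarse_band(p))  # 0.33 medium
```

The point is not the specific numbers but that a lone analyst with relevant data can produce a defensible coarse estimate without any crowd at all.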

But let’s look at it another way. Even if the “wisdom of crowds” were an actual phenomenon (and there is not really a lot of good evidence that it is), I’d be willing to bet that even James Surowiecki would cringe at this proposed use of the concept:

By soliciting a large number of individual opinions (Yours!) we can aggregate the contributed opinions and analyze them in a manner that will return to the group, the wisdom of the Seeking Alpha readership. In this case, the probability that Sangamo’s study of Diabetic Neuropathy will be successful.

Slattery explains why he thinks this will work thusly:

The reason Wisdom of Crowds (WOC) works so well is not completely known. The author of the book Wisdom of Crowds believes the following explains the basics of this amazing process. As the frequency of the selections decline, those selections will more often fall into one of two categories, negative error and positive error, eliminating them from the accurate information we are seeking. When positive error cancels out negative error you are left with the information that will most likely benefit you goals and in this case that will be the most likely probability for the success of Sangamos Phase II B Diabetic Neuropathy clinical trial.

Does anyone see the problem with this line of reasoning?

First off, it is not “information” that Slattery is seeking. At least, he’s not seeking any sort of information that is verifiable; that is, unless the “wisdom of crowds” concludes that the probability of success of the trial is either 0% or 100%, in which case its conclusion is falsifiable. After all, if the crowd predicts a 0% probability of success, and the trial is successful, that would pretty much bury the crowd’s prediction, and vice versa.

Anything else is useless and unfalsifiable, however. Take the example of the crowd coming up with a consensus that the probability of the trial’s success is 65%. If the trial is, in fact, successful, that proves nothing about the accuracy of the estimate. Neither does it prove anything if the trial turns out to be unsuccessful, because a 65% chance of succeeding still leaves a 35% chance of failure, which is more than 1 in 3 and therefore sufficiently common that a negative result would not be surprising even if the pre-trial probability really was 65%. In fact, even if the crowd estimated that the pre-trial probability of a positive result was 99%, a negative result would still not prove them wrong. Yes, if the crowd suggested that there was a 99% pre-trial probability that the clinical trial would be a success and it is a success, that would suggest that they got it right, but there would still be uncertainty.
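The unfalsifiability point can even be made quantitative with a likelihood ratio: a single binary trial outcome barely discriminates between competing probability estimates. The 65% figure is from the example above; the rival 50% is an arbitrary stand-in:

```python
# How much evidence does one trial outcome provide for or against a
# claimed pre-trial probability? Compare likelihoods under two estimates.

def likelihood_ratio(p_claimed: float, p_rival: float, success: bool) -> float:
    """Likelihood ratio for a single Bernoulli observation."""
    if success:
        return p_claimed / p_rival
    return (1 - p_claimed) / (1 - p_rival)

# The crowd says 65%, a skeptic says 50%; the trial succeeds.
print(round(likelihood_ratio(0.65, 0.50, True), 2))   # 1.3
# The trial fails instead; the 65% claim is only mildly disfavored.
print(round(likelihood_ratio(0.65, 0.50, False), 2))  # 0.7
```

A likelihood ratio of 1.3 (or 0.7) is negligible evidence either way, which is exactly why a single result can neither vindicate nor refute the crowd.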

Even leaving that aside, the way Slattery decided to ask for the crowd’s input violates even the very highly artificial conditions that Surowiecki declares to be prerequisites for this whole “wisdom of crowds” thing to work:

These elements are: 1) The crowd must be as diverse as possible. Each person should have private information even if it’s just an eccentric interpretation of the known facts. 2) The crowd must be large enough to guarantee this diversity. 3) Each contributor’s opinion must be completely independent. People’s opinions aren’t determined by the opinions of those around them. 4) Decentralization; People are able to specialize and draw on local knowledge. 5) Aggregation; Some mechanism exists for turning private judgments into a collective decision.

So, even if you accept the validity of the “wisdom of crowds,” I’d say that at the very least condition #3 is violated. People are posting estimates in the comments, and that’s bound to influence people who encounter the article later and decide to try their hand at coming up with an estimate. Also, I have no idea how large the blog’s readership is, but I’m betting that it’s not nearly as diverse as necessary to meet conditions #1 and #2.

In the end, I must admit that this particular post amused me. (If it didn’t, I wouldn’t have bothered to devote a post to it.) I had, however, hoped for more when I read the title in that I figured that maybe Slattery could educate me about how one might try to use the “wisdom of crowds” to predict clinical trial outcomes. Alas, it was not to be, and I was disappointed. I didn’t really learn anything, and all I encountered was an utterly useless attempt to apply “wisdom of crowds” thinking to a problem quite unsuited for it. After all, it’s not as if such an exercise can in any way bypass clinical trials.

I just hope that no investors are planning on basing their decision whether or not to invest in Sangamo on this.

By Orac

Orac is the nom de blog of a humble surgeon/scientist who has an ego just big enough to delude himself that someone, somewhere might actually give a rodent's posterior about his copious verbal meanderings, but just barely small enough to admit to himself that few probably will. That surgeon is otherwise known as David Gorski.

That this particular surgeon has chosen his nom de blog based on a rather cranky and arrogant computer shaped like a clear box of blinking lights that he originally encountered when he became a fan of a 35 year old British SF television show whose special effects were renowned for their BBC/Doctor Who-style low budget look, but whose stories nonetheless resulted in some of the best, most innovative science fiction ever televised, should tell you nearly all that you need to know about Orac. (That, and the length of the preceding sentence.)

DISCLAIMER: The various written meanderings here are the opinions of Orac and Orac alone, written on his own time. They should never be construed as representing the opinions of any other person or entity, especially Orac's cancer center, department of surgery, medical school, or university. Also note that Orac is nonpartisan; he is more than willing to criticize the statements of anyone, regardless of political leanings, if that anyone advocates pseudoscience or quackery. Finally, medical commentary is not to be construed in any way as medical advice.

To contact Orac: [email protected]

31 replies on “How not to predict the outcome of clinical trials”

The real key behind the “wisdom of crowds” is none of the things stated. It relies inherently on the various estimates not being systematically biased in either direction – i.e. it is assumed that the real answer is the mean of the probability distribution describing the estimates. And the estimates have to be independent (in the statistical sense). If those conditions are satisfied, then it just becomes a straightforward application of the central limit theorem. The latter can be accomplished, but the former is in general highly questionable.
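That failure mode is easy to demonstrate. In the toy simulation below (all parameters invented), averaging many independent noisy estimates homes in on the true value, but a shared optimism bias shifts the crowd’s answer by roughly that bias no matter how large the crowd gets:

```python
# Averaging independent estimates only recovers the truth when the errors
# are unbiased. A shared bias survives averaging completely intact.

import random

random.seed(0)
TRUE_P = 0.3  # the (unknowable) real probability of success

def crowd_mean(n: int, bias: float) -> float:
    """Mean of n noisy estimates of TRUE_P, each shifted by a shared bias."""
    guesses = [min(1.0, max(0.0, random.gauss(TRUE_P + bias, 0.15)))
               for _ in range(n)]
    return sum(guesses) / n

print(round(crowd_mean(10_000, bias=0.0), 2))   # close to 0.30
print(round(crowd_mean(10_000, bias=0.25), 2))  # close to 0.55, not 0.30
```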

So it’s real, but neither interesting nor profound – nor reliably applicable.

I think this stuff is getting some of its traction from a different idea of “crowdsourcing” answers, by which people mean “the more people I ask, the more likely one of them will have this information.” And that sort of crowdsourcing works best if people are prepared to keep quiet when they don’t know, and defer to the actual physicists, or apple growers, or transit geeks in the group.

It’s also different from something like SETI@home, where umpteen thousand people are running basically the same software: they aren’t asking me “Do you think there is life on other planets?” they’re saying “give us some cycles so we can run this software that may answer this question.”

Anyone considering an investment in the company needs to use his own internal “probability of success” (whatever success means, it is carelessly defined, imo, in the seeking alpha post) in making his decision. That is the way personal decisions are made. If you are not confident of your probability of success, it is natural to try to collect probabilities from others, just to see if you are completely out of line. The bad news, in many cases, is that it is not clear what all those probabilities refer to. If I am considering investing, I am also interested in safety of the treatment, not just effectiveness. Yet safety is completely omitted from the question posed.

Derren Brown pretended to use this idea when he performed an illusion that made it appear as if he had correctly guessed the UK National Lottery numbers. If each person had some independent source of information that enabled them to make an informed estimate, this might work, but in this context, as in the context of lottery numbers, they don’t.

I predict this experiment will result in an average estimate of 50%. Anyone else want to make an estimate of what their average estimate will be?

I’ve never understood why people are so down on Wikipedia… it’s an incredible resource, and used correctly one can get very accurate information from it. (You learn to “smell” which articles can be trusted and which can’t, and if there’s ever a question, you can go to the original source)

But as you point out, it is not pure anarchy. The “crowd” would fuck it up right quick then. It’s a constant battleground, a very messy process.

Seriously, will people do anything to avoid learning statistics?

Krebiozen, I agree with your estimate. Funny thing, when I first scanned the article and saw “wisdom of crowds” I immediately thought of UK football, US basketball, and some St Patrick’s Day “celebrations,” all of which might involve overturned cars, street fires, and police curfews. -btw- Orac works in Detroit. I’m across the creek from NYC. Woo Hoo!

I agree the average of the probs will be about 50%. That’s because he asked the wrong question. Rather than to ask for probabilities, he should just ask “Will it succeed?” and then base his probability on the proportion of respondents who say “yea.” That’s how prediction markets work.
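The difference between the two aggregation rules is easy to see with made-up numbers, assuming each respondent answers “yes” exactly when their private estimate exceeds 50%:

```python
# Contrast two ways of aggregating the same (invented) private estimates:
# averaging stated probabilities vs. a prediction-market-style yes-fraction.

estimates = [0.9, 0.8, 0.8, 0.7, 0.6, 0.2, 0.1, 0.1, 0.05, 0.05]

mean_of_probs = sum(estimates) / len(estimates)
yes_fraction = sum(e > 0.5 for e in estimates) / len(estimates)

print(round(mean_of_probs, 2))  # 0.43
print(yes_fraction)             # 0.5
```

The two rules answer subtly different questions, which is part of why asking respondents for raw probabilities is the wrong question.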

First to note, I am not a mathematician, statistician or anything remotely associated.
That said, in recent years there has been use of what is sometimes called pari-mutuel prediction, in which experts in a particular field, usually one involving human behavior, bet with real money on the likelihood of some given event. There is some sense to this – the experts put something more tangible than their image behind their predictions and can bet at odds that reflect their level of confidence in their predictions.
Perhaps in this case, a panel of biochemists, pharmacologists, and endocrinologists might have some validity if induced to bet actual money, but still could not approach a reasonable degree of certainty. Just asking a large number of people without any expertise or specific knowledge to predict how a given molecule may behave in a living cell or, worse, the vast number of different cells in a single human being is irrelevant and silly.

There are other cases where the wisdom of crowds works. Markets produce the “optimal” price, American Idol’s phone banks choose the “best” singer, elections choose the “right” candidate – but in each case “optimal”, “best”, “right”, or whatever is defined as what the crowd determines. It’s quite possible that by some other standards, the selection is terrible.

It’ll be over 50%. A lot of the people who read these types of articles on SA are themselves investors who are notified of articles that are tagged with a stock symbol. That is, the most likely readers of the article, and the commenters, are already long the Sangamo stock, invested both financially and emotionally. A quick perusal of the responses so far shows that the average guess is indeed over 50%. It is meaningless.

If I ask 1,000 people to tell me the square root of 169, I’d bet money that the average response will be a number other than 13. Better to ask one mathematician.
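That bet is easy to sanity-check with an invented distribution of guesses: a small minority of wildly wrong answers is enough to drag the crowd mean far from the exact value, even when 90% of respondents answer correctly:

```python
# 1,000 invented guesses at sqrt(169): most are right, a few are not.

guesses = [13] * 900 + [14] * 50 + [169] * 30 + [84.5] * 20

crowd_answer = sum(guesses) / len(guesses)
print(crowd_answer)  # 19.16 -- nowhere near 13
```

The mean is maximally sensitive to outliers; a median would do better here, but the broader point stands: for a question with an exact answer, ask the one person who knows it.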

I think the average estimate will be well above 50%. We all have biases and our biases are generally optimistic and not independent. That’s how you end up with a high school graduating class in Bellaire, Texas with more than 30% of the students having GPAs of 4.0 or higher. Everybody’s above average, as that guy from Minnesota says.

For some recent insight into how wisdom of crowds actually works in practice, read “The Big Short” by Lewis or “Confidence Game” by Richard.

That’s funny. Funny because yesterday I was reading an article in a newspaper about the benefits of drinking beer. Some scientists had a conference somewhere in Western Europe and discussed it. Well… why not?

So, I went through the whole article, especially because it was promising some interesting stuff, like preventing osteoporosis. Well, after the article announced that the scientists met where they met and discussed beer (over some good wine, I hope) and its benefits, the continuation was something like “over 14% of interviewed people from Spain believe that drinking 1 beer a day may prevent osteoporosis.” And other similar beliefs related to beer consumption.

So… why the hell did I spend 6 years at university studying Physics? Black holes? Ask the taxi drivers. String theory? Ask the underwear makers. Faster-than-light neutrinos? I suggest we ask the crazy 18-year-old kids who will never turn 19 because of the way they drive/ride.

— James Ph. Kotsybar

When the general public hears about
A breakthrough in scientific research
They want to add their voices to the shout,
So as not to feel they’re left in the lurch.
That they have opinions, there is no doubt.

They’ll foist themselves into the dialogue,
When something sensational’s put in print.
Though their comments reveal they’re in a fog
Without having the slightest clue or hint,
It won’t prevent them posting to the blog.

Most often, all they can add is their moan:
“Why can’t science leave well-enough alone?”

the experts put something more tangible than their image behind their predictions and can bet at odds that reflect their level of confidence in their predictions.

The key assumption here is that people’s confidence in their predictions or guesses is correlated with the reliability of those predictions… so that an “80% confident” guess is likely to be closer to the truth than a “20% confident” guess.
This has been tested in any number of situations, and it turns out to be untrue: there is no correlation. The Dunning-Kruger effect is the most obvious manifestation of this.

Yeah, it’s been demonstrated that the more confident people are of their predictions, the less likely they are to be right. The actual (as opposed to self-declared) experts tend to realise that their field of study is complicated, with lots of confounding variables that are hard to predict; whereas people who think they’re experts will just consider one or two factors, and declare that nothing else could possibly matter to their analysis, so they’re totes confident.

The point of the dubious opinion poll is entirely unserious. The point of this article is to advertise for an investment in which the author is admittedly “long.”

Remember: 80% of the people you ask about this will consider themselves to be above-average drivers.

If you ask someone to provide a confidence rating for their guess, in effect you’re asking them to rate the distance between their guess and the true answer, which by definition they don’t know. The best anyone can do is look back over similar situations in the past and check their own track record of being more-or-less correct, but all those retrospective-bias effects make it nearly impossible to do that reliably.

@herr doctor bimler I looked at that discussion but it seems to be about a slightly different thing to me. They talk more of something that sounds like “extended brainstorming” and “parallel thinking” (as in parallel programming). Which is a completely different beast.

To me, Orac’s post produces a very strong image: a crowd tasked with solving a system of differential equations by polling for the right solution. And something important, like a 10 GW power plant, built afterwards based on the voted result.

I’ve never understood why people are so down on Wikipedia… it’s an incredible resource, and used correctly one can get very accurate information from it. (You learn to “smell” which articles can be trusted and which can’t, and if there’s ever a question, you can go to the original source)

If your first “can” is changed to a “might,” I can agree with you. You’re right that a lot of articles that have been skewed by someone’s bias give off a “smell”, but I’m not so sure of the contrapositive claim (that an article which doesn’t give off a “smell” can be trusted) – I learned an important lesson from a sneaky vandalism that I found once where key names, dates, etc. had all been changed and the only clue that something was wrong was a reference to a “second crime” that didn’t make sense with the rest of the article. That vandalism had gone undetected for weeks.

I could go on for quite some time about the flaws of Wikipedia, but as it relates to the “wisdom of crowds,” the key problem is that Wikipedia doesn’t represent the wisdom of crowds. It represents the wisdom of crowds – as overruled by the loudest, stubbornest troll who wants to make sure their opinion, the “right” opinion, gets privileged above all others. Imagine what the comments on an RI post would look like if Thingy troll had the power to edit others’ comments and you get an idea.

Wasn’t there something (the Delphi method?) that purported to show that large masses of people were good predictors of some things?

I think that being in a crowd also helps individuals to survive in cold weather and to procreate and possibly to remember.

Finance folks love the “wisdom of the crowd” because it allows them to pretend that the modern stock market is, for the majority of the participants, anything but a gambling hall where the gamblers are kept from losing money by vast subsidies from the real economy. All these models they use, CAPM being the most famous, are supposed to be based on expected future performance, but in fact become based on past performance, because they also tell you to build a diverse portfolio, which means that you have no realistic chance of obtaining enough information about each firm in your portfolio to create a reality-based prediction.

On the stock market, this substitution of “market price” for “best available prediction” is mostly harmless, because the secondary market in common stock is not very important. When bond traders do the same thing, and replace evaluation of credit risk with past market risk, they… break things. Like the global macroeconomy.

- Jake

The Greg Laden site, on the string of related American sites called “Scienceblogs,” is arguably the most prominent.

Greg is a catastrophic warming supporter, which is his right. He censors opposing views, or even questions put courteously, which is also his right because, as he explains, it’s his site, though that is incompatible with any claim to “science.”

He has claimed to be opposed to censorship, saying “Censorship is the second to last refuge of tyrants, the last is violence” (#23), a refreshingly liberal (in the true meaning of the term) viewpoint on “scienceblogs,” where 9 sites, at last count, promote censorship. Rather than answer the 7 questions any climate alarmist should be able to easily answer if it is true, he simply censored them.

Note that he does not delete ad homs or indeed obscenity, which are clearly, after all, the stock in trade of climate alarmist “scientists”, particularly those “peer reviewed and published in the finest journals” (#5) (although he did censor some criticisms made in return, neither ad hom nor remotely obscene since I don’t find that persuasive). Indeed, while censoring me, he recently passed a comment that I should be glad Greg hasn’t come round to my house and cut off my head which is the last argument he allegedly disapproves of.

It is his choice to run his site that way. However he does worse than that.
Greg has also claimed to be the sole scientist anywhere in the world who supports warming catastrophism and is not paid by the state. Not one single cent.

He has also claimed to be a “climate scientist”.

Indeed he has been given numerous opportunities to say he “misspoke” (a la Clinton), panicked, or that the claim needs “clarifying” (a term often used by British politicians caught lying). He has, repeatedly, stood by his claim.


Greg Laden is a Biological Anthropologist, studying human evolution, with degrees from Harvard University. He has taught at several universities, including Harvard and is currently a part time Assistant Professor at the University of Minnesota. He is an independent scholar who blogs at

Not a wise move when elsewhere claiming to be a climate scientist receiving not one cent from government. Though his “scienceblogs” bio is replete with “did I mention Harvard”s, it is astonishingly less forthcoming about his present role as a part-time assistant teacher at Minnesota U.

University of Minnesota, Twin Cities (U of M) is a public research university, and a public university is one that is predominantly funded by public means.

So the alleged only scientist anywhere in the world who supports warming catastrophism while receiving not a cent from the government is actually an assistant teacher of anthropology, largely or entirely paid by said government (at what I understand Americans call a “cow college” rather than Harvard).

Laden has clearly, deliberately and continuously lied, and if the entire “scienceblogs” site and everybody connected to it is not to be wholly discredited as having no connection to scientific principles, it is impossible that he could remain on it.

Knowing a little about anthropology in academe in Britain, I can say that it is largely a matter of keeping one’s tongue between the cheeks of those above you on the ladder while refusing to notice any scientific evidence which does not suit the politically correct paradigm (admittedly difficult to do otherwise in such a position). Rather than being a real science, it is very much the sort of “science” Richard Feynman described in his “Cargo Cult Science” lecture.

Perhaps American anthropology is totally different and a real science.

Perhaps his interest in (and possibly limited understanding of) CAGW is inspired by coworkers, friends and neighbours. I haven’t visited Minnesota, and it may be a warm place with a large coastal area, which would explain the locals’ interest in the possible bad effects of warming. Indeed it must be so, because pathetic as it is to lie on the subject, it would be unbelievably pathetic to lie in a way that will not impress coworkers and neighbours.

Comments are closed.

