Saturday 22 August 2015

How robust is empirical science?

I'd like to begin this post by admitting that I am writing far outside my own research experience. Nevertheless the paper that prompted it ("Likelihood of null effects of large NHLBI clinical trials has increased over time" by Robert Kaplan and Veronica Irvin) carries such a potentially alarming message that I think it is worth publicising.

Kaplan and Irvin looked at all "large" research trials funded by the NHLBI (the US National Heart, Lung, and Blood Institute) between 1970 and 2012 ("large" here was precisely defined in their paper). These trials examined a variety of drugs and dietary supplements for preventing cardiovascular disease. The authors recorded whether each trial had a positive outcome (statistical evidence that the treatment was successful), a negative outcome (statistical evidence that the treatment was ineffective) or a null outcome (no evidence either way).

There were 30 studies prior to 2000, of which 17 produced a positive outcome, and 25 afterwards, of which only 2 produced a positive outcome. This is a large decline in the proportion of positive outcomes, and the authors discussed various possible causes.
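Just to get a feel for how surprising a drop of that size would be by chance alone, here is a minimal sketch in Python (my own back-of-the-envelope illustration using the counts quoted above and SciPy's Fisher's exact test; it is not the authors' analysis):

    # A rough check, not Kaplan and Irvin's analysis: compare the pre-2000
    # positive rate (17 of 30) with the post-2000 rate (2 of 25) using
    # Fisher's exact test on the 2x2 table of counts.
    from scipy.stats import fisher_exact

    pre_2000  = [17, 30 - 17]   # [positive, not positive] before 2000
    post_2000 = [2, 25 - 2]     # [positive, not positive] from 2000 onwards

    odds_ratio, p_value = fisher_exact([pre_2000, post_2000])

    print(f"Pre-2000 positive rate:  {17/30:.0%}")   # about 57%
    print(f"Post-2000 positive rate: {2/25:.0%}")    # about 8%
    print(f"Fisher's exact test p-value: {p_value:.4g}")

Whatever test one prefers, a difference of this size between the two periods is very unlikely to be a statistical fluke, so some real change does need explaining.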

Two possibilities that might, a priori, explain the decline are:

  1. researchers prior to 2000 were under greater pressure to produce positive outcomes because these are preferred by drug companies, and
  2. placebo-controlled designs, in which a control group receives only a placebo, were used less often prior to 2000.

However, possibility 1 was ruled out because the proportion of trials sponsored by drug companies was essentially the same in both periods, and possibility 2 was ruled out for similar reasons.

Kaplan and Irvin did however advance quite a compelling reason to explain the discrepancy. Before I tell you what it was, it is worth saying something about the culture in which such experiments take place.

Imagine you are a scientist who is testing the efficacy of a drug to combat high blood pressure (say). You set up your clinical trials complete with a control group to whom you administer only a placebo, gather your data, and analyse it. This may take a long time and you are heavily invested in your results. Naturally you are hoping that the results will demonstrate that your new drug will prove effective in reducing blood pressure.

But maybe that doesn't happen. Oh dear, what could you do? Doesn't it seem rather lame to simply report that you found no effect either way (or, worse, a negative effect)? Since you have gathered all this data, why not look at it again? After all, it might have some significance. Maybe your drug has had some unexpected positive effect and, when you find it, you can report a positive outcome.

What's wrong with that? Isn't it perfectly fair since your data did demonstrate some form of efficacy?

No, it's not fair, for several reasons. One reason is that your experimental procedures were perhaps not well-tailored to assessing the result you did, in fact, find. Perhaps a more important reason, though, is that any such conclusion comes only with a statistical likelihood and, in the regime you actually implemented, you gave yourself many opportunities to "get lucky" and reach a spurious statistical conclusion.
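To make the "getting lucky" point concrete, here is a small sketch (my own illustration, assuming a conventional 0.05 significance threshold and, unrealistically, independent tests) of how quickly the chance of at least one spurious "significant" finding grows as you re-interrogate the same data with more and more hypotheses:

    # Chance of at least one false positive when testing k hypotheses at the
    # 0.05 level on data containing no real effect.  The independence
    # assumption is an idealisation, but the qualitative point stands.
    alpha = 0.05
    for k in (1, 5, 10, 20):
        p_any_false_positive = 1 - (1 - alpha) ** k
        print(f"{k:2d} hypotheses -> {p_any_false_positive:.0%} chance of a fluke")

With twenty exploratory looks at the data, a meaningless "positive" result turns up in roughly 64% of such experiments.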

It has become increasingly recognised that such researcher bias should be minimised. In 2000 the NHLBI introduced a mechanism to prevent researchers from gaming the system in the way I described above. They began to require researchers to register in advance that they were conducting a clinical trial and what hypothesis they were testing. So once the data has been gathered the researchers cannot change their research question.

Pre-registration of trials is what Kaplan and Irvin believe explains the drop in positive outcomes since 2000. They could easily be right and, at the very least, their work should be scrutinised and critiqued.

Now, just for the moment, let's assume they have hit upon a significant finding. What would this mean? Surely it means that, for the 30 trials conducted prior to 2000, rather than 57% of them deserving their positive conclusions (17 out of 30), the true fraction should be closer to 8% (the fraction of post-2000 trials that were positive). Since we have absolutely no idea which of the positive-outcome trials lie in this much smaller set, we should dismiss 30 years of NHLBI-funded research. Worse than that, thousands of patients have been administered treatments whose efficacy is unproven. I'm not suggesting a witch-hunt against researchers who have acted in good faith, but surely there are lessons to be learnt.
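To spell out the arithmetic behind that claim (a crude extrapolation on my part, which simply takes the post-2000 rate at face value):

    # Crude extrapolation: if the post-2000 positive rate (2 of 25) reflected
    # the underlying reality, how many genuinely positive results would we
    # expect among the 30 pre-2000 trials?
    post_rate = 2 / 25             # about 8%
    expected_pre = post_rate * 30  # about 2.4 trials
    observed_pre = 17
    print(f"Expected genuine positives before 2000: {expected_pre:.1f}")
    print(f"Observed positive outcomes before 2000: {observed_pre}")

In other words, perhaps only two or three of those seventeen positive results would have survived the stricter regime.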

One lesson to learn is that we should value trials with a negative or null outcome just as much as we value those with a positive outcome. In particular the bias towards publishing only positive outcomes must disappear. This is already becoming increasingly accepted but, as Ben Goldacre has demonstrated in his book Bad Pharma, there is still a very long way to go. In fact Goldacre shows that pharmaceutical companies have actively concealed studies with null outcomes and cherry-picked only those studies that shine a favourable light on the drugs they promote.

But there is another lesson, one with potentially much wider implications.  Many disciplines conduct their research studies by the "formulate hypothesis, gather data, look for statistical conclusion" methodology. In fact hardly a discipline is untouched by this methodology and, in most cases, they are years behind the medical disciplines in recognising what can contribute to researcher bias. It is therefore not an overstatement when I claim that it is a strong possibility that such disciplines have a track record of generating dodgy research results.

This shocking conclusion should be taken to heart by every research institution and, in particular, it should be taken to heart by our universities, which claim to have a mission to seek out and disseminate truth. In my opinion it is now incumbent on the relevant disciplines (most of them) to at least conduct an analogue of the work carried out by Kaplan and Irvin. We need to determine the extent of the problem (if indeed there is a problem) and we need to repeat, with all the rigour we now know to be necessary, many previous investigations, even if their conclusions have been accepted for years.

It is not enough to aggregate the results of several existing investigations into meta-analyses in order to raise confidence in their conclusions. We will often need to begin afresh. Well, at least that will give us all something to do.