Healthy Skepticism Library item: 5919
Warning: This library includes all items relevant to health product marketing that we are aware of regardless of quality. Often we do not agree with all or part of the contents.
 
Publication type: news
Omnus 2006 Aug 10
http://www.omnus.com.au/Omnus/Topic/Editorial/Display/0,3275,3-211-0-9369,00.html?ds=20060816&src=nl (registration required)
Full text:
10 Aug 2006
Statistically significant results reported in abstracts of clinical trials should generally be disbelieved, according to an analysis of 520 papers reporting odds ratios or relative risks.
The analysis, by Peter Gøtzsche, director of the Nordic Cochrane Centre in Denmark, found that the first result reported in abstracts was statistically significant in 70% of randomised trials, 84% of cohort studies and 84% of case-control studies. “Although many of these results were derived from subgroup or secondary analyses, or biased selection of results, they were presented without reservations in 98% of the trials,” he stated.
The distribution of P values around 0.05 was also extremely skewed. Only five trials reported a P value between 0.05 and 0.06, traditionally regarded as non-significant, while 29 had P values between 0.04 and 0.05.
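As a rough illustration (not a calculation from the paper itself): if P values near 0.05 were equally likely to fall just below or just above the threshold, a 29-to-5 split across those two adjacent intervals would be vanishingly unlikely. A minimal sketch:

    from math import comb

    below, above = 29, 5   # trials with P in (0.04, 0.05) vs (0.05, 0.06)
    n = below + above

    # If P values near 0.05 fell on either side with equal probability,
    # each trial is a fair coin flip; tail probability of >= 29 'below'.
    tail = sum(comb(n, k) for k in range(below, n + 1)) / 2 ** n
    print(f"chance of a split at least this lopsided: {tail:.6f}")

The split is far too skewed to be an accident of smooth sampling, consistent with results being nudged across the significance threshold.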
There were also apparent errors in the calculations. “I could check the calculations for 27 of these trials,” he said. “Four of the 23 significant results were wrong, five were doubtful, and four could be discussed. … Significant results in abstracts are common but generally should be disbelieved.”
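The article does not reproduce those checks, but the standard way to verify a reported odds ratio is to recompute it, together with its 95% confidence interval, from the trial’s 2x2 table. A minimal sketch with made-up counts:

    import math

    # Hypothetical 2x2 table from a trial report (made-up numbers,
    # purely for illustration):
    #                 event   no event
    #   treatment     a = 15   b = 85
    #   control       c = 30   d = 70
    a, b, c, d = 15, 85, 30, 70

    # Odds ratio, with a 95% confidence interval computed on the
    # log scale using the usual standard-error formula.
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)

    print(f"OR = {odds_ratio:.2f}, 95% CI ({lower:.2f} to {upper:.2f})")
    # A mismatch with the abstract's reported OR or CI would flag
    # the kind of calculation error described above.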
The high rate of significant results did not inherently make sense. A general prerequisite for trials was the concept of ‘clinical equipoise’ – the relative merit of the treatments being compared was truly unknown, so before the trial a finding of no difference should have been roughly as likely as a finding of a significant difference.
Multiple statistical tests on trial data were very common but usually not accounted for. More than 200 statistical tests were sometimes specified in protocols. Even if a treatment was compared with itself, the chance that at least one of those 200 comparisons would be significant at the 0.05 level was 99.996%.
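The arithmetic behind the 99.996% figure is easy to reproduce; a minimal sketch, treating the 200 tests as independent, as the quoted figure implicitly does:

    # Chance of at least one 'significant' result among k independent
    # tests when each has a false-positive rate of alpha.
    alpha = 0.05
    k = 200

    fwer = 1 - (1 - alpha) ** k
    print(f"P(>= 1 false positive in {k} tests) = {fwer:.3%}")
    # prints 99.996%, the figure quoted in the article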
Several steps could be taken to improve statistical analysis and its reporting. If a conventional threshold for significance was needed (rather than simply reporting the P value), it should be set at less than 0.001. Analysis of the data and writing of the manuscript should be done blind, and the result compared with what the investigators and sponsors had written. Finally, journal editors should scrutinise abstracts more closely and ask for trial protocols and raw data when necessary.
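The suggested cut-off of 0.001 is the author’s recommendation rather than a number derived here, but standard multiplicity corrections show why a much stricter per-test threshold is needed once hundreds of tests are run. A rough illustration at the 200-test scale mentioned above:

    # Per-test thresholds that keep the overall false-positive rate
    # at 0.05 across k tests (assuming a 200-test protocol).
    alpha_family = 0.05
    k = 200

    bonferroni = alpha_family / k              # simple division
    sidak = 1 - (1 - alpha_family) ** (1 / k)  # exact for independent tests

    print(f"Bonferroni threshold: {bonferroni:.5f}")  # 0.00025
    print(f"Sidak threshold:      {sidak:.5f}")       # 0.00026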
Reference
Gøtzsche, P. 2006, ‘Believability of relative risks and odds ratios in abstracts: cross sectional study’, BMJ, vol. 333, pp. 231–234.