My lab group has been looking at CuffDiff to understand the underlying statistical methods and how best to interpret the output.
An area of concern is the change, between versions 1 and 2.0, from uniform to non-uniform p-values. We have observed this in multiple datasets, including the Drosophila RNA-seq data provided to run your published protocol (GSE32038).
I have attached some histograms illustrating this change, using the differential expression tests with status "OK" obtained after running CuffDiff with the options provided in the protocol. It seems that, since the change in how expression variances are modelled in version 2.0, the calculated test statistics are no longer standard normal under the null hypothesis. We observe that the p-value distribution is skewed towards 1, which indicates test statistics that are too close to zero and hence variance estimates that are too large.
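To illustrate the reasoning above, here is a minimal simulation sketch. It is not CuffDiff's model: it simply assumes a z-test whose variance estimate is too large by a hypothetical factor `variance_inflation` (my own name, not a CuffDiff parameter) and shows how the null p-values pile up near 1.

```python
import random
from statistics import NormalDist


def simulate_p_values(n_tests, variance_inflation, seed=0):
    """Two-sided p-values for null z-tests with an overestimated variance.

    variance_inflation is a hypothetical ratio: estimated variance / true variance.
    A value of 1.0 corresponds to a correctly calibrated test.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    shrink = variance_inflation ** 0.5
    p_values = []
    for _ in range(n_tests):
        z_true = rng.gauss(0.0, 1.0)  # correctly scaled statistic under the null
        z_used = z_true / shrink      # statistic computed with the inflated variance
        p_values.append(2.0 * (1.0 - std_normal.cdf(abs(z_used))))
    return p_values


def histogram(p_values, n_bins=10):
    """Fraction of p-values falling in each of n_bins equal-width bins on [0, 1]."""
    counts = [0] * n_bins
    for p in p_values:
        counts[min(int(p * n_bins), n_bins - 1)] += 1
    return [c / len(p_values) for c in counts]


calibrated = histogram(simulate_p_values(20000, 1.0))
inflated = histogram(simulate_p_values(20000, 4.0))
for i, (a, b) in enumerate(zip(calibrated, inflated)):
    print(f"[{i / 10:.1f}, {(i + 1) / 10:.1f})  calibrated={a:.3f}  inflated={b:.3f}")
```

With a correctly estimated variance each bin should hold roughly 10% of the tests, whereas with the inflated variance the lowest bins are nearly empty and the highest bin is overpopulated, which is qualitatively the shape we see in the attached histograms.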
I note that version 2.1 has an entirely new testing method for differential expression, abandoning the normal approximation in favour of sampling from the posterior distribution for each condition. The p-values from these tests show non-uniformity similar to that seen in version 2.0. I'm aware that in some circumstances posterior predictive p-values are not uniform under the null. Can you offer some explanation or justification for how that might apply here? With the new testing method, do you know what p-value distribution one should expect if the modelling and testing are well-founded? I'm still concerned that if something was inflating the variances in version 2.0, it may have carried over into version 2.1.
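For reference, this is the kind of behaviour I have in mind for posterior predictive p-values. The sketch below uses a toy conjugate normal model (observations N(theta, 1), prior theta ~ N(0, prior_var)) with the sample mean as test statistic. It is purely an assumption for illustration and is not meant to represent the test in version 2.1. Data are generated from the model itself, so the model is correct by construction, yet the p-values concentrate around 0.5 rather than being uniform.

```python
import random
from statistics import NormalDist


def posterior_predictive_p(y_bar, n, prior_var):
    """P(mean(y_rep) > y_bar | y) for y_i ~ N(theta, 1), theta ~ N(0, prior_var)."""
    post_var = 1.0 / (n + 1.0 / prior_var)   # posterior variance of theta
    post_mean = post_var * n * y_bar         # posterior mean of theta
    pred_sd = (post_var + 1.0 / n) ** 0.5    # sd of the replicated sample mean
    return 1.0 - NormalDist(post_mean, pred_sd).cdf(y_bar)


rng = random.Random(1)
n, prior_var = 5, 1.0  # toy settings chosen only for illustration
ppp = []
for _ in range(20000):
    theta = rng.gauss(0.0, prior_var ** 0.5)    # draw the parameter from the prior
    y_bar = rng.gauss(theta, (1.0 / n) ** 0.5)  # sample mean of n observations
    ppp.append(posterior_predictive_p(y_bar, n, prior_var))

# Under a uniform distribution these fractions would be about 0.10 and 0.20.
extreme = sum(p < 0.05 or p > 0.95 for p in ppp) / len(ppp)
central = sum(0.4 <= p <= 0.6 for p in ppp) / len(ppp)
print(f"fraction in the tails (<0.05 or >0.95): {extreme:.3f}")
print(f"fraction in the centre [0.4, 0.6]:      {central:.3f}")
```

If the version 2.1 test behaves anything like this, some non-uniformity could be expected even when the variances are well estimated, which is why I would like to know what the reference distribution should look like.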
Any illumination you could offer would be much appreciated, thanks!