Hi Jonesy,
For HET outputs, the Wilcoxon and t-test columns do represent p-values. You likely aren’t seeing many with p<0.05 because, with only 2 vs. 2 samples for a typical ENCODE RBP knockdown, the tests are very underpowered. With small numbers of samples like this, we recommend the regular deltapsi quantification mode for finding differential splicing in replicate experiments. HET is better suited to larger, heterogeneous sample groups.
The TNOM value output by HET is a p-value based on permutation probabilities. (This is why you see only the values 0.33, 0.66, and 1 in your output; there aren’t many ways to order 2 vs 2 samples.) You’re right that a TNOM score of 0 means the PSI values from group1 and group2 are perfectly separable. However, I believe HET currently outputs only the p-value version of this. There may be a way to output the score itself, but you can easily derive it numerically.
A TNOM score of 0 means perfect separation. If you have one group of size K and another of size M (in your case I believe M=K=2) and you order the N=(M+K) samples from low to high PSI, there are only two ways to get a score of 0: either group 1 (say of size K) is all on the left, or group 2 (of size M) is all on the left. Under random ordering there are N choose K distinct ways to arrange the group labels, so the best TNOM score of 0 will always have a p-value of 2/(N choose K). For example, with 2 vs 2 samples you can arrange the 4 PSI values in 4 choose 2 = 6 ways. Of those, two arrangements (0011 or 1100) have zero misclassifications between group1 and group2, so you get 2/6 = 0.33. The remaining 4 arrangements have one misclassification, so the cumulative probability of a TNOM score <= 1 is 6/6 = 1. I’d have to check where 0.66 comes from, but I assume it’s an artifact of how ties are handled, or perhaps a missing value for a sample.
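If you want to check the counting argument yourself, here is a minimal Python sketch that enumerates every label arrangement by brute force. It assumes the TNOM score is the smallest number of misclassified samples over all single PSI thresholds (in either direction) and that there are no ties. The function names are my own for illustration and this is not HET's actual code.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def tnom_score(labels):
    """Fewest misclassifications achievable with one threshold.

    `labels` is a list of 0/1 group labels, already sorted by PSI.
    Assumes no tied PSI values.
    """
    best = len(labels)
    for cut in range(len(labels) + 1):
        left, right = labels[:cut], labels[cut:]
        # Rule A: predict group 0 left of the cut, group 1 right of it.
        errors_a = left.count(1) + right.count(0)
        # Rule B: the mirror image of rule A.
        errors_b = left.count(0) + right.count(1)
        best = min(best, errors_a, errors_b)
    return best


def tnom_null_distribution(k, m):
    """Count arrangements per TNOM score and the cumulative p-values."""
    n = k + m
    counts = Counter()
    for positions in combinations(range(n), k):
        chosen = set(positions)
        labels = [1 if i in chosen else 0 for i in range(n)]
        counts[tnom_score(labels)] += 1
    total = sum(counts.values())
    cumulative, running = {}, 0
    for score in sorted(counts):
        running += counts[score]
        cumulative[score] = Fraction(running, total)  # P(score <= s)
    return counts, cumulative


counts, pvalues = tnom_null_distribution(2, 2)
print(dict(counts))
print({score: str(p) for score, p in pvalues.items()})
```

For 2 vs 2 this gives two arrangements with score 0 and four with score 1, matching the 2/6 and 6/6 values above.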
Extending this to 3v3, the minimum attainable p-value (0 misclassifications) is 2 / (6 choose 3) = 2/20 = 0.1, and for 4v4 the minimum is 2 / (8 choose 4) = 2/70 = 0.029. So again, at small sample sizes TNOM p-values aren’t very informative.
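To see how quickly that floor drops as you add samples, here is a short Python sketch of the 2/(N choose K) formula. The function name is just for illustration, and it assumes no ties and no missing PSI values.

```python
from math import comb


def min_tnom_pvalue(k, m):
    """Smallest attainable TNOM p-value for groups of size k and m.

    Only two of the (k+m choose k) label arrangements are perfectly
    separated, so the floor is 2 / C(k+m, k).
    """
    if k < 1 or m < 1:
        raise ValueError("both groups need at least one sample")
    return 2 / comb(k + m, k)


for size in (2, 3, 4, 5, 6):
    print(f"{size}v{size}: {min_tnom_pvalue(size, size):.4f}")
```

With balanced groups, 4 vs 4 is the first design where a perfectly separated event can get below 0.05 at all.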
Note that the above p-values are NOT corrected for multiple hypothesis testing, and are not meant to be. They just serve as another type of score to assess group separation. In practice what users are applying is a composite test: a difference between group medians AND a p-value threshold. A proper null would therefore need to be assessed against both criteria, which is more costly to compute and would in turn depend on the underlying assumptions, so we have not implemented that.
Practical recommendation: You can use a TNOM score of 0 misclassifications (a p-value of 0.33 in your case) as a better heuristic and combine that with a dPSI minimum to call changing events, or use deltapsi quantification mode instead, which uses a Bayesian framework to estimate the posterior distribution of dPSI. Under that quantification mode the question moves from "is the distribution of PSI values in group1 different from PSI values in group2" to "given the reads I observed from group1 and group2, what is the probability that the dPSI between the groups is meaningfully large?". For smaller sample sizes and well-behaved replicates this second approach is typically more useful. You can then adjust the thresholds on dPSI reporting. There are two thresholds here: the minimal dPSI value you want to consider, and the posterior probability from the Bayesian model that the change is at least that large. By default the minimal change is 0.2 and the confidence is very high (0.95), as the defaults are meant to be conservative and validate well. Sometimes users drop the dPSI threshold to 0.15 or 0.1, but I would not go below that, and sometimes they drop the confidence a bit, to say 0.7 or something similar.
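As a rough illustration of the first option (TNOM score of 0 plus a dPSI minimum), here is a small Python sketch. The record layout and the field names `event_id`, `median_dpsi` and `tnom_pvalue` are hypothetical placeholders rather than the actual HET output columns, and the 0.2 cutoff is only an example value, so adapt both to your own table.

```python
from math import comb


def call_changing_events(events, k, m, min_abs_dpsi=0.2, tol=1e-2):
    """Keep events that are perfectly separated AND change enough.

    `events` is a list of dicts with hypothetical keys
    'event_id', 'median_dpsi' and 'tnom_pvalue'.
    `tol` absorbs rounding in reported p-values (e.g. 0.33 vs 1/3).
    """
    # Smallest attainable TNOM p-value, i.e. 0 misclassifications.
    floor = 2 / comb(k + m, k)
    kept = []
    for event in events:
        separated = event["tnom_pvalue"] <= floor + tol
        large_change = abs(event["median_dpsi"]) >= min_abs_dpsi
        if separated and large_change:
            kept.append(event["event_id"])
    return kept


# Made-up example records, for illustration only.
example = [
    {"event_id": "ev1", "median_dpsi": 0.35, "tnom_pvalue": 0.33},
    {"event_id": "ev2", "median_dpsi": 0.05, "tnom_pvalue": 0.33},
    {"event_id": "ev3", "median_dpsi": -0.40, "tnom_pvalue": 1.0},
]
print(call_changing_events(example, k=2, m=2))
```

Here only the first record passes both criteria: the second is separated but barely changes, and the third changes a lot but is not separated.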
Hopefully this clears things up and let me know if you have any questions.
-Matt Gazzara