--
You received this message because you are subscribed to the Google Groups "methylkit_discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to methylkit_discus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/methylkit_discussion/3631.186296437.1612785609%40gmail.com.
Dear Altuna,
No, I’m not missing the point. The tile in my example may be problematic and give rise to the “100 orders of magnitude”, but even a change in p-value of half an order of magnitude is unacceptable. The change in p-value is not itself the problem (that was merely a striking example). The problem lies in your use of Fisher’s Exact Test for 2×2 tables after the normalisation has changed the observed counts:
1884 0
1789 566
are no longer counts and cannot be analysed by Fisher’s Exact Test.
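To make concrete why rescaled values cannot simply be fed to Fisher, here is a minimal sketch in Python (standard library only). The function name and the small table are hypothetical illustrations, not data from my tile and not methylKit code. It computes the two-sided Fisher p-value from the hypergeometric distribution and shows that multiplying every cell by the same factor changes the p-value even though the proportions are identical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's Exact Test p-value for the 2x2 table [[a, b], [c, d]].

    Hypothetical helper for illustration; all four entries must be
    non-negative integer counts.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def weight(k):
        # Unnormalised hypergeometric weight of the table whose top-left cell is k
        return comb(col1, k) * comb(n - col1, row1 - k)

    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    observed = weight(a)
    # Sum over all tables that are no more probable than the observed one
    extreme = sum(weight(k) for k in range(lo, hi + 1) if weight(k) <= observed)
    return extreme / comb(n, row1)


# Hypothetical table (NOT the tile discussed here) and the same table scaled by 3
p_small = fisher_exact_two_sided(8, 2, 5, 5)
p_scaled = fisher_exact_two_sided(24, 6, 15, 15)
print(p_small, p_scaled)
```

The proportions in the two tables are the same, yet the second p-value is smaller, because the test treats every unit in the table as an independent observation.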
I see only one alternative to dropping Fisher after normalisation, and that is to restore the original counts without affecting the results of the normalisation. That sounds like a contradiction, but it is not. The normalisation is concerned only with relative, not absolute, coverage; Fisher requires only that all four entries be counts and that the original coverage be preserved. Let me explain by way of the treatment group in my tile:
                                                      +        -    total
before normalisation                                814      237     1051
after normalisation                                1789      566     2355
1. multiply all entries in row 2 by 1051/2355    798.40   252.60     1051
2. to allocate the final bp, you can
   (a) truncate entry with smaller remainder        798      253     1051
or (b) take into account that Fisher deals in factorials
       truncate max(799*(1-0.40),253*(1-0.60))      798      253     1051
or (c) consider effect in context of normalisation
       choose according to effect on normalised meth.diff
       (increase or decrease in absolute value)
       requires processing both groups together
That is my suggestion. The relative coverage computed by the normalisation is preserved (except for the final integer, but even that becomes part of the normalisation if you adopt option (c)), and Fisher is permitted because the normalisation did not change the total number of counts but merely reapportioned them between the + and - categories.
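Steps 1 and 2(a) can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name `restore_coverage` is hypothetical and is not part of methylKit, and options (b) and (c) are not implemented.

```python
import math


def restore_coverage(norm_plus, norm_minus, orig_total):
    """Rescale normalised (+, -) values to integer counts summing to orig_total.

    Hypothetical helper implementing step 1 and option (a) only.
    """
    norm_total = norm_plus + norm_minus
    # Step 1: scale both entries back to the original coverage
    plus = norm_plus * orig_total / norm_total
    minus = norm_minus * orig_total / norm_total
    # Step 2(a): truncate both, then give any leftover unit to the entry
    # with the larger remainder (the smaller remainder stays truncated)
    plus_int, minus_int = math.floor(plus), math.floor(minus)
    leftover = orig_total - plus_int - minus_int
    if leftover:
        if plus - plus_int > minus - minus_int:
            plus_int += leftover
        else:
            minus_int += leftover
    return plus_int, minus_int


# Treatment group of my tile: normalised 1789 / 566, original coverage 1051
print(restore_coverage(1789, 566, 1051))  # (798, 253)
```

The result is a pair of genuine integer counts with the original total, in the proportions the normalisation produced, which is what Fisher needs.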
When you wrote “If you have better ways of dealing with this”, I thought a bit about your logistic regression, but I don’t see how you can set one up in the present case (only 2 groups, albeit with replicates, and no covariates). What have I missed?
Dear Altuna,
My tile doesn’t need help, my data doesn’t need help, I don’t need help. It is methylKit that needs help: its algorithm for normalisation transforms the empirical data so that they cannot subsequently be analysed by Fisher’s Exact Test. I realise you do not understand this, so get a proper statistician. And while he (or she) is at it, have him (or her) take a look at the rest of methylKit’s statistical procedures; I have my suspicions about them too.
For years, methylKit has been unwittingly providing erroneous results to scientists because of its sloppy statistics. You owe it to the scientific community to rectify the situation as quickly as possible.