Hi Friederike,
On Jul 3, 2012, at 10:16 AM, Friederike wrote:
> I have a quick question regarding the p- and q-values in MACSv2.
> In MACSv1.4 the p-value was output by the programme as -10log(p-value) and the FDR in %. Do you still multiply the negative log(p-value) by ten in MACS2?
No. In MACS2, I report -log10(pvalue) and -log10(qvalue) directly, without multiplying by ten.
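To compare scores between the two versions, the two scales can be converted as in the small sketch below. The function names are hypothetical helpers for illustration and are not part of MACS; the sketch assumes the MACS 1.4 score is -10*log10(pvalue) and the MACS2 score is -log10(pvalue), as described above.

```python
def macs14_score_to_pvalue(score):
    """Recover a pvalue from a MACS 1.4 style score, -10*log10(pvalue)."""
    return 10 ** (-score / 10.0)


def macs2_score_to_pvalue(score):
    """Recover a pvalue from a MACS2 style score, -log10(pvalue)."""
    return 10 ** (-score)


def macs14_score_to_macs2_score(score):
    """Rescale a MACS 1.4 score to the MACS2 scale (divide by ten)."""
    return score / 10.0


# A MACS 1.4 score of 50 and a MACS2 score of 5 describe the same pvalue.
print(macs14_score_to_macs2_score(50))
print(macs14_score_to_pvalue(50), macs2_score_to_pvalue(5))
```

Note that this only aligns the scales; the pvalues themselves can still differ between versions for the same region.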
> I have major difficulties in reproducing a similar cut-off in MACSv1.4 and MACSv2 based on the FDR. I used to take FDR 5% as a cut-off for peaks I considered for the downstream analysis, but with the newer MACS version I get many more peaks (in the exact same data set) even though the default cut-off is q-value = 0.05. I should add that the vast majority of the peaks that were not identified by MACSv1.4 but were identified by MACSv2 have very small fold enrichment values compared to those peaks that were identified by both MACSv1.4 and MACSv2.
> Did you change the way of calculating the q-value? Why were those "tiny" peaks not picked up by MACSv1.4 but suddenly appear quite statistically significant in MACSv2?
Yes. The two versions calculate it differently.
In MACSv1.4, the FDR is calculated by swapping treatment and control. MACSv1.4 assumes that all peaks called from the swapped samples at a given cutoff are false positives, so it can calculate an empirical FDR for each pvalue cutoff. However, this method is strongly influenced by unbalanced sequencing depth. For example, if the control sample is much larger than the treatment, the FDR will be overestimated, so fewer 'good' peaks will pass the cutoff.
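The idea behind the swap-based estimate can be sketched roughly as follows. This is an illustration only, not the MACS 1.4 source code: the function name, the inputs (lists of peak scores from the normal run and from the swapped run) and the capping at 1.0 are all assumptions made for the example.

```python
def empirical_fdr(treatment_scores, swapped_scores, cutoff):
    """Estimate FDR at a score cutoff from a treatment/control swap.

    treatment_scores: scores of peaks called as treatment vs. control.
    swapped_scores:   scores of peaks called as control vs. treatment;
                      all of these are treated as false positives.
    """
    n_treatment = sum(1 for s in treatment_scores if s >= cutoff)
    n_swapped = sum(1 for s in swapped_scores if s >= cutoff)
    if n_treatment == 0:
        return 0.0  # no peaks pass the cutoff, so nothing to estimate
    # Cap at 1.0: a deep control can yield more swapped peaks than real ones.
    return min(1.0, n_swapped / n_treatment)


treatment = [100, 80, 60, 55, 52]  # made-up scores for illustration
swapped = [70, 51]                 # made-up scores for illustration
print(empirical_fdr(treatment, swapped, cutoff=50))
print(empirical_fdr(treatment, swapped, cutoff=75))
```

With a much deeper control, the swapped run tends to call more peaks, which inflates the numerator and therefore the estimated FDR.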
In MACSv2, I use another approach. First, pvalues are calculated at every basepair in the genome. Then I apply the Benjamini-Hochberg procedure to correct for multiple comparisons, converting each pvalue into a qvalue, that is, the minimum FDR at which the peak is called significant. This method is more robust in my experience.
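For reference, the standard Benjamini-Hochberg conversion from pvalues to qvalues looks roughly like the sketch below. It is a generic textbook implementation on a plain list of pvalues, not the MACS2 code, which works on per-basepair pvalues and may differ in its details; the function name is hypothetical.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg qvalues in the same order as the input."""
    m = len(pvalues)
    # Indices of the pvalues sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest pvalue down so that qvalues stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


pvalues = [0.01, 0.04, 0.03, 0.005]  # made-up pvalues for illustration
qvalues = benjamini_hochberg(pvalues)
# Scores on the MACS2 scale, -log10(qvalue).
scores = [-math.log10(q) for q in qvalues]
print(qvalues)
print(scores)
```

A peak passes the default cutoff when its qvalue is at most 0.05, which corresponds to -log10(qvalue) of about 1.3.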
Best,
Tao Liu
Research Fellow
Dept of Biostats and Comp Bio, DFCI / HSPH
450 Brookline Ave., Boston, MA 02215
(O) 617-582-7769