-10*log10(pvalue) significance

joey tribbani

Jun 13, 2011, 6:32:11 AM
to macs-ann...@googlegroups.com
Hi All,

I would like to understand the significance of -10*log10(pvalue) in MACS output. Why does MACS report -10*log10(pvalue) and not the p-value itself? What criteria should be used to filter peaks on the basis of -10*log10(pvalue)? Which peak is better: (i) one with a lower -10*log10(pvalue), or (ii) one with a higher -10*log10(pvalue)? I guess the latter.

Of the following 3 peaks, which is the most significant?


chr     start   end     length  summit  tags    -10*log10(pvalue)   fold_enrichment   FDR(%)
chr1    14125   15560   1436    711     96      84.82               3.81              15.52
chr1    18727   20390   1664    1178    88      109.55              3.92              15.81
chr1    136335  137188  854     504     42      134.15              7.65              17.97

Is there any document that explains the detailed formulas for calculating -10*log10(pvalue), fold_enrichment and FDR?
I would also like to know whether the p-value can be calculated from -10*log10(pvalue).

--
Regards,
Jo


Alessandro Guffanti

Jun 13, 2011, 9:02:01 AM
to macs-ann...@googlegroups.com
Hi.

On 13/06/2011, joey tribbani <jo.tr...@gmail.com> wrote:
> Hi All,
>
> I would like to understand the significance of -10*log10(pvalue)in MACS
> output. Why MACS report -10*log10(pvalue) value and not pvalue?

Because this way you get positive numbers: a p-value is by definition
between 0 and 1, so its logarithm is negative (or zero). Multiplying by
-10 flips the sign and rescales it, giving numbers that are easier to manage.

> What should be the criteria to filter the peaks on the basis of
> -10*log10(pvalue)? Which peak is better (i) One having lower
> -10*log10(pvalue) or (ii) one having higher -10*log10(pvalue)- I guess
> this one?

The second - see above. The higher this value the lower the pvalue

>
> For following 3 peaks which is more significant?
>
>
> chr     start   end     length  summit  tags    -10*log10(pvalue)   fold_enrichment   FDR(%)
> chr1    14125   15560   1436    711     96      84.82               3.81              15.52
> chr1    18727   20390   1664    1178    88      109.55              3.92              15.81
> chr1    136335  137188  854     504     42      134.15              7.65              17.97


As a rule of thumb I favor peaks with a lower FDR and a higher number
of tags, so my personal filtering preference would be the first. But
they are very similar. I suppose you could merge the first two. What
kind of experiment is this?

> Is there any document which explains details formula's for calculating
> -10*log10(pvalue), fold_enrichment and FDR?

MACS documentation and papers.

> Would also like to know if pvalue can be calculated from the
> -10*log10(pvalue)...

Er .. this is very simple maths indeed...

HTH,

A

Sean Davis

Jun 13, 2011, 9:07:55 AM
to macs-ann...@googlegroups.com
On Mon, Jun 13, 2011 at 6:32 AM, joey tribbani <jo.tr...@gmail.com> wrote:
> Hi All,
>
> I would like to understand the significance of -10*log10(pvalue)in MACS
> output. Why MACS report -10*log10(pvalue) value and not pvalue? What should
> be the criteria to filter the peaks on the basis of -10*log10(pvalue)? Which
> peak is better (i) One having lower -10*log10(pvalue) or (ii) one having
> higher -10*log10(pvalue)- I guess this one?

Typically, math in a computer is limited to a certain number of
significant digits. Converting very small decimals (probabilities,
for example) to log space allows one to simply add numbers rather than
having to multiply very small numbers to get even smaller numbers,
thereby losing precision.
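
To illustrate the point about precision, here is a minimal Python sketch. The probabilities are made up for illustration and nothing here is taken from the MACS source; it only shows the general floating-point behaviour of multiplying small numbers versus adding their logarithms.

```python
import math

# Hypothetical probabilities, chosen only to make the effect visible.
probs = [1e-50] * 10

# Direct multiplication: the true product is 1e-500, which is far below the
# smallest positive double (about 5e-324), so the result underflows to zero.
product = 1.0
for p in probs:
    product *= p

# Log space: adding log10 values keeps the magnitude representable.
log10_product = sum(math.log10(p) for p in probs)

print(product)        # underflows to 0.0
print(log10_product)  # approximately -500
```

The product itself is lost, but its logarithm is still available and can be compared, ranked, or scaled by -10.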

> For following 3 peaks which is more significant?
>

Higher is better.

> chr     start   end     length  summit  tags    -10*log10(pvalue)   fold_enrichment   FDR(%)
> chr1    14125   15560   1436    711     96      84.82               3.81              15.52
> chr1    18727   20390   1664    1178    88      109.55              3.92              15.81
> chr1    136335  137188  854     504     42      134.15              7.65              17.97
>
> Is there any document which explains details formula's for calculating
> -10*log10(pvalue), fold_enrichment and FDR?
>
> Would also like to know if pvalue can be calculated from the
> -10*log10(pvalue)...

P = 10 ^ (-Q/10)

where Q is the -10*log10(pvalue)
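
As a quick check of this formula, here is a minimal Python sketch applied to the three scores quoted above. The function name score_to_pvalue is just an illustrative choice, not something from MACS.

```python
def score_to_pvalue(score):
    """Invert Q = -10*log10(P), giving P = 10 ** (-Q / 10)."""
    return 10 ** (-score / 10)

# -10*log10(pvalue) values of the three peaks in the question
scores = [84.82, 109.55, 134.15]
pvalues = [score_to_pvalue(q) for q in scores]

for q, p in zip(scores, pvalues):
    print(f"{q:7.2f} -> {p:.3e}")
```

The first value comes out at about 3.3e-09, and the p-values shrink as the score grows, so the third peak has the smallest p-value of the three.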

Sean

Jo

Jun 14, 2011, 6:07:55 AM
to macs-ann...@googlegroups.com
Hi Sean/Alessandro

Thanks for the explanation.

@Alessandro: I was looking for the mathematical formulas used to calculate the p-value, fold enrichment and FDR. I had already searched the MACS paper and documentation but could not find any formulas there; there is a theoretical description, but it is a little vague to me. The reason I want the formulas is that I would like to manually calculate the p-value/fold enrichment/FDR for a specific region to understand them better.

@Sean: I tried to calculate the p-value from -10*log10(pvalue) using the formula P = 10 ^ (-Q/10) for the 3 regions mentioned above, and it gives me
(1) P = 10 ^ (-84.82/10) = 1.518
(2) P = 10 ^ (-109.55/10) = -0.955
(3) P = 10 ^ (-134.15/10) = -3.415

I am not sure why the P value is coming out greater than 1. Am I doing something wrong here?

Regards
Jo


Jim Murrett

Jun 14, 2011, 6:19:03 AM
to macs-ann...@googlegroups.com
Hey Jo, not sure why, but you're plugging in something wrong. A
-10log10(pvalue) of about 80 should be about 10^-8 and about 110
should be 10^-11. Not sure where those numbers came from but might
want to try calculating them again....

for example:
(1) P = 10 ^ (-84.82 /10) = 3.2961E-09
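
One possible source of the earlier numbers (a guess; the thread does not say how they were computed) is subtracting Q/10 from 10 instead of raising 10 to the power -Q/10. A short Python sketch comparing the two readings of the expression:

```python
# -10*log10(pvalue) values of the three peaks in the question
scores = [84.82, 109.55, 134.15]

# Reading "10 ^ (-Q/10)" as a subtraction, which is NOT the intended formula.
subtracted = [10 - q / 10 for q in scores]

# Reading it as exponentiation, which is what the formula means.
exponentiated = [10 ** (-q / 10) for q in scores]

for q, s, e in zip(scores, subtracted, exponentiated):
    print(f"Q={q:7.2f}  10 - Q/10 = {s:7.3f}   10 ** (-Q/10) = {e:.4e}")
```

The subtraction column gives 1.518, -0.955 and -3.415, while exponentiation gives values between 0 and 1, as a p-value must be.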

Good luck,
Jim

Jo

Jun 14, 2011, 8:06:48 AM
to macs-ann...@googlegroups.com
Hey Jim,

Thanks for pointing it out. I will try it again and check.
By the way, can you tell me the highest and lowest values of -10*log10(pvalue) one can expect in MACS output in general? I am asking because I am looking at a dataset with different levels of enrichment in the two samples. For one sample I see about 2-3 fold enrichment compared to the other at most of the regions. However, the FDR values are pretty high (>20-30%), so I want to filter the peaks on the basis of p-value and fold enrichment.
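
A minimal sketch of that kind of filter in Python, using the three peaks from the start of the thread. The thresholds MIN_SCORE and MIN_FOLD are hypothetical placeholders, not recommended values; mathematically the score is 0 at p = 1 and grows without bound as p shrinks, so the range actually seen depends on the data and the cutoff used.

```python
# The three peaks quoted at the start of the thread.
peaks = [
    {"chr": "chr1", "start": 14125, "end": 15560, "score": 84.82, "fold": 3.81, "fdr": 15.52},
    {"chr": "chr1", "start": 18727, "end": 20390, "score": 109.55, "fold": 3.92, "fdr": 15.81},
    {"chr": "chr1", "start": 136335, "end": 137188, "score": 134.15, "fold": 7.65, "fdr": 17.97},
]

# Hypothetical thresholds, for illustration only.
MIN_SCORE = 100.0  # -10*log10(pvalue) >= 100 means pvalue <= 1e-10
MIN_FOLD = 5.0     # minimum fold_enrichment

# Keep a peak only if it passes both thresholds.
kept = [p for p in peaks if p["score"] >= MIN_SCORE and p["fold"] >= MIN_FOLD]

for p in kept:
    print(p["chr"], p["start"], p["end"], p["score"], p["fold"])
```

With these placeholder thresholds only the third peak (chr1:136335-137188) is kept; different thresholds would give a different selection.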

Regards
Jo

Sean Davis

Jun 14, 2011, 8:16:18 AM
to macs-ann...@googlegroups.com
Hi, Jo.

We can certainly answer your questions in email, but I think you might
benefit from getting someone local (a colleague in your lab or even
your local math department) to go through this with you.
Alternatively, start up Excel, make a column with p-values of various
values and put in the formula (-10) * log10(pvalue) in a second
column. That will allow you to experiment and begin to answer these
questions yourself. You will have to justify your decisions when you
write up your results, so it pays to understand the analysis yourself.
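
The same experiment can be sketched in Python instead of a spreadsheet. The list of p-values below is arbitrary, and pvalue_to_score is just an illustrative name.

```python
import math

def pvalue_to_score(p):
    """Return -10*log10(p) for a p-value with 0 < p <= 1."""
    return -10 * math.log10(p)

# Arbitrary p-values to experiment with.
pvalues = [0.05, 1e-3, 1e-5, 1e-8, 1e-11]

print("pvalue      -10*log10(pvalue)")
for p in pvalues:
    print(f"{p:<10.0e}  {pvalue_to_score(p):.2f}")
```

Each factor of 10 in the p-value adds 10 to the score, so 1e-5 maps to 50, 1e-8 to 80 and 1e-11 to 110.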

Sean
