Finding significantly fewer peaks without input - how come?


Moshe Olshansky

Jun 30, 2015, 12:45:09 AM
to macs-ann...@googlegroups.com
I am using macs2 to call broad peaks. I am doing this in two different ways:
One way is to use treatment and control (input):
1. call macs2 callpeak with treatment and control and -B option to get treatment_pileup.bdg and control_lambda.bdg
2. then call macs2 bdgcmp -m ppois with these two files to get cmp.bdg file
3. call macs2 bdgbroadcall -i cmp.bdg -c 5 -C 3 -g 200 to find the list of peaks
Another way is exactly as above but there is no control in step 1.
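
For reference, the three steps above can be written out as explicit command lines. This is only a sketch: the input file names (chip.bam, input.bam), the run names and the output file names are hypothetical, and the commands are built and printed, not executed. It assumes the usual MACS2 naming of the -B output files as NAME_treat_pileup.bdg and NAME_control_lambda.bdg.

```python
from typing import List, Optional


def broad_peak_commands(treatment: str, name: str,
                        control: Optional[str] = None) -> List[List[str]]:
    """Build argv lists for the three steps; nothing is executed here."""
    # Step 1: callpeak with -B writes the pileup and lambda bedGraph files.
    callpeak = ["macs2", "callpeak", "-t", treatment, "-n", name, "-B"]
    if control is not None:
        callpeak += ["-c", control]  # the only difference between the two ways

    # Step 2: per-base Poisson p-value score (-log10) of pileup vs. lambda.
    bdgcmp = ["macs2", "bdgcmp",
              "-t", f"{name}_treat_pileup.bdg",
              "-c", f"{name}_control_lambda.bdg",
              "-m", "ppois",
              "-o", f"{name}_cmp.bdg"]

    # Step 3: broad peak calling on the score track, cutoffs as in the post.
    bdgbroadcall = ["macs2", "bdgbroadcall",
                    "-i", f"{name}_cmp.bdg",
                    "-c", "5", "-C", "3", "-g", "200",
                    "-o", f"{name}_broad_peaks.bed"]  # hypothetical output name
    return [callpeak, bdgcmp, bdgbroadcall]


# Hypothetical inputs: one run with input, one without.
for cmd in broad_peak_commands("chip.bam", "with_input", control="input.bam"):
    print(" ".join(cmd))
for cmd in broad_peak_commands("chip.bam", "no_input"):
    print(" ".join(cmd))
```

Keeping the two runs under different names avoids one run overwriting the other's bedGraph files.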

My understanding is that the only difference is the control_lambda.bdg produced in step 1. Without a control it is based on the treatment only, while with a control it is the maximum of the lambda based on the treatment and the lambda based on the control. So with a control the lambda should be at least as large as without one, which should produce no more peaks with a control than without it (anything significant with a control should certainly be significant without it).
However, I get about 5,000 peaks without control and 20,000 peaks with control. Why does this happen? Where am I wrong?

Thank you,
Moshe.

Moshe Olshansky

Jul 1, 2015, 12:24:12 AM
to macs-ann...@googlegroups.com
Let me add that this also happens when I am not calling broad peaks, and with both macs14 and macs2: with input (control) I get many times more peaks than without it.

Moshe Olshansky

Jul 1, 2015, 1:07:33 AM
to macs-ann...@googlegroups.com
A reasonable explanation could be that where there are several adjacent peaks with input, the regions between them also become significant without input, so we end up with one big (broad) peak instead of several smaller ones, and hence the difference in the number of peaks. But I checked the peaks and this is not the case!

abk

Jul 1, 2015, 1:12:51 AM
to macs-ann...@googlegroups.com
When MACS2 is not provided with an explicit control, it estimates the background lambdas from the ChIP-seq data itself. It computes 'local' lambdas in 5K and 10K windows around each base as well as a genome-wide lambda, and uses the most conservative (largest) of these. So in the absence of a control, for bases within regions of enrichment the local lambdas will almost surely dominate the global lambda and get selected. At peaks, these ChIP-based local lambdas will more often than not be larger than the equivalent control-based local lambdas. Hence the p-values will be more conservative/stringent, resulting in fewer peaks. So in fact, not using a control will generally result in a more conservative analysis if you have local lambda turned on. If you turn off local lambda by using --nolambda, you should see more peaks, potentially closer to what you get with a control. You can check this and report back here whether that is in fact the case. If not, it's something else.
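
To make the effect concrete, here is a small numeric sketch of how the same pileup scores against two different lambdas. All the numbers are made up for illustration (a pileup of 40, a control-based lambda of 5, a ChIP-based local lambda of 20), and the score is simply -log10 of the Poisson upper-tail probability, which is what I take the ppois track to represent; it is not MACS's own code.

```python
import math


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), summed directly so tiny tails keep precision."""
    if k <= 0:
        return 1.0
    # pmf(k), computed in log space to avoid overflow in lam**k / k!
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    i = k
    while True:
        total += term
        i += 1
        term *= lam / i  # pmf(i) from pmf(i - 1)
        # Past the mode the terms shrink; stop once they no longer matter.
        if i > lam and term <= total * 1e-17:
            break
    return min(total, 1.0)


def ppois_score(pileup: int, lam: float) -> float:
    """-log10 of the Poisson p-value of seeing at least `pileup` given `lam`."""
    return -math.log10(poisson_upper_tail(pileup, lam))


pileup_at_peak = 40          # hypothetical treatment pileup at one base
lambda_from_control = 5.0    # hypothetical: the input is flat at this locus
lambda_from_chip = 20.0      # hypothetical: the local window contains the peak itself

cutoff = 5                   # the -c 5 cutoff used in the original post
for label, lam in [("control-based", lambda_from_control),
                   ("ChIP-based local", lambda_from_chip)]:
    score = ppois_score(pileup_at_peak, lam)
    print(f"{label:>17} lambda={lam:5.1f}  score={score:6.2f}  passes={score >= cutoff}")
```

With these made-up numbers the base clears the cutoff against the control-based lambda but not against the ChIP-based local lambda, because the enrichment itself inflates the background estimate.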

-Anshul.

Moshe Olshansky

Jul 1, 2015, 2:59:07 AM
to macs-ann...@googlegroups.com
Hi Anshul,

Thank you for the explanations.
My impression was that when there is a control sample, MACS estimates the background exactly as it does without a control, then estimates the background from the (small local) control windows, and then takes the maximum of the two to produce the final background (this is what I understood from reading the MACS paper).
But based on the results I am getting you are probably right: if there is a control sample, MACS estimates the background from the control sample and the global background from the treatment (but without the local background from the treatment).
Could the MACS developers please comment on this?

Tao Liu

Jul 9, 2015, 12:32:54 PM
to macs-ann...@googlegroups.com
Anshul is right in all his comments. When a control is available, the background is estimated from the control sample; if not, it's estimated from the ChIP sample.

Tao

Moshe Olshansky

Jul 10, 2015, 4:51:54 AM
to macs-ann...@googlegroups.com
Hi Tao,

Thank you for making it clear. So is it correct that if there is a control the background is the maximum of (scaled) control pileup in slocal and 2d windows?

Tao Liu

Jul 10, 2015, 12:07:22 PM
to macs-ann...@googlegroups.com
Hi Moshe,

It’s the max of average scaled control pileup in ‘d’ (--extsize or estimated from fragment size x-correlation), ‘slocal’ (--slocal) and ‘llocal’ (--llocal) windows.

Best,
Tao

Moshe Olshansky

Jul 12, 2015, 10:40:03 PM
to macs-ann...@googlegroups.com
Thank you, Tao.

So if there is an input, MACS looks at size d, size slocal and size llocal windows of the input, and if there is no input, it looks at the total (genome-wide) background and the llocal window of the sample itself. Is this correct?
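
A toy version of this summary, to check my understanding. It is a sketch only: it works on per-base pileup lists, the function names are mine, d=200 is an arbitrary choice, and it does not try to reproduce what MACS does internally (slocal=1000 and llocal=10000 are the MACS2 defaults).

```python
from typing import Optional, Sequence


def window_mean(pileup: Sequence[float], center: int, width: int) -> float:
    """Average pileup in a window of roughly `width` bases centred on `center`."""
    half = width // 2
    lo = max(0, center - half)
    hi = min(len(pileup), center + half + 1)
    return sum(pileup[lo:hi]) / (hi - lo)


def background_lambda(pos: int,
                      treatment: Sequence[float],
                      control: Optional[Sequence[float]] = None,
                      scale: float = 1.0,
                      d: int = 200,
                      slocal: int = 1000,
                      llocal: int = 10000) -> float:
    """Background at `pos`, following the summary in this thread."""
    if control is not None:
        # With input: max of the scaled control averages in d, slocal, llocal windows.
        return max(scale * window_mean(control, pos, w) for w in (d, slocal, llocal))
    # Without input: max of the genome-wide average and the llocal window,
    # both taken from the treatment itself.
    genome_wide = sum(treatment) / len(treatment)
    return max(genome_wide, window_mean(treatment, pos, llocal))


# Made-up data: flat background of 2 with a 500-base enriched region of 40,
# and a flat input of 3 everywhere.
treatment = [2.0] * 20000
for i in range(9750, 10250):
    treatment[i] = 40.0
control = [3.0] * 20000

print("lambda with input:   ", background_lambda(10000, treatment, control))
print("lambda without input:", background_lambda(10000, treatment))
```

In this toy case the lambda without input is the larger one, because the llocal window of the treatment contains the enriched region itself.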

Tao Liu

Jul 13, 2015, 11:34:11 AM
to macs-ann...@googlegroups.com
Yes!
