Calling DNase-seq peak

4,582 views
Skip to first unread message

Xi Chen

unread,
Jun 12, 2014, 5:08:20 PM6/12/14
to macs-ann...@googlegroups.com
Hello,

I'm trying to use macs2 to call peaks from experiments, like DNase-seq, which investigate hypersensitivity sites in the genome. MACS is designed for calling ChIP-seq peaks, which assumes that reads are flanking the actually binding sites. Therefore, MACS will estimate fragment size, shift reads towards the middle, then call peaks.

However, DNase-seq is not exactly like ChIP-seq. In this kind of experiment the cut sites are of interest, so the location of the reads (actually, the 5 prime of the reads) are the actual sites we are interested in. I'm trying to call hypersensitivity peaks without shifting reads, i.e. just call the peak as "it is".

I passed the option "--nomodel --shiftsize 0" to macs2, but it does not seem to work (0 peaks returned).

Does my interpretation of the DNase-seq makes sense to you? Any suggestion about calling hypersensitivity peaks?

Thank you very much.

Regards,
Xi

Tao Liu

unread,
Jun 12, 2014, 5:52:18 PM6/12/14
to macs-ann...@googlegroups.com
Hi Xi,

First, do not use —shiftsize option, instead, using —extsize. Second, 0 fragment size doesn’t make sense. You can try "—extsize 1”. However, such ‘tag extension’ technic is not only for shifting signal to current binding site, but also for ‘smoothing’ the signals. So my suggestion is, if you are looking for the significant ‘cutting ends’, you can use '--half-ext —nomodel —extsize XX’, e.g. XX=100. In this case, a read will be extended to a 100 bps fragment *centered* around the 5’ cutting end, with a 100bps smoothing window. 

Best,
Tao

--
You received this message because you are subscribed to the Google Groups "MACS announcement" group.
To unsubscribe from this group and stop receiving emails from it, send an email to macs-announcem...@googlegroups.com.
To post to this group, send email to macs-ann...@googlegroups.com.
Visit this group at http://groups.google.com/group/macs-announcement.
For more options, visit https://groups.google.com/d/optout.

abk

unread,
Jun 12, 2014, 6:14:19 PM6/12/14
to macs-ann...@googlegroups.com
We have used MACS2 with DNase and ATAC-seq data and it does infact work very well.

There are 2 types of DNase data (single cut and double cut). For double cut there is a characteristic strand shift > 0 (which you can compute from strand-correlation). It usually in the range of 50 - 80 bp. You can use this strand-shift and MACS2 does just fine. For single-cut DNase and ATAC-seq characteristic strand shift is at 0. Here you can use a simple trick. Shift all + strand reads by 100 bp to the left and - strand reads 100 bp to the right in the alignment file. Then provide this alignment file to MACS2 with --shift-size=75 --nomodel. MACS2 will shift reads back to their original location but also use a appropriately sized window for aggregating reads for peak calling. This works very well and with sub-peaks mode also gives the subtle sub-peaks which are quite relevant for DNase and ATAC-seq.

Probably what Tao suggested does the same thing but directly without explicitly modifying the alignment file. Tao - Is that correct?

-Anshul.

Xi Chen

unread,
Jun 13, 2014, 12:50:19 AM6/13/14
to macs-ann...@googlegroups.com
Thank you for the promt reply, Tao.

Okay, I did not mean 0 bp fragment, and I thought setting shiftsize as 0 will let the reads stay where they are.

Now I will try your suggestion.

Thanks again.

Regards,
Xi

Xi Chen

unread,
Jun 13, 2014, 12:53:31 AM6/13/14
to macs-ann...@googlegroups.com
Thank you for the tips, Anshul.

Good to know that macs2 works well with ATAC seq data.

That's very helpful.


Regards,
Xi

Tao Liu

unread,
Jun 13, 2014, 1:04:35 AM6/13/14
to macs-ann...@googlegroups.com
Anshul,

You are right! My suggestion is for the second type of data where the characteristic strand bias as ChIP-Seq should not be expected. To use '--half-ext --nomodel --extsize' will do basically the same thing as first modifying alignments file then to shift the ends. By the way, to shift ends backwards in 3' to 5' direction for 100bps in alignment files then extend them as 75bps*2 fragment in 5' to 3' direction won't move back the cutting ends to original location -- there will be a 25bps shift. Or do I miss some information? 

-Tao
Message has been deleted

Xi Chen

unread,
Jun 13, 2014, 10:35:21 AM6/13/14
to macs-ann...@googlegroups.com
Hi Tao,

I just checked the documentation about --half-ext option. If I set --extsize as 100, then the --half-ext will let macs only extend 50 bp, presumably from 5' end and to both direction. Is this correct?

For example, if I have a read

chr1    100    136    read1    .    +

the "--extsize 100 --half-ext" will make this read as

chr1    50    150    read1    .    +

Is this correct?

Thank you,

Regards,
Xi

Tao Liu

unread,
Jun 13, 2014, 11:25:41 AM6/13/14
to macs-ann...@googlegroups.com
Let me do more tests… I am sure ‘macs2 pileup -B’ will work in this way, but the function for 'macs2 callpeak’ is not fully tested after a major update. In a quick test minutes ago, I found the result is not as what I expected and different with ‘pileup -B’ function. I am working on it now. Will keep you updated. 

But on the other hand, you can always use Anshul’s method to manipulate alignment files which is more flexible and controllable. 

Tao

abk

unread,
Jun 13, 2014, 11:27:39 AM6/13/14
to macs-ann...@googlegroups.com
Yeah sorry that was a typo I meant --shiftsize=100 not 75.

-Anshul.

Xi Chen

unread,
Jun 13, 2014, 11:30:20 AM6/13/14
to macs-ann...@googlegroups.com
Okay.

Thanks for the prompt reply, Tao.

Xi

Edward Britton

unread,
Jun 13, 2014, 11:36:00 AM6/13/14
to macs-ann...@googlegroups.com

Tao Liu

unread,
Jun 17, 2014, 5:48:14 PM6/17/14
to macs-ann...@googlegroups.com
Hi Xi,

To follow up: a new release 2.1.0 is just uploaded to PyPI site: https://pypi.python.org/pypi/MACS2/2.1.0.20140616

Now you can shift then extend reads using callpeak command to mimic what Anshul did for some DNAse-seq data. For example, if you want to smooth the cutting ends pileup with 200bps window, then:

macs2 callpeak —nomodel —extsize 200 —shift -100 …

MACS2 will move 5’ ends ‘backwards’ with 100bps then extend them ‘forwards’ with 200bps, so that the centers of the extended 200bps regions are at the original cutting sites.

Check the README for more detail. Here is a demo:

$cat test.bed
chr1 1000 1050 . . +
chr1 1950 2000 . . -
chr1 3000 3050 . . -

$macs2 callpeak -t test.bed -n test -B --nomodel --extsize 200 --shift -100

$cat test_treat_pileup.bdg
chr1 0 900 0.00000
chr1 900 1100 1.00000
chr1 1100 1900 0.00000
chr1 1900 2100 1.00000
chr1 2100 2950 0.00000
chr1 2950 3150 1.00000

Let me know if you find any problem. 

Best,
Tao

Xi Chen

unread,
Jun 17, 2014, 5:51:37 PM6/17/14
to macs-ann...@googlegroups.com
Thank you for the prompt update, Tao.

That's exactly what I intend to do. I will try the new release.


Regards,
Xi
To unsubscribe from this group and stop receiving emails from it, send an email to macs-announcement+unsub...@googlegroups.com.

John Urban

unread,
Jun 18, 2014, 1:25:19 PM6/18/14
to macs-ann...@googlegroups.com
Would "--shift 175 --extsize 50" recapitulate the legacy "--shiftsize 175" for 50bp reads?


Tao Liu

unread,
Jun 18, 2014, 1:32:27 PM6/18/14
to macs-ann...@googlegroups.com
No. The ’—shiftsize 175’ is equal to ‘—shift 0 —extsize 350’. ‘Extension’ is always in 5’->3’ direction. ’Shift’ can be any direction depending on the sign of value.

Tao
Message has been deleted

ara...@bu.edu

unread,
Jul 16, 2014, 12:14:13 PM7/16/14
to macs-ann...@googlegroups.com
Hi Tao,

I'm using (macs2 2.1.0.20140616), I'm trying to figure out the correct combination of parameters given I have shifted the read locations as suggested by Anshul to keep reads in their original location.  I prefer to shift the read locations myself since it gives me more control over the read locations and better understanding of the peak calling process.

Let's say I have correctly shifted my (+) strand reads to the left by 100bp and shifted my (-) strand reads to the right by 100bp.

Is it correct to use (--nomodel), (--shift 100), and leave (--extsize) to be default (200)?

In the macs2 callpeak help message it says:
--shift SHIFT         (NOT the legacy --shiftsize option!)

I want to make sure I'm using the parameters correctly.

Thanks,
Andy

Xi Chen

unread,
Jul 16, 2014, 2:21:43 PM7/16/14
to macs-ann...@googlegroups.com
Hi Andy,

My understanding of your peak calling setting is the same as using the un-shifted mapped reads with --shift 0 --extszie 200.

Say, you original reads are:

chr1    500    550    read1    .    +
chr1    700    750    read2    .    -

Then you shift the reads by yourself by 100bp, which yields:

chr1    400    450    read1    .    +
chr1    800    850    read2    .    -

Then you call peak with --shift 100 --extsize 200, which yields:

chr1    500    700    read1    .    +
chr1    550    750    read2    .    -

These reads will be used to smooth window and call peaks (which is equivalent of passing original un-shifted reads to macs2 with --shift 0 --extsize 200 options). I suppose it is okay, but your peak won't be centred on DNase cutting end.

Correct me if I'm wrong.

Regards,
Xi

abk

unread,
Jul 16, 2014, 2:31:29 PM7/16/14
to macs-ann...@googlegroups.com

On Wednesday, July 16, 2014 9:14:13 AM UTC-7, ara...@bu.edu wrote:
Hi Tao,

I'm using (macs2 2.1.0.20140616), I'm trying to figure out the correct combination of parameters given I have shifted the read locations as suggested by Anshul to keep reads in their original location.  I prefer to shift the read locations myself since it gives me more control over the read locations and better understanding of the peak calling process.

Let's say I have correctly shifted my (+) strand reads to the left by 100bp and shifted my (-) strand reads to the right by 100bp.

Is it correct to use (--nomodel), (--shift 100), and leave (--extsize) to be default (200)?

What you want is the following.

If you have pre-shifted reads then use --nomodel --shift 0 --extsize 200

If you are using original reads then use --nomodel --shift -100 --extsize 200

-Anshul.

ara...@bu.edu

unread,
Jul 16, 2014, 5:16:56 PM7/16/14
to macs-ann...@googlegroups.com
Thank you Xi and Anshul for your responses.  I'm going to piece together your responses in a way that makes sense to me.  Continuing from the useful toy example:

Original Reads:
chr1    500    550    read1    .    (+)
chr1    700    750    read2    .    (-)

The next question: what should be the input read coordinates for MACS2 DHS peak calling if we want precise DHS peak summits (centered on DNase cutting)? 
For (+) strand reads the cut site (5' end) is the "start" position - in the example above position 500 is the cut site for read1.
For (-) strand reads the cut site (5' end) is the "end" position - in the example above position 750 is the cut site for read2.
We've decided to use (--extsize 200), so our input read coordinates for MACS2 should be:

Input read coordinates for MACS2:
chr1    400    600    read1    .    (+)
chr1    650    850    read2    .    (-)

Why? Because we want position 500 to be the center for read1 and position 750 to be the center for read2.

I agree with Anshul's response based on the following toy example arithmetic:

If you have pre-shifted reads then use --nomodel --shift 0 --extsize 200

Original Reads:
chr1    500    550    read1    .    (+)
chr1    700    750    read2    .    (-)

Shift the reads yourself by 100bp, yields:
chr1    400    450    read1    .    (+)
chr1    800    850    read2    .    (-)

Then use callpeak with ( --nomodel --shift 0 --extsize 200), yields:
chr1    400    600    read1    .    (+)
chr1    650    850    read2    .    (-)
(Correct Input read coordinates for MACS2!)

If you are using original reads then use --nomodel --shift -100 --extsize 200

Original Reads:
chr1    500    550    read1    .    (+)
chr1    700    750    read2    .    (-)

(--shift -100) Note the negative sign (When this value is negative, ends will be
moved toward 3'->5' direction.) (So (+) reads move to the left and (-) reads move to the right), yields:
chr1    400    450    read1    .    (+)
chr1    800    850    read2    .    (-)

(--extsize 200), yields:
chr1    400    600    read1    .    (+)
chr1    650    850    read2    .    (-)
(Correct Input read coordinates for MACS2!)

To summarize, since I have already shifted my reads, I will use:

If you have pre-shifted reads then use --nomodel --shift 0 --extsize 200

Please correct me if I'm wrong - maybe this helps others with understanding this issue.

Andy

Reply all
Reply to author
Forward
0 new messages