difference between bampe and bam for macs2 callpeak

Chris Rhodes

Aug 27, 2014, 12:12:53 PM

to macs-ann...@googlegroups.com

I need some quick clarification about -f BAMPE vs. -f BAM in macs2 callpeak. Does BAMPE use a specific BAMPE file format that is fundamentally different from BAM? For example, do the -f BAM and -f BAMPE formats differ in the way that standard BED differs from the BEDPE format defined by bedtools? Or does the -f BAMPE option look for both reads of each paired-end fragment based on flags (such as those defined in SAM)? From a number of posts I have seen on this discussion group, the -f BAMPE option apparently does the latter, using flags to associate the 2 reads contained in a single BAM file with a peak. Am I interpreting this correctly?

I have tried both -f BAM and -f BAMPE on my paired-end ChIP-seq data. Both work fine, and I really like the -f BAMPE option. I just want to verify that I am doing the peak calls correctly.

Thanks in advance for any help and input!

Chris

Here are snippets from some of the earlier discussions that make me believe -f BAMPE uses the standard BAM format for paired-end data, as opposed to requiring a special BAMPE format:

BAMPE in MACS2, original post 10/2/13

“# Command line: callpeak -t /media/Data/Storage/FILE.bam --format BAMPE”

Inquiry about Pair-end ChIP-seq data mapping:, original post 5/21/12

“To use pair-end support, try command like: macs2 callpeak -f BAMPE ...”

Tao Liu

Sep 2, 2014, 2:24:52 PM

to macs-ann...@googlegroups.com

Chris,

In BAM mode, only the 5’ end of each fragment is recorded. In BAMPE mode, both the 5’ end and the observed template length are recorded, so that in later analysis MACS2 piles up the entire observed fragment/template instead of extending by a fixed, estimated DNA fragment length. Technically, MACS2 picks only the alignment whose flag indicates that it is the first segment in the template and that both ends have been properly aligned, then reads the TLEN value and takes its absolute value.
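To make the difference concrete, here is a minimal sketch in plain Python. It is not taken from the MACS2 source: the function names are hypothetical, coordinates are assumed to be 0-based, and `d` stands for whatever fixed fragment length would be estimated in BAM mode.

```python
# Illustrative sketch only; not MACS2's actual code.
# All function names here are hypothetical.

def bam_mode_fragment(five_prime, is_reverse, d):
    """BAM mode: only the 5' end is known, so the fragment is modelled by
    extending d bases from it (d = fixed, estimated fragment length)."""
    if is_reverse:
        return (five_prime - d, five_prime)
    return (five_prime, five_prime + d)


def bampe_mode_fragment(pos, mate_pos, tlen):
    """BAMPE mode: use the observed template, |TLEN| bases long, starting
    at the leftmost of the two mate positions (SAM POS and PNEXT)."""
    left = min(pos, mate_pos)
    return (left, left + abs(tlen))


# A pair whose mates start at 1000 and 1150, with an observed TLEN of 230.
print(bam_mode_fragment(1000, False, 200))    # (1000, 1200)
print(bampe_mode_fragment(1000, 1150, 230))   # (1000, 1230)
print(bampe_mode_fragment(1150, 1000, -230))  # (1000, 1230)
```

Whichever mate is read, the BAMPE-style interval is the same, because the absolute value of TLEN is used.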

Best,

Tao


Mark Farman

Dec 12, 2014, 8:38:29 AM

to macs-ann...@googlegroups.com

So if I understand this answer correctly, does this mean that it is not necessary to do anything special (such as removing unpaired reads or interleaving pairs) when mapping paired-end reads for MACS2 analysis? Thanks.

Tao Liu

Dec 26, 2014, 4:58:35 PM

to macs-ann...@googlegroups.com

It depends on the flags in the BAM file. Any record NOT marked as ‘proper pair’ will be discarded by MACS2.
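One way to see how many records would survive this is to count the proper-pair bit (0x2 in the SAM FLAG field) before running MACS2. The sketch below is only an illustration: `count_proper_pairs` is a hypothetical helper that works on SAM-formatted text lines, and the three example records are made up.

```python
PROPER_PAIR = 0x2  # SAM FLAG bit: each segment properly aligned


def count_proper_pairs(sam_lines):
    """Return (kept, dropped) counts based only on the proper-pair bit."""
    kept = dropped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        flag = int(line.split("\t")[1])  # FLAG is the second column
        if flag & PROPER_PAIR:
            kept += 1
        else:
            dropped += 1
    return kept, dropped


# Made-up records: flags 99 and 147 form a proper pair, 73 does not.
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t99\tchr1\t100\t60\t50M\t=\t250\t200\t*\t*",
    "r1\t147\tchr1\t250\t60\t50M\t=\t100\t-200\t*\t*",
    "r2\t73\tchr1\t500\t60\t50M\t=\t500\t0\t*\t*",
]
print(count_proper_pairs(example))  # (2, 1)
```

In practice the lines could come from the text output of `samtools view -h` on the BAM file rather than from a hand-written list.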

Tao

hash

May 21, 2015, 7:24:11 AM

to macs-ann...@googlegroups.com

Hi Tao,

I have a set of BAM files which I have pre-filtered to remove duplicate reads and singleton reads and to include only proper pairs. I have run MACS2 (macs-2.1.0_20150420) on the position-sorted and name-sorted BAM files separately, with exactly the same parameters:

macs-2.1.0_20150420/bin/macs2 callpeak --verbose=2 --treatment=<TREAT_BAM> --format=BAMPE --gsize=hs --outdir=<OUTDIR> --name=<NAME> --control=<CONTROL_BAM> --keep-dup=all --qvalue=0.05 --broad --broad-cutoff=0.1 --cutoff-analysis

To get a rough idea, I have done a "wc" on the xls file for each sample. You can see that in some cases the number of peaks is the same, and in others I get up to several thousand more!

posSorted	nameSorted
18804	18804
32194	34648
9011	9011
51321	58742
25191	27453
41777	44672

If you are just using the first read from each fragment for proper pairs, then am I right in saying that these should be the same? If not, can you please advise which sort order of BAM file you would recommend, based on your previous testing?

Many Thanks,

Harshil

Drew Hughes

Aug 11, 2015, 11:06:47 AM

to MACS announcement

Hi Tao,

Can you clarify how duplicates are handled when -f BAMPE is specified? You state here that the observed fragment (both ends) is used to generate the pileup. But are both ends (i.e., the 5' end + the true insert length) used to flag and remove duplicates? I.e., will a read recorded as (5'=chr1:29739327, len=120) be deemed a duplicate of (5'=chr1:29739327, len=140)?

Thanks!

Tao Liu

Aug 11, 2015, 2:58:09 PM

to macs-ann...@googlegroups.com

Hi Drew,

The duplicated pairs should have the SAME leftmost AND rightmost ends. BTW, MACS2 won’t take a record from the BAM file if any of these flags is set:

4 segment unmapped
256 secondary alignment
512 not passing QC
1024 PCR or optical duplicate
2048 supplementary alignment

It requires flag 1 (paired) and further discards records with these flags:

8 next segment unmapped
128 the last segment in the template

and it requires flag 2 (each segment properly aligned according to the aligner).

So essentially, it only takes the leftmost read with ‘correct’ flags, then records the observed template length in order to recover the whole fragment.
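The flag rules and the duplicate rule above can be written out as a short plain-Python sketch. This is an illustration of the description in this post, not MACS2's own code; `is_used` and `drop_duplicates` are hypothetical names, and the example assumes that the 5' position in Drew's question is the leftmost end of the fragment.

```python
# Flag values as listed above (SAM FLAG bits).
EXCLUDE = 4 | 256 | 512 | 1024 | 2048 | 8 | 128
REQUIRE = 1 | 2  # paired, and each segment properly aligned


def is_used(flag):
    """True if a record with this FLAG passes the rules described above."""
    return (flag & REQUIRE) == REQUIRE and (flag & EXCLUDE) == 0


def drop_duplicates(fragments):
    """Keep the first of any fragments that share chromosome, leftmost end
    and rightmost end; fragments differing in either end are all kept."""
    seen = set()
    unique = []
    for frag in fragments:  # frag = (chrom, left, right)
        if frag not in seen:
            seen.add(frag)
            unique.append(frag)
    return unique


print(is_used(99))    # True:  paired, proper pair, first in template
print(is_used(147))   # False: flag 128 (last segment) is set
print(is_used(1123))  # False: 99 plus flag 1024 (duplicate)

# Same leftmost end, lengths 120 and 140: the rightmost ends differ,
# so neither is a duplicate of the other under the rule above.
frags = [
    ("chr1", 29739327, 29739327 + 120),
    ("chr1", 29739327, 29739327 + 140),
    ("chr1", 29739327, 29739327 + 120),  # exact repeat of the first
]
print(len(drop_duplicates(frags)))  # 2
```

Here the sketch simply keeps one copy of each identical fragment; how many copies MACS2 actually retains depends on its --keep-dup setting.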

Best,
Tao Liu

Fan Li

Aug 24, 2015, 10:21:15 AM

to MACS announcement

I have the same issue: it appears that MACS2 treats coordinate-sorted and name-sorted BAM files differently. Tao, perhaps you can answer quickly for us end users: which one should we use?