Is tagAlign format necessary for IDR analysis if using MACS2?

John Urban

Apr 2, 2013, 1:49:24 PM

to idr-d...@googlegroups.com

Hi Anshul,

I have not used the TagAlign format for MACS or MACS2 peak calling in the past. Is that format necessary only for SPP peak calling (I have never used SPP) or is it somehow essential to IDR analysis? Put another way, will there be a problem in the IDR analysis if I call peaks with MACS2 from BAM files?

Many thanks,

John Urban

Daofeng

Jun 7, 2013, 10:28:53 AM

to idr-d...@googlegroups.com

Just my suggestion: if the BAM was produced by BWA, it is better to use a tagAlign file containing only unique reads. If the BAM was generated by Bowtie and contains only unique reads, I guess there is not much difference in using the BAM directly.

Ian Quigley

Sep 17, 2013, 6:38:39 PM

to idr-d...@googlegroups.com

I should note that the shuf script in IDR won't produce bam or sam files that MACS2 can handle - maybe it's the header? - so to generate pseudoreplicates, I had to convert bams to bed files before shuffling.

Anshul Kundaje

Sep 17, 2013, 7:49:31 PM

to idr-d...@googlegroups.com

The shuf script is a built-in Linux tool that simply shuffles lines in a text file, so it doesn't work with binary BAM files.

You can use other tools that can shuffle and subsample reads from BAM files to generate pseudoreplicates. For example, MACS2 has a randsample module that can do this with BAM files directly.

TagAligns are simple 6-column BED files that have the main information (i.e. chromosome coordinates and strand information) needed by peak callers, so they save a lot of disk space over using BAMs. You can feed them into any peak caller that accepts BED files (most that I know of do).
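
To make the 6-column layout concrete, here is a minimal sketch of a sanity check for a gzipped tagAlign file. It assumes the common convention of tab-separated columns (chrom, start, end, sequence or a placeholder, score, strand); the function name check_tagalign is made up for illustration and is not part of IDR or MACS2.

```shell
#!/usr/bin/env bash
# check_tagalign (hypothetical helper): check that every line of a gzipped
# tagAlign file has 6 tab-separated fields, integer start < end, and a
# strand of + or -. Offending line numbers go to stderr; the function
# returns non-zero if any line fails or the file is not readable.
check_tagalign() {
    [ -r "$1" ] || return 2
    gzip -dc "$1" | awk -F'\t' '
        NF != 6 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ ||
        $2 + 0 >= $3 + 0 || ($6 != "+" && $6 != "-") {
            printf "line %d: not a valid tagAlign record\n", NR > "/dev/stderr"
            bad = 1
        }
        END { exit bad }
    '
}
```

Usage would be something like: check_tagalign rep1.tagAlign.gz && echo "looks like tagAlign" (the file name is a placeholder).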

-Anshul.

Alvaro J. González

Sep 26, 2013, 8:51:58 AM

to idr-d...@googlegroups.com

Hi Ian,

I actually use a perl command for shuffling. It goes like the following:

gunzip -c [pooledReps.tagAlign.gz] | perl -MList::Util=shuffle -e 'print shuffle(<STDIN>);' | split -d -l [numbPseudoReads] - [pseudoRepDir/pseudoRep]

which creates two files, pseudoRepDir/pseudoRep00 and pseudoRepDir/pseudoRep01, as long as [numbPseudoReads] is half the number of pooled reads (rounded up).

I've used it on a variety of systems, and it runs pretty smoothly. Of course, you need Perl installed, but it is ubiquitous.
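
For anyone who wants the read count worked out automatically, here is a rough sketch of the same idea wrapped in a shell function. It is only an illustration: the function name make_pseudoreps is made up, shuf is swapped in for the perl shuffle (both permute whole lines), and it assumes GNU split (for -d) and a pooled file with at least two reads.

```shell
#!/usr/bin/env bash
# make_pseudoreps (hypothetical helper): split a pooled, gzipped tagAlign
# file into two pseudoreplicates of (nearly) equal size.
#   $1 = pooled tagAlign.gz
#   $2 = output prefix, e.g. pseudoRepDir/pseudoRep
make_pseudoreps() {
    local pooled=$1 prefix=$2 total half
    # One read per line: count them, then take half, rounding up, so that
    # split writes exactly two files (prefix00 and prefix01) when there
    # are two or more reads.
    total=$(gzip -dc "$pooled" | wc -l)
    half=$(( (total + 1) / 2 ))
    # Shuffle the lines, then cut the shuffled stream into chunks of $half.
    gzip -dc "$pooled" | shuf | split -d -l "$half" - "$prefix"
}
```

Called as make_pseudoreps pooledReps.tagAlign.gz pseudoRepDir/pseudoRep (placeholder names), this should leave pseudoRepDir/pseudoRep00 and pseudoRepDir/pseudoRep01, with the first file one read larger when the pooled count is odd. The output directory has to exist beforehand.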