STAR for ChIP-seq data


ShengLin Mei

Apr 23, 2014, 3:21:44 PM
to rna-...@googlegroups.com
Hi Alex,

STAR is much faster than bowtie and BWA, and I would like to know whether it can be used for ChIP-seq data analysis.
I have compared the mapping results of bowtie, BWA and STAR. Considering only uniquely mapped reads (with the mismatch limit set to 2), about 90% of reads map to the same genome position. Even between bowtie and BWA, the overlap (by genome position) of mapped reads is 94%. So I plan to use STAR for ChIP-seq data analysis. Do you think it is appropriate if I just keep uniquely mapped reads and remove spliced reads? And do you have any suggestions for removing spliced reads? Thanks!


Best Wishes
Shenglin

Shawn Driscoll

Apr 24, 2014, 1:23:08 AM
to rna-...@googlegroups.com
You can disable spliced alignments with --alignIntronMax 1 (setting the max intron to a value less than the minimum intron). I agree that you should discard non-unique alignments unless you have some compelling reason to try to use them. 
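
For alignments that have already been produced, a post-hoc filter is one possible alternative. The sketch below assumes STAR's default behaviour of assigning MAPQ 255 to uniquely mapped reads and relies on spliced alignments carrying an N operation in the CIGAR string. The function name and the file names in the usage note are illustrative placeholders.

```shell
# Keep SAM header lines, plus alignments that are uniquely mapped and unspliced.
# Assumption: unique mappers carry MAPQ 255 (STAR's default).
filter_unique_unspliced() {
  awk -F'\t' '
    /^@/ { print; next }                # pass SAM header lines through unchanged
    $5 == 255 && $6 !~ /N/ { print }    # unique (MAPQ 255) and no N (splice) in CIGAR
  '
}
```

Hypothetical usage on STAR's SAM output: `filter_unique_unspliced < Aligned.out.sam > unique_unspliced.sam`.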

Alex has mentioned in the past that the logic of DNA alignment is slightly different than for RNA alignment - something about the rules of allowing gaps. If you're not allowing gaps and only going with low-mismatch alignments then it probably doesn't matter. It's likely that the 4% that BWA or bowtie can align past STAR's 90% are reads with complicated INDELs. BWA is particularly good at finding those alignments.

-shawn

Alexander Dobin

Apr 24, 2014, 4:11:25 PM
to rna-...@googlegroups.com
Hi ShengLin, 

In addition to Shawn's spot-on suggestion to prohibit splicing with --alignIntronMax 1, you can try to enforce "end-to-end" alignment with --alignEndsType EndToEnd. This will prohibit soft-clipping of the reads and is beneficial if you have short (<50 b) reads; it will make the alignments more similar to those of bowtie1/BWA. If your reads are shorter than 50 b, you can also increase sensitivity with --seedSearchStartLmax 30, at the cost of a mild decrease in mapping speed.
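
A minimal sketch of how these three options might be combined into one command line. The helper name, the thread count of 8, and the paths in the usage note are placeholder assumptions; only --alignIntronMax, --alignEndsType and --seedSearchStartLmax come from the suggestions above. The helper prints the command rather than running it, so it can be inspected first.

```shell
# Print a STAR command line for ChIP-seq-style (unspliced, end-to-end) mapping.
# $1 = genome index directory, $2 = FASTQ file, $3 = output file prefix
star_chipseq_cmd() {
  genome_dir=$1; reads=$2; prefix=$3
  # --alignIntronMax 1         prohibits spliced alignments
  # --alignEndsType EndToEnd   prohibits soft-clipping
  # --seedSearchStartLmax 30   extra sensitivity for reads shorter than 50 b
  echo "STAR --runThreadN 8" \
       "--genomeDir $genome_dir" \
       "--readFilesIn $reads" \
       "--alignIntronMax 1" \
       "--alignEndsType EndToEnd" \
       "--seedSearchStartLmax 30" \
       "--outFileNamePrefix $prefix"
}
```

Hypothetical usage: `star_chipseq_cmd /path/to/star_index chip_sample.fastq chip_sample_` prints the full command, which can then be pasted into a shell or a job script.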

Cheers
Alex

ShengLin Mei

Apr 25, 2014, 10:55:11 AM
to rna-...@googlegroups.com
Thanks!

Best Wishes

On Thursday, April 24, 2014 at 4:11:25 PM UTC-4, Alexander Dobin wrote:

ShengLin Mei

Apr 25, 2014, 11:57:54 AM
to rna-...@googlegroups.com
Hi, Alex,

The parameter "--alignEndsType EndToEnd" is not available in version 2.3.0. With version 2.3.1, I encounter an installation error. Do you know how to solve this problem?

cc1plus: error: unrecognized command line option "-std=c++0x"

make: *** [Depend.list] Error 1

./STAR

./STAR: /usr/lib64/libz.so.1: no version information available (required by ./STAR)

./STAR: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.11' not found (required by ./STAR)

./STAR: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.9' not found (required by ./STAR)


 

Best

ShengLin



On Thursday, April 24, 2014 at 4:11:25 PM UTC-4, Alexander Dobin wrote:

Alexander Dobin

Apr 25, 2014, 3:34:11 PM
to rna-...@googlegroups.com
Hi ShengLin,

The first - compilation - error means that gcc is out of date, and the second means that libstdc++ and zlib are not up to date - you would need to ask your sys-admin to update them.
In the meantime you can try to use the static executable, STARstatic - it does not require external libraries.
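
To confirm which GLIBCXX versions the installed libstdc++ actually provides, something along these lines may help. It is only a sketch: the function names are made up for illustration, and it simply searches the library file for embedded version tags, using the library path taken from the error messages.

```shell
# List the GLIBCXX version tags embedded in a libstdc++ shared object.
glibcxx_versions() {
  grep -a -o 'GLIBCXX_[0-9][0-9.]*' "$1" | sort -u
}

# Succeed if the library provides the given version, e.g. 3.4.11.
has_glibcxx() {
  glibcxx_versions "$1" | grep -qxF "GLIBCXX_$2"
}

# Example, using the library path from the error message:
# has_glibcxx /usr/lib64/libstdc++.so.6 3.4.11 || echo "libstdc++ lacks GLIBCXX_3.4.11"
# gcc --version | head -n 1   # shows which compiler version is rejecting -std=c++0x
```

If the required version is missing from the list, the dynamically linked executable cannot run on that system, which is where the static executable becomes the practical workaround.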

Cheers
Alex

Ian D

May 15, 2014, 5:24:42 AM
to rna-...@googlegroups.com
I have 2.3.1z4 installed, but it does not recognise '--alignEndsType EndToEnd':

"EXITING: FATAL INPUT ERROR: unrecoginzed parameter name "alignEndsType" in input "Command-Line-Initial"
SOLUTION: use correct parameter name (check the manual)
EXITING: FATAL INPUT ERROR: unrecoginzed parameter name "alignEndsType" in input "Command-Line-Initial"
SOLUTION: use correct parameter name (check the manual)"

Is the option available in that version, is it written incorrectly, or am I doing something wrong?

Thanks,
Ian

Ian D

May 15, 2014, 7:04:12 AM
to rna-...@googlegroups.com
Another observation on using this mapper for ChIP-seq analysis.  I used '--alignIntronMax 1' as suggested, but could not get '--alignEndsType EndToEnd' to be recognised by 2.3.1z4.  The reads are 101bp paired-end.  The mapping looked good in terms of the number of paired unique reads.  However, MACS2 reported a very large mean fragment size of about 730bp, compared to that of about 200bp with bowtie2.  MACS2 was run using BAMPE so fragment sizes were taken directly from the input BAM file.  After looking at the paired reads with IGV it was clear that many of the pairs with large inserts contained soft-clipped reads, often half their length. 

So my question is how to prevent soft-clipping of reads. I realise it is possible to use --alignMatesGapMax, but soft-clipping is probably also an issue in shorter fragments. I would be very interested in your opinion and, more importantly, in whether I am using STAR incorrectly for PE ChIP-seq data.
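
One way to quantify both symptoms before and after changing parameters is to summarise the alignments directly. The sketch below is an illustration, not part of MACS2 or STAR: it reads SAM records on standard input, averages the template length (TLEN, column 9) over properly paired fragments, and reports the fraction of aligned reads whose CIGAR contains a soft-clip (S) operation. The function name and output labels are invented for this example.

```shell
# Summarise a SAM stream: mean template length of properly paired fragments
# and the fraction of aligned reads with soft-clipping in the CIGAR.
pair_stats() {
  awk -F'\t' '
    /^@/ { next }                                   # skip header lines
    $6 != "*" { n++; if ($6 ~ /S/) clipped++ }      # aligned reads, soft-clipped or not
    # count each proper pair once: flag bit 0x2 set and positive TLEN
    int($2 / 2) % 2 == 1 && $9 > 0 { pairs++; total += $9 }
    END {
      if (pairs > 0) printf "mean_fragment_size\t%.1f\n", total / pairs
      if (n > 0)     printf "soft_clipped_fraction\t%.3f\n", clipped / n
    }
  '
}
```

Hypothetical usage: `pair_stats < Aligned.out.sam`. Comparing the two numbers between runs shows whether a large mean fragment size goes together with heavy soft-clipping.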

Thanks!
Ian

Alexander Dobin

May 19, 2014, 3:54:04 PM
to rna-...@googlegroups.com
Hi Ian,

I have tested 2.3.1z4 with --alignEndsType EndToEnd, and it seems to work fine.
Could you please send me the Log.out of the failed run?

Cheers
Alex

Ian D

May 20, 2014, 9:12:26 AM
to rna-...@googlegroups.com
My mistake!  My script was loading the wrong version of STAR.  It has now finished running so I can check out the improvement.

Thanks,
Ian