About STAR 2-pass alignment

Maoxiang Qian

Oct 13, 2014, 10:40:40 AM
to rna-...@googlegroups.com
The GATK Best Practices for RNA-seq suggest using STAR 2-pass alignment for the mapping. It sounds very good, but in practice there seem to be some issues. As described in the methods, the genome index has to be generated twice for every sample, which is time-consuming for a big genome such as human. Moreover, it seems that we cannot align two samples at the same time with the same genome path, right? Should we copy genome.fa to a separate location for every sample? That seems crazy for a big project.

Any suggestions? Many thanks.

Alexander Dobin

Oct 14, 2014, 4:49:38 PM
to rna-...@googlegroups.com
Hi Maoxiang,

there are two ways to make the 2-pass STAR alignments efficient:
1. Run the 1st pass of STAR for all samples, collect all junctions detected in the 1st pass, filter them, and use them to generate the new genome index for the 2nd pass only once for all samples. Then run the 2nd pass for all samples with this single genome index. Some relevant posts:

2. With the latest versions it is possible to run STAR 2-pass without re-generating the genome: you need to set --twopass1readsN to the number of reads you want mapped in the 1st pass (the best approach is to set it larger than the number of reads, so that all reads are used in the 1st pass). This option will start mapping to a genome without known junctions, extract junctions from the 1st pass, insert them into the genome index, and re-map all reads in the 2nd pass. You need to specify the --sjdbOverhang parameter at the mapping stage.
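
For illustration, here is a minimal Python sketch that assembles a STAR command line for the second approach. It is only a sketch under stated assumptions: the genome directory, FASTQ names, thread count and output prefix are hypothetical placeholders, the function name is made up for this example, and --twopassMode Basic with --twopass1readsN -1 is the combination named later in this thread. Setting --sjdbOverhang to read length minus 1 follows the usual recommendation in the STAR manual.

```python
import shlex


def star_twopass_command(genome_dir, reads, read_length,
                         threads=8, out_prefix="sample_"):
    """Build an argument list for a single-step STAR 2-pass run.

    All paths and defaults are hypothetical; adjust them to your setup.
    """
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", *reads,
        "--outFileNamePrefix", out_prefix,
        # Single-step 2-pass: junctions found in pass 1 are inserted
        # into the index on the fly before pass 2.
        "--twopassMode", "Basic",
        # -1 means: use all reads in the 1st pass.
        "--twopass1readsN", "-1",
        # Assumption: read length minus 1, as the STAR manual suggests.
        "--sjdbOverhang", str(read_length - 1),
    ]


cmd = star_twopass_command(
    "/path/to/genome_index",          # hypothetical path
    ["R1.fastq", "R2.fastq"],         # hypothetical file names
    read_length=100,
)
print(shlex.join(cmd))
```

The printed string could be pasted into a shell, or the list passed to subprocess.run. Giving each sample its own --outFileNamePrefix keeps the per-sample outputs apart when several samples are mapped against the same genome directory.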

Note that I personally prefer the 1st approach, since it allows for more uniform detection of novel splicing across the samples.
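
The collect-and-filter step of the 1st approach could be sketched roughly as follows. This is a hedged example, not a prescribed filter: it assumes the 9-column SJ.out.tab layout described in the STAR manual (chromosome, intron start, intron end, strand, motif, annotated flag, unique reads, multi-mapping reads, maximum overhang), and the thresholds min_unique and min_samples as well as the function name are hypothetical choices made for illustration.

```python
from collections import defaultdict

# SJ.out.tab encodes strand as 0 (undefined), 1 (+), 2 (-).
STRAND = {"0": ".", "1": "+", "2": "-"}


def merge_junctions(samples, min_unique=3, min_samples=1,
                    keep_noncanonical=False):
    """Pool 1st-pass junctions across samples and filter them.

    samples: iterable of iterables of SJ.out.tab lines, one per sample.
    Returns a sorted list of (chrom, start, end, strand) tuples.
    Thresholds are illustrative assumptions, not STAR defaults.
    """
    support = defaultdict(int)  # junction -> number of supporting samples
    for lines in samples:
        seen = set()
        for line in lines:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 9:
                continue  # skip malformed or empty lines
            chrom, start, end = fields[0], int(fields[1]), int(fields[2])
            strand, motif, n_unique = fields[3], fields[4], int(fields[6])
            if motif == "0" and not keep_noncanonical:
                continue  # drop non-canonical junctions
            if n_unique < min_unique:
                continue  # too few uniquely mapped reads in this sample
            seen.add((chrom, start, end, STRAND.get(strand, ".")))
        for junction in seen:
            support[junction] += 1
    return sorted(j for j, n in support.items() if n >= min_samples)


def write_junctions(junctions, path):
    """Write junctions as chr<TAB>start<TAB>end<TAB>strand lines."""
    with open(path, "w") as out:
        for chrom, start, end, strand in junctions:
            out.write(f"{chrom}\t{start}\t{end}\t{strand}\n")
```

The file written this way has the four-column form accepted by --sjdbFileChrStartEnd, so it can be supplied once when the index is regenerated with --runMode genomeGenerate, and the resulting index reused for the 2nd pass of every sample.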

Cheers
Alex

Sean Boyle

Jul 6, 2015, 7:31:53 PM
to rna-...@googlegroups.com
Hello Alex,

Sorry to open this old thread again, but I have a question related to this topic. 

Will the results be exactly, or close to, the same if I run both:
1) a traditional multi-step 2-pass alignment
2) the simpler and faster single-step 2-pass alignment with --twopassMode Basic and --twopass1readsN set to -1?

Each sample will be run independently, following the rest of the GATK Best Practices for RNA-seq.

Thank you,
Sean

Alexander Dobin

Jul 8, 2015, 11:42:57 AM
to rna-...@googlegroups.com, sean.mich...@gmail.com
Hi Sean,

the results should be exactly the same in most cases, apart from the ordering of the reads in the BAM file, provided that you are using the same version of STAR.
There might be very tiny differences in some exceptional cases.

Cheers
Alex

Sean Boyle

Jul 8, 2015, 2:22:16 PM
to rna-...@googlegroups.com, sean.mich...@gmail.com
Thank you, Alex! That is great to know.