lncRNA analysis using STAR


Anand m t

Jul 13, 2015, 2:45:23 PM
to rna-...@googlegroups.com
Hi,
I have around 100 human RNA-seq samples. The primary aim is to look for novel, unannotated lncRNAs. We previously used the Tuxedo protocol on a small cohort, but the alignment step alone takes so long that we are thinking of using the STAR aligner instead.
What are the best settings for detecting novel junctions/boundaries? My first guess is to do a two-pass alignment, but I also see a newer option in the manual, --twoPassMode Basic. Are there any recommended settings that should be modified or tuned before I submit the jobs to the cluster?

Thanks.

Alexander Dobin

Jul 13, 2015, 5:54:16 PM
to rna-...@googlegroups.com, anand...@gmail.com
Hi Anand,

The default settings are generally good for detecting novel junctions, but a 2-pass scheme will give a bit more sensitivity.

With 100 samples, the best approach is to collect the junctions from all samples after the 1st mapping, and use them in the 2nd pass.
The latter can be done by re-generating the genome before the 2nd pass, or by inserting these junctions on the fly while doing the 2nd mapping.
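A minimal sketch of the "insert junctions on the fly" variant, written as Python helpers that only build the STAR command lines. The sample names, directory layout, thread count, and function names are hypothetical; the flags are standard STAR options, but check them against the manual for your STAR version. STAR does not create the output directories itself, so they must exist before the commands are run.

```python
from pathlib import Path


def junction_file(sample, out_dir):
    """Path where pass 1 writes the junction table for one sample (assumed layout)."""
    return Path(out_dir) / "pass1" / f"{sample}.SJ.out.tab"


def _base_cmd(fastq1, fastq2, genome_dir, threads):
    cmd = [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", str(genome_dir),
        "--readFilesIn", str(fastq1), str(fastq2),
    ]
    # Assumption: gzipped FASTQ files are decompressed with zcat.
    if str(fastq1).endswith(".gz"):
        cmd += ["--readFilesCommand", "zcat"]
    return cmd


def first_pass_cmd(sample, fastq1, fastq2, genome_dir, out_dir, threads=8):
    """Pass 1: map one sample and keep only its SJ.out.tab (no SAM/BAM output)."""
    prefix = f"{Path(out_dir) / 'pass1' / sample}."
    return _base_cmd(fastq1, fastq2, genome_dir, threads) + [
        "--outSAMtype", "None",
        "--outFileNamePrefix", prefix,
    ]


def second_pass_cmd(sample, fastq1, fastq2, genome_dir, out_dir,
                    all_samples, threads=8):
    """Pass 2: map one sample, inserting junctions from ALL samples on the fly."""
    sj_files = [str(junction_file(s, out_dir)) for s in all_samples]
    prefix = f"{Path(out_dir) / 'pass2' / sample}."
    return _base_cmd(fastq1, fastq2, genome_dir, threads) + [
        "--sjdbFileChrStartEnd", *sj_files,
        "--outSAMtype", "BAM", "SortedByCoordinate",
        # Only relevant for unstranded libraries going into Cufflinks/StringTie.
        "--outSAMstrandField", "intronMotif",
        "--outFileNamePrefix", prefix,
    ]
```

Each returned list can be passed to subprocess.run(cmd, check=True) or joined into a cluster job script. All pass-1 jobs have to finish before any pass-2 job starts, since every pass-2 command reads every sample's junction table.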
Most recent discussion about 2-pass strategies is here:

Some other parameters to think about are attached below.

Cheers
Alex


--outFilterMultimapNmax 20
max number of multiple alignments allowed for a read: if exceeded, the read is considered unmapped
--alignSJoverhangMin 8
minimum overhang for unannotated junctions
--alignSJDBoverhangMin 1
minimum overhang for annotated junctions
--outFilterMismatchNmax 999
maximum number of mismatches per pair; a large number switches off this filter
--outFilterMismatchNoverReadLmax 0.04
max number of mismatches per pair relative to read length: for 2x100b, max number of mismatches is 0.04*200=8 for the paired read
--alignIntronMin 20
minimum intron length
--alignIntronMax 1000000
maximum intron length
--alignMatesGapMax 1000000
maximum genomic distance between mates
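As a rough illustration, the parameters above can be kept in one place and appended to any STAR command, and the relative mismatch limit can be worked out by hand. This is a sketch, not STAR's own code: the helper names are hypothetical, the ratio-based filter is assumed to be STAR's --outFilterMismatchNoverReadLmax, and 0.04 is the ratio that gives 8 mismatches for a 2x100 pair.

```python
import math

# Values from the list above. The name of the ratio flag is an assumption.
SUGGESTED_PARAMS = {
    "--outFilterMultimapNmax": "20",
    "--alignSJoverhangMin": "8",
    "--alignSJDBoverhangMin": "1",
    "--outFilterMismatchNmax": "999",
    "--outFilterMismatchNoverReadLmax": "0.04",
    "--alignIntronMin": "20",
    "--alignIntronMax": "1000000",
    "--alignMatesGapMax": "1000000",
}


def as_args(params):
    """Flatten {flag: value} into a list usable in a command line."""
    args = []
    for flag, value in params.items():
        args += [flag, value]
    return args


def max_mismatches(read_length, ratio, nmax=999, paired=True):
    """Simplified model: the stricter of the absolute and the relative limit.

    For paired-end reads the relative limit applies to the summed mate lengths.
    """
    total_length = read_length * (2 if paired else 1)
    relative_limit = math.floor(ratio * total_length + 1e-9)  # guard float error
    return min(nmax, relative_limit)


print(max_mismatches(100, 0.04))  # 2x100 reads: 0.04 * 200 = 8
```

With --outFilterMismatchNmax set to 999, the absolute limit never applies in practice, so the relative limit is the one that decides.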

Anand m t

Jul 20, 2015, 9:33:29 PM
to rna-...@googlegroups.com, anand...@gmail.com
Thank you so much for this, Alex. I compared alignments for a few samples (around 10) using the default settings and the settings mentioned above. As you said, the difference was not large, but the overall alignment rate increased by 1-2% per sample and the number of non-canonical splice junctions increased by a couple of hundred. The best part is that it's insanely fast!
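A comparison like this can be sketched by counting junctions in the SJ.out.tab files from two runs of the same sample. The function names and file paths are hypothetical. The column meanings follow the STAR manual: column 5 is the intron motif, with 0 meaning non-canonical, and column 6 is the annotation flag, with 0 meaning unannotated.

```python
import csv
from collections import Counter


def summarize_junctions(path):
    """Count total, non-canonical and unannotated junctions in one SJ.out.tab."""
    counts = Counter()
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 6:
                continue  # skip blank or truncated lines
            counts["total"] += 1
            if row[4] == "0":  # column 5: intron motif, 0 = non-canonical
                counts["non_canonical"] += 1
            if row[5] == "0":  # column 6: 0 = not in the annotation
                counts["unannotated"] += 1
    return counts


def compare_runs(default_path, tuned_path):
    """Difference in junction counts (tuned minus default) for one sample."""
    before = summarize_junctions(default_path)
    after = summarize_junctions(tuned_path)
    keys = ("total", "non_canonical", "unannotated")
    return {key: after[key] - before[key] for key in keys}
```

For example, compare_runs("default/S1.SJ.out.tab", "tuned/S1.SJ.out.tab") returns the per-category change for one sample. The mapping rates themselves are reported in each run's Log.final.out.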

Now I hope Cufflinks/StringTie work fine with these alignments.

Thanks,
-Anand.