Primary alignment and --outSAMmultNmax flag

Kirill Tsyganov

Feb 24, 2016, 11:26:03 PM

to rna-star

Hi Alex,

I'd like to clarify a couple of points about primary alignments and how STAR records them in the SAM/BAM file.

STAR will try to map a read a limited number of times (set by --outFilterMultimapNmax, default 10), and if the read maps to more locations than this, it will be marked as unmapped, with NH and HI both set to zero. Is this the case?

And if the read maps fewer times than the set value, e.g. 9 times, NH will be set to 9 (NH:i:9) and the HI tag will number each alignment incrementally. How does STAR increment HI? Simply by the order of attempted locations?

And if I don't want all of the mapped locations for a single read in my SAM/BAM file, I can use the --outSAMmultNmax flag to specify how many of them to record. E.g. if I still have a read mapped 9 times (NH:i:9) but I set --outSAMmultNmax 1, which one of the 9 locations will STAR output into the SAM/BAM file?

There are a couple of posts about this on this forum, and I think two of them conflict.

In this one https://groups.google.com/forum/#!searchin/rna-star/STAR$20and$20alignment$20score/rna-star/dUKvviBixTQ/esI8pvPpiHwJ you said that you can't make STAR output only primary alignments.

And in this one https://groups.google.com/forum/#!searchin/rna-star/%22primary$20alignment%22/rna-star/m6uCkgFahcI/MqnwQRyBGAAJ you suggest that using --outSAMmultNmax 1 should output only primary alignments.

Am I misreading those posts, or is it the case in the latest STAR versions that with --outSAMmultNmax 1 STAR will output only the primary alignment?

Just to reiterate: the primary alignment doesn't have to have HI:i:1? (This is why I was asking how STAR sets the HI tag. If the best alignment is HI:i:2 and you set --outSAMmultNmax 1, STAR would not output the alignment with HI:i:1, right?)

Many thanks in advance

Kirill

Alexander Dobin

Feb 25, 2016, 12:58:55 PM

to rna-...@googlegroups.com

Hi Kirill,

the default behavior is as follows:

STAR outputs the multiple alignments in a "quasi-random" order, which depends on the sorting of the suffix array indexes. This order is fully reproducible from run to run, provided you use the same genome indexes.

However, it's not truly random, so selecting only one of the alignments will introduce significant biases in the genome coverage.

The primary alignment is selected as the first alignment with the top score, so it's not guaranteed to be the first (HI:i:1) alignment.
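To illustrate the default rule, here is a minimal Python sketch. It is not STAR's code: the function name and the scores are made up for the example, and it only assumes that each alignment has a numeric score and that HI is 1-based in output order.

```python
def default_primary_index(scores):
    """Return the 0-based position of the first alignment with the top score.

    `scores` lists the alignment scores in the order the alignments are
    output (the "quasi-random" order). Assumes at least one alignment.
    """
    best = max(scores)
    return scores.index(best)  # index() returns the first occurrence


# Hypothetical scores for one read with three alignments, in output order.
scores = [96, 98, 98]
primary = default_primary_index(scores)
hi_of_primary = primary + 1  # HI is 1-based
print(f"primary alignment has HI:i:{hi_of_primary}")
```

With these example scores the primary alignment is the second one in the output, so by default it carries HI:i:2 rather than HI:i:1.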

Since 2.5.0b, I have introduced --outMultimapperOrder Random option to output multiple alignments in random order, and --outSAMmultNmax parameter to limit the number of output alignments for multimappers.

If you use either of these parameters, first of all the top scoring alignments are brought to the top of the output list, i.e. they will have HI:i:1,2...

Next, if you use --outMultimapperOrder Random the top scoring alignments will be shuffled, and separately all the other (poorer) alignments will be shuffled, so their order is truly random.

Next, the very first alignment (i.e. HI:i:1) will be primary, and all others marked as secondary.

Finally, only --outSAMmultNmax alignments from this list will be output.

If you use --outMultimapperOrder Random --outSAMmultNmax 1, you will get only one alignment in the output, randomly chosen from the top scoring alignments. This is the behavior most commonly sought after.
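The steps above can be sketched in Python as follows. This is only an illustration of the described ordering, not STAR's actual implementation: the function, argument names, and record layout are hypothetical, and Python's `random` module stands in for STAR's random generator.

```python
import random


def order_and_limit(alignments, out_sam_mult_nmax, random_order=False, rng=None):
    """Reorder one read's alignments and keep the first few.

    alignments: non-empty list of (locus, score) tuples in the original
        "quasi-random" order.
    out_sam_mult_nmax: number of alignments to keep (stands in for
        --outSAMmultNmax).
    random_order: mimics --outMultimapperOrder Random when True.
    rng: optional random.Random instance (hypothetical stand-in for the
        generator seeded by --runRNGseed).
    """
    rng = rng or random.Random()
    best = max(score for _, score in alignments)

    # Step 1: bring the top-scoring alignments to the top of the list.
    top = [a for a in alignments if a[1] == best]
    rest = [a for a in alignments if a[1] != best]

    # Step 2: shuffle the top-scoring group and the poorer group separately.
    if random_order:
        rng.shuffle(top)
        rng.shuffle(rest)

    # Step 3: the first alignment (HI = 1) is primary, all others secondary.
    records = [
        {"locus": locus, "score": score, "HI": i, "primary": i == 1}
        for i, (locus, score) in enumerate(top + rest, start=1)
    ]

    # Step 4: only the first out_sam_mult_nmax alignments are output.
    return records[:out_sam_mult_nmax]


# Hypothetical read with four alignments, two of them tied for the top score.
alns = [("chr1:100", 90), ("chr2:200", 98), ("chr3:300", 98), ("chr4:400", 85)]
kept = order_and_limit(alns, 1, random_order=True, rng=random.Random(0))
print(kept)
```

In this sketch the single record that is kept always comes from the two top-scoring loci, has HI equal to 1, and is marked primary; which of the two loci it is depends on the shuffle.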

The --runRNGseed parameter can be used to set the random generator seed. However, even with the same seed, the ordering of the multi-mapping alignments of each read, and therefore the choice of the primary alignment, will vary from run to run unless only one thread is used.

Hope this makes it clearer, please let me know if you have further questions.

Cheers

Alex
