How does STAR calculate mapping statistics?


Billy Lau

Sep 10, 2015, 10:51:39 PM
to rna-star
I'm trying to figure out how STAR generates the mapping statistics in the .Log.final.out file by reproducing them with standard tools such as samtools, but I can't get the exact numbers that STAR reports.

First, samtools view -F 256 gets basically all read pairs (which is presumably what STAR is reporting). What are the criteria for "Uniquely mapped" and "Multimapped to multiple loci"? Do they require that both reads are mapped, or can one of them be unmapped? I've tried samtools view -F 260 -F 264, but this gives me a number that is substantially lower than the sum of the "Uniquely mapped" and "Multiple loci" counts (excluding the "too many loci" row).

One approach I've thought of is to first filter out all the unmapped read pairs and secondary mappings, and then filter by the NH tag with grep. Would that work?
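
A minimal sketch of the NH-based filter described above, written with awk rather than grep so that the flag and the tag can be checked in one pass. The function name `count_nh1_lines` is hypothetical, and the sketch assumes the alignments carry an `NH:i:` tag and that `NH:i:1` marks a read with a single reported locus. It counts SAM lines, not reads, so for paired-end data the result would still need to be halved.

```shell
#!/bin/sh
# Hypothetical helper: reads SAM text on stdin and prints the number of
# lines that are mapped (0x4 unset), not secondary (0x100 unset),
# and tagged NH:i:1.
count_nh1_lines() {
    awk -F'\t' '
        /^@/ { next }                        # skip header lines if present
        {
            flag = $2
            # POSIX awk has no bitwise AND, so test bits with integer division
            unmapped  = int(flag / 4) % 2    # 0x4
            secondary = int(flag / 256) % 2  # 0x100
            if (unmapped || secondary) next
            # optional tags start at column 12
            for (i = 12; i <= NF; i++)
                if ($i == "NH:i:1") { n++; break }
        }
        END { print n + 0 }
    '
}
```

It could be used as `samtools view Aligned.out.bam | count_nh1_lines`.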

Alexander Dobin

Sep 15, 2015, 12:40:06 PM
to rna-star
Hi Billy,

with default parameters, STAR will output only correctly paired alignments, i.e. two SAM lines per alignment.
For multi-mappers, there will be >=2 alignments per read. You can extract just the unique mappers from the SAM by filtering on MAPQ (=255 for unique, <255 for multi-mappers):
$ samtools view -c -q 255
This count is equal to the number of unique reads * 2.
Only one pair per multi-mapper is marked as a primary alignment, so to get the (unique+multiple) number use
$ samtools view -c -F 0x100

Cheers
Alex
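
As a rough sketch, the two counts above can be turned into per-pair numbers by halving them. The helper name `star_pair_counts` is hypothetical. It assumes STAR defaults as described in the post: every alignment is a proper pair with two SAM lines, and the BAM contains no unmapped reads. If either assumption fails, the halved numbers will not match Log.final.out.

```shell
#!/bin/sh
# Hypothetical helper: prints "unique multi total" pair counts for one BAM.
# Assumes two SAM lines per alignment and no unmapped reads in the file.
star_pair_counts() {
    bam=$1
    # lines with MAPQ 255 = 2 x uniquely mapped pairs
    unique_lines=$(samtools view -c -q 255 "$bam") || return 1
    # lines without the secondary flag = 2 x (unique + multi-mapped) pairs
    primary_lines=$(samtools view -c -F 0x100 "$bam") || return 1
    unique=$((unique_lines / 2))
    total=$((primary_lines / 2))
    echo "$unique $((total - unique)) $total"
}
```

For example, `star_pair_counts Aligned.out.bam` prints the three numbers on one line for comparison with the "Uniquely mapped" and "multiple loci" rows.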

Martina

Dec 23, 2016, 12:26:20 PM
to rna-star

Hi,

I'm using STAR 2.5.0a with default parameters, and I found that the number of unique mappers in the Log.final.out file, multiplied by 2, doesn't match the number I get when running

$ samtools view -c -q 255

The difference is only a couple of thousand, but I wonder what might be causing it...

Thanks,

Martina

Alexander Dobin

Jan 3, 2017, 12:03:13 AM
to rna-star
Hi Martina,

this difference is due to the single-end alignments. You need to count them separately:
Paired aligns:
N12=$(samtools view -q 255 -c -f 0x2 Aligned.out.bam)
End 1 aligns:
N1=$(samtools view -q 255 -c -F 0x2 -f 0x40 Aligned.out.bam)
End 2 aligns:
N2=$(samtools view -q 255 -c -F 0x2 -f 0x80 Aligned.out.bam)

The total number of uniquely aligned reads is then
N12/2+N1+N2
which should agree exactly with the number in the Log.final.out.

Cheers
Alex
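
The same N12/2+N1+N2 tally can be sketched as a single pass over the SAM stream instead of three separate samtools calls. This is an alternative formulation, not the method from the post: the function name `count_star_unique_reads` is hypothetical, and the flag tests are done in awk with integer division because POSIX awk has no bitwise AND.

```shell
#!/bin/sh
# Hypothetical helper: reads SAM text on stdin and prints "N12 N1 N2 total",
# where total = N12/2 + N1 + N2 over lines with MAPQ 255.
count_star_unique_reads() {
    awk -F'\t' '
        /^@/ { next }                         # skip header lines if present
        $5 == 255 {                           # MAPQ 255: unique mappers
            flag = $2
            proper = int(flag / 2) % 2        # 0x2  properly paired
            first  = int(flag / 64) % 2       # 0x40 first mate
            last   = int(flag / 128) % 2      # 0x80 second mate
            if (proper) n12++                 # like -f 0x2
            if (!proper && first) n1++        # like -F 0x2 -f 0x40
            if (!proper && last)  n2++        # like -F 0x2 -f 0x80
        }
        END { printf "%d %d %d %d\n", n12, n1, n2, n12 / 2 + n1 + n2 }
    '
}
```

It could be run as `samtools view Aligned.out.bam | count_star_unique_reads`, and the last number compared with the uniquely mapped count in Log.final.out.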