Discrepancy between STAR and HTSeq-count on multi-mapping read counts

John Brothers

Mar 27, 2015, 10:30:12 AM
to rna-...@googlegroups.com
Hi, I have a quick question. I have noticed a discrepancy between the number of reads mapped to multiple loci as reported in the STAR Log.final.out file and the number of reads counted as alignment_not_unique by htseq-count. 

I am guessing that perhaps STAR is counting a single read mapped to multiple locations as 1, whereas the primary and secondary alignments for the same read are being counted by HTSeq for each instance of multiple mapping. I just wanted to double check that this is probably the case. HTSeq counts multimapped reads using the NH:i:X tag. 

For example in one run of STAR, I see 

        Number of reads mapped to multiple loci |       2234538
 
And in htseq-count, I see:
alignment_not_unique    6481131

htseq-count and STAR both count the correct number of uniquely aligned reads; the discrepancy arises only with the multimapped reads.
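
A quick sanity check on the two numbers above, as a minimal sketch: if htseq-count is counting every alignment of a multimapped read while STAR counts each such read once, the ratio of the two figures is the average number of alignments per multimapped read. The variable names are illustrative only. A value between 2 and STAR's default cap of 10 alignments per read (outFilterMultimapNmax) would be consistent with that guess, assuming default settings were used.

```python
# Figures copied from the two outputs quoted above.
star_multimapped_reads = 2234538   # STAR: "Number of reads mapped to multiple loci"
htseq_not_unique = 6481131         # htseq-count: "alignment_not_unique"

# Average alignments per multimapped read, IF htseq-count counts alignments
# and STAR counts reads (the hypothesis being tested, not a confirmed fact).
alignments_per_read = htseq_not_unique / star_multimapped_reads
print(f"{alignments_per_read:.2f}")  # prints 2.90
```

Roughly 2.9 alignments per multimapped read is plausible, so the numbers do not contradict the guess, although they do not prove it either.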

Thanks!




Alexander Dobin

Mar 29, 2015, 6:51:33 PM
to rna-...@googlegroups.com
Hi John,

I am pretty sure that your guess is correct and htseq-count counts the total number of multimapping alignments rather than the number of reads.
You can check it by counting the number of multimapping lines yourself, e.g. with
awk 'substr($1,1,1)!="@" && substr($12,6)+0>1 {n++} END {print n+0}' Aligned.out.sam
This should be equal to the htseq-count number for "alignment_not_unique".
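
To see both numbers side by side, here is a minimal Python sketch of the same check. It is not part of STAR or HTSeq; the function name `count_multimappers` and the toy records are hypothetical. It counts SAM lines with NH > 1 (the per-alignment count the awk line produces) and the distinct read names among them (the per-read count). It looks up the NH tag wherever it appears among the optional fields rather than assuming column 12. For paired-end data each mate has its own line, so the line count would likely need adjusting; that is an assumption to check against your own data.

```python
def count_multimappers(sam_lines):
    """Return (multimapping_alignments, multimapping_reads) for SAM text lines.

    An alignment is "multimapping" when its NH:i: tag is greater than 1.
    Reads are distinguished by QNAME (column 1).
    """
    n_alignments = 0
    read_names = set()
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        nh = None
        for tag in fields[11:]:           # optional fields start at column 12
            if tag.startswith("NH:i:"):
                nh = int(tag[5:])
                break
        if nh is not None and nh > 1:
            n_alignments += 1             # one per SAM line
            read_names.add(fields[0])     # one per distinct read
    return n_alignments, len(read_names)


# Toy records (made up for illustration): r1 is unique, r2 maps to 3 loci.
toy = [
    "@HD\tVN:1.4",
    "r1\t0\tchr1\t100\t255\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNH:i:1\tHI:i:1",
    "r2\t0\tchr1\t200\t1\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNH:i:3\tHI:i:1",
    "r2\t256\tchr2\t300\t1\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNH:i:3\tHI:i:2",
    "r2\t256\tchr3\t400\t1\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNH:i:3\tHI:i:3",
]
print(count_multimappers(toy))  # prints (3, 1)
```

On a real file, something like `with open("Aligned.out.sam") as fh: print(count_multimappers(fh))` would give both counts; if the guess is right, the first should track the htseq-count figure and the second the STAR figure.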

Cheers
Alex