flagstat data on STAR output


ddmoo...@gmail.com

Jun 20, 2013, 1:09:10 PM
to rna-...@googlegroups.com
Hi All,

When I run the Samtools flagstat command on my STAR output (converted to .bam), I get some results that I think are strange:

0 + 0 duplicates
107424988 + 0 mapped (100.00%:nan%)
107424988 + 0 paired in sequencing
53712494 + 0 read1
53712494 + 0 read2
107424988 + 0 properly paired (100.00%:nan%)
107424988 + 0 with itself and mate mapped
0 + 0 singletons (0.00%:nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

These results just seem too perfect, particularly the 100% properly paired number and zero singletons.

For reference, here is the flagstat output for a tophat alignment run on the same fastq files:

0 + 0 duplicates
94601845 + 0 mapped (100.00%:nan%)
94601845 + 0 paired in sequencing
47745095 + 0 read1
46856750 + 0 read2
80231952 + 0 properly paired (84.81%:nan%)
90974892 + 0 with itself and mate mapped
3626953 + 0 singletons (3.83%:nan%)
772436 + 0 with mate mapped to a different chr
278930 + 0 with mate mapped to a different chr (mapQ>=5)

So, while I love the higher mapped-read counts from STAR, it seems unlikely that any aligner can produce such perfect results. Has anyone else encountered this? Is there a simple explanation for it, for example that STAR flags mismapped reads differently?

Thanks for your time,

Dalton

Shawn Driscoll

Jun 20, 2013, 8:22:18 PM
to rna-...@googlegroups.com
Neither of those aligners, by default at least, puts unaligned reads in the SAM/BAM output. As a result, 'flagstat' will always report 100% mapped.
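
One way to confirm this on a given file is to count the records that carry the unmapped flag (0x4). This is only a sketch: it assumes samtools is on the PATH, and the helper name count_unmapped and the file name in the example call are made up for illustration.

```shell
# Hypothetical helper: count records flagged as unmapped (0x4) in a BAM file.
count_unmapped() {
  samtools view -c -f 4 "$1"
}

# Example call (file name is a placeholder):
# count_unmapped Aligned.out.bam
```

If this prints 0, the file contains no unmapped records, so the "mapped" percentage from flagstat is 100% by construction and says nothing about the real mapping rate.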

For STAR I use the '--outSAMunmapped Within' option, which writes the unmapped reads into the SAM output so that flagstat counts them and reports some unaligned reads.
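
A minimal sketch of such a run, wrapped in a function so the paths can be passed in. The function name, the argument order, and the paths in the example call are assumptions for illustration; only the STAR options themselves are real.

```shell
# Hypothetical wrapper: run STAR on a read pair and keep unmapped reads
# in the main SAM output.
#   $1 = STAR genome index directory
#   $2 = mate 1 FASTQ, $3 = mate 2 FASTQ
#   $4 = output file name prefix
run_star_with_unmapped() {
  STAR --genomeDir "$1" \
       --readFilesIn "$2" "$3" \
       --outSAMunmapped Within \
       --outFileNamePrefix "$4"
}

# Example call (all paths are placeholders):
# run_star_with_unmapped /path/to/star_index reads_1.fastq reads_2.fastq sample_
```

After converting the resulting SAM to BAM, flagstat should report a mapped percentage below 100% whenever some reads failed to align.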

For tophat you can merge the 'unmapped.bam' alignments into the 'accepted_hits.bam' file and get accurate percentages from flagstat with the following list of commands:
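# Keep only primary alignments (drop records with the 0x100 "secondary" flag)
samtools view -bF 0x100 accepted_hits.bam > primary_hits.bam
# Combine the aligned reads with tophat's unmapped reads
samtools merge merged.bam primary_hits.bam unmapped.bam
# Sort by read name so mates are adjacent (samtools 1.x: samtools sort -n -o merged.nsort.bam merged.bam)
samtools sort -n merged.bam merged.nsort
# Fill in mate information now that both mates are in the same file
samtools fixmate merged.nsort.bam final.nsort.bam
samtools flagstat final.nsort.bam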

Alexander Dobin

Jun 21, 2013, 8:47:12 AM
to rna-...@googlegroups.com
Hi Dalton,

By default, STAR outputs only correctly paired alignments. Both single-end alignments (only one mate aligned) and non-concordant pairs are considered unmapped. The latter can be written to the separate Chimeric.out.sam file if you switch on chimeric detection. You can allow un-paired alignments in the Aligned.out.sam file by reducing --outFilterMatchNminOverLread and --outFilterScoreMinOverLread to below 0.5.
By default these parameters are equal to 0.66, i.e. if either the number of matched bases or the alignment score (the number of mapped bases minus penalties) is less than 66% of the read length (for paired-end reads, the sum of the lengths of both mates), the alignment will not be output and the read will be reported as "too short".
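
To make the arithmetic concrete, here is a small worked example. The 2 x 100 bp read length is an assumption chosen for illustration and is not taken from the thread; it also assumes both mates have the same length.

```shell
# Assumed example: 2 x 100 bp paired-end reads (not from the thread).
mate_len=100
pair_len=$((2 * mate_len))   # "read length" for a pair = sum of both mates
min_frac=0.66                # default value of both parameters, as stated above

# Minimum matched bases (and minimum score) for the alignment to be output.
min_bases=$(awk -v l="$pair_len" -v f="$min_frac" 'BEGIN { printf "%d", l * f }')
echo "pair length: $pair_len, minimum at $min_frac: $min_bases"

# Fraction of the pair that one fully aligned mate can cover on its own.
one_mate_frac=$(awk -v m="$mate_len" -v l="$pair_len" 'BEGIN { printf "%.2f", m / l }')
echo "one mate alone covers at most: $one_mate_frac"
```

Under these assumptions a pair needs 132 matched bases to pass the default filter, while a single mate can contribute at most 100 of the 200 bases (0.50). That is why alignments of only one mate can appear in the output only once both parameters are set below 0.5.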

I generally do not recommend using un-paired alignments - in my experience, they contain a large % of false positives. 

Cheers
Alex