STAR and HTSeq: unique alignment inconsistency


Leinal Sejour

Jul 12, 2018, 11:36:34 AM
to rna-star
Hello,

I recently aligned a single-end RNA-Seq file from our wet lab with STAR. After generating the gene count table from the resulting SAM file, I noticed some inconsistencies between the number of uniquely mapped reads reported by the STAR summary and by the HTSeq-count program. The file contained about 18 million reads. STAR's alignment summary reported that about 64% (~12 million) of those reads mapped to unique locations. However, when I carried the SAM file through to the HTSeq-count step, about 13.3 million reads fell under the "alignment not unique" category. How should I interpret this result? Am I misinterpreting the meaning of mapping and aligning?

This is part of a larger project where I'm testing a few different alignment tools (specifically STAR and Bowtie2) and I'm trying to advise my team on which tool to use going forward.

I used the recommended STAR mapping arguments from the manual. For quantification with HTSeq-count, I used the "union" mode. I'm attaching screenshots of my STAR inputs and results.
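For reference, HTSeq's documentation describes "union" mode roughly as follows: take the union of the sets of features overlapping each position of the read; if that union holds exactly one feature the read is counted for it, if it is empty the read goes to `__no_feature`, and if it holds more than one feature the read goes to `__ambiguous`. The Python sketch below is a simplified illustration of that rule on toy intervals, not HTSeq's own code. The function name, the half-open coordinates and the gene names are hypothetical, and strandedness is ignored.

```python
def union_mode(read_positions, features):
    """Assign one read under a simplified version of HTSeq's 'union' rule.

    read_positions: iterable of genomic positions covered by the read.
    features: list of (feature_id, start, end) half-open intervals.
    """
    overlapping = set()
    for pos in read_positions:
        # Collect every feature overlapping this position of the read.
        for feature_id, start, end in features:
            if start <= pos < end:
                overlapping.add(feature_id)
    if not overlapping:
        return "__no_feature"
    if len(overlapping) > 1:
        return "__ambiguous"
    return overlapping.pop()


# Toy annotation: two hypothetical genes that overlap between 180 and 200.
toy_features = [("geneA", 100, 200), ("geneB", 180, 300)]

print(union_mode(range(120, 150), toy_features))  # geneA
print(union_mode(range(170, 190), toy_features))  # __ambiguous
print(union_mode(range(400, 450), toy_features))  # __no_feature
```

Note that this rule only applies once a read has been judged unique; alignments of multimapping reads are set aside into a separate counter before any feature overlap is considered.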

Below is how I generated Counts.txt:

> htseq-count output.sam /vol/refs/gencode.v19.annotation.ercc.gtf > Counts.txt
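htseq-count writes one tab-separated line per gene (gene ID, then count), followed by special counters whose names start with `__`, such as `__no_feature`, `__ambiguous` and `__alignment_not_unique`. A minimal Python sketch, assuming that two-column layout, separates the two groups so the total assigned to genes can be set against STAR's summary. The function name is hypothetical and the gene names and numbers are made-up toy values, not results from this data set.

```python
def split_htseq_counts(lines):
    """Split htseq-count output lines into per-gene counts and special counters."""
    genes, special = {}, {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name, value = line.split("\t")
        # Special counters are prefixed with a double underscore.
        target = special if name.startswith("__") else genes
        target[name] = int(value)
    return genes, special


# Toy Counts.txt content (hypothetical genes and values).
toy_counts = [
    "GENE_A\t120",
    "GENE_B\t0",
    "GENE_C\t340",
    "__no_feature\t50",
    "__ambiguous\t10",
    "__too_low_aQual\t0",
    "__not_aligned\t0",
    "__alignment_not_unique\t900",
]

genes, special = split_htseq_counts(toy_counts)
print(sum(genes.values()), special["__alignment_not_unique"])  # 460 900
```

With a real file, the lines would come from `open("Counts.txt")`. Keep in mind when comparing these totals with STAR's Log.final.out that the special counters may be tallied per alignment record rather than per read.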

Thank you for any insight! This Google Group is an extremely helpful resource!

Leinal Sejour
Attachments: sample_counts.PNG, Star result.PNG

Dario Strbenac

Jul 12, 2018, 10:00:11 PM
to rna-star
You are conflating reads and alignments. A read can have more than one alignment if it maps to more than one place in the genome; the two are not the same concept. HTSeq reports the number of alignments in the SAM file that belong to reads mapping to more than one place, whereas STAR reports the number of reads that align to more than one place. I think STAR's statistic is more useful than HTSeq's.
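To make the reads-versus-alignments distinction concrete, here is a minimal Python sketch that counts, from SAM text lines, both the alignment records belonging to multimapping reads and the number of distinct multimapping reads. It assumes single-end data and an `NH:i:<n>` tag on each mapped record giving the number of loci the read maps to (STAR writes this tag). The function names and toy records are hypothetical.

```python
def count_multimappers(sam_lines):
    """Return (multimapping alignment records, distinct multimapping reads).

    Assumes single-end SAM text where mapped records carry an NH:i:<n> tag.
    """
    alignment_records = 0
    read_names = set()
    for line in sam_lines:
        if line.startswith("@"):  # header line, not an alignment
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & 4:  # FLAG bit 0x4: read unmapped
            continue
        # Optional fields start at column 12; e.g. "NH:i:3" -> {"NH": "i:3"}.
        tags = dict(f.split(":", 1) for f in fields[11:])
        nh = int(tags.get("NH", "i:1").split(":", 1)[1])
        if nh > 1:
            alignment_records += 1  # counted once per record
            read_names.add(qname)  # counted once per read
    return alignment_records, len(read_names)


def sam_record(qname, flag, pos, nh):
    """Build a toy SAM line: 11 mandatory columns plus an NH tag."""
    seq = "ACGT" * 5
    return "\t".join([qname, str(flag), "chr1", str(pos), "255", "20M",
                      "*", "0", "0", seq, "I" * len(seq), f"NH:i:{nh}"])


toy_sam = [
    "@HD\tVN:1.4",
    sam_record("r1", 0, 100, 3),     # r1 maps to three loci
    sam_record("r1", 256, 5000, 3),  # 256 = secondary alignment
    sam_record("r1", 256, 9000, 3),
    sam_record("r2", 0, 200, 1),     # r2 is unique
    sam_record("r3", 0, 300, 2),     # r3 maps to two loci
    sam_record("r3", 256, 7000, 2),
    sam_record("r4", 4, 0, 0),       # r4 is unmapped
]

print(*count_multimappers(toy_sam))  # 5 2
```

In the toy data, r1 and r3 produce five multimapping alignment records but only two multimapping reads, which is how an alignment-level counter such as HTSeq's can exceed a read-level figure such as STAR's.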

Anyway, comparing Bowtie 2 with STAR isn't suitable. Bowtie 2 is intended for DNA sequencing data and does not perform spliced alignment, whereas STAR is tailored to RNA sequencing data. Also, RNA-seq aligners have already been compared using simulated data in a study published in Nature Methods. In the Discussion section, the authors conclude that STAR is one of the best aligners for RNA-seq data. Fourteen alignment programs were compared; notice that Bowtie 2 was not one of them.