Hello,
I recently performed an alignment with STAR on a single-end RNA-Seq file from our wet lab. After generating the gene count table from the resulting SAM file, I noticed some inconsistencies between the uniquely mapped reads reported in the STAR summary and the numbers reported by htseq-count. The file contained about 18 million reads. STAR's alignment summary reported that about 64% of those reads (~12 million) mapped uniquely. However, when I carried the SAM file through to the htseq-count step, about 13.3 million reads fell under the "alignment not unique" category. How should I interpret this result? Am I misunderstanding the difference between mapping and aligning?
This is part of a larger project where I'm testing a few different alignment tools (specifically STAR and Bowtie2) and I'm trying to advise my team on which tool to use going forward.
I used the recommended STAR mapping arguments from the manual. For the quantification with htseq-count, I used the "union" setting. I'm attaching pictures of my STAR inputs and of the results.
Below is how I generated Counts.txt:
> htseq-count output.sam /vol/refs/gencode.v19.annotation.ercc.gtf > Counts.txt
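To cross-check the two numbers, one option I am considering is tallying the SAM file directly by its NH tag (the number of reported alignments for a read). Below is a minimal Python sketch of that idea. The function name tally_nh is my own, and it rests on two assumptions I have not confirmed: that STAR marks the extra alignments of a multi-mapped read with the secondary flag (0x100), and that htseq-count tallies per alignment record rather than per read. If both hold, a multi-mapped read would appear several times in the "records" count but only once in the "reads" count.

```python
from collections import Counter


def tally_nh(sam_lines):
    """Tally SAM alignments by NH tag.

    Returns (records, reads):
      records counts every alignment line,
      reads counts only primary lines (flag 0x100 not set), i.e. one per read,
      assuming the aligner flags extra alignments as secondary.
    """
    records = Counter()
    reads = Counter()
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a valid alignment line
            continue
        flag = int(fields[1])
        if flag & 0x4:                    # read is unmapped
            records["unmapped"] += 1
            reads["unmapped"] += 1
            continue
        nh = None
        for tag in fields[11:]:           # optional fields start at column 12
            if tag.startswith("NH:i:"):
                nh = int(tag[5:])
                break
        if nh is None:
            key = "no_NH_tag"
        elif nh == 1:
            key = "unique"
        else:
            key = "multi"
        records[key] += 1
        if not flag & 0x100:              # primary alignment: count the read once
            reads[key] += 1
    return records, reads


# Usage (file name taken from my htseq-count command above):
# with open("output.sam") as fh:
#     records, reads = tally_nh(fh)
#     print("records:", dict(records))
#     print("reads:  ", dict(reads))
```

If the "multi" record count is close to 13.3 million while the "unique" read count is close to STAR's ~12 million, the two tools may simply be counting different things.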
Thank you for any insight! This Google group is an extremely helpful resource!
Leinal Sejour