How does STAR genecounts work?

487 views

Skip to first unread message

Billy Lau

unread,

Jan 4, 2016, 11:01:47 AM1/4/16

to rna-star

Hi there, I'm trying to figure out how STAR's genecounts feature works.

Does STAR use any criteria to exclude reads from the genecounts besides only including unique mappings? Eg. anything with low mapping quality or soft clips? In my dataset I have some ERCC spike-ins, so I'm using that as an easy example. When I am using samtools on a sorted by coordinate STAR (2.4.2) alignment and grepping for ERCC mappings, I get substantially more reads than what is reported by the genecounts readspergene file.

When I use -F 256 to get rid of multimappers, the counts are still substantially higher. I'm looking at the mapping qualities and there doesn't seem to be much of an effect.

Thanks again!

Billy

Alexander Dobin

unread,

Jan 4, 2016, 12:35:43 PM1/4/16

to rna-star

Hi Billy,

the only reads that are filtered out for gene counting are multimappers. To exclude them from the SAM file you need to use -q 256.

Note that for paired-end reads STAR count the fragments (i.e. pairs), while samtools counts lines (i.e. mates), so you would need to divide samtools counts by 2 (unless you allow single-end alignments, which makes the counting with samtools more complicated, see my response to the previous post).

If this does not help, please send me a minimal example (BAM file and ReadsPerGene.out.tab) that reproduces this discrepancy.

Cheers

Alex

Reply all

Reply to author

Forward

0 new messages