STAR GeneCounts values do not sum up

Daniel Gerlach

Aug 24, 2016, 10:19:02 AM
to rna-star
Dear Community,

I am using STAR/2.5.1b to map single-end 50bp QuantSeq FWD (Lexogen) reads to the GRCh38 genome (no alts). Using the ENCODE parameters plus '--quantMode GeneCounts', I noticed two things:

(1) The strand-specific read counts are exactly the same for all genes between STAR and featureCounts. This is suspicious, as I thought STAR implements the htseq-count method, and htseq-count produces slightly different values than featureCounts in the default mode (more details in the respective publication).

(2) Column two is not always the sum of columns three and four. A good example: ENSG00000210082 113565 113561    4. A bad example: ENSG00000111640     18  40787    0.

Thanks, Daniel
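The check described in (2) can be sketched in a few lines of Python. This is only a minimal sketch, assuming the usual tab-separated layout of STAR's ReadsPerGene.out.tab (gene ID, then the three count columns, with summary rows whose names start with `N_` at the top). The function name `column_sum_mismatches` is made up for illustration, and the zeros in the summary row are placeholders, not real values; the two gene rows are the ones quoted in the post.

```python
import csv
import io


def column_sum_mismatches(lines):
    """Return rows where column 2 differs from column 3 + column 4.

    `lines` is any iterable of tab-separated text lines, for example an
    open ReadsPerGene.out.tab file handle.
    """
    mismatches = []
    for row in csv.reader(lines, delimiter="\t"):
        # Skip blank lines and the summary rows (N_unmapped, N_ambiguous, ...).
        if not row or row[0].startswith("N_"):
            continue
        gene_id = row[0]
        col2, col3, col4 = (int(value) for value in row[1:4])
        if col2 != col3 + col4:
            mismatches.append((gene_id, col2, col3, col4))
    return mismatches


# The two gene rows from the post; the summary row holds placeholder zeros.
example = io.StringIO(
    "N_ambiguous\t0\t0\t0\n"
    "ENSG00000210082\t113565\t113561\t4\n"
    "ENSG00000111640\t18\t40787\t0\n"
)

for gene_id, col2, col3, col4 in column_sum_mismatches(example):
    print(gene_id, col2, col3, col4)
```

To run it on a real file, pass an open file handle instead of the `io.StringIO` object, e.g. `with open("ReadsPerGene.out.tab") as handle: column_sum_mismatches(handle)`.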

Alexander Dobin

Aug 25, 2016, 6:36:50 PM
to rna-star
Hi Daniel

1. When I compared STAR vs htseq-count vs featureCounts, STAR agreed exactly with htseq-count in all cases, and both of them agreed with featureCounts for single-end reads. The disagreement with featureCounts occurred for paired-end reads only, since featureCounts resolves ambiguous reads if one of the mates overlaps one gene only.

2. If you have reads that map to genes that overlap on opposite strands, the unstranded count will treat them as ambiguous, while the stranded counts will assign them to the separate genes.
Hence the unstranded count is <= the sum of the stranded counts.

Cheers
Alex
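The effect described in point 2 can be reproduced with a toy model. This is a minimal sketch, not STAR's or htseq-count's actual code: it assumes a simplified counting rule in which a read is assigned to a gene only if it overlaps exactly one gene, and is otherwise ambiguous or has no feature. The genes, coordinates, reads, and the function name `count_reads` are all made up for illustration.

```python
from collections import Counter

# Hypothetical annotation: two genes overlapping on opposite strands.
GENES = [("geneA", 100, 200, "+"), ("geneB", 150, 300, "-")]

# Hypothetical single-end reads as (start, end, strand).
READS = [(160, 180, "+")] * 3 + [(160, 180, "-")] + [(110, 130, "+")] * 2


def count_reads(reads, genes, mode):
    """Count reads per gene; mode is 'unstranded', 'sense' or 'reverse'."""
    counts = Counter()
    for start, end, strand in reads:
        if mode == "reverse":
            # Reverse-stranded counting: the read supports the opposite strand.
            strand = "-" if strand == "+" else "+"
        hits = [
            name
            for name, g_start, g_end, g_strand in genes
            if start <= g_end and end >= g_start
            and (mode == "unstranded" or g_strand == strand)
        ]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif len(hits) > 1:
            counts["ambiguous"] += 1
        else:
            counts["no_feature"] += 1
    return counts


unstranded = count_reads(READS, GENES, "unstranded")
sense = count_reads(READS, GENES, "sense")
reverse = count_reads(READS, GENES, "reverse")

for name, *_ in GENES:
    print(name, unstranded[name], sense[name], reverse[name])
```

In this toy data the four reads inside the overlap are ambiguous when strand is ignored, so they are missing from the unstranded column for both genes, while the stranded modes assign each of them to one gene. That is the pattern in which the unstranded count falls below the sum of the two stranded counts.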