--quantMode GeneCounts: multi-mapped reads and strand information

praful aggarwal

Oct 5, 2015, 11:15:10 AM

to rna-star

Hi Alex,

Thank you very much for implementing --quantMode as part of STAR. It is very useful and saves quite a bit of time.

I was wondering if you could explain (or point me to another post that I might have overlooked) how --quantMode handles multi-mapped reads. Since by default STAR assigns "primary" and "secondary" mapping flags to a multi-mapped read, is only the primary location counted and the read also included in "N_multimapping", or is the read discarded from the gene counts entirely and only included in "N_multimapping"?

Also, I need your opinion on which column to use for downstream analysis (of the strand-specific columns 3 and 4 in this output). I aligned my sample (paired-end reads) to an Ensembl primary assembly reference and used GENCODE-based annotation (I adjusted the chromosome names in the GTF to match the reference). Below is the header of the --quantMode GeneCounts file:

I asked only for uniquely mapped reads (--outFilterMultimapNmax 1), which is why N_multimapping is 0. I know this is a strand-specific library prep, so I'll be using either column 3 or column 4 for DE gene analysis. However, I am not sure which of the two to use (I still have to ask the people who prepped these libraries what they did). In this output, column 3 has almost 12 million reads that could not be assigned to an annotated feature (GENCODE v23), which makes me suspect that the library prep was "reverse-stranded". In your opinion, is this a reasonable basis for that assumption, or am I overlooking something?

I appreciate any suggestions/comments you might have.

Thanks again for actively developing and maintaining this tool.

Praful

Alexander Dobin

Oct 6, 2015, 2:48:34 PM

to rna-star

Hi Praful,

STAR does not count any alignments from multimapping reads toward genes; all such reads are counted in the N_multimapping line.

Your approach to determining the strandedness of the library is correct: the stranded column (3 or 4) with the lowest N_noFeature count corresponds to the correct strand option.

A more quantitative way to do it is to calculate the number of reads that were assigned to genes, N_genic = TotalReads - N_unmapped - N_multimapping - N_noFeature - N_ambiguous, for columns 3 and 4 (or just sum all the per-gene values in each column of the file). The higher N_genic value points to the correct strand, and the ratio of the lower to the higher value gives an upper bound for the strand error in the library.
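The "sum all the genic values" shortcut could be sketched roughly as below. This assumes the usual tab-separated GeneCounts layout (gene ID, unstranded count, column 3, column 4, with the four N_ summary lines at the top). The function names and the toy counts are invented for illustration and are not part of STAR.

```python
import csv
import io

# Summary lines at the top of the GeneCounts file; these are not genes.
SUMMARY_ROWS = {"N_unmapped", "N_multimapping", "N_noFeature", "N_ambiguous"}


def genic_totals(handle):
    """Sum the per-gene counts in the two stranded columns (3 and 4)."""
    totals = {3: 0, 4: 0}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0] in SUMMARY_ROWS:
            continue  # skip blank lines and the N_* summary lines
        totals[3] += int(row[2])  # column 3 of the file
        totals[4] += int(row[3])  # column 4 of the file
    return totals


def guess_strand(totals):
    """Return (column with the higher N_genic, lower/higher ratio)."""
    best = max(totals, key=totals.get)
    other = 4 if best == 3 else 3
    ratio = totals[other] / totals[best] if totals[best] else float("nan")
    return best, ratio


# Toy data with made-up numbers, only to show the expected file shape.
TOY = (
    "N_unmapped\t100\t100\t100\n"
    "N_multimapping\t0\t0\t0\n"
    "N_noFeature\t50\t900\t60\n"
    "N_ambiguous\t5\t1\t4\n"
    "geneA\t500\t10\t490\n"
    "geneB\t400\t40\t360\n"
)

if __name__ == "__main__":
    totals = genic_totals(io.StringIO(TOY))
    column, ratio = guess_strand(totals)
    print(f"N_genic per column: {totals}")
    print(f"use column {column}; strand error is at most about {ratio:.3f}")
```

For a real run, open the GeneCounts output file and pass the file handle to genic_totals in place of the StringIO object.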

Cheers

Alex

praful aggarwal

Oct 6, 2015, 2:55:51 PM

to rna-star

Hi Alex,

Thank you for explaining this.

Praful
