Hi Alex,
Thank you very much for implementing --quantMode as part of STAR. It is very useful and saves quite a bit of time.
I was wondering if you could explain (or point me to another post that I might have overlooked) how --quantMode handles multi-mapped reads. Since by default STAR assigns "primary" and "secondary" flags to the alignments of a multi-mapped read, is the primary location counted towards its gene and the read also included in "N_multimapping", or is the read discarded from the gene counts entirely and only included in "N_multimapping"?
Also, I would like your opinion on which column to use (column 3 or 4, the strand-specific ones) from this output for downstream analysis. I aligned my sample (paired-end reads) to an ENSEMBL primary assembly reference and used a GENCODE-based annotation (I adjusted the chromosome names so that the reference and the GTF match). Below is the top of the --quantMode GeneCounts output file:
I only asked for "uniquely mapped" reads (--outFilterMultimapNmax 1), which is why N_multimapping is 0. I know that this is a strand-specific library prep, so I'll be using either column 3 or column 4 for the DE gene analysis. However, I am not sure which of the two to use (I still have to ask the people who prepared these libraries which protocol they used). In this output, column 3 has almost 12 million reads that could not be assigned to an annotated feature (GENCODE v23), which makes me suspect that the library prep was "reverse-stranded". In your opinion, is this a reasonable basis for that assumption, or am I overlooking something?
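To make the comparison concrete, here is a minimal Python sketch of the check I have in mind. It assumes the GeneCounts file is tab-separated with four columns (gene ID, unstranded counts, counts for the read-1 strand, counts for the read-2 strand) and that the summary rows start with "N_". The function names and the 5x ratio cutoff are my own, purely illustrative choices, not anything defined by STAR.

```python
def summarize_gene_counts(lines):
    """Collect N_noFeature and the summed per-gene counts for each count column."""
    # Assumed meaning of file columns 2, 3 and 4.
    columns = ("unstranded", "forward", "reverse")
    no_feature = dict.fromkeys(columns, 0)
    assigned = dict.fromkeys(columns, 0)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        name = fields[0]
        counts = [int(value) for value in fields[1:]]
        if name == "N_noFeature":
            no_feature = dict(zip(columns, counts))
        elif not name.startswith("N_"):
            # Assumption: no gene ID starts with "N_", so this is a gene row.
            for column, count in zip(columns, counts):
                assigned[column] += count
    return no_feature, assigned


def guess_strandedness(assigned, ratio=5.0):
    """Heuristic guess; `ratio` is an arbitrary cutoff chosen for illustration."""
    forward, reverse = assigned["forward"], assigned["reverse"]
    if reverse > ratio * max(forward, 1):
        return "reverse"
    if forward > ratio * max(reverse, 1):
        return "forward"
    return "unclear"  # similar totals in both columns may mean an unstranded prep
```

It would be called on the lines of the counts file, for example `summarize_gene_counts(open(path))` with `path` pointing at the ReadsPerGene.out.tab file. The column with far more reads assigned to genes (and correspondingly fewer under N_noFeature) would be the one I take as matching the library's strand, but this is only a heuristic and I would still confirm the protocol with whoever prepared the libraries.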
I appreciate any suggestions/comments you might have.
Thanks again for actively developing and maintaining this tool.
Praful