Salmon: (raw) count of reads per gene

Dan

May 11, 2015, 8:35:03 AM
to sailfis...@googlegroups.com
Hi!


If one would like to do DEG analysis using edgeR/DESeq, then read counts per gene are needed. According to http://salmon.readthedocs.org/en/latest/salmon.html#output it looks like Salmon gives read counts per transcript.

Is it OK to sum the read counts (column NumReads) of all transcripts of the same gene and use the sums as input to edgeR/DESeq?

Best,
Dan

Rob

May 11, 2015, 7:56:22 PM
to sailfis...@googlegroups.com, daniel....@gmail.com
Hi Dan,

  There is a short answer and a long answer to your question.  The short answer is that summing the transcript-level counts to the gene level should give you results at least as accurate as (and likely, significantly more accurate than) other approaches for preparing counts for downstream DE analysis.
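A minimal sketch of that summation, for illustration only. It assumes the quantification output is a tab-separated table with a header row containing `Name` and `NumReads` columns (`NumReads` is the column mentioned in the question; `Name` and the toy layout are assumptions here), and that a transcript-to-gene mapping is available from the annotation. The function name `sum_counts_per_gene` and the IDs are hypothetical.

```python
import csv
import io
from collections import defaultdict


def sum_counts_per_gene(quant_lines, tx2gene):
    """Sum transcript-level NumReads into gene-level totals.

    quant_lines: iterable of tab-separated lines with a header row that
                 includes 'Name' and 'NumReads' (assumed layout).
    tx2gene:     dict mapping transcript ID -> gene ID (from the annotation).
    """
    reader = csv.DictReader(quant_lines, delimiter="\t")
    gene_counts = defaultdict(float)
    for row in reader:
        gene = tx2gene.get(row["Name"])
        if gene is None:
            continue  # transcript missing from the mapping; skipped in this sketch
        gene_counts[gene] += float(row["NumReads"])
    return dict(gene_counts)


# Toy data (made up for illustration, not real Salmon output).
example_quant = io.StringIO(
    "Name\tLength\tTPM\tNumReads\n"
    "tx1\t1500\t10.0\t120.5\n"
    "tx2\t900\t4.0\t30.25\n"
    "tx3\t2000\t7.5\t80.0\n"
)
example_tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}

gene_counts = sum_counts_per_gene(example_quant, example_tx2gene)
print(gene_counts)
```

Estimated read counts can be fractional, so the gene-level sums may need rounding before being passed to a tool that expects integer counts.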

The longer answer is that there is no truly correct way to do gene-level differential expression analysis, as the question isn't particularly well-formed.  Specifically, the transcripts are the things truly being expressed in the cell, and asking for gene-level differential expression is like asking for differences in potentially complex mixtures by looking at a single number (the population-average, gene-level count).  There are many different ways you might approach the question (e.g. calling a gene differentially expressed if at least one of its isoforms is DE).  Certainly, doing DE on the aggregated counts obtained via Salmon is at least as good as doing DE on the aggregated counts obtained from some other tool (or just the raw counts of reads mapped to the gene), but we're currently working on a project that will try to convince people to avoid this common but difficult-to-interpret type of analysis in the future.  If you want more details, I'd be happy to provide them, and/or you could join the Gitter page for the project linked above!
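To make the "at least one isoform is DE" rule concrete, here is a small sketch. It assumes per-transcript adjusted p-values are already available from some transcript-level test; the input values, the 0.05 threshold, and the function name `genes_with_any_de_isoform` are all hypothetical. It only illustrates the aggregation rule and is not a statistically rigorous gene-level test.

```python
from collections import defaultdict


def genes_with_any_de_isoform(tx_padj, tx2gene, alpha=0.05):
    """Flag a gene as DE if any of its transcripts has padj < alpha.

    tx_padj: dict mapping transcript ID -> adjusted p-value (hypothetical input).
    tx2gene: dict mapping transcript ID -> gene ID.
    """
    flagged = defaultdict(bool)
    for tx, padj in tx_padj.items():
        gene = tx2gene[tx]
        flagged[gene] = flagged[gene] or padj < alpha
    return dict(flagged)


# Toy values, made up for illustration.
example_padj = {"tx1": 0.20, "tx2": 0.01, "tx3": 0.60}
example_tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}

gene_is_de = genes_with_any_de_isoform(example_padj, example_tx2gene)
print(gene_is_de)
```

Here geneA is flagged because one of its two isoforms passes the threshold, even though the other does not, which is exactly the kind of detail a single gene-level count hides.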

Best,
Rob




Dan

May 13, 2015, 12:50:38 PM
to sailfis...@googlegroups.com, daniel....@gmail.com
Hi Rob!

Thanks for your short and long answers!


On Tuesday, May 12, 2015 at 2:56:22 AM UTC+3, Rob wrote:
> Hi Dan,
>
> There is a short answer and a long answer to your question.  The short answer is that summing the transcript-level counts to the gene level should give you results at least as accurate as (and likely, significantly more accurate than) other approaches for preparing counts for downstream DE analysis.

Ok. Thanks!
 

> The longer answer is that there is no truly correct way to do gene-level differential expression analysis, as the question isn't particularly well-formed.  Specifically, the transcripts are the things truly being expressed in the cell, and asking for gene-level differential expression is like asking for differences in potentially complex mixtures by looking at a single number (the population-average, gene-level count).  There are many different ways you might approach the question (e.g. calling a gene differentially expressed if at least one of its isoforms is DE).  Certainly, doing DE on the aggregated counts obtained via Salmon is at least as good as doing DE on the aggregated counts obtained from some other tool (or just the raw counts of reads mapped to the gene), but we're currently working on a project that will try to convince people to avoid this common but difficult-to-interpret type of analysis in the future.  If you want more details, I'd be happy to provide them, and/or you could join the Gitter page for the project linked above!

Actually, this is both wrong and right at the same time!

First of all, I guess that here one refers to messenger RNA-seq data, where many genes may simultaneously have several transcripts (that is, mRNAs). For this case I agree with you that it is theoretically better to do the differential expression (DE) analysis with transcript counts than with gene counts, BUT the big challenge is that today there are not really any "mature/serious" tools/packages specifically designed to do DE analysis using transcript counts.

Second of all, there are microRNAs (a.k.a. miRNAs, as in miRNA-seq), where one has one and only one microRNA per gene (and no alternative splicing). Therefore, the above statement is not really true here, because doing DE using gene counts or transcript counts is exactly the same! For example, see here for more info regarding miRNA-seq and DE: http://www.translationalres.com/article/S1931-5244(15)00140-1/abstract

Cheers,
Dan