Hi Dan,
There is a short answer and a long answer to your question. The short answer is that summing the transcript-level counts to the gene level should give you results at least as accurate as (and likely significantly more accurate than) other approaches for preparing counts for downstream DE analysis.
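For concreteness, here is a minimal sketch of what "summing transcript-level counts to the gene level" means. It assumes you already have per-transcript estimated counts (e.g. from Salmon) and a transcript-to-gene mapping from your annotation; the function name, IDs, and numbers below are hypothetical and made up for illustration.

```python
from collections import defaultdict


def sum_transcripts_to_genes(tx_counts, tx2gene):
    """Aggregate estimated transcript-level read counts into gene-level counts.

    tx_counts: dict of transcript ID -> estimated read count (assumed input).
    tx2gene:   dict of transcript ID -> gene ID (hypothetical annotation map).
    """
    # Fail loudly rather than silently dropping reads from unmapped transcripts.
    missing = [tx for tx in tx_counts if tx not in tx2gene]
    if missing:
        raise ValueError(f"transcripts with no gene mapping: {missing}")

    # Each gene's count is the sum of the counts of all its transcripts.
    gene_counts = defaultdict(float)
    for tx, count in tx_counts.items():
        gene_counts[tx2gene[tx]] += count
    return dict(gene_counts)


# Toy example with made-up IDs and counts.
tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}
tx_counts = {"tx1": 10.5, "tx2": 4.5, "tx3": 7.0}
print(sum_transcripts_to_genes(tx_counts, tx2gene))
# {'geneA': 15.0, 'geneB': 7.0}
```

Note that estimated counts are generally non-integer; some count-based DE tools expect integers, so the gene-level sums may need rounding before use.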
The longer answer is that there is no truly correct way to do gene-level differential expression analysis, because the question isn't particularly well-formed. Specifically, the transcripts are the things truly being expressed in the cell, and asking for gene-level differential expression is like asking for differences between potentially complex mixtures by looking at a single number (the population-average, gene-level count). There are many different ways you might approach the question (e.g. calling a gene differentially expressed if at least one of its isoforms is DE). Certainly, doing DE on the aggregated counts obtained via Salmon is at least as good as doing DE on the aggregated counts obtained from some other tool (or just the raw counts of reads mapped to the gene), but we're currently working on a project that will try to convince people to avoid this common but difficult-to-interpret type of analysis in the future. If you want more details, I'd be happy to provide them, and/or you could join the gitter page for the project linked above!
Best,
Rob
On Monday, May 11, 2015 at 8:35:03 AM UTC-4, Dan wrote: