Raw (Non-normalized) counts

Pankaj Agarwal

Jan 3, 2017, 9:30:33 AM
to Sailfish Users Group

Hi,

I have performed expression quantification using Salmon for a single-cell RNA-seq data set.

Now I am trying to do a differential gene expression analysis using tools such as DESeq2 and SCDE.

These all require raw integer counts, not normalized values such as TPM/RPKM/FPKM.

An example of output that I get from Salmon is as follows:


Name               Length  EffectiveLength  TPM       NumReads
ENST00000448914.1  13      5.29621          0         0
ENST00000631435.1  12      4.64363          0         0
ENST00000632684.1  12      4.64363          0         0
ENST00000434970.2  9       2.10172          0         0
….
ENST00000632016.1  392     234.311          3698.51   871
….
ENST00000534447.5  5330    5157.43          0.417128  2.16222
..


Does the column "NumReads" provide the raw counts? If so, why are some of the values not integers?
If these are not the raw counts, is there a way to get the raw counts using Salmon?

I have read the post on this topic

https://groups.google.com/forum/#!msg/sailfish-users/jBf9SGiH1AM/Xr-EUIW5CQAJ;context-place=forum/sailfish-users

and some of the links from this post but I am still not able to figure out how to get the raw counts from Salmon.


Thanks,

- Pankaj

Rob

Jan 3, 2017, 9:46:06 AM
to Sailfish Users Group
Hi Pankaj,

  There is a nice article describing how to use Salmon with downstream DE tools here: https://f1000research.com/articles/4-1521/v2. In addition to a lot of other useful information, it describes a package, tximport (https://bioconductor.org/packages/release/bioc/html/tximport.html), that lets you easily import Salmon results into DESeq2, edgeR, etc. Basically, "raw" counts are not feasible with most transcript-level approaches based on statistical inference, since reads are allocated proportionally, in the manner that matches the parameters of the underlying model (which is why NumReads can be fractional). These estimated counts can be appropriately rounded so that they perform as well as (actually, better than) raw counts. How, exactly, tximport infers integer counts from the estimated counts is described in detail in the paper linked above.
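
As a rough illustration of the rounding mentioned above, here is a minimal Python sketch that reads the NumReads column and rounds it to the nearest integer. It is not tximport and not how tximport derives its counts; it only assumes that quant.sf is tab-separated with the header shown in the question. The function name rounded_counts and the inline sample (rows copied from the excerpt in this thread) are made up for the example.

```python
import csv
import io

# Sample rows copied from the quant.sf excerpt in this thread.
# Assumption: the real file is tab-separated with this same header.
QUANT_SF = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "ENST00000448914.1\t13\t5.29621\t0\t0\n"
    "ENST00000632016.1\t392\t234.311\t3698.51\t871\n"
    "ENST00000534447.5\t5330\t5157.43\t0.417128\t2.16222\n"
)


def rounded_counts(handle):
    """Map each transcript name to its NumReads value rounded to an integer.

    Hypothetical helper for illustration only; it does no gene-level
    summarization and no length correction.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Name"]: int(round(float(row["NumReads"]))) for row in reader}


counts = rounded_counts(io.StringIO(QUANT_SF))
for name, n in counts.items():
    print(name, n)
```

To try this on a real file, pass an open file handle instead of the in-memory sample, for example `with open("quant.sf") as fh: counts = rounded_counts(fh)` (the path here is a placeholder). These are still transcript-level estimates; tools that expect gene-level counts would need a summarization step such as the one tximport provides.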

Best,
Rob

Pankaj Agarwal

Jan 3, 2017, 11:48:33 AM
to Sailfish Users Group
Hi Rob,
Thanks for your quick response. I have used the tximport package successfully with DESeq2, but DESeq2 has the ability to work with tximport directly. For other software packages, such as some that I am using for single-cell RNA-seq DE analysis, I will have to provide the counts myself. Do you suggest using the rounded-off "NumReads" or "TPM" values from the Salmon output, or running tximport on the Salmon output and using the tximport output for further analysis with tools other than DESeq2?
Thanks for providing the link to the paper. I gave it a quick read and will read it in more depth. I think it will clear up the confusion I have been having regarding this topic.
Best,
- Pankaj