Counting Virus TPM using Salmon

Benjy Jek Yang Tan

Aug 2, 2017, 7:58:43 PM
to Sailfish Users Group
Hi all,

I am new to RNA-Seq and I hope someone would be able to answer my query here.

I recently started using Salmon for transcript quantification; I was using kallisto previously. What I like about Salmon is that I can quantify transcripts from already-mapped reads, which is slightly more useful to me.

But my dilemma comes here. This might be a bit long, please bear with me.

I am quantifying viral transcripts from PBMCs. Initially, I used the quasi-mapping mode where I provide the FastQ files and I quantify them against 2 references - viral transcripts only OR human genome + viral transcripts.

But then the number of viral transcripts is also dependent on the number of actively transcribing proviruses; so we should take into consideration the proviral load. So next I attempted to quantify using the alignment mode where I quantify only reads which mapped to the viral genome against 2 references as well - viral transcripts only OR human genome + viral transcripts.

A summarized version of my results is shown below, in the format “Number of viral transcripts (TPM) when mapped using <X> against reference <Y>”:

A) Map using raw FASTQ against viral transcripts only = 64,978
B) Map using raw FASTQ against human genome + viral transcripts = 327
C) Map using reads aligned to virus against viral transcripts only = 42,495
D) Map using reads aligned to virus against human genome + viral transcripts = 42,384

So I have 4 different numbers here, and I was wondering which one better reflects the number of transcripts actually derived from the virus.

Initially I thought (B) would probably be most accurate, but considering that I only want to know about viral transcripts, I think (C) would be more reflective. So I’m kind of confused now.

I hope someone could give me constructive feedback as I’m really at a loss here. 

Hope to have some feedback soon. Thank you for your time!

EDUARDO EYRAS

Aug 14, 2017, 12:08:54 PM
to Sailfish Users Group
Hi Benjy,

I just came across your question. I'll try to give you some help:

TPM units give you a relative measure of the number of molecules: the number of copies of a specific RNA given that you sampled a total of 1 million.
In that sense, the TPM values you give are meaningful only relative to the total they were normalized against, that is, the set of transcripts included in the quantification.
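
To make the definition concrete, here is a minimal sketch of the usual TPM calculation: each transcript's read count is divided by its effective length, and the resulting rates are rescaled so that they sum to 1 million. The transcript names, counts and lengths are made up for illustration, and the `tpm` helper is hypothetical, not a Salmon function.

```python
def tpm(counts, eff_lengths):
    """Convert read counts to TPM given effective transcript lengths."""
    # Reads per base for each transcript
    rates = {t: counts[t] / eff_lengths[t] for t in counts}
    total = sum(rates.values())
    # Rescale so that all values sum to one million
    return {t: 1e6 * r / total for t, r in rates.items()}


# Hypothetical numbers, for illustration only
counts = {"tx_a": 100, "tx_b": 300, "tx_c": 600}
eff_lengths = {"tx_a": 1000, "tx_b": 1500, "tx_c": 2000}

values = tpm(counts, eff_lengths)
for name, value in values.items():
    print(f"{name}\t{value:.1f}")
```

Because the values always sum to 1 million, each TPM only has meaning as a share of whatever set of transcripts went into the sum.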

For instance, if you quantify using viral, proviral and human, your TPM relates to all these molecules: assuming that you saw 1M molecules drawn from virus, provirus and human transcripts, what number of those corresponded to virus? In this case you could say that a viral TPM would be comparable to a human TPM within the same sample.
If you only quantify virus and provirus, you could compare the TPMs among themselves, but not with the human ones if you quantified them separately.
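
As a rough illustration of this point, the sketch below computes TPM for the same two viral transcripts twice: once normalized against viral transcripts only, and once against viral plus human transcripts. All names and numbers are invented, and it assumes (unrealistically) that the reads assigned to each viral transcript do not change when human transcripts are added to the reference; in a real run the read assignment can shift as well.

```python
def tpm(counts, eff_lengths):
    """Convert read counts to TPM given effective transcript lengths."""
    rates = {t: counts[t] / eff_lengths[t] for t in counts}
    total = sum(rates.values())
    return {t: 1e6 * r / total for t, r in rates.items()}


# Hypothetical counts; every transcript is given the same effective length
viral_counts = {"viral_tx1": 50, "viral_tx2": 150}
human_counts = {"human_tx1": 9000, "human_tx2": 800}
lengths = {name: 1000 for name in {**viral_counts, **human_counts}}

# Normalized against viral transcripts only
viral_only = tpm(viral_counts, lengths)
# Normalized against viral + human transcripts together
combined = tpm({**viral_counts, **human_counts}, lengths)

for name in viral_counts:
    print(f"{name}\tviral only: {viral_only[name]:.0f}\tcombined: {combined[name]:.0f}")
```

The absolute viral TPMs drop sharply once human transcripts share the 1 million, while the ratio between the two viral transcripts stays the same. That is why TPMs are comparable only within one normalization.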

These values become relevant when you compare conditions, individuals, treatments, etc., since you need to quantify in similar ways, and relative differences (or alternatively log-rates) become important.

I hope this helps

best

Eduardo