Hi all,
I am new to RNA-Seq and I hope someone would be able to answer my query here.
I recently started using Salmon for transcript quantification; I was using kallisto previously. What I like about Salmon is that it can also quantify transcripts from reads that have already been aligned (alignment-based mode), which is slightly more useful to me.
Here is my dilemma. This might be a bit long, so please bear with me.
I am quantifying viral transcripts from PBMCs. Initially, I used the quasi-mapping mode, where I provide the FASTQ files and quantify them against two references: viral transcripts only, or human genome + viral transcripts.
However, the number of viral transcripts also depends on the number of actively transcribing proviruses, so the proviral load should be taken into account. I therefore next tried the alignment-based mode, quantifying only the reads that mapped to the viral genome, again against two references: viral transcripts only, or human genome + viral transcripts.
A summary of my results is shown below, in the format “Number of viral transcripts (TPM) when mapped using <X> against reference <Y>”:
A) Map using raw FASTQ against viral transcripts only = 64,978
B) Map using raw FASTQ against human genome + viral transcripts = 327
C) Map using reads aligned to virus against viral transcripts only = 42,495
D) Map using reads aligned to virus against human genome + viral transcripts = 42,384
So I have four different numbers here, and I was wondering which one best reflects the number of transcripts actually derived from the virus.
Initially I thought (B) would probably be the most accurate, but since I am only interested in the viral transcripts, I now think (C) might be more representative. So I’m rather confused.
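Part of my confusion is that, as far as I understand, TPM is a relative measure: each transcript's length-normalized read rate is scaled so that all transcripts in the reference sum to one million. To check my understanding I wrote the minimal Python sketch below. The `tpm` helper is my own illustration (not Salmon's code), and all transcript names, counts and lengths are made up; none of them come from my data.

```python
def tpm(counts, eff_lengths):
    """Convert read counts to TPM using the standard definition:
    rate = count / effective_length, then scale rates to sum to 1e6."""
    rates = {t: counts[t] / eff_lengths[t] for t in counts}
    total = sum(rates.values())
    return {t: 1e6 * r / total for t, r in rates.items()}


# Hypothetical read counts and effective lengths (illustration only).
viral_counts = {"viral_tx1": 300, "viral_tx2": 100}
human_counts = {"human_tx1": 50000, "human_tx2": 30000}
eff_lengths = {
    "viral_tx1": 1500,
    "viral_tx2": 1000,
    "human_tx1": 2000,
    "human_tx2": 1500,
}

# Reference 1: viral transcripts only.
viral_only = tpm(viral_counts, eff_lengths)

# Reference 2: human + viral transcripts together.
combined = tpm({**viral_counts, **human_counts}, eff_lengths)

# Total viral TPM under each reference.
viral_total_only = sum(viral_only.values())
viral_total_combined = sum(combined[t] for t in viral_counts)

print("viral-only reference:", viral_only)
print("combined reference:  ", {t: combined[t] for t in viral_counts})
print("total viral TPM:", viral_total_only, "vs", viral_total_combined)
```

If this is right, then with a viral-only reference the viral TPMs always add up to one million, so they only describe the relative abundance of viral transcripts to each other, whereas with the combined reference they are expressed relative to everything else in the sample. The ratio between the two viral transcripts stays the same in both cases in this toy example.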
I hope someone can give me some constructive feedback, as I’m really at a loss here.
Thank you for your time!