Single gene novel isoform quantification

kelregan

Oct 7, 2015, 9:19:36 PM

to Sailfish Users Group

Hello,

I am looking for a protocol to quantify the expression of isoforms of a single gene across samples. I have sequences of several novel isoforms of a common human gene, and I would like to estimate their expression in a set of RNA-seq samples. To save time, I am wondering whether it would be possible to reduce the search space to only this gene, both for the reference transcripts and for the FASTA/BAM sample files. Any advice would be greatly appreciated.

Thank you

Rob

Oct 10, 2015, 3:19:15 PM

to Sailfish Users Group

Hi,

I would strongly caution against ignoring a large portion of the annotation when measuring the expression of a gene (either in a single condition or across multiple conditions). The problem with this is twofold.

First, quantification methods can only assign abundance (i.e., assign reads) to what is present. By filtering the transcript set, it's quite possible you're eliminating targets that share sequence with your isoforms of interest, and so artificially inflating their estimated abundance. If the other sequences are present in your reference instead, Salmon (and Sailfish) will attempt to infer the proper probabilities for multi-mapping reads, which should yield better expression estimates.

Second, and equally important, when you want to assess differential expression in downstream analysis, it is important to be able to look at the expression of your isoforms, across conditions, in context. That is, how does the relative abundance of these transcripts change with respect to the "background distribution" of expressed transcripts in each condition? Without a background, confidently assessing differential expression becomes difficult (e.g., did fewer reads map to your isoform in condition B because it was less abundant, because condition B had fewer reads, or because condition B just had a lower mapping rate?).

From this perspective, it really makes sense to estimate abundance for a reasonable reference set plus your novel isoforms. On the bright side, Salmon and Sailfish are fast enough that estimating abundances for the entire set of reference transcripts should still be very fast.
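
To make the first point concrete, here is a toy sketch of how multi-mapping reads get divided up, and what happens when a competing transcript is removed from the reference. It is not Salmon's or Sailfish's actual implementation: it is a bare-bones EM over read compatibility sets that ignores transcript lengths, fragment models, and bias correction. The transcript names (`novel_iso`, `known_iso`) and the read counts are invented for illustration.

```python
from collections import Counter


def em_read_counts(read_compat, transcripts, n_iter=200):
    """Toy EM: split reads among the transcripts they are compatible with.

    read_compat -- list of tuples, each naming the transcripts a read matches
    transcripts -- names of the transcripts present in the reference
    Reads matching nothing in `transcripts` are treated as unmapped.
    Simplifying assumption: all transcripts have the same effective length.
    """
    # Collapse reads into equivalence classes restricted to the reference.
    classes = Counter()
    for compat in read_compat:
        kept = tuple(sorted(t for t in compat if t in transcripts))
        if kept:
            classes[kept] += 1

    # Start from uniform relative abundances.
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    counts = {t: 0.0 for t in transcripts}
    for _ in range(n_iter):
        # E-step: share each class's reads in proportion to current abundance.
        counts = {t: 0.0 for t in transcripts}
        for kept, n in classes.items():
            total = sum(theta[t] for t in kept)
            for t in kept:
                counts[t] += n * theta[t] / total
        # M-step: re-estimate abundances from the expected counts.
        assigned = sum(counts.values())
        theta = {t: c / assigned for t, c in counts.items()}
    return counts


# Hypothetical sample: 10 reads unique to the novel isoform, 90 unique to an
# annotated transcript, and 100 reads from sequence the two share.
reads = (
    [("novel_iso",)] * 10
    + [("known_iso",)] * 90
    + [("known_iso", "novel_iso")] * 100
)

full = em_read_counts(reads, ["novel_iso", "known_iso"])
filtered = em_read_counts(reads, ["novel_iso"])

print(round(full["novel_iso"], 1))      # 20.0 with the competitor present
print(round(filtered["novel_iso"], 1))  # 110.0 once the competitor is removed
```

With both transcripts in the reference, the shared reads are divided roughly 1:9 in line with the unique evidence, so the novel isoform ends up with about 20 reads. With the annotated transcript filtered out, all 100 shared reads have nowhere else to go and the novel isoform's count jumps to 110.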

Best,
Rob
