Salmon to quantify PacBio IsoSeq reads

Keyur

Feb 26, 2016, 2:35:55 PM
to Sailfish Users Group
Is it possible to use Salmon to quantify long IsoSeq reads from PacBio?

If so, which mode do you suggest running it in?

1. Directly on the FASTA file of IsoSeq reads, or
2. On reads mapped first (with STAR), then quantified with Salmon?

Thank you,
Keyur

Rob

Feb 26, 2016, 4:38:00 PM
to Sailfish Users Group
Hi Keyur,

This is a very interesting question. I'd say that there's no reason that salmon shouldn't work with this data. However, it would be very interesting to see how different the "estimated" counts you get from salmon are compared to the "raw" counts you get from mapping IsoSeq reads to transcripts. The reason I mention this is that I'd expect most IsoSeq reads to be long enough that they map uniquely, so the multi-mapping that is prevalent in "traditional" short-read sequencing should pose much less of a problem. Nonetheless, there may still be some differences if reads originate from very similar transcripts.

Likewise, there is nothing about the quasi-mapping procedure (what salmon does when you run it directly on the FASTA file) that is specific to short reads, so it should work on the PacBio reads as well. That being said, IsoSeq is rather new (and expensive), and we've not tested on this type of data yet. Given the nature of the IsoSeq data (far fewer, far longer reads), the time difference between quasi-mapping and a traditional aligner (like STAR) is likely to be much smaller than for, e.g., Illumina reads.

Personally, my approach would be to align the reads and quantify the BAM, and then to compare the results you get through that approach to the results you get from using salmon to quantify the abundances directly from the FASTA file. Just be certain that when you generate the alignments you either (a) align directly to the transcriptome and not the genome, or (b) have the aligner "project" the alignments into transcriptomic coordinates (I know that STAR and HISAT2 have options for this).

I'd be very interested in hearing how things work out, as this is a use case we haven't considered much to this point.
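One rough way to make the "raw" versus "estimated" comparison concrete is sketched below in Python. It is only an illustration under stated assumptions: the read-to-transcript hits are a made-up toy dictionary (in practice they would come from your alignments), the estimated counts are read from the `Name` and `NumReads` columns that salmon's `quant.sf` output is assumed to contain, and the helper names (`raw_unique_counts`, `read_salmon_numreads`, `compare_counts`) are hypothetical, not part of salmon.

```python
import csv
import io
from collections import Counter


def raw_unique_counts(read_hits):
    """Count reads per transcript, keeping only reads that hit exactly one transcript."""
    counts = Counter()
    for transcripts in read_hits.values():
        targets = set(transcripts)
        if len(targets) == 1:
            counts[next(iter(targets))] += 1
    return counts


def read_salmon_numreads(handle):
    """Read Name -> NumReads from a quant.sf-style tab-separated file (assumed layout)."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Name"]: float(row["NumReads"]) for row in reader}


def compare_counts(raw, estimated):
    """Return {transcript: (raw, estimated, estimated - raw)} over all transcript names."""
    names = sorted(set(raw) | set(estimated))
    return {
        name: (raw.get(name, 0),
               estimated.get(name, 0.0),
               estimated.get(name, 0.0) - raw.get(name, 0))
        for name in names
    }


# Toy input, invented purely for illustration: read4 multi-maps to two transcripts.
read_hits = {
    "read1": ["txA"],
    "read2": ["txA"],
    "read3": ["txB"],
    "read4": ["txA", "txB"],
}

# Toy quant.sf-style content, also invented; real values come from a salmon run.
quant_sf = io.StringIO(
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "txA\t2000\t2000\t700000\t2.7\n"
    "txB\t1500\t1500\t300000\t1.3\n"
)

raw = raw_unique_counts(read_hits)
est = read_salmon_numreads(quant_sf)
table = compare_counts(raw, est)

for name, (r, e, d) in table.items():
    print(f"{name}\traw={r}\testimated={e:.1f}\tdiff={d:+.1f}")
```

If most long reads really do map uniquely, the differences in the last column should be small; large differences would point at groups of very similar transcripts that share multi-mapping reads.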

Best,
Rob

Anne Deslattes Mays

Feb 29, 2016, 2:54:44 PM
to Sailfish Users Group
Hi there,

Nearly two years ago I used Sailfish to quantify combined IsoSeq reads and Illumina reads, and it worked beautifully in that regard. There was no problem -- I think I corresponded with Rob at the time. In that case I combined all the transcriptomes into an uber-transcriptome, collapsing it to ensure non-redundancy, and then quantified according to the protocol of the time (June 2014).

http://www.slideshare.net/adeslat/june-17-pacbio-user-group-meeting-presentation-how

I used the first option -- the FASTA file (after collapsing the reads together to get the single best, longest reads, using a script by Liz Tseng, PacBio).
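As a very rough illustration of the non-redundancy idea when combining transcript sets, the Python sketch below drops records whose sequences are exact duplicates. This is not the PacBio collapsing script mentioned above, which does considerably more than this; the function names and the toy FASTA records are hypothetical and only show the general shape of the step.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_exact_duplicates(records):
    """Keep the first record for each distinct sequence (case-insensitive comparison)."""
    seen = set()
    kept = []
    for header, seq in records:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            kept.append((header, seq))
    return kept


# Toy combined transcriptome, invented for illustration only.
combined = io.StringIO(
    ">iso1\n"
    "ACGTACGT\n"
    ">illumina_asm1\n"
    "acgtacgt\n"
    ">iso2\n"
    "ACGTTTGA\n"
    "GGCC\n"
)

kept = drop_exact_duplicates(read_fasta(combined))
for header, seq in kept:
    print(f">{header}\n{seq}")
```

Exact-duplicate removal is only the simplest case; near-identical or contained isoforms need a dedicated collapsing or clustering tool.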

Anne

