Long scaffolding with PacBio reads


Matthew MacManes

Dec 20, 2014, 6:57:55 AM
to abyss...@googlegroups.com
Hi All,

I am trying to rescaffold a (really crappy) assembly using PacBio data (about 10X coverage), and am getting negligible improvement in the quality of the assembly.

I have modified abyss-pe to accommodate the PacBio error profile (basically, adding `-x pacbio` to the bwa mem long scaffolding command). I do get a lot of reads mapping (PacBio reads map to 831229 ABySS contigs). If I'm reading `kmer71-9.path` correctly, there are only ~4000 instances where a read overlaps more than one contig. This seems highly unlikely, given that the assembly N50 is much shorter than the PacBio read length.
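One way to cross-check that ~4000 figure independently of the `.path` file is to count, directly from the alignments, how many reads hit more than one contig. The following is only a rough sketch: it assumes the `bwa mem` output is available as an uncompressed SAM file, and the function names are made up for illustration (they are not part of ABySS or BWA).

```python
from collections import defaultdict


def contigs_per_read(sam_lines):
    """Map each read name to the set of contigs it has an alignment on."""
    hits = defaultdict(set)
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        if flag & 0x4 or rname == "*":  # read is unmapped
            continue
        # Supplementary and secondary records are kept on purpose:
        # a long read spanning two contigs shows up as several records.
        hits[qname].add(rname)
    return hits


def count_multi_contig_reads(sam_lines):
    """Count reads that align to more than one distinct contig."""
    per_read = contigs_per_read(sam_lines)
    return sum(1 for contigs in per_read.values() if len(contigs) > 1)
```

Calling `count_multi_contig_reads(open("aln.sam"))` (file name hypothetical) gives a number to compare against the count seen in the path file. If the SAM-based count is much larger, the links are probably being lost after alignment rather than during it.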

I didn't get any errors during the run, though there were many warnings of the type 'Malformatted CIGAR string'.
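To see whether the CIGAR strings in the SAM file are syntactically unusual, they can be checked against the pattern given for the CIGAR field in the SAM specification. This is a sketch under assumptions: it does not reproduce whatever check ABySS performs internally, which may warn for reasons other than syntax, and `malformed_cigars` is a hypothetical helper name.

```python
import re

# Pattern for the CIGAR field as given in the SAM specification.
CIGAR_RE = re.compile(r"\*|([0-9]+[MIDNSHPX=])+")


def malformed_cigars(sam_lines):
    """Yield (read name, CIGAR) for records whose CIGAR fails the pattern."""
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:  # too short to contain a CIGAR column
            continue
        if CIGAR_RE.fullmatch(fields[5]) is None:
            yield fields[0], fields[5]
```

If this reports nothing, the strings are valid SAM and the warnings more likely come from operations the downstream parser does not accept, which would be worth mentioning when reporting the issue.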

Thanks for any insight, Matt

Matthew MacManes

Jan 7, 2015, 5:22:37 PM
to abyss...@googlegroups.com
Any ideas about this issue?
Matt

Ben Vandervalk

Jan 7, 2015, 5:45:29 PM
to Matthew MacManes, abyss...@googlegroups.com
(Forgot to include the mailing list in my first reply.)

Hi Matt,

ABySS-1.5.2 has a 'long' parameter for scaffolding with long reads.  For a usage example, see: https://github.com/bcgsc/abyss#rescaffolding-with-long-sequences.

As far as I understand, "long" is a very new feature that has only been tested using RNASeq contigs as the "long reads".  It is worth a try, though.

It is also worth mentioning that there is a specialized aligner for PacBio reads called BLASR.

Good luck!

Matthew MacManes

Jan 7, 2015, 5:53:43 PM
to Ben Vandervalk, ABySS
Yes, I used the long option described on the GitHub page. I should have been more explicit about that in the original post.

Tony said that this should work for PB reads, but it has not worked as well as I (we) expected. I’m moving the twitter-based discussion over here with the thought that somebody may benefit from the discussion. 

About mapping: Heng Li has done a lot of engineering in recent versions of BWA to allow mapping of very long reads. While I think Heng and Mark and the PB people can argue about the best mapper, `bwa mem -x pacbio` is not too shabby. I’d be happy to modify the abyss-pe makefile to use blasr, but I don’t think that is the issue, given that the SAM file contains a lot of successfully mapped reads.

Happy to share whatever data is helpful in solving this issue. I think there are lots of people who would scaffold with PB reads using ABySS if it worked well enough.

Matt



Ben Vandervalk

Jan 7, 2015, 7:01:18 PM
to Matthew MacManes, ABySS
Hi Matt,

Fair enough about 'bwa mem -x pacbio'.

I agree that a better long-read scaffolder would be a really significant improvement for ABySS.  I'll ask around and see if anyone has the bandwidth to work on it.  In the meantime, I would advise using a third-party long-read scaffolder, if one exists.

- Ben