Add-on for ABySS to extend assemblies using PacBio reads - Cerulean

Viraj Deshpande

Aug 15, 2013, 7:57:25 PM
to abyss...@googlegroups.com
Hello ABySS Community,

We just developed an add-on package for ABySS that uses Illumina + PacBio data.
The method is described in detail here: http://arxiv.org/abs/1307.7933

Hope it can help!

Viraj. 


Steve

Aug 23, 2013, 8:36:13 AM
to abyss...@googlegroups.com
Hi Viraj,

Have you tried this on genomes in the hundreds of Mb size range? I wanted to give it a try on a 400 Mb genome. I have done an assembly with some PE data in ABySS and mapped back 16X coverage of PacBio reads (uncorrected). I am running Cerulean at the moment, but it has been sitting there for three days now using a single processor (although I specified 16) and not much RAM. Do you have any insight into how you would expect it to perform on a larger data set?

Cheers,

Steve

Viraj Deshpande

Aug 23, 2013, 8:50:14 PM
to abyss...@googlegroups.com
Hello Steve,

Thanks for your feedback. Only part of the algorithm is parallelized. I can determine at which stage it is getting stuck by looking at your output.

I was able to run the code on a 100 Mbp genome on 24 threads in 30 minutes, with each thread using 2.6Mbp.

The output graph from ABySS had 65K vertices and 50K edges.

In my run, though, I had disabled the contig length threshold of 0 and only went down from 2000 to 250.

I have not validated the quality of the results of my assembly, so I cannot yet comment about the assembly quality for complex genomes.

Viraj
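
For anyone wanting to compare their own input against the 65K-vertex, 50K-edge graph mentioned above, here is a rough sketch that counts vertices and edges in a .dot file. It assumes ABySS-style lines such as `"0+" [l=1234 C=5678]` for a vertex and `"0+" -> "1-" [d=-30]` for an edge; that layout and the function name `count_dot_graph` are assumptions for illustration, not something confirmed in this thread or taken from Cerulean.

```python
import re

# Assumed ABySS-style DOT lines (an assumption, not confirmed in this thread):
#   "0+" [l=1234 C=5678]     a vertex with attributes
#   "0+" -> "1-" [d=-30]     an edge between two vertices
EDGE_RE = re.compile(r'^\s*"[^"]+"\s*->\s*"[^"]+"')
VERTEX_RE = re.compile(r'^\s*"[^"]+"\s*(\[|;|$)')


def count_dot_graph(lines):
    """Return (vertex_count, edge_count) for an iterable of DOT lines."""
    vertices = 0
    edges = 0
    for line in lines:
        # Check edges first, because an edge line also starts with a quoted id.
        if EDGE_RE.match(line):
            edges += 1
        elif VERTEX_RE.match(line):
            vertices += 1
    return vertices, edges


# Small in-memory example; for a real file, pass an open file handle instead.
example = [
    'digraph adj {',
    '"0+" [l=100 C=500]',
    '"0-" [l=100 C=500]',
    '"0+" -> "0-" [d=-30]',
    '}',
]
print(count_dot_graph(example))
```

A graph far larger than the one quoted above may simply be beyond what the current release was tuned for.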

--
You received this message because you are subscribed to a topic in the Google Groups "ABySS" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/abyss-users/dJwmrhh5CUY/unsubscribe.
To unsubscribe from this group and all its topics, send an email to abyss-users...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Steve

Aug 24, 2013, 7:31:14 AM
to abyss...@googlegroups.com, vdes...@cs.ucsd.edu
Hi Viraj,

It actually hadn't started writing any output yet. Do you have a small 'toy data' set for testing? I was trying to use a small 600 kb genome, which would have been ideal for testing, but I only have single-end reads and ABySS didn't generate the required .dot file. Can the .adj file substitute for this?

Cheers,

Steve

Viraj Deshpande

Sep 12, 2013, 5:35:28 PM
to Steve, abyss...@googlegroups.com
Hello Steve,

Thanks for your comments. I have added a test bacterial data set.
Cerulean cannot take the .adj file as input.
The current version has been implemented with bacterial genomes in mind, and the algorithm is very simple.
I plan to extend it to a 100 Mb algal genome and release the next version by the end of October.
I would not yet recommend applying it to larger genomes.
But I find your comments useful. I will post accordingly on the SourceForge page so that other users do not have to spend time running it on large datasets.

Regards,
Viraj



jungmin ha

Feb 5, 2014, 4:37:24 PM
to abyss...@googlegroups.com
Hello Viraj,

I attended your presentation at PAG and I am trying to run Cerulean on my assembly.
The estimated genome size is 390 Mb. The E. coli test set in the package works perfectly on our server, but for a bigger data set
(contig.dot 155M, contigs.fa 269M, pacbio.fa 284M, contigs.fa.sa 991M, fasta.m4 50M),
Cerulean.py has just sat there for two days without giving any output message.
Does Cerulean only work for bacterial genomes so far?

Viraj Deshpande

Feb 5, 2014, 4:59:15 PM
to jungmin ha, abyss...@googlegroups.com
Hello Jungmin,

I have implemented some optimizations to the code for larger genomes. However, I am in the process of adding more reliability features before releasing it for larger genomes. I hope to release this by the end of February.

Viraj


macf...@onid.oregonstate.edu

Jul 10, 2014, 2:42:27 PM
to abyss...@googlegroups.com
Hello Viraj, my name is Logan MacFarland. 

I am working on assembling a cyanobacterial genome and have chosen to use Cerulean to do so. However, when I tried to assemble the genome using the procedure described in the 'readme', I didn't quite get the result I wanted. I think ABySS might not have been the best tool to assemble the Illumina data, and we have gotten better results from IDBA. However, IDBA's fasta header format is slightly different, and IDBA does not generate a .dot file. Can you recommend a way to pass an IDBA short-read assembly to Cerulean? So far I've tried to generate the .dot file using a script I found in Trinity, although that script expects a different header as well. Also, in looking at the ABySS and IDBA fasta outputs, I realized that I could manually convert the name and length formats, although I don't think I could fill in the third parameter typically found in the ABySS fasta header, the k-mer coverage. The only other information in the IDBA fasta header is the read coverage, and I don't think I can convert that over. Does Cerulean need the ABySS coverage value?

Thank you very much for your time. 

Logan.
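
The manual header conversion described above could be sketched roughly as follows. This assumes IDBA headers of the form `>contig-100_0 length_1234 read_count_56` and ABySS headers of the form `>id length kmer_coverage`; both layouts, and the helper names `idba_to_abyss_header` and `convert_fasta`, are assumptions for illustration. The third field is written as a placeholder because the k-mer coverage cannot be derived from the IDBA header, and nothing in this thread confirms that Cerulean would accept such a value. The sketch also does not produce the .dot file that Cerulean requires.

```python
import re

# Assumed IDBA header shape (an assumption, not confirmed in this thread):
#   >contig-100_0 length_1234 read_count_56
IDBA_HEADER_RE = re.compile(r"^>(\S+)\s+length_(\d+)\s+read_count_(\d+)")


def idba_to_abyss_header(header, new_id, placeholder_coverage=0):
    """Rewrite one IDBA header as an ABySS-style '>id length coverage' line.

    The third field is a placeholder, not a real k-mer coverage.
    """
    match = IDBA_HEADER_RE.match(header)
    if match is None:
        raise ValueError(f"unrecognised IDBA header: {header!r}")
    length = int(match.group(2))
    return f">{new_id} {length} {placeholder_coverage}"


def convert_fasta(lines):
    """Yield FASTA lines with headers rewritten and contigs renumbered from 0."""
    next_id = 0
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield idba_to_abyss_header(line, next_id)
            next_id += 1
        else:
            # Sequence lines pass through unchanged.
            yield line


# Small in-memory example; for a real file, pass an open file handle instead.
for out_line in convert_fasta([">contig-100_0 length_8 read_count_3\n", "ACGTACGT\n"]):
    print(out_line)
```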

Tony Raymond

Jul 10, 2014, 6:39:26 PM
to <macfarll@onid.oregonstate.edu>, abyss...@googlegroups.com
Hi Logan,

It is probably best to contact the developers of Cerulean directly for this type of request. This is the ABySS-users group, which is not affiliated with Cerulean.

Tony


adithi r v

Jun 11, 2015, 3:05:27 PM
to abyss...@googlegroups.com, macf...@onid.oregonstate.edu
Hi Viraj,
 
I am wondering if you ever came up with an answer for the approximate time Cerulean takes to run. I was excited to try Cerulean for gap filling and scaffolding.
I followed exactly what is described in your manual. I ran ABySS successfully, followed by sawriter and blasr. All the files were generated, and now I have been trying to run Cerulean. It has been queuing for 7 days; I see neither any output nor any processes running in top. I don't understand what's going on. If I kill the process, though, it does stop. Could you kindly tell me what's going on?

Waiting for a quick reply.