Best way to bin individual (not co-) assemblies


Jay Osvatic

May 10, 2017, 11:55:57 AM
to Anvi'o
Hello,

I am currently binning several metagenomes that have been assembled both individually (metaSPAdes) and as a co-assembly (MEGAHIT). I plan to compare the two at the end.

For the co-assembly, I am following the standard metagenomic workflow, but I have a few questions about the individual assemblies.

For these individual assemblies I am using the --cluster-contigs option when profiling. Does that automatically run the CONCOCT binning process, or do I still have to merge the lone profile to perform it?

These samples are all closely related. Would the binning be better if I aligned all of my read sets to each set of individually assembled contigs, producing multiple profiles for each assembly that would normally have only one?



Thank you,

Jay

Tom Delmont

May 10, 2017, 12:07:52 PM
to Anvi'o
Hi Jay,

Good questions!

If you think that the same microbial populations exist in multiple samples, then mapping reads from those samples back to a single assembly and following the classic recipe after that is a very good idea.

Note that if you use a single sample for mapping, CONCOCT will not work properly, as it is designed to use differential coverage across multiple samples.

If you have fewer than 30,000 contigs, you can go straight to the "anvi-interactive" step and start manual binning without CONCOCT.
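To illustrate why a single mapped sample is a problem, here is a toy sketch of differential coverage. It is not CONCOCT's actual model (which also uses sequence composition); the coverage values and the `coverage_distance` helper are made up for illustration.

```python
import math

# Hypothetical mean coverage of two contigs from different populations.
# Each list holds one value per mapped sample.
contig_a = [20.0, 5.0, 40.0]
contig_b = [20.0, 35.0, 2.0]


def coverage_distance(cov_x, cov_y):
    """Euclidean distance between two log-scaled coverage profiles."""
    return math.sqrt(sum((math.log1p(x) - math.log1p(y)) ** 2
                         for x, y in zip(cov_x, cov_y)))


# With only the first sample, the two contigs look identical.
single_sample = coverage_distance(contig_a[:1], contig_b[:1])
# With all three samples, their coverage patterns diverge.
multi_sample = coverage_distance(contig_a, contig_b)

print(f"1 sample:  distance = {single_sample:.2f}")
print(f"3 samples: distance = {multi_sample:.2f}")
```

In this made-up case, one sample gives a distance of zero, so coverage carries no signal for separating the two contigs; adding samples is what makes them distinguishable.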
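A quick way to check an assembly against that 30,000-contig rule of thumb is to count FASTA header lines. This is only a sketch: the function names and the tiny demo file are hypothetical, and the threshold is simply the number quoted above.

```python
import os
import tempfile

MANUAL_BINNING_LIMIT = 30_000  # rule of thumb quoted in this thread


def count_contigs(fasta_path):
    """Count sequences in a FASTA file by counting header lines."""
    with open(fasta_path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def suggest_strategy(n_contigs, limit=MANUAL_BINNING_LIMIT):
    """Return which route the rule of thumb points to."""
    return "manual binning" if n_contigs < limit else "automatic binning first"


# Demo on a tiny made-up FASTA file (a stand-in for a real assembly).
with tempfile.NamedTemporaryFile("w", suffix=".fa", delete=False) as tmp:
    tmp.write(">contig_1\nACGT\n>contig_2\nGGCC\n")
    demo_path = tmp.name

n = count_contigs(demo_path)
os.remove(demo_path)
print(n, "contigs ->", suggest_strategy(n))
```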

For larger assemblies, I think that we can follow an ad-hoc strategy within anvi'o to use CONCOCT despite the mapping limitations. 

Maybe Meren could refresh my memory on the matter?

Tom

Jay Osvatic

May 10, 2017, 2:50:00 PM
to an...@googlegroups.com
Thanks Tom!

Mapping all read sets to all individual assemblies sounds like the way to go for me then. The bacteria I am most interested in occur in high abundance (~>10%) in many, if not all, of my samples.
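For planning purposes, the all-against-all mapping can be enumerated with a short sketch like the one below. The sample names and the BAM naming scheme are hypothetical; the point is only that N samples with N individual assemblies produce N x N mappings.

```python
from itertools import product

samples = ["S01", "S02", "S03", "S04"]  # hypothetical sample names

# One individual assembly per sample, with every read set mapped to each.
mapping_jobs = [
    {"assembly": assembly, "reads": reads, "bam": f"{assembly}_vs_{reads}.bam"}
    for assembly, reads in product(samples, repeat=2)
]

profiles_per_assembly = len(samples)
print(len(mapping_jobs), "mappings in total")
print(profiles_per_assembly, "profiles to merge per assembly")
for job in mapping_jobs[:4]:
    print(job["bam"])
```

Each assembly then gets one profile per read set, and those profiles are what would be merged for that assembly.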

--
Anvi'o Paper: https://peerj.com/articles/1319/
Project Page: http://merenlab.org/projects/anvio/
Code Repository: https://github.com/meren/anvio
---
You received this message because you are subscribed to a topic in the Google Groups "Anvi'o" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/anvio/HNevA056sb0/unsubscribe.
To unsubscribe from this group and all its topics, send an email to anvio+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/anvio/214d5aef-a759-4de5-aa3e-b93a1b86c686%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--

Jay Osvatic

Swingley Microbial Ecology Lab

Northern Illinois University

(262)309-9242

Bryan Merrill

May 11, 2017, 2:12:52 PM
to Anvi'o
Hi Jay,

I'm also working on some single vs co-assemblies and looking at MEGAHIT vs metaSPAdes. What has been your experience so far with both assemblers?

Best,
Bryan

Jay Osvatic

May 11, 2017, 2:34:47 PM
to Anvi'o
Hey Bryan,

MEGAHIT is relatively new to me, since I have only just begun more complex co-assemblies, but several people have recommended it highly and it is impressively quick while using little memory. metaSPAdes, on the other hand, is much slower and uses a lot of memory.

I am in the middle of binning all of these samples in anvi'o (hoping to be done in a week, but there is a lot of profiling left) to see if I can get similar results with each. In my downtime I will compare single assemblies from metaSPAdes and MEGAHIT to see how similar they are (it shouldn't take too long). I will happily report my results as soon as I have them.

Before I started this process I asked around a lot and heard that some labs avoid co-assemblies at all costs. Instead they build a fake one by aligning large contigs from the same taxonomic bins of single assemblies in order to complete larger portions of the genomes.

In case you are interested, here is a paper that compares metagenomic assemblers and their "quality" in depth:

http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0169662


Jay

Bryan Merrill

May 30, 2017, 4:03:00 PM
to Anvi'o
Hi Jay,

Thanks for the reference! I'm wondering how your assembly comparisons went and what you're finding.

Best,
Bryan

Jay Osvatic

May 30, 2017, 4:22:59 PM
to an...@googlegroups.com
Hey Bryan,

Sorry about the delay. After tinkering around with everything, I can say that I am pretty satisfied with the MEGAHIT co-assembly. In binning and comparing the binned results to my largest past SPAdes assembly, I found that the taxonomies of the bins seem to be similar and the co-assembly seems to have separated out ecotypes of a few very abundant species well (something that wouldn't be too realistic in individual assemblies).

I would say that a MEGAHIT co-assembly is the way to go, as I ran into several limiting factors with metaSPAdes. A single metaSPAdes assembly takes much longer than a MEGAHIT one and, if you try to bin as Tom described above, generating and storing that many anvi'o prep files (mostly .bam/.sam) takes a lot more time and a lot of hard drive space.

Jay
