subset a contigs database?


dethlefs

Oct 8, 2017, 4:57:19 PM
to Anvi'o
Hi Meren,

I have the feeling there is a simple sqlite command that would do this, but I know almost nothing about sqlite.

What I want is a new, smaller contigs database containing only the contigs from one or several bins of a collection. Rather than exporting those contigs as a new FASTA file and rerunning anvi-gen-contigs-database (which would redo the Prodigal gene calling, mindful splitting, etc.), I want the new, smaller contigs database to match *exactly* what is already in the existing larger one, just a subset of it.
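The SQLite hunch is understandable, since anvi'o databases are SQLite files. A quick way to see why one SQL statement is unlikely to be enough is to list the tables a contigs database actually holds; each table keyed by contig, split, or gene would have to be filtered consistently. The sketch below uses only Python's built-in `sqlite3` module and opens the file read-only; the script name and the `CONTIGS.db` path are placeholders, and it makes no assumption about which tables anvi'o creates.

```python
import sqlite3
import sys
from typing import List


def list_tables(db_path: str) -> List[str]:
    """Return the sorted names of all tables in an existing SQLite file."""
    # mode=ro opens read-only and fails instead of creating an empty file
    con = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        cur = con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
        return [row[0] for row in cur.fetchall()]
    finally:
        con.close()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python list_tables.py CONTIGS.db
    for name in list_tables(sys.argv[1]):
        print(name)
```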

In case it isn't obvious, what I'm aiming for is the idea of iterating CONCOCT that I mentioned at the JGI workshop: run CONCOCT on a large co-assembly (>100k contigs) with many dozens of samples to get an initial set of bins, then take only the contigs of one or a few of those initial bins and run CONCOCT again.

Les

Tom Delmont

Oct 9, 2017, 1:20:09 AM
to an...@googlegroups.com
Hi Les,

I think the anvi'o program you are looking for is called "anvi-split". It will create self contained CONTIGS.db and PROFILE.db files for each bin in a given collection. Of course, the collection can have CONCOCT as a source. Then, you could run CONCOCT on each bin generated from the first iteration. 
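A minimal Python sketch of Tom's suggestion, assuming anvi'o is on the PATH and that `anvi-split` takes `-c` (contigs database), `-p` (profile database), `-C` (collection name) and `-o` (output directory). The file and collection names are hypothetical placeholders. By default it only builds and prints the command; it runs it only when `dry_run=False`.

```python
import shlex
import subprocess
from typing import List


def anvi_split_command(contigs_db: str, profile_db: str,
                       collection: str, output_dir: str) -> List[str]:
    """Build an anvi-split call for every bin in `collection`."""
    return [
        "anvi-split",
        "-c", contigs_db,   # the large co-assembly contigs database
        "-p", profile_db,   # the profile database holding the collection
        "-C", collection,   # e.g. a collection whose source is CONCOCT
        "-o", output_dir,   # where the per-bin CONTIGS.db/PROFILE.db pairs go
    ]


def run_split(cmd: List[str], dry_run: bool = True) -> str:
    """Return the shell-quoted command; execute it only if dry_run is False."""
    printable = shlex.join(cmd)  # requires Python 3.8+
    if not dry_run:
        subprocess.run(cmd, check=True)
    return printable


if __name__ == "__main__":
    cmd = anvi_split_command("CONTIGS.db", "PROFILE.db", "CONCOCT", "SPLIT_BINS")
    print(run_split(cmd))
```

Keeping the dry run as the default makes it easy to inspect the call before committing to a long job. Per Tom's description, each bin then gets its own self-contained database pair, which can serve as the input for the next CONCOCT round.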

and so it goes.

Best,

Tom

--
Anvi'o Paper: https://peerj.com/articles/1319/
Project Page: http://merenlab.org/projects/anvio/
Code Repository: https://github.com/meren/anvio

A. Murat Eren

Oct 9, 2017, 3:39:35 AM
to Anvi'o
Hi Les,

Yes, anvi-split is the best way to get a contigs database that is a subset of a larger one. If you want a contigs database that describes multiple bins, you will need to create a tentative collection in which all splits from all of these bins are placed in a single bin, and run anvi-split on that collection to get your contigs database :) Then you can import the previous collection file into the resulting profile database, so that the splits are separated into their individual bins in the resulting contigs+profile pair.
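Meren's multi-bin recipe hinges on two collection files. The sketch below assumes the plain-text collection format used by `anvi-export-collection` and `anvi-import-collection`: one tab-separated line per split, split name then bin name, with no header. It keeps the splits of the bins you want and produces (1) a tentative collection that puts all of them into a single bin, for anvi-split, and (2) the same splits with their original bin names, to import afterwards. The function names, file names, split names, and the merged bin name `SELECTED` are all hypothetical.

```python
from typing import Iterable, List, Tuple

Row = Tuple[str, str]  # (split_name, bin_name)


def read_collection(path: str) -> List[Row]:
    """Read a TAB-delimited collection file: split_name <TAB> bin_name."""
    rows = []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            split_name, bin_name = line.split("\t")[:2]
            rows.append((split_name, bin_name))
    return rows


def write_collection(path: str, rows: Iterable[Row]) -> None:
    """Write rows back out in the same two-column, headerless format."""
    with open(path, "w") as handle:
        for split_name, bin_name in rows:
            handle.write(f"{split_name}\t{bin_name}\n")


def make_tentative_collection(rows: Iterable[Row], wanted_bins: Iterable[str],
                              merged_bin: str = "SELECTED"):
    """Return (merged, original), both restricted to `wanted_bins`.

    merged:   every selected split assigned to one bin (input for anvi-split)
    original: the same splits with their original bin names (to re-import)
    """
    wanted = set(wanted_bins)
    original = [(s, b) for s, b in rows if b in wanted]
    merged = [(s, merged_bin) for s, _ in original]
    return merged, original


if __name__ == "__main__":
    rows = [
        ("contig_1_split_00001", "Bin_1"),
        ("contig_1_split_00002", "Bin_1"),
        ("contig_7_split_00001", "Bin_2"),
        ("contig_9_split_00001", "Bin_3"),
    ]
    merged, original = make_tentative_collection(rows, ["Bin_1", "Bin_2"])
    write_collection("tentative-collection.txt", merged)
    write_collection("original-bins.txt", original)
```

From there the recipe would be, roughly: import `tentative-collection.txt` with `anvi-import-collection` into the original profile database under a new collection name, run `anvi-split` on that collection, then import `original-bins.txt` into the profile database that anvi-split produced. Check each program's `--help` for the exact options in your anvi'o version.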

I hope this makes sense.


Best wishes,

--

A. Murat Eren (meren)
http://merenlab.org :: twitter :: gpg
