Organizing anvi-interactive contigs by external bins rather than built-in hierarchical clustering


Dhwani Desai

Aug 21, 2017, 5:55:01 PM
to Anvi'o
Hi Meren,
I binned my contigs using an external method and was able to import them into my contigs.db as a collection. I then reformatted these bins into a matrix as follows:

ContigID  extBin1  extBin2  extBin3
c0001     1        0        0
c0002     0        1        0
c0003     1        0        0


I formatted it so that each contig is only present (has value 1) in one of the bins.
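The one-hot property described above (every contig has a 1 in exactly one bin column) can be checked mechanically before building a tree from the matrix. A minimal sketch, assuming a whitespace-separated matrix with a header row and contig names that contain no spaces; `bins_from_onehot` is a hypothetical helper, not part of anvi'o.

```python
def bins_from_onehot(matrix_text: str) -> dict[str, str]:
    """Map each contig to its single bin, failing if a row is not one-hot."""
    rows = [line.split() for line in matrix_text.strip().splitlines() if line.strip()]
    bin_names = rows[0][1:]  # header: ContigID followed by the bin names
    membership = {}
    for contig, *values in rows[1:]:
        if len(values) != len(bin_names):
            raise ValueError(f"{contig}: expected {len(bin_names)} values, got {len(values)}")
        numbers = [float(v) for v in values]
        # One-hot means: only 0s and 1s, and exactly one 1 per row.
        if any(n not in (0.0, 1.0) for n in numbers) or sum(numbers) != 1.0:
            raise ValueError(f"{contig}: row is not one-hot: {values}")
        membership[contig] = bin_names[numbers.index(1.0)]
    return membership


matrix = """\
ContigID  extBin1  extBin2  extBin3
c0001     1        0        0
c0002     0        1        0
c0003     1        0        0
"""
print(bins_from_onehot(matrix))
# {'c0001': 'extBin1', 'c0002': 'extBin2', 'c0003': 'extBin1'}
```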

I converted this matrix to a tree using anvi-matrix-to-newick.
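Because all contigs in the same bin share an identical one-hot row, their pairwise distance is zero, so clustering this matrix should place each bin in its own clade. When the bins are already known, a cheaper alternative that skips the pairwise distance step is to write the Newick string directly, with one clade per bin. A minimal sketch, assuming anvi-interactive accepts a multifurcating tree with uniform branch lengths (this thread does not confirm that); `newick_from_bins` is a hypothetical helper.

```python
def newick_from_bins(membership: dict[str, str], branch_length: float = 1.0) -> str:
    """Build a Newick tree with one clade per bin so bin members stay adjacent."""
    groups: dict[str, list[str]] = {}
    for contig, bin_name in membership.items():
        groups.setdefault(bin_name, []).append(contig)  # bins keep first-seen order
    clades = []
    for members in groups.values():
        leaves = ",".join(f"{c}:{branch_length}" for c in members)
        # A bin with a single contig is attached directly to the root as a leaf.
        clades.append(leaves if len(members) == 1 else f"({leaves}):{branch_length}")
    return "(" + ",".join(clades) + ");"


membership = {
    "c0001": "extBin1", "c0002": "extBin2", "c0003": "extBin1",
    "c0004": "extBin2", "c0005": "extBin1", "c0006": "extBin3",
}
print(newick_from_bins(membership))
# ((c0001:1.0,c0003:1.0,c0005:1.0):1.0,(c0002:1.0,c0004:1.0):1.0,c0006:1.0);
```

Building the tree this way takes time linear in the number of contigs, whereas clustering first has to compare every pair of rows.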

When I use anvi-interactive, I can get anvi'o to display the external bins on top of the hierarchical clustering tree that it automatically generates from the contigs. My question is: is it possible to replace this built-in tree with the contigs tree that I generated from the external bins, so that my external bins are not split? If so, I can add an additional layer to visualize the external bins and then add specific function layers on top of that. Does this make sense? Is this possible?

regards,
Dhwani


A. Murat Eren

Aug 21, 2017, 6:48:19 PM
to Anvi'o
Hi Dhwani,

This is possible. You can order your splits based on the external binning results by using the --items-order parameter (from the anvi-interactive help menu):

(...)
MANUAL INPUTS:
  Mandatory input parameters to start the interactive interface without
  anvi'o databases.

  --manual-mode         Using this flag, you can run the interactive interface
                        in an ad hoc manner using input files you curated
                        instead of standard output files generated by an
                        anvi'o run. In the manual mode you will be asked to
                        provide a profile database. In this mode a profile
                        database is only used to store 'state' of the
                        interactive interface so you can reload your visual
                        settings when you re-analyze the same files again. If
                        the profile database you provide does not exist,
                        anvi'o will create an empty one for you.
  -f FASTA, --fasta-file FASTA
                        A FASTA-formatted input file
  -d VIEW_DATA, --view-data VIEW_DATA
                        A TAB-delimited file for view data
  -t NEWICK, --tree NEWICK
                        NEWICK formatted tree structure
  --items-order COMMA_SEPARATED_FILE
                        Comma separated file that contains order of leaves,
                        You may want to use this if you want to order your
                        leaves but do not want to display tree in the middle.
(...)


So if you order your splits/contigs so that all members of the same bin are next to each other, this should work.
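A minimal sketch of producing such an ordering from a two-column, TAB-delimited item/bin file (item name, then bin name, no header). That layout is an assumption here; adjust the parsing if your file differs. `items_order_from_collection` is a hypothetical helper: it groups items by bin, in the order bins first appear, and prints the result comma-separated as the help text above describes.

```python
def items_order_from_collection(lines) -> list[str]:
    """Group item names by bin from 'item<TAB>bin' lines, keeping first-seen order."""
    groups: dict[str, list[str]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        item, bin_name = line.split("\t")[:2]
        groups.setdefault(bin_name, []).append(item)
    # Flatten: all members of one bin, then all members of the next, and so on.
    return [item for members in groups.values() for item in members]


collection = [
    "c0001\textBin1", "c0002\textBin2", "c0003\textBin1",
    "c0004\textBin2", "c0005\textBin1", "c0006\textBin3",
]
print(",".join(items_order_from_collection(collection)))
# c0001,c0003,c0005,c0002,c0004,c0006
```

To read a real file, pass an open file handle instead of the list, e.g. `with open(path) as fh: order = items_order_from_collection(fh)`.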

An alternative way is to use the collection mode. If you have imported your external binning results as collection EXT, then you can run anvi-interactive in collection mode:


(...)
DEFAULT INPUTS:
  The interactive interface can be started with and without anvi'o
  databases. The default use assumes you have your profile and contigs
  database, however, it is also possible to start the interface using ad hoc
  input files. See 'MANUAL INPUT' section for required parameters.

  -p PROFILE_DB, --profile-db PROFILE_DB
                        Anvi'o profile database
  -c CONTIGS_DB, --contigs-db CONTIGS_DB
                        Anvi'o contigs database generated by 'anvi-gen-
                        contigs'
  -s SAMPLES_DB, --samples-information-db SAMPLES_DB
                        Samples information database generated by 'anvi-gen-
                        samples-info-database'
  -C COLLECTION_NAME, --collection-name COLLECTION_NAME
                        If you have a collection in your profile database, you
                        can use this flag to start the interactive interface
                        with a tree showing your bins in your collection,
                        instead of each split. This is very useful when you
                        have imported your external binning results into
                        anvi'o, and want to see the distribution of your bins
                        across samples. In these cases anvi'o will cluster
                        your bins based on multiple metrics. Because this
                        particular clustering will be done on the fly within
                        anvi'o interactive class, you get to define a
                        distance metric and a linkage method using --linkage
                        and --distance parameters if you want!
(...)


If you care about resolution, i.e., your ability to identify individual contigs with functions, etc., the first approach is better. Otherwise, the second can work well. You can provide any additional data regarding your bins using the `--additional-layers` parameter.
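In collection mode each item in the display is a bin, so an additional layers file would presumably be keyed by bin name. A minimal sketch that derives one such layer, the number of contigs per bin, from a contig-to-bin mapping. It assumes a TAB-delimited table whose first column holds item names and whose header row names the layers; the first-column label `bin` is an assumption, and `bin_size_layer` is a hypothetical helper.

```python
import csv
import io
from collections import Counter


def bin_size_layer(membership: dict[str, str]) -> str:
    """Return a TAB-delimited table with one row per bin and its contig count."""
    counts = Counter(membership.values())  # keeps the order bins first appear
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["bin", "num_contigs"])  # first column: item (bin) names
    for bin_name, n in counts.items():
        writer.writerow([bin_name, n])
    return buf.getvalue()


membership = {
    "c0001": "extBin1", "c0002": "extBin2", "c0003": "extBin1",
    "c0004": "extBin2", "c0005": "extBin1", "c0006": "extBin3",
}
print(bin_size_layer(membership), end="")
```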

I hope this helps.


Best,


--

A. Murat Eren (meren)
http://merenlab.org :: twitter :: gpg

--
Anvi'o Paper: https://peerj.com/articles/1319/
Project Page: http://merenlab.org/projects/anvio/
Code Repository: https://github.com/meren/anvio
---
You received this message because you are subscribed to the Google Groups "Anvi'o" group.
To unsubscribe from this group and stop receiving emails from it, send an email to anvio+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/anvio/af71bc20-8f69-43e3-a30f-cd74a626bb04%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Dhwani Desai

Aug 22, 2017, 9:56:54 AM
to Anvi'o
Thanks Meren,
One slight problem: when I run anvi-interactive -h, I don't see the --items-order option. anvi-self-test reports version 2.4.0, so I am guessing I have the latest anvi'o installed. Is this option from a dev version?

regards,
Dhwani


A. Murat Eren

Aug 22, 2017, 10:12:36 AM
to Anvi'o
Aha :) I apologize for that suggestion. It seems Özcan has added this to the repo just five days ago.

You can either follow the master branch or, unfortunately, wait for the next release, which will include it.


Best,


Dhwani Desai

Aug 22, 2017, 3:57:13 PM
to Anvi'o
OK, so I got the dev version installed and reformatted the contigs order file into what I assumed you suggested: a comma-separated file where the contigs are listed in the order of the external bins, like so:

If my initial external bins file was this:

ContigID  extBin1  extBin2  extBin3
c0001     1        0        0
c0002     0        1        0
c0003     1        0        0
c0004     0        1        0
c0005     1        0        0
c0006     0        0        1

my comma-separated file is as follows:

c0001,c0003,c0005,c0002,c0004,c0006

When I run this:

anvi-interactive -p profile.db --manual --items-order level-5-contigs-ext-bins-comma-sep.txt -d anvio-contigs-ext-bins-matrix-gt1000.txt -t level5-braycurtis-tree.txt -f contigs.fa

it does not raise an error, but it doesn't draw anything either. I just get a template page with my samples listed on the left, but no circular diagram. The splash screen keeps saying it is getting data from the server, but it never draws anything.

What is the correct format for the file to use with the --items-order option?

regards,
Dhwani

A. Murat Eren

Aug 22, 2017, 4:12:16 PM
to Anvi'o
Hmm. I will look into this now, Dhwani, and let you know.


A. Murat Eren

Aug 22, 2017, 5:26:00 PM
to Anvi'o
Hi Dhwani,

I just committed something to master. Can you please update your codebase and try again?

Meanwhile, the format of the items order file has changed :) It now expects one item name per line instead of a comma-separated list.
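Converting an existing comma-separated order file to the one-name-per-line format is a small step. A minimal sketch; `comma_to_lines` is a hypothetical helper, and rejecting duplicated names is an extra safety check added here, not a requirement stated in this thread.

```python
from collections import Counter


def comma_to_lines(text: str) -> str:
    """Turn a comma-separated items order into one item name per line."""
    items = [name.strip() for name in text.split(",") if name.strip()]
    dupes = [name for name, n in Counter(items).items() if n > 1]
    if dupes:
        raise ValueError(f"duplicated item names: {', '.join(dupes)}")
    return "\n".join(items) + "\n"


print(comma_to_lines("c0001,c0003,c0005,c0002,c0004,c0006\n"), end="")
```

Read the old file, write the returned string to a new file, and pass that new file to --items-order.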

Please let me know if it works.


Thanks!


Dhwani Desai

Aug 24, 2017, 9:16:56 AM
to Anvi'o
Thanks Meren,
One more question, though: I am still trying to figure out how anvi'o organizes the data internally.

So when I use this command:

 anvi-interactive -p profile.db --manual --items-order level-5-layer.txt -d anvio-contigs-ext-bins-matrix.txt -t level5-braycurtis-tree.txt -f contigs.fa

What should be the format of the -d data file?

In your tutorial example with the 690 genomes, you have the 690 samples and the coverages of the taxonomic bins across these samples. I have 12 metagenomic samples, so in my case it should be something like this:

Metagenome  ExtBin1  ExtBin2  ExtBin3
Samp1       0        0.2      0.4
Samp2       0.9      0        0.3

or should it be:
ContigID  extBin1  extBin2  extBin3
c0001     1        0        0
c0002     0        1        0
c0003     1        0        0
c0004     0        1        0
c0005     1        0        0
c0006     0        0        1
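Whichever layout is chosen, the first column of the view data file presumably has to list the same items as the leaves of the tree passed with -t. With the first layout the rows are samples, so the tree would need to be a tree of samples; with the second the rows are contigs, matching a contig tree. That matching rule is an assumption, not something stated in this thread. A minimal sketch that compares the two sets of names: it assumes a TAB-delimited view data file with a header row and an unquoted Newick string, and `newick_leaves` and `compare_items` are hypothetical helpers.

```python
import re


def newick_leaves(newick: str) -> set[str]:
    """Leaf labels are the tokens directly after '(' or ','; internal labels follow ')'."""
    return set(re.findall(r"[(,]\s*([^(),:;\s]+)", newick))


def compare_items(view_data: str, newick: str) -> tuple[set[str], set[str]]:
    """Return (tree leaves missing from the data, data rows missing from the tree)."""
    rows = view_data.strip().splitlines()[1:]  # skip the header row
    data_items = {row.split("\t")[0].strip() for row in rows if row.strip()}
    tree_items = newick_leaves(newick)
    return tree_items - data_items, data_items - tree_items


view_data = "ContigID\textBin1\textBin2\nc0001\t1\t0\nc0002\t0\t1\n"
tree = "(c0001:1.0,c0002:1.0,c0003:1.0);"
missing_from_data, missing_from_tree = compare_items(view_data, tree)
print("in tree but not in data:", sorted(missing_from_data))
print("in data but not in tree:", sorted(missing_from_tree))
```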

regards,
Dhwani

A. Murat Eren

Aug 24, 2017, 10:02:11 AM
to Anvi'o
I think you should try both and find out which one you like more :)


Dhwani Desai

Aug 24, 2017, 12:10:06 PM
to Anvi'o
Thank you very much for your patience...and the prompt replies...

I think I am getting the hang of it now.

Just one last question...for now...:-)

Is there a memory limitation for the interactive display? I was trying to visualize a tree with about 25,000 contigs in manual mode, and it keeps crashing. I had similar problems using the anvi-matrix-to-newick script on a matrix of size 90,000 (contigs) x 15 (bins). I am working on an Ubuntu workstation with 64 GB RAM. Is there a built-in memory usage limit in anvi'o? How can I change it? How much RAM do you think would be sufficient for drawing about 95,000 contigs using a tree in manual mode?

regards,

Dhwani

A. Murat Eren

Aug 24, 2017, 12:22:30 PM
to Anvi'o
Hi Dhwani,

20,000 splits is pretty much the limit for operations that require hierarchical clustering. From the tutorial:

Hierarchical clustering results are necessary for comprehensive visualization, and human guided binning, therefore, by default, anvi’o attempts to cluster your contigs using default configurations. You can skip this step by using --skip-hierarchical-clustering flag. But even if you don’t skip it, anvi’o will skip it for you if you have more than 20,000 splits, since the computational complexity of this process will get less and less feasible with increasing number of splits.
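The point about computational complexity can be made concrete with some arithmetic. A common way to run hierarchical clustering (for example SciPy's `pdist` followed by `linkage`) first stores all n(n-1)/2 pairwise distances. The sketch below assumes that layout with 8-byte floats, which may not match what anvi'o does internally; `condensed_distance_bytes` is a hypothetical helper. Under that assumption the 90,000-row matrix mentioned above needs about 32.4 GB for the distance matrix alone, before any working copies made during clustering, compared with about 1.6 GB at 20,000 items.

```python
def condensed_distance_bytes(n: int, bytes_per_value: int = 8) -> int:
    """Bytes needed to store the n*(n-1)/2 pairwise distances of n items."""
    return n * (n - 1) // 2 * bytes_per_value


for n in (20_000, 25_000, 90_000):
    gb = condensed_distance_bytes(n) / 1e9
    print(f"{n:>6} items: {gb:6.1f} GB")
```

The cost grows with the square of the number of items, so going from 20,000 to 90,000 items (4.5 times more) multiplies the memory for distances by roughly 20.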


Other than that, the number of objects that can be displayed in the interface depends on the configuration of your computer.


Best,
