anvi-interactive phylogenomic analysis

Hannah Schweitzer

unread,

Dec 17, 2020, 5:24:20 AM12/17/20

to Anvi'o

Hey team!

I am relatively new to Anvi'o and am still learning all of its possibilities! I am currently trying to make a phylogenomic analysis figure (like the one attached) of my Bins, but I only want to show the bins that are greater then 80% completion instead of looking at all of the bins. What is the best way to make a new profile database and contig database that only contain the >80% completion bins?

Alternatively, I was going to try to follow the tutorials for the Anvi' Server (https://anvi-server.org/) but in order to make the newick tree would that mean I would take the fasta file of each of the bins I want manually and follow this tutorial (https://merenlab.org/2017/06/07/phylogenomics/)? I think I can follow this tutorial but because I have already made the phylogenomic analysis of the all the bins it seems like there would be an easy step to filter out bins below 80% completion that I am just missing. Thanks!

Hannah

phylogenomic_analysis.jpg

Iva Veseli

unread,

Dec 17, 2020, 10:19:56 AM12/17/20

to an...@googlegroups.com

Hi Hannah :)

If I were you, I would make a collection of the bins with greater than 80% completion, and then use anvi-split (https://merenlab.org/software/anvio/help/main/programs/anvi-split/) to get that collection into its own set of contig/profile dbs. Then you can do the phylogenomic analysis as you did before with those new dbs.

Let us know if you need any additional help with this.

Iva

-------------------------------------------------

Iva Veseli
Graduate Student, Meren and Jabri Labs
Biophysical Sciences Program
University of Chicago

--
Anvi'o Paper: https://peerj.com/articles/1319/
Project Page: http://merenlab.org/projects/anvio/
Code Repository: https://github.com/meren/anvio
---
You received this message because you are subscribed to the Google Groups "Anvi'o" group.
To unsubscribe from this group and stop receiving emails from it, send an email to anvio+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/anvio/bb6e5239-d0f7-4288-a670-7230deea4cbdn%40googlegroups.com.
<phylogenomic_analysis.jpg>

A. Murat Eren

unread,

Dec 17, 2020, 10:35:52 AM12/17/20

to Anvi'o

I agree with Iva's suggestion. I would probably follow a workflow like this:

First, generate a new collection from the existing one with new, more standardized bin names:

anvi-rename-bins -p PROFILE.db \
-c CONTIGS.db \
--collection-to-read EXISTING_COLLECTION \
--collection-to-write NEW_COLLECTION \
--call-MAGs \
--min-completion-for-MAG 80 \
--prefix PROJECT_NAME \
--report-file rename-report.txt

This will create a new collection in the profile database, essentially a copy of the existing one, but where every bin that is more than 80% complete will have _MAG_ in their names. Then, I would export this collection into a text file:

anvi-export-collection -p PROFILE.db \
-C NEW_COLLECTION \
-O NEW_COLLECTION

Select only those contigs and bin names that are affiliated with MAGs and put them into a new text file:

grep _MAG_ NEW_COLLECTION.txt > NEW_COLLECTION_MAGs.txt

Then import the new text file as a new collection:

anvi-import-collection NEW_COLLECTION_MAGs.txt \
-p PROFILE.db \
-c CONTIGS.db \
-C NEW_COLLECTION_MAGs

Now the collection NEW_COLLECTION_MAGs will only include those bins that you are interested in for phylogenomics, or anvi-refine, or anvi-interactive in collection mode, or any anvi'o program that can use a collection :)

Best,

--

A. Murat Eren (Meren) | he/him
http://merenlab.org :: twitter

To view this discussion on the web visit https://groups.google.com/d/msgid/anvio/32719CA6-A2EA-48EF-9514-3198CDCF7DE1%40uchicago.edu.

Hannah Schweitzer

unread,

Dec 20, 2020, 5:31:19 PM12/20/20

to an...@googlegroups.com

Hey team!

So first off thank you so much for the help so far! Your advice got me pointed in the right direction. I am unfortunately stuck again. I was able to make the 80 percent collection like you suggested (or at least I think I have made them) but then when I run:

anvi-interactive -p SAMPLESMERGED/PROFILE.db -c 03_CONTIGS/contigs.db -C percent80_collection

I get this error:

Traceback (most recent call last):
File "/software/miniconda3/envs/anvio-6/bin/anvi-interactive", line 115, in <module>
d = interactive.Interactive(args)
File "/software/miniconda3/envs/anvio-6/lib/python3.6/site-packages/anvio/interactive.py", line 212, in __init__
self.load_collection_mode()
File "/software/miniconda3/envs/anvio-6/lib/python3.6/site-packages/anvio/interactive.py", line 808, in load_collection_mode
self.load_full_mode()
File "/software/miniconda3/envs/anvio-6/lib/python3.6/site-packages/anvio/interactive.py", line 947, in load_full_mode
ProfileSuperclass.__init__(self, self.args)
File "/software/miniconda3/envs/anvio-6/lib/python3.6/site-packages/anvio/dbops.py", line 2330, in __init__
total_reads_mapped = [int(num_reads) for num_reads in self.p_meta['total_reads_mapped'].split(',')]
KeyError: 'total_reads_mapped'

I am not sure what the total_reads_mapped key error could be. At the same time as I was doing this I was also just trying to build a newick tree format following this tutorial (https://merenlab.org/2017/06/07/phylogenomics/#pangenomic--phylogenomics) and then I could just make a phylogenomic analysis figure on the anvi'o server (https://anvi-server.org). Unfortunately I also run into an error when I run this command:

anvi-get-sequences-for-hmm-hits -p SAMPLESMERGED/PROFILE.db -c 03_CONTIGS/contigs.db -C percent80_collection -o percent80_concatenate.fa --hmm-source Ribosomal_RNAs --gene-names Archaeal_23S_rRNA,Archaeal_5S_rRNA,Archaeal_5_8S_rRNA,Bacterial_16S_rRNA,Bacterial_23S_rRNA,Bacterial_5S_rRNA,Eukaryotic_28S_rRNA,Eukaryotic_5S_rRNA,Eukaryotic_5_8S_rRNA,Mitochondrial_12S_rRNA,Mitochondrial_16S_rRNA,Archaeal_16S_rRNA --return-best-hit --get-aa-sequences --concatenate

The error is:

Config Error: Drivers::Muscle: Something went wrong with this alignment that was working on 24
sequences :/ You can find the output in this log file:
/tmp/tmpc8087_s6/00_log.txt

When I look in the temp file it says my input file has no sequences:

*** ERROR *** No sequences in input file

I'm not sure how it is possible that there aren't any sequences. I also checked and only used genes that were found in all 88 bins.

Any help would be greatly appreciated! Also, side note but while going through your phylogenomics tutorial I saw the special thanks/shout-out to Luke McKay for the --concatenate flag! He is the one that introduced me to Anvi'o and gave me my first lessons on how to use it! It was fun seeing that shout-out. Thanks again and let me know if you want any of the files I am using.

Hannah

To view this discussion on the web visit https://groups.google.com/d/msgid/anvio/CAK5_LaeobVRE%2Bvh5d3LyoFCUwtN9Wxi825r5VxS11ByP7S7-BA%40mail.gmail.com.

A. Murat Eren

unread,

Dec 20, 2020, 5:50:12 PM12/20/20

to Anvi'o

Hey Hannah,

Sorry for the `total_reads_mapped` error. It is fixed in the main repository, but continues to be a nuisance in earlier stable versions. What is the output of this command on your environment?

anvi-self-test --version

From a theoretical standpoint you are using the wrong HMM source for phylogenomics as your command is essentially asking all 'ribosomal RNAs' to be concatenated. It certainly will not give you a proper phylogenomic tree. You should consider using ribosomal proteins through the bacterial single-copy core gene collection stored in your contigs database. Take a look at this for inspiration:

https://merenlab.org/data/parcubacterium-in-hbcfdna/#phylogenomic-analysis

From a technical standpoint, it is easy to explain why you're getting a "no sequences in the input file" error. Your command goes this way:

anvi-get-sequences-for-hmm-hits -p SAMPLESMERGED/PROFILE.db \

-c 03_CONTIGS/contigs.db \

-C percent80_collection \

-o percent80_concatenate.fa \

--hmm-source Ribosomal_RNAs \

--gene-names Archaeal_23S_rRNA,Archaeal_5S_rRNA,Archaeal_5_8S_rRNA,Bacterial_16S_rRNA,Bacterial_23S_rRNA,Bacterial_5S_rRNA,Eukaryotic_28S_rRNA,Eukaryotic_5S_rRNA,Eukaryotic_5_8S_rRNA,Mitochondrial_12S_rRNA,Mitochondrial_16S_rRNA,Archaeal_16S_rRNA \

--return-best-hit \

--get-aa-sequences \

--concatenate

Everything gene you ask is a Ribosomal RNA. Since ribosomal RNAs serve the cell through their secondary structures, their open reading frames do not result in amino acid sequences. Hence, the flag `--get-aa-sequences` results in an empty sequence file (in an ideal world anvi'o should have caught it earlier and warned you, but it didn't .. that's on us :)).

Best,

--

A. Murat Eren (Meren) | he/him
http://merenlab.org :: twitter

To view this discussion on the web visit https://groups.google.com/d/msgid/anvio/CAOD-%2Bf4Xanw4E%3DjBgBttk_%2BNVYfO8bdfQuCPUhYry%3DWwitj7rw%40mail.gmail.com.

Hannah Schweitzer

unread,

Dec 21, 2020, 2:50:38 AM12/21/20

to an...@googlegroups.com

And that makes perfect sense. I also should have thought about that sooner. Thank you. I ran the anvi-self test --version and these were the results:

Anvi'o version ...............................: esther (v6)
Profile DB version ...........................: 31
Contigs DB version ...........................: 14
Pan DB version ...............................: 13
Genome data storage version ..................: 6
Auxiliary data storage version ...............: 2
Structure DB version .........................: 1

I will give it another try using the ribosomal proteins from the bacterial single-copy gene collection. Thanks again!

To view this discussion on the web visit https://groups.google.com/d/msgid/anvio/CAK5_LacXUqEPiSuRm93L20NUsjnBXjCSo9JXH6%2BYQcock%2BfZuA%40mail.gmail.com.

Hannah Schweitzer

unread,

Feb 27, 2021, 3:29:15 AM2/27/21

to Anvi'o

So I had prevously found a way around my "total_reads_mapped" error and was able to work through it but now I am back to needing to use anvi-interactive and anvi-refine so I am running into this same error again:

Traceback (most recent call last):

File "/software/miniconda3/envs/anvio-6/bin/anvi-interactive", line 115, in <module>

d = interactive.Interactive(args)

File "/software/miniconda3/envs/anvio-6/lib/python3.6/site-packages/anvio/interactive.py", line 214, in __init__

self.load_full_mode()

File "/software/miniconda3/envs/anvio-6/lib/python3.6/site-packages/anvio/interactive.py", line 947, in load_full_mode

ProfileSuperclass.__init__(self, self.args)

File "/software/miniconda3/envs/anvio-6/lib/python3.6/site-packages/anvio/dbops.py", line 2330, in __init__

total_reads_mapped = [int(num_reads) for num_reads in self.p_meta['total_reads_mapped'].split(',')]

KeyError: 'total_reads_mapped'

I ran the anvi-self-test and the versions I am using are:

Anvi'o version ...............................: esther (v6)

Profile DB version ...........................: 31

Contigs DB version ...........................: 14

Pan DB version ...............................: 13

Genome data storage version ..................: 6

Auxiliary data storage version ...............: 2

Structure DB version .........................: 1

Any advice would help to get around this error. Thanks!

Hannah

A. Murat Eren

unread,

Feb 27, 2021, 10:11:55 AM2/27/21

to Anvi'o

I think you should install anvi'o v7 first, Hannah :) You will be able work with the same files and this error will very likely go away.

To view this discussion on the web visit https://groups.google.com/d/msgid/anvio/24410039-7d8f-4ae3-898d-a5bbc34dcd3dn%40googlegroups.com.

Reply all

Reply to author

Forward