anvi-interactive phylogenomic analysis

128 views
Skip to first unread message

Hannah Schweitzer

unread,
Dec 17, 2020, 5:24:20 AM12/17/20
to Anvi'o
Hey team!

I am relatively new to Anvi'o and am still learning all of its possibilities! I am currently trying to make a phylogenomic analysis figure (like the one attached) of my Bins, but I only want to show the bins that are greater then 80% completion instead of looking at all of the bins. What is the best way to make a new profile database and contig database that only contain the >80% completion bins?

Alternatively, I was going to try to follow the tutorials for the Anvi' Server (https://anvi-server.org/) but in order to make the newick tree would that mean I would take the fasta file of each of the bins I want manually and follow this tutorial (https://merenlab.org/2017/06/07/phylogenomics/)? I think I can follow this tutorial but because I have already made the phylogenomic analysis of the all the bins it seems like there would be an easy step to filter out bins below 80% completion that I am just missing. Thanks!

Hannah
phylogenomic_analysis.jpg

Iva Veseli

unread,
Dec 17, 2020, 10:19:56 AM12/17/20
to an...@googlegroups.com
Hi Hannah :)

If I were you, I would make a collection of the bins with greater than 80% completion, and then use anvi-split (https://merenlab.org/software/anvio/help/main/programs/anvi-split/) to get that collection into its own set of contig/profile dbs. Then you can do the phylogenomic analysis as you did before with those new dbs. 

Let us know if you need any additional help with this.

Iva

-------------------------------------------------
Iva Veseli
Graduate Student, Meren and Jabri Labs
Biophysical Sciences Program
University of Chicago


--
Anvi'o Paper: https://peerj.com/articles/1319/
Project Page: http://merenlab.org/projects/anvio/
Code Repository: https://github.com/meren/anvio
---
You received this message because you are subscribed to the Google Groups "Anvi'o" group.
To unsubscribe from this group and stop receiving emails from it, send an email to anvio+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/anvio/bb6e5239-d0f7-4288-a670-7230deea4cbdn%40googlegroups.com.
<phylogenomic_analysis.jpg>

A. Murat Eren

unread,
Dec 17, 2020, 10:35:52 AM12/17/20
to Anvi'o
I agree with Iva's suggestion. I would probably follow a workflow like this:

First, generate a new collection from the existing one with new, more standardized bin names:

anvi-rename-bins -p PROFILE.db \
                 -c CONTIGS.db \
                 --collection-to-read EXISTING_COLLECTION \
                 --collection-to-write NEW_COLLECTION \
                 --call-MAGs \
                 --min-completion-for-MAG 80 \
                 --prefix PROJECT_NAME \
                 --report-file rename-report.txt

This will create a new collection in the profile database, essentially a copy of the existing one, but where every bin that is more than 80% complete will have _MAG_ in their names. Then, I would export this collection into a text file:

anvi-export-collection  -p PROFILE.db \
                        -C NEW_COLLECTION \
                        -O NEW_COLLECTION

Select only those contigs and bin names that are affiliated with MAGs and put them into a new text file:

grep _MAG_ NEW_COLLECTION.txt > NEW_COLLECTION_MAGs.txt

Then import the new text file as a new collection:

anvi-import-collection NEW_COLLECTION_MAGs.txt \
                       -p PROFILE.db \
                       -c CONTIGS.db \
                       -C NEW_COLLECTION_MAGs

Now the collection NEW_COLLECTION_MAGs will only include those bins that you are interested in for phylogenomics, or anvi-refine, or anvi-interactive in collection mode, or any anvi'o program that can use a collection :)


Best,
--

A. Murat Eren (Meren) | he/him
http://merenlab.org :: twitter


Hannah Schweitzer

unread,
Dec 20, 2020, 5:31:19 PM12/20/20
to an...@googlegroups.com
Hey team!

So first off thank you so much for the help so far! Your advice got me pointed in the right direction. I am unfortunately stuck again. I was able to make the 80 percent collection like you suggested (or at least I think I have made them) but then when I run:

anvi-interactive -p SAMPLESMERGED/PROFILE.db -c 03_CONTIGS/contigs.db -C percent80_collection

I get this error:

Traceback (most recent call last):
  File "/software/miniconda3/envs/anvio-6/bin/anvi-interactive", line 115, in <module>
    d = interactive.Interactive(args)
  File "/software/miniconda3/envs/anvio-6/lib/python3.6/site-packages/anvio/interactive.py", line 212, in __init__
    self.load_collection_mode()
  File "/software/miniconda3/envs/anvio-6/lib/python3.6/site-packages/anvio/interactive.py", line 808, in load_collection_mode
    self.load_full_mode()
  File "/software/miniconda3/envs/anvio-6/lib/python3.6/site-packages/anvio/interactive.py", line 947, in load_full_mode
    ProfileSuperclass.__init__(self, self.args)
  File "/software/miniconda3/envs/anvio-6/lib/python3.6/site-packages/anvio/dbops.py", line 2330, in __init__
    total_reads_mapped = [int(num_reads) for num_reads in self.p_meta['total_reads_mapped'].split(',')]
KeyError: 'total_reads_mapped'


I am not sure what the total_reads_mapped key error could be. At the same time as I was doing this I was also just trying to build a newick tree format following this tutorial (https://merenlab.org/2017/06/07/phylogenomics/#pangenomic--phylogenomics) and then I could just make a phylogenomic analysis figure on the anvi'o server (https://anvi-server.org). Unfortunately I also run into an error when I run this command:

anvi-get-sequences-for-hmm-hits -p SAMPLESMERGED/PROFILE.db -c 03_CONTIGS/contigs.db -C percent80_collection -o percent80_concatenate.fa --hmm-source Ribosomal_RNAs --gene-names Archaeal_23S_rRNA,Archaeal_5S_rRNA,Archaeal_5_8S_rRNA,Bacterial_16S_rRNA,Bacterial_23S_rRNA,Bacterial_5S_rRNA,Eukaryotic_28S_rRNA,Eukaryotic_5S_rRNA,Eukaryotic_5_8S_rRNA,Mitochondrial_12S_rRNA,Mitochondrial_16S_rRNA,Archaeal_16S_rRNA --return-best-hit --get-aa-sequences --concatenate

The error is:

Config Error: Drivers::Muscle: Something went wrong with this alignment that was working on 24
              sequences :/ You can find the output in this log file:                          
              /tmp/tmpc8087_s6/00_log.txt


When I look in the temp file it says my input file has no sequences:

*** ERROR ***  No sequences in input file

I'm not sure how it is possible that there aren't any sequences. I also checked and only used genes that were found in all 88 bins. 

Any help would be greatly appreciated! Also, side note but while going through your phylogenomics tutorial I saw the special thanks/shout-out to Luke McKay for the --concatenate flag! He is the one that introduced me to Anvi'o and gave me my first lessons on how to use it! It was fun seeing that shout-out. Thanks again and let me know if you want any of the files I am using.

Hannah



A. Murat Eren

unread,
Dec 20, 2020, 5:50:12 PM12/20/20
to Anvi'o
Hey Hannah,

Sorry for the `total_reads_mapped` error. It is fixed in the main repository, but continues to be a nuisance in earlier stable versions. What is the output of this command on your environment?

anvi-self-test --version

From a theoretical standpoint you are using the wrong HMM source for phylogenomics as your command is essentially asking all 'ribosomal RNAs' to be concatenated. It certainly will not give you a proper phylogenomic tree. You should consider using ribosomal proteins through the bacterial single-copy core gene collection stored in your contigs database. Take a look at this for inspiration:


From a technical standpoint, it is easy to explain why you're getting a "no sequences in the input file" error. Your command goes this way:

anvi-get-sequences-for-hmm-hits -p SAMPLESMERGED/PROFILE.db \
                                -c 03_CONTIGS/contigs.db \
                                -C percent80_collection \
                                -o percent80_concatenate.fa \
                                --hmm-source Ribosomal_RNAs \
                                --gene-names Archaeal_23S_rRNA,Archaeal_5S_rRNA,Archaeal_5_8S_rRNA,Bacterial_16S_rRNA,Bacterial_23S_rRNA,Bacterial_5S_rRNA,Eukaryotic_28S_rRNA,Eukaryotic_5S_rRNA,Eukaryotic_5_8S_rRNA,Mitochondrial_12S_rRNA,Mitochondrial_16S_rRNA,Archaeal_16S_rRNA \
                                --return-best-hit \
                                --get-aa-sequences \
                                --concatenate

Everything gene you ask is a Ribosomal RNA. Since ribosomal RNAs serve the cell through their secondary structures, their open reading frames do not result in amino acid sequences. Hence, the flag `--get-aa-sequences` results in an empty sequence file (in an ideal world anvi'o should have caught it earlier and warned you, but it didn't .. that's on us :)).


Best,
--

A. Murat Eren (Meren) | he/him
http://merenlab.org :: twitter

Hannah Schweitzer

unread,
Dec 21, 2020, 2:50:38 AM12/21/20
to an...@googlegroups.com
And that makes perfect sense. I also should have thought about that sooner. Thank you. I ran the anvi-self test --version and these were the results:

Anvi'o version ...............................: esther (v6)
Profile DB version ...........................: 31
Contigs DB version ...........................: 14
Pan DB version ...............................: 13
Genome data storage version ..................: 6
Auxiliary data storage version ...............: 2
Structure DB version .........................: 1


I will give it another try using the ribosomal proteins from the bacterial single-copy gene collection. Thanks again!



Hannah Schweitzer

unread,
Feb 27, 2021, 3:29:15 AM2/27/21
to Anvi'o
So I had prevously found a way around my "total_reads_mapped" error and was able to work through it but now I am back to needing to use anvi-interactive and anvi-refine so I am running into this same error again:

 Traceback (most recent call last):
  File "/software/miniconda3/envs/anvio-6/bin/anvi-interactive", line 115, in <module>
    d = interactive.Interactive(args)
  File "/software/miniconda3/envs/anvio-6/lib/python3.6/site-packages/anvio/interactive.py", line 214, in __init__
    self.load_full_mode()
  File "/software/miniconda3/envs/anvio-6/lib/python3.6/site-packages/anvio/interactive.py", line 947, in load_full_mode
    ProfileSuperclass.__init__(self, self.args)
  File "/software/miniconda3/envs/anvio-6/lib/python3.6/site-packages/anvio/dbops.py", line 2330, in __init__
    total_reads_mapped = [int(num_reads) for num_reads in self.p_meta['total_reads_mapped'].split(',')]
KeyError: 'total_reads_mapped'

I ran the anvi-self-test and the versions I am using are:

Anvi'o version ...............................: esther (v6)
Profile DB version ...........................: 31
Contigs DB version ...........................: 14
Pan DB version ...............................: 13
Genome data storage version ..................: 6
Auxiliary data storage version ...............: 2
Structure DB version .........................: 1

Any advice would help to get around this error. Thanks!

Hannah

A. Murat Eren

unread,
Feb 27, 2021, 10:11:55 AM2/27/21
to Anvi'o
I think you should install anvi'o v7 first, Hannah :) You will be able work with the same files and this error will very likely go away.

Reply all
Reply to author
Forward
0 new messages