anvi-get-sequences-for-hmm-hits says 16S is there but then can't find it

48 views
Skip to first unread message

Jonathon Baker

unread,
Mar 4, 2020, 2:08:58 PM3/4/20
to Anvi'o
$ anvi-self-test --version
Anvi'o version ...............................: esther (v6.1-master)
Profile DB version ...........................: 31
Contigs DB version ...........................: 14
Pan DB version ...............................: 13
Genome data storage version ..................: 6
Auxiliary data storage version ...............: 2
Structure DB version .........................: 1

Hello Anvi'o team,
My question today is regarding getting 16S sequences from a Contigs databases.  If I have a contigs database that is pretty good (checkM was 100% complete and 0% contamination), and run:

$ anvi-get-sequences-for-hmm-hits -c Atopobium_sp._ICM42b_strain_JCVI_33_bin.10_contigs.db --hmm-source Ribosomal_RNAs --list-available-gene-names
* Ribosomal_RNAs [type: Ribosomal_RNAs]: Archaeal_23S_rRNA, Archaeal_5S_rRNA,                                                                                                             
Archaeal_5_8S_rRNA, Bacterial_16S_rRNA, Bacterial_23S_rRNA, Bacterial_5S_rRNA,
Eukaryotic_28S_rRNA, Eukaryotic_5S_rRNA, Eukaryotic_5_8S_rRNA,
Mitochondrial_12S_rRNA, Mitochondrial_16S_rRNA, Archaeal_16S_rRNA

It looks like this contigs database would have a Bacterial 16S rRNA sequence?  But, when I do:

$ anvi-get-sequences-for-hmm-hits -c Atopobium_sp._ICM42b_strain_JCVI_33_bin.10_contigs.db --hmm-source Ribosomal_RNAs --gene-name Bacterial_16S_rRNA -o Atopobium_sp._ICM42b_strain_JCVI_33_bin.10_16S.fa
Contigs DB ...................................: Initialized: Atopobium_sp._ICM42b_strain_JCVI_33_bin.10_contigs.db (v. 14)                                                                
                                                                                                                                                                                          

Config Error: Your selections returned an empty list of genes to work with :/


There doesn't seem to be anything...does Anvi'o always report that there is a 16S available if you did the anvi-run-hmms (even if the contigs database doesn't really have a 16S)?

Also, I noticed the previous post by Igot S. Pessi (https://groups.google.com/forum/#!searchin/anvio/16s%7Csort:date/anvio/4fqJ3tIIByI/cktt-DD2AAAJ), and tried to also get the Archaeal 16S, just in case it was hiding under that name, but the same result:

$ anvi-get-sequences-for-hmm-hits -c Atopobium_sp._ICM42b_strain_JCVI_33_bin.10_contigs.db --hmm-source Ribosomal_RNAs --gene-name Bacterial_16S_rRNA,Archaeal_16S_rRNA -o Atopobium_sp._ICM42b_strain_JCVI_33_bin.10_16S.fa
Contigs DB ...................................: Initialized: Atopobium_sp._ICM42b_strain_JCVI_33_bin.10_contigs.db (v. 14)                                                                
                                                                                                                                                                                          

Config Error: Your selections returned an empty list of genes to work with :/



Thanks in advance for checking this out!

Jon

A. Murat Eren

unread,
Mar 4, 2020, 2:59:45 PM3/4/20
to Anvi'o
Hi Jonathan,

The flag list available genes simply lists the gene names covered in the model :/ It doesn't mean that there are hits in the contigs db.

What do you get back when you don't specify a gene name and get all sequences for Ribosomal_RNAs?

--
Anvi'o Paper: https://peerj.com/articles/1319/
Project Page: http://merenlab.org/projects/anvio/
Code Repository: https://github.com/meren/anvio
---
You received this message because you are subscribed to the Google Groups "Anvi'o" group.
To unsubscribe from this group and stop receiving emails from it, send an email to anvio+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/anvio/e2d8ab15-a370-4749-8f21-8bd07de18fae%40googlegroups.com.

A. Murat Eren

unread,
Mar 4, 2020, 3:01:39 PM3/4/20
to Anvi'o
I apologise for misspelling your name, Jon :( I'm on my phone and I only realized what the autocorrect did after having sent the email.

Jonathon Baker

unread,
Mar 4, 2020, 5:10:44 PM3/4/20
to Anvi'o
No problem!  It is an abnormal spelling for sure.  Not specifying a gene name gives the same result:

$ anvi-get-sequences-for-hmm-hits -c Atopobium_sp._ICM42b_strain_JCVI_33_bin.10_contigs.db --hmm-source Ribosomal_RNAs  -o Atopobium_sp._ICM42b_strain_JCVI_33_bin.10_16S.fa
Contigs DB ...................................: Initialized: Atopobium_sp._ICM42b_strain_JCVI_33_bin.10_contigs.db (v. 14)                                                                
                                                                                                                                                                                          

Config Error: Your selections returned an empty list of genes to work with :/


Hmmm (no pun intended!).  If I use barrnap or infernal cmsearch on the contigs used to create the Anvi'o database, I do find at least the 5S rRNA:

$ barrnap --quiet Atopobium_sp._ICM42b_strain_JCVI_33_bin.10.fixed.fa 
##gff-version 3
c_000000000023  barrnap:0.9     rRNA    339     448     1.7e-13 +       .       Name=5S_rRNA;product=5S ribosomal RNA

$ cmsearch -Z 1000 --hmmonly --cut_ga -o Atopobium_sp._ICM42b_strain_JCVI_33_bin.10.rtRNA.txt --tblout Atopobium_sp._ICM42b_strain_JCVI_33_bin.10.rna_table.txt --noali /usr/local/projdata/9043/projects/MGX/db/rfam/rRNA_tRNA.cm.i1m Atopobium_sp._ICM42b_strain_JCVI_33_bin.10.fixed.fa

$ cat Atopobium_sp._ICM42b_strain_JCVI_33_bin.10.rna_table.txt 
#target name         accession query name           accession mdl mdl from   mdl to seq from   seq to strand trunc pass   gc  bias  score   E-value inc description of target
#------------------- --------- -------------------- --------- --- -------- -------- -------- -------- ------ ----- ---- ---- ----- ------ --------- --- ---------------------
c_000000000023       -         LSU_rRNA_archaea     RF02540   hmm     2775     2983        9      214      +     -    6 0.51   0.0   67.5     6e-18 !   -
c_000000000023       -         LSU_rRNA_bacteria    RF02541   hmm     2696     2913        2      210      +     -    6 0.52   0.0  123.9   7.2e-35 !   -
c_000000000023       -         5S_rRNA              RF00001   hmm        4      116      339      448      +     -    6 0.64   0.0   53.6   4.1e-12 !   -
c_000000000001       -         tRNA                 RF00005   hmm        2       70   161792   161722      -     -    6 0.58   0.0   35.6   2.6e-06 !   -
c_000000000001       -         tRNA                 RF00005   hmm        3       68   120429   120496      +     -    6 0.59   0.0   35.1   3.7e-06 !   -
c_000000000026       -         tRNA                 RF00005   hmm        3       70      287      218      -     -    6 0.60   0.0   31.9   3.9e-05 !   -
c_000000000003       -         tRNA                 RF00005   hmm        3       69    41223    41154      -     -    6 0.57   0.0   31.9     4e-05 !   -
c_000000000003       -         tRNA                 RF00005   hmm        3       68    70506    70572      +     -    6 0.60   0.0   31.9     4e-05 !   -
c_000000000007       -         tRNA                 RF00005   hmm        2       70    21796    21711      -     -    6 0.60   0.0   31.7   4.5e-05 !   -
c_000000000002       -         tRNA                 RF00005   hmm        3       67    34090    34156      +     -    6 0.55   0.0   31.5   5.3e-05 !   -
c_000000000004       -         tRNA                 RF00005   hmm        2       68     5519     5585      +     -    6 0.60   0.0   31.3   6.2e-05 !   -
c_000000000001       -         tRNA                 RF00005   hmm        4       66   137802   137738      -     -    6 0.62   0.0   30.7   9.7e-05 !   -
c_000000000002       -         tRNA                 RF00005   hmm        4       65    34682    34745      +     -    6 0.53   0.0   30.5   0.00011 !   -
c_000000000022       -         tRNA                 RF00005   hmm        3       67    16431    16364      -     -    6 0.63   0.0   30.4   0.00012 !   -
c_000000000003       -         tRNA                 RF00005   hmm        4       62    72074    72134      +     -    6 0.52   0.0   30.1   0.00015 !   -
c_000000000003       -         tRNA                 RF00005   hmm        2       70    28750    28665      -     -    6 0.59   0.0   29.8   0.00018 !   -
c_000000000001       -         tRNA                 RF00005   hmm        5       70      315      248      -     -    6 0.60   0.0   29.7   0.00019 !   -
c_000000000009       -         tRNA                 RF00005   hmm        6       65    23762    23702      -     -    6 0.57   0.0   29.7   0.00019 !   -
c_000000000019       -         tmRNA                RF00023   hmm        1      354     5582     5949      +     -    6 0.48   0.0   77.9   6.2e-20 !   -
#
# Program:         cmsearch
# Version:         1.1.2 (July 2016)
# Pipeline mode:   SEARCH
# Query file:      /usr/local/projdata/9043/projects/MGX/db/rfam/rRNA_tRNA.cm.i1m
# Target file:     Atopobium_sp._ICM42b_strain_JCVI_33_bin.10.fixed.fa
# Option settings: cmsearch -Z 1000 -o Atopobium_sp._ICM42b_strain_JCVI_33_bin.10.rtRNA.txt --tblout Atopobium_sp._ICM42b_strain_JCVI_33_bin.10.rna_table.txt --noali --cut_ga --hmmonly /usr/local/projdata/9043/projects/MGX/db/rfam/rRNA_tRNA.cm.i1m Atopobium_sp._ICM42b_strain_JCVI_33_bin.10.fixed.fa 
# Current dir:     /local/ifs2_projdata/0718/projects/jon/eggnog_data
# Date:            Wed Mar  4 16:20:10 2020
# [ok]

So it looks like it can find the 'LSU' and '5S', but I'm guessing the discrepancy might be due to different cutoffs or methods between the three tools (maybe?).  I'll be the first to admit I don't know much about what's going on under the hood with barrnap, infernal cmsearch, or anvi-run-hmms...

Is there an easy way to have Anvi'o report which rRNAs and tRNAs were identified in a given contigs database?  (if you want to, for example, say your assembly is high-quality based on MIMAG Standards?)

A. Murat Eren

unread,
Mar 4, 2020, 7:42:28 PM3/4/20
to Anvi'o
It is interesting. But I am happy (although it is a very bad choice for a word here) that other methods did not recover any 16S either :p Sorry for this.

The program `anvi-script-gen-hmm-hits-matrix-across-genomes` reports a matrix of hits. I think it could be what you are looking for?


Best,
--

A. Murat Eren (Meren)
http://merenlab.org :: twitter


--
Anvi'o Paper: https://peerj.com/articles/1319/
Project Page: http://merenlab.org/projects/anvio/
Code Repository: https://github.com/meren/anvio
---
You received this message because you are subscribed to the Google Groups "Anvi'o" group.
To unsubscribe from this group and stop receiving emails from it, send an email to anvio+un...@googlegroups.com.

Jonathon Baker

unread,
Mar 4, 2020, 7:50:06 PM3/4/20
to Anvi'o
That sounds like it will do it!  Thank you!

Katie S

unread,
Aug 30, 2023, 9:32:19 AM8/30/23
to Anvi'o
Hey ! This conversation saved me some time on in-house-scripting something to find rRNA's annotated from PROKKA, but is there are way to summarize the number of tRNA's from this command? I see some tRNAs appear in the Archea_76 and Bacteria_71 HMM sources but not with the 20 tRNA names that prokka uses.

I too am trying to determine if my MAGs are MIMAGs highquality with little manual scripting.

Thanks !
Katie

A. Murat Eren (Meren)

unread,
Aug 31, 2023, 4:56:08 AM8/31/23
to an...@googlegroups.com
Hey Katie,

Did you take a look at the output of `anvi-script-gen-hmm-hits-matrix-across-genomes`? I think that may provide you with the best output for this purpose.


Best wishes,
--

A. Murat Eren
 (Meren) | he/him


--
Anvi'o Paper: https://peerj.com/articles/1319/
Project Page: http://merenlab.org/projects/anvio/
Code Repository: https://github.com/meren/anvio
---
You received this message because you are subscribed to the Google Groups "Anvi'o" group.
To unsubscribe from this group and stop receiving emails from it, send an email to anvio+un...@googlegroups.com.

Iva Veseli

unread,
Aug 31, 2023, 5:12:13 AM8/31/23
to an...@googlegroups.com
Funny story, I wrote a reply to this thread earlier that was rejected because my institutional email has changed and did not have permissions to post in the anvi'o group. 🙃 So I'm sending it here again just in case it can provide a bit more context to Meren's response :) See below!

Hi Katie,

Prokka uses Aragorn to find tRNAs while anvi'o uses tRNAScan-SE, so I'm not sure you will see the same tRNA names if you rely exclusively on anvi'o. Furthermore, tRNAScan-SE is only run if you run the program `anvi-scan-trnas`, or if you ran `anvi-run-hmms` with the `--also-scan-trnas` flag. If you didn't do either of those, you may be seeing fewer tRNA annotations than expected. But if you did do those, make sure that you run the `anvi-script-gen-hmm-hits-matrix-across-genomes` program with `--hmm-source Transfer_RNAs` (the source that the tRNA annotations are stored in). 

For example, when I run that command on one of my databases, these are a few of the tRNA names that I see in the output file: Ala_AGC Ala_CGC Ala_GGC ..... Ser_AGA Ser_CGA Ser_GCT

I hope this helps,

Iva

Reply all
Reply to author
Forward
0 new messages