--taxonomic-level above genus does not display the taxonomy layer

alfonso....@unitn.it

Mar 15, 2017, 12:08:17 PM
to Anvi'o
To whom it may concern,

I am stuck on a small visualization issue: when I launch anvi-interactive with the default --taxonomic-level, everything is fine; with "t_species" it is still fine; but when I launch it with --taxonomic-level "t_family" or any higher rank, it runs but does not display the taxonomy layer at all. In the centrifuge results (which I imported into the contigs.db file twice to make sure everything went fine), there are family, order, and even class assignments... so do you have any idea why this happens?

Thanks in advance and kind regards,

Alfonso

A. Murat Eren

Mar 15, 2017, 12:15:19 PM
to an...@googlegroups.com
Hi Alfonso,

Can you please send back the output of this command:

sqlite3 CONTIGS.db 'select * from taxon_names limit 100;'
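For anyone following along, the same check can be run over the whole table rather than the first 100 rows. A minimal Python sketch, assuming `taxon_names` holds a `taxon_id` followed by six rank columns `t_phylum` through `t_species` (inferred from the pipe-separated rows later in this thread, not from a documented schema), that counts how many rows carry a value at each rank:

```python
import sqlite3

# Assumed rank columns, inferred from the pipe-separated taxon_names rows
# shown in this thread (taxon_id followed by six ranks); not a documented schema.
RANKS = ["t_phylum", "t_class", "t_order", "t_family", "t_genus", "t_species"]


def populated_rank_counts(conn: sqlite3.Connection) -> dict:
    """Return {rank column: number of taxon_names rows with a non-empty value}."""
    counts = {}
    for rank in RANKS:
        # Column names cannot be bound as SQL parameters; they come from the
        # fixed list above, so formatting them into the query is safe here.
        query = (
            f"SELECT COUNT(*) FROM taxon_names "
            f"WHERE {rank} IS NOT NULL AND {rank} != ''"
        )
        counts[rank] = conn.execute(query).fetchone()[0]
    return counts
```

Calling `populated_rank_counts(sqlite3.connect("CONTIGS.db"))` on a database like the one in this thread would be expected to report zero rows at `t_family` and above, which would be consistent with an empty taxonomy layer at those levels.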


Thank you,

--

A. Murat Eren (meren)
http://merenlab.org :: gpg

--
Anvi'o Paper: https://peerj.com/articles/1319/
Project Page: http://merenlab.org/projects/anvio/
Code Repository: https://github.com/meren/anvio
---
You received this message because you are subscribed to the Google Groups "Anvi'o" group.
To unsubscribe from this group and stop receiving emails from it, send an email to anvio+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/anvio/5570c285-9a32-446d-88b1-dac5bc083c02%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

alfonso....@unitn.it

Mar 15, 2017, 12:58:01 PM
to Anvi'o
Dear meren,

here below I paste the output of the command you sent me:

1|||||Bradyrhizobium|Bradyrhizobium sp.
2|||||Paucibacter|Paucibacter sp.
3|||||Cupriavidus|Cupriavidus sp.
4|||||Myxococcus|Myxococcus fulvus
5|||||Cupriavidus|Cupriavidus basilensis
6|||||Methylobacterium|Methylobacterium radiotolerans
7|||||Achromobacter|Achromobacter xylosoxidans
8|||||Leptothrix|Leptothrix cholodnii
9|||||Agrobacterium|Agrobacterium radiobacter
10|||||Hydrogenophaga|Hydrogenophaga sp.
11|||||endosymbiont|endosymbiont of
12|||||Pseudonocardia|Pseudonocardia sp.
13|||||Bordetella|Bordetella bronchiseptica
14|||||Bradyrhizobium|Bradyrhizobium icense
15|||||Achromobacter|Achromobacter denitrificans
16|||||Methyloversatilis|Methyloversatilis sp.
17|||||Lysobacter|Lysobacter gummosus
18|||||Methylocystis|Methylocystis sp.
19|||||Streptomyces|Streptomyces lydicus
20|||||Conexibacter|Conexibacter woesei
21|||||Rubrivivax|Rubrivivax gelatinosus
22|||||Sinorhizobium|Sinorhizobium meliloti
23|||||Mitsuaria|Mitsuaria sp.
24|||||Myxococcus|Myxococcus hansupus
25|||||Anaeromyxobacter|Anaeromyxobacter sp.
26|||||Sorangium|Sorangium cellulosum
27|||||Methylibium|Methylibium petroleiphilum
28|||||Streptomyces|Streptomyces rubrolavendulae
29|||||Cupriavidus|Cupriavidus necator
30|||||Alicycliphilus|Alicycliphilus denitrificans
31|||||Nocardia|Nocardia farcinica
32|||||Massilia|Massilia sp.
33|||||Cupriavidus|Cupriavidus metallidurans
34|||||Polaromonas|Polaromonas naphthalenivorans
35|||||Acidovorax|Acidovorax citrulli
36|||||Thiomonas|Thiomonas intermedia
37|||||Citromicrobium|Citromicrobium sp.
38|||||Ilumatobacter|Ilumatobacter coccineus
39|||||Candidatus|Candidatus Nitrosotenuis
40|||||Wenzhouxiangella|Wenzhouxiangella marina
41|||||Pandoraea|Pandoraea apista
42|||||Burkholderia|Burkholderia sp.
43|||||Oligotropha|Oligotropha carboxidovorans
44|||||Rhodanobacter|Rhodanobacter denitrificans
45|||||Isosphaera|Isosphaera pallida
46|||||Rhizobium|Rhizobium leguminosarum
47|||||Aureimonas|Aureimonas sp.
48|||||Pseudomonas|Pseudomonas citronellolis
49|||||Lysobacter|Lysobacter capsici
50|||||Psychrobacter|Psychrobacter arcticus
51|||||Chryseobacterium|Chryseobacterium sp.
52|||||Streptomyces|Streptomyces venezuelae
53|||||Azospirillum|Azospirillum brasilense
54|||||Rhodopseudomonas|Rhodopseudomonas palustris
55|||||Leptospirillum|Leptospirillum sp.
56|||||Desulfuromonas|Desulfuromonas soudanensis
57|||||Rhodoplanes|Rhodoplanes sp.
58|||||Polymorphum|Polymorphum gilvum
59|||||Sulfuritalea|Sulfuritalea hydrogenivorans
60|||||Paraburkholderia|Paraburkholderia caribensis
61|||||Roseiflexus|Roseiflexus castenholzii
62|||||Mesorhizobium|Mesorhizobium ciceri
63|||||Opitutaceae|Opitutaceae bacterium
64|||||Acidithiobacillus|Acidithiobacillus caldus
65|||||Synechococcus|Synechococcus elongatus
66|||||Denitrovibrio|Denitrovibrio acetiphilus
67|||||Thauera|Thauera humireducens
68|||||Parabacteroides|Parabacteroides distasonis
69|||||Archangium|Archangium gephyra
70|||||Xanthomonas|Xanthomonas translucens
71|||||Dechloromonas|Dechloromonas aromatica
72|||||Desulfococcus|Desulfococcus oleovorans
73|||||Xanthomonas|Xanthomonas sacchari
74|||||Streptomyces|Streptomyces sp.
75|||||Raoultella|Raoultella ornithinolytica
76|||||Blastomonas|Blastomonas sp.
77|||||Lysobacter|Lysobacter antibioticus
78|||||Baumannia|Baumannia cicadellinicola
79|||||Hyphomicrobium|Hyphomicrobium denitrificans
80|||||Pandoraea|Pandoraea norimbergensis
81|||||Roseateles|Roseateles depolymerans
82|||||Methylomonas|Methylomonas methanica
83|||||Candidatus|Candidatus Nitrosopumilus
84|||||Thioalkalivibrio|Thioalkalivibrio sulfidiphilus
85|||||Microbacterium|Microbacterium sp.
86|||||Shinella|Shinella sp.
87|||||Vibrio|Vibrio fischeri
88|||||Ralstonia|Ralstonia pickettii
89|||||Porphyrobacter|Porphyrobacter neustonensis
90|||||Cellulomonas|Cellulomonas flavigena
91|||||Deinococcus|Deinococcus gobiensis
92|||||Amycolatopsis|Amycolatopsis orientalis
93|||||Azoarcus|Azoarcus sp.
94|||||Edwardsiella|Edwardsiella sp.
95|||||Pseudomonas|Pseudomonas putida
96|||||Gammaproteobacteria|Gammaproteobacteria bacterium
97|||||Paraburkholderia|Paraburkholderia rhizoxinica
98|||||Halorhodospira|Halorhodospira halophila
99|||||Bordetella|Bordetella parapertussis
100|||||Ramlibacter|Ramlibacter tataouinensis


I see that there is actually only annotation at the genus and species levels (and nothing else), so then I understand it; however, the centrifuge_report.tsv (which I imported twice) does show the higher ranks. Here are the first 30 lines (sorry for the formatting of the pasted text):

name                      taxID  taxRank  genomeSize  numReads  numUniqueReads  abundance
Azorhizobium caulinodans  7      species  5369772     5         0               0
Buchnera aphidicola       9      species  619958      6         2               0
Cellulomonas gilvus       11     species  3526441     3         0               0
Pelobacter                18     genus    3953506     2         0               0
Pelobacter carbinolicus   19     species  3665893     1         0               0
Phenylobacterium          20     genus    4379231     1         0               0
Shewanella                22     genus    5140018     4         0               0
Myxococcales              29     order    9744470     24        0               0
Myxococcaceae             31     family   9924679     1         0               0
Myxococcus                32     genus    9885694     4         0               0
Myxococcus fulvus         33     species  10026214    6         0               0
Myxococcus xanthus        34     species  9139763     1         0               0
Stigmatella               40     genus    10260756    1         0               0
Stigmatella aurantiaca    41     species  10260756    4         0               0
Archangium gephyra        48     species  12489432    305       215             0
Polyangiaceae             49     family   13907952    2         0               0
Chondromyces crocatus     52     species  11388132    200       143             0
Sorangium cellulosum      56     species  13907952    18        0               0
Lysobacter                68     genus    0           3         0               0
Caulobacter               75     genus    4238499     8         0               0
Hyphomicrobium            81     genus    3700497     1         0               0
Hyphomonas                85     genus    3705021     1         0               0
Leptothrix                88     genus    4909403     6         0               0
Gallionella               96     genus    3162471     1         0               0
Planctomycetales          112    order    6682242     5         0               0
Planctomyces              118    genus    0           1         0               0
Pirellula                 123    genus    6196199     1         0               0
Pirellula staleyi         125    species  6196199     1         0               0
Planctomycetaceae         126    family   6202115     5         0               0
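Rather than scanning the first lines by eye, the ranks present in a Centrifuge report can be tallied from its `taxRank` column. A minimal Python sketch, assuming the tab-separated layout and header names shown in the excerpt above:

```python
import csv
from collections import Counter


def rank_counts(report_lines):
    """Count how many taxa of each rank appear in a Centrifuge report.

    report_lines: any iterable of text lines (an open file works), assumed to be
    tab-separated with a header row that includes a 'taxRank' column.
    """
    reader = csv.DictReader(report_lines, delimiter="\t")
    return Counter(row["taxRank"] for row in reader)
```

On a report like this one it would list `family` and `order` entries alongside `genus` and `species`; whether those ranks reach the contigs database depends on what the import parser keeps, which is what the rest of this thread goes on to discuss.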

I would also like to point out that I recently switched to the newest version of anvi'o, in case it helps...

Thanks again,

Alfonso

A. Murat Eren

Mar 15, 2017, 6:45:41 PM
to an...@googlegroups.com
Hi Alfonso,

You are correct. I will take a look at this. Maybe there are some improvements I've missed :/

When I get back to this, I will share the results of my investigation here, in case you would like to keep an eye on it:



Best,

alfonso....@unitn.it

Mar 16, 2017, 4:47:14 AM
to Anvi'o
Thanks, I will keep an eye on the link you sent me.

Kind regards

Alfonso

shahrokh....@gmail.com

Aug 10, 2017, 2:02:57 PM
to Anvi'o

Hi, I have the exact same issue. Has this issue been solved? I am using anvio-2.4.0.
Thanks,
Sharok

Nicholas Youngblut

Jan 3, 2018, 6:18:43 AM
to Anvi'o
I'm having this issue, and I'm using anvio v3:

Anvi'o version ...............................: 3
Profile DB version ...........................: 20
Contigs DB version ...........................: 9
Pan DB version ...............................: 5
Samples information DB version ...............: 2
Genome data storage version ..................: 1
Auxiliary data storage version ...............: 4
Anvi'server users data storage version .......: 1

This issue (#476) is closed, so I'm not sure why I'm having this problem with anvio v3. Has the centrifuge parser for anvi-import-taxonomy been altered to include the full taxonomy or is it still just using the genus and species levels? 

Moreover, when I use anvi-summarize, most of my metagenome bins have N/A for taxonomy at the genus level, which is surprising, since the "taxon_names" table in my contigs.db file seems to contain genus + species. I don't think most of these metagenome bins are novel, so I'm wondering why I'm getting so many N/As for taxonomy. Any ideas?

A. Murat Eren

Jan 3, 2018, 9:22:20 AM
to Anvi'o
Centrifuge output does not provide taxon names below species-level. The issue #476 is closed because we need an entirely different way to make sense of the taxonomy of bins in collections, most likely an approach that relies on phylogenomics, but we haven't been able to find time or energy to do that yet.

We have been using other tools such as RAST or CheckM to assign taxonomy to our final bins in our collections.


Best,

--

A. Murat Eren (meren)
http://merenlab.org :: twitter :: gpg

Nicholas Youngblut

Jan 3, 2018, 11:55:38 AM
to Anvi'o
Thanks for the quick response! I'm a bit confused by "below the species-level". I just want coarse taxonomic levels (i.e., genus up to phylum), not strain-level taxonomy. Is this not possible with centrifuge? When using centrifuge, will `anvi-import-taxonomy -p centrifuge` always import just genus + species, or can I get all taxonomic levels?

A. Murat Eren

Jan 3, 2018, 12:01:06 PM
to Anvi'o
Centrifuge only reports species names (then anvi'o parser parses out genus name from the species name).
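To make that behaviour concrete: a genus derived this way is simply the first word of the species string. The function below is a hypothetical illustration written for this thread, not anvi'o's actual parser code:

```python
def genus_from_species(species_name):
    """Return the first whitespace-delimited word of a species name as its 'genus'.

    Hypothetical illustration of deriving a genus from a species string;
    returns None for an empty name.
    """
    words = species_name.split()
    return words[0] if words else None
```

This also accounts for entries seen in the `taxon_names` output earlier in the thread, such as `Candidatus` (from `Candidatus Nitrosotenuis`), `endosymbiont` (from `endosymbiont of`), or `Gammaproteobacteria` sitting in the genus column.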

--

A. Murat Eren (meren)
http://merenlab.org :: twitter :: gpg

Nicholas Youngblut

Jan 3, 2018, 12:45:11 PM
to Anvi'o
OK. I guess that I can just use the taxIDs that are returned by centrifuge in order to get/format the entire taxonomy (as done with Krona) and import that with the standard matrix import method for anvi-import-taxonomy. Thanks for the help!
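A minimal sketch of the last step of that plan, writing per-gene lineages into a tab-separated matrix. The header used here (`gene_callers_id` followed by `t_phylum` through `t_species`) is an assumption; check it against the anvi'o taxonomy-import documentation for your version before importing:

```python
import csv
import io

# Assumed header for the simple-matrix import; verify against the anvi'o docs.
COLUMNS = ["gene_callers_id", "t_phylum", "t_class", "t_order",
           "t_family", "t_genus", "t_species"]


def write_simple_matrix(gene_lineages, handle):
    """Write one tab-separated row per gene call; missing ranks are left empty.

    gene_lineages maps a gene caller id to a dict such as
    {"t_order": "Myxococcales", "t_genus": "Myxococcus", ...}.
    Keys not listed in COLUMNS are ignored.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    for gene_id in sorted(gene_lineages):
        lineage = gene_lineages[gene_id]
        writer.writerow([gene_id] + [lineage.get(rank, "") for rank in COLUMNS[1:]])


def simple_matrix_text(gene_lineages):
    """Convenience wrapper returning the matrix as a string."""
    buf = io.StringIO()
    write_simple_matrix(gene_lineages, buf)
    return buf.getvalue()
```

Leaving unknown ranks empty, rather than filling them with a placeholder string, keeps a missing rank from being counted as a real taxon name.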

Michael Lee

Jan 3, 2018, 1:04:54 PM
to an...@googlegroups.com
In case it helps anyone, I’ve found taxonkit to be very helpful for converting NCBI taxon IDs to full lineages.
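Once lineages are available, they still have to be split into ranks. A sketch assuming a fixed-order, semicolon-separated string with seven slots (kingdom through species), such as TaxonKit's `reformat` can produce; the `t_kingdom` key and the exact slot order are assumptions of this example:

```python
# Assumed slot order for a seven-field lineage string, kingdom through species.
# "t_kingdom" is a hypothetical key used only in this example.
RANK_ORDER = ["t_kingdom", "t_phylum", "t_class", "t_order",
              "t_family", "t_genus", "t_species"]


def parse_formatted_lineage(lineage: str) -> dict:
    """Split a fixed-order, semicolon-separated lineage into a rank dict.

    Empty slots (ranks missing from the taxonomy) are dropped; a string with
    the wrong number of slots raises ValueError instead of shifting ranks.
    """
    parts = [p.strip() for p in lineage.split(";")]
    if len(parts) != len(RANK_ORDER):
        raise ValueError(
            f"expected {len(RANK_ORDER)} fields, got {len(parts)}: {lineage!r}"
        )
    return {rank: name for rank, name in zip(RANK_ORDER, parts) if name}
```

Rejecting malformed strings outright is deliberate: silently shifting names by one slot would put, say, an order name into the family column.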


-mike

Nicholas Youngblut

Jan 4, 2018, 6:24:53 AM
to Anvi'o
Thanks Mike! taxonkit worked really well for getting the entire lineage for each centrifuge hit. I did have to make sure to remove all "no rank" hits from the centrifuge hits table; otherwise, `taxonkit lineage` would stall while writing out data and push memory usage above 300 GB.
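The "no rank" filtering step could look like this. A sketch assuming a report-style, tab-separated table with `taxID` and `taxRank` columns, as in the Centrifuge report excerpt earlier in this thread; it returns the unique taxIDs to hand to `taxonkit lineage`, one per line:

```python
import csv


def ranked_taxids(report_lines):
    """Return sorted unique taxIDs from a tab-separated report, skipping 'no rank' rows.

    report_lines: any iterable of text lines with a header containing
    'taxID' and 'taxRank' (column names assumed from the report excerpt).
    """
    reader = csv.DictReader(report_lines, delimiter="\t")
    taxids = {row["taxID"] for row in reader if row["taxRank"] != "no rank"}
    # Sort numerically so the output order is stable and readable.
    return sorted(taxids, key=int)
```

Deduplicating before the lookup also means each taxID is resolved only once, however many hits share it.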

I'm wondering how taxonomy is determined by default for collections. When I run `anvi-summarize` with a collection specified, most of the metagenome bins have an "unknown" taxonomy at the genus level. I can't find documentation on how taxonomy for bins is calculated, but I'm probably just missing it in the documentation. Any idea if "unknown" classifications in the anvio-summary are due to multiple/many different taxonomic classifications for gene-calls in the bin? At least for my dataset, that doesn't seem likely, given that even bins with 0% redundancy are classified as "unknown", so I'm in the dark on how taxonomy is actually determined for bins. 
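The thread does not say how `anvi-summarize` assigns bin-level taxonomy. Purely as a hypothetical illustration of how a consensus rule can return "unknown" even for a non-redundant bin, here is a majority vote whose threshold is measured against all genes in the bin, annotated or not; both the rule and the 0.5 default are assumptions, not anvi'o's method:

```python
from collections import Counter


def consensus_genus(gene_genera, min_fraction=0.5):
    """Hypothetical majority-vote genus for a bin, or 'unknown'.

    gene_genera: list with one entry per gene call; None (or '') means the gene
    had no genus-level annotation.
    """
    if not gene_genera:
        return "unknown"
    annotated = Counter(g for g in gene_genera if g)
    if not annotated:
        return "unknown"
    genus, count = annotated.most_common(1)[0]
    # The fraction is taken over all genes, including unannotated ones.
    return genus if count / len(gene_genera) >= min_fraction else "unknown"
```

Under a rule like this, a bin in which most genes lack a genus-level hit would be reported as unknown regardless of its redundancy.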

Jarrod

May 2, 2018, 3:38:06 PM
to Anvi'o
FYI Centrifuge now has a 'centrifuge-promote' script that allows you to update the taxa level of the output file. https://github.com/infphilo/centrifuge/issues/54

Has anyone tried Kaiju for classification? It's k-mer based.
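The idea behind promoting assignments, mentioned above, is to walk each taxID up the NCBI tree until a node of the desired rank is reached. A sketch of that idea only, assuming `parent` and `rank` dictionaries loaded beforehand (for example from NCBI's `nodes.dmp`); it is not how `centrifuge-promote` is implemented:

```python
def promote(taxid, target_rank, parent, rank):
    """Return the ancestor of taxid (or taxid itself) whose rank is target_rank.

    parent: dict mapping taxID -> parent taxID.
    rank:   dict mapping taxID -> rank name, e.g. "genus" or "family".
    Returns None if no node of that rank exists on the path to the root.
    """
    seen = set()
    while taxid is not None and taxid not in seen:
        if rank.get(taxid) == target_rank:
            return taxid
        seen.add(taxid)
        # NCBI's root node is its own parent; the seen set stops the loop there.
        taxid = parent.get(taxid)
    return None
```

Lineages without a node at the requested rank come back as None, so those reads would need an explicit "unassigned" bucket rather than being silently dropped.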

A. Murat Eren

May 2, 2018, 5:19:29 PM
to Anvi'o
Hi Jarrod,

We could implement the centrifuge solution, but it doesn't look very optimal for an elegant design, so I won't promise anything :) Thank you very much for bringing that to our attention.

I haven't tried Kaiju, but I am planning to try KrakenHLL to see how it performs for individual genes, and I would like to hear opinions if anyone had a chance to try it.


Best,

--

A. Murat Eren (meren)
http://merenlab.org :: twitter :: gpg

jarrodjscott

May 2, 2018, 5:30:01 PM
to an...@googlegroups.com
Hi Meren

You are correct. I did not have an easy time implementing the centrifuge workaround or parsing the file. It's tough to get the full taxon path. I think it was a quick fix by the developers.

I have used Kaiju: it is very easy to build DBs, and there are several options (nr, marine, refseq, and proseq, the latter two with virus options). Running it is straightforward, and the output is pretty flexible and user-friendly. It includes bit scores, the full taxon path, and hit accession numbers.
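For anyone writing their own parser, a minimal sketch for reading Kaiju output that has a taxon path appended. The column layout is an assumption spelled out in the docstring; verify it against your own output files before relying on it:

```python
def parse_kaiju_line(line):
    """Parse one line of Kaiju output with a taxon path appended as the last column.

    Assumed layout (check against your files): column 1 is 'C' or 'U'
    (classified/unclassified), column 2 the query name, column 3 the NCBI
    taxon ID, and the last column a ';'-separated taxon path.
    Returns (query, taxid or None, list of names from root to leaf).
    """
    fields = line.rstrip("\n").split("\t")
    status, query, taxid = fields[0], fields[1], fields[2]
    if status != "C":
        return query, None, []
    path = [name.strip() for name in fields[-1].split(";") if name.strip()]
    return query, taxid, path
```

Reading the path from the last column, rather than a fixed index, keeps the sketch working whether or not the optional middle columns are present, provided the path really is appended last.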

I have not compared the results with centrifuge yet. Curious how KrakenHLL performs.  

FYI this is an interesting post if you haven't seen it

Best 
Jarrod


A. Murat Eren

May 2, 2018, 5:39:47 PM
to Anvi'o
Maybe we should implement a Kaiju parser, then. AND we could do that rather rapidly if you were to help us with your experience with it :)

In an ideal world, if you were to be willing to help, you could send us an example contigs database with a single genome in it, and the Kaiju output you generated using your workflow for that particular contigs database. That would be enough for us to implement a parser for `anvi-import-taxonomy` (which will be called `anvi-import-gene-level-taxonomy` in the next release).

Then, if you feel particularly altruistic, you could add a section in this file somewhat similar to the Centrifuge part of it:


Just saying :p


Best wishes,


--

A. Murat Eren (meren)
http://merenlab.org :: twitter :: gpg

jarrodjscott

May 2, 2018, 5:46:16 PM
to an...@googlegroups.com
Are you kidding? More than happy to; any chance to give back a little. I will get you something tomorrow. Preferred way of getting the files?

jarrodjscott

May 2, 2018, 6:00:03 PM
to an...@googlegroups.com
I see from another message Dropbox works for you all. I will use that :)

A. Murat Eren

May 2, 2018, 6:36:02 PM
to Anvi'o
Yay :) Thank you very much!

--

A. Murat Eren (meren)
http://merenlab.org :: twitter :: gpg

dethlefs

May 4, 2018, 2:50:32 PM
to Anvi'o
Hi Meren and Jarrod!

A few more thoughts on Kaiju:

Like Jarrod, I've tried it and I like it, so I'm happy you're looking at implementing an Anvi'o parser, Meren! It seems like Jarrod may already have sent you the input you needed to get started on that work, but feel free to let me know if any additional Kaiju output files might help you. However, I'm not as far along as Jarrod in terms of actually bringing Kaiju taxonomy into Anvi'o. My focus so far has been on exploring Kaiju output with various options/settings, and on comparing it to Diamond output generated from the same queries and reference DB. Diamond was run with the BLOSUM80 scoring matrix and minimum thresholds of 40% identity over 70% of query length, with the top 10 option to report all hits with raw alignment scores at least 90% of the best hit's score for that query. Diamond output was MUCH slower to generate, obviously.
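The hit filter described above can also be expressed as a post-processing step over tabular hits. A sketch, assuming each hit is a `(query, subject, percent_identity, query_coverage_percent, score)` tuple; it applies the 40% identity and 70% query-coverage cutoffs and then keeps every hit scoring at least 90% of that query's best. This reproduces the rules as stated, not Diamond's internal implementation:

```python
from collections import defaultdict


def filter_hits(hits, min_identity=40.0, min_query_cov=70.0, top_fraction=0.9):
    """Apply identity/coverage cutoffs, then keep hits near each query's best score.

    hits: iterable of (query, subject, percent_identity, query_coverage_percent, score).
    Returns {query: [subjects kept, in input order]}; queries with no passing
    hits are omitted.
    """
    passing = defaultdict(list)
    for query, subject, pident, qcov, score in hits:
        if pident >= min_identity and qcov >= min_query_cov:
            passing[query].append((subject, score))
    kept = {}
    for query, subject_scores in passing.items():
        best = max(score for _, score in subject_scores)
        kept[query] = [s for s, score in subject_scores if score >= top_fraction * best]
    return kept
```

Applying the cutoffs before choosing the best score matters: a high-scoring hit that fails the identity filter should not raise the bar for the remaining hits.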

For everything I'm talking about, my query inputs were Prodigal-predicted genes on cross-sample, within-subject Megahit-assembled contigs from many stool samples collected over time from a handful of human subjects, and the reference DB is UniRef100.

While Kaiju does helpfully output its 'greedy' algorithm score, I'd really discourage referring to this as a bitscore; I note that Kaiju's developers call it a 'score' and not a 'bitscore'. Unlike Diamond and blastp, which can calculate comparable bitscores because they attempt to align as much of each query to a refDB hit as possible, Kaiju's 'greedy' algorithm stops extending the alignment once it encounters an arbitrary threshold number of mismatches (default 3), even if there are additional alignable residues beyond. So you can't expect the Kaiju scores to be as precise an estimate of match quality as the Diamond/blastp bitscore (and you can't complain, because that's part of what makes Kaiju fast). In practice, I've found a correlation coefficient of ~0.75 between Kaiju's score and Diamond's bitscore on the same queries: good and useful, but not the same.
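To run the same kind of comparison on another dataset, pair each gene's Kaiju score with its best Diamond bitscore (for genes hit by both) and compute the correlation. The message does not say which coefficient was used; this sketch assumes Pearson's r:

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)
```

Because the Kaiju score and the bitscore are on different scales, a rank-based coefficient (Spearman) would be a reasonable alternative if the relationship looks monotonic but not linear.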

And strikingly, using default Kaiju settings, the number of hits from the same query set almost exactly matches the number returned from the Diamond query described above.  About 90% of my genes were getting hits with either algorithm, and ~96% of those genes had hits found by both Kaiju and Diamond.  Somewhat more hits were found by Kaiju and not Diamond (~4%) than the reverse (~3%), which I think is a real difference because it was consistent across different query sets.  Considering the single best Diamond hit per query, the mean score or bitscore for genes with hits found by one method but missed by the other was considerably lower than the mean score/bitscore for hits they both found, but beyond that I can't tell you much about the genes/hits found by only one method.  This seems promising to me.
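Overlap figures like these come down to set arithmetic over gene IDs. A sketch that uses the genes hit by either tool as the denominator for the both/only fractions, so those three sum to 1; other denominators (for example, genes hit by each tool separately) give different percentages:

```python
def hit_overlap(all_genes, kaiju_hit, diamond_hit):
    """Summarize which genes got hits from Kaiju, Diamond, both, or either.

    All arguments are sets of gene IDs. The both/only fractions are relative
    to the genes hit by either tool.
    """
    either = kaiju_hit | diamond_hit
    if not all_genes or not either:
        raise ValueError("need at least one gene and at least one hit")
    n = len(either)
    return {
        "fraction_with_any_hit": n / len(all_genes),
        "both_of_hit": len(kaiju_hit & diamond_hit) / n,
        "kaiju_only_of_hit": len(kaiju_hit - diamond_hit) / n,
        "diamond_only_of_hit": len(diamond_hit - kaiju_hit) / n,
    }
```

Stating the denominator explicitly makes it easier to compare percentages reported by different people for the same two tools.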

Of genes where both methods found hits, the ID of the best Diamond hit was listed in the Kaiju output only about half the time, and that doesn't seem so good.  But the reality may be better than this would suggest, at least for taxonomy calls.  Kaiju only lists 20 hits max, to prevent *very* long output for some queries, but this can't contribute much to missed matches...<1% of my query genes have that many Kaiju hits. What I really need to do is get the LCA of all the Diamond hits to compare that to the Kaiju LCA output, but I haven't done that yet.  That can only improve the apparent agreement between methods, I think.  But this result suggests Kaiju probably can't be a Diamond replacement for finding true best hits from a big refDB, even if it works pretty well for taxonomy calls.  (The Kaiju developers aren't claiming it's a Diamond replacement, but I had hopes...)
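For comparing a Diamond-derived LCA with Kaiju's, one simple approach is to take the longest shared prefix of the root-to-leaf lineages of all retained hits. A sketch assuming lineages are already aligned rank by rank (for example, produced with TaxonKit); a names-based prefix comparison like this can differ from a tree-based LCA where names are ambiguous or ranks are missing:

```python
def lineage_lca(lineages):
    """Lowest common ancestor of several root-to-leaf lineages (lists of names).

    The LCA is the longest shared prefix; an empty list means the lineages
    already disagree at the top level (or no lineages were given).
    """
    lineages = list(lineages)
    if not lineages:
        return []
    lca = []
    for names in zip(*lineages):
        if all(n == names[0] for n in names):
            lca.append(names[0])
        else:
            break
    return lca
```

The last element of the result is the deepest shared taxon, which can then be compared with the taxon Kaiju reports for the same gene.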

I haven't thought through what all this might mean when using a smaller, less redundant database like ProGenomes that the Kaiju developers seem to like.  I've also got (apparently) good calls for my genes from Kaiju using ProGenomes, but I haven't yet investigated the output very much.

Best wishes to all,
Les

A. Murat Eren

May 5, 2018, 8:09:37 PM
to Anvi'o
Hi Les,

Thank you very much for your insights. 

I finally found some time to implement a draft parser for kaiju:


Anyone who is using the master repository could try it using the program `anvi-import-taxonomy-for-genes` with `--parser kaiju` to import gene-level taxonomy into the contigs database. In the next version we will also have a program called `anvi-import-taxonomy-for-layers`, which will import metagenome-level taxonomy based on short reads, if someone is interested in putting that information into the context of their binning efforts. Ozcan and I are working on the codebase to facilitate that, and we will update our documentation when `v5` is out.

Meanwhile Jarrod will help us write a tiny tutorial at http://merenlab.org/2016/06/18/importing-taxonomy/ to clarify how to export genes from a contigs database, and how to run kaiju on that export in a similar fashion to centrifuge so people can do it.


Best wishes,

--

A. Murat Eren (meren)
http://merenlab.org :: twitter :: gpg

Bryan Merrill

Jun 4, 2018, 8:10:37 PM
to Anvi'o
Hi Jarrod,

I'm wondering if using 'centrifuge-promote' allowed you to visualize a different taxa level in anvi-interactive. When I "promoted" mine to family, the taxonomy layer just disappeared as the original poster indicated.

Best,
Bryan

jarrodjscott

Jun 4, 2018, 8:45:02 PM
to an...@googlegroups.com
Hi Bryan 

I should have mentioned this in my original message. I reformatted the output to the "Simple Matrix" format and imported using the "--default-matrix" flag as described here:


This allows you to select any level you wish. Hope this helps!
Jarrod

qiuyuj...@gmail.com

Nov 17, 2018, 8:49:05 PM
to Anvi'o
Hi Meren,

I have run into a situation where all of my taxonomies are "None" in my anvi-interactive results. I checked the centrifuge_report.tsv and used your command "sqlite3 contigs.db 'select * from taxon_names limit 100;'" to check the taxonomy results; the results are below:
1|||||Klebsiella|Klebsiella pneumoniae
2|||||BeAn|BeAn 58058
3|||||Torque|Torque teno
4|||||Actinoalloteichus|Actinoalloteichus sp.

Even though there are only four species, I cannot see them in my anvi-interactive results. Are there any clues about this issue?

Thank you very much!

On Thursday, March 16, 2017 at 6:45:41 AM UTC+8, Meren wrote: