jjo...@gmail.com
Feb 27, 2017, 9:31:31 PM
to HUMAnN Users
Hi,
I wonder if someone could point me towards the files originally used to generate the mappings between KEGG Orthology groups (KOs) and UniRef90 IDs?
The HUMAnN2 documentation states that “in most cases, mappings are directly inferred from the annotation of the corresponding UniRef centroid sequence in UniProt”. However, the idmapping files I can find in the UniProt Knowledgebase do not contain KO information, so it is not clear to me how the file map_ko_uniref90.txt.gz is created.
I ask because I think I have identified an error in the KO-UniRef90 mappings that is causing artifactual results in my HUMAnN2 output.
-----
In more detail: in my mWGS data I find a very strong negative correlation between the abundance of an individual KO (generated from HUMAnN2 0.7.1 output) and the abundance of a single microbial taxon. This relationship is biologically plausible and very interesting to me. However, when I download the full NCBI genome annotation for the relevant taxon, I find that the genome contains a gene that belongs to this KO. A GenBank query confirms it: the common names, KO numbers, and EC numbers all match the KO in question.
When I search for the relevant gene sequence in ChocoPhlAn, I am able to find it. I then use the FASTA header to find the associated UniRef90 ID for this gene sequence. From there I can confirm that there is no mapping between that ChocoPhlAn UniRef90 ID and the KO within ‘map_ko_uniref90.txt.gz’.
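For anyone wanting to repeat this kind of lookup, here is a minimal sketch. It assumes the mapping file is tab-delimited with the KO in the first column and its UniRef90 IDs in the remaining columns; the function name `kos_for_uniref` is hypothetical and not part of HUMAnN2.

```python
import gzip


def kos_for_uniref(mapping_path, uniref_id):
    """Return every KO whose row in the mapping file lists uniref_id.

    Assumed layout (not verified against every release):
    KO <TAB> UniRef90_x <TAB> UniRef90_y ...
    """
    hits = []
    with gzip.open(mapping_path, "rt") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            # fields[0] is the KO; the rest are its member UniRef90 IDs
            if uniref_id in fields[1:]:
                hits.append(fields[0])
    return hits
```

An empty list for a UniRef90 ID that belongs to a gene annotated with the KO would correspond to the missing mapping described above.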
Finally, if I account for the counts assigned to the missing UniRef90 ID (taken from the *genefamilies* output), the negative correlation between my taxon and the KO completely disappears. I therefore conclude that a failure to map all UniRef90 IDs to their respective KOs is responsible for my initial observation.
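The adjustment can be sketched roughly as below. This is only an illustration: it assumes a tab-delimited genefamilies table with a header row, feature IDs in the first column and one column per sample, and that the community total is the row whose ID has no `|` taxon suffix. Both function names are hypothetical, and the KO and gene-family values must be in the same units (for example both RPK) for the sum to make sense.

```python
import csv


def feature_totals(table_path, feature_id):
    """Return {sample: abundance} for the community-total row of feature_id."""
    with open(table_path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        samples = next(reader)[1:]  # header: first cell is the ID column label
        for row in reader:
            # Exact match skips stratified rows such as "ID|g__X.s__Y"
            if row and row[0] == feature_id:
                return dict(zip(samples, map(float, row[1:])))
    # Feature absent from the table: treat as zero in every sample
    return {sample: 0.0 for sample in samples}


def add_missing_counts(ko_totals, uniref_totals):
    """Add the unmapped gene family's abundance to the KO, sample by sample."""
    return {s: ko_totals[s] + uniref_totals.get(s, 0.0) for s in ko_totals}
```

The corrected KO abundances can then be correlated against the taxon abundance again to see whether the relationship persists.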
As this is a highly abundant and potentially important taxon, I'm very concerned by this result. Any advice would be greatly appreciated!
Thanks
Jethro