How do I create ec_counts for custom database


mail2...@gmail.com

Apr 8, 2021, 5:02:52 AM
to picrust-users
Dear all,
I am trying to run PICRUSt2 with a custom database, which is deposited at https://github.com/mruehlemann/16s_cnv_correction_databases. I successfully ran the first step. For the hidden state prediction step, however, I need ec_16S_counts.txt.gz to create EC_predicted.tsv.gz, and ko_16S_counts.txt.gz to create KO_predicted.tsv.gz. How do I create these files?

Thank you in advance

Regards
Monica

Gavin Douglas

Apr 8, 2021, 8:24:38 AM
to picrus...@googlegroups.com
Hey Monica,

The gene family count tables give the number of copies of each gene family in each reference genome. GhostKOALA would be one way to annotate the genomes with KOs, for instance.

The 16S files give the 16S copy number per genome, which I originally identified simply by counting how many 16S genes were annotated in each genome.
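
As a rough illustration of what such a count table looks like, here is a minimal Python sketch that turns per-genome annotations into a genome-by-gene-family table. The genome IDs, the KO assignments, the `assembly` ID column name, and the helper names `build_count_table` and `write_table` are all assumptions made for this example; check the default PICRUSt2 reference files for the exact layout expected.

```python
import csv
import gzip
from collections import Counter


def build_count_table(annotations):
    """Turn {genome_id: [gene_family_id, ...]} into (header, rows).

    Each row holds how many genes in that genome were assigned to each family.
    """
    counts = {genome: Counter(fams) for genome, fams in annotations.items()}
    all_families = sorted({fam for c in counts.values() for fam in c})
    # "assembly" as the name of the genome ID column is an assumption.
    header = ["assembly"] + all_families
    rows = [
        [genome] + [counts[genome][fam] for fam in all_families]
        for genome in sorted(counts)
    ]
    return header, rows


def write_table(path, header, rows):
    """Write the table as a gzip-compressed, tab-delimited text file."""
    with gzip.open(path, "wt", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(header)
        writer.writerows(rows)


# Hypothetical annotations: two genomes with a few KO assignments each.
example = {
    "genomeA": ["K00001", "K00001", "K00845"],
    "genomeB": ["K00845"],
}
header, rows = build_count_table(example)
```

Calling `write_table("ko_counts.txt.gz", header, rows)` (the file name is arbitrary) would save the table. Gene families absent from a genome are written as 0 rather than left blank.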

Does that answer your question?


Gavin


mail2...@gmail.com

Apr 9, 2021, 4:37:10 AM
to picrust-users
Dear Gavin,
Thank you for your quick reply and for your time. This is my first time working with a customized database.
As I understand it, we need to provide amino acid sequences, and the output will be KO terms.
As of now, I have rRNA sequences from the https://rrndb.umms.med.umich.edu/static/download/ database. My plan is to convert them into amino acid sequences and feed them to GhostKOALA.
Is this the right way to do it?
And how do I obtain EC numbers?

Gavin Douglas

Apr 9, 2021, 6:48:49 AM
to picrus...@googlegroups.com
Ah, I see. No, the KO and EC tables refer to gene family annotations from across the entire genome (not just what the rRNA genes are annotated as).

So rather than feeding the tool those sequences, you would need the genomes that those rRNA genes correspond to, and then annotate those genomes with KOs and ECs. PICRUSt2 is agnostic to how they are annotated, by the way. One way to get these annotations would be with Prokka, which I believe also automatically identifies EC matches.
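
To make the genome-level counting concrete, here is a minimal Python sketch that tallies EC numbers and 16S rRNA features from one genome's Prokka-style feature table. It assumes a tab-delimited file with `ftype`, `EC_number`, and `product` columns, and that 16S genes carry the product "16S ribosomal RNA"; verify both against your own Prokka output. The `EC:` prefix, the function name `count_ec_and_16s`, and the example rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def count_ec_and_16s(prokka_tsv_text):
    """Count EC numbers and 16S rRNA features in one Prokka-style TSV."""
    ec_counts = Counter()
    n_16s = 0
    reader = csv.DictReader(io.StringIO(prokka_tsv_text), delimiter="\t")
    for row in reader:
        ec = (row.get("EC_number") or "").strip()
        if ec:
            # The "EC:" prefix is an assumed naming convention.
            ec_counts["EC:" + ec] += 1
        product = row.get("product") or ""
        if row.get("ftype") == "rRNA" and "16S ribosomal RNA" in product:
            n_16s += 1
    return ec_counts, n_16s


# Made-up rows for a single genome, for illustration only.
example_tsv = (
    "locus_tag\tftype\tlength_bp\tgene\tEC_number\tCOG\tproduct\n"
    "G_00001\tCDS\t1200\tadhA\t1.1.1.1\t\tAlcohol dehydrogenase\n"
    "G_00002\tCDS\t900\t\t\t\thypothetical protein\n"
    "G_00003\trRNA\t1530\t\t\t\t16S ribosomal RNA\n"
    "G_00004\tCDS\t1100\tadhB\t1.1.1.1\t\tAlcohol dehydrogenase\n"
)
ec_counts, n_16s = count_ec_and_16s(example_tsv)
```

Running this over every reference genome gives one row per genome for the EC table and one 16S copy number per genome. Genes with no EC assignment are simply skipped.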


Hopefully that helps!

Gavin
