How do I create ec_counts for custom database

Apr 8, 2021, 5:02:52 AM
to picrust-users
I am trying to run PICRUSt2 with a custom database which is deposited in [...]. I successfully ran the first step. But for the hidden-state prediction step, I need ec_16S_counts.txt.gz to create EC_predicted.tsv.gz, and ko_16S_counts.txt.gz to create KO_predicted.tsv.gz. How do I create these files?

Gavin Douglas

Apr 8, 2021, 8:24:38 AM
The gene family count tables correspond to the numbers of each type of gene family across each reference genome. GhostKOALA would be one method of annotating the genomes with KOs for instance.
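As a rough illustration of what such a count table looks like, here is a minimal Python sketch that tallies gene family IDs per genome and writes them as a tab-delimited table. The genome IDs, the KO lists, and the "assembly" header for the first column are assumptions made for this example; check them against the default PICRUSt2 reference files before relying on the layout.

```python
import csv
import io
from collections import Counter


def build_count_table(annotations):
    """Turn {genome_id: [gene family ID per annotated gene]} into table rows.

    The first row is the header; every other row is one genome with the
    number of genes assigned to each gene family (0 if absent).
    """
    counts = {genome: Counter(fams) for genome, fams in annotations.items()}
    families = sorted({fam for c in counts.values() for fam in c})
    # "assembly" as the ID column name is an assumption for this sketch.
    rows = [["assembly"] + families]
    for genome in sorted(counts):
        rows.append([genome] + [counts[genome].get(fam, 0) for fam in families])
    return rows


def write_tsv(rows, handle):
    """Write rows as tab-delimited text to an open text handle."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)


# Hypothetical annotations: one KO entry per annotated gene in each genome.
example = {
    "genomeA": ["K00001", "K00001", "K00002"],
    "genomeB": ["K00002"],
}
table = build_count_table(example)
buffer = io.StringIO()
write_tsv(table, buffer)
print(buffer.getvalue())
```

To produce a compressed file, the same `write_tsv` call could be pointed at a handle opened with `gzip.open(path, "wt")`. The genome IDs would presumably need to match the IDs used in the custom reference tree.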

The 16S files correspond to the 16S copy numbers per genome, which I identified originally simply by seeing how many were annotated in each genome.
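One way to count annotated 16S copies is to scan a genome's GFF3 annotation for rRNA features. The sketch below assumes the annotation tool labels these features with type "rRNA" and a product containing "16S ribosomal RNA"; that labelling and the example lines are assumptions and may differ for your annotations.

```python
def count_16s_features(gff_lines):
    """Count rRNA features whose attributes mention a 16S ribosomal RNA."""
    count = 0
    for line in gff_lines:
        # Some GFF3 files append the genome sequence after this marker.
        if line.startswith("##FASTA"):
            break
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        feature_type, attributes = fields[2], fields[8]
        if feature_type == "rRNA" and "16S ribosomal RNA" in attributes:
            count += 1
    return count


# Hypothetical GFF3 lines for a single genome.
gff = [
    "##gff-version 3",
    "contig1\tbarrnap\trRNA\t100\t1600\t.\t+\t.\tID=r1;product=16S ribosomal RNA",
    "contig1\tbarrnap\trRNA\t2000\t4900\t.\t+\t.\tID=r2;product=23S ribosomal RNA",
    "contig2\tbarrnap\trRNA\t50\t1550\t.\t-\t.\tID=r3;product=16S ribosomal RNA",
    "contig2\tprodigal\tCDS\t1700\t2600\t.\t+\t0\tID=c1;product=hypothetical protein",
]
copies = count_16s_features(gff)
print(copies)
```

Note that this simple substring match would also count partial 16S hits, so fragmented assemblies may need extra filtering.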

Apr 9, 2021, 4:37:10 AM
to picrust-users
Thank you for your quick reply and for your time. This is my first time working with a customized database.
We need to provide amino acid sequences, and the output will be KO terms.
As of now, I have rRNA sequences from the database. So I convert them into amino acid sequences and feed them to GhostKOALA.
Is this the right way to do it?
And how do I obtain EC numbers?

Gavin Douglas

Apr 9, 2021, 6:48:49 AM4/9/21
Ah, I see. No, the KO and EC tables refer to gene family annotations from across the entire genome (not just what the rRNA genes are annotated as).

So rather than feeding the tool those sequences, you would need the genomes that those rRNA genes correspond to, and then annotate those genomes with KOs and ECs. PICRUSt2 is agnostic to how they are annotated, by the way. One way to get these annotations would be with Prokka, which I believe also automatically identifies EC matches.
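If the annotations come from a per-genome, tab-delimited feature table, EC counts can be tallied along these lines. This sketch assumes a column named "EC_number" (as in the header row of the sample below, modelled on Prokka-style output) and an "EC:" prefix on the IDs; both are assumptions to verify against your own files and the reference tables you want to mimic.

```python
import csv
import io
from collections import Counter


def count_ec_numbers(tsv_handle):
    """Count how many annotated features carry each EC number."""
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    counts = Counter()
    for row in reader:
        ec = (row.get("EC_number") or "").strip()
        if ec:
            # The "EC:" prefix is an assumed ID convention for this sketch.
            counts["EC:" + ec] += 1
    return counts


# Hypothetical annotation table for a single genome.
sample = (
    "locus_tag\tftype\tlength_bp\tgene\tEC_number\tCOG\tproduct\n"
    "G_00001\tCDS\t1002\tadhA\t1.1.1.1\t\tAlcohol dehydrogenase\n"
    "G_00002\tCDS\t1011\tadhB\t1.1.1.1\t\tAlcohol dehydrogenase\n"
    "G_00003\tCDS\t900\t\t\t\thypothetical protein\n"
    "G_00004\tCDS\t1500\tpgi\t5.3.1.9\t\tGlucose-6-phosphate isomerase\n"
)
ec_counts = count_ec_numbers(io.StringIO(sample))
print(dict(ec_counts))
```

Running this once per genome gives the per-genome counts that would fill one row of the EC table.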

