Custom database (hsp step)


Carolina Alves de Oliveira

Nov 11, 2020, 8:37:09 AM
to picrust-users
Hello!

I'm new to PICRUSt2, so I am having some problems using a custom database.

I am trying to use a custom database with PICRUSt2, and for that I am following the guide at this link (https://github.com/picrust/picrust2/wiki/Frequently-Asked-Questions#how-can-i-run-a-custom-or-non-default-database-such-as-the-fungi-18s-and-its-databases).

In the hsp step of the tutorial, it says that I need to specify the "reference trait tables", but I'm a little lost in this step. What are these tables? How are they generated and where do I find them?

Any help will be greatly appreciated!

Thank you in advance,

Carolina

KENKEN KO

Nov 12, 2020, 12:38:33 AM
to picrust-users
Hello!

I think these two tables are for fungi 18S. These tables are in default_files/fungi.
Use --observed_trait_table if a non-default file is needed.
Thank you!

Kenken
On Wednesday, November 11, 2020 at 22:37:09 UTC+9, Carolina Alves de Oliveira wrote:
18S_counts.txt.gz
ec_18S_counts.txt.gz
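
Kenken's suggestion can be sketched as a small Python helper that assembles an hsp.py call with a custom trait table. This is only a sketch: the file names are hypothetical placeholders, and the options shown (`--observed_trait_table`, `-t`, `-o`, `-p`) should be checked against `hsp.py --help` for the installed PICRUSt2 version.

```python
import shlex


def build_hsp_command(trait_table, tree, output, threads=1):
    """Return the argument list for an hsp.py run that uses a custom trait table."""
    return [
        "hsp.py",
        "--observed_trait_table", trait_table,  # custom table, used in place of a default one
        "-t", tree,                             # tree with the study sequences placed
        "-o", output,                           # predicted copy numbers are written here
        "-p", str(threads),                     # number of processes
    ]


# Hypothetical file names, for illustration only.
cmd = build_hsp_command(
    "custom_ec_counts.txt.gz", "placed_seqs.tre", "EC_predicted.tsv.gz"
)
print(shlex.join(cmd))
# To run it for real (requires PICRUSt2 on the PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids quoting problems if any path contains spaces.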

Gavin Douglas

Nov 12, 2020, 9:01:39 AM
to picrus...@googlegroups.com
Hey Carolina,

Those are gene family abundance tables: the rows are the reference genomes and the columns are gene families, so each cell gives the copy number of that gene family in that genome. You can see the default prokaryotic gene copy number tables in "/picrust2/default_files/prokaryotic", such as "ko.txt.gz", and the fungi tables in "picrust2/default_files/fungi". You can see these directories if you download the picrust2 repo from source.

They were created by parsing annotation files from IMG / the 1000 fungi database and outputting the tables with R.
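
To make the table layout concrete, here is a minimal Python sketch of how such a table could be built from parsed annotations. It is not the R code that was actually used; the genome ids and gene families are made up, and the `assembly` header for the first column is an assumption that should be checked against the default files.

```python
import csv
import gzip
from collections import Counter, defaultdict


def build_trait_table(annotations):
    """Count gene family copies per genome.

    annotations: iterable of (genome_id, gene_family) pairs, one per annotated gene.
    Returns (sorted list of families, {genome_id: Counter of family -> copy number}).
    """
    counts = defaultdict(Counter)
    for genome_id, family in annotations:
        counts[genome_id][family] += 1
    families = sorted({fam for row in counts.values() for fam in row})
    return families, counts


def write_trait_table(path, families, counts, id_column="assembly"):
    """Write a gzipped, tab-delimited table: one row per genome, one column per family."""
    with gzip.open(path, "wt", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow([id_column] + families)
        for genome_id in sorted(counts):
            # A Counter returns 0 for families the genome lacks.
            writer.writerow([genome_id] + [counts[genome_id][fam] for fam in families])


# Made-up annotations: genome_1 carries two copies of EC:1.1.1.1.
example = [
    ("genome_1", "EC:1.1.1.1"),
    ("genome_1", "EC:1.1.1.1"),
    ("genome_1", "EC:2.7.7.7"),
    ("genome_2", "EC:2.7.7.7"),
]
families, counts = build_trait_table(example)
```

The genome ids in the first column are what must match the tip labels of the reference tree.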


Cheers,

Gavin


Carolina Alves de Oliveira

Nov 12, 2020, 9:28:44 AM
to picrust-users
Hey Gavin!

So a question: I'm using a custom 16S database, so in my case, do I need to specify a non-default trait table? Or can I run with the default ones even when using this custom database?

Just to anticipate, I have another problem. If I can use the default trait tables, I tried, but I got this error:

Error: None of the reference ids within the function abundance table are found within the input tree. This can occur when malformed or mismatched custom reference files are used.
Execution halted

Anyway, thank you!
Carolina

Gavin Douglas

Nov 12, 2020, 9:53:41 AM
to picrus...@googlegroups.com
Yes, you will need to, because the reference genome ids won't match otherwise. The key thing is that the reference 16S sequences need to have genome annotations provided as well (for whatever gene family database is of interest) so that predictions can be made for 16S sequences with unknown genome content. That error is related to this problem: none of the genome ids in the function abundance table overlapped with the reference 16S sequences in the input tree.
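
A quick way to diagnose that error before running hsp is to compare the ids in the trait table with the tip labels of the tree. The following Python sketch does this with the standard library only. It assumes a tab-delimited table with one header line and ids in the first column, and a Newick tree with unquoted tip labels; the function names are hypothetical and not part of PICRUSt2.

```python
import re


def newick_tip_labels(newick):
    """Return tip labels of a Newick string (unquoted labels only).

    Tip labels directly follow '(' or ','; internal node labels follow ')'
    and are therefore not captured.
    """
    return set(re.findall(r"[(,]\s*([^():,;\s]+)", newick))


def trait_table_ids(lines):
    """Return the first-column ids of a tab-delimited table, skipping the header line."""
    rows = iter(lines)
    next(rows, None)  # drop the header
    return {line.split("\t", 1)[0].strip() for line in rows if line.strip()}


def shared_reference_ids(newick, table_lines):
    """Ids present both as tree tips and as trait table rows."""
    return newick_tip_labels(newick) & trait_table_ids(table_lines)


# Made-up example: only genome_A appears in both the tree and the table.
tree = "((genome_A:0.1,genome_B:0.2):0.05,genome_C:0.3);"
table = ["assembly\tK00001", "genome_A\t1", "genome_X\t2"]
shared = shared_reference_ids(tree, table)
```

If `shared` comes back empty, the table and the tree were built from different reference sets, which is the situation the error message describes.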


Cheers,

Gavin 

Yu-fei Lin

Nov 24, 2020, 4:30:58 AM
to picrust-users
Hi all, 

This is partially addressed in the discussion threads.

I'm hoping to build my own fungal db, and you guys have mentioned that the 18S/ITS copy numbers can be parsed from annotation files from the 1000 fungal genome project.

I've downloaded the gff files from the JGI database, but I'm struggling to see how the copy numbers are calculated.

Using 'grep', I wasn't able to recover any matches for '18S' or 'internal transcribed spacer' in any of the gff files I downloaded.


Any suggestions?

Much thanks
Tom

Gavin Douglas

Nov 24, 2020, 9:19:42 AM
to picrus...@googlegroups.com
Hey there,

I used barrnap to parse out the 18S sequences and ITSx to parse out the ITS sequences from the genomes. Note that based on our results these databases did not perform well for predicting fungal genomes: they were only slightly better than random.
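
One way to turn barrnap output into a per-genome copy number is to count the 18S hits in the GFF it produces. The sketch below is an assumption about how that counting could be done, not the exact procedure used for the PICRUSt2 tables. It assumes barrnap's GFF3 attributes look like `Name=18S_rRNA;product=18S ribosomal RNA`, with `(partial)` in the product of truncated hits; check this against the output of your barrnap version. The function name is hypothetical.

```python
def count_rrna_hits(gff_lines, name="18S_rRNA", include_partial=False):
    """Count GFF3 features whose Name attribute equals `name`.

    Hits whose product attribute contains 'partial' are skipped unless
    include_partial is True.
    """
    count = 0
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # comment or blank line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GFF3 feature line
        attributes = dict(
            item.split("=", 1) for item in fields[8].split(";") if "=" in item
        )
        if attributes.get("Name") != name:
            continue
        if not include_partial and "partial" in attributes.get("product", ""):
            continue
        count += 1
    return count


# Made-up GFF lines in the assumed barrnap layout.
example_gff = [
    "##gff-version 3",
    "scaffold_1\tbarrnap:0.9\trRNA\t100\t1900\t0\t+\t.\t"
    "Name=18S_rRNA;product=18S ribosomal RNA",
    "scaffold_2\tbarrnap:0.9\trRNA\t5\t900\t1e-10\t-\t.\t"
    "Name=18S_rRNA;product=18S ribosomal RNA (partial)",
    "scaffold_1\tbarrnap:0.9\trRNA\t2500\t5800\t0\t+\t.\t"
    "Name=28S_rRNA;product=28S ribosomal RNA",
]
full_copies = count_rrna_hits(example_gff)
all_copies = count_rrna_hits(example_gff, include_partial=True)
```

Whether partial hits should count as copies is a judgment call, and rRNA repeats are often collapsed in assemblies, so counts from draft genomes are likely to be underestimates.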


All the best,

Gavin

Yu-fei Lin

Nov 24, 2020, 9:28:47 PM
to picrust-users
Hi Gavin,

Thanks for getting back to me :)

As you've said, the results from fungal db prediction are not great. I've tried ITSx, and I think it underestimates the ITS copy numbers, but I guess genome assembly quality plays a big role there.

I have a couple more questions, please. These are in addition to generating a new HMM model and phylogenetic tree (with the new ITS sequences added to the existing PICRUSt2 db).

1) During hidden-state prediction
- I'll need to manually update the ec_ITS_counts, right?
- Will I also need to generate a KO_ITS_counts during hidden-state prediction? (There does not seem to be one for fungi.)

2) During pathway abundance inference and adding descriptions
- My understanding is that I only need to change the mapping file to metacyc_path2rxn_struc_filt_fungi.txt. Is that right?
- Are there fungi-specific map files for gene family to pathway mapping, or would you suggest turning that off?

Cheers,

Tom

Gavin Douglas

Nov 25, 2020, 7:01:44 AM
to picrus...@googlegroups.com
Hey Tom,

1) Yes, you would need to add the EC copy numbers for any additional fungi to that table. Also, you would need to annotate the genomes with KOs or convert ECs to KOs to get a KO abundance table.

2) That mapping file should work fine as long as the RXNs that map to any new EC numbers you have added are also included in that file. I haven't explored fungi-specific pathway reconstruction much, so that would certainly be something to explore!
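
For point 1, adding new genomes to an existing EC table mostly means appending rows and zero-filling any EC columns that one side lacks. A minimal Python sketch, assuming both tables have already been loaded as nested dictionaries (the function name and the ids are hypothetical):

```python
def merge_trait_tables(existing, new):
    """Append new genomes to an existing trait table.

    existing, new: {genome_id: {gene_family: copy_number}}.
    Returns (sorted column list, merged table) where every row has every
    column and missing families are filled with 0.
    """
    overlap = existing.keys() & new.keys()
    if overlap:
        raise ValueError(f"genome ids already present: {sorted(overlap)}")
    combined = {**existing, **new}
    columns = sorted({fam for row in combined.values() for fam in row})
    merged = {
        genome_id: {fam: row.get(fam, 0) for fam in columns}
        for genome_id, row in combined.items()
    }
    return columns, merged


# Made-up example: the new genome introduces an EC number the old table lacks.
existing_table = {"fungus_1": {"EC:1.1.1.1": 2}}
new_rows = {"fungus_2": {"EC:1.1.1.1": 1, "EC:3.2.1.4": 3}}
columns, merged = merge_trait_tables(existing_table, new_rows)
```

Any EC column that appears only in the new rows is exactly the kind of new EC number whose RXNs need to be present in the pathway mapping file, as noted in point 2.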


Cheers,

Gavin
