custom-or-non-default-database

Juliana Young

Oct 7, 2021, 3:38:44 PM

to picrust-users

Hi Gavin and all,

Hope this finds you guys very well!

I am reaching out because our lab has been using Picrust1-2 in some projects, and now we are interested in running Picrust2 with a custom database for ruminant microbiota. However, we are stuck in the process of creating the custom reference database, i.e. the equivalent of default_files/prokaryotic/pro_ref/. I have seen your instructions here https://github.com/picrust/picrust2/wiki/Frequently-Asked-Questions#how-can-i-run-a-custom-or-non-default-database-such-as-the-fungi-18s-and-its-databases, but unfortunately, I still need some more detailed guidance in order to get this done.

Specifically, I would like to ask how to generate the files: pro_ref.hmm, pro_ref.model, pro_ref.raxml_info and pro_ref.tree.

Could you please help me with this? I would appreciate any help!

Thank you very much in advance,

Cheers!

Juliana

Gavin Douglas

Oct 8, 2021, 1:41:33 PM

to picrus...@googlegroups.com

Hey Juliana,

It’s best to create the tree with RAXML, as that will also create the model and RAXML info files. To get the hmm file you need to run “hmmbuild”, which is part of HMMER, on the multiple sequence alignment. RAXML should output a model file if you use it for tree building, but you can also create a model file from an alignment based on the description of the required format here: https://github.com/Pbdas/epa-ng#setting-the-model-parameters
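
For readers following along, the hmmbuild step mentioned above can be sketched roughly as below. This is a minimal sketch and not a tested PICRUSt2 protocol: the file names `ref_alignment.fasta` and `custom_ref.hmm` are placeholders invented for illustration, and the only HMMER usage assumed is the basic `hmmbuild <hmmfile_out> <msafile>` form.

```shell
#!/usr/bin/env bash
# Sketch: build a profile HMM from the reference multiple sequence alignment.
# Both file names are hypothetical placeholders; substitute your own paths.
MSA="ref_alignment.fasta"    # aligned reference sequences (assumed name)
HMM_OUT="custom_ref.hmm"     # profile HMM to be written (assumed name)

if ! command -v hmmbuild >/dev/null 2>&1; then
    # HMMER is not installed or not on PATH
    STATUS="hmmbuild-missing"
elif [ ! -f "$MSA" ]; then
    # The alignment file does not exist at the given path
    STATUS="alignment-missing"
elif hmmbuild "$HMM_OUT" "$MSA" >/dev/null; then
    # Basic usage: hmmbuild <hmmfile_out> <msafile>
    STATUS="built"
else
    STATUS="failed"
fi

echo "hmmbuild step: $STATUS"
```

The script only reports what happened, so it is safe to run before HMMER is installed or before the alignment is in place.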

The RAXML info file is trickier, but it is only needed for SEPP, which unfortunately expects an older format of the file than the one output by the current version of RAXML. I created the file that is part of the repository manually, by editing the RAXML info file I got from RAXML's output to look similar to the input file used in the SEPP examples.

Hopefully that helps!

Gavin

Juliana Young

Oct 12, 2021, 11:27:03 AM

to picrust-users

Hi Gavin,

Thank you very much for getting back to me so quickly!!! Yup, I will try that and let you know!

Cheers!

Juliana Young

Feb 20, 2023, 1:28:26 PM

to picrust-users

Hi Gavin,

I hope this finds you well. As you suggested a while ago, I ran raxml-ng, but unfortunately it did not output a model file.

raxml-ng --msa all_rumen_16S_combined.renamed.fas --model GTR+G --prefix cows

raxml-ng --all --msa all_rumen_16S_combined.renamed.fas --model GTR+G --prefix cows

So, I would love to run Picrust2 with this specific rumen database (https://zenodo.org/record/1252858#.Y_OpCHbMK3A), but my bioinformatics skills are limited. I wonder if there is a protocol showing, step by step, how to create the custom reference files: the tree in Newick format (.tree), the hidden Markov model built from the alignment (.hmm), and the model file (.model). I was able to generate the tree and the hmm, but I am stuck on the model step.

Thank you very much in advance!

Gavin Douglas

Feb 20, 2023, 2:25:14 PM

to picrust-users

Hey there,

Sorry, no, I don't have a workflow or tutorial written up showing how to generate those files. Have you seen the description of how to create a model file here? https://github.com/Pbdas/epa-ng#setting-the-model-parameters Hopefully that will be what you need to generate the model file. I have to admit I don't know anything more about the model format than what is written there, as the pipeline only needs that file because EPA-ng uses it.
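
As a rough illustration of what such a model file might contain, the sketch below writes a one-line model string in the raxml-ng style notation (GTR rates, user-defined base frequencies, gamma shape). The numeric values and the output name `custom_ref.model` are hypothetical placeholders, not estimates for any real dataset, and whether PICRUSt2 accepts this file as-is is an assumption that should be checked against the EPA-ng description linked above. It may also be worth looking for a `cows.raxml.bestModel` file among the raxml-ng outputs, since raxml-ng normally writes its fitted model there in the same notation.

```shell
#!/usr/bin/env bash
# Sketch: assemble an EPA-ng / raxml-ng style model string by hand.
# All numbers below are hypothetical; replace them with the parameters
# estimated for your own alignment and tree.
RATES="0.7/1.8/1.2/0.6/3.0/1.0"   # GTR substitution rates: AC/AG/AT/CG/CT/GT
FREQS="0.25/0.23/0.30/0.22"       # base frequencies: A/C/G/T
ALPHA="0.47"                      # shape parameter of the gamma distribution

# GTR{rates} + user-defined frequencies FU{...} + 4-category gamma G4{alpha}
MODEL="GTR{${RATES}}+FU{${FREQS}}+G4{${ALPHA}}"

# Write the string as a single line to the (assumed) model file name
printf '%s\n' "$MODEL" > custom_ref.model
cat custom_ref.model
```

Keeping the three parameter groups in separate variables makes it easier to paste in values from a RAxML or raxml-ng log without mistyping the braces.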