Thank you for your reply, but please read my question carefully. First of all, I appreciate the authors' work and the effort that went into the wiki, but the section on custom databases is very brief, and there is not even a one-click script. I can't help but wonder whether building a custom database is deliberately being discouraged. In other software, such as BLAST, building a custom database takes a single command.
Secondly, the wiki does not specify whether the FASTA file required to build the database should be a gene file or a protein file. I checked the ko database you built, and the pro_ref.fna in it is a gene file, containing A, T, C and G. However, I only have a protein file, which looks like this:
> AF022812 | AAC60788 | Sulfurospirillum multivorans strain N | 1
MEKKKKPELSRRDFGKLIIGGGAAATIAPFGVPGANAAEKEKNAAEIRQQFAMTAGSPIIVNDKLERYAEVRTAFTHPTSFFKPNYKGEVKPWFLSAYDEKVRQIENGENGPKMKAKNVGEARAGRALEAAGWTLDINYGNIYPNRFFMLWSGETMTNTQLWAPVGLDRRPPDTTDPVELTNYVKFAARMAGADLVGVARLNRNWVYSEAVTIPADVPYEQSLHKEIEKPIVFKDVPLPIETDDELIIPNTCENVIVAGIAMNREMMQTAPNSMACATTAFCYSRMCMFDMWLCQFIRYMGYYAIPSCNGVGQSVAFAVEAGLGQASRMGACITPEFGPNVRLTKVFTNMPLVPDKPIDFGVTEFCETCKKCARECPSKAITEGPRTFEGRSIHNQSGKLQWQNDYNKCLGYWPESGGYCGVCVAVCPFTKGNIWIHDGVEWLIDNTRFLDPLMLGMDDALGYGAKRNITEVWDGKINTYGLDADHFRDTVSFRKDRVKKS
> AY013367 | AAG46194 | Sulfurospirillum halorespirans DSM 13726 | 1
MEKKKKPELSRRDFGKLIIGAGAAATIAPFGVPGANAAEKEKNAAEIRQQFAMTAGSPIIVNDKLERYAQVRTAFTHPTSMFKPNYKGEVKHWFLSSCDEKVRQIENGENGPKMKAKNVGEARAGRALEAAGWTLDXNFGGSFGSYYPNRFSMLWSGETMLNTQMWATVGLDRRPPDTTDPVELTNYVKFAARMAGADLVGVARLNRNWVYSGAVTIPDEQSWHKEIEKPIVFKDVPLPIETDDELIIPNTCDNVIVSGIAMNREMLQTAPTSM
I also don't know the gene copy numbers corresponding to the 16S sequences. Can a database built from this file work properly?
I am paying attention to your question, but your question is not clear. In my first response to you on the PICRUSt2 GitHub, I asked for more information on exactly what you are trying to do. I have tried to help based on the questions you asked, but since you appear not to like my answers, I assume I still don't have enough information from you. So please give me more information, specifically:
1. What is the data that you have that you are trying to build a PICRUSt2 database with? Is it complete genomes?
2. What is the data that you are trying to run through this PICRUSt2 database? Is it amplicon sequencing data?
I need to know exactly what the data you have is, what format it is in, and what you are trying to get from this data.
PICRUSt2 predicts functions from amplicon sequencing data. It requires a reference database of genomes with annotated functions, as well as a phylogenetic tree built from marker genes within these genomes. The sequences from your study are compared to these marker genes (most of the time this is the 16S rRNA gene, although it doesn't have to be). PICRUSt2 is only designed to work with nucleic acid sequences, for both the study sequences and the reference sequences in the tree. To compare the study sequences to the genes in the reference tree, we first make a multiple sequence alignment to determine the best placement of each study sequence in the tree, and the gene content of each study sequence is then predicted based on that placement. You can refer to the flowchart here for a detailed description of all of the steps involved.
A tool like BLAST simply compares sequences against each other, so it is easy to make a reference database. PICRUSt2 carries out many more steps, and the files needed for each step often have to be constructed with different parameters, so there is no single command to make a PICRUSt2 database. Constructing a PICRUSt2 database is not a trivial task. In fact, we are currently in the process of publishing an entire paper just on the construction of a new PICRUSt2 database; this would not be necessary if the process could be reduced to a single command.
I am not stopping users from building a custom database, but building one does require some bioinformatics knowledge as well as knowledge of the organisms that you want to include, and I would therefore not recommend it to a user performing their first bioinformatics analysis. If there is a particular function that you are interested in, it is much easier to add it to the existing database, and I am currently writing step-by-step instructions for users to do this. This will still require more bioinformatics skill than just using the existing PICRUSt2 database, but it will be easier than constructing an entirely new database.
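Since PICRUSt2 only works with nucleic acid sequences, it can help to check what kind of FASTA file you have before going any further. The following is a minimal sketch of such a check, not part of PICRUSt2: the function names `read_fasta` and `guess_sequence_type`, the set of characters treated as nucleotides, and the 0.9 threshold are all illustrative assumptions.

```python
from typing import Iterator, Tuple

# Characters treated as nucleotide symbols (an assumption for this sketch:
# the four DNA bases, U for RNA, N for unknown bases and '-' for gaps).
NUCLEOTIDE_CHARS = set("ACGTUN-")


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def guess_sequence_type(seq: str, threshold: float = 0.9) -> str:
    """Guess 'nucleotide' or 'protein' from the fraction of nucleotide symbols.

    The threshold is a heuristic chosen for this sketch, not a standard value.
    """
    seq = seq.upper()
    if not seq:
        return "empty"
    fraction = sum(c in NUCLEOTIDE_CHARS for c in seq) / len(seq)
    return "nucleotide" if fraction >= threshold else "protein"


if __name__ == "__main__":
    # Hypothetical two-record example: one nucleotide and one protein record.
    example = ">gene_1\nATCGATCGGATC\n>protein_1\nMEKKKKPELSRRDFGKLIIG\n"
    for header, seq in read_fasta(example):
        print(header, guess_sequence_type(seq))
```

A file whose records are all reported as protein, like the one posted above, would first need to be replaced by the corresponding nucleotide sequences before it could be used as reference sequences in the tree.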