Problem with custom database (place_seqs step)

Vincent Darbot

Feb 14, 2024, 4:00:40 AM
to picrust-users

Hello,

To enhance functional predictions for fungal ITS sequences, I am currently constructing a custom database with a greater number of reference sequences sourced from the 1000 Fungal Genomes Project.

For this purpose, I have incorporated an additional 878 ITS sequences obtained from complete genomes, for which I successfully retrieved MetaCyc annotations. This supplements the existing 190 sequences in the current picrust2 fungi_ITS database, resulting in a reference tree comprising 1068 ITS sequences.

Following the guidelines provided by picrust2, I have ensured that all necessary database files are correctly prepared. These include the .aln alignment file, .hmm file derived from the alignment, .tre phylogenetic tree file, and .model file for the model utilized by RAxML. These files are attached to this email.

I then tested picrust2 against this extended reference database. As input, I used only 300-nucleotide fragments of the 1068 sequences used to build the new reference tree. I therefore expected place_seqs to place all 1068 short sequences into the reference tree. However, this is not the case: 176 of them were not inserted.

When I lower the --min_align threshold from 0.8 to 0.5, more than 22 sequences are not inserted.
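
For anyone who wants to reproduce the check, here is a rough sketch of how one could list the query sequences that fall under the minimum-alignment threshold. It assumes the threshold compares the number of aligned (non-gap) residues with the original length of each query; that is an assumption about the check, not picrust2's own code. The function names and the toy sequences are hypothetical.

```python
def aligned_fraction(original, aligned, gap_chars="-."):
    """Fraction of the original query length that is present
    as non-gap characters in its aligned form."""
    if not original:
        raise ValueError("original sequence is empty")
    aligned_residues = sum(ch not in gap_chars for ch in aligned)
    return aligned_residues / len(original)


def below_threshold(originals, aligned, min_align=0.8):
    """Return {sequence id: aligned fraction} for every query whose
    aligned fraction is below min_align (hypothetical helper)."""
    failing = {}
    for seq_id, seq in originals.items():
        # A query missing from the alignment counts as fully unaligned.
        frac = aligned_fraction(seq, aligned.get(seq_id, ""))
        if frac < min_align:
            failing[seq_id] = frac
    return failing


# Toy data, not taken from the real database.
originals = {"q1": "ACGTACGTAC", "q2": "ACGTACGTAC"}
aligned = {"q1": "AC-GTACG--TAC", "q2": "----ACGT-----"}

print(below_threshold(originals, aligned, min_align=0.8))
```

Running the same function with 0.8 and then 0.5 on the real query alignment would show which sequences are affected by the change of threshold.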

To understand the problem, I checked whether one of the 176 short sequences that were not placed in the reference tree aligns with the 1068 full-length sequences in the multiple alignment, and it does. Could this be a problem with the HMM profile?

My alignment file contains numerous gaps (9341 sites in the alignment, with sequences ranging from 205 to 2483 in length). Could this be a contributing factor? (I observed that the 16S tree has significantly fewer gaps).
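
To put a number on "numerous gaps", here is a minimal sketch that computes the gap fraction of each alignment column and the share of columns that are almost entirely gaps. It assumes the .aln file is plain aligned FASTA; the function names, the 0.9 cutoff and the toy alignment are illustrative only and are not part of picrust2.

```python
from io import StringIO

GAP_CHARS = set("-.")


def read_fasta(handle):
    """Parse FASTA text into a dict of {sequence id: sequence}."""
    records = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {k: "".join(v) for k, v in records.items()}


def column_gap_fractions(alignment):
    """Fraction of sequences that have a gap at each alignment column."""
    seqs = list(alignment.values())
    width = len(seqs[0])
    if any(len(s) != width for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    return [sum(s[i] in GAP_CHARS for s in seqs) / len(seqs)
            for i in range(width)]


def gappy_column_share(alignment, threshold=0.9):
    """Share of columns whose gap fraction is at least `threshold`."""
    fractions = column_gap_fractions(alignment)
    return sum(f >= threshold for f in fractions) / len(fractions)


# Toy alignment; for the real file, pass open("reference.aln") instead
# (the file name here is a placeholder).
toy = read_fasta(StringIO(">a\nAC--GT\n>b\nA---GT\n>c\nACT-GT\n"))
print(column_gap_fractions(toy))
print(gappy_column_share(toy, threshold=0.9))
```

Comparing this share between the ITS alignment and the 16S alignment would show how much gappier the ITS reference actually is.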

Do you have any insights into the potential source of this issue and any recommendations for improvement?


Thank you for your help.

Best regards,

Vincent Darbot
