Hello,
I am working with a cystic fibrosis (CF) lung metatranscriptome data set that contains 10,000-50,000 reads per fastq file after removing human and rRNA reads. I ran these fastq files through HUMAnN2 with the following command:
humann2 --metaphlan ~/.local/bin/metaphlan2 --input ${mysample} --output ./output_directory
With this command, MetaPhlAn2 does not pick up any of the taxa that I know to be present from running the same samples through other taxonomic classifiers such as KRAKEN and KAIJU. Instead, I receive the following warning:
Total species selected from prescreen: 0
Selected species explain 0.00% of predicted community composition
No species were selected from the prescreen.
Because of this the custom ChocoPhlAn database is empty.
This will result in zero species-specific gene families and pathways
Since I already have an idea of which species are present in CF sputum from KRAKEN and KAIJU (e.g., Pseudomonas aeruginosa, Staphylococcus aureus, Stenotrophomonas maltophilia, Streptococcus spp.), I want HUMAnN2 to build its custom ChocoPhlAn database from those taxa before performing the nucleotide-level search. To do this, I added the --taxonomic-profile bugs_list.tsv flag so that the custom ChocoPhlAn database is prepared from a bugs list I make myself.
My issue is that I'm not sure how to format the bugs_list.tsv file. I have no working example to copy, because running MetaPhlAn2 on my samples produced an output file that was 100% unclassified for half of them. What format does HUMAnN2 expect for this file?
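
For illustration, here is a minimal Python sketch of writing such a file by hand. It assumes that HUMAnN2 reads a MetaPhlAn2-style profile: tab-separated lines with a pipe-delimited lineage in the first column and a relative abundance (in percent) in the second, and that only species-level (s__) rows above the prescreen threshold are selected. The abundance values are placeholders, not measurements; the header line, the 0.01 default threshold, and the helper names (format_bugs_list, species_above, write_bugs_list) are likewise assumptions made for this sketch and should be checked against the installed version. The lineage names would also need to match the species names used in the ChocoPhlAn database.

```python
# Sketch only: builds a MetaPhlAn2-style bugs list from species already
# identified by another classifier. Abundances are placeholders (percent).
SPECIES = [
    ("k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Pseudomonadales"
     "|f__Pseudomonadaceae|g__Pseudomonas|s__Pseudomonas_aeruginosa", 50.0),
    ("k__Bacteria|p__Firmicutes|c__Bacilli|o__Bacillales"
     "|f__Staphylococcaceae|g__Staphylococcus|s__Staphylococcus_aureus", 30.0),
    ("k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Xanthomonadales"
     "|f__Xanthomonadaceae|g__Stenotrophomonas|s__Stenotrophomonas_maltophilia", 20.0),
]


def format_bugs_list(rows):
    """Return the profile text: one header comment, then lineage<TAB>percent."""
    lines = ["#SampleID\tMetaphlan2_Analysis"]  # assumed header line
    for lineage, percent in rows:
        lines.append(f"{lineage}\t{percent:.5f}")
    return "\n".join(lines) + "\n"


def species_above(text, threshold=0.01):
    """List species-level clades whose abundance exceeds the threshold.

    A rough stand-in for the prescreen step, useful for checking that a
    hand-made file would select at least one species.
    """
    selected = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        clade, percent = line.split("\t")[:2]
        leaf = clade.split("|")[-1]
        if leaf.startswith("s__") and float(percent) > threshold:
            selected.append(leaf)
    return selected


def write_bugs_list(path, rows):
    """Write the formatted profile to disk, e.g. as bugs_list.tsv."""
    with open(path, "w") as handle:
        handle.write(format_bugs_list(rows))
```

Calling write_bugs_list("bugs_list.tsv", SPECIES) would produce a file that could then be passed with --taxonomic-profile bugs_list.tsv. Running species_above on the same text is a quick sanity check before starting a full run.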