Hi Nicola,
I ran MetaPhlAn (using bowtie2db) on a few human gut microbiome FASTQ samples from a study by Qin et al. (Nature 2012, "A metagenome-wide association study of gut microbiota in type 2 diabetes"). I have two questions that follow on from this thread:
1) When I used your recommended "--bt2_ps very-sensitive" or "--bt2_ps sensitive" options, the output file of relative abundances contained only a single line: "unclassified 100.0". I was very surprised to see this. Do you have any idea how this could have happened, and any recommendations for fixing it?
(I'm fairly sure everything is installed correctly; I checked this while getting to know the program with the example files you posted online, e.g. LC1.fna.)
2) I next used the default "--bt2_ps" option, i.e. "very-sensitive-local", and got relative abundances covering a wide range of taxa (closer to what you would expect). However, at the species level, the most abundant species in my samples came out as "s__bacteroides_unclassified", at 30-40%. Is there a way in MetaPhlAn to get around this, for example by forcing the program to report the known species most closely related to this "unclassified" one? Or am I stuck with this "unclassified" species as the dominant one at the species level? I would be most grateful for any comments and recommendations.
Are you surprised by these results, or are they to be expected? Also, please let me know if you suspect a technical error on my end, and I'll check, just in case.
Thank you.
Jaeyun