Hello,
All of the strains identified by MetaPhlAn/StrainPhlAn (run through bioBakery workflows) for the workflows tutorial data are labeled as "unclassified", even without using marker_in_clade 0.01. I understand that this could be due, in part, to the sub-sampling of the tutorial data for demonstration purposes. However, when I run MetaPhlAn/StrainPhlAn on my own set of over 300 full human gut microbiome samples, ~70% of the identified strains, on average, are still labeled as "unclassified".
- Is this typical behavior?
- Is there any way to improve strain identification, e.g., by modifying the default parameters of StrainPhlAn?
- How is StrainPhlAn2 different from StrainPhlAn? Does StrainPhlAn2 have improved strain profiling capabilities that could help with this issue (workflows currently supports StrainPhlAn only)?
- Is there any way to run both StrainPhlAn and PanPhlAn and combine the results at the end to improve strain profiling?
Another StrainPhlAn question: is there a convenient way to convert the RefSeq assembly accession IDs (GCF_###) that StrainPhlAn reports for the identified strains into the actual strain names (e.g., instead of GCF_000173975, report Anaerobutyricum hallii DSM 3353)? Accession IDs are not very useful when presenting the data to biologists.
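
To make the accession question concrete, here is a minimal sketch of the kind of lookup I have in mind. It assumes a local copy of NCBI's tab-separated RefSeq assembly summary table (assembly_summary_refseq.txt) with the columns assembly_accession, organism_name and infraspecific_name. The helper name build_accession_map is hypothetical and is not part of StrainPhlAn or bioBakery, and the sample rows are made up for illustration.

```python
import csv


def build_accession_map(lines):
    """Map versionless accessions (e.g. GCF_000173975) to 'organism strain' labels.

    `lines` is any iterable of text lines from an assembly summary table
    (assumed layout: '#'-prefixed header line, then tab-separated rows).
    """
    header = None
    names = {}
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row:
            continue
        if row[0].startswith("#"):
            # The column header is a comment line whose first field is assembly_accession.
            first = row[0].lstrip("#").strip()
            if first == "assembly_accession":
                header = [first] + row[1:]
            continue
        if header is None:
            continue  # data before the header cannot be interpreted
        rec = dict(zip(header, row))
        # Drop the version suffix (".1") so GCF_000173975 matches GCF_000173975.1.
        accession = rec["assembly_accession"].split(".")[0]
        organism = rec.get("organism_name", "")
        strain = rec.get("infraspecific_name", "").replace("strain=", "").strip()
        # Avoid repeating the strain when the organism name already ends with it.
        if strain and not organism.endswith(strain):
            organism = f"{organism} {strain}"
        names[accession] = organism
    return names


# Illustrative rows only (not copied from the real NCBI table).
sample = [
    "#   See the README for column descriptions\n",
    "# assembly_accession\tbioproject\torganism_name\tinfraspecific_name\n",
    "GCF_000173975.1\tPRJNA0\tAnaerobutyricum hallii DSM 3353\tstrain=DSM 3353\n",
    "GCF_000000001.1\tPRJNA0\tEscherichia coli\tstrain=K-12\n",
]
accession_to_name = build_accession_map(sample)
print(accession_to_name.get("GCF_000173975", "unknown accession"))
```

With the real table, the same function could be called on an open file handle, and the resulting dictionary used to relabel the GCF_### IDs in the StrainPhlAn output. Is there something like this already built in?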