StrainPhlAn - alignment parameters - estimation of strain richness

Flo

Sep 11, 2019, 10:51:14 AM

to MetaPhlAn-users

Dear MetaPhlAn2/StrainPhlAn users and developers,

Thanks a lot for developing and maintaining these very useful tools. I am working on oral microbiome samples using metagenomics (2x250 bp reads). I have followed your detailed tutorial and have some questions regarding the MetaPhlAn2/StrainPhlAn pipeline.

I ran MetaPhlAn2 v2.7.6 (both with default settings and with --min_alignment_len 100 --bt2_ps sensitive-local) to get the taxonomic profiles of my samples. I then used StrainPhlAn (following https://bitbucket.org/biobakery/biobakery/wiki/strainphlan) to extract markers, and explored different parameters to build trees.
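
To make the two runs above easy to compare side by side, here is a minimal Python sketch that only assembles the metaphlan2.py command lines (it does not execute them). The flags --input_type, --bowtie2out, -s, -o, --bt2_ps and --min_alignment_len are MetaPhlAn2 options; the sample name, the output file naming, the preset labels and passing both mates as one comma-separated argument are assumptions for illustration, so check metaphlan2.py -h for your installed version. The -s SAM output is kept because, in the MetaPhlAn2-era StrainPhlAn workflow, marker extraction starts from those files.

```python
import shlex

# Hypothetical preset labels; the flag values are the ones used in this thread.
PRESETS = {
    "default": [],  # MetaPhlAn2's default bowtie2 preset (end-to-end)
    "local": ["--bt2_ps", "sensitive-local", "--min_alignment_len", "100"],
}


def metaphlan2_command(sample, reads, preset):
    """Build (but do not run) a metaphlan2.py command for one sample.

    Assumption: paired-end files are passed as one comma-separated argument.
    """
    tag = f"{sample}.{preset}"  # hypothetical output naming scheme
    return [
        "metaphlan2.py", ",".join(reads),
        "--input_type", "fastq",
        "--bowtie2out", f"{tag}.bowtie2.bz2",
        "-s", f"{tag}.sam.bz2",  # SAM output used later by StrainPhlAn
        "-o", f"{tag}.profile.txt",
        *PRESETS[preset],
    ]


if __name__ == "__main__":
    for name in PRESETS:
        cmd = metaphlan2_command("S01", ["S01_R1.fastq", "S01_R2.fastq"], name)
        print(shlex.join(cmd))  # pass cmd to subprocess.run(...) to execute
```

Keeping each preset's outputs under a separate name lets StrainPhlAn be run on both sets of SAM files, so the number of reconstructed markers can be compared per preset.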

I then used R to plot the trees produced by StrainPhlAn, as well as msaplot alignment plots and PCoAs based on Kimura distances computed from the alignment FASTA files produced by StrainPhlAn.
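
For reference, the Kimura two-parameter (K80) distance behind these PCoAs is d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of compared positions that differ by a transition and by a transversion. Below is a minimal Python sketch, assuming one aligned FASTA record per sample and skipping gap or non-ACGT positions pair by pair. R's ape::dist.dna(..., model = "K80") implements the same model but may treat missing data differently, so numbers can differ slightly.

```python
import math
import sys
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
NUCLEOTIDES = PURINES | PYRIMIDINES


def read_fasta(path):
    """Read an aligned FASTA file into {record_id: sequence}."""
    seqs, name = {}, None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                name = line[1:].split()[0]
                seqs[name] = []
            elif name is not None:
                seqs[name].append(line)
    return {k: "".join(v) for k, v in seqs.items()}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter (K80) distance between two aligned sequences.

    Positions with a gap or non-ACGT character in either sequence are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same aligned length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in NUCLEOTIDES or b not in NUCLEOTIDES:
            continue
        sites += 1
        if a == b:
            continue
        # Transition: A<->G or C<->T (both purines or both pyrimidines).
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable positions")
    p, q = transitions / sites, transversions / sites
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        return math.inf  # too divergent: the K80 distance is undefined
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


def k2p_matrix(alignment):
    """Symmetric matrix of pairwise K80 distances, plus the sample order."""
    names = list(alignment)
    dist = [[0.0] * len(names) for _ in names]
    for i, j in combinations(range(len(names)), 2):
        d = k2p_distance(alignment[names[i]], alignment[names[j]])
        dist[i][j] = dist[j][i] = d
    return names, dist


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python k2p.py clade_alignment.fasta
    names, dist = k2p_matrix(read_fasta(sys.argv[1]))
    print("\t".join([""] + names))
    for name, row in zip(names, dist):
        print("\t".join([name] + [f"{d:.5f}" for d in row]))
```

The tab-separated matrix can then be loaded in R and ordinated, for example with base R's cmdscale for a classical PCoA.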

1- My understanding is that we get a ‘population profile’ of the strains and, based on marker SNPs, we can measure distances among samples with respect to that species' ‘population profile’. Is that correct? Is there a way to estimate strain richness (i.e., the number of different strains)?

2- I am working with quality-trimmed 2x250 nt reads. For StrainPhlAn, would you recommend tuning the bowtie2-related options --min_alignment_len and --bt2_ps, as I have seen in some MetaPhlAn2 tutorials (e.g., http://bioinformatics-ca.github.io/analysis_of_metagenomic_data_mod3_lab_2015/)?

3- Do you have a recommendation for statistical tests to confirm group discrimination based on StrainPhlAn results? I was thinking of PERMANOVA on Kimura distances.
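
The one-way PERMANOVA mentioned in question 3 can be stated compactly: with N samples in a groups, SS_T is the sum of squared pairwise distances divided by N, SS_W is the same quantity computed within each group (divided by that group's size) and summed, SS_A = SS_T - SS_W, and pseudo-F = (SS_A / (a - 1)) / (SS_W / (N - a)). The p-value comes from permuting group labels. The pure-Python sketch below is for illustration only; for real analyses an established implementation such as vegan::adonis2 in R or skbio.stats.distance.permanova in Python is the safer choice. It accepts any symmetric distance matrix (Kimura distances or tree-derived distances alike); the function names and toy labels are hypothetical.

```python
import random
from itertools import combinations


def pseudo_f(dist, labels):
    """One-way PERMANOVA pseudo-F statistic for a square distance matrix."""
    n = len(labels)
    groups = sorted(set(labels))
    a = len(groups)
    # Total sum of squares: squared distances over all pairs, divided by N.
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    # Within-group sum of squares: same quantity per group, divided by group size.
    ss_within = 0.0
    for g in groups:
        idx = [i for i, lab in enumerate(labels) if lab == g]
        ss_within += sum(dist[i][j] ** 2 for i, j in combinations(idx, 2)) / len(idx)
    if ss_within == 0:
        return float("inf")  # every group is a single point: F is unbounded
    ss_between = ss_total - ss_within
    return (ss_between / (a - 1)) / (ss_within / (n - a))


def permanova(dist, labels, permutations=999, seed=0):
    """Return (pseudo-F, permutation p-value) for the grouping `labels`."""
    n, a = len(labels), len(set(labels))
    if not 2 <= a < n:
        raise ValueError("need at least 2 groups and more samples than groups")
    f_obs = pseudo_f(dist, labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        # Small tolerance so permutations reproducing the observed grouping count as ties.
        if pseudo_f(dist, shuffled) >= f_obs - 1e-12:
            hits += 1
    # The observed labelling counts as one of the permutations.
    return f_obs, (hits + 1) / (permutations + 1)


if __name__ == "__main__":
    # Toy 1-D example: two well-separated groups of five samples.
    points = [0, 1, 2, 3, 4, 100, 101, 102, 103, 104]
    dist = [[abs(x - y) for y in points] for x in points]
    labels = ["group_A"] * 5 + ["group_B"] * 5  # hypothetical group labels
    print(permanova(dist, labels))
```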

Thanks a ton,

Florentin

Nicola Segata

Nov 15, 2019, 3:25:48 AM

to MetaPhlAn-users

Hi Florentin,

thanks for getting in touch.

Answering your questions:

1. No. StrainPhlAn returns the genetic profile (i.e., the marker sequences) of the most abundant (dominant) strain of each species. So when you do population genetic analysis across samples, you are comparing the dominant strains in each population. StrainPhlAn discards cases in which two strains of the same species are at the same abundance, because it is not possible to assign the SNPs to one or the other.

2. Yes. With reads longer than 150 nt, we found that in several cases it is better to use local alignment (i.e., --bt2_ps sensitive-local or --bt2_ps very-sensitive-local) and to set --min_alignment_len to at least 50. I would suggest trying both local and non-local (end-to-end) alignment and seeing which of them produces more reconstructed markers in your specific dataset.

3. PERMANOVA on Kimura distances computed from the StrainPhlAn alignment looks like a good option to me. An alternative could be PERMANOVA on the phylogenetic distances computed from the final StrainPhlAn phylogenies.
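
To illustrate point 1, here is a toy majority-rule consensus: at each marker position the base supported by most reads is kept only if it clearly dominates, and a position where two alleles are about equally frequent is left ambiguous ("N"), so a marker with too many such positions is not reconstructed. This is a simplified sketch, not StrainPhlAn's actual implementation; the thresholds min_dominance and max_ambiguous and the pileup representation (one string of read bases per position) are hypothetical.

```python
from collections import Counter

NUCLEOTIDES = {"A", "C", "G", "T"}


def call_dominant_base(column, min_dominance=0.8):
    """Return the majority base at one position, or 'N' if no base dominates.

    `column` is a string of the bases observed in reads covering the position.
    """
    counts = Counter(b for b in column.upper() if b in NUCLEOTIDES)
    total = sum(counts.values())
    if total == 0:
        return "N"  # no usable coverage
    base, n = counts.most_common(1)[0]
    return base if n / total >= min_dominance else "N"


def reconstruct_marker(pileup, min_dominance=0.8, max_ambiguous=0.2):
    """Build the dominant-strain marker sequence, or None if it is too ambiguous.

    Two strains at similar abundance produce ~50/50 positions, which become
    'N' and can push the marker over the ambiguity limit.
    """
    if not pileup:
        return None
    seq = "".join(call_dominant_base(col, min_dominance) for col in pileup)
    if seq.count("N") / len(seq) > max_ambiguous:
        return None
    return seq


if __name__ == "__main__":
    # One dominant strain (9:1 at the polymorphic site): marker kept.
    print(reconstruct_marker(["AAAAAAAAAA", "CCCCCCCCCC", "GGGGGGGGGT"]))
    # Two strains at equal abundance: SNP positions are ambiguous, marker dropped.
    print(reconstruct_marker(["AAAAAGGGGG", "CCCCCCCCCC", "TTTTTGGGGG"]))
```

Because the minority strain is masked rather than resolved, a single consensus per species carries no information about how many strains were present.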

thanks

Nicola