Interpretation of AMOVA - group and population hierarchies

821 views
Skip to first unread message

Ben Sutherland

unread,
Jul 18, 2019, 2:28:26 PM7/18/19
to po...@googlegroups.com
Hello,
I have a question about the output of the amova function within poppr.

Specifically, my data is hierarchical where repunits (i.e. group) is the highest strata, followed by collection (i.e. population).
I have assigned strata to a genind file and would like to understand where the variation exists in the data.

When I look at the readout of the amova object however, I get the following titles:
Variations  Between repunit
Variations  Between samples Within repunit
Variations  Within samples

I thought that the output would look more along the lines of:
Variations Between repunit
Variations Between collections
Variations Between samples within collections

I will include my code below. I do not expect to see 'Variations Within samples' at all, since I use within = FALSE


# Here are the unique strata of my data:

unique
(genind.data$strata)
        collection repunit
1       River1     CLB
64       River2     CLB
151      River3   CC_NC
265       River4   FR_KL
429       River5   CC_NC
524        River6   CC_NC
616     River7   KI_KL
741      River8   FR_KL
864   River9   KI_KL
951         River10   CC_NC
1047       River11     CLB
1100      River12   CC_NC
1235 River13    CeAK
1334        River14   CC_NC
1385     River15   CC_NC



# run amova

obj
.amova.2 <- poppr.amova(x = genind.data, hier = ~repunit/collection, within = FALSE)
# note: within=FALSE is used as I am not as interested in the within sample variation, and also the results end up with negative percentage values, which makes it less understandable to me.



# view results

obj
.amova.2
$call
ade4
::amova(samples = xtab, distances = xdist, structures = xstruct)

$results
                                 
Df     Sum Sq   Mean Sq
Between repunit                   4  1180.9749 295.24373
Between samples Within repunit   10   392.3532  39.23532
Within samples                 1462 38281.6432  26.18443
Total                          1476 39854.9713  27.00201

$componentsofcovariance
                                               
Sigma           %
Variations  Between repunit                 0.9728207   3.5641307
Variations  Between samples Within repunit  0.1374994   0.5037575
Variations  Within samples                 26.1844345  95.9321118
Total variations                           27.2947546 100.0000000

$statphi
                           
Phi
Phi-samples-total   0.040678882
Phi-samples-repunit 0.005223756
Phi-repunit-total   0.035641307



I may not be understanding correctly, but should the result types not be:
Between repunit
Between collections within repunit
Within collections (or among samples)

Thank you very much for any information and advice,

Ben


> sessionInfo()
R version
3.5.3 (2019-03-11)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.5 LTS

Matrix products: default
BLAS
: /usr/lib/libblas/libblas.so.3.6.0
LAPACK
: /usr/lib/lapack/liblapack.so.3.6.0

locale
:
 
[1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C               LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8    
 
[5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8    LC_PAPER=en_CA.UTF-8       LC_NAME=C                
 
[9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C      

attached
base packages:
[1] tcltk     stats     graphics  grDevices utils     datasets  methods   base    

other attached packages
:
 
[1] geosphere_1.5-10  tidyr_0.8.3       stringr_1.4.0     SNPRelate_1.16.0  gdsfmt_1.18.1     poppr_2.8.3      
 
[7] phangorn_2.5.5    ape_5.3           hierfstat_0.04-22 adegenet_2.1.1    ade4_1.7-13       cluster_2.1.0    
[13] units_0.6-3      

loaded via a
namespace (and not attached):
 
[1] Rcpp_1.0.1         lattice_0.20-38    deldir_0.1-21      class_7.3-15       gtools_3.8.1       assertthat_0.2.1  
 
[7] digest_0.6.19      mime_0.7           R6_2.4.0           plyr_1.8.4         coda_0.19-2        e1071_1.7-2      
[13] ggplot2_3.2.0      pillar_1.4.1       rlang_0.4.0        lazyeval_0.2.2     spdep_1.1-2        rstudioapi_0.10  
[19] gdata_2.18.0       vegan_2.5-5        gmodels_2.18.1     Matrix_1.2-17      splines_3.5.3      polysat_1.7-4    
[25] igraph_1.2.4.1     munsell_0.5.0      shiny_1.3.2        compiler_3.5.3     httpuv_1.5.1       pkgconfig_2.0.2  
[31] mgcv_1.8-28        htmltools_0.3.6    tidyselect_0.2.5   tibble_2.1.3       expm_0.999-4       quadprog_1.5-7    
[37] permute_0.9-5      crayon_1.3.4       dplyr_0.8.1        later_0.8.0        sf_0.7-4           MASS_7.3-51.4    
[43] grid_3.5.3         nlme_3.1-140       spData_0.3.0       xtable_1.8-4       gtable_0.3.0       DBI_1.0.0        
[49] magrittr_1.5       scales_1.0.0       KernSmooth_2.23-15 stringi_1.4.3      reshape2_1.4.3     LearnBayes_2.15.1
[55] promises_1.0.1     pegas_0.11         sp_1.3-1           seqinr_3.4-5       boot_1.3-22        fastmatch_1.1-0  
[61] tools_3.5.3        glue_1.3.1         purrr_0.3.2        parallel_3.5.3     colorspace_1.4-1   classInt_0.3-3




Zhian Kamvar

unread,
Jul 19, 2019, 2:48:32 AM7/19/19
to Ben Sutherland, poppr
That is an artifact of the ade4 implementation of AMOVA running under the hood. The “samples” here represents your lowest hierarchy. You can try it with method = “pegas” and you should get the same result with the proper labels. 

Sent from my iPhone
--
You received this message because you are subscribed to the Google Groups "poppr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to poppr+un...@googlegroups.com.
To post to this group, send email to po...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/poppr/b4a59c79-3ef9-497d-b8b7-60046629e031%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Ben Sutherland

unread,
Jul 25, 2019, 1:15:39 PM7/25/19
to poppr
Thank you Zhian, that works well and makes sense to me now.

All the best,

Ben
To unsubscribe from this group and stop receiving emails from it, send an email to po...@googlegroups.com.

Raj Sharma

unread,
Aug 14, 2019, 1:26:25 PM8/14/19
to Zhian Kamvar, poppr
Dear Zhian,

How to convert filtered genlight data file back in vcf (.vcf) and plink format.  
Could you help me to convert genlight in .vcf or .ped & .map file to use in any other software or sharing data others in mostly accepted vcf format.

Looking for your kind suppprt and help.
Best Regards
Rajneesh


Zhian Kamvar

unread,
Aug 15, 2019, 4:41:28 AM8/15/19
to Raj Sharma, poppr
Realized that I forgot to reply all (mea culpa).

Hi Raj,

The only way I currently know of to create VCF files in R is using vcfR::write.vcf(). The only catch is that it needs a vcfR object in order to do that and the only package that creates vcfR objects is vcfR.

I can only give advice for creating a VCF file if you originally read in a VCF file and used poppr/adegenet to filter your data. If this is your case, then what you should do is take the locus names and samples from your filtered genlight data set and use those to subset your vcfR object (but remember that the orientation of loci are in rows in the vcfR object as opposed to columns in the genlight object). Once you have subset your vcfR object, you can then use vcfR::write.vcf() to write that object to a VCF file.

Hope that helps,
Zhian

Brian Knaus

unread,
Aug 15, 2019, 11:28:35 AM8/15/19
to Zhian Kamvar, Raj Sharma, poppr
Hi Raj,

The data contained in VCF data and a genlight object are very different. I don't think there is a good way to convert a genlight object into VCF data. I would suggest reading your vcf data into R and perform any filtering or data cleaning on the vcfR object. You can then fork your workflow into two paths. One path you can write the vcfR object to a VCF file and use vcftools to convert the VCF file into plink format. The second path would convert the vcfR object into a genlight object for analysis in adegenet and poppr.

Good luck!
Brian



--

Victoria Glynn

unread,
Oct 14, 2022, 12:58:03 PM10/14/22
to poppr
Hello everyone, 

Following up on this conversation, I wish to ask for clarification as to what each level of hierarchy in the poppr AMOVA output is actually calculating. In my particular study design, I have individuals nested within sites, and these sites are nested within two regions. I have read the poppr documentation, but I am unclear as to how "heterozygosities within genotypes" are actually being calculated. To my understanding, heterozygosity is determined at the individual level, so how are heterozygosities within individuals being calculated -- what is the level of analyses here? Unfortunately, the R output is not clearly providing insights in this regard. I understand this syntax is being inherited from ade4, but the description regarding genotypes being split into haplotypes on the poppr documentation is not in line with how my data is structured. 

Likewise, in running the randtest command, my output presents 3 tests: (1) variation within samples, (2) variation between samples, and (3) variations between regions. My AMOVA has each site nested within region (~region/site). In this case, is the variation between samples referring to differences between individuals at a single site? For the variation between regions, is this then comparing the variation between my two regions? 

Thank you in advance for your time and guidance -- poppr has been a wonderful package to use in my research.

Best regards,

Victoria

Zhian Kamvar

unread,
Oct 14, 2022, 2:16:17 PM10/14/22
to Victoria Glynn, poppr
This may help:

 from the "On Within Individual Variance" from https://grunwaldlab.github.io/poppr/reference/poppr.amova.html:

Heterozygosities within genotypes are sources of variation from within individuals and can be quantified in AMOVA. When within = TRUE, poppr will split genotypes into haplotypes with the function make_haplotypes() and use those to calculate within-individual variance. No estimation of phase is made. This acts much like the default settings for AMOVA in the Arlequin software package. Within individual variance will not be calculated for haploid individuals or dominant markers as the haplotypes cannot be split further. Setting within = FALSE uses the euclidean distance of the allele frequencies within each individual. Note: within = TRUE is incompatible with filter = TRUE. In this case, within will be set to FALSE

Reply all
Reply to author
Forward
0 new messages