Comparison of cluster analysis and MRPP to R

Aaron Wykhuis

Jan 25, 2021, 2:24:34 PM
to PC-ORD
Hi all,

I am comparing species data for plots and using cluster analysis to determine forest structural groupings. After having done this analysis in PC-ORD, I was curious about doing the same analysis in R using the vegan and cluster packages. After spending some time learning the R packages and how to select the right parameters, I am able to replicate the distance matrix in R, but when I try to run the cluster analysis using the same parameters as PC-ORD I am getting similar but different results. Now I'm not sure which program is "right".

Prior to running the analysis, I do a general relativization across rows. In the cluster analysis I use the Sorensen (Bray-Curtis, B-C) distance measure and flexible beta linkage with beta = -0.25; for this comparison I cut the dendrogram at a group membership level of 6. Here is the code I am using in R with the vegan and cluster packages, where matrix is my raw plot/species data:

library(vegan)    # vegdist()
library(cluster)  # agnes()
matrix_rel <- matrix[, c(1, 2)]  # keep the plot identifier columns
matrix_rel[, 3:14] <- sapply(matrix[3:14], function(x) x / rowSums(matrix[3:14]))  # general relativization of the data by row
distance <- vegdist(matrix_rel[, 3:14], method = "bray")  # calculate B-C distance matrix
tree_clust <- agnes(distance, method = "flexible", par.method = c(0.625, 0.625, -0.25))  # run cluster analysis (flexible beta, beta = -0.25)
tree_clust <- as.hclust(tree_clust)
matrix$FT6 <- cutree(tree_clust, k = 6)  # cut dendrogram into 6 groups and add to original matrix
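
For cross-checking the first two steps independently of either program, here is a minimal base-R sketch of row relativization and Bray-Curtis distance. The toy abundance table and the helper names relativize_rows and bray_curtis are made up for illustration and are not part of vegan or PC-ORD; the sketch also assumes no plot has a row total of zero.

```r
# Hypothetical toy data: 4 plots x 3 species (not from the thread).
abund <- matrix(c(10, 0, 5,
                   2, 8, 0,
                   4, 4, 4,
                   0, 1, 9),
                nrow = 4, byrow = TRUE,
                dimnames = list(paste0("plot", 1:4), paste0("sp", 1:3)))

# General relativization by row totals: afterwards every row sums to 1.
# Assumes every row total is > 0.
relativize_rows <- function(x) {
  x <- as.matrix(x)
  sweep(x, 1, rowSums(x), "/")
}

# Bray-Curtis (Sorensen) distance between rows i and j:
#   sum(|x_i - x_j|) / sum(x_i + x_j)
bray_curtis <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x)
  d <- matrix(0, n, n, dimnames = list(rownames(x), rownames(x)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- sum(abs(x[i, ] - x[j, ])) / sum(x[i, ] + x[j, ])
    }
  }
  as.dist(d)
}

abund_rel <- relativize_rows(abund)
d_rel <- bray_curtis(abund_rel)  # distances on relativized data
d_raw <- bray_curtis(abund)      # distances on raw data, generally different
```

Comparing d_rel and d_raw on the same plots shows how much the relativization step alone changes the distances that the clustering and MRPP steps then work from.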

Additionally, I am using MRPP to assess within-group similarity and to determine the optimal number of groups produced by the cluster analysis; I am primarily interested in the A value. Again, I get different results in R and PC-ORD: even when I import the PC-ORD-identified groups into R and run MRPP there, the A value is quite different (A = 0.507 in PC-ORD versus A = 0.3041 in R on the same groups). The parameters are again Sorensen (B-C) distance and a weighting of n/sum(n), using the vegan package with this R code:

mrpp(matrix[,3:14], matrix$FT6, distance="bray")
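
One thing that may be worth checking: the call above passes matrix[,3:14], the raw columns, whereas the cluster analysis was run on the relativized matrix_rel. Whether this accounts for the gap in A cannot be confirmed from the thread. As an aid for comparing the two programs, here is a minimal base-R sketch of the chance-corrected within-group agreement A with group weights n/sum(n). The function name mrpp_A and the toy data are hypothetical, and the sketch uses the fact that, for these weights, the expected delta under random relabeling equals the mean of all pairwise distances; it is not vegan's or PC-ORD's implementation.

```r
# A = 1 - delta / E(delta), where delta is the weighted mean of the
# within-group mean pairwise distances, with weights C_i = n_i / sum(n).
mrpp_A <- function(d, groups) {
  dm <- as.matrix(d)
  groups <- as.factor(groups)
  n_i <- table(groups)
  if (any(n_i < 2)) stop("this sketch needs at least 2 members per group")

  # Mean pairwise distance within each group.
  within_mean <- sapply(levels(groups), function(g) {
    idx <- which(groups == g)
    sub <- dm[idx, idx]
    mean(sub[upper.tri(sub)])
  })

  weights <- as.numeric(n_i) / sum(n_i)
  delta <- sum(weights * within_mean)        # observed delta
  expected_delta <- mean(dm[upper.tri(dm)])  # mean of all pairwise distances
  1 - delta / expected_delta
}

# Hypothetical toy example: two tight groups far apart on a line.
toy_values <- c(0, 1, 2, 10, 11, 12)
toy_groups <- c(1, 1, 1, 2, 2, 2)
toy_A <- mrpp_A(dist(toy_values), toy_groups)
```

Feeding the same function a B-C distance matrix built from the raw data and one built from the relativized data, with the same grouping vector, would show whether the relativization step changes A by a comparable amount.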

If someone could shed some light on what is going on here that would be extremely helpful.

Thanks in advance for your help,

Aaron


Bruce McCune

Jan 25, 2021, 10:42:01 PM
to pc-...@googlegroups.com
Hi Aaron,

Thanks for doing the legwork of comparing these.

With respect to the cluster analysis: I know that several variants of "flexible beta" go under that same name. People often cite:
Lance, G. N. and W. T. Williams. 1967. A general theory of classificatory sorting strategies. I. Hierarchical systems. Computer Journal 9: 373-380.
The PC-ORD algorithm uses that strategy but follows Wishart, D. 1969. An algorithm for hierarchical classifications. Biometrics 25: 165-170. PC-ORD uses Wishart's objective function to scale the dendrogram. Many packages use raw distances, which in my mind does not reflect the behavior of the objective function. I don't know how the R dendrograms that you are producing are scaled, but dendrogram scaling can change the appearance of the dendrogram substantially.
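
For reference, the textbook Lance-Williams flexible update can be sketched in a few lines of base R. This is only the commonly cited form with equal alphas and alpha_i + alpha_j + beta = 1; it does not capture Wishart's objective function or any dendrogram scaling, and the function names are hypothetical.

```r
# Alpha implied by beta under the flexible strategy's constraint
# alpha_i + alpha_j + beta = 1 with alpha_i = alpha_j.
flexible_alpha <- function(beta) (1 - beta) / 2

# Distance from cluster k to the cluster formed by merging i and j:
#   d(k, ij) = alpha * d(k, i) + alpha * d(k, j) + beta * d(i, j)
flexible_update <- function(d_ki, d_kj, d_ij, beta = -0.25) {
  alpha <- flexible_alpha(beta)
  alpha * d_ki + alpha * d_kj + beta * d_ij
}

alpha_check <- flexible_alpha(-0.25)             # 0.625
example_dist <- flexible_update(0.8, 0.6, 0.4)   # one illustrative update
```

With beta = -0.25 this gives alpha = 0.625, which matches the par.method = c(0.625, 0.625, -0.25) setting in Aaron's agnes call, so the coefficients themselves are consistent with the textbook form.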

The MRPP routine used by PC-ORD was written by Paul Mielke, the inventor of MRPP. I don't know the origins of the version in R that you refer to.
Mielke, P. W., Jr., and K. J. Berry. 2001. Permutation Methods: A Distance Function Approach. Springer Series in Statistics. 344 pages.

I hope this helps!
Bruce McCune



Jorge A. Santiago-Blay

Jan 26, 2021, 9:20:16 AM
to pc-...@googlegroups.com
Hi: Is there a digest option I can be switched to? Thanks - Jorge
