Hi Charles,
In my personal opinion, the results of a clustering algorithm depend strongly on the method you use to create the clusters and on the distance measure you apply within the clustering. That is, most of the time you will get varying results depending on the method applied (e.g. the linkage).
The main problem with clustering is that you will always get clusters (and most people are just happy with whatever a given algorithm gives them), but this by no means implies that these clusters are robust, i.e. reproducible with different methods. Other choices, such as using z-scores or normalizing the rows/columns of your matrix, also play a major role in the clustering results.
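To illustrate the normalization point, here is a small sketch, assuming SciPy and NumPy are available. The data is made up for the purpose: one column carries a two-group signal on a small scale, the other is unrelated to the groups but spans a much larger range. The helper names (two_clusters, matches_groups) are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.stats import zscore

# Toy data (made up): column 0 carries the real two-group signal on a small
# scale, column 1 is unrelated to the groups but spans a much larger range.
n_per_group = 10
groups = np.repeat([0, 1], n_per_group)
signal = groups.astype(float)  # 0.0 for group 0, 1.0 for group 1
nuisance = np.tile(np.linspace(0.0, 1000.0, n_per_group), 2)
X = np.column_stack([signal, nuisance])


def two_clusters(data):
    """Average linkage on Euclidean distances, cut into at most 2 clusters."""
    Z = linkage(data, method="average", metric="euclidean")
    return fcluster(Z, t=2, criterion="maxclust")


def matches_groups(labels, truth):
    """True if the clusters coincide with the known groups (up to renaming)."""
    pairs = set(zip(labels.tolist(), truth.tolist()))
    return len(pairs) == len(set(labels.tolist())) == len(set(truth.tolist()))


raw_ok = matches_groups(two_clusters(X), groups)
scaled_ok = matches_groups(two_clusters(zscore(X, axis=0)), groups)
print("raw data recovers groups:     ", raw_ok)
print("z-scored data recovers groups:", scaled_ok)
```

On this toy data the raw run is dominated by the large-scale column and mixes the two groups, while the z-scored run separates them. With real data the effect can go either way, which is exactly why the normalization should be reported.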
Unfortunately, the fact that cluster analysis results depend strongly on the method used is only very seldom discussed in publications. Reported clusters are typically presented as a robust reality, which they often are not.
I personally use multiple algorithms and distance measures and see which clusters are robustly found, i.e. which members always occur together in a cluster. Then you can at least be sure that your results are fairly robust. When reporting cluster results, make sure to state the clustering method (i.e. the linkage), the normalization, and the distance measure used, so that others can reproduce your clusters from your published data and compare them with results obtained with other methods.
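A rough sketch of what I mean by checking robustness, assuming SciPy and NumPy: run several linkage/metric combinations on the same data and count how often each pair of samples lands in the same cluster. The synthetic data, the choice of three clusters, and the variable names are all assumptions for illustration.

```python
import itertools

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Synthetic data (made up): three well-separated blobs of 15 points each.
rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [20.0, 0.0], [0.0, 20.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(15, 2)) for c in centers])

# centroid, median, and ward are left out here because SciPy defines them
# correctly only for Euclidean distances.
methods = ["single", "complete", "average", "weighted"]
metrics = ["euclidean", "cityblock", "chebyshev"]
n_clusters = 3  # assumed number of clusters for the cut

co_occurrence = np.zeros((len(X), len(X)))
runs = 0
for method, metric in itertools.product(methods, metrics):
    y = pdist(X, metric=metric)        # condensed distance matrix
    Z = linkage(y, method=method)      # hierarchical clustering
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    # Add 1 for every pair of samples that shares a cluster in this run.
    co_occurrence += labels[:, None] == labels[None, :]
    runs += 1
co_occurrence /= runs  # fraction of runs in which each pair co-clusters

# Pairs that end up together in every single run.
robust_pairs = np.isclose(co_occurrence, 1.0)
print("runs:", runs)
print("fraction of pairs that always co-cluster:", robust_pairs.mean())
```

Entries of co_occurrence near 1 mark members that stay together regardless of method, and entries well between 0 and 1 mark assignments that depend on the method and should be interpreted with care.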
These are routines for agglomerative clustering:
linkage(y[, method, metric, optimal_ordering]): Perform hierarchical/agglomerative clustering.
single(y): Perform single/min/nearest linkage on the condensed distance matrix y.
complete(y): Perform complete/max/farthest point linkage on a condensed distance matrix.
average(y): Perform average/UPGMA linkage on a condensed distance matrix.
weighted(y): Perform weighted/WPGMA linkage on the condensed distance matrix.
centroid(y): Perform centroid/UPGMC linkage.
median(y): Perform median/WPGMC linkage.
ward(y): Perform Ward’s linkage on a condensed distance matrix.
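These routine names match scipy.cluster.hierarchy, so assuming that is the library in question, a minimal usage sketch looks like this. The six points are made up, and the condensed distance matrix comes from scipy.spatial.distance.pdist.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage, single, ward
from scipy.spatial.distance import pdist

# Six made-up points forming two obvious groups.
X = np.array([
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0],
])

# Condensed distance matrix: one entry per pair, n*(n-1)/2 values in total.
y = pdist(X, metric="euclidean")

# linkage() takes the method as an argument ...
Z_single = linkage(y, method="single")
Z_ward = linkage(y, method="ward")

# ... and the per-method functions are shortcuts for the same computation.
same_single = np.allclose(single(y), Z_single)
same_ward = np.allclose(ward(y), Z_ward)

# Cut the Ward tree into at most two flat clusters.
labels = fcluster(Z_ward, t=2, criterion="maxclust")
print(y.shape, Z_ward.shape, same_single, same_ward)
print(labels)
```

Each row of the linkage matrix describes one merge (the two merged cluster indices, the merge distance, and the size of the new cluster), so n samples give n - 1 rows.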
These are the possible metrics for the distance calculation:
The distance metric to use. The distance function can
be ‘braycurtis’, ‘canberra’, ‘chebyshev’, ‘cityblock’,
‘correlation’, ‘cosine’, ‘dice’, ‘euclidean’, ‘hamming’,
‘jaccard’, ‘kulsinski’, ‘mahalanobis’, ‘matching’,
‘minkowski’, ‘rogerstanimoto’, ‘russellrao’, ‘seuclidean’,
‘sokalmichener’, ‘sokalsneath’, ‘sqeuclidean’, ‘yule’.
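To see why the metric matters, here is a tiny sketch, assuming these names are the ones accepted by scipy.spatial.distance.pdist. The three vectors are made up: b is a scaled copy of a, while c differs from a in a single value. Which of the two counts as the nearest neighbour of a depends on the metric.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Made-up profiles: b has the same shape as a at twice the magnitude,
# c is close to a in absolute terms but has a slightly different shape.
names = ["a", "b", "c"]
X = np.array([
    [1.0, 2.0, 3.0],  # a
    [2.0, 4.0, 6.0],  # b
    [1.0, 2.0, 4.0],  # c
])

nearest_to_a = {}
for metric in ["euclidean", "cityblock", "correlation", "cosine"]:
    # squareform turns the condensed vector into a full n x n matrix.
    D = squareform(pdist(X, metric=metric))
    # Row 0 holds the distances from a; skip the zero self-distance.
    nearest_to_a[metric] = names[1 + int(np.argmin(D[0, 1:]))]

for metric, name in nearest_to_a.items():
    print(f"{metric:12s} nearest to a: {name}")
```

Magnitude-based metrics (euclidean, cityblock) pair a with c, while shape-based metrics (correlation, cosine) pair a with b, so the resulting clusters can differ even with the same linkage.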
Good information can be found here