I am migrating an R script to Java. The R script uses the apcluster library, and I am trying to recreate its output with the Sandia Cognitive Foundry AffinityPropagation class, but I am finding it difficult to tune the selfDivergence value appropriately.
Here is my R code:
library(apcluster)

NgramAdjMatrix <- matrix(
  c(0.0, 0.0, 0.0,
    0.0, 1.0, 2.0,
    0.0, 2.0, 4.0,
    0.0, 3.0, 6.0,
    0.0, 4.0, 8.0,
    0.0, 5.0, 10.0,
    0.0, 6.0, 12.0),
  nrow = 7,
  ncol = 3,
  byrow = TRUE)
LatentClusters <- apcluster(negDistMat(r=2), NgramAdjMatrix, seed=1234)
representatives <- LatentClusters@exemplars
clustMembers <- LatentClusters@clusters
FinalNgramMatrix <- NgramAdjMatrix[representatives,]
The R script above gives this output:
[,1] [,2] [,3]
[1,] 0 1 2
[2,] 0 4 8
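As I understand it, negDistMat(r=2) means apcluster works on similarities s(i,j) = -||x_i - x_j||², i.e. the negated squared Euclidean distances, which should be the same quantity EuclideanDistanceSquaredMetric computes up to sign. A minimal standalone check of one entry (plain Java, no Foundry dependency; the class name is mine):

```java
// Sanity check: negDistMat(r=2) in R produces s(i,j) = -||xi - xj||^2,
// the negation of what EuclideanDistanceSquaredMetric computes.
public class NegDistMatCheck {

    static double negSquaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return -sum;
    }

    public static void main(String[] args) {
        double[] x1 = {0, 1, 2};  // row 2 of NgramAdjMatrix
        double[] x4 = {0, 4, 8};  // row 5 of NgramAdjMatrix
        System.out.println(negSquaredDistance(x1, x4)); // prints -45.0
    }
}
```

So the divergence function itself should match; the difference must come from the preference / selfDivergence handling.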
Here is my Java code using Cognitive Foundry:
import java.util.Arrays;
import java.util.Collection;

import gov.sandia.cognition.learning.algorithm.clustering.AffinityPropagation;
import gov.sandia.cognition.learning.algorithm.clustering.cluster.CentroidCluster;
import gov.sandia.cognition.learning.function.distance.EuclideanDistanceSquaredMetric;
import gov.sandia.cognition.math.matrix.Vector;
import gov.sandia.cognition.math.matrix.Vectorizable;
import gov.sandia.cognition.math.matrix.mtj.Vector3;

// ...

Vector[] data = new Vector[] {
    new Vector3(0.0, 0.0, 0.0),
    new Vector3(0.0, 1.0, 2.0),
    new Vector3(0.0, 2.0, 4.0),
    new Vector3(0.0, 3.0, 6.0),
    new Vector3(0.0, 4.0, 8.0),
    new Vector3(0.0, 5.0, 10.0),
    new Vector3(0.0, 6.0, 12.0)
};
System.out.println(Arrays.toString(data));

AffinityPropagation<Vectorizable> instance = new AffinityPropagation<>(
        EuclideanDistanceSquaredMetric.INSTANCE, 6);

Collection<CentroidCluster<Vectorizable>> clusters =
        instance.learn(Arrays.asList(data));
clusters.stream().forEach(cluster ->
        System.out.println(cluster.getCentroid() + "..."));
The Java code above gives this output:
<0.0, 1.0, 2.0>
<0.0, 2.0, 4.0>
<0.0, 5.0, 10.0>
The output is different, and it depends heavily on the selfDivergence parameter, which is set to 6 in my code.
Is there some way to make the Java code behave the same as the R code?
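If I read the apcluster documentation correctly, when no explicit preference is given it defaults to the median of the input similarities. My assumption (not something I found in the Foundry docs) is that the closest Foundry analogue would be setting selfDivergence to the median of the pairwise squared Euclidean distances, since the Foundry works with divergences rather than negated similarities. A quick sketch to compute that value for my data (plain Java; the class and method names are mine):

```java
import java.util.Arrays;

// Sketch: mirror apcluster's default preference (the median of the input
// similarities) by computing the median pairwise squared Euclidean
// distance and using it as selfDivergence. This mapping is my own
// assumption, not documented Foundry behavior.
public class MedianSelfDivergence {

    static double medianSquaredDistance(double[][] points) {
        int n = points.length;
        double[] dists = new double[n * (n - 1) / 2];
        int k = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double sum = 0.0;
                for (int d = 0; d < points[i].length; d++) {
                    double diff = points[i][d] - points[j][d];
                    sum += diff * diff;
                }
                dists[k++] = sum;
            }
        }
        Arrays.sort(dists);
        int m = dists.length;
        // Median of the off-diagonal squared distances.
        return (m % 2 == 1) ? dists[m / 2]
                            : 0.5 * (dists[m / 2 - 1] + dists[m / 2]);
    }

    public static void main(String[] args) {
        double[][] data = {
            {0, 0, 0}, {0, 1, 2}, {0, 2, 4}, {0, 3, 6},
            {0, 4, 8}, {0, 5, 10}, {0, 6, 12}
        };
        System.out.println(medianSquaredDistance(data)); // prints 20.0
    }
}
```

For this data the median comes out to 20.0, so I would try new AffinityPropagation<>(EuclideanDistanceSquaredMetric.INSTANCE, 20.0) instead of 6, but I have not confirmed that this reproduces the R exemplars.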
-----------------------------------------------
P.S.: I have also posted this question on SO earlier.