Clustering through fpc package returning error

Kristen Beck

Jan 10, 2013, 8:26:26 PM
to davi...@googlegroups.com
Hello,
I am working with the fpc package to complete some CLARA cluster analysis and am running into an error.

The input data is a list of RefSeq mRNA IDs tab separated with the absolute change they undergo between two biological states.

> head(tt)
        GeneID    Change
1 NM_001034682  -38.0287
2    XM_582291 -293.0663
3 NM_001083506   -4.3722
4 XM_001787770 -162.9333
5 XM_001787538 -162.9333
6 NM_001075459  -29.3449

Here is how I am trying to get the cluster information:
pamk(tt$Change, krange = 2:10, criterion="multiasw", usepam=FALSE)

Error in summary(silhouette(clustering[ss[[i]]], dx))$avg.width : 
  $ operator is invalid for atomic vectors

However, when I use the average silhouette width instead of multi average silhouette width it does not return an error.  The documentation for the function states that for large data sets you should use multiasw. My data set has ~12,000 points which I presume counts as large, but I get the error.

pamk(tt$Change, krange = 2:10, criterion="asw", usepam=FALSE)

Returns no error.

I have been able to run pamk with multiasw on a different data set with no other issues.  Weirdly, the two input files are generated by the same scripts, so they should have no weird formatting issues.

Has anyone worked with the fpc package, or does anyone know what could be causing this error?

Thanks in advance,
Kristen

Vince S. Buffalo

Jan 10, 2013, 8:32:06 PM
to davi...@googlegroups.com
Hi Kristen,


> $ operator is invalid for atomic vectors

This usually occurs when you are trying to use a matrix as if it were a dataframe (i.e. with $). I would suggest as.data.frame(). See below:

> a <- matrix(rnorm(6), nrow=2)
> a
          [,1]       [,2]        [,3]
[1,] 0.2171837  1.2074981 -0.08457454
[2,] 0.7225064 -0.5412251  0.09298678
> a <- matrix(rnorm(6), nrow=3)
> a
           [,1]        [,2]
[1,] -0.5176748 -0.01194202
[2,]  0.3674010 -0.14044518
[3,]  1.7384524 -1.18537901
> colnames(a) <- c("x", "y")
> a$x
Error in a$x : $ operator is invalid for atomic vectors
> b <- as.data.frame(a)
> b$x
[1] -0.5176748  0.3674010  1.7384524
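
A minimal base R sketch of the same point, in case it helps when checking an object before using $ on it. The variable names (x_from_matrix, x_from_df, dollar_error) are made up for illustration. A matrix is an atomic vector with dimensions, so its columns have to be pulled out with [ , "name"] rather than $:

```r
# Base R only; no packages needed.
set.seed(1)
a <- matrix(rnorm(6), nrow = 3)
colnames(a) <- c("x", "y")

# A matrix is atomic, which is why `$` is rejected.
print(is.atomic(a))

# Index the matrix by column name instead of using `$`.
x_from_matrix <- a[, "x"]

# Or convert to a data frame, where `$` works.
b <- as.data.frame(a)
x_from_df <- b$x

# Capture the error message rather than stopping the script.
dollar_error <- tryCatch(a$x, error = function(e) conditionMessage(e))
print(dollar_error)
```

Both routes return the same numbers; is.atomic() or class() on the object is a quick way to tell which one applies.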

HTH,
V
--
Vince Buffalo
Bioinformatics Programmer
Dubcovsky Lab
Plant Sciences, UC Davis

Kristen Beck

Jan 10, 2013, 8:48:18 PM
to davi...@googlegroups.com
Hi Vince,
Thanks for the quick reply.
Unfortunately, I checked that too, unless there is something else I'm missing.

Here is how I'm loading the data and the classes of the data frame and column of interest.
> tt = read.table("~/tmp.txt", header=TRUE, sep="\t")
> class(tt)
[1] "data.frame"
> class(tt$Change)
[1] "numeric"

I also tried calling the function using the following:
pamk(tt["Change"], krange = 2:10, criterion="multiasw", usepam=FALSE)
pamk(tt[,2], krange = 2:10, criterion="multiasw", usepam=FALSE)

Plus setting the criterion = "asw" returns no error, but I don't think it is clustering the data the way it should be.

Any other suggestions?

Thanks again,
Kristen

Kristen Beck

Jan 10, 2013, 9:09:10 PM
to davi...@googlegroups.com
I think I found how to fix the problem.
If criterion = "multiasw", then the function distcritmulti receives ns and seed.  By default ns = 10, but if I run it with ns = 2 then the function works properly.  I think the error occurred because it wasn't able to calculate the silhouette properly with the default ns = 10.
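
A rough sketch of the workaround, for anyone who lands here with the same error. The first part uses only base R to show that the message itself comes from calling $ on an atomic summary. That this is what happens inside fpc (for example, silhouette() returning NA for a subset that contains only one cluster) is an assumption on my part, not something I have confirmed in the source. The second part runs only if fpc is installed, and uses made-up data (the vector change is hypothetical, not my real data set):

```r
# Part 1 (base R): `$` on an atomic summary object gives the same message.
# Assumption: something similar happens inside fpc when a silhouette
# cannot be computed for one of the ns subsets.
msg <- tryCatch(summary(NA)$avg.width, error = function(e) conditionMessage(e))
print(msg)

# Part 2: pass ns = 2 to pamk(), on synthetic data.
# Skipped entirely when the fpc package is not installed.
if (requireNamespace("fpc", quietly = TRUE)) {
  set.seed(42)
  # Hypothetical stand-in for tt$Change: two loose groups of values.
  change <- c(rnorm(200, mean = -150, sd = 20), rnorm(200, mean = -30, sd = 10))

  fit <- tryCatch(
    fpc::pamk(change, krange = 2:10, criterion = "multiasw",
              usepam = FALSE, ns = 2),
    error = function(e) e
  )

  if (inherits(fit, "error")) {
    message("pamk still failed: ", conditionMessage(fit))
  } else {
    # nc is the number of clusters pamk selected.
    print(fit$nc)
  }
}
```

Wrapping the call in tryCatch() keeps a script running if some value of ns still fails on a particular data set, so different values can be tried in turn.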

Thanks for the help. If you guys have any other ideas feel free to chime in. This was a weird one.

Kristen