Re: A problem about clustering

Nicholas M. Glykos

Aug 8, 2019, 9:51:06 AM
to Anahita Khammari, carma-molecu...@googlegroups.com


> I have 5000 frame of TMD (targeted molecular dynamics) trajectory.
> I used the command of:
> "carma64 -verbose -write -color -segid U -dPCA 5 3 298 tmd.dcd ionized.psf"
> to produce the "carma.clusters.dat" and the DG files.
> So, these results show me two cluster of the frames. I want to have 3 or 4
> cluster, so the question is: which option or flag will be changed to this
> purpose? I mean how could the number of clusters determined before
> clustering?

The clustering algorithm used by carma can _not_ be guided to produce a
pre-determined number of clusters. See this paper for the details :

https://arxiv.org/abs/1512.04024


The closest you can get to changing the number of clusters is to change the
threshold (sigma_cutoff) used for the identification of clusters. This is described
in the program manual :

-dPCA <integer(1)> <integer(2)> <temp> [<sigma_cutoff>]
...
The last optional argument <sigma_cutoff> determines a cutoff for the
cluster determination procedure. By default its value is determined
automatically, and is discussed in the section PCA-based cluster analysis.


To use this feature :

- Examine the file carma.variance_explained.dat produced by carma. It will
look like this :

1 0.5495 1.50
2 0.5669 1.40
3 0.6515 1.00

The first number is the number of clusters, the third number is the
sigma cutoff that you need to specify in order to produce the given
clusters (the middle number is the value of the variance explained).

- Re-run the program, this time adding the new sigma cutoff, using
something like "... -dPCA 5 3 298 1.2 tmd.dcd ionized.psf" where 1.2
is the sigma cutoff required to produce the desired number of clusters.
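
If you prefer not to read the table by eye, the lookup can be scripted. The
following is only a minimal sketch in Python, not part of carma: it assumes the
file has exactly the three whitespace-separated columns described above, the
function names are hypothetical, and the sample rows are the illustrative ones
shown in this message.

```python
def parse_variance_explained(text):
    """Parse rows of: <number of clusters> <variance explained> <sigma cutoff>."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        try:
            rows.append((int(fields[0]), float(fields[1]), float(fields[2])))
        except ValueError:
            continue  # skip lines that are not purely numeric
    return rows


def sigma_for_clusters(rows, wanted):
    """Return every sigma cutoff listed for the wanted number of clusters."""
    return [sigma for n_clusters, _, sigma in rows if n_clusters == wanted]


# Illustrative rows copied from the example above (assumed layout).
sample = """1 0.5495 1.50
2 0.5669 1.40
3 0.6515 1.00"""

rows = parse_variance_explained(sample)
print(sigma_for_clusters(rows, 3))
```

For a real run, replace the sample string with the contents of
carma.variance_explained.dat, e.g. open("carma.variance_explained.dat").read().
An empty list means that the file lists no cutoff for that number of clusters.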




--


Nicholas M. Glykos, Department of Molecular Biology
and Genetics, Democritus University of Thrace, University Campus,
Dragana, 68100 Alexandroupolis, Greece, Tel/Fax (office) +302551030620,
Ext.77620, Tel (lab) +302551030615, https://utopia.duth.gr/glykos/

Nicholas M. Glykos

Aug 10, 2019, 9:15:47 AM
to Anahita Khammari, carma-molecu...@googlegroups.com


Please do keep the carma mailing list in the CC of the messages
(see the header of this message), so that other users may benefit from
this discussion.


> I used this command:
> "carma64 -verbose -write -color -segid U -dPCA 5 3 298 0.5 tmd.dcd
> ionized.psf"
> and I want to know:
> 1) I don't have the "carma.variance_explained.dat file". Should the mention
> file produce with this command?


No, not with this command. It is produced by the original (previous) command you used, namely :

carma64 -verbose -write -color -segid U -dPCA 5 3 298 tmd.dcd ionized.psf

Run the command as shown above (without the sigma cutoff), examine the
carma.variance_explained.dat file, and then decide a suitable value for
the sigma cutoff.


> 2) What could be the range of Sigma values? (I used the value of 0.5 for
> Sigma to have 3 clusters. Upper values produced 2 clusters)

See above.


> 3) Which file is the "loading file"? I mean, how could I determine the
> percentage of each dPCA?

What is a 'loading file' ? For the second part, see answer below.


> 4) How could I obtain the proportion of variance plot against its eigenvalue
> rank?

The carma.dPCA.eigenvalues.dat file contains the eigenvalues in descending
order (first eigenvalue ⇒ first eigenvector, ...).
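
The proportion of variance for each rank follows from the standard PCA
definition (each eigenvalue divided by the sum of all eigenvalues). A minimal
sketch in Python is shown below. It is not part of carma, and it makes two
assumptions that you should check against your own file: that the eigenvalue is
the last numeric field on each line, and that the file lists all the
eigenvalues (otherwise the proportions are relative only to those listed). The
function names and the sample values are made up for illustration.

```python
def read_eigenvalues(text):
    """Collect one eigenvalue per line (ASSUMPTION: last field on the line)."""
    values = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        try:
            values.append(float(fields[-1]))
        except ValueError:
            continue  # skip non-numeric lines
    return values


def variance_proportions(eigenvalues):
    """Return (proportion, cumulative proportion) lists, one entry per rank."""
    total = sum(eigenvalues)
    if total <= 0:
        raise ValueError("the eigenvalues must sum to a positive number")
    proportions = [value / total for value in eigenvalues]
    cumulative = []
    running = 0.0
    for p in proportions:
        running += p
        cumulative.append(running)
    return proportions, cumulative


# Made-up eigenvalues in descending order, for illustration only.
sample = "4.0\n3.0\n2.0\n1.0"

eigenvalues = read_eigenvalues(sample)
proportions, cumulative = variance_proportions(eigenvalues)
for rank, (p, c) in enumerate(zip(proportions, cumulative), start=1):
    print(rank, round(p, 4), round(c, 4))
```

Plotting the second column against the first with any plotting program gives
the proportion of variance against eigenvalue rank.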


> and finally, 5) the results in the attachment is true or false?

Verifying the validity or relevance of your analyses is your responsibility
and cannot be shared with others. Having said that, you may want to
consider using a different clustering method. For an example of a different
procedure see

https://norma.mbg.duth.gr/index.php?id=research:howto:distance-matrix-based_clustering_of_pcs