Clustering well-tempered simulations

Andrew

unread,

Sep 14, 2011, 6:27:30 PM9/14/11

to PLUMED users

Hello PLUMED users,

I was hoping to perform structure clustering on a well-tempered
metadynamics simulation. The system is biased on 3 CVs and the
clustering would be on the RMSD matrix of a group of atoms using
some agglomerative technique (e.g, single-linkage). From my
understanding of metadynamics, the densities of clusters derived
from the clustering technique will not be equivalent to the free
energies because of the bias in the simulation. Thus, some
reweighing technique is necessary. It appears to be not quite as
simple as the typical reweighing from well-tempered simulations
because the metric comes from comparing two configurations, each
with a different bias. I belive I have a rough of idea of how to
modify the distance metric to incorporate the hill height for
each configuration, but before I go through the trouble of
thinking hard and writing code, I was wondering if there is an
existing paper or if someone else has already gone through the
trouble of developing an algorithm. Any scripts, equations or
advice would be greatly appreciated.

Thanks,
Andrew White
University of Washington

Giovanni Bussi

unread,

Sep 14, 2011, 10:06:25 PM9/14/11

to plumed...@googlegroups.com

Hi Andrew,

the "standard" manner to reweight WT metadynamics is using the
algorithm by Bonomi, Barducci and Parrinello, which is designed to do
histograms on a CV different from what you biased. You can get the
fortran code (written by Max Bonomi) on the website of the plumed
tutorial.

Given this, you can follow the procedure below, which is a bit
involved but works (I used it a few months ago):

* Do the cluster analysis.

* Assign each of the frames in your trajectory to the closest cluster,
so that for each frame you will have the "index of the closest
cluster".

* Make a COLVAR file with the biased coordinates plus the index of the
cluster computed above as a fourth variable

* Do a reweighting choosing the parameters for the histogram on the
fourth CV (the cluster index) in such a manner that you have one
cluster per bin.

You will get as an output the free energy of each cluster. I hope this
is clear enough, if you did not understand something I can give you
more details...

Giovanni

> --
> You received this message because you are subscribed to the Google Groups "PLUMED users" group.
> To post to this group, send email to plumed...@googlegroups.com.
> To unsubscribe from this group, send email to plumed-users...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/plumed-users?hl=en.
>
>

Andrew

unread,

Sep 15, 2011, 12:02:30 PM9/15/11

to PLUMED users

Giovanni,

Thanks for the quick response. I'm a little familiar with the
reweighing technique from Max Bonomi, so I think I can handle
your procedure once I have the clusters or cluster means. The
main issue is with the cluster analysis. I'm not sure the cluster
analysis will give the ideal clusters. In the extreme case,
consider having a one dimensional system. An unbiased MD
simulation and clustering of the trajectory should produce
clusters at the energy minimums for the one dimension. If the
system is biased with well-tempered metadynamics, the trajectory
will eventually stochastically explore the one dimension once the
free energy is converged, at least from my understanding of the
well-tempered technique. Thus, the clustering of these random
configurations would be meaningless, or at least less meaningful
than unbiased MD. As your system increases in dimensions, I think
the clusters should have more to do with the topology of your
molecules and be less of an issue, but there will still be some
convolution with the biasing technique. There is no guarantee
you'll get the set of lowest free energy clusters.

In my case, I'm working with small solute molecules and I'm
afraid the system may be slightly pathological when it comes to
clustering, at least compared to something like a large protein
where topological frustration will be just as important as any
bias. I guess I could just do clustering and see how strange they
are.

Maybe you could comment on an algorithm I was considering? I'll
outline it below:

1. Treat every frame as a potential cluster
2. While the number of clusters is above N
a. For every frame in the trajectory
i. Find the nearest cluster from metric or distance matrix
and add exp(-bH) to the cluster's free energy
End For
b. Remove the highest free energy cluster
End While

Does that make any sense? I think that will guarantee you end up
with the lowest free energy clusters for a given clustering
metric, but perhaps it is overly complicated.

~Andrew

Giovanni Bussi

unread,

Sep 16, 2011, 6:57:11 AM9/16/11

to plumed...@googlegroups.com

Dear Andrew,

I cannot comment on your cluster algorithm... It seems to make sense,
but I am not an expert of that (I just use one of the g_cluster
methods in gromacs).

Still, if you want to try that algorithm with a WT simulation, you
need the weight of each frame. I think this is not feasible with the
"reweight" tool, but you can use the following algorithm, which gives
similar results (Max can comment on this, since he did a lot of
comparisons between his scheme and the one below):

Compute the final fes on a (not too fine) grid using sum_hills.

Assign each frame i to a grid point s_i. You should have several
frames per grid point, otherwise your statistics are meaningless (in
this case, increase the grid spacing).

The weight of each frame is
w_i=exp(-F(s_i)/kT)/N(s_i)

where F(s) is the free energy from the grid and N(s) the number of
frames on that grid point.

Then use w_i as the weigh of each frame to do cluster analysis.

This approach assumes that:
1. the final fes computed from WT metadynamics is correct
2. the orthogonal degrees of freedom are sampled canonically.

Assumption (2) is present also in the Bonomi et al scheme, whereas
assumption (1) is additional.

Giovanni

Giovanni Bussi

unread,

Sep 16, 2011, 7:01:48 AM9/16/11

to plumed...@googlegroups.com

On Fri, Sep 16, 2011 at 7:57 AM, Giovanni Bussi
<giovann...@gmail.com> wrote:
> Then use w_i as the weigh of each frame to do cluster analysis.

A final note on this: if you want to do a cluster analysis consistent
with an unbiased MD you can always "duplicate" frames proportionally
to their weight or devise some scheme to retain/discard each frame
based on its weight, before feeding it into the cluster analysis.

Giovanni

Massimiliano Bonomi

unread,

Sep 16, 2011, 9:55:42 AM9/16/11

to plumed...@googlegroups.com

On Sep 16, 2011, at 12:57 PM, Giovanni Bussi wrote:

> Dear Andrew,
>
> I cannot comment on your cluster algorithm... It seems to make sense,
> but I am not an expert of that (I just use one of the g_cluster
> methods in gromacs).
>
> Still, if you want to try that algorithm with a WT simulation, you
> need the weight of each frame. I think this is not feasible with the
> "reweight" tool, but you can use the following algorithm, which gives
> similar results (Max can comment on this, since he did a lot of
> comparisons between his scheme and the one below):

Hi Andrew!
If you don't want to use the clustering scheme originally proposed by Giovanni
(which has been applied also by others and seems to be working fine
in more than one situation :-)), you can assign a weight to each configuration as described below.
When we developed the WT reweighing algorithm I compared
the two approaches on alanine dipeptide and other model systems.
Results were similar, apart from the fact that the error in the FES as a function of the
non-biased variable was decreasing a little bit faster with the WT reweighing scheme
compared to the recipe below. Furthermore it was less sensitive to the final estimate of the fes
(residual corrugation, stop the simulation in a bad moment...).

Best,
Max

Alessandro Laio

unread,

Sep 17, 2011, 3:20:42 PM9/17/11

to plumed...@googlegroups.com

Dear Andrea, dear Giovanni, I would like to bring to your attention
that, in metadynamics and bias exchange metadynamics, we have addressed
in great details exactly these issues in
http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1000452
The approach described in this article has been already used several
times to construct multidimensional free energy landscapes, also as a
function of variables that are not directly biased .
A key point is performing a clustering in which the distance the
structures involves also the difference in the collective variables. If
one attempts clustering using other metrics, the value of the bias might
be affected by large fluctuations for structures assigned to the same
cluster, making the estimator of the free energy meaningless.
Alessandro

Andrew

unread,

Sep 20, 2011, 11:42:48 AM9/20/11

to PLUMED users

Thanks for all the suggestions! I'll be trying a few of these out to
see what I get.

On Sep 17, 12:20 pm, Alessandro Laio <l...@sissa.it> wrote:
> Dear Andrea, dear Giovanni, I would like to bring to your attention
> that, in metadynamics and bias exchange metadynamics, we have addressed

> in great details exactly these issues inhttp://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pc...

Reply all

Reply to author

Forward