sub sampling factor


Amit Kumar

Oct 14, 2014, 6:28:09 AM
to good-tu...@googlegroups.com
Hi all,

I recently downloaded the Good-Turing method (R script) implemented in carma for evaluating the convergence of my biomolecular simulations. I evaluated my simulations at different simulation times (e.g. 30, 60, 90, 100 and 120 ns), with the trajectory saved every 2 ps. First, I generated carma.RMSD.matrix for the heavy atoms of my protein using a time step of 10 (i.e. one frame every 20 ps).

For instance after 30 ns of simulations I get the following message:

"The maximal RMSDs of the trajectory converged with a  sub-sampling  factor  of     40.  The  analysis
suggests that the  most  different  structure  you should  expect   to  observe  if  you  double  the
simulation  time  will  differ  by  no  more  than approximately 1.5  +-  0.1  Angstrom  (RMSD)  from
those already observed"

My question is: what does the value of the sub-sampling factor mean? Is there an optimal sub-sampling factor which would give a good measure of convergence? Which is better, for instance, 40 or 35?

Could you please let me know?
Thanks,
Amit

Nicholas M Glykos

Oct 14, 2014, 6:51:47 AM
to Amit Kumar, good-tu...@googlegroups.com


> My question is: what does the value of the sub-sampling factor mean? Is
> there an optimal sub-sampling factor which would give a good measure of
> convergence? Which is better, for instance, 40 or 35?

See section 3.3 and Fig.4 of the paper. Due to statistical noise the
curves you will get for different sampling factors will not be identical,
but they should be pretty close. If you have enough memory on your
computer you can try _increasing_ the size of the RMSD matrix until you
get a sampling factor of, say, ~5-10 which should give you somewhat more
accurate estimates.




--


Nicholas M. Glykos, Department of Molecular Biology
and Genetics, Democritus University of Thrace, University Campus,
Dragana, 68100 Alexandroupolis, Greece, Tel/Fax (office) +302551030620,
Ext.77620, Tel (lab) +302551030615, http://utopia.duth.gr/~glykos/

Nicholas M Glykos

Oct 14, 2014, 7:02:05 AM
to Amit Kumar, good-tu...@googlegroups.com


Oops, I made a mistake in my previous message:

> you can try _increasing_ the size of the RMSD matrix until you get a
> sampling factor of, say, ~5-10 which should give you somewhat more
> accurate estimates.

This should have been "_decreasing_" and not increasing. If you repeat the
calculation with a smaller matrix, you can compare the resulting curves
and see if they agree reasonably. If you do the comparison, please do let
me know of your findings.

Amit Kumar

Oct 14, 2014, 8:06:33 AM
to Nicholas M Glykos, good-tu...@googlegroups.com
Thanks for your quick reply.
If I understood correctly, by decreasing the size of the RMSD matrix you mean increasing the time step of the RMSD calculation: for instance, if I previously did the calculation every 20 ps for the 30 ns simulation, I should now increase the time step to, say, 200 ps for the same 30 ns simulation. Did I get it right?
Many thanks,
Amit
--
***********************************************
Amit Kumar PhD
Visiting Scientist
CRS4, Bioengineering group
Room No. B128, Building 1
Science and Technology Park Polaris
Piscina Manna, 09010 Pula (CA). ITALY
Ph:+39-070-9250355

Nicholas M. Glykos

Oct 14, 2014, 10:26:29 AM
to Amit Kumar, good-tu...@googlegroups.com
Yes, that is correct.
--
Sent from a tablet. Don't expect much.
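
The relation between the time step and the size of the RMSD matrix discussed above can be sketched in a few lines of Python. This is only an illustration and not part of carma or GoodTuringMD: the function name and its parameters are hypothetical, and the exact frame count carma produces for a given step may differ by one depending on how the first and last frames are handled.

```python
def rmsd_matrix_size(sim_length_ns, save_interval_ps, stride):
    """Estimate the RMSD matrix dimension for a given frame stride.

    sim_length_ns    : total simulation length in ns
    save_interval_ps : interval at which frames were written, in ps
    stride           : use every stride-th saved frame (hypothetical name)

    Returns (frames_used, effective_step_ps).
    """
    total_frames = (sim_length_ns * 1000) // save_interval_ps
    frames_used = total_frames // stride
    effective_step_ps = save_interval_ps * stride
    return frames_used, effective_step_ps


# A 100 ns trajectory saved every 2 ps, read with three different strides.
for stride in (5, 10, 50):
    n, step = rmsd_matrix_size(100, 2, stride)
    print(f"stride {stride:3d}: one frame every {step} ps, matrix {n} x {n}")
```

A larger stride (longer time step) gives a smaller matrix, and a smaller stride gives a larger one.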

Amit Kumar

Oct 14, 2014, 11:54:07 AM
to Nicholas M. Glykos, good-tu...@googlegroups.com
From your paper I noted that you used time steps between 8 ps (12,500 frames) and 40 ps (2,500 frames) for your 100 ns simulations, with the trajectory saved every 0.4 ps.

In my case, for my 100 ns simulation, I evaluated the RMSD matrix (5000 x 5000) every 20 ps. In this case I get a sub-sampling factor of 24.

(i) On increasing the RMSD matrix (10000 x 10000, every 10 ps), I get a sub-sampling factor of 45.

(ii) On decreasing the RMSD matrix (1000 x 1000), I get the following message:
The maximal RMSDs  between the observed trajectory structures  have  not converged. This implies that the  length  of  the  given  trajectory  does  not
suffice for meaningfully quantifying  convergence. The only comment  that  can safely be made is that upon  doubling  the  simulation  time  you  should
expect  to  observe  structures  that  differ from those already observed by more than  approximately 2.1 Angstrom.


So in my system the lowest value of the sub-sampling factor is obtained when I use the 5000 x 5000 matrix.

Any comments?
Thanks,
Amit

Nikolaos Glykos

Oct 14, 2014, 12:02:37 PM
to Amit Kumar, good-tu...@googlegroups.com


The numbers look sensible (the 1000x1000 matrix is apparently
too coarse to capture the variety of structures present in the
trajectory). The important question is : do the curves obtained
with sampling factors of 24 and 45 look similar (as expected) ?
If yes, I think you are good-to-go.
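
One simple way to put a number on "look similar" is to interpolate both curves onto a common grid and report the largest vertical gap between them. The sketch below is an assumption-laden illustration, not something GoodTuringMD provides: it assumes each curve is available as a list of (x, y) points with increasing x, the function names are hypothetical, and the sample numbers are made up purely to show the call. How small the gap must be is still a judgment call.

```python
from bisect import bisect_right


def interp(x, xs, ys):
    """Linearly interpolate the curve (xs, ys) at x; xs must be increasing."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x)  # xs[i-1] <= x < xs[i]
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def max_curve_difference(curve_a, curve_b, n_points=200):
    """Largest |y_a - y_b| over the x-range shared by both curves."""
    xa, ya = zip(*curve_a)
    xb, yb = zip(*curve_b)
    lo, hi = max(xa[0], xb[0]), min(xa[-1], xb[-1])
    if hi <= lo:
        raise ValueError("the two curves do not overlap in x")
    grid = [lo + (hi - lo) * k / (n_points - 1) for k in range(n_points)]
    return max(abs(interp(x, xa, ya) - interp(x, xb, yb)) for x in grid)


# Made-up points, for illustration only (not results from any simulation).
curve_24 = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
curve_45 = [(0.0, 0.1), (2.0, 2.1)]
print(max_curve_difference(curve_24, curve_45))
```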

Nikolaos Glykos

Oct 14, 2014, 12:39:37 PM
to Amit Kumar, good-tu...@googlegroups.com

> Yes, the curves obtained from sampling factors look similar and super
> impose almost perfectly. 

OK. On second thought, the message GTMD emitted for the coarse matrix
can possibly be improved. If you can, please send me the three .tar files
that GTMD produced for your three runs, in case we learn something from
them.