Computational Science: What I learned so far

J. S. John

Nov 21, 2009, 1:50:34 PM

to diy...@googlegroups.com

Hey all,
I've been learning how to approach computational biochemistry and chemistry.
From talking with a professor, I learned that it's better to start off
with a knowledge of statistical mechanics.
The other branch of computational chemistry deals with quantum
mechanics; it's not used much in molecular modeling, and it's more
complicated too.
He recommended two books:

Computer Simulation of Liquids; Allen and Tildesley
Understanding Molecular Simulation; Frenkel and Smit

When I asked him about programming, he said to know C or C++, Tcl, Bash
scripting, and Python. I think he also said Perl, but I can't remember.
They use Tcl for much of their modeling.

In terms of mathematics, he said that if you have a good understanding of
calculus, you can pick up the rest along the way.
He recommended:
Mathematical Methods in the Physical Sciences by Mary Boas

I found a good book to introduce me to computational biochemistry. It's
outdated, but you can see the applications and it's not too hard to
read.
An Introduction to Computational Biochemistry by C. Stan Tsai

Have fun

Doug Treadwell

Nov 21, 2009, 2:04:59 PM

to diy...@googlegroups.com

I also recommend checking out the MIT OpenCourseWare materials for anything related to computational chemistry.

My computational chemistry wishlist on Amazon includes these books (http://amzn.com/w/1RRK7HX5N8HM3) and some others that are mixed in with my main list of 1000+ books.

I am also curious to find out how much organic chemistry, physical chemistry, quantum chemistry, and physics in general is required to produce good simulations, and to know enough to optimize them.

- Doug

--

You received this message because you are subscribed to the Google Groups "DIYbio" group.
To post to this group, send email to diy...@googlegroups.com.
To unsubscribe from this group, send email to diybio+un...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/diybio?hl=.

J. S. John

Nov 21, 2009, 2:16:10 PM

to diy...@googlegroups.com

On Sat, Nov 21, 2009 at 2:04 PM, Doug Treadwell
<therealepi...@gmail.com> wrote:

> I also am curious to find out how much organic chemistry, physical
> chemistry, quantum chemistry, and physics in general is required to produce
> good simulations and also to know enough to optimize the simulations.
>

Organic chemistry: I don't think you need to know much. Organic was mostly
about synthesis.
Physical chemistry: thermodynamics and maybe the basics of quantum mechanics. I think
it's good to know both, but I don't know how much quantum is actually used.
Physics: I'm sticking with classical mechanics (Newton's laws).
Quantum physics is much harder to use.
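As a minimal sketch of what "classical mechanics, Newton's laws" means inside a molecular-mechanics code: each pair of atoms feels a force equal to minus the derivative of a potential energy, and F = ma turns that force into an acceleration. The Lennard-Jones pair potential used here is an assumption chosen for illustration (a standard textbook example, not something specified in this thread), in reduced units with epsilon = sigma = 1; the function names are hypothetical.

```python
import math

def lj_potential(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy U(r) = 4*eps*[(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def lj_force(r, epsilon=1.0, sigma=1.0):
    """Radial force F(r) = -dU/dr = (24*eps/r) * [2*(sigma/r)^12 - (sigma/r)^6].

    Positive values push the pair apart (repulsion), negative values pull it together.
    """
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon / r * (2.0 * sr6 * sr6 - sr6)

def acceleration(r, mass=1.0, epsilon=1.0, sigma=1.0):
    """Newton's second law, a = F / m, for the pair force above."""
    return lj_force(r, epsilon, sigma) / mass
```

For example, lj_force(2 ** (1 / 6)) is essentially zero, because that separation is the minimum of the potential, where lj_potential returns -epsilon.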

Doug Treadwell

Nov 21, 2009, 4:28:07 PM

to diy...@googlegroups.com

Quantum physics might be harder, but I wonder what the improvements in accuracy are...

Eugen Leitl

Nov 22, 2009, 4:18:39 PM

to diy...@googlegroups.com

On Sat, Nov 21, 2009 at 11:04:59AM -0800, Doug Treadwell wrote:

> l also recommend checking out the MIT opencourseware materials for anything
> related to computational chemistry.

What are you trying to achieve?

> My computational chemistry wishlist on amazon includes these
> http://amzn.com/w/1RRK7HX5N8HM3 books and some others that are mixed in with
> my main list of 1000+ books.
>
> I also am curious to find out how much organic chemistry, physical
> chemistry, quantum chemistry, and physics in general is required to produce
> good simulations and also to know enough to optimize the simulations.

What exactly are you trying to simulate? Nontrivial (i.e., not light diatomics and triatomics
in the gas phase) reaction dynamics? That is a hard, unsolved problem.

--
Eugen* Leitl <a href="http://leitl.org">leitl</a> http://leitl.org
______________________________________________________________
ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE

J. S. John

Nov 22, 2009, 7:25:18 PM

to diy...@googlegroups.com

On Sat, Nov 21, 2009 at 2:04 PM, Doug Treadwell
<therealepi...@gmail.com> wrote:

> My computational chemistry wishlist on amazon includes these
> http://amzn.com/w/1RRK7HX5N8HM3 books and some others that are mixed in with
> my main list of 1000+ books.

I saw that you have [Computational Chemistry and Molecular Modeling:
Principles and Applications by K. I. Ramachandran et al.] in
your wishlist. That's not a good introductory book. I tried to
read it, but it was too much for me. Doug, what is your level of
knowledge of the subject, and of chemistry in general?

Doug Treadwell

Nov 23, 2009, 5:42:40 PM

to diy...@googlegroups.com

Yes, nontrivial reactions, protein folding, and other biological processes.

Doug Treadwell

Nov 23, 2009, 5:43:18 PM

to diy...@googlegroups.com

Not enough, unfortunately. I've only had a year of chemistry in college. I have been wanting to take more, but computer science and physics are keeping me busy.

Eugen Leitl

Nov 24, 2009, 2:47:39 AM

to diy...@googlegroups.com

On Mon, Nov 23, 2009 at 02:42:40PM -0800, Doug Treadwell wrote:
> Yes, nontrivial reactions,

Is not yet feasible.

Machine-phase is easier, see

Robert A. Freitas Jr., Ralph C. Merkle, “A Minimal Toolset for Positional Diamond Mechanosynthesis,” J. Comput. Theor. Nanosci. 5(May 2008):760-861; http://www.MolecularAssembler.com/Papers/MinToolset.pdf
ABSTRACT. This paper presents the first theoretical quantitative systems level study of a complete suite of reaction pathways for scanning-probe based ultrahigh-vacuum diamond mechanosynthesis (DMS). A minimal toolset is proposed for positionally controlled DMS consisting of three primary tools – the (1) Hydrogen Abstraction (HAbst), (2) Hydrogen Donation (HDon), and (3) Dimer Placement (DimerP) tools – and six auxiliary tools – the (4) Adamantane radical (AdamRad) and (5) Germyladamantane radical (GeRad) handles, the (6) Methylene (Meth), (7) Germylmethylene (GM), and (8) Germylene (Germ) tools, and (9) the Hydrogen Transfer (HTrans) tool which is a simple compound of two existing tools (HAbst+GeRad). Our description of this toolset, the first to exhibit 100% process closure, explicitly specifies all reaction steps and reaction pathologies, also for the first time. The toolset employs three element types (C, Ge, and H) and requires inputs of four feedstock molecules – CH4 and C2H2 as carbon sources, Ge2H6 as the germanium source, and H2 as a hydrogen source. The present work shows that the 9-tooltype toolset can, using only these simple bulk-produced chemical inputs: (1) fabricate all nine tooltypes, including their adamantane handle structures and reactive tool intermediates, starting from a flat passivated diamond surface or an adamantane seed structure; (2) recharge all nine tooltypes after use; and (3) build both clean and hydrogenated molecularly-precise unstrained cubic diamond C(111)/C(110)/C(100) and hexagonal diamond surfaces of process-unlimited size, including some Ge-substituted variants; methylated and ethylated surface structures; handled polyyne, polyacetylene and polyethylene chains of process-unlimited length; and both flat graphene sheet and curved graphene nanotubes. 
Reaction pathways and transition geometries involving 1620 tooltip/workpiece structures were analyzed using Density Functional Theory (DFT) in Gaussian 98 at the B3LYP/6-311+G(2d,p) // B3LYP/3-21G* level of theory to compile 65 Reaction Sequences comprised of 328 reaction steps, 354 unique pathological side reactions and 1321 reported DFT energies. The reactions should exhibit high reliability at 80 K and moderate reliability at 300 K. This toolset provides clear developmental targets for a comprehensive near-term DMS implementation program.
NOTE: First paper to propose a complete set of atomically precise mechanosynthetic reactions for building diamond.

Denis Tarasov, Natalia Akberova, Ekaterina Izotova, Diana Alisheva, Maksim Astafiev, Robert A. Freitas Jr., “Optimal Tooltip Trajectories in a Hydrogen Abstraction Tool Recharge Reaction Sequence for Positionally Controlled Diamond Mechanosynthesis,” J. Comput. Theor. Nanosci. 6(2009). In press.
ABSTRACT. The use of precisely applied mechanical forces to induce site-specific chemical transformations is called positional mechanosynthesis, and diamond is an important early target for achieving mechanosynthesis experimentally. A key step in diamond mechanosynthesis (DMS) employs an ethynyl-based hydrogen abstraction tool (HAbst) for the site-specific mechanical dehydrogenation of H-passivated diamond surfaces, creating an isolated radical site that can accept adatoms via radical-radical coupling in a subsequent positionally controlled reaction step. The abstraction tool, once used (HAbstH), must be recharged by removing the abstracted hydrogen atom from the tooltip, before the tool can be used again. This paper presents the first theoretical study of DMS tool-workpiece operating envelopes and optimal tooltip trajectories for any positionally controlled reaction sequence – and more specifically, one that may be used to recharge a spent hydrogen abstraction tool – during scanning-probe based ultrahigh-vacuum diamond mechanosynthesis. Trajectories were analyzed using Density Functional Theory (DFT) in PC-GAMESS at the B3LYP/6-311G(d,p) // B3LYP/3-21G(2d,p) level of theory. The results of this study help to define equipment and tooltip motion requirements that may be needed to execute the proposed reaction sequence experimentally and provide support for early developmental targets as part of a comprehensive near-term DMS implementation program.
NOTE: First published theoretical study of DMS tool-workpiece operating envelopes and optimal tooltip trajectories for a complete positionally controlled reaction sequence.

> protein folding, and other biological processes.

Also not yet feasible, but for different (forcefield
accuracy) reasons.

Doug Treadwell

Nov 25, 2009, 1:14:29 AM

to diy...@googlegroups.com

You say "not yet feasible", I say "not yet accomplished". I suspect these problems aren't as hard as people think. Maybe I'll be proven wrong, but my experience has been that most people (even educated, skilled people) say "it's impossible" or "it's too hard" and then I go on to solve it rather quickly. I wouldn't put any bets on my not succeeding.

- Doug


Meredith L. Patterson

Nov 25, 2009, 7:54:38 AM

to diy...@googlegroups.com

On Wed, Nov 25, 2009 at 7:14 AM, Doug Treadwell
<therealepi...@gmail.com> wrote:
> You say "not yet feasible", I say "not yet accomplished". I suspect these
> problems aren't as hard as people think. Maybe I'll be proven wrong, but my
> experience has been that most people (even educated, skilled people) say
> "it's impossible" or "it's too hard" and then I go on to solve it rather
> quickly. I wouldn't put any bets on my not succeeding.

Protein folding (at least in simplified lattice models) has been proven NP-complete.

http://www.citeulike.org/user/aleperd/article/3116229 (NP-hardness proof)
http://citeseer.ist.psu.edu/74876.html (equivalence to other
NP-complete problems)

That said, if you can find a polynomial-time algorithm for protein
folding, the Clay Mathematics Institute will be very interested in
talking to you -- there's a $1 million prize for determining whether
P=NP, and a polynomial-time algorithm for solving an NP-complete
problem would answer that question.

Cheers,
--mlp

Brent Neal

Nov 25, 2009, 8:32:29 AM

to diy...@googlegroups.com

>
>> protein folding, and other biological processes.
>
> Also not yet feasible, but also for different (forcefield
> accuracy) reasons.
>
>

Interestingly, this may become feasible in the near future. Right as I
was finishing up my dissertation, my research group started working on
O(N) DFT algorithms and hybridizing them with semiempirical molecular
dynamics for multiscale simulations. I believe that advances in this
area will ultimately lead to a tractable way to solve the protein folding
problem for peptides of arbitrary size.

Note that I don't think this will happen in P, btw. I don't think
we'll ever find P=NP with a Von Neumann computer. But the scheme my
fellow students and I cooked up (N.B., while we were completely
smashed) involved using fast DFT-MD as a part of a simulated annealing
scheme to find the minimum energy state of the protein.

If you're interested, here's one of the papers out of that work:
http://scholar.google.com/scholar?q=doi:10.1016/j.cpc.2005.01.005++++&hl=en&btnG=Search&as_sdt=2001&as_sdtp=on

B

--
Brent Neal, Ph.D.
http://brentn.freeshell.org

Cathal Garvey

Nov 25, 2009, 8:40:12 AM

to diy...@googlegroups.com

A quick note to help "debug" these ideas: Proteins do not always naturally occupy their lowest-energy state. There are a host of specialised systems in the cell that help proteins fold into the shapes they are supposed to be rather than the shape they would naturally take if left to their own devices.

Hence, a program that calculates folding based on the sequence and chemistry of a protein on its own will give some fantastic results for most proteins, but for a few the results will be misleading or totally off the mark. To complete the picture, models would have to include the activity of the protein chaperones and folding systems at work in a given cell.

2009/11/25 Brent Neal <bre...@gmail.com>


--
letters.cunningprojects.com
twitter.com/onetruecathal

Brent Neal

Nov 25, 2009, 8:56:42 AM

to diy...@googlegroups.com

Let me weigh in for just a bit on this as a former student of
molecular modelling and simulation.

Allen and Tildesley is -the- book to get started with. But it will
leave you (at this point) almost 2 decades out of date. You want to
supplement this book with the work of Bruce Berne, Glenn Martyna, and
Mark Tuckerman. Check Google Scholar. The reason is that they
generalize the velocity-Verlet integrator into a fully reversible,
factorable integrator that can (among other things) be used for
multiple-time-scale integration. One of their papers ("Reversible
multiple time scale molecular dynamics," J. Chem. Phys., 1992) shows
explicitly how to accomplish this. You REALLY want to do this, because
the classic methods for ignoring far-field contributions (Coulomb
screening via a decaying exponential) introduce inaccuracies in the
force field (one of the many inaccuracies that Eugen alluded to
earlier). By handling all the far-field effects and updating the far
field every Nth timestep, your error is smaller and well-bounded.
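The multiple-time-step idea Brent describes (often called reversible RESPA, or r-RESPA) can be sketched as a symmetric splitting of velocity Verlet: the slow force gives half-step velocity kicks at the start and end of each outer step, and the fast force is integrated with several small inner velocity-Verlet steps in between. This is a minimal 1D sketch under stated assumptions: the "fast" and "slow" forces are two hypothetical springs standing in for, say, bonded and far-field forces, and none of the names come from any particular MD package.

```python
def respa_step(x, v, mass, f_fast, f_slow, dt_outer, n_inner):
    """One reversible multiple-time-step (RESPA-style) step in 1D.

    Slow force: half kicks of dt_outer/2 at both ends of the step.
    Fast force: n_inner velocity-Verlet substeps of dt_outer/n_inner in between.
    """
    dt_inner = dt_outer / n_inner
    v = v + 0.5 * dt_outer * f_slow(x) / mass       # half kick, slow force
    for _ in range(n_inner):
        v = v + 0.5 * dt_inner * f_fast(x) / mass   # half kick, fast force
        x = x + dt_inner * v                        # drift
        v = v + 0.5 * dt_inner * f_fast(x) / mass   # half kick, fast force
    v = v + 0.5 * dt_outer * f_slow(x) / mass       # half kick, slow force
    return x, v

# Hypothetical test system: a stiff spring (fast) plus a weak spring (slow).
K_FAST, K_SLOW, MASS = 100.0, 1.0, 1.0

def f_fast(x):
    return -K_FAST * x

def f_slow(x):
    return -K_SLOW * x

def energy(x, v):
    """Total energy of the combined two-spring system."""
    return 0.5 * MASS * v * v + 0.5 * (K_FAST + K_SLOW) * x * x
```

Because the step is symmetric, integrating forward, negating the velocity, and integrating forward again returns to the starting point up to rounding error, which is the time reversibility Brent mentions.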

I also do not believe you will ever get good protein folding with a
semi-empirical forcefield. You MAY get there with a reactive
bond-order potential. Falsifying that hypothesis would be a good
master's thesis. (I don't think you'd actually have to show folding to
show that it would work.) The best bond-order potential I know of is
the Brenner potential. I used to understand that potential really
well, but 15 years later, I don't think that's true. You can find it
in Phys. Rev. B 42, 9458 (1990).

Do not ignore DFT. There are several schemes now for O(N) DFT. Further,
my current work in the realm of chemical physics/physical chemistry
has convinced me that for small molecules, keeping track of that
electron density is important. Bond order potentials do this
implicitly, DFT does it slightly more explicitly. Do not trust anyone
who tells you that ab initio and DFT are the same thing. :)

Lastly, remember that all these models do is solve coupled
differential equations. An integrator, whether it's something simple,
like the Runge-Kutta method, something arcane, like the Gear
integrator, or elegant, like the Verlet integrators, is simply an
iterative method to solve those coupled differential equations. If you
don't understand partial differential equations, you are going to
struggle mightily with all of this simulation. When you're stuck, go
back to the basics.
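To make the point about integrators concrete, here is a minimal comparison, assuming a unit harmonic oscillator (k = m = 1) as the test system: forward Euler and velocity Verlet are both iterative solvers for the same equations of motion, but Euler's energy drifts steadily while Verlet's stays bounded. The function names are illustrative.

```python
def euler_step(x, v, force, mass, dt):
    """Forward (explicit) Euler: first-order and not time-reversible."""
    a = force(x) / mass
    return x + dt * v, v + dt * a

def velocity_verlet_step(x, v, force, mass, dt):
    """Velocity Verlet: second-order, time-reversible, and symplectic."""
    a = force(x) / mass
    x_new = x + dt * v + 0.5 * dt * dt * a
    a_new = force(x_new) / mass
    v_new = v + 0.5 * dt * (a + a_new)
    return x_new, v_new

def run(step, n_steps=10_000, dt=0.01):
    """Integrate a unit harmonic oscillator from x=1, v=0; return the final energy."""
    force = lambda x: -x   # F = -k x with k = 1
    x, v = 1.0, 0.0
    for _ in range(n_steps):
        x, v = step(x, v, force, 1.0, dt)
    return 0.5 * v * v + 0.5 * x * x
```

With dt = 0.01 and 10,000 steps, forward Euler multiplies the oscillator's energy by exactly (1 + dt^2) each step, so the energy ends near 1.36 instead of 0.5; velocity Verlet stays close to 0.5.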

I hope all this is actually helpful and not either pedantic or
discouraging. When I started doing molecular dynamics in the mid-90s,
it was the fringe science of physics. Now, it's gotten so commonplace
that it's hard to find conference symposia that are dedicated to it
anymore. Hopefully everyone here who has an interest will be able to
get up to speed quickly and at least have a lot of fun watching their
test systems evolve!

Brent

Brent Neal

Nov 25, 2009, 9:03:32 AM

to diy...@googlegroups.com

On Wed, Nov 25, 2009 at 08:40, Cathal Garvey <cathal...@gmail.com> wrote:
> A quick note to help "debug" these ideas: Proteins do not always naturally
> occupy their lowest-energy state. There are a host of specialised systems in
> the cell that help proteins fold into the shapes they are supposed to be
> rather than the shape they would naturally take if left to their own
> devices.
>
> Hence, a program that calculates folding based on the sequence and chemistry
> of a protein on its own will give some fantastic results for most proteins,
> but for a few the results will be misleading or totally off the mark. To
> complete the picture, models would have to include the activity of the
> protein chaperones and folding systems at work in a given cell.
>

That's partially true, I think. The "correct" folded state of a
protein in a particular environment should be an equilibrium state,
should it not? It's part of why we were looking at the problem, since
adding the chaperone molecules necessary to distinguish the in vacuo
equilibrium from the solvated equilibrium pushed it up into the 100M
atom range that made it interesting to us. The scheme we envisioned
involved keeping the phase space trajectory of the protein, then
looking at the energy change over that trajectory, then either
accepting it or rejecting it by adding a bit of KE to the system. We
would thus be biased towards large scale motions that acted quickly to
lower the energy of the system, which we postulated would be
well-correlated with (but not exactly the same as) "good" folding
motions.
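Brent's scheme (run a short trajectory, look at the energy change, accept or reject, inject a bit of kinetic energy) resembles hybrid Monte Carlo combined with a simulated-annealing cooling schedule. The sketch below uses that swapped-in, well-known technique on a hypothetical 1D tilted double-well potential standing in for a protein's energy landscape. The potential, step sizes, and cooling schedule are all assumptions, and this is not the scheme Brent's group actually built.

```python
import math
import random

def potential(x):
    """Hypothetical tilted double well; the deeper minimum is near x = -1.04."""
    return (x * x - 1.0) ** 2 + 0.3 * x

def force(x):
    """F = -dU/dx for the potential above."""
    return -(4.0 * x * (x * x - 1.0) + 0.3)

def short_md(x, v, dt=0.05, n_steps=20):
    """Short velocity-Verlet trajectory with unit mass."""
    a = force(x)
    for _ in range(n_steps):
        x = x + dt * v + 0.5 * dt * dt * a
        a_new = force(x)
        v = v + 0.5 * dt * (a + a_new)
        a = a_new
    return x, v

def anneal(x=1.0, t_start=1.0, t_end=0.01, n_moves=2000, seed=0):
    """Hybrid-Monte-Carlo-style annealing (k_B = 1, unit mass)."""
    rng = random.Random(seed)
    for i in range(n_moves):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (i / (n_moves - 1))
        v = rng.gauss(0.0, math.sqrt(t))        # fresh kinetic energy at temperature t
        h_old = potential(x) + 0.5 * v * v
        x_new, v_new = short_md(x, v)
        h_new = potential(x_new) + 0.5 * v_new * v_new
        # Metropolis test on the total-energy change over the trajectory.
        if h_new <= h_old or rng.random() < math.exp(-(h_new - h_old) / t):
            x = x_new
    return x
```

With slow enough cooling the walker will often settle in the deeper well near x = -1.04, but a given run can also end in the shallower well near x = 0.96; neither outcome is guaranteed.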

Cathal Garvey

Nov 25, 2009, 9:08:52 AM

to diy...@googlegroups.com

I'm afraid you might be mistaking me for an expert! ;)

But, what I'm trying to get across is that there are lots of things that muddy the water between the raw sequence of an amino acid chain and its ultimate structure. Aside from the possible need for chaperones and folding systems, it might contain inteins that cut themselves out before or after folding, it might have a number of disulphide bonds (or might be predicted to contain them but not deliver), etc etc.

Add to this that its ultimate form might be largely irrelevant if it only takes active form upon binding to a substrate or another protein required for function.

By no means am I saying that you can't predict protein form; I'm very much looking forward to a reliable and easy-to-use system for doing just that! But whichever project is successful in doing so will have to be a holistic set of simulations modelling every level of protein folding, including the odd or esoteric bits. Or at least one that can predict from sequence which peptides it cannot accurately predict.

Cathal Garvey

Nov 25, 2009, 9:09:44 AM

to diy...@googlegroups.com

Whoops, that last bit was a little oddly phrased. I'm sure there's a joke in there. I meant a program that could determine which sequences it could predict folding for with close to 100% accuracy, and which it couldn't model.

2009/11/25 Cathal Garvey <cathal...@gmail.com>


Brent Neal

Nov 25, 2009, 9:13:14 AM

to diy...@googlegroups.com

On Wed, Nov 25, 2009 at 09:08, Cathal Garvey <cathal...@gmail.com> wrote:
> I'm afraid you might be mistaking me for an expert! ;)
>
>

Same here!

Your point is well-taken, truly. Biological systems are notoriously
messy. :D I guess my approach was "keep adding complexity to the
solution until your answers look good for a test system." Circa
2000, when I was looking at this, the level of understanding I had was
"add chaperones because the equilibrium state in water isn't the same
as in vacuo." Eventually, the brave soul who solves this problem may
have to add "with a ribosome sitting in an arbitrary orientation right
here." :D :D

B


Nathan McCorkle

Nov 25, 2009, 9:55:08 AM

to diy...@googlegroups.com

Can someone define the acronyms that were used throughout this thread?


--
Nathan McCorkle
Rochester Institute of Technology
College of Science, Biotechnology/Bioinformatics

Meredith L. Patterson

Nov 25, 2009, 10:45:36 AM

to diy...@googlegroups.com

On Wed, Nov 25, 2009 at 3:55 PM, Nathan McCorkle <nmz...@gmail.com> wrote:
> Can someone define the acronyms that were used throughout this thread?

O(n) is what's called "big-O" notation -- it gives an upper bound on
the runtime of an algorithm (usually its worst case) in terms of the
size of the input to that algorithm. O(n) itself is linear runtime,
i.e., the runtime increases in direct proportion to the size of the
input. O(n^2), O(n^2 + n), and so on are what's called polynomial
runtime -- the runtime increases as some polynomial function of the
size of the input. There are other classes, such as exponential
runtime (O(2^n)) and factorial runtime (O(n!)); in principle, any
function of n.

P is the class of all problems that can be solved in polynomial time
or less (e.g., linear, logarithmic).

NP is the class of all problems for which a proposed solution can be
checked for correctness in polynomial time. NP actually means
"nondeterministic polynomial", and there's a more technical definition,
but the one I gave is usually the only one that matters.

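An NP-hard problem is at least as hard as every problem in NP: any
problem in NP can be reduced to it in polynomial time. No
polynomial-time solution is known for any NP-hard problem; the best
known algorithms are typically exponential-time.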

An NP-complete problem is NP-hard and also in NP -- proposed solutions
can be checked in polynomial time, but no polynomial-time algorithm
for finding a solution is known.

The "P=NP" question boils down to "does any NP-complete problem have a
polynomial-time solution?" (If one does, they all do.) It's one of the
great unanswered problems of mathematics/computer science.
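Meredith's distinction between checking a solution and finding one can be illustrated with subset sum (given a list of integers, is there a subset that adds up to a target?), a standard NP-complete problem. The verifier runs in time polynomial in the input, while the naive search below may try all 2^n subsets. This is a minimal illustrative sketch; the function names are hypothetical.

```python
from itertools import combinations

def verify(numbers, target, certificate):
    """Check a proposed solution (a list of distinct indices) in polynomial time."""
    return (len(set(certificate)) == len(certificate)
            and all(0 <= i < len(numbers) for i in certificate)
            and sum(numbers[i] for i in certificate) == target)

def brute_force(numbers, target):
    """Find a solution by trying subsets in order of size: up to 2^n candidates."""
    for size in range(len(numbers) + 1):
        for combo in combinations(range(len(numbers)), size):
            if verify(numbers, target, list(combo)):
                return list(combo)
    return None   # no subset sums to target
```

For [3, 34, 4, 12, 5, 2] and target 9, brute_force returns [2, 4] (4 + 5), and verify confirms that answer in a single pass over it.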

Cheers,
--mlp

Brent Neal

Nov 25, 2009, 1:51:12 PM

to diy...@googlegroups.com

On Wed, Nov 25, 2009 at 09:55, Nathan McCorkle <nmz...@gmail.com> wrote:
> Can someone define the acronyms that were used throughout this thread?
>

DFT = Density functional theory. It's a method in computational
chemistry that approximates the electronic structure of a system using
a functional of the electron probability density instead of the full
many-electron wave function. Walter Kohn shared the 1998 Nobel Prize in
Chemistry for developing it. It's very widely used, even though until
recently, calculations for more than a few hundred atoms were intractable.

KE = Kinetic energy. In MD, adding kinetic energy raises the system's temperature, so it is effectively adding heat.
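In MD terms, "adding a bit of KE" is heating: the instantaneous temperature is read off the kinetic energy through equipartition, KE = (3/2) N k_B T. A minimal sketch, assuming N point particles with 3D velocities in SI units and ignoring constraints and center-of-mass corrections; the function names are illustrative.

```python
K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)

def kinetic_energy(masses, velocities):
    """Total kinetic energy, the sum of (1/2) m |v|^2, for 3D velocities."""
    return sum(0.5 * m * (vx * vx + vy * vy + vz * vz)
               for m, (vx, vy, vz) in zip(masses, velocities))

def instantaneous_temperature(masses, velocities):
    """Equipartition with 3N degrees of freedom: T = 2 KE / (3 N k_B)."""
    n = len(masses)
    return 2.0 * kinetic_energy(masses, velocities) / (3.0 * n * K_B)

def rescale_to(masses, velocities, t_target):
    """Velocity rescaling: T scales as v^2, so multiply every velocity by sqrt(T_target / T)."""
    scale = (t_target / instantaneous_temperature(masses, velocities)) ** 0.5
    return [(vx * scale, vy * scale, vz * scale) for vx, vy, vz in velocities]
```

Rescaling velocities like this is the crudest possible thermostat; it simply shows that adding or removing kinetic energy is how a simulation's temperature is set.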

In addition to Meredith's great exposition on P=NP, I'd like to
explain that simulated annealing, which I referenced earlier, is a
common technique used to "solve" NP-complete problems. (See _Numerical
Recipes_ for an example of using simulated annealing to solve the
Travelling Salesman problem.) Simulated annealing and other stochastic
(Monte Carlo) methods for solving NP problems will always return a
solution. It is not guaranteed, however, that this solution will both
be the global minimum (i.e., the 'correct' solution) and be found in a
bounded length of time. :)
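A minimal sketch of simulated annealing on the Travelling Salesman problem, in the spirit of the Numerical Recipes example Brent mentions but not a transcription of it: the move set (reversing a random segment of the tour), the geometric cooling schedule, and the temperature range are assumptions chosen for illustration, with cities as points in the unit square. It returns the best tour seen, which is usually good but not guaranteed to be optimal.

```python
import math
import random

def tour_length(cities, tour):
    """Length of the closed tour visiting cities (a list of (x, y)) in the given order."""
    return sum(math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def anneal_tsp(cities, t_start=1.0, t_end=1e-3, n_moves=20_000, seed=0):
    rng = random.Random(seed)
    tour = list(range(len(cities)))
    length = tour_length(cities, tour)
    best, best_length = tour[:], length
    for k in range(n_moves):
        t = t_start * (t_end / t_start) ** (k / (n_moves - 1))   # geometric cooling
        # Candidate move: reverse the segment tour[i..j].
        i, j = sorted(rng.sample(range(len(cities)), 2))
        candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
        delta = tour_length(cities, candidate) - length
        # Metropolis rule: always accept improvements, sometimes accept uphill moves.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            tour, length = candidate, length + delta
            if length < best_length:
                best, best_length = tour[:], length
    return best, best_length
```

A typical call is cities = [(random.random(), random.random()) for _ in range(20)] followed by best, length = anneal_tsp(cities). Accepting occasional uphill moves at high temperature is what lets the search escape local minima that a purely greedy search would get stuck in.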

Eugen Leitl

Nov 25, 2009, 2:36:06 PM

to diy...@googlegroups.com

On Wed, Nov 25, 2009 at 09:03:32AM -0500, Brent Neal wrote:

> That's partially true, I think. The "correct" folded state of a
> protein in a particular environment should be an equilibrium state,

Not necessarily. The folding funnel is kinetically controlled. Chaperones
take care of misfolds. What you can reach kinetically is not necessarily
the absolute energetic minimum, the energetic difference that far down
is small, and how do you prove you're in the absolute minimum anyway? Certainly
not numerically.

> should it not? Its part of why we were looking at the problem, since
> adding the chaperone molecules necessary to distinguish the in vacuo
> equilibrium from the solvated equilibrium pushed it up into the 100M

You're talking about equilibria in modeling? Fast folders take µs, slow folders
several ms and longer. So far only very fast, small folders are reachable
with MD.

> atom range that made it interesting to us. The scheme we envisioned
> involved keeping the phase space trajectory of the protein, then
> looking at the energy change over that trajectory, then either
> accepting it or rejecting it by adding a bit of KE to the system. We
> would thus be biased towards large scale motions that acted quickly to
> lower the energy of the system, which we postulated would be
> well-correlated with (but not exactly the same as) "good" folding
> motions.

That does not make much sense. How much of protein folding literature
did you study? What kind of modelling environment did you use, on
what particular system and using which parameters?

Eugen Leitl

Nov 25, 2009, 2:57:05 PM

to diy...@googlegroups.com

On Wed, Nov 25, 2009 at 08:56:42AM -0500, Brent Neal wrote:

> Let me weigh in for just a bit on this as a former student of
> molecular modelling and simulation.

One of my hats resembles that remark.

> I also do not believe you will ever get good protein folding with a
> semi-empirical forcefield. You MAY get their with a reactive

Meh. You can build arbitrarily complex forcefields which are all
knobs, so there's no telling how far these would go. One way to
build such black-box forcefields is to take really good structures
from the PDB as a training set (reserving some as controls), randomly
and minimally distort them, and then optimize the forcefields
evolutionarily, using as the fitness function how well each forcefield
regenerates the original structures. I'm not sure whether anybody has
done this yet; I haven't touched the literature in a decade or so.

> bond-order potential. Falsifying that hypothesis would be a good
> master's thesis. (I don't think you'd actually have to show folding to

Hey, Minsky thought machine vision was just a Ph.D. problem.

> show that it would work.) The best bond-order potential I know of is
> the Brenner potential. I used to understand that potential really
> well, but 15 years later, I don't think that's true. You can find it
> in Phys Rev B 42, 9458 (1990)

Machine-phase people like Brenner's.

> Do not ignore DFT. There are several schema now for O(N) DFT. Further,
> my current work in the realm of chemical physics/physical chemistry
> has convinced me that for small molecules, keeping track of that
> electron density is important. Bond order potentials do this
> implicitly, DFT does it slightly more explicitly. Do not trust anyone
> who tells you that ab initio and DFT are the same thing. :)

They are not. There's a continuum between low and high levels of
theory, though. High levels of theory typically scale
abysmally, and even those codes do a lousy job, e.g., for heavy atoms,
which have relativistic effects.
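To put rough numbers on how the cost of a calculation grows with the level of theory, the sketch below uses commonly quoted formal scaling exponents (cost proportional to N^p for N basis functions). These are textbook rules of thumb, not benchmarks: real codes often do better through integral screening and locality, and the dictionary and function names are illustrative.

```python
# Commonly quoted formal scaling exponents p, with cost ~ N^p.
SCALING = {
    "linear-scaling (O(N)) DFT": 1,
    "conventional Kohn-Sham DFT": 3,
    "Hartree-Fock": 4,
    "MP2": 5,
    "CCSD": 6,
    "CCSD(T)": 7,
}

def cost_factor(exponent, size_ratio):
    """How many times longer a calculation takes when the system grows by size_ratio."""
    return size_ratio ** exponent

for method, p in SCALING.items():
    print(f"{method:28s} doubling N costs x{cost_factor(p, 2)}")
```

Doubling the system multiplies the cost by 2 for linear-scaling DFT but by 128 for CCSD(T), which is why O(N) methods matter so much for anything protein-sized.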

> Lastly, remember that all these models do is solve coupled
> differential equations. An integrator, whether its something simple,
> like the Runge-Kutta method, something arcane, like the Gear
> integrator, or elegant, like the Verlet integrators, is simply an
> iterative method to solve those coupled differential equations. If you
> don't understand partial differential equations, you are going to
> struggle mightily with all of this simulation. When you're stuck, go
> back to the basics.

I think protein folding could profit from a discrete physics approach,
where volume is cut up in voxels with enough state describing everything
in that voxel. This could be mapped directly to 3D node-lattice hardware,
so the simulation would run as fast as your hardware implementation allows,
which could let you model ~fs of simulated time per ~ns of wall-clock time. Such
hardware would be able to do protein folding within reasonable times,
given that it's only 10^6 times slower than real time.
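Eugen's figures can be checked with back-of-the-envelope arithmetic: ~1 fs of simulated time per ~1 ns of wall clock is a slowdown of 10^6, so a microsecond-scale fast folder would take about a second and a millisecond-scale slow folder a few thousand seconds. The event durations below (1 µs and 5 ms) are illustrative values chosen from the timescales quoted in this thread, and the function names are hypothetical.

```python
def slowdown(sim_seconds_per_tick, wall_seconds_per_tick):
    """How many times slower than real time a simulation runs."""
    return wall_seconds_per_tick / sim_seconds_per_tick

def wall_clock_seconds(event_seconds, factor):
    """Wall-clock seconds needed to simulate an event of the given duration."""
    return event_seconds * factor

# ~1 fs of simulated time per ~1 ns of wall-clock time.
factor = slowdown(1e-15, 1e-9)

# Illustrative durations: a fast folder (~1 µs) and a slow folder (~5 ms).
for label, duration in [("fast folder, 1 µs", 1e-6), ("slow folder, 5 ms", 5e-3)]:
    print(f"{label}: about {wall_clock_seconds(duration, factor):,.0f} s of wall clock")
```

At this rate the 5 ms folder needs about 5,000 s, roughly an hour and a half, which is what makes "reasonable times" plausible for such hardware.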

> I hope all this is actually helpful and not either pedantic or
> discouraging. When I started doing molecular dynamics in the mid-90s,
> it was the fringe science of physics. Now, its gotten so commonplace

You're not doing the field justice. Of course the field overpromised
and underdelivered, which is why pharma nowadays takes a very dim view
of virtual screening. Can't really blame them for that.

> that its hard to find conference symposia that are dedicated to it
> anymore. Hopefully everyone here who has an interest will be able to
> get up to speed quickly and at least have a lot of fun watching their
> test systems evolve!

The machine phase people certainly could use some help. Protein folding
certainly has enough people working on it: http://predictioncenter.org/

Brent Neal

Nov 25, 2009, 3:11:01 PM

to diy...@googlegroups.com

On Wed, Nov 25, 2009 at 14:36, Eugen Leitl <eu...@leitl.org> wrote:
> On Wed, Nov 25, 2009 at 09:03:32AM -0500, Brent Neal wrote:
>
>> That's partially true, I think. The "correct" folded state of a
>> protein in a particular environment should be an equilibrium state,
>
> Not necessarily. The folding funnel is kinetically controlled. Chaperones
> take care of misfolds. What you can reach kinetically is not necessarily
> the absolute energetic minimum, the energetic difference that far down
> is small, and how do you prove you're in the absolute minimum anyway? Certainly
> not numerically.

Remember, SA doesn't get you to an absolute minimum; it only gets you
to a local minimum, hence my choice of SA for the problem.

>
>> should it not? Its part of why we were looking at the problem, since
>> adding the chaperone molecules necessary to distinguish the in vacuo
>> equilibrium from the solvated equilibrium pushed it up into the 100M
>
> You're talking about equilibria in modeling? Fast folders take µs, slow folders
> several ms and longer. So far only very fast, small folders are reachable
> with MD.

There are several techniques for forcing systems to equilibrium in
shorter times. With the stochastic scheme, however, what you're doing
is performing a short MD calculation, then using SA to accept or
reject the move, which (theoretically) would have the effect of a
"quench."

And 12 years ago, I was doing MD simulations that ran for 100s of
nanoseconds. I'm quite sure ms are in range with modern computers.

>
>> atom range that made it interesting to us. The scheme we envisioned
>> involved keeping the phase space trajectory of the protein, then
>> looking at the energy change over that trajectory, then either
>> accepting it or rejecting it by adding a bit of KE to the system. We
>> would thus be biased towards large scale motions that acted quickly to
>> lower the energy of the system, which we postulated would be
>> well-correlated with (but not exactly the same as) "good" folding
>> motions.
>
> That does not make much sense. How much of protein folding literature
> did you study? What kind of modelling environment did you use, on
> what particular system and using which parameters?
>

Almost none, because at that time, that's how much of it there was, at
least with respect to MD. :) Remember - this was a slightly crackpot
theory of drunken grad students who were being paid to do other things
- in my case, nanoindentation of GaAs. :)

Brent Neal

Nov 25, 2009, 3:44:53 PM11/25/09

to diy...@googlegroups.com

On Wed, Nov 25, 2009 at 14:57, Eugen Leitl <eu...@leitl.org> wrote:
> On Wed, Nov 25, 2009 at 08:56:42AM -0500, Brent Neal wrote:
>
>> Let me weigh in for just a bit on this as a former student of
>> molecular modelling and simulation.
>
> One of my hats resembles that remark.
>
>> I also do not believe you will ever get good protein folding with a
>> semi-empirical forcefield. You MAY get there with a reactive
>
> Meh. You can build arbitrarily complex forcefields which are all
> knobs, so there's no telling how far these would go. One way to
> build such black box forcefields is to take really good structures
> from PDB as a training set (reserving some as controls), randomly
> minimally distort these, and then optimize the forcefields
> evolutionarily, using as the fitness function how well each forcefield
> regenerates the original structure. I'm not sure whether anybody has
> done this yet; I haven't touched the literature in a decade or so.

The problem is that these arbitrarily complex forcefields are subject
to overdetermination. And frankly, 99.9% of forcefields are utter
crap. We always built our own fields so that we could show their
accuracy and precision with real world data (i.e., comparing our
resulting structures against neutron scattering data) and we always
used our own integrators, so that we could incorporate the nice
touches that crap packages like AMBER and Cerius2 never would touch,
like multiple time scale MD and FMM (fast multipole method). I still give the guys at
Accelrys a hard time at conferences about not having a decent
integrator in their software. :)
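Multiple time scale MD can be sketched with an r-RESPA-style splitting: the cheap, fast force is integrated with a small inner step, while the expensive, slow force (in real codes, long-range electrostatics, possibly evaluated with FMM) is applied as half-kicks at the outer step. The two harmonic springs below are a hypothetical toy system chosen so the total energy is easy to check; none of this is taken from AMBER, Cerius2, or the group's own integrators.

```python
def respa_step(x, v, dt_outer, n_inner, fast_force, slow_force, mass=1.0):
    """One r-RESPA outer step: slow half-kick, n_inner fast velocity-Verlet steps, slow half-kick."""
    v += 0.5 * dt_outer * slow_force(x) / mass
    dt_inner = dt_outer / n_inner
    for _ in range(n_inner):
        v += 0.5 * dt_inner * fast_force(x) / mass
        x += dt_inner * v
        v += 0.5 * dt_inner * fast_force(x) / mass
    v += 0.5 * dt_outer * slow_force(x) / mass
    return x, v


# Hypothetical toy system: a stiff "bond" spring (fast) plus a weak spring (slow).
K_FAST, K_SLOW = 100.0, 1.0


def fast_force(x):
    return -K_FAST * x


def slow_force(x):
    return -K_SLOW * x


def total_energy(x, v):
    return 0.5 * v * v + 0.5 * (K_FAST + K_SLOW) * x * x


def run(n_outer=1000, dt_outer=0.1, n_inner=10):
    """Integrate from x = 1, v = 0 and return (initial energy, final energy)."""
    x, v = 1.0, 0.0
    e_start = total_energy(x, v)
    for _ in range(n_outer):
        x, v = respa_step(x, v, dt_outer, n_inner, fast_force, slow_force)
    return e_start, total_energy(x, v)


e0, e1 = run()
print(f"relative energy change: {abs(e1 - e0) / e0:.2e}")
```

The slow force is evaluated once per outer step instead of ten times, which is where the savings come from when it dominates the cost; the outer step still has to stay clear of resonances with the fast motion.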

Your approach for building a field from PDB is a good start, but such
forcefields can mispredict structures in macromolecular assemblies or
gels of the molecule, especially when the systems are under strain.
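Eugen's train-on-PDB idea can be reduced to a toy to show its moving parts. In this sketch, assumed purely for illustration, each "structure" is a single hypothetical dimer separation, the forcefield is a 12-6 Lennard-Jones pair with one free parameter sigma, each structure is randomly distorted and relaxed by clipped steepest descent, and a simple mutation-and-selection loop keeps the sigmas whose relaxed structures land closest to the originals. Held-out structures act as the controls. A real forcefield has vastly more knobs, which is exactly where the problems raised above come in.

```python
import random


def lj_dvdr(r, sigma, eps=1.0):
    """dV/dr for the 12-6 Lennard-Jones pair V(r) = 4 eps [(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (6.0 * sr6 - 12.0 * sr6 * sr6) / r


def relax(r, sigma, steps=300, lr=0.01, max_step=0.05):
    """Steepest descent on the separation r, with the step size clipped for stability."""
    for _ in range(steps):
        step = -lr * lj_dvdr(r, sigma)
        r += max(-max_step, min(max_step, step))
    return r


def score(sigma, structures, rng, distortion=0.05):
    """Mean squared deviation of relaxed structures from the originals (lower is fitter)."""
    total = 0.0
    for r_ref in structures:
        r_start = r_ref + rng.uniform(-distortion, distortion)  # random minimal distortion
        total += (relax(r_start, sigma) - r_ref) ** 2
    return total / len(structures)


def evolve_sigma(training, pop_size=12, generations=40, seed=0):
    """Truncation selection plus Gaussian mutation over the single parameter sigma."""
    rng = random.Random(seed)
    population = [rng.uniform(0.8, 1.5) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda s: score(s, training, rng))
        elite = population[: pop_size // 3]
        children = [max(0.5, rng.choice(elite) + rng.gauss(0.0, 0.02))
                    for _ in range(pop_size - len(elite))]
        population = elite + children
    return min(population, key=lambda s: score(s, training, rng))


TRAINING = [1.12, 1.13, 1.11, 1.125]  # hypothetical reference separations
CONTROLS = [1.118]                     # hypothetical held-out structures

best_sigma = evolve_sigma(TRAINING)
control_error = score(best_sigma, CONTROLS, random.Random(1))
print(best_sigma, control_error)
```

Since a relaxed LJ dimer sits at 2^(1/6) sigma, the evolved sigma should come out near the mean training separation divided by 2^(1/6), roughly 1.0 here.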

Building a truly good forcefield takes years and by truly good, I mean
suitable for use on not only non-trivial problems, but also problems
that are nearly intractable with today's technology*. Nowadays, I
think it's easier just to let the physics of the environment drive
system evolution when you can, especially now that DFT is
computationally much easier.

(* As a note, it wasn't that long ago (20 years-ish) that a single
potential that would correctly model silicon and silica was considered
intractable. Then, we got the Stillinger-Weber potential and hordes of
computer jocks could get funding from Sematech on modelling
deformation states in Si-SiO2 multilayer structures. :D)

>
>> bond-order potential. Falsifying that hypothesis would be a good
>> master's thesis. (I don't think you'd actually have to show folding to
>
> Hey, Minsky thought machine vision was just a Ph.D. problem.
>
>> show that it would work.) The best bond-order potential I know of is
>> the Brenner potential. I used to understand that potential really
>> well, but 15 years later, I don't think that's true. You can find it
>> in Phys Rev B 42, 9458 (1990)
>
> Machine-phase people like Brenner's.

Oh yeah, totally. Primarily because Brenner made his name by
simulating some really funky fullerene structures. (Full disclosure: I
did my senior thesis with Brenner, simulating funky fullerene
structures)
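For readers who haven't met a bond-order potential: in the Brenner/Tersoff-style form, each bond's energy is a Morse-like repulsive term minus a bond order b_ij times a Morse-like attractive term, both multiplied by a smooth cutoff. The sketch below assumes that functional form and takes b_ij as a plain input; computing it from coordination and bond angles is the hard part of the real potential and is not shown. The parameter values are illustrative placeholders, not the published parameter set; check both form and constants against Phys Rev B 42, 9458.

```python
import math


def cutoff(r, r1, r2):
    """Smooth switch: 1 below r1, cosine taper between r1 and r2, 0 beyond r2."""
    if r < r1:
        return 1.0
    if r > r2:
        return 0.0
    return 0.5 * (1.0 + math.cos(math.pi * (r - r1) / (r2 - r1)))


def bond_energy(r, b_ij, de, s, beta, re, r1, r2):
    """V_R(r) - b_ij * V_A(r) for a single i-j bond (Brenner/Tersoff-style form)."""
    fc = cutoff(r, r1, r2)
    v_rep = fc * de / (s - 1.0) * math.exp(-math.sqrt(2.0 * s) * beta * (r - re))
    v_att = fc * de * s / (s - 1.0) * math.exp(-math.sqrt(2.0 / s) * beta * (r - re))
    return v_rep - b_ij * v_att


# Illustrative placeholder parameters (energies in eV, lengths in angstroms),
# NOT the published parameter set.
PARAMS = dict(de=6.0, s=1.3, beta=1.5, re=1.3, r1=1.7, r2=2.0)

for b in (1.0, 0.8):
    print(b, bond_energy(PARAMS["re"], b, **PARAMS))
```

With b_ij = 1 the minimum sits at r = re with depth -de; a bond order below 1 (a more crowded atom) weakens the attraction, which is how these potentials let bonds form and break.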

>
>
> You're not doing the field justice. Of course the field overpromised
> and underdelivered, which is why pharma nowadays takes a very dim view
> of virtual screening. Can't really blame them for that.

It is quite probable that I'm overly cynical about the field, which is
why I jumped ship to software development and then to polymer
physics/physical chemistry.

>
> The machine phase people certainly could use some help. Protein folding
> certainly has enough people working on it: http://predictioncenter.org/

The machine phase problems are really interesting, too, for those of
you out there who are interested in pure nanotech as well as the cool
bio stuff we do here. :)

Eugen, it still amazes me that we never ran into each other at any
conferences. :D :D

B

Meredith L. Patterson

Nov 25, 2009, 3:46:34 PM11/25/09

to diy...@googlegroups.com

On Wed, Nov 25, 2009 at 4:45 PM, Meredith L. Patterson
<clon...@gmail.com> wrote:
> P is the class of all algorithms which run in polynomial time or less
> (e.g., linear, logarithmic).
>
> NP is the class of all algorithms for which a possible solution can be
> evaluated for correctness in polynomial time. NP actually means
> "nondeterministic polynomial", and there's a technical definition for
> it but the one I gave is usually the only one that matters.

I misspoke myself; I should have said P is the class of *problems* for
which polynomial-time solution algorithms exist, and NP is the class
of problems for which a possible solution can be evaluated for
correctness in polynomial time.
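The practical gap between "checkable in polynomial time" and "solvable in polynomial time" shows up even in a toy. Subset sum (is there a subset of these integers that adds up to a target?) is a standard NP-complete problem, used here as an assumed example: checking a proposed subset takes one pass over it, while the naive search below may try all 2^n subsets.

```python
from itertools import combinations


def verify_subset_sum(numbers, target, certificate):
    """Polynomial-time verifier: distinct valid indices whose values sum to target."""
    if len(set(certificate)) != len(certificate):
        return False
    if any(i < 0 or i >= len(numbers) for i in certificate):
        return False
    return sum(numbers[i] for i in certificate) == target


def solve_subset_sum(numbers, target):
    """Brute-force search over all 2**n index subsets; returns a certificate or None."""
    for k in range(len(numbers) + 1):
        for combo in combinations(range(len(numbers)), k):
            if sum(numbers[i] for i in combo) == target:
                return list(combo)
    return None


nums = [3, 34, 4, 12, 5, 2]
cert = solve_subset_sum(nums, 9)                # exponential in len(nums) in general
print(cert, verify_subset_sum(nums, 9, cert))   # [2, 4] True
```

Nobody knows a polynomial-time replacement for `solve_subset_sum`; whether one exists is the P vs NP question.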

Cheers,
--mlp

Meredith L. Patterson

Nov 25, 2009, 3:49:17 PM11/25/09

to diy...@googlegroups.com

On Wed, Nov 25, 2009 at 7:51 PM, Brent Neal <bre...@gmail.com> wrote:
> In addition to Meredith's great exposition on P=NP, I'd like to
> explain that simulated annealing, which I referenced earlier, is a
> common technique used to "solve" NP-complete problems.

Compressed annealing is an improvement on this technique which uses
both temperature and pressure as parameters to the heuristic solver.
See http://portal.acm.org/citation.cfm?id=1235116 for a paper on it,
and http://sixdemonbag.org/Djinni/ for a very good library that
implements it.
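What follows is only a loose sketch of the temperature-plus-pressure idea, not the algorithm from the linked paper or the Djinni library; every detail is an assumption. A 0/1 knapsack stands in for the hard problem, the capacity constraint is relaxed into a penalty whose weight (the "pressure") grows geometrically while the temperature falls geometrically, and the best feasible packing seen is kept.

```python
import math
import random


def anneal_knapsack(values, weights, capacity, n_iter=20000, t0=10.0, t_end=0.01,
                    pressure0=0.1, pressure_growth=1.0005, seed=0):
    rng = random.Random(seed)
    n = len(values)
    chosen = [False] * n
    value = weight = 0
    best_value, best_chosen = 0, chosen[:]  # the empty packing is always feasible
    pressure = pressure0
    for k in range(n_iter):
        # Temperature falls geometrically from t0 to t_end.
        temp = t0 * (t_end / t0) ** (k / (n_iter - 1))
        i = rng.randrange(n)  # propose flipping one item in or out
        sign = -1 if chosen[i] else 1
        new_value = value + sign * values[i]
        new_weight = weight + sign * weights[i]
        # Cost to minimize: negative value plus pressure times capacity overflow.
        old_cost = -value + pressure * max(0, weight - capacity)
        new_cost = -new_value + pressure * max(0, new_weight - capacity)
        delta = new_cost - old_cost
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            chosen[i] = not chosen[i]
            value, weight = new_value, new_weight
            if weight <= capacity and value > best_value:
                best_value, best_chosen = value, chosen[:]
        pressure *= pressure_growth  # pressure rises as the run proceeds
    return best_value, best_chosen


best, picks = anneal_knapsack([60, 100, 120], [10, 20, 30], 50)
print(best, picks)
```

Early on, high temperature and low pressure let the search wander through infeasible states; late in the run, the pressure makes violations too expensive to keep.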

Cheers,
--mlp

Eugen Leitl

Nov 25, 2009, 4:24:23 PM11/25/09

to diy...@googlegroups.com

On Wed, Nov 25, 2009 at 03:44:53PM -0500, Brent Neal wrote:

> The problem is that these arbitrarily complex forcefields are subject
> to overdetermination. And frankly, 99.9% of forcefields are utter
> crap. We always built our own fields so that we could show their
> accuracy and precision with real world data (i.e., comparing our
> resulting structures against neutron scattering data) and we always
> used our own integrators, so that we could incorporate the nice
> touches that crap packages like AMBER and Cerius2 never would touch,
> like multiple time scale MD and FMM. I still give the guys at
> Accelrys a hard time at conferences about not having a decent
> integrator in their software. :)

Wow, cool stuff. I never did any more than using off-the-shelf
packages like EGO and NAMD (along with SOLVATE, X-PLOR,
VMD) to model hydrated bilayers and a bit of polylysine
with counterions, on a crap Linux box I built myself in 1998.
My god, given that NAMD supports CUDA and how powerful the
hardware we have today is, people really have it easy these days.
(Uphill. Both ways. Barefoot. In the snow.)

> Your approach for building a field from PDB is a good start, but such
> forcefields can mispredict structures in macromolecular assemblies or
> gels of the molecule, especially when the systems are under strain.

The problem with PDB structures is that these are typically from
crystals (though nowadays probably increasingly from NMR), though
that seems to have surprisingly little impact.

> Building a truly good forcefield takes years and by truly good, I mean
> suitable for use on not only non-trivial problems, but also problems
> that are nearly intractable with today's technology*. Nowadays, I
> think it's easier just to let the physics of the environment drive
> system evolution when you can, especially now that DFT is
> computationally much easier.

How large a system can you do with DFT these days? Parallelizable much?

> Oh yeah, totally. Primarily because Brenner made his name by
> simulating some really funky fullerene structures. (Full disclosure: I
> did my senior thesis with Brenner, simulating funky fullerene
> structures)

Very interesting. Have you worked with Freitas/Merkle/Drexler?

> It is quite probable that I'm overly cynical about the field, which is
> why I jumped ship to software development and then to polymer
> physics/physical chemistry.

Heh, I did polymer science briefly, too. Could actually use some
of it in synthetic ice blocker (heterogeneous and homogeneous ice
nucleation inhibitors) work later on.

> The machine phase problems are really interesting, too, for those of
> you out there who are interested in pure nanotech as well as the cool
> bio stuff we do here. :)

Very much so. There are probably fewer than 10 people working in
machine-phase world-wide right now. Every single person, whether
working computationally or in the lab, will make a difference.

>
> Eugen, it still amazes me that we never ran into each other at any
> conferences. :D :D

Heh, as a semi-amateur dabbler I could never afford going to any.
My only interest in the PFP (protein folding problem) was the inverse PFP, to be able to design
sequences which fold into a desired shape, e.g. for convergent
self-assembly for molecular electronics and such.

It is truly interesting to see what kind of people are drawn to
DIYbio.

Brent Neal

Nov 25, 2009, 6:11:35 PM11/25/09

to diy...@googlegroups.com

On 25 Nov, 2009, at 16:24, Eugen Leitl wrote:

>

>
>> Building a truly good forcefield takes years and by truly good, I mean
>> suitable for use on not only non-trivial problems, but also problems
>> that are nearly intractable with today's technology*. Nowadays, I
>> think it's easier just to let the physics of the environment drive
>> system evolution when you can, especially now that DFT is
>> computationally much easier.
>
> How large a system can you do with DFT these days? Parallelizable much?

I suspect O(10^5) atoms. I know that Fuyuki, our collaborator in those
days, was doing 11k atoms, and a paper I reviewed for the weekly group
meeting on the subject had done 40k. And yeah, all of our algorithms
were highly parallelizable. One of the profs in the group was joint
with CompSci and most of his research was in efficiently
parallelizable solvers for MD and other problems. At the time I left,
our O(N) DFT code had a parallel efficiency of around 95%.
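For reference, a common definition of a parallel efficiency figure like the 95% above is speedup divided by processor count, E = T_1 / (p * T_p). The timings below are made up purely to show the arithmetic.

```python
def speedup(t_serial, t_parallel):
    """S = T_1 / T_p: how many times faster the parallel run is."""
    return t_serial / t_parallel


def parallel_efficiency(t_serial, t_parallel, n_procs):
    """E = S / p = T_1 / (p * T_p); 1.0 would be perfect linear scaling."""
    return speedup(t_serial, t_parallel) / n_procs


# Hypothetical timings: 950 s on one processor, 10 s on 100 processors.
print(parallel_efficiency(950.0, 10.0, 100))  # 0.95
```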

>
>> Oh yeah, totally. Primarily because Brenner made his name by
>> simulating some really funky fullerene structures. (Full disclosure: I
>> did my senior thesis with Brenner, simulating funky fullerene
>> structures)
>
> Very interesting. Have you worked with Freitas/Merkle/Drexler?

Not at all. Working for Don was a senior research thing, and, while
fun, the level of science in the simulation of fullerenes was a LOT
more mature than the level of science in the fullerenes themselves. I
wanted to spend more time doing research in the math.

>
>> It is quite probable that I'm overly cynical about the field, which is
>> why I jumped ship to software development and then to polymer
>> physics/physical chemistry.
>
> Heh, I did polymer science briefly, too. Could actually use some
> of it in synthetic ice blocker (heterogeneous and homogeneous ice
> nucleation inhibitors) work later on.

Cool! Were you working with PEGs?

>
>
>>
>> Eugen, it still amazes me that we never ran into each other at any
>> conferences. :D :D
>
> Heh, as a semi-amateur dabbler I could never afford going to any.
> My only interest in the PFP was the inverse PFP, to be able to design
> sequences which fold into a desired shape, e.g. for convergent
> self-assembly for molecular electronics and such.
>

Ahh. That's a really interesting problem. I interviewed a physicist
for one of our open job positions who was working in molecular
electronics. He'd done some clever Raman scattering studies on a
particularly funky small molecule that I thought would be an
interesting thermally driven flip-flop.

Eugen Leitl

Nov 26, 2009, 8:13:29 AM11/26/09

to diy...@googlegroups.com

On Wed, Nov 25, 2009 at 06:11:35PM -0500, Brent Neal wrote:

> > Heh, I did polymer science briefly, too. Could actually use some
> > of it in synthetic ice blocker (heterogeneous and homogeneous ice
> > nucleation inhibitors) work later on.
>
> Cool! Were you working with PEGs?

We evaluated many systems, PEGs included. The most fertile system
turned out to be short-chained PVA/PVAc copolymer (mostly PVA) and
polyglycerols. We did use PEGs for chilling injury management, since
they prevent cold agglutination. Rabbits can tolerate ridiculous
concentrations of PEG in their bloodstream for days.

Brent Neal

Nov 26, 2009, 11:39:19 PM11/26/09

to diy...@googlegroups.com

On 26 Nov, 2009, at 8:13, Eugen Leitl wrote:

> We evaluated many systems, PEGs included. The most fertile system
> turned out to be short-chained PVA/PVAc copolymer (mostly PVA) and
> polyglycerols. We did use PEGs for chilling injury management, since
> they prevent cold agglutination. Rabbits can tolerate ridiculous
> concentrations of PEG in their bloodstream for days.

This may be a stupid question, but when you say polyglycerol, what was
the other component in the polymer? I'm not aware of any reaction that
will cause two primary alcohols or a primary and a secondary alcohol
to react with one another.

B

--
Brent Neal, Ph.D.
http://brentn.freeshell.org

<bre...@gmail.com>

Eugen Leitl

Nov 29, 2009, 4:11:11 AM11/29/09

to diy...@googlegroups.com

On Thu, Nov 26, 2009 at 11:39:19PM -0500, Brent Neal wrote:

> This may be a stupid question, but when you say polyglycerol, what was
> the other component in the polymer? I'm not aware of any reaction that
> will cause two primary alcohols or a primary and a secondary alcohol
> to react with one another.

No, no. The polyglycerol and the polyvinyl alcohol do not react;
they're just synthetic analogs of http://en.wikipedia.org/wiki/Antifreeze_protein

Specifically, they prevent homogeneous and heterogeneous ice
nucleation, allowing you to supercool a large volume so that
you can vitrify it, when assisted by conventional colligative cryoprotectants
(glycerol, ethanediol, propylene glycol, DMSO and the like).

See http://www.21cm.com/abstracts.jsp for a few publications.

Brent Neal

Nov 29, 2009, 1:32:43 PM11/29/09

to diy...@googlegroups.com

On 29 Nov, 2009, at 4:11, Eugen Leitl wrote:

> On Thu, Nov 26, 2009 at 11:39:19PM -0500, Brent Neal wrote:
>
>> This may be a stupid question, but when you say polyglycerol, what
>> was the other component in the polymer? I'm not aware of any reaction
>> that will cause two primary alcohols or a primary and a secondary
>> alcohol to react with one another.
>
> No, no. The polyglycerol and the polyvinyl alcohol do not react;
> they're just synthetic analogs of http://en.wikipedia.org/wiki/Antifreeze_protein
>
>

You misunderstood the question. Glycerol has two primary alcohol groups
and one secondary. I know of no reaction that would cause those to polymerize.
Polyvinyl alcohol is a different beast altogether. It is made by
radical polymerization of vinyl acetate through the double bond,
followed by hydrolysis of the acetate groups, leaving a pendant
secondary alcohol. Typically, things like polyethylene glycol
are not made by reacting the alcohols, but by ring-opening
polymerization of the epoxide, ethylene oxide (or propylene oxide, in
the case of PPG). EO is not a very fun chemical, due to both its
toxicity and explosive nature. :)
Alternatively, polyols can be polymerized with a co-mer that's more
reactive: polyacids and polyols will give you polyesters, and
polyisocyanates and polyols will give you polyurethanes.

Apparently, polyglycerol is a bit of a misnomer, as it refers to di- or
trimerized glycerol created by dehydration, at least from what I can
tell from Solvay's website. :)
