Interactions stop working in long simulations

52 views
Skip to first unread message

Anthony Trubiano

unread,
Nov 7, 2023, 1:10:46 PM11/7/23
to hoomd-users
Hello HOOMD devs,

I have recently come across this strange issue when running very long simulations with the Langevin integrator.

I am studying the self-assembly of spherical capsid shells out of rigid pentagonal subunit monomers. Each vertex of the subunit is an attractor site that allows them to bind and assemble larger structures. These simulations seem to be working just fine under typical conditions.

However,  I recently wanted to verify some predictions of how the system would approach equilibrium with relatively weak interactions, which takes quite long to reach. I ran long simulations and noticed that in all of them I would get the expected dynamics up until just before 1 billion time steps, at which point the assembled structures would start to break apart. Eventually all of the subunits break apart, and no longer interact to form any sized intermediates for the rest of the simulation. This makes me think something weird is happening to the interaction potential, though I have not noticed subunits overlapping, so the repulsion forces still seem to work?

As a test, I took a snapshot of the system after 300 million time steps and used this as an initial condition for a new simulation. This time, things started breaking apart after about 650million time steps, so around the same total simulation time as the original simulations.

I have not included my code to setup the simulation, as you probably do not want to run a 48 hour simulation to verify this on your end, but I can include it if it would help you at all. I did include a post-processed plot of the simulation data, showing the proportion of monomers (red, starts at 100%) and full shells (orange, starts at 0) as a function of time. The time 10 on the axis corresponds to 1 billion time steps, and you can see the jump in populations there.

Some additional info and thoughts:
I am using HOOMD version 3.9.0 with single precision. My best guess is that there is some accumulation of roundoff error in single precision that becomes critical for some reason after ~1 billion time steps. I am not sure if the absolute position of the particles is being stored as well as its position in the periodic box, but I could imagine this being an issue with such large mean square displacements. I already started another simulation using the double precision version and can comment if this fixes the issue in 2-3 days.

Thank you for your time,
Anthony Trubiano


strange_evidence2.pdf

Joshua Anderson

unread,
Nov 7, 2023, 1:53:53 PM11/7/23
to hoomd...@googlegroups.com
Anthony,

I am not aware of anything in HOOMD's code that would cause differing behavior for timesteps less than 1 billion vs timesteps greater than 1 billion. You can verify this is the case by setting `simulation.timestep = 1_000_000_000` *before* creating the simulation state.

A minimal working example does not need to be your full system. It could be ~10 particles interacting via LJ for example. Without a minimal working example, I can only make random guesses about the inputs you gave hoomd, how hoomd behaved, and the outputs you expect.

I suggest:
* You should plot the box volume, kinetic temperature, pressure, and potential energy as a function of time.
* Verify that the rigid body and pair interaction parameters remain constant.
* Test the simulation with a different thermostat.
* Check if the problem repeats in a minimal example.

I would also guess it possible that the larger assembled structures are metastable and break apart after some rare event.
------
Joshua A. Anderson, Ph.D.
Research Area Specialist, Chemical Engineering, University of Michigan

--
You received this message because you are subscribed to the Google Groups "hoomd-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hoomd-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/hoomd-users/fa99eef2-4523-45bf-b0a5-779e408c14ddn%40googlegroups.com.
<strange_evidence2.pdf>

Anthony Trubiano

unread,
Nov 7, 2023, 2:49:05 PM11/7/23
to hoomd-users
Hi Joshua,

Thank you for the response and suggestions. I indeed verified that the value of simulation.timestep does not affect the simulation behavior.

"I would also guess it possible that the larger assembled structures are metastable and break apart after some rare event."

I also initially thought this, but since it happens at around the same point on dozens of different random seeds, it seems suspect. That plot was averaged over 30 simulations. They also remain as monomers after this point, even though the remaining simulation time is a few times longer than the full capsid assembly timescale.

I am running a few test simulations now, which will log the quantities you suggested, for both single and double precision, and a few different values for the box size and total number of particles, to see how consistent this strange behavior is. I'll report back on the results in a few days. 

All the best,
Anthony

Anthony Trubiano

unread,
Nov 10, 2023, 2:34:14 PM11/10/23
to hoomd-users
So I have some results from these tests. Turns out that our cluster did not have version 3.9.0 installed with double precision, so I only have single precision tests for now.

I tracked each of the 4 quantities you suggested, in simulations with 24, 48, and 100 rigid bodies. The volume remains fixed in each case, and the pressure undergoes small fluctuations about 0, so these seem fine. The potential energy and kinetic temperature behave erratically near the billion timestep mark though. I included plots of these quantities over time. The potential goes from fluctuating about some steady state value back to nearly 0 quite rapidly. The temperature increases steadily to around 60-70, when I have the integrator set to kT=1. The velocities being this large explains why there is no assembly any longer, but I imagine this is a symptom of the problem and not the problem itself.

We were not able to install double precision on our cluster due to some weird c++ dependency issues, but we were able to get 4.3.0 on there with the default mixed precision. I am performing the same set of simulations on this version now, so I can comment on how these go early next week.

Best,
Anthony
long_sim_temp.png
long_sim_small_temp.png
long_sim_potential.png
long_sim_small_potential.png

Joshua Anderson

unread,
Nov 10, 2023, 3:38:43 PM11/10/23
to hoomd...@googlegroups.com
Anthony,

Thanks. It is clear that the thermostat is not working as you intend.

If there is something fundamentally wrong with Langevin, than I would expect a) the problem to go away with another method like Bussi and b) the problem to manifest  in a much smaller test case with a minimal model. This could also achieve 1 billion steps in fewer hours.

If the problem doesn't occur in a minimal test, then the strategy is to add pieces from the full workflow one at a time until you find the one that causes the problem.

Here is my minimal test case:
```
import hoomd                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                        
simulation = hoomd.util.make_example_simulation()                                                                                                                                                                                                                       
                                                                                                                                                                                                                                                                        
lj = hoomd.md.pair.LJ(nlist=hoomd.md.nlist.Cell(buffer=0.4), default_r_cut=2.5)                                                                                                                                                                                         
lj.params[('A', 'A')] = dict(epsilon=1.0, sigma=1.0)                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
langevin = hoomd.md.methods.Langevin(filter=hoomd.filter.All(), kT=1.0)                                                                                                                                                                                                 
simulation.operations.integrator = hoomd.md.Integrator(dt=0.001, forces=[lj], methods=[langevin])                                                                                                                                                                       
thermodynamic_quantities = hoomd.md.compute.ThermodynamicQuantities(filter=hoomd.filter.All())                                                                                                                                                                          
simulation.operations.computes.append(thermodynamic_quantities)                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                        
end_step = 1_300_000_000                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                        
while simulation.timestep < end_step:                                                                                                                                                                                                                                   
    simulation.run(1e6)                                                                                                                                                                                                                                                 
    print(f'Step: {simulation.timestep}, TPS: {simulation.tps}, kT: {thermodynamic_quantities.kinetic_temperature}')     
```

Which continues to thermostat the system correctly at the end of the run:
Step: 1296000000, TPS: 2033094.7157835243, kT: 0.7605570555589607
Step: 1297000000, TPS: 2032297.268186012, kT: 1.4687843601607615
Step: 1298000000, TPS: 2031157.9631547946, kT: 1.0667267743624402
Step: 1299000000, TPS: 2031141.4608781843, kT: 1.8016305882657861
Step: 1300000000, TPS: 2029385.5020699734, kT: 0.699160693949592

Langevin is a very difficult integrator to break. The drag force applied to the velocities can absorb a massive amount of energy and the magnitude of the random force is set to achieve the canonical velocity distribution via the fluctuation-dissipation theorem. It is still possible to have unstable energy production in Langevin simulations when dt is set too large. You can verify your dt selection by performing constant volume simulations with no thermostat starting from some equilibrium configuration. If the system blows up and sends particles flying out of the box, it is too large.
------
Joshua A. Anderson, Ph.D.
Research Area Specialist, Chemical Engineering, University of Michigan
To view this discussion on the web visit https://groups.google.com/d/msgid/hoomd-users/17c06f49-017e-4596-a447-61ccd3b53a31n%40googlegroups.com.
<long_sim_temp.png><long_sim_small_temp.png><long_sim_potential.png><long_sim_small_potential.png>

Anthony Trubiano

unread,
Nov 13, 2023, 11:59:28 AM11/13/23
to hoomd-users
Hi Joshua,

Thanks for taking the time to test out this minimal case for me.

I ran the original system over the weekend using version 4.3.0 and it worked perfectly fine for all 1.25 billion time steps. I am unfortunately in a bit of a time crunch so I do not think I will be able to keep testing for what the issue was in version 3.9.0 now that I can generate results with version 4. I imagine old versions are not updated anyway, so the solution if this issue ever pops up again can be "use version 4".

Thank you for the assistance and all the best to you and the other HOOMD developers,
Anthony

Joshua Anderson

unread,
Nov 20, 2023, 6:08:38 AM11/20/23
to hoomd...@googlegroups.com
Thanks for the update, Anthony.

I can't name a single change between 3.9.0 and 4.0 that might lead to this change in behavior. As you say, the problem is fixed and it is not worth spending the effort to determine the original cause.

------
Joshua A. Anderson, Ph.D.
Research Area Specialist, Chemical Engineering, University of Michigan
Reply all
Reply to author
Forward
0 new messages