Possibility of forcing assignment of a particular replica to a matching client

Taylor Baird

Feb 17, 2021, 8:45:04 AM
to ipi-users
Hello, I'm just posting as I was wondering if you could help clarify something for me. I'm curious if there is possibly an official way to force i-pi to always dispatch the same replica to the client which was initially assigned that replica?

For some background, I have been using i-pi to introduce nuclear quantum effects (NQEs) in a system which uses orbital-free DFT to propagate the electronic degrees of freedom. As this driver code requires the continuous propagation of these electronic degrees of freedom (the coefficients of the electronic density), it is necessary to be consistent with regard to which replica is passed to a given driver. Otherwise the density coefficients stored in memory by that client code will be incommensurate with the different replica geometry passed to it. Because orbital-free DFT doesn't represent the density as a sum of squared orbitals like KS-DFT does, this inconsistency makes it possible to generate negative densities, breaking the simulation. I have been using the same number of driver instances as there are replicas in the simulation, and for a while i-pi assigns each driver instance its corresponding replica, but there eventually comes a point where there is a mismatch.
At the moment, I'm getting around this by changing a few lines in i-pi's sockets.py file to ensure only match requests are assigned to the clients but I feel like this is a bit of a hacky solution. The only alternative I can see is passing the replica ID to the client upon assignment and loading in the correct density coefficients for that replica. However, I think this would mean storing all these previous density coefficients for each replica unless there is some efficient way to have the different clients pass them amongst themselves. Thanks in advance!

Best wishes,

Taylor

Michele Ceriotti

Feb 17, 2021, 9:19:18 AM
to ipi-users
This is an excellent question. As you noticed, i-PI tries hard to assign the same replica ID to the same client, if at all possible, precisely because this often allows for faster electronic structure calculations. 
I can see two ways around this. One is to make your "hack" clean - which would mean implementing a third "matching mode": as you can see around line 200 in inputs/forcefields.py, you can specify matching="any", which just picks the first client available, and "auto" (the default), which tries to match and only falls back if no matching client is available.
You could have a "match" mode that stops the calculation if it can't match replicas, but I fear it will be tricky to make this fail-proof: the whole idea in the client assignment code was to make it robust to hangups of the clients.

The other possibility is similar to what is implemented in Quantum ESPRESSO: the driver side checks the replica ID, and if the replica is not the same, it restarts the calculation from scratch, paying the price of re-initializing the density/wavefunction. This usually gives you the most resilient setup, although at the price of running a more expensive calculation every now and then.
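A rough sketch of that driver-side check, just to make the idea concrete. It is not taken from Quantum ESPRESSO or from any real driver: `ToyDriver`, `initial_guess` and the "propagation" step are placeholders, and it assumes the driver is told the replica ID with every request.

```python
class ToyDriver:
    """Hypothetical driver that keeps electronic state between calls."""

    def __init__(self):
        self.last_replica = None  # replica handled on the previous call
        self.density = None       # state carried over between calls
        self.cold_starts = 0      # how often we had to start from scratch

    def initial_guess(self, positions):
        # Placeholder for an expensive re-initialization of the density.
        return [0.0 for _ in positions]

    def compute(self, replica_id, positions):
        # Only reuse the stored state if it belongs to the same replica.
        if self.density is None or replica_id != self.last_replica:
            self.density = self.initial_guess(positions)
            self.cold_starts += 1
        # Placeholder "propagation" of the stored state.
        self.density = [0.5 * (d + x) for d, x in zip(self.density, positions)]
        self.last_replica = replica_id
        return sum(self.density)
```

Calling `compute` twice with replica 0 and then once with replica 1 triggers two cold starts: one for the very first call and one for the change of replica.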

Hope this helps. If you want to follow the i-PI feature route, I suggest we move the discussion to GitHub, as this is becoming more of a feature request than something that can be solved on the user side.
Michele

Taylor Baird

Feb 17, 2021, 12:43:00 PM
to ipi-users

Dear Michele, thanks a lot for your help! What you suggest makes sense. I had initially been searching for a means to specify matching="match" in <ffsocket>, or something like that as you say, as it appears that by only allowing the matching modes "None" or "match", i-pi does manage to always pass a given replica to the right client. I'm guessing this is just because polling is allowed to keep going in situations where i-pi would otherwise resort to "any" or "free" assignments. At least in the case of the system I'm looking at, just falling back to a naive guess for the density and starting from scratch in the middle of a run didn't seem to work. If it is possible to implement another matching mode in a clean way, I reckon this would be a nice option for anyone else trying to use a driver code that functions in a similar manner. This sort of matching could be tried before implementing a means of storing the density histories of all clients on the driver side and re-assigning them upon failure to match clients and replicas or upon hanging of a client. Should I open an issue in your GitHub repository?

Best wishes,

Taylor

marian...@gmail.com

Feb 17, 2021, 1:37:57 PM
to ipi-users
Dear Taylor,

I can just add one thing here:

In FHI-aims I also tried this matching at some point, because I wanted to decide when the code should use some experimental wave function extrapolation MDs and when not. Because the extrapolation technique that we were trying at the time did not give a big gain, the matching ended up not saving us any time. But starting from scratch in the middle of a run did work (I really had to call most of the initialization routines again though) -- one just lost time. Of course the orbital-free case is something else (in particular because of the density history that needs to be stored) and it is probably a similar problem to the one we would face for extended-Lagrangian MD, which we are also working on now.

If you want to try and start implementing this we can support you in the task. You are very welcome to contribute directly to the code and that is probably the fastest way for you to try to solve your current issue. You can fork the repository, make a new branch of your own, and try out a few solutions. We are quite available also in a Slack channel that we can invite you to!

Taylor Baird

Feb 17, 2021, 2:59:17 PM
to ipi-users
Dear Mariana, 

thank you for your advice! If it is okay, I'd be more than happy to have a go at implementing this optional forced-matching mode. And I'm grateful for your support and invite. I'll follow your suggestion and fork your repository in that case. My first thought is that the forced matching could be implemented by simply changing the matching attribute's "options" entry in forcefields.py from "options": ["auto", "any"] to "options": ["auto", "any", "match"], and then adding one extra elif in sockets.py's pool_distribute() function: elif self.match_mode == "match": match_seq = ["match", "none"]. It seems to run as desired with these changes. Let me know if you think something along those lines sounds reasonable. Thanks again!
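To make the effect of that extra branch concrete, here is a toy stand-in for the dispatch logic. It is not i-PI's actual pool_distribute() code: the function names are made up, and the fallback order used for "auto" as well as the meaning given to "free" (a client whose previous replica is not currently waiting) are assumptions for illustration only.

```python
def matching_sequence(match_mode):
    """Ordered list of assignment strategies to try for a given mode."""
    if match_mode == "auto":
        return ["match", "none", "free", "any"]  # assumed fallback order
    if match_mode == "any":
        return ["any"]
    if match_mode == "match":  # the proposed forced-matching mode
        return ["match", "none"]
    raise ValueError(f"unknown matching mode: {match_mode!r}")


def accepts(strategy, last_replica, replica, pending):
    """Can a client that last handled last_replica take this replica?"""
    if strategy == "match":
        return last_replica == replica
    if strategy == "none":
        return last_replica is None
    if strategy == "free":
        return last_replica not in pending
    return strategy == "any"


def assign(requests, free_clients, match_mode="auto"):
    """Toy dispatcher.

    requests: replica IDs waiting for a client.
    free_clients: dict of client name -> replica it last handled (or None).
    Returns (assignments, still_waiting).
    """
    pending = list(requests)
    available = dict(free_clients)
    assignments = {}
    for strategy in matching_sequence(match_mode):
        for replica in list(pending):
            chosen = next(
                (name for name, last in available.items()
                 if accepts(strategy, last, replica, pending)),
                None,
            )
            if chosen is not None:
                assignments[replica] = chosen
                del available[chosen]
                pending.remove(replica)
    return assignments, pending
```

With one free client that last handled replica 0 and a pending request for replica 1, `assign([1], {"a": 0}, "auto")` hands replica 1 to client "a", whereas `assign([1], {"a": 0}, "match")` leaves the request waiting until the right client shows up.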

Best wishes,

Taylor

marian...@gmail.com

Feb 17, 2021, 3:05:01 PM
to ipi-users
Sounds like a good try. Make sure to keep the current behavior as the default and set up a way for the user to choose to force the matching. When you are done and have tested it, you can open a PR back to the i-PI repo and it will be reviewed too.

Taylor Baird

Feb 17, 2021, 3:52:09 PM
to ipi-users
Great! Will do - as it stands the default is still "auto" with "match" just being another option that the user could specify as the value of the matching attribute of <ffsocket>. Okay dokey - I'll try and do that asap! Appreciate all the help! 

Best wishes,

Taylor

Michele Ceriotti

Feb 17, 2021, 3:52:21 PM
to ipi-users
What worries me is what happens if one of the clients dies. Then the calculation will be stuck, right? I think in all cases you can't avoid figuring out a way to restart from scratch when needed.
M

Taylor Baird

Feb 18, 2021, 5:49:04 AM
to ipi-users
Hi Michele, yeah, with the changes I'd suggested, as soon as a client hangs the whole run would become stuck, like you say. At least in the case I'm looking at this has never happened, but if a different driver code did encounter this issue I guess there would be no option but to implement a means of loading the correct density histories into an alternative client. Do you still think it would be worth having this optional forced matching mode, or does it kind of go against the desire to have i-pi be as robust as possible to client hang-ups?

Best wishes,

Taylor

marian...@gmail.com

Feb 18, 2021, 5:52:37 AM
to ipi-users
Hi Taylor,

You could try implementing a clean exit in case of a clearly hanging simulation if you use this mode of matching replicas and clients. At least then i-PI would stop after writing out all restart files. I don't think that is too bad, since you could then restart the whole simulation from that point. For the automatic mode, i-PI can just keep going, forgetting about the client that died and using the other clients.
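One possible shape for such a check, as a sketch only: nothing here is existing i-PI code, and `CleanExit`, `enforce_match_timeout`, the `timeout` value and the `write_restart` callback are all hypothetical names. The assumption is that the server tracks, per replica, the time since its request started waiting for a client.

```python
class CleanExit(Exception):
    """Raised to stop the run once restart files have been written."""


def enforce_match_timeout(waiting_since, now, timeout, match_mode, write_restart):
    """Stop cleanly if a replica has waited too long for its own client.

    waiting_since: dict of replica ID -> time its request started waiting.
    write_restart: callable that dumps the restart files (hypothetical hook).
    """
    if match_mode != "match":
        # Other modes can fall back to different clients, so never stop here.
        return []
    stalled = sorted(r for r, t0 in waiting_since.items() if now - t0 > timeout)
    if stalled:
        write_restart()
        raise CleanExit(f"no matching client for replicas {stalled}")
    return stalled
```

The polling loop would call this once per cycle; in "auto" mode it is a no-op, so the default behavior stays as it is.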

Michele Ceriotti

Feb 18, 2021, 6:12:55 AM
to ipi-users
Yeah, what bugs me is that e.g. in a CPMD scenario restarting a calculation from scratch means there is a "glitch" in the electronic kinetic energy, which means that a run will depend (significantly) on its restart history.
Is this the case here? Personally I worry we are opening up a big risk for people to produce wrong results here.
However, if we want to have a "forced match, exit if one client dies" mode, and a clean restart can then be made, I see no real harm.

Taylor Baird

Feb 18, 2021, 6:34:27 AM
to ipi-users
Hi Mariana, 

okay dokey - that sounds like a good idea. I'll have a shot at getting i-pi to carry out a clean exit if a client goes down when using this "match" mode whilst having "auto" continue to try and assign the replica to any free client if matching is not possible.

Taylor Baird

Feb 18, 2021, 6:35:31 AM
to ipi-users
Hi Michele - yeah, in my case trying to restart with a guess for the density (i.e. not using the density from the previous two MD steps) just didn't work anyway. If a driver code was based on CPMD, wouldn't an "any" matching procedure lead to inconsistency in the electronic density propagation anyway, if there came a point where replica<->driver matching wasn't strictly enforced? The forced matching + clean exit idea should be fail-proof, although if clients kept hanging then some automated re-initialization of drivers (with the correct density histories) would probably be desirable. That would be client-side, though.

marian...@gmail.com

Feb 18, 2021, 6:36:39 AM
to ipi-users
If a client of this type crashes without writing its history, the simulation is lost regardless and there is nothing i-PI can do about it, right? That would always be the case.

The i-PI clean exit would have to be handled and also ensured in the client code when running in this mode, such that the clients also do a clean exit writing their own history and restart information. Then one needs to code a restart in which matching to the same clients as before is also ensured.

Taylor Baird

Feb 18, 2021, 7:03:49 AM
to ipi-users
Ah yeah, that's a good point. It's actually a bit trickier than I first thought. As you point out, you'd have to make sure each client also writes out a restart file which itself contains the ID of the replica on which it was working (otherwise yeah the simulation would be lost). And then matching up of drivers and replicas would have to be done again upon restart before proceeding with the computation.
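A minimal sketch of what such client-side restart files could look like, assuming a made-up JSON layout and made-up helper names (`write_client_restart`, `load_client_restarts`). No real driver is known to write this format.

```python
import json
from pathlib import Path


def write_client_restart(path, replica_id, density_history):
    """Save the replica this client was working on plus its density history."""
    state = {"replica_id": replica_id, "density_history": density_history}
    Path(path).write_text(json.dumps(state))


def load_client_restarts(paths):
    """Read all client restart files and index the histories by replica ID.

    Raises ValueError if two files claim the same replica, since the
    replica <-> client matching would then be ambiguous.
    """
    by_replica = {}
    for path in paths:
        state = json.loads(Path(path).read_text())
        replica_id = state["replica_id"]
        if replica_id in by_replica:
            raise ValueError(f"replica {replica_id} appears in more than one restart file")
        by_replica[replica_id] = state["density_history"]
    return by_replica
```

On restart, each driver would pick up the history stored under the replica ID it is going to serve, so the matching is re-established before any forces are computed.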

Michele Ceriotti

Feb 18, 2021, 7:18:56 AM
to ipi-users
I think this needs a bit deeper discussion. Can you move the discussion to a GH issue?

Taylor Baird

Feb 18, 2021, 7:40:25 AM
to ipi-users
Yeah, for sure - will do! 