VMC orbital optimization vs multideterminant -- accuracy on water clusters


Sander Vandenhaute

Sep 25, 2023, 4:02:55 PM
to qmcpack
Hi all,

Thanks for all of the documentation and tutorials that have been made over the years. I've recently become interested in performing VMC + FN-DMC on a variety of water cluster geometries. The number of atoms in those clusters would far exceed what is possible with multireference methods, including CIPSI, or even with single-reference CCSD(T). As such, the nodes of my initial trial wavefunction would be determined by a single determinant of DFT orbitals, i.e. they might not be very accurate.

I was wondering to what extent the fixed-node error can be reduced using VMC optimization of the orbital coefficients, as opposed to considering multiple determinants as done in the YouTube tutorial here: https://youtu.be/2H_FWv8SWkY?feature=shared&t=5847. Is backflow still supported in QMCPACK? There are a few unresolved issues, and its latest mention in the changelog dates back to 2018 based on a quick search.
Using multiple determinants does not seem feasible for any reasonably nontrivial number of atoms, and I'm surprised it is so common in QMC, given that the method's main selling point is scalability (which, if I understand correctly, is lost when one has to generate multireference trial wavefunctions with millions of determinants).

Thanks
Sander


Paul R. C. Kent

Sep 25, 2023, 5:40:35 PM
to qmcpack

Thanks for the note, Sander. These are all good questions. Ultimately, the ones about accuracy will simply have to be tested; they are research questions.


Both for QMC and for many-body quantum chemical techniques, I think it is healthiest to think about how hard we have to work to obtain a specific property to a given accuracy. The real-world prefactors and size-scaling of every method differ for each system and property.


There are papers in the literature on clusters up to about the water hexamer, as well as on bulk water, which presumably you have looked at. They focus on QMC energies.


On the functionality-related topics: backflow is fully available up to the last release version. For "real" ab initio calculations, the data on the utility of backflow is quite limited. Potentially importantly, it is also not a route to complete convergence, i.e. to full removal of the nodal error, and it is somewhat costly, which limits interest and applicability. For these reasons backflow is no longer fully available in the development version; there just isn't yet reason to prioritize it over other features that get more use, and we have a lot of development work improving the code for GPUs. Backflow will likely come back once the popular features are improved, since understanding where it is worthwhile would be useful.


We are currently exploring orbital optimization, as are many in the broader QMC community. We are not claiming full production capability yet, but all the pieces should be there and simple optimizations work.

 

Orbital optimization provides a route to the best set of single-particle orbitals for a given form of trial wavefunction (single determinant, pfaffian, etc.). There are not many papers that examine the cost of this technique versus the number of coefficients, atomic number, etc.; this still needs to be explored more broadly. Particularly for systems like water clusters, where we do not expect multideterminants to be strictly necessary, optimizing the orbitals should be an excellent route to high accuracy at reasonable cost. But I don't think anyone knows where the limits are yet; new data is needed.

 

Multideterminants are certainly a convenient route to improving the wavefunction. One possibility is to use large multideterminant wavefunctions to calibrate the error of simpler ones, e.g. to pick the best nodal surface from the available DFT functionals or to check the single-determinant-plus-orbital-optimization results. Whether this is a viable route for you depends on how accurate the results need to be and how much computer budget you have.

 

Ultimately answering these questions will come down to someone trying out these techniques for their systems and properties of interest…

 

I hope this helps.


-- Paul

Sander Vandenhaute

Sep 26, 2023, 10:38:02 AM
to qmcpack
Hi Paul, 

Thanks for the reply. Indeed, I'm going to try to figure out which DFT functional yields the best nodal surface.
To this end, I ran a simple PBE calculation in ORCA (cc-pVDZ basis), converted the .gbw file to Molden format using orca_2mkl, and then used molden2qmc to convert that into an .h5 file. I then generated a simple QMCPACK input which is supposed to optimize the Jastrows.
QMCPACK crashes while reading the .h5 file -- any idea why?

HDF5 read failure in hdf_archive::read /format
Fatal Error. Aborting at Unhandled Exception
MPICH ERROR [Rank 0] [job id 4614716.0] [Tue Sep 26 17:33:00 2023] [nid005005] - Abort(1) (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0


I've attached the molden file, job script, input .xml, and converted .h5. The same happens when I use a different basis, or use the .gbw from a CCSD(T) calculation ...
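For completeness, the conversion steps were roughly as follows. The file stem water is just a placeholder for the actual ORCA BaseName, and the command guard only makes the sketch safe to run on machines without ORCA on the PATH:

```shell
# Placeholder BaseName "water"; substitute your actual ORCA job name.
status_msg="orca_2mkl not on PATH, export skipped"
if command -v orca_2mkl >/dev/null 2>&1; then
    # Export the converged orbitals from water.gbw to Molden format;
    # this writes water.molden.input alongside the .gbw file.
    orca_2mkl water -molden
    status_msg="Molden file exported"
fi
# molden2qmc is typically run interactively and prompts for the
# generating code (ORCA) and the Molden file name; see its README:
#     python3 molden2qmc.py
echo "$status_msg"
```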

Thanks
Sander
orbitals.h5
test.xml
input.molden.input
job.sh

Sander Vandenhaute

Sep 26, 2023, 10:45:32 AM
to qmcpack
Forgot to add: this was QMCPACK 3.17.1, with a cray-hdf5-parallel module, version 1.12.2.1. All deterministic tests were passing, so I'm assuming the .h5 / .xml file is not what it should be ...

Paul R. C. Kent

Sep 26, 2023, 11:35:56 AM
to qmcpack
I agree with your analysis. Some part of one of the formats is likely not quite right or has been updated. If you are OK with doing so, can you please post this as an issue on https://github.com/QMCPACK/qmcpack/ ?

Sander Vandenhaute

Sep 27, 2023, 7:46:19 AM
to qmcpack
I went ahead and used PySCF instead, together with convert4qmc. Specifically, I was looking at a water dimer at the cc-pVDZ / PBE level of theory. For an initial VMC run to estimate the energy, I used the following parameters together with the batched driver:

  <qmc method="vmc" move="pbyp" checkpoint="-1" gpu="yes">
    <estimator name="LocalEnergy" hdf5="no"/>
    <parameter name="walkers_per_rank">48</parameter>
    <parameter name="warmupSteps">100</parameter>
    <parameter name="blocks">100</parameter>
    <parameter name="steps">50</parameter>
    <parameter name="substeps">8</parameter>
    <parameter name="timestep">0.4</parameter>
    <parameter name="usedrift">no</parameter>
  </qmc>

This was executed on a single AMD EPYC node (48 available cores) with 4x MI250X (8 effective GPUs), using 8 MPI ranks with 6 threads each. The compute time per block is about 5 seconds, and it scales linearly with walkers_per_rank. GPU usage according to rocm-smi is at 100% for all 8 GPUs, and CPU usage is about 400-550%, which I guess is OK given the number of threads. I was wondering whether these timings seem reasonable to you. The system only has 20 electrons, so I didn't really expect that 48 walkers per rank would already saturate a single GPU completely... Also, GPU usage is at 100% but the power draw is relatively low, so I don't think we're using the hardware maximally (though this is probably limited by the small number of electrons?).
The output file is attached. Does all of this make sense, or do you think something's up with the installation? 
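As a quick sanity check on the statistics, here is a minimal reblocking sketch for a LocalEnergy trace. The synthetic series below just stands in for the column that would be read from the scalar.dat file (the column layout is given by its header line); in practice the qmca tool shipped with QMCPACK does this analysis properly:

```python
import numpy as np

def blocked_error(x, block_size):
    """Standard error of the mean after averaging x into blocks of block_size."""
    n = len(x) // block_size
    blocks = x[: n * block_size].reshape(n, block_size).mean(axis=1)
    return blocks.std(ddof=1) / np.sqrt(n)

# Synthetic autocorrelated series standing in for the LocalEnergy column.
# Real data: np.loadtxt("Default.s000.scalar.dat", skiprows=1) and select
# the LocalEnergy column by matching the header names on the first line.
rng = np.random.default_rng(0)
noise = rng.normal(size=20000)
energy = -17.2 + np.convolve(noise, np.ones(10) / 10, mode="same")

# The error estimate grows with block size until the blocks exceed the
# autocorrelation time, then plateaus at the true statistical error.
for bs in (1, 10, 100):
    print(f"block size {bs:4d}: error = {blocked_error(energy, bs):.5f}")
```

The naive (block size 1) error underestimates the true uncertainty whenever the samples are serially correlated, which they always are in VMC with small substeps.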

Best wishes,
Sander
output.txt
Default.s000.scalar.dat

Paul R. C. Kent

Oct 1, 2023, 1:46:58 PM
to qmcpack
Sander -

Missed seeing your report earlier.

First, if you want to check the GPU installation, keep an eye on usage during one of the larger performance tests for NiO with "batched" in the name. You have to set up QMC_DATA for the large input files to be available and these tests to be activated. As noted in the manual, these don't measure maximum performance, but they are large enough that GPU usage should be clear. I mention the NiO tests because they use spline basis sets and therefore exercise a fully GPU-accelerated code path. At least in a workstation environment we can very clearly see the different phases of execution of these runs on both NVIDIA and AMD GPUs.

Second, today the only fully accelerated code paths are single- or multideterminant trial wavefunctions using splined orbitals plus Jastrow functions. Other trial wavefunctions will use the CPU for parts of the evaluation and easily create an Amdahl bottleneck. Note that the multideterminant/table method is GPU accelerated as of QMCPACK v3.14.0 from April last year. Definitely more experience is needed, since that first version is aimed at very large determinant expansions. What is not ported to GPUs yet is the Gaussian basis set evaluation (LCAO). This is actually being worked on right now, at least as an initial version to get a sense of how real-world calculations perform. A version will definitely be officially released in QMCPACK v4.0, and should appear in the development version in working state "soon" (originally due yesterday, perhaps now in about a month), but I would caution against expecting any particular speedup. Multideterminant LCAO will immediately work due to the design of the code, so it should then be possible to run and keep the GPUs "warm" for both standard materials and standard molecular calculations.

Third, getting good acceleration out of modern GPUs requires quite a large workload. This is most readily achieved with an electron count >100, preferably many hundreds, as well as a large batch size. Small calculations of only a handful of electrons are very hard to accelerate in a general-purpose code, even with native CUDA on the mature NVIDIA platform, since the algorithms are quite branchy and the datasets small. I believe large-cache CPUs have the potential to do very well for these small calculations, although that performance will collapse as the problem size grows and you run out of L3 cache.

To gauge performance I suggest making a CPU only reference run with 1 walker/core.
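For reference, a sketch of what such a CPU-only block could look like on the same 48-core node, mirroring the input posted earlier; the values are illustrative, and 8 ranks x 6 walkers per rank gives 1 walker per core:

```xml
<qmc method="vmc" move="pbyp" checkpoint="-1" gpu="no">
  <estimator name="LocalEnergy" hdf5="no"/>
  <!-- 8 MPI ranks x 6 walkers/rank = 48 walkers, i.e. 1 per core -->
  <parameter name="walkers_per_rank">6</parameter>
  <parameter name="warmupSteps">100</parameter>
  <parameter name="blocks">100</parameter>
  <parameter name="steps">50</parameter>
  <parameter name="substeps">8</parameter>
  <parameter name="timestep">0.4</parameter>
  <parameter name="usedrift">no</parameter>
</qmc>
```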

Hope this helps.
Paul