Expected behavior of setPairMask

Niklas Thompson

unread,

Feb 13, 2023, 7:14:20 PM2/13/23

to diffpy-users

Hello,

I frequently encounter the use case where I would like to calculate the partial PDF of a large structure (typically a frame from an MD simulation), where I only include a few atom pairs, e.g., the intramolecular pairs from a molecule, while ignoring everything else in its environment (e.g., solvent molecules). In principle, I *feel* like I should be able to do this in one of two ways using the DebyePDFCalculator:

(1) Extract the coordinates of the molecule of interest from the rest of the coordinates, pack these into a Structure() object, and pass that to the calculator.

(2) Pass the calculator all of the coordinates, but then set a suitable pair mask so all of the unwanted pairs are not calculated in the Q-space summation.

However, these two approaches do not yield identical results; I attach an example below where the PDF is calculated from either (1) the isolated structure of a molecule solvated in a box of water, (2) the whole box, but with a pair mask only including the atoms in the molecule. As you can see, there are significant differences between the two approaches. Looking in reciprocal space, the computed F(Q)'s seem to be different only by a smooth polynomial factor, i.e. F_1(Q) = F_2(Q) * P(Q), but I don't understand how this produces the large error in real-space.

The only difference between how these two calculations are performed, to my knowledge, is that the calculator is "aware" of (say) the density of the full system in (2), which maybe is included in some opaque PDF baseline calculation, but intuitively I wouldn't think that this would result in the sharp "peaky" discrepancies between the two calculations.

Is this the expected behavior of setPairMask, or have I uncovered an error? For completeness, below I include a python script and the relevant xyz files to reproduce the attached plot.

Best,

Nik

pairmask_testing.pdf

pairmask_test.py

isolated.xyz

frame.xyz

Niklas Thompson

unread,

Feb 14, 2023, 12:25:52 PM2/14/23

to diffpy-users

Here's the issue in an even simpler system:

------------------------------------------------------------------------------------------------

from diffpy.structure import loadStructure
from diffpy.srreal.pdfcalculator import DebyePDFCalculator

# Create Pt dimer embedded in Cr matrix

embedded = Structure()
embedded.addNewAtom(atype='Pt', xyz=[0, 0, 1])
embedded.addNewAtom(atype='Pt', xyz=[0, 0, -1])
embedded.addNewAtom(atype='Cr', xyz=[3, 0, 0])
embedded.addNewAtom(atype='Cr', xyz=[-3, 0, 0])
embedded.addNewAtom(atype='Cr', xyz=[0, 3, 0])
embedded.addNewAtom(atype='Cr', xyz=[0, -3, 0])

# Replicate just the Pt dimer
molecule = Structure()
molecule.addNewAtom(atype='Pt', xyz=[0, 0, 1])
molecule.addNewAtom(atype='Pt', xyz=[0, 0, -1])

calc = DebyePDFCalculator(qmin=1, qmax=20)
# Large system calculation

r, g_tot = calc(embedded)

# Isolated system calculation
r, g_mol = calc(molecule)

# Calculate all the pairs summing to the large system
calc.setPairMask([0, 1], [0, 1], True, False)
r, g_PtPt = calc(embedded)
calc.setPairMask([0, 1], [2, 3, 4, 5], True, False)
r, g_PtCr = calc(embedded)
calc.setPairMask([2, 3, 4, 5], [2, 3, 4, 5], True, False)
r, g_CrCr = calc(embedded)

------------------------------------------------------------------------------------------------

Plotting the various contributions to the total PDF of the large system, it appears as though setPairMask is doing exactly what I would expect. However, plotting the (normalized) Pt-Pt contribution calculated from the large system or from the isolated system, it appears as though the relative magnitudes of the Fourier truncation errors are different.

In this dummy example, it makes practically no difference. My concern is that, in systems with more complex internal structures and a mixture of light atom and heavy atom scatterers, this discrepancy makes using setPairMask an unreliable tool for dissecting the total scattering in terms of individual contributions (i.e., these errors may be of a similar magnitude as the real pair correlations in the system).

simple_test.pdf

Simon Billinge

unread,

Feb 15, 2023, 3:09:27 AM2/15/23

to diffpy...@googlegroups.com

Thanks for pointing this out Nik. It is a puzzle. I will think about it. Anyone else got any ideas?

--
You received this message because you are subscribed to the Google Groups "diffpy-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to diffpy-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/diffpy-users/be50941b-77fb-4407-bbeb-1de923bc600bn%40googlegroups.com.

--

Simon Billinge

Professor, Columbia University

Niklas Thompson

unread,

Mar 16, 2023, 6:41:50 PM3/16/23

to diffpy-users

In the spirit of solving my own problems, here is what's going on.

The standard (atom normalized) DSE for the single-scattering intensity from coherent, elastic scattering events reads:

I_c(Q) = (S(Q)-1)*<f>^2 + <f^2> = (1/N)*sum_ij(f_j*f_i*sin(Qr_ij)/Qr_ij) + <f^2>

Since the DebyePDFCalculator actually just evaluates F(Q) = (Q*(S(Q)-1)), this reduces to:

F(Q) = (1/N<f>^2)*sum_ij(f_j*f_i*sin(Qr_ij)/r_ij)

Hence, in the simple example above of a Pt dimer embedded in a Cr matrix, what differs between doing a calculation of F_PtPt for the embedded structure versus the structure with just the two Pt atoms is in the normalization factor, 1/N<f>^2, since the two structures differ in both the total atom numbers and the average form factor. If the normalization was *just* the total number of atoms, the resultant partial PDFS would be identical after normalization, since N is just a constant, but since the average form factors are included in the normalization, the difference becomes non-linear as the form factors are Q-dependent.

The attached plot shows the two F_PtPt calculations for the embedded system versus the isolated Pt-Pt system, and the result of renormalizing the isolated calculation by the ratio (2*<f>^2_isolated/6*<f>^2_embedded), which recovers F(Q) for the embedded structure.

But all of this begs the question: What is the proper form factor to use in calculating the scattering from a solute in dilute solution? In the time-resolved X-ray solution scattering community, this issue does not seem to be totally relevant, since they typically only care about difference signals where the average form factors are necessarily the same for the "laser on" and "laser off" data, but in the case of a static measurement on dilute solutions, the correct procedure is less clear to me...