PDF data quality test and number of allowed fitting parameters


ANUPAM KUMAR SINGH School of Materials Sci. & Tech

Jul 5, 2022, 4:32:20 AM
to diffpy-users

While doing PDF analysis recently, I came up with the two questions below:
(1) I am curious about methods for testing PDF data quality. Is there a standard way to check the quality of collected PDF data (obtained by Fourier transforming high-Q X-ray diffraction data) and confirm that all the features in the PDF come from the sample only?

(2) My second query is given below:
How many fitting parameters can be refined without overfitting when fitting the PDF over a short range (say r = 2 to 6 Å)? For example, if a phase model with 11 fitting parameters gives a better fit than a phase model with 6 fitting parameters over the same short range, how do I decide whether the material really has the 11-parameter phase or whether the improvement is just overfitting caused by the larger number of parameters?

ANUPAM KUMAR SINGH School of Materials Sci. & Tech

Aug 13, 2022, 4:35:08 AM
to diffpy-users
Hello everyone!

It would be very helpful for my research if I could get a few suggestions regarding the questions in my earlier message.

Mikkel Juelsholt

Aug 13, 2022, 2:02:55 PM
to diffpy-users
Hi Anupam 

Those are two good questions that do not have any easy answers. 

1) There is no simple test that tells you whether the data are correct and everything you see corresponds to real signal; it is up to you as the experimenter to figure that out. This is true for all experiments, not just PDF experiments.
However, there are a few tricks you can use. In general, for all measurements, not just PDF, your signal consists of two parts: a part from the material you are interested in and a part coming from your instrument/electronics. A well-calibrated experiment should have full control over the instrument part. For X-ray PDF this includes detector effects (dead pixels, detector gaps, dynamic range), sample alignment, Q-resolution, etc. There are also effects from the Fourier transform, i.e. from your Qmax. In a good experiment you should be able to correct for all of this. The detector needs a good mask and the data should be integrated correctly. Samples should be well aligned. Resolution effects from the instrument can be obtained by measuring a standard and refining Qdamp and Qbroad in PDFgui.

The Fourier transform effects lead to a series of termination ripples, but these can be calculated in PDFgui. If you are unsure whether something is a ripple, simulate a number of PDFs with different Qmax values. Ripples should move and change as Qmax is changed. Structural peaks should not move, though they will get broader at lower Qmax.

Lastly, there is noise. If I suspect something could be a noise effect, I usually calculate the PDF out to very high r, like 200-500 Å, and look at what the noise at the baseline looks like. If the noise is just as intense as the feature I am looking at, then the feature is likely noise. Of course, you also need to apply the correct corrections for fluorescence, Compton scattering, etc. to obtain your PDF.
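A minimal NumPy sketch of the Qmax test and the high-r noise check, assuming a toy F(Q) that corresponds to a single Gaussian pair distance (R0 = 2.5 Å and a width of 0.08 Å are made-up values, not real data). It Fourier transforms F(Q) truncated at two Qmax values and reports where the peak and the first termination ripple sit. The helper names and the 3σ noise threshold are hypothetical and are not part of PDFgui or diffpy.

```python
import numpy as np

# Toy model (an assumption, not real data): one Gaussian pair distance at R0 (Å)
# with width SIGMA. Its reduced structure function is
#     F(Q) = sin(Q*R0) * exp(-(Q*SIGMA)**2 / 2),
# whose sine transform over 0..infinity is a Gaussian peak centred at R0.
R0, SIGMA = 2.5, 0.08


def pdf_from_fq(r, qmax, dq=0.01):
    """G(r) = (2/pi) * integral_0^qmax F(Q) sin(Qr) dQ, by the trapezoidal rule."""
    n = int(round(qmax / dq)) + 1
    q = np.linspace(0.0, qmax, n)
    fq = np.sin(q * R0) * np.exp(-((q * SIGMA) ** 2) / 2)
    weights = np.full(n, q[1] - q[0])
    weights[[0, -1]] *= 0.5  # trapezoid end-point weights
    return (2.0 / np.pi) * (np.sin(np.outer(r, q)) @ (fq * weights))


def peak_and_first_ripple(r, g):
    """Position of the peak near R0 and of the first local minimum to its right."""
    near = np.flatnonzero(np.abs(r - R0) < 0.5)
    i_peak = near[np.argmax(g[near])]
    slope = np.diff(g[i_peak:])
    # first point where the curve stops falling and starts rising again
    k = np.flatnonzero((slope[:-1] < 0) & (slope[1:] >= 0))[0]
    return r[i_peak], r[i_peak + k + 1]


def comparable_to_noise(feature_height, g_high_r, factor=3.0):
    """True if a feature is not clearly above the baseline noise at high r.

    `factor` is an arbitrary, hypothetical threshold; choose your own.
    """
    return feature_height <= factor * np.std(g_high_r)


r = np.arange(1.0, 5.0, 0.002)
positions = {}
for qmax in (15.0, 20.0):
    positions[qmax] = peak_and_first_ripple(r, pdf_from_fq(r, qmax))
    peak, ripple = positions[qmax]
    print(f"Qmax = {qmax:4.1f} 1/A: peak at {peak:.3f} A, first ripple minimum at {ripple:.3f} A")

# Noise check on a synthetic high-r baseline (assumed noise level 0.05)
baseline = 0.05 * np.random.default_rng(0).standard_normal(5000)
print(comparable_to_noise(0.04, baseline), comparable_to_noise(0.5, baseline))
```

The peak should stay at (very close to) 2.5 Å for both Qmax values, while the first ripple minimum shifts when Qmax changes, which is the distinction described above. With real data, `g_high_r` would be your measured G(r) in the 200-500 Å region.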
Then there is the actual signal from your sample. Very broadly, this is composed of air scattering, scattering from the sample container, and scattering from the sample itself. We generally assume that the scattering from the air and the sample container is the same with and without the sample, though this is not always true, so we simply subtract the measurement without the sample from the measurement with the sample. If you are not sure whether you have done your background subtraction correctly, I would advise calculating the PDF of your background, seeing where its peaks are, and using them to guide your background subtraction. For example, if you have an SiO2 container, there will be an Si-O distance at 1.6 Å; adjust your background subtraction until that peak is gone. In general, your background should scale with measurement time: if you measured your sample for 10 min and your background for 5 min, your background should be scaled by 2.
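A minimal sketch of time-scaled background subtraction on raw counts, assuming the sample and background patterns are already on the same Q grid and that air and container scattering are unchanged by the sample (the assumption stated above). The toy count rates and the `extra_scale` knob are hypothetical.

```python
import numpy as np


def subtract_background(i_sample, t_sample, i_background, t_background, extra_scale=1.0):
    """Subtract an empty-container (+ air) measurement from raw sample counts.

    Raw counts grow linearly with exposure time, so the background is scaled by
    t_sample / t_background. `extra_scale` is a hypothetical manual knob for
    fine-tuning, e.g. until the container's Si-O peak at ~1.6 A leaves the PDF.
    """
    scale = extra_scale * t_sample / t_background
    return i_sample - scale * i_background, scale


# Toy count rates per minute on a shared Q grid (made-up shapes, not real data)
q = np.linspace(0.5, 25.0, 500)
sample_rate = 50.0 * np.exp(-q / 10.0)  # scattering from the sample itself
background_rate = 200.0 + 30.0 * np.exp(-(((q - 1.5) / 0.3) ** 2))  # container + air

i_sample = 10.0 * (sample_rate + background_rate)  # 10 min, sample in its container
i_background = 5.0 * background_rate  # 5 min, empty container

corrected, scale = subtract_background(i_sample, 10.0, i_background, 5.0)
print(scale)  # 2.0
print(np.allclose(corrected, 10.0 * sample_rate))  # True

# Under-subtracting by 10% leaves container signal behind in the "sample" data
under, _ = subtract_background(i_sample, 10.0, i_background, 5.0, extra_scale=0.9)
print(np.allclose(under - 10.0 * sample_rate, background_rate))  # True
```

In the under-subtracted case, the leftover is exactly one minute's worth of container and air counts; in a real measurement, that residual is what would show up as a container peak (such as Si-O at 1.6 Å) in the sample PDF.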

2) In principle, you can look at your chi-squared to see if you are overfitting. A good rule of thumb is to not have more parameters than you have peaks (but be aware that there are many situations where you would break this rule). This means that from 2-6 Å you should only have something like 2-4 parameters. In general, a stand-alone fit over a range of only 4 Å should just not be done in the first place. Another good thing to check is whether the extra parameters change anything meaningful: not the actual numbers, but is the 11-parameter model telling you anything new, or does it just give you a better fit?
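A small plain-Python sketch of the bookkeeping behind these checks. Besides counting peaks, it uses the standard Nyquist estimate of the number of independent points in a PDF fit range, Qmax·(rmax - rmin)/π, which follows from G(r) being band-limited by Qmax. That estimate is an addition here, not part of the advice above, and it usually gives a looser bound than the peak count. The function names and the peak threshold are hypothetical.

```python
import math


def independent_points(qmax, rmin, rmax):
    """Nyquist estimate: G(r) is band-limited by Qmax, so samples spaced pi/Qmax apart are ~independent."""
    return qmax * (rmax - rmin) / math.pi


def count_peaks(g, threshold=0.0):
    """Count local maxima of a sampled G(r) that rise above `threshold` (a hypothetical cut-off)."""
    return sum(
        1
        for i in range(1, len(g) - 1)
        if g[i] > g[i - 1] and g[i] >= g[i + 1] and g[i] > threshold
    )


def reduced_chi_square(residuals, sigmas, n_params):
    """chi^2 / (N - P); assumes residuals are taken on a Nyquist grid so that N counts independent points."""
    chi2 = sum((res / s) ** 2 for res, s in zip(residuals, sigmas))
    dof = len(residuals) - n_params
    if dof <= 0:
        raise ValueError("at least as many parameters as independent data points")
    return chi2 / dof


# Hypothetical example: Qmax = 25 1/A, fit range 2-6 A
print(round(independent_points(25.0, 2.0, 6.0), 1))  # 31.8
```

For this assumed Qmax, the 2-6 Å range holds about 32 independent points, so 11 parameters are not ruled out by information content alone; the peak count is the stricter test. Reduced χ² only favours the 11-parameter model if the drop in χ² outweighs the 5 degrees of freedom it gives up.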

Hope this helps! 


ANUPAM KUMAR SINGH School of Materials Sci. & Tech

Aug 14, 2022, 1:27:50 AM
to diffpy-users
Dear Sir,

Thank you very much for the detailed explanation. It is really helpful and has cleared up several of my doubts about PDF data analysis.

With best regards
