Bugfix for GeneralizedRadialDistributionFunction.from_preset

38 views
Skip to first unread message

ldwi...@umich.edu

unread,
Jun 21, 2018, 3:58:01 PM6/21/18
to matminer
Hello,

I ran into a bug while testing the GRDF in the mat miner.featurizers.site module. The GeneralizedRadialDistributionFunction.from_preset defines bins using lambda functions in a loop, but they all take the final value instead of being defined with the value of the variable at each step of the loop. The lambda functions need to define a local variable instead of taking the variable from the outer scope to have the proper behavior.

This line of code: lambda d: np.exp(-width * (d - center)**2.)))
should be like this: lambda d, center=center: np.exp(-width * (d - center)**2.)))

and similarly for the lambda in the histogram preset.


However, with this bug fix, the default values for the preset (cutoff=10.0; width=0.5; spacing=0.5) produce the following error for Gaussian functions:

ValueError: Numerical integration does not play well with the chosen bin functionals, choose new ones.


Broader bin values of (cutoff=10.0; width=1.0; spacing=1.0) will run for both Gaussian and Histogram preset functions.

The Histogram preset does sometimes produce values of "inf" for some bins though, and I have not explored the cause of this behavior.


Best,
Logan Williams



Anubhav Jain

unread,
Jun 21, 2018, 7:25:05 PM6/21/18
to ldwi...@umich.edu, matminer, Maxwell Dylla, Saurabh Bajaj
Hi Logan

Thanks for reporting this, I've cc'ed Max Dylla and Saurabh Bajaj who implemented that descriptor.

Or let me know if you just want to submit a pull request for the patch and I will merge it in. That way the patch can be properly attributed to your Github account.

Best,
Anubhav

--
You received this message because you are subscribed to the Google Groups "matminer" group.
To unsubscribe from this group and stop receiving emails from it, send an email to matminer+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/matminer/7ab7862a-ba42-4f26-aac0-97e600b866f8%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


--
Best,
Anubhav

Saurabh Bajaj

unread,
Jun 21, 2018, 7:49:15 PM6/21/18
to Anubhav Jain, ldwi...@umich.edu, matminer, Maxwell Dylla
I will defer this to Max Dylla who I think implemented the latest version of this descriptor(?)

Saurabh



On Thu, Jun 21, 2018 at 4:24 PM, Anubhav Jain <AJ...@lbl.gov> wrote:
Hi Logan

Thanks for reporting this, I've cc'ed Max Dylla and Saurabh Bajaj who implemented that descriptor.

Or let me know if you just want to submit a pull request for the patch and I will merge it in. That way the patch can be properly attributed to your Github account.

Best,
Anubhav

On Thu, Jun 21, 2018 at 12:58 PM <ldwi...@umich.edu> wrote:
Hello,

I ran into a bug while testing the GRDF in the mat miner.featurizers.site module. The GeneralizedRadialDistributionFunction.from_preset defines bins using lambda functions in a loop, but they all take the final value instead of being defined with the value of the variable at each step of the loop. The lambda functions need to define a local variable instead of taking the variable from the outer scope to have the proper behavior.

This line of code: lambda d: np.exp(-width * (d - center)**2.)))
should be like this: lambda d, center=center: np.exp(-width * (d - center)**2.)))

and similarly for the lambda in the histogram preset.


However, with this bug fix, the default values for the preset (cutoff=10.0; width=0.5; spacing=0.5) produce the following error for Gaussian functions:

ValueError: Numerical integration does not play well with the chosen bin functionals, choose new ones.


Broader bin values of (cutoff=10.0; width=1.0; spacing=1.0) will run for both Gaussian and Histogram preset functions.

The Histogram preset does sometimes produce values of "inf" for some bins though, and I have not explored the cause of this behavior.


Best,
Logan Williams



--
You received this message because you are subscribed to the Google Groups "matminer" group.
To unsubscribe from this group and stop receiving emails from it, send an email to matminer+unsubscribe@googlegroups.com.


--
Best,
Anubhav



--

ldwi...@umich.edu

unread,
Jun 25, 2018, 1:15:53 PM6/25/18
to matminer
Hi Saurabh and Anubhav,

I think I have successfully submitted a pull request with the code updates, but I'm new to using Github so I'm not certain. I saw the AngularFourierSeries.from_preset uses the same code and updated those lambda functions as well. The commit fails a test on ci/circleci which I think is from changing the default bin sizes in the GRDF.from_preset and thus having half as many bins as expected.

error snippet:
"
AssertionError:
Items are not equal:
  ACTUAL: 10
  DESIRED: 20
"

Best,
Logan Williams

On Thursday, June 21, 2018 at 5:49:15 PM UTC-6, Saurabh Bajaj wrote:
I will defer this to Max Dylla who I think implemented the latest version of this descriptor(?)

Saurabh


On Thu, Jun 21, 2018 at 4:24 PM, Anubhav Jain <AJ...@lbl.gov> wrote:
Hi Logan

Thanks for reporting this, I've cc'ed Max Dylla and Saurabh Bajaj who implemented that descriptor.

Or let me know if you just want to submit a pull request for the patch and I will merge it in. That way the patch can be properly attributed to your Github account.

Best,
Anubhav

On Thu, Jun 21, 2018 at 12:58 PM <ldwi...@umich.edu> wrote:
Hello,

I ran into a bug while testing the GRDF in the mat miner.featurizers.site module. The GeneralizedRadialDistributionFunction.from_preset defines bins using lambda functions in a loop, but they all take the final value instead of being defined with the value of the variable at each step of the loop. The lambda functions need to define a local variable instead of taking the variable from the outer scope to have the proper behavior.

This line of code: lambda d: np.exp(-width * (d - center)**2.)))
should be like this: lambda d, center=center: np.exp(-width * (d - center)**2.)))

and similarly for the lambda in the histogram preset.


However, with this bug fix, the default values for the preset (cutoff=10.0; width=0.5; spacing=0.5) produce the following error for Gaussian functions:

ValueError: Numerical integration does not play well with the chosen bin functionals, choose new ones.


Broader bin values of (cutoff=10.0; width=1.0; spacing=1.0) will run for both Gaussian and Histogram preset functions.

The Histogram preset does sometimes produce values of "inf" for some bins though, and I have not explored the cause of this behavior.


Best,
Logan Williams



--
You received this message because you are subscribed to the Google Groups "matminer" group.
To unsubscribe from this group and stop receiving emails from it, send an email to matminer+u...@googlegroups.com.


--
Best,
Anubhav

Anubhav Jain

unread,
Jun 25, 2018, 4:16:43 PM6/25/18
to ldwi...@umich.edu, Maxwell Dylla, Maxwell Thomas Dylla, matminer
Hi Logan,

Thanks! I see the PR. 

I do need the tests to pass before merging, and think it's best if we can sort out what's happening with the bins to prevent further problems. Max - can you comment about the binning errors?

Best
Anubhav



For more options, visit https://groups.google.com/d/optout.


--
Best,
Anubhav

maxwelld...@u.northwestern.edu

unread,
Jun 29, 2018, 3:22:30 PM6/29/18
to matminer
One of the tests for GRDF is checking whether the correct bins are generated from the default preset. Since you changed the present bins, they no longer correspond to the bins hard-coded into the test on line 492 of test_site.py. If that is changed, then I think your changes will pass all of the tests.

The other errors seem to stem from the numerical integration. The power of these featurizers is that you can use any binning functional that you want, but the downside of that is that numerical integration is the only way to handle computing the volume of an arbitrary bin in GRDF. The histogram present is probably yielding infinity for a bin because the numerical integration doesn't handle a step function well and evaluates the volume as zero.

A general question for the group: is it worth it to allow arbitrary bins? or should we only allow bins where there is an analytic form for the volume of the bin and hard-code that in?

Best,
Max

Logan Ward

unread,
Jul 2, 2018, 11:15:25 AM7/2/18
to maxwelld...@u.northwestern.edu, matminer

I could see a few steps forward around this issue with the bins and step functions:

 

  1. We could defer to a specialized class for binning based on step functions, as PRDFs are a common ML strategy and lend themselves well to other optimizations (e.g., using numpy’s histogram function). Are the other binning strategies all easier to integrate?
  2. Hard code common binning functions and their integrals. We could still allow users to provide custom functions to the “bins” function.

 

Personally, I’d pick with option #2. Adding defaults could lead to additional stability and performance by avoiding numerical integration, allow for easier pickling / multiprocessing by avoiding lambda functions, and still preserve the original functionality. Many classes in scikit-learn (e.g., GridSearchCV) allows users to pass either strings for hard-coded options or a callable to do something advanced, and I think that could also work well here.

 

Logan

Max

 

https://lh3.googleusercontent.com/IXMIh3o7c0Z8NoYvfH2sja8h4DdbhC5O9LRNueZJMvZkspIznZ2V_zuM9Z4B9yoKhVOq4E-Yywthrv4qVjSHk-x5R8caF4ah6L6rmPb4Fc7ml2RLDuTi735XEmRUWiaLX5HIZztm

Saurabh Bajaj

Data Engineer | Citrine Informatics

sba...@citrine.io

 

 

--
You received this message because you are subscribed to the Google Groups "matminer" group.
To unsubscribe from this group and stop receiving emails from it, send an email to matminer+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/matminer/500b33af-5a9f-4e5f-8f10-b3d07ade5ac7%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


 

--

Best,
Anubhav

--

You received this message because you are subscribed to the Google Groups "matminer" group.
To unsubscribe from this group and stop receiving emails from it, send an email to matminer+u...@googlegroups.com.

Reply all
Reply to author
Forward
0 new messages