Issue with deserialization after upgrading python/jpmml packages


Leonid Bakaleynik

Jan 17, 2023, 1:58:07 AM
to Java PMML API

Hello,

I'm working on upgrading our python environment to python 3.10, scikit-learn 1.2.0, and numpy 1.23. Along with these upgrades, I also upgraded jpmml packages (pmml-evaluator/pmml-model to 1.6.4, pmml-evaluator-extension to 1.5.16, and jpmml-sklearn to 1.6.30). However, I am now encountering an issue with deserialization when trying to use these updated packages.

The specific error message I am receiving is the following:

"Caused by: net.razorvine.pickle.PickleException: expected zero arguments for construction of ClassDict (for numpy.dtype). This happens when an unsupported/unregistered class is being unpickled that requires construction arguments. Fix it by registering a custom IObjectConstructor for this class."

I was wondering if anyone else has encountered a similar issue and if there is a solution to this problem - using different versions of JPMML, or saving the model differently etc. Any advice would be greatly appreciated.

Thanks,
Leonid


Villu Ruusmann

Jan 17, 2023, 2:49:37 AM
to Java PMML API
Hi Leonid,

> I also upgraded jpmml packages (pmml-evaluator/pmml-model
> to 1.6.4, pmml-evaluator-extension to 1.5.16, and jpmml-sklearn
> to 1.6.30).
>

Are you trying to combine these libraries into one Java application,
or two different Java applications (eg. one for producing PMML XML
files, the other consuming them)? I believe it must be the latter
case, because JPMML-Evaluator 1.6.X and JPMML-SkLearn 1.6.X don't
belong together, there would be a JVM-level class loading/class
mis-definition conflict.

Some background. The "root" of JPMML library system is the JPMML-Model
library. The two main "application branches" that inherit from it are
the following:
1) Evaluator branch: JPMML-Model -> JPMML-Evaluator -> JPMML-Transpiler
2) Converter branch: JPMML-Model -> JPMML-Converter -> JPMML-Python ->
JPMML-SkLearn

In your setup, the evaluator branch assumes JPMML-Model 1.6.X, but the
converter branch is assuming JPMML-Model 1.5.X. There's a major
source/binary incompatibility around the representation of field names
(org.dmg.pmml.FieldName vs. java.lang.String).

You should be upgrading JPMML-SkLearn also to 1.7.X, preferably to the
latest 1.7.22 version.
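
One way to notice this kind of mix-up early is to check at startup which field name representation is on the classpath. A minimal sketch, assuming that the 'org.dmg.pmml.FieldName' class ships only with JPMML-Model 1.5.X and older (the 'ClasspathCheck' class and its method are hypothetical helpers, not part of any JPMML library):

```java
// Hypothetical helper, not part of any JPMML library.
final class ClasspathCheck {

    private ClasspathCheck() {
    }

    // Returns true if the named class can be located on the classpath.
    // The class is looked up without running its static initializers.
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: this class exists in JPMML-Model 1.5.X, but not in 1.6.X
        boolean legacyFieldNames = isPresent("org.dmg.pmml.FieldName");
        System.out.println(legacyFieldNames
            ? "Found org.dmg.pmml.FieldName: a JPMML-Model 1.5.X (or older) JAR is on the classpath"
            : "No org.dmg.pmml.FieldName: field names are plain java.lang.String values");
    }
}
```

If the evaluator side expects 1.6.X and this check still finds the legacy class, an older JPMML-Model JAR is probably being pulled in by the converter side.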

>
> I'm working on upgrading our python environment to python 3.10, scikit-learn 1.2.0, and numpy 1.23.
>

The JPMML-SkLearn 1.6.30 version dates back to September 2021 (~1.3
years old). It is therefore much older than Numpy 1.23 (maybe
~0.5 years old?), and cannot possibly know about all the numpy data type
innovations that have happened since.

> The specific error message I am receiving is the following:
>
> "Caused by: net.razorvine.pickle.PickleException: expected
> zero arguments for construction of ClassDict (for numpy.dtype).
>

Should be solved in JPMML-SkLearn 1.7.22.

You can get the example executable JAR file from here:
https://github.com/jpmml/jpmml-sklearn/releases/tag/1.7.22

Can you convert your PKL file using this command-line application? If
yes, proceed to upgrading your Java application library stack. If not,
please give me more information about this Python class, and I can
implement a new Python-to-Java binding for it.


Villu

Leonid Bakaleynik

Jan 17, 2023, 3:31:36 AM
to Java PMML API
Hi Villu, thanks for a quick response!


> Are you trying to combine these libraries into one Java application,
> or two different Java applications

I need to combine them in one application. 
I've tried using 1.7.22, but sbt failed to resolve it. In the coursier cache, under repository/maven-public/org/jpmml/jpmml-sklearn/1.7.22, I see these files:
     Jan 17 10:19 .
     Jan 17 10:20 ..
     Jan 17 10:19 .jpmml-sklearn-1.7.22.jar.error
     Jan 17 10:19 .jpmml-sklearn-1.7.22.jar.sha1.error
     Jan 17 10:19 .jpmml-sklearn-1.7.22.pom.checked
     Jan 17 10:19 .jpmml-sklearn-1.7.22.pom.sha1.checked
     Jan 17 10:19 .jpmml-sklearn-1.7.22.pom__sha1.computed
     Jan 9 13:13 jpmml-sklearn-1.7.22.pom
     Jan 9 13:13 jpmml-sklearn-1.7.22.pom.sha1

I've tried several other 1.7.* versions and got the same error,
so it looks like a SHA1 mismatch error. Version 1.6 is downloaded correctly.


> Can you convert your PKL file using this command-line application?

Yes, it works!

So I'd be fine if I could use the 1.7.22. Are you familiar with such an issue with 1.7?

Thanks

Villu Ruusmann

Jan 17, 2023, 4:14:08 AM
to Java PMML API
Hi Leonid,

> > Are you trying to combine these libraries into one Java application,
> or two different Java applications
>
> I need to combine them in one application.
>

You should always try combining the "latest evaluator version" with the
"latest converter version" first.

I'm doing my best to ensure that this principle works. For example,
every time I release a new JPMML-Model version, I propagate it to the
end of both the evaluator and converter branches within two or three days.

Forgot to mention it earlier, but the
'org.jpmml:pmml-evaluator-extension' module was merged into the main
'org.jpmml:pmml-evaluator' module during the JPMML-Evaluator 1.5.X ->
1.6.X upgrade. So, you should delete the standalone
'org.jpmml:pmml-evaluator-extension:1.5.16' module from your
application classpath.

Eff me, this stuff should be documented somewhere.

>
> I've tried several other 1.7.* versions, and got the same error.
> So it appears like a SHA1 mismatch error. Version 1.6 is downloaded correctly
>

When making JPMML library releases, both Apache Maven and the
initial upload repository (Sonatype OSS) run a comprehensive set of
checks to ensure that the uploaded files are correct.

What's happening here is something different.

During the JPMML-SkLearn 1.6.X -> 1.7.X upgrade the project was modularized.

In 1.6.X, there is a single module with 'org.jpmml:jpmml-sklearn'
coordinates (note the letter "j" in the beginning of the artifactId).

In 1.7.X, there are multiple modules; the main module (provides
Scikit-Learn and SkLearn2PMML package compatibility) is now called
'org.jpmml:pmml-sklearn' (note that there is no "j" prefix in the
beginning of the artifactId!). If you want to add support for 3rd
party Python packages, you must include additional
'org.jpmml:pmml-sklearn-<package>' dependencies.

For example, if you want to support XGBoost-via-SkLearn uploads, then
you would also need to add the 'org.jpmml:pmml-sklearn-xgboost'
dependency to your application classpath.

The SkLearn2PMML package provides an example of how to include everything:
https://github.com/jpmml/sklearn2pmml/blob/0.90.2/pom.xml#L35-L106

TLDR: In your SBT configuration, replace
"org.jpmml:jpmml-sklearn:1.6.30" with "org.jpmml:pmml-sklearn:1.7.22"
(plus whatever extensions your application may need), and everything
should build and run nicely.


Villu

Leonid Bakaleynik

Jan 19, 2023, 8:14:30 AM
to Java PMML API
Hi Villu,

So now I can load the model successfully!
I needed to add this line before unpickling:
SkLearnEncoder encoder = new SkLearnEncoder();
Like here, which is confusing.

Anyway, our flow mostly works now, but I have an issue with saving two models: ScikitBaggingRegressor/Classifier and AdaBoost.
We load estimators with 
PickleUtil.unpickle(storage)

and then encode it with encodePMML(skLearnEncoder). encodePMML fails with this exception:
Caused by: java.lang.IllegalArgumentException: Attribute 'sklearn.ensemble._bagging.BaggingClassifier.base_estimator_' not set
    at org.jpmml.python.PythonObject.get(PythonObject.java:82)
    at sklearn.ensemble.EnsembleClassifier.getBaseEstimator(EnsembleClassifier.java:50)
    at sklearn.ensemble.EnsembleClassifier.getOpType(EnsembleClassifier.java:37)
    at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:163)

When I load the same model in Python, the base_estimator_ field is there. It's deprecated, and scikit-learn suggests using the estimator_ field instead (which is also present). However, the estimator_ field is also missing when I unpickle the model with JPMML.

I've also tried with the example executable JAR, and got the same error:
SEVERE: Failed to convert PKL to PMML
java.lang.IllegalArgumentException: Attribute 'sklearn.ensemble._bagging.BaggingClassifier.base_estimator_' not set
    at org.jpmml.python.PythonObject.get(PythonObject.java:82)
    at sklearn.ensemble.EnsembleClassifier.getBaseEstimator(EnsembleClassifier.java:50)
    at sklearn.ensemble.EnsembleClassifier.getOpType(EnsembleClassifier.java:37)
    at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:163)
    at org.jpmml.sklearn.example.Main.run(Main.java:226)
    at org.jpmml.sklearn.example.Main.main(Main.java:151)


Villu Ruusmann

Jan 19, 2023, 8:33:29 AM
to Java PMML API
Hi Leonid,

> I needed to add this line before unpickling:
> SkLearnEncoder encoder = new SkLearnEncoder();
>

You would only need to reference the org.jpmml.sklearn.SkLearnEncoder
class, not instantiate it (but there are no negative side-effects to
doing so).

What happens here is that the SkLearnEncoder class has a static
initializer block, which takes care of registering SkLearn-to-Java
class mappings:
https://github.com/jpmml/jpmml-sklearn/blob/1.7.22/pmml-sklearn/src/main/java/org/jpmml/sklearn/SkLearnEncoder.java#L323-L332

After that, SkLearnEncoder invokes PythonEncoder, which then registers
low-level Python-to-Java class mappings:
https://github.com/jpmml/jpmml-python/blob/1.1.11/pmml-python/src/main/java/org/jpmml/python/PythonEncoder.java#L26-L30

If you additionally want to import/convert StatsModels package models,
then there's one more static initializer:
https://github.com/jpmml/jpmml-sklearn/blob/1.7.22/pmml-sklearn-statsmodels/src/main/java/sklearn2pmml/statsmodels/StatsModelsUtil.java#L58-L63

That should be all. I have an internal TODO entry about figuring out a
mechanism for auto-discovering/auto-initializing such class mappings
(probably, to be implemented using Java's standard service loader
mechanism).
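
The static initializer behaviour described above can be reproduced with plain Java. A minimal sketch, assuming nothing about the real JPMML classes: 'MappingRegistry', 'DemoEncoder' and 'StaticInitDemo' are hypothetical stand-ins. Note that a bare class literal ('DemoEncoder.class') does not trigger class initialization under the Java language rules, whereas 'Class.forName(name)', a static member access, or instantiation does:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a global Python-to-Java class mapping registry.
final class MappingRegistry {

    static final Map<String, String> MAPPINGS = new LinkedHashMap<>();

    private MappingRegistry() {
    }
}

// Hypothetical stand-in for an encoder class that registers mappings
// in its static initializer block.
final class DemoEncoder {

    static {
        MappingRegistry.MAPPINGS.put("demo.python.Thing", "demo.java.Thing");
    }
}

final class StaticInitDemo {

    private StaticInitDemo() {
    }

    // Forces class initialization without creating an instance.
    static void ensureInitialized(Class<?> clazz) {
        try {
            // The single-argument Class.forName initializes the class
            Class.forName(clazz.getName());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(MappingRegistry.MAPPINGS.size()); // 0

        // A class literal alone does not run the static initializer
        Class<?> clazz = DemoEncoder.class;
        System.out.println(MappingRegistry.MAPPINGS.size()); // 0

        ensureInitialized(clazz);
        System.out.println(MappingRegistry.MAPPINGS.size()); // 1
    }
}
```

Creating an instance, as in "new SkLearnEncoder()", is the simplest way to be sure that the initializer has run before unpickling.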

> Anyway, now our flow mostly works, I have an issue with saving 2 models, ScikitBaggingRegressor/Classifier, and AdaBoost
> We load estimators with
> PickleUtil.unpickle(storage)
>

Strange, I'll look into this later today, and if there's something
wrong on my end, will do a JPMML-SkLearn library update later this
week.

In your first e-mail you stated that you're currently targeting
Scikit-Learn 1.2.0. I did re-run my integration tests with the same
version not long ago, and I didn't encounter any problems related to
missing/renamed attributes:
https://github.com/jpmml/jpmml-sklearn/commit/d2ab66d041c4ea952340d58a8036c71c5c923c65

This test resources update was performed on 10th of December, 2022,
which means that JPMML-SkLearn versions 1.7.18 and newer should be
SkLearn 1.2.0-compatible.

But like I said, I'll double check BaggingClassifier/Regressor and
AdaBoostRegressor cases.


Villu

Villu Ruusmann

Jan 22, 2023, 2:00:05 AM
to Java PMML API
Hi Leonid,

>
> I have an issue with saving 2 models,
> Scikit BaggingRegressor/Classifier, and AdaBoost
>

I have prepared and released JPMML-SkLearn 1.7.23, which knows about
both "BaseEnsemble.base_estimator_" (SkLearn 1.1 and older) and
"BaseEnsemble._estimator" attributes (SkLearn 1.2+).

This issue manifested itself when the pipeline contained only a single
estimator step:
# Fails
pipeline = PMMLPipeline([
    ("estimator", AdaBoostRegressor(...))
])

The bad code path was not taken when the ensemble estimator step was
preceded by some other step, such as a data pre-processor:
# Succeeds
pipeline = PMMLPipeline([
    ("transformer", DataFrameMapper(...)),
    ("estimator", AdaBoostRegressor())
])
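
The attribute fallback mentioned at the start of this message (accept either the SkLearn 1.1 name or the SkLearn 1.2+ name) can be sketched in plain Java. This is only an illustration, not the actual JPMML-SkLearn code; the 'AttributeFallback' class, the map-based attribute storage and the lookup order are all assumptions:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not the actual JPMML-SkLearn implementation.
final class AttributeFallback {

    private AttributeFallback() {
    }

    // Returns the value of the first candidate attribute that is set.
    // Fails only after all candidate names have been tried.
    static Object getFirst(Map<String, ?> attributes, String... names) {
        for (String name : names) {
            if (attributes.containsKey(name)) {
                return attributes.get(name);
            }
        }
        throw new IllegalArgumentException(
            "None of the attributes " + Arrays.toString(names) + " is set");
    }

    public static void main(String[] args) {
        // Assumed state of an ensemble unpickled from a SkLearn 1.2+ file
        Map<String, Object> attributes = new HashMap<>();
        attributes.put("_estimator", "DecisionTreeRegressor");

        Object estimator = getFirst(attributes, "base_estimator_", "_estimator");
        System.out.println(estimator);
    }
}
```

With a lookup like this, a missing "base_estimator_" entry is no longer fatal as long as one of the alternative names is present.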

Thank you for bringing this issue to my attention!


Villu

Leonid Bakaleynik

Jan 22, 2023, 6:07:47 AM
to Java PMML API
Hi Villu,

Thanks for the quick fix!
AdaBoostRegressor works for me now, and ScikitLearnBagging classifier works too.

However SciKitLearnBagging regression still fails. I've verified with the example executable JAR:

SEVERE: Failed to convert PKL to PMML
java.lang.IllegalArgumentException: Attribute 'sklearn.ensemble._bagging.BaggingRegressor.base_estimator_' not set
        at org.jpmml.python.PythonObject.get(PythonObject.java:82)
        at sklearn.ensemble.EnsembleRegressor.getBaseEstimator(EnsembleRegressor.java:50)
        at sklearn.ensemble.EnsembleRegressor.getOpType(EnsembleRegressor.java:37)
        at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:163)
        at org.jpmml.sklearn.example.Main.run(Main.java:226)
        at org.jpmml.sklearn.example.Main.main(Main.java:151)

Can you have a look?

Thanks,
Leonid

Villu Ruusmann

Jan 22, 2023, 1:26:53 PM
to Java PMML API
Hi Leonid,

> I've verified with the example executable JAR:
>

There's also a new 1.7.23 command-line executable available, did you
switch to it?

> java.lang.IllegalArgumentException: Attribute 'sklearn.ensemble._bagging.BaggingRegressor.base_estimator_' not set
> at sklearn.ensemble.EnsembleRegressor.getBaseEstimator(EnsembleRegressor.java:50)
>

This exception is raised at EnsembleRegressor.java:50, which is
characteristic of the previous 1.7.22 version:
https://github.com/jpmml/jpmml-sklearn/blob/1.7.22/pmml-sklearn/src/main/java/sklearn/ensemble/EnsembleRegressor.java#L50

In 1.7.23, it would be raised at EnsembleRegressor.java:57 (the
default attribute access, after all custom attribute accesses have
failed):
https://github.com/jpmml/jpmml-sklearn/blob/1.7.23/pmml-sklearn/src/main/java/sklearn/ensemble/EnsembleRegressor.java#L57


Villu

Leonid Bakaleynik

Jan 22, 2023, 2:22:34 PM
to Java PMML API
Hi Villu,


> There's also a new 1.7.23 command-line executable available, did you
> switch to it?

No, I used the previous version by mistake.
I've now downloaded the 1.7.23 executable; here is the updated stack trace:


SEVERE: Failed to convert PKL to PMML
java.lang.IllegalArgumentException: Attribute 'sklearn.ensemble._bagging.BaggingRegressor._estimator' not set
        at org.jpmml.python.PythonObject.get(PythonObject.java:82)
        at sklearn.ensemble.EnsembleRegressor.getEstimator(EnsembleRegressor.java:57)
        at sklearn.ensemble.EnsembleRegressor.getOpType(EnsembleRegressor.java:37)
        at sklearn.ensemble.EnsembleRegressor.getOpType(EnsembleRegressor.java:39)
        at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:163)
        at org.jpmml.sklearn.example.Main.run(Main.java:226)
        at org.jpmml.sklearn.example.Main.main(Main.java:151)

Thanks,
Leonid
