PyJNIUS / JPMML issue

Patrick Hofmann

Jan 12, 2023, 4:34:43 AM
to Java PMML API
Hello,

I'm having an issue evaluating my PMML document using the Pyjnius library. This has historically worked just fine, but has stopped working all of a sudden.

The error comes when I try to evaluate a single list of arguments. I double checked all the datatypes and optypes, and they look correct. Can you please suggest how to debug this?


evaluator.evaluate(arguments)
>> /opt/spotx-miniconda/miniconda/envs/py3-data/lib/python3.7/site-packages/jpmml_evaluator/__init__.py in evaluate(self, arguments)
     80 javaArguments = self.backend.dict2map(arguments)
     81 javaArguments = self.backend.staticInvoke("org.jpmml.evaluator.EvaluatorUtil", "encodeKeys", javaArguments)
---> 82 javaResults = self.javaEvaluator.evaluate(javaArguments)
     83 javaResults = self.backend.staticInvoke("org.jpmml.evaluator.EvaluatorUtil", "decodeAll", javaResults)
     84 results = self.backend.map2dict(javaResults)

jnius/jnius_export_class.pxi in jnius.JavaMethod.__call__()
jnius/jnius_export_class.pxi in jnius.JavaMethod.call_method()
jnius/jnius_utils.pxi in jnius.check_exception()

JavaException: JVM exception occurred: Categorical value cannot be used in comparison operations

Regards,
Patrick Hofmann

Patrick Hofmann

Jan 12, 2023, 5:02:06 AM
to Java PMML API
It just dawned on me to check if any categorical features in the PMML document have predicate operators other than just "equal", and sure enough, I found a categorical feature with a SimplePredicate operator of "lessOrEqual" - would this cause the JavaException?
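
The check described above can be automated. The following is a minimal sketch, not part of JPMML: it assumes the field's operational type is declared in the DataDictionary (the XML attribute is spelled `optype`), ignores any overrides elsewhere in the document, and uses a made-up sample document and a hypothetical helper name `find_suspect_predicates`.

```python
import xml.etree.ElementTree as ET

# Operators that imply an ordering; "equal"/"notEqual" are fine for categorical fields.
ORDERING_OPERATORS = {"lessThan", "lessOrEqual", "greaterThan", "greaterOrEqual"}


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def find_suspect_predicates(pmml_text):
    """Return (field, operator, value) for ordering predicates on categorical fields."""
    root = ET.fromstring(pmml_text)
    # Map each DataField name to its declared optype.
    optypes = {
        el.get("name"): el.get("optype")
        for el in root.iter()
        if local_name(el.tag) == "DataField"
    }
    suspects = []
    for el in root.iter():
        if local_name(el.tag) != "SimplePredicate":
            continue
        field, operator = el.get("field"), el.get("operator")
        if operator in ORDERING_OPERATORS and optypes.get(field) == "categorical":
            suspects.append((field, operator, el.get("value")))
    return suspects


# Made-up sample document, for illustration only.
SAMPLE_PMML = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <DataDictionary>
    <DataField name="color" optype="categorical" dataType="string"/>
    <DataField name="age" optype="continuous" dataType="double"/>
  </DataDictionary>
  <TreeModel functionName="classification">
    <Node>
      <True/>
      <Node score="a"><SimplePredicate field="color" operator="lessOrEqual" value="1"/></Node>
      <Node score="b"><SimplePredicate field="age" operator="lessOrEqual" value="30"/></Node>
    </Node>
  </TreeModel>
</PMML>"""

suspects = find_suspect_predicates(SAMPLE_PMML)
print(suspects)
```

For a document on disk, read the file contents and pass them in; only the predicate on the categorical field is reported.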

Patrick

Villu Ruusmann

Jan 12, 2023, 5:09:18 AM
to Java PMML API
Hi Patrick,

> I'm having an issue evaluating my PMML document
> using the Pyjnius library.
>

You mean the "PyJNIus backend of the JPMML-Evaluator Python package", right?

From what I can see, the raised exception does not indicate any
malfunction in the Python-Java-Python connectivity layer - it's a PMML
document issue (explanation below).

Feel free to replace the PyJNIus backend with another one - JPype or
Py4J. You would still see the same exception message, but it would be
enclosed in a different backend-specific exception class.

> The error comes when I try to evaluate a single list of arguments.
> I double checked all the datatypes and optypes, and they look correct.

The exception message: "Categorical value cannot be used in comparison
operations".

This means that a PMML internal opType sanity check has failed. This
check is typically performed in two places - inside the Apply element
when performing comparison operations, or in decision tree/rule set
models where the business logic involves evaluating a lot of Predicate
elements.
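
For the first of those two places, a rough static scan of the document can list comparison-type Apply elements that reference a categorical field. This is only a sketch under assumptions: it looks at direct FieldRef children only, takes the optype from DataField and DerivedField declarations, and the sample document and the helper name `find_suspect_applies` are made up for illustration. It does not reproduce the engine's actual check.

```python
import xml.etree.ElementTree as ET

# Built-in PMML comparison functions that imply an ordering of values.
COMPARISON_FUNCTIONS = {"lessThan", "lessOrEqual", "greaterThan", "greaterOrEqual"}


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def find_suspect_applies(pmml_text):
    """Return (function, field) for comparison Apply elements over categorical fields."""
    root = ET.fromstring(pmml_text)
    # Collect declared optypes of both raw and derived fields.
    optypes = {
        el.get("name"): el.get("optype")
        for el in root.iter()
        if local_name(el.tag) in ("DataField", "DerivedField")
    }
    suspects = []
    for el in root.iter():
        if local_name(el.tag) != "Apply" or el.get("function") not in COMPARISON_FUNCTIONS:
            continue
        for child in el:  # direct arguments of the Apply element
            if local_name(child.tag) == "FieldRef" and optypes.get(child.get("field")) == "categorical":
                suspects.append((el.get("function"), child.get("field")))
    return suspects


# Made-up sample document, for illustration only.
SAMPLE_PMML = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <DataDictionary>
    <DataField name="grade" optype="categorical" dataType="integer"/>
    <DataField name="age" optype="continuous" dataType="double"/>
  </DataDictionary>
  <TransformationDictionary>
    <DerivedField name="is_low" optype="categorical" dataType="boolean">
      <Apply function="lessOrEqual">
        <FieldRef field="grade"/>
        <Constant dataType="integer">2</Constant>
      </Apply>
    </DerivedField>
    <DerivedField name="is_young" optype="categorical" dataType="boolean">
      <Apply function="lessThan">
        <FieldRef field="age"/>
        <Constant dataType="double">30</Constant>
      </Apply>
    </DerivedField>
  </TransformationDictionary>
</PMML>"""

print(find_suspect_applies(SAMPLE_PMML))
```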

EDIT: Looks like you figured it out yourself.

> Can you please suggest how to debug this?
>

There is no effective way to debug a "live" PMML engine for more info. It's
a hidden sanity/data consistency check, which is supposed to never
trigger.

You should compare your PMML documents - the older "works fine"
version vs. the newer "doesn't work" version.

Pay attention to differences under the DataDictionary element, around
DataField@opType elements. Some "categorical" value has flipped to
"continuous" there.

Maybe you changed something in your Python data science script
recently? Omitted a DataFrame.astype(str) call somewhere?
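
Comparing the two documents by hand gets tedious for large data dictionaries. A minimal sketch of an automated comparison follows; it assumes both documents declare their fields as DataField elements with `optype` and `dataType` attributes, and the two sample documents and the helper names are made up for illustration.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def data_field_types(pmml_text):
    """Map each DataField name to its (optype, dataType) pair."""
    root = ET.fromstring(pmml_text)
    return {
        el.get("name"): (el.get("optype"), el.get("dataType"))
        for el in root.iter()
        if local_name(el.tag) == "DataField"
    }


def diff_data_dictionaries(old_text, new_text):
    """Return {field: (old_types, new_types)} for fields whose declaration differs."""
    old, new = data_field_types(old_text), data_field_types(new_text)
    changes = {}
    for name in sorted(old.keys() | new.keys()):
        # A field missing from one document shows up as None on that side.
        if old.get(name) != new.get(name):
            changes[name] = (old.get(name), new.get(name))
    return changes


# Made-up "works fine" and "doesn't work" documents, for illustration only.
OLD_PMML = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <DataDictionary>
    <DataField name="x" optype="continuous" dataType="double"/>
    <DataField name="y" optype="categorical" dataType="string"/>
  </DataDictionary>
</PMML>"""

NEW_PMML = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <DataDictionary>
    <DataField name="x" optype="categorical" dataType="string"/>
    <DataField name="y" optype="categorical" dataType="string"/>
  </DataDictionary>
</PMML>"""

changes = diff_data_dictionaries(OLD_PMML, NEW_PMML)
for name, (before, after) in changes.items():
    print(name, before, "->", after)
```

The comparison is direction-agnostic: it reports any field whose optype or dataType differs between the two versions, whichever way the value flipped.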


VR

Villu Ruusmann

Jan 12, 2023, 5:52:38 AM
to Java PMML API
Hi Patrick,

>
> You should compare your PMML documents - the older "works fine"
> version vs. the newer "doesn't work" version.
>
> Pay attention to differences under the DataDictionary element, around
> DataField@opType elements. Some "categorical" value has flipped to
> "continuous" there.
>

Sorry, it's the other way around - some "continuous" attribute value
has flipped to "categorical" in your PMML document.

This is information gain, not information loss (from the modeling
workflow perspective). Can't happen by accident, there must have been
some ML code change involved.

I maintain my position that such sanity checks are very useful,
because they help catch situations where the human intention and
the actual code don't match - and need syncing! These inconsistencies
go unnoticed by most other ML tools and frameworks (because they don't
have the concept of a run-time operational data type; you'll be lucky
if you have proper data types).


VR

Patrick Hofmann

Jan 12, 2023, 10:59:36 AM
to Java PMML API
Thanks Villu - I discovered indeed that a categorical variable had been inadvertently cast as continuous in the PMML. Issue resolved, thanks for your help!

Patrick
