Controlling the generated PMML from sklearn2pmml


Donato Marrazzo

Jun 23, 2020, 8:41:54 PM
to Java PMML API
Hi, I'm new to ML and JPMML.

I generated a PMML file from a LogisticRegression model.

There are some problems that I'd like to fix:

1. the PMML namespace version, I'd like to get this:

2. the generated model tag lacks the modelName attribute:
<RegressionModel modelName="approvalRegression" functionName="classification" normalizationMethod="logit">

3. the output field names contain parentheses that break my client (a DMN model), so I had to manually tweak the generated file:
<OutputField name="probability(false)" optype="continuous" dataType="double" feature="probability" value="false"/>
<OutputField name="probability(true)" optype="continuous" dataType="double" feature="probability" value="true"/>

This is how I generate the PMML file with Python. Please let me know if I can improve the result to avoid manual tweaks after generation:

from sklearn2pmml import sklearn2pmml
from sklearn2pmml import make_pmml_pipeline

# model is the fitted LogisticRegression
pipeline = make_pmml_pipeline(
    model,
    active_fields=["category", "urgency", "targetPrice", "price"],
    target_fields=["approval"]
)
sklearn2pmml(pipeline, "order-approval.pmml")


Thank you

Villu Ruusmann

Jun 24, 2020, 3:10:24 AM
to Java PMML API
Hi Donato,

>
> 1. the PMML namespace version, I'd like to get this:
>

The PMML namespace URI identifies the PMML schema version that the
conversion tool (in this case, the SkLearn2PMML/JPMML-SkLearn software
stack) adheres to.

The JPMML-SkLearn library (the 1.6.X development branch) is currently
adhering to PMML 4.4. If you forcibly change the PMML namespace URI
from 4.4 to 4.2 (or some other 4.X or 3.X PMML namespace URI), then
you risk invalidating your PMML document.

For PMML version changes you'd need to use a proper PMML translation
tool (one that updates the PMML markup properly, or informs you if it
can't be done).
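
To see which schema version a generated document actually declares, you can read the namespace URI off the root element. A minimal sketch using only the Python standard library; the helper name pmml_namespace and the inline sample document are illustrative and not part of SkLearn2PMML:

```python
import xml.etree.ElementTree as ET


def pmml_namespace(xml_text):
    """Return the namespace URI of the root element, or None if it has none."""
    root = ET.fromstring(xml_text)
    # ElementTree encodes namespaced tags as "{uri}local-name"
    if root.tag.startswith("{"):
        return root.tag[1:].split("}", 1)[0]
    return None


# Hypothetical, heavily trimmed PMML 4.4 document used only for illustration
sample = '<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4"><Header/></PMML>'
print(pmml_namespace(sample))
```

This only reports the declared namespace. It does not validate the document against that schema version, and it does not translate between versions.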

> 2. the generated model tag lacks the modelName attribute:
>

This is an optional attribute. It cannot be populated automatically
with a sensible value, so setting it manually after the conversion
seems like the right thing to do.

However, it would be nice if the PMMLPipeline class provided a means
for setting it in Python. I've just opened a new GitHub issue about it
here: https://github.com/jpmml/sklearn2pmml/issues/234
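
Until then, the manual step can be scripted. A rough sketch with the Python standard library, assuming the document uses the PMML 4.4 namespace and that the model element is a direct child of the root; set_model_name and the trimmed sample document are hypothetical names made up for this example:

```python
import xml.etree.ElementTree as ET

# Assumption: the converter emitted a PMML 4.4 document
PMML_NS = "http://www.dmg.org/PMML-4_4"


def set_model_name(xml_text, model_name, model_tag="RegressionModel"):
    """Return the PMML text with modelName set on the top-level model element."""
    # Keep PMML as the default namespace when serializing
    ET.register_namespace("", PMML_NS)
    root = ET.fromstring(xml_text)
    model = root.find("{%s}%s" % (PMML_NS, model_tag))
    if model is None:
        raise ValueError("no top-level <%s> element found" % model_tag)
    model.set("modelName", model_name)
    return ET.tostring(root, encoding="unicode")


# Hypothetical, heavily trimmed document used only for illustration
sample = (
    '<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">'
    '<RegressionModel functionName="classification" normalizationMethod="logit"/>'
    '</PMML>'
)
patched = set_model_name(sample, "approvalRegression")
print(patched)
```

For a real file you would read "order-approval.pmml", pass its contents through the function, and write the result back.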

> 3. the output field names contain parentheses that break my client (DMN model):
> <OutputField name="probability(false)" optype="continuous" dataType="double" feature="probability" value="false"/>
>

Your client (DMN model) is breaking for no good reason. According to
the PMML specification, PMML field names may contain any character,
including punctuation characters such as parentheses.

The JPMML family of conversion tools uses a convention where field
names are formatted similarly to Java/Python/R function invocations.
For example, the field name "probability(false)" should be interpreted
as "invoke the probability function with the 'false' argument". This
convention makes it easy to generate field names of arbitrary
complexity, which can be parsed/reformatted later on.

Some other conversion tools (e.g. the legacy 'pmml' R package) would
format this field as "probability_false". This style doesn't scale at
all; compare "final_business_decision(probability(false))" vs
"final_business_decision_probability_false" for
readability/parseability.
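
To illustrate the "can be parsed/reformatted later on" part, here is a small sketch that unwraps a function-style field name into its components. It assumes single-argument nesting only, as in the examples above; parse_field_name is a hypothetical helper, not a JPMML API:

```python
def parse_field_name(name):
    """Split 'f(g(x))' into ['f', 'g', 'x']; a plain name yields [name]."""
    parts = []
    rest = name
    # Peel off one "function(" ... ")" wrapper per iteration
    while rest.endswith(")") and "(" in rest:
        head, rest = rest[:-1].split("(", 1)
        parts.append(head)
    parts.append(rest)
    return parts


print(parse_field_name("probability(false)"))
print("_".join(parse_field_name("final_business_decision(probability(false))")))
```

Going the other way, from the underscore-joined form back to the nested form, is ambiguous because the function names themselves may contain underscores.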

TLDR: You'd need to update your PMML client application to support
more recent PMML schema versions (PMML 4.2 is 6+ years old) and field
naming conventions. I won't be downgrading/dummyfying the JPMML
software stack.


Villu

Donato Marrazzo

Jun 24, 2020, 4:47:06 AM
to Villu Ruusmann, Java PMML API
Hello Villu,

Thank you very much for the prompt reply.
It makes sense.

About the output names: to be more precise, the DMN client is actually able to receive the JPMML result. The challenge is how to handle it in the expression language. Maybe I will find a workaround!
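
One possible workaround is to script the manual tweak, so that the output field names are flattened right after generation. A rough sketch with the Python standard library, assuming a PMML 4.4 document in which nothing else refers to the renamed output fields; flatten_output_names and the trimmed sample document are hypothetical:

```python
import re
import xml.etree.ElementTree as ET

# Assumption: the converter emitted a PMML 4.4 document
PMML_NS = "http://www.dmg.org/PMML-4_4"


def flatten_output_names(xml_text):
    """Rewrite OutputField names such as 'probability(false)' to 'probability_false'."""
    # Keep PMML as the default namespace when serializing
    ET.register_namespace("", PMML_NS)
    root = ET.fromstring(xml_text)
    for field in root.iter("{%s}OutputField" % PMML_NS):
        name = field.get("name", "")
        # Replace each run of parentheses with "_" and drop the trailing one
        field.set("name", re.sub(r"[()]+", "_", name).strip("_"))
    return ET.tostring(root, encoding="unicode")


# Hypothetical, heavily trimmed document used only for illustration
sample = (
    '<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">'
    '<RegressionModel functionName="classification"><Output>'
    '<OutputField name="probability(false)" feature="probability" value="false"/>'
    '<OutputField name="probability(true)" feature="probability" value="true"/>'
    '</Output></RegressionModel></PMML>'
)
print(flatten_output_names(sample))
```

If other elements in the document refer to these output fields by name, they would need the same renaming, which this sketch does not attempt.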

Thanks

All the best,


Donato Marrazzo

   