FunctionTransformer in Sklearn2PMML

Andrew Orso

May 12, 2017, 12:09:35 PM
to Java PMML API
Hi Villu,

I'm trying to use a function I am defining inside of a PMMLPipeline using the Sklearn FunctionTransformer preprocessor as follows:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import FunctionTransformer
from sklearn_pandas import DataFrameMapper
from sklearn2pmml import PMMLPipeline, sklearn2pmml

def equality_column(X):
    # Append a column that is 1 where columns 0 and 1 are equal, 0 otherwise
    equality_col = (X[:, 0] == X[:, 1]).astype(int)
    return np.append(X, equality_col[:, np.newaxis], 1)

# iris_df is a pandas DataFrame holding the Iris data, loaded beforehand
iris_pipeline = PMMLPipeline([
    ("mapper", DataFrameMapper([
        (['Sepal.Length', 'Sepal.Width', 'Petal.Length', 'Petal.Width'], FunctionTransformer(equality_column)),
    ])),
    ("classifier", LogisticRegression())
])
iris_pipeline.fit(iris_df[iris_df.columns.difference(["Species"])], iris_df["Species"])

sklearn2pmml(iris_pipeline, "LogisticRegressionIris.pmml", with_repr = True)

The transformer should take in a dataset, check the first and second columns for equality, and append a new column that is 1 if they are equal and 0 otherwise. The sklearn2pmml call fails, although everything up to that point seems to work, and I can even run iris_pipeline.predict(iris_df). I have seen the following issue on GitHub: https://github.com/jpmml/sklearn2pmml/issues/11. Is it still true that sklearn2pmml only supports a limited list of ufuncs? If so, do you have any plans to expand that in the future?

Thanks,

Andrew

Villu Ruusmann

May 12, 2017, 3:47:55 PM
to Java PMML API
Hi Andrew,

>
> I'm trying to use a function I am defining inside of a PMMLPipeline
> using the Sklearn FunctionTransformer preprocessor:
>

Class FunctionTransformer takes a callable as the "func" argument. The
problem is that this callable must be "persistable" when the
FunctionTransformer is dumped into a Pickle file; this condition is
met with Numpy ufuncs (persisted as a function name), but not with
arbitrary user-defined Python functions.

Give it a try: if you persist your pipeline and restore it in a clean
environment, then it should fail to execute (e.g.
pipeline#transform(..)), because the function "equality_column(X)" is
undefined there.
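
The persistence point can be reproduced with the standard library alone. This is only a sketch: math.log stands in for a library function such as a Numpy ufunc, and the throwaway module name "user_script" is a hypothetical stand-in for the script or notebook that defines equality_column.

```python
import math
import pickle
import sys
import types

# A library function is pickled by reference (module name + function name),
# so it can be restored wherever that library is importable.
assert pickle.loads(pickle.dumps(math.log)) is math.log

# Simulate a user script: a throwaway module (hypothetical name) that
# defines equality_column, the way a notebook or script would.
user_script = types.ModuleType("user_script")
exec("def equality_column(X):\n    return X\n", user_script.__dict__)
sys.modules["user_script"] = user_script

# Pickling succeeds because the function is importable right now.
# Only its name is stored, not its body.
payload = pickle.dumps(user_script.equality_column)

# Simulate a clean environment: the defining module is no longer available.
del sys.modules["user_script"]

try:
    pickle.loads(payload)
    restored = True
except (ImportError, AttributeError) as exc:
    restored = False
    print("unpickling failed:", exc)
```

The pickle holds just the reference "user_script.equality_column", which is why the restore step has nothing to resolve once the defining module is gone.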

Moreover, JPMML-SkLearn/SkLearn2PMML do not translate the "script
body" of Numpy ufuncs. It is simply assumed that when a function name
"numpy.log" is seen, then <Apply function="ln">..</Apply> should be
emitted.
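
In other words, the conversion is a name lookup rather than a translation of the function body. A rough Python sketch of that idea follows; it is not the actual JPMML-SkLearn code. Only the "numpy.log" to "ln" pair comes from this thread, while the other table entries and the helper name to_apply_element are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup table. Only "numpy.log" -> "ln" is taken from the
# thread; the remaining entries are assumptions added for illustration.
PMML_FUNCTIONS = {
    "numpy.log": "ln",
    "numpy.exp": "exp",
    "numpy.sqrt": "sqrt",
}

def to_apply_element(function_name, field):
    """Build an <Apply> element for a known function name (hypothetical helper)."""
    try:
        pmml_name = PMML_FUNCTIONS[function_name]
    except KeyError:
        # An arbitrary user-defined function has no entry, so nothing can be emitted.
        raise ValueError(f"no PMML equivalent known for {function_name!r}") from None
    apply = ET.Element("Apply", {"function": pmml_name})
    ET.SubElement(apply, "FieldRef", {"field": field})
    return apply

element = to_apply_element("numpy.log", "Sepal.Length")
print(ET.tostring(element, encoding="unicode"))
```

A name such as "equality_column" would fall through to the error branch, which mirrors why the conversion step fails for it.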

>
> The transformer should just take in a dataset, check equality
> on the first and second column, and append a new column
> that is 1 if equal, 0 otherwise.
>

At the moment, your best option is to develop a special-purpose
Transformer class (as opposed to a general-purpose one), and make it
known to the SkLearn2PMML package.

I have provided the SkLearn2PMML-Plugin project that demonstrates how it's done:
https://github.com/jpmml/sklearn2pmml-plugin
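
For orientation, the Python half of such a special-purpose transformer might look roughly like the sketch below. The class name EqualityColumn and its left/right parameters are hypothetical, and this covers the Scikit-Learn side only: the matching Java converter that SkLearn2PMML would need is not shown here.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class EqualityColumn(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: appends 1 where two columns are equal, else 0."""

    def __init__(self, left=0, right=1):
        # Positions of the two columns to compare
        self.left = left
        self.right = right

    def fit(self, X, y=None):
        # Stateless: nothing is learned from the data
        return self

    def transform(self, X):
        X = np.asarray(X)
        equal = (X[:, self.left] == X[:, self.right]).astype(int)
        # Append the indicator as a new last column
        return np.hstack([X, equal[:, np.newaxis]])

X = np.array([[5.1, 5.1, 1.4], [4.9, 3.0, 1.4]])
Xt = EqualityColumn().fit_transform(X)
print(Xt)
```

Because the class lives in an importable module rather than in the calling script, it can be pickled by reference and restored elsewhere, provided that module is installed there too.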

Another option would be to develop some sort of ExpressionTransformer
class, which keeps the mathematical expression around in string
representation (so that it can be pickled and unpickled on Python
side, and parsed on Java/PMML side). For example:
    DataFrameMapper([
        (["Sepal.Length", "Sepal.Width"], ExpressionTransformer("X[0] / X[1]"))
    ])
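
To make the idea concrete, here is a toy Python-side sketch. The class name ExpressionSketch is hypothetical and this is not an existing SkLearn2PMML class; it evaluates the string with eval() purely for illustration, which would be unsafe for untrusted input, and it leaves out the Java/PMML-side parsing entirely.

```python
import pickle

class ExpressionSketch:
    """Hypothetical stand-in for the ExpressionTransformer idea."""

    def __init__(self, expr):
        # The only state is a plain string, which pickles without
        # referring to any user-defined function.
        self.expr = expr

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # Illustration only: eval() must never see untrusted strings.
        code = compile(self.expr, "<expr>", "eval")
        # Within the expression, X refers to a single row of the input.
        return [[eval(code, {"__builtins__": {}}, {"X": row})] for row in X]

sketch = ExpressionSketch("X[0] / X[1]")

# Round-trip the state through pickle, then rebuild the transformer from it.
state = pickle.loads(pickle.dumps(sketch.__dict__))
restored = ExpressionSketch(**state)
result = restored.transform([[6.0, 3.0], [5.0, 2.0]])
print(result)
```

The design point is that the expression text survives persistence intact, so a converter on the Java side would have something concrete to parse.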


VR