Deserializing ML Learning models in JNBK - Model Visualization

Luis Rivera

Jan 16, 2023, 2:00:48 PM
to OpendTect Users

Dear OpendTect community,

I understand that some ML models can be represented as strings or as graphs/charts, using either their own methods or methods of other classes. I'm trying to deserialize a Scikit-Learn model for later visualization, but either I'm doing it wrong or I've run into a bug.

I noticed that the Scikit-Learn training process (mine is a simple ensemble, an XGBoost model) finishes by serializing the model to a JSON file and producing an h5 file that I presume contains the training set. I'm trying to load that model back by deserializing it with the sklearn_json library, and I ran into the error shown in Figure 1.

The model works fine, and I can even print the content of the JSON file in Jupyter Notebook (Figure 2). The model persists in OpendTect, so I believe OpendTect deserializes it correctly. Am I deserializing it wrong, or could it be a bug?

Thanks for your guidance!

Luis.

Figure 1 - Model deserialization error.png
Figure 2 - Reading the model.png

Arnaud Huck

Jan 17, 2023, 3:27:39 AM
to us...@opendtect.org, Luis Rivera

Dear Luis,

The Python package dgbpy, installed with the OpendTect Machine Learning plugin, does not use the Python package sklearn-json. All sklearn models trained with that plugin are saved in ONNX format, which we chose for its interoperability. You should therefore not try to read a model trained in OpendTect with sklearn-json; it won't work.

The sklearn platform is, however, complemented by the xgboost library, which the OpendTect Machine Learning plugin uses if it is present in the current Python environment. This library has its own mechanism for saving and reading models. It supports two JSON formats, .json and .ubj, which are standard text-based JSON and binary JSON respectively. The OpendTect Machine Learning plugin exclusively uses the 'standard' JSON format when saving any xgboost model. Please refer to dgbpy.dgbscikit.load( modelfnm ) for an example of how to read sklearn and xgboost models saved by the OpendTect Machine Learning plugin.
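As an illustration of reading such a file outside OpendTect, here is a minimal sketch. It is not the dgbpy implementation, and the function names summarize_xgboost_json and load_booster are hypothetical. It assumes the text JSON written by xgboost keeps the model under a top-level "learner" key with a gbtree-style "gradient_booster" entry, and it only imports xgboost when the model is actually loaded.

```python
import json
from pathlib import Path


def summarize_xgboost_json(path):
    """Return a small summary of a model stored as text-based JSON.

    Assumption: the file follows xgboost's JSON layout, with the model
    under a top-level "learner" key and the trees under
    learner -> gradient_booster -> model -> trees (gbtree layout).
    """
    with Path(path).open("r", encoding="utf-8") as fh:
        doc = json.load(fh)

    learner = doc.get("learner") if isinstance(doc, dict) else None
    if not isinstance(learner, dict):
        raise ValueError(f"{path} does not look like an xgboost JSON model")

    booster = learner.get("gradient_booster", {})
    trees = booster.get("model", {}).get("trees", [])
    return {
        "version": doc.get("version"),
        "objective": learner.get("objective", {}).get("name"),
        "booster": booster.get("name"),
        "num_trees": len(trees),
    }


def load_booster(path):
    """Load the model with xgboost itself rather than sklearn-json."""
    import xgboost as xgb  # imported lazily; only needed for loading

    booster = xgb.Booster()
    booster.load_model(str(path))
    return booster
```

With a model path in hand, load_booster(path).get_dump() returns one text description per tree, which is one possible starting point for visualization in a notebook. The summary function needs only the standard library, so it can be used to check a file before xgboost is involved.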

Best regards,

Arnaud Huck, MSc
Chief Technical Officer
dGB Earth Sciences

Phone: +31 53 43 15 155
E-mail: arnau...@dgbes.com
Internet: dgbes.com

Luis Rivera

Jan 17, 2023, 9:33:37 AM
to OpendTect Users, arnaud.huck, Luis Rivera

Dear Arnaud,

Thank you for your feedback. I managed to deserialize the xgboost model without any hassle.

Best regards,
Luis.