Tracking predictions with MLflow

Adrian Stern

May 6, 2021, 10:49:32 PM
to mlflow-users
Hello all,

TL;DR - Can you track the inference side of an ML pipeline using MLflow? If so, what's the recommended way?

I was thinking of using MLflow Tracking to track model pipelines. MLflow looks like a really good fit for tracking the training side, but I also want to track the prediction/inference side. I didn't see any functionality for this built into MLflow.

Let's define a "problem" as having a single point solution that results in a model; that model will then go into production and be used to make batch predictions. Given that definition, I was thinking of doing it in one of two ways.
  1. Have a single experiment per "problem" and a tag/parameter that says whether the run is a training or a prediction run (a rough sketch of this is below).
  2. Have two experiments per problem, one for training and one for prediction. Each would get its own series of runs.
Does either of those solutions make sense from an MLflow perspective? Is one preferable?
Is there an alternative?
  • The downside of 1) is that most of the metrics for training would not be logged for prediction runs and vice versa, e.g. AUC and hyperparameters wouldn't apply to prediction, and something like mean inference score wouldn't apply to training.
  • The downside of 2) is that there is no link between training and prediction for a single problem in MLflow. This makes it hard to compare things between the two, like comparing feature drift between training and prediction.
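
For concreteness, a rough sketch of what I mean by option 1 (the experiment name, tag, and metric values are just placeholders):

import mlflow

mlflow.set_experiment("churn_problem")  # one experiment per "problem"

# Training run
with mlflow.start_run(run_name="train_2021_05_06"):
    mlflow.set_tag("run_type", "training")
    mlflow.log_param("max_depth", 6)
    mlflow.log_metric("auc", 0.91)

# Prediction run in the same experiment, distinguished only by the tag
with mlflow.start_run(run_name="predict_2021_05_07"):
    mlflow.set_tag("run_type", "prediction")
    mlflow.log_metric("mean_inference_score", 0.42)
    mlflow.log_metric("rows_scored", 125000)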

A little info that might help:
  1. For any experiment, we will do multiple training runs that result in a best model.
  2. Using that best model, we could do one or multiple predictions.
  3. We will then repeat steps 1 and 2 every so often, likely due to feature drift.
Thanks!

nadine ben harrath

May 7, 2021, 1:25:25 AM
to mlflow-users
I'm interested in the same issue. How can I do model drift detection using MLflow?
If there's a paper you can suggest, I would be grateful.
Thank you.

Jules Damji

May 7, 2021, 7:46:58 AM
to nadine ben harrath, mlflow-users
Here is an interesting idea that explores using tracking for monitoring.


I’ll respond to Adrian’s mail shortly. 


Sent from my iPhone
Pardon the dumb thumb typos :)

On May 6, 2021, at 10:25 PM, nadine ben harrath <nadinebe...@gmail.com> wrote:

I'm interested in the same issue. How can I do model drift detection using MLflow?

Jules Damji

May 7, 2021, 11:43:15 AM
to Adrian Stern, mlflow-users
TL;DR - Can you track the inference side of an ML pipeline using MLflow? If so, what's the recommended way?

Currently, this is not part of MLflow per se; for instance, model performance monitoring in staging/production is not built in. Theoretically, you could do it with the Tracking API if you wanted.
One benefit of the MLflow Tracking APIs is that, though intended for tracking experiments, they can be used for other tracking needs; see "Misusing MLflow To Help Deduplicate Data At Scale".
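
For example, a rough sketch (this is not built-in MLflow functionality; the experiment, run, and metric names are made up): keep one long-lived monitoring run per deployed model and log one metric point per scoring batch, using the step argument so they show up as time series in the UI.

import mlflow

# Hypothetical per-batch statistics produced by a nightly scoring job.
batch_stats = [
    {"mean_score": 0.41, "null_rate": 0.002},
    {"mean_score": 0.44, "null_rate": 0.003},
    {"mean_score": 0.52, "null_rate": 0.010},  # drifting upward
]

mlflow.set_experiment("churn_problem_monitoring")

# One long-lived run per deployed model; each batch adds one point per metric.
with mlflow.start_run(run_name="prod_model_v3_monitoring"):
    for step, stats in enumerate(batch_stats):
        mlflow.log_metric("mean_inference_score", stats["mean_score"], step=step)
        mlflow.log_metric("null_rate_feature_x", stats["null_rate"], step=step)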

There are other vendors who can monitor data and concept drift or monitor model inference performance. For instance, MLflow models deployed to
Seldon, SageMaker, Algorithmia, or KFServing capture metrics. This talk at DAIS might be of interest for general ML performance monitoring during inference.

Let's define a "problem" as having a single point solution that results in a model; that model will then go into production and be used to make batch predictions. Given that definition, I was thinking of doing it in one of two ways.
  1. Have a single experiment per "problem" and a tag/parameter that says whether the run is a training or a prediction run.
  2. Have two experiments per problem, one for training and one for prediction. Each would get its own series of runs.

Both are viable ideas, but the second keeps them separate and distinct.  

Does either of those solutions make sense from an MLflow perspective? Is one preferable?
Is there an alternative?
  • The downside of 1) is that most of the metrics for training would not be logged for prediction runs and vice versa. E.g., AUC and hyperparameters wouldn't apply to prediction, and something like mean inference score wouldn't apply to training.
  • The downside of 2) is that there is no link between training and prediction for a single problem in MLflow. This makes it hard to compare things between the two, like comparing feature drift between training and prediction.

Couldn't the same identifiable/searchable tag be the link between runs in experiments 1 and 2?
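
Something like this, assuming both sides tag their runs with the same problem identifier (the experiment names, tag key, and metric below are placeholders):

import mlflow

# When creating runs on either side:  mlflow.set_tag("problem", "churn")
train_exp = mlflow.get_experiment_by_name("churn_training")
pred_exp = mlflow.get_experiment_by_name("churn_prediction")

# Pull matching runs from both experiments into one pandas DataFrame.
runs = mlflow.search_runs(
    experiment_ids=[train_exp.experiment_id, pred_exp.experiment_id],
    filter_string="tags.problem = 'churn'",
)

# Columns are prefixed with tags./metrics., so you can compare the two sides,
# e.g. a feature statistic logged by both training and prediction runs.
print(runs[["experiment_id", "metrics.feature_x_mean"]])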

A little info that might help:
  1. For any experiment, we will do multiple training runs that result in a best model.
  2. Using that best model, we could do one or multiple predictions.
  3. We will then repeat steps 1 and 2 every so often, likely due to feature drift.
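
For steps 1 and 2, a rough sketch of picking the best training run and using its model for batch scoring (the experiment name, metric, and dummy batch are placeholders; it assumes the training runs logged a model under the artifact path "model"):

import mlflow
import pandas as pd

# Pick the best training run for this problem, e.g. by AUC.
exp = mlflow.get_experiment_by_name("churn_training")
best = mlflow.search_runs(
    experiment_ids=[exp.experiment_id],
    order_by=["metrics.auc DESC"],
    max_results=1,
).iloc[0]

# Load that run's model with the generic pyfunc flavor and score a batch.
model = mlflow.pyfunc.load_model(f"runs:/{best.run_id}/model")
batch = pd.DataFrame({"feature_x": [0.1, 0.7], "feature_y": [3, 5]})  # dummy batch
scores = model.predict(batch)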

I think this could be a viable POC. I'm sure others will opine on this.

Cheers
Jules

––
The Best Ideas are Simple
Jules S. Damji
Sr. Developer Advocate
Databricks, Inc.
ju...@databricks.com
(510) 304-7686

Jules Damji

May 8, 2021, 9:10:35 PM
to nadine ben harrath, mlflow-users


Sent from my iPhone
Pardon the dumb thumb typos :)

On May 7, 2021, at 4:46 AM, Jules Damji <ju...@databricks.com> wrote:

Here is an interesting idea that explores using tracking for monitoring.