MLflow Tracking Server scalability

Daan Gerits

Jun 22, 2018, 4:38:07 AM
to mlflow-users
Hey Guys,

First of all, great job with this effort. It's certainly something a lot of people are waiting for (or have tried to create themselves).

I was wondering about the scalability of the tracking server. I see in the code there is an abstraction for the tracking store, which is currently a FileStore if I'm correct. What are the plans to support other stores for this (Elasticsearch, Kafka, S3, ...)?

Cheers,
D.
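For readers unfamiliar with the design Daan mentions: the point of a store abstraction is that the tracking server codes against an interface, so a file-based store can be swapped for another backend without touching callers. The sketch below is illustrative only; `TrackingStore` and `JsonFileStore` are hypothetical names with a deliberately tiny two-method API, not MLflow's actual classes.

```python
import json
import os
import tempfile
from abc import ABC, abstractmethod
from typing import List, Tuple


class TrackingStore(ABC):
    """Hypothetical metadata-store interface; the server only talks to this."""

    @abstractmethod
    def log_metric(self, run_id: str, key: str, value: float, step: int) -> None:
        """Record one metric value for a run."""

    @abstractmethod
    def get_metric_history(self, run_id: str, key: str) -> List[Tuple[int, float]]:
        """Return (step, value) pairs in the order they were logged."""


class JsonFileStore(TrackingStore):
    """Hypothetical file-backed store: one JSON-lines file per (run, metric)."""

    def __init__(self, root: str) -> None:
        self.root = root

    def _path(self, run_id: str, key: str) -> str:
        return os.path.join(self.root, run_id, "metrics", key + ".jsonl")

    def log_metric(self, run_id, key, value, step):
        path = self._path(run_id, key)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        # Append-only writes keep each logging call cheap.
        with open(path, "a") as f:
            f.write(json.dumps({"step": step, "value": value}) + "\n")

    def get_metric_history(self, run_id, key):
        path = self._path(run_id, key)
        if not os.path.exists(path):
            return []
        with open(path) as f:
            rows = [json.loads(line) for line in f]
        return [(r["step"], r["value"]) for r in rows]


# Callers depend only on TrackingStore, so another backend could be swapped in.
with tempfile.TemporaryDirectory() as root:
    store: TrackingStore = JsonFileStore(root)
    store.log_metric("run1", "loss", 0.9, step=0)
    store.log_metric("run1", "loss", 0.5, step=1)
    print(store.get_metric_history("run1", "loss"))  # [(0, 0.9), (1, 0.5)]
```

An Elasticsearch- or database-backed store would subclass the same interface; nothing that calls `log_metric` would need to change.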

Matei Zaharia

Jun 23, 2018, 7:06:44 PM
to daan....@gmail.com, mlflow...@googlegroups.com
Hi Daan,

We do intend to add other ones. There are actually two elements here,
the metadata store and the artifact store (which can contain large
files uploaded by the job). For the metadata part we'll probably add a
database option, and for the artifacts we'll support cloud storage
systems.
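The metadata/artifact split can be sketched as two independent interfaces: runs and metrics go to the metadata store, while large files go to an artifact store that hands back a URI the metadata store can record. The sketch below assumes a hypothetical `ArtifactStore` interface with a local-directory implementation; a cloud backend (S3, Google Cloud Storage, Azure) would implement the same methods with its own client library, which is not shown here.

```python
import os
import shutil
import tempfile
from abc import ABC, abstractmethod
from typing import List, Optional


class ArtifactStore(ABC):
    """Hypothetical interface for large files, kept separate from metadata."""

    @abstractmethod
    def log_artifact(self, local_path: str, artifact_path: Optional[str] = None) -> str:
        """Store a local file and return a URI that the metadata store can record."""

    @abstractmethod
    def list_artifacts(self) -> List[str]:
        """Return stored artifact paths relative to the store root."""


class LocalDirArtifactStore(ArtifactStore):
    """Copies files into a directory; a cloud backend would upload them instead."""

    def __init__(self, root: str) -> None:
        self.root = root
        os.makedirs(root, exist_ok=True)

    def log_artifact(self, local_path, artifact_path=None):
        dest_dir = os.path.join(self.root, artifact_path) if artifact_path else self.root
        os.makedirs(dest_dir, exist_ok=True)
        dest = os.path.join(dest_dir, os.path.basename(local_path))
        shutil.copyfile(local_path, dest)
        return "file://" + os.path.abspath(dest)

    def list_artifacts(self):
        found = []
        for dirpath, _dirs, filenames in os.walk(self.root):
            for name in filenames:
                rel = os.path.relpath(os.path.join(dirpath, name), self.root)
                found.append(rel.replace(os.sep, "/"))
        return sorted(found)


with tempfile.TemporaryDirectory() as tmp:
    model_file = os.path.join(tmp, "model.bin")
    with open(model_file, "wb") as f:
        f.write(b"weights")
    artifacts = LocalDirArtifactStore(os.path.join(tmp, "artifacts"))
    # Only this URI (not the file itself) would be saved with the run's metadata.
    uri = artifacts.log_artifact(model_file, artifact_path="models")
    print(artifacts.list_artifacts())  # ['models/model.bin']
```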

Steve Casey

Oct 22, 2018, 4:02:46 PM
to mlflow-users
Hi,

Are there issues on GitHub for these two stores? It would be great to understand the plans and participate.

Thanks!

Matei Zaharia

Nov 8, 2018, 6:36:23 PM
to Steve Casey, mlflow-users
We’ve already received pull requests for a few artifact store backends (Google Cloud Storage, Azure Storage, SFTP, and others). If you’d like to work on another one, or if you’d like to work on a database store for metadata, that would be awesome. There is an open pull request for a DynamoDB metadata store but we’d prefer to use something like SQLAlchemy that can work with a variety of backend databases if possible.

Matei
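The appeal of SQLAlchemy here is writing the store once against a generic SQL layer so that several databases can sit behind it. A minimal sketch of a database-backed metric store follows; to stay dependency-free it uses Python's standard-library `sqlite3` (DB-API) as a stand-in for SQLAlchemy, and the `SqlMetricStore` name and table layout are hypothetical, not MLflow's schema.

```python
import sqlite3
from typing import List, Tuple


class SqlMetricStore:
    """Hypothetical SQL-backed metric store (sqlite3 stands in for SQLAlchemy)."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(
            """CREATE TABLE IF NOT EXISTS metrics (
                   run_id TEXT NOT NULL,
                   key    TEXT NOT NULL,
                   step   INTEGER NOT NULL,
                   value  REAL NOT NULL,
                   PRIMARY KEY (run_id, key, step)
               )"""
        )

    def log_metric(self, run_id: str, key: str, value: float, step: int) -> None:
        # Parameterized query; "with conn" commits on success, rolls back on error.
        with self.conn:
            self.conn.execute(
                "INSERT INTO metrics (run_id, key, step, value) VALUES (?, ?, ?, ?)",
                (run_id, key, step, value),
            )

    def get_metric_history(self, run_id: str, key: str) -> List[Tuple[int, float]]:
        cur = self.conn.execute(
            "SELECT step, value FROM metrics WHERE run_id = ? AND key = ? ORDER BY step",
            (run_id, key),
        )
        return [(step, value) for step, value in cur.fetchall()]


store = SqlMetricStore(sqlite3.connect(":memory:"))
store.log_metric("run1", "loss", 0.9, step=0)
store.log_metric("run1", "loss", 0.5, step=1)
print(store.get_metric_history("run1", "loss"))  # [(0, 0.9), (1, 0.5)]
```

With SQLAlchemy the same class would take a database URL instead of a connection, which is what would let one implementation target several backends.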

Matei Zaharia

Nov 8, 2018, 6:38:33 PM
to Steve Casey, mlflow-users
BTW, I’ll also add that the MLflow team at Databricks will probably implement this at some point if we don’t receive an external patch, but it might be a bit further down the line since we also have requests about the UI, model scoring, etc., right now. In any case, though, we’re happy to provide feedback to anyone interested in it. The metadata store already has a clearly separated API, and it shouldn’t be a huge amount of work to make a new one, though some care might need to be taken to make sure we can support database migrations, etc.

Matei
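On the migration point: once tracking data lives in a user's database, a release that changes the schema has to upgrade existing databases in place. A common pattern, which tools such as Alembic automate for SQLAlchemy, is to record a schema version in the database and apply ordered upgrade steps beyond it. The sketch below uses SQLite's `PRAGMA user_version` as the version counter; the `MIGRATIONS` list and its two steps are hypothetical, not MLflow's schema.

```python
import sqlite3

# Hypothetical ordered migrations: entry i upgrades schema version i to i + 1.
MIGRATIONS = [
    "CREATE TABLE metrics (run_id TEXT, key TEXT, step INTEGER, value REAL)",
    "ALTER TABLE metrics ADD COLUMN timestamp INTEGER DEFAULT 0",
]


def current_version(conn: sqlite3.Connection) -> int:
    """Read the schema version SQLite keeps in the database header (0 if never set)."""
    return conn.execute("PRAGMA user_version").fetchone()[0]


def upgrade(conn: sqlite3.Connection) -> int:
    """Apply every migration newer than the stored version; return the new version."""
    version = current_version(conn)
    for target, statement in enumerate(MIGRATIONS[version:], start=version + 1):
        conn.execute(statement)
        # PRAGMA statements cannot take bound parameters, so the int is formatted in.
        conn.execute(f"PRAGMA user_version = {target}")
        conn.commit()
    return current_version(conn)


conn = sqlite3.connect(":memory:")
print(upgrade(conn))  # 2: both steps applied to a fresh database
print(upgrade(conn))  # 2: already current, nothing is re-applied
```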