ML as a build step?

thertweck

Feb 28, 2017, 1:37:32 PM

to bazel-discuss

Hi all!

I'm trying to set up a Bazel build that involves machine learning. What is the best practice for doing that?

For example, an app could depend on a pre-trained model to provide predictions at runtime. It would make sense to train the model using a genrule, based on a dataset that might be provided as a filegroup.
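A minimal sketch of what that could look like, assuming a hypothetical trainer script `train.py` that accepts an `--output` flag followed by input files. All target names, file names, and the flag are made up for illustration.

```starlark
# BUILD (sketch; every name below is hypothetical)

# The training data, exposed as a single label.
filegroup(
    name = "dataset",
    srcs = glob(["data/**/*.csv"]),
)

# The trainer, assumed to take --output=<file> and a list of input files.
py_binary(
    name = "train",
    srcs = ["train.py"],
)

# Runs the trainer over the dataset and declares the model as a build output.
genrule(
    name = "model",
    srcs = [":dataset"],
    outs = ["model.bin"],
    # $(locations :dataset) expands to the dataset files; $@ is the single output.
    cmd = "$(location :train) --output=$@ $(locations :dataset)",
    tools = [":train"],
)

# The app picks up the trained model as a runtime data dependency.
py_binary(
    name = "app",
    srcs = ["app.py"],
    data = [":model"],
)
```

With a layout like this, Bazel would rerun the training step only when the dataset files or the trainer change.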

I'm a bit stuck here with the following issues:

- The dataset is fairly big. I don't want to add it to version control.

- new_local_repository breaks when the dataset is moved to another location or when it happens to be somewhere else on another machine.

- For local development the whole dataset is not necessary, only a small part would be enough. It would be great if it was possible to switch to a sample dataset easily.
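One possible way to get the switch described in the last point is a `config_setting` combined with `select()`. This is only a sketch: the `dataset=full` define value, the checked-in `data/sample` directory, and the external repository `@full_dataset` are assumptions, not something that exists in this setup.

```starlark
# BUILD (sketch; names and the define value are hypothetical)

# Matches when the build is invoked with: --define dataset=full
config_setting(
    name = "use_full_dataset",
    values = {"define": "dataset=full"},
)

filegroup(
    name = "dataset",
    srcs = select({
        # Full dataset, assumed to live in an external repository.
        ":use_full_dataset": ["@full_dataset//:files"],
        # Default: a small sample that is checked into version control.
        "//conditions:default": glob(["data/sample/**"]),
    }),
)
```

A plain `bazel build //:dataset` would then use the sample, and `bazel build //:dataset --define dataset=full` would switch to the full dataset.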

I'd be happy about any hint on how I could get that working.


Cheers,

Tim

Warren Turkal

Mar 2, 2017, 1:29:37 PM

to bazel-discuss

When you say dataset, do you mean the training data or the data generated from training? Personally, I would think that the former definitely should be in the repo somewhere, and the latter should be in the repo if it takes a long time to regenerate and has to be regenerated often (an engineering decision, of course).

Also, why not put the data in a repo? You don't necessarily have to put it in a raw git repo. Check out something like https://git-lfs.github.com/.

As for changing to a different dataset, could you provide the dataset location as a command-line argument and default to the internal one unless that argument is given? Or do you mean that you want to be able to build with a different dataset?
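If the location should be configurable per machine, one option is a small custom repository rule that reads the path from an environment variable. The sketch below is an assumption-laden illustration: the rule name `dataset_repository`, the variable `DATASET_DIR`, and the file name `dataset.bzl` are all hypothetical.

```starlark
# dataset.bzl (sketch; rule name and DATASET_DIR are hypothetical)

def _dataset_repository_impl(repository_ctx):
    # Read the dataset location from the environment Bazel was started in.
    path = repository_ctx.os.environ.get("DATASET_DIR")
    if not path:
        fail("DATASET_DIR must point at the dataset directory")

    # Expose the external directory inside this repository as "data".
    repository_ctx.symlink(repository_ctx.path(path), "data")

    # Generate a BUILD file that publishes the files under one label.
    repository_ctx.file("BUILD", """
filegroup(
    name = "files",
    srcs = glob(["data/**"]),
    visibility = ["//visibility:public"],
)
""")

dataset_repository = repository_rule(
    implementation = _dataset_repository_impl,
    # Re-evaluate the rule when this variable changes.
    environ = ["DATASET_DIR"],
    local = True,
)
```

In the WORKSPACE file this would be loaded with `load("//:dataset.bzl", "dataset_repository")` and instantiated as `dataset_repository(name = "full_dataset")`, after which `DATASET_DIR=/some/path bazel build ...` selects the location without editing any checked-in file.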

wt

Steren Giannini

Mar 3, 2017, 4:24:29 AM

to Warren Turkal, bazel-discuss

Hi,

Your use case seems valid, but it is not the main focus of Bazel. Once the model has been generated by a training phase, you do want it defined as a dependency of your service app, but I am not sure about using Bazel for the training phase itself.

Maybe check what SyntaxNet is doing? https://github.com/tensorflow/models/tree/master/syntaxnet

Also, if you are looking for a solution to help with a pre-processing step, check out tf.Transform.
