How to do multi-device graph placement in TensorFlow Serving?

Nodir Kodirov

Jan 31, 2018, 8:17:32 PM

to Discuss

Hi,

I was wondering if anyone has tried to serve a model by partitioning it across different devices. Looking through GitHub issues and the serving tutorials, the community seems to suggest that the best way is to run the whole model on a single device (without model splitting). To quote from an example GitHub issue: "Probably your best solution is to build a script which loads your graph once per GPU".

However, I would expect model splitting to provide higher inference throughput. Consider an analogy with model training. An ICML'17 paper shows that splitting the Inception-V3 model across 1 CPU and 4 GPUs achieves 19% faster training time compared to the single-GPU setup (see Table 2). Figure 5 (see attached) shows how graph nodes are split between the CPU and the 4 GPUs. Both training and serving are dataflow systems, where faster training (and serving) is the result of tensors taking less time to pass through the graph, so intuitively I would expect model splitting to accelerate serving as well. Does anyone see why this might not be true?

Since TF Serving aims to provide high-performance inference, I was wondering if anyone has already tried model splitting, or knows of ongoing work on it.

Note: my question is about serving, not training. I know training already supports graph splitting. I'm adding this clarification because most of the mailing list questions are about training. Also, there is an old post (20 months ago) about splitting a model between multiple GPUs for serving. But it does not describe how splitting can be done in general; it roughly says "if the model was trained on N GPUs, serving will also use N GPUs". What if training was done with 2 GPUs and I want to serve with 6 GPUs? A general answer is what I am looking for here.

Thanks!

fig.5.png

Jorge Muñoz

Feb 1, 2018, 6:15:16 PM

to Discuss

I think the answer is still the same. It doesn't matter whether you use the graph for training or serving; you need to split it somehow. Once the graph uses several GPUs, you export it the same way, and serving will use the GPUs as you defined them in the graph. The only thing that changes between training and serving is the input tensor: for training it comes from a dataset, and for serving it is a placeholder.

One thing you can do is split the batch into several mini-batches, one per GPU. Then you can share the variables of the graph among the GPUs but pass a different mini-batch to every GPU as input. Finally, you join all the outputs and return them.
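The batch-splitting idea above can be sketched without any framework. This is only an illustration under stated assumptions: `split_batch` and `predict_sharded` are hypothetical helper names, and `model_fn` stands in for the shared-variable model. In a real TensorFlow graph, each call to `model_fn` would be built under its own `tf.device` scope.

```python
def split_batch(batch, num_devices):
    """Split `batch` into `num_devices` contiguous shards whose sizes differ by at most one."""
    base, extra = divmod(len(batch), num_devices)
    shards, start = [], 0
    for i in range(num_devices):
        # The first `extra` shards take one additional example each.
        size = base + (1 if i < extra else 0)
        shards.append(batch[start:start + size])
        start += size
    return shards


def predict_sharded(model_fn, batch, num_devices):
    """Run the same `model_fn` on every shard and join the outputs in input order."""
    shards = split_batch(batch, num_devices)
    # Hypothetical: in TensorFlow, each of these calls would sit under a
    # different tf.device(...) scope while reusing the same variables.
    outputs = [model_fn(shard) for shard in shards]
    return [y for out in outputs for y in out]
```

Because the shards are contiguous and joined in order, the caller gets outputs in the same order as the inputs, regardless of how many devices were used.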

Nodir Kodirov

Feb 2, 2018, 5:03:01 PM

to Discuss

Thanks for the response, Jorge!

It seems you are answering a more advanced question than the one I am asking. Let me try with code samples. You said "[...] you need to split it somehow", which is precisely my question. How do I split the graph in TF Serving?

For training, I use with tf.device() to specify the device on which I'd like to place a node. This is well explained in the GPU tutorial, as follows:

import tensorflow as tf

# Pin both constants to the first CPU device.
with tf.device('/cpu:0'):
  a = tf.constant([1.0, 2.0, 3.0], shape=[1, 3], name='a')
  b = tf.constant([1.0, 2.0, 3.0], shape=[3, 1], name='b')

I do not think there is an equivalent of with tf.device() in serving. For serving, tensorflow_model_server consumes an already trained and exported model via the --model_base_path parameter. In the sample mnist tutorial it looks as follows:

tensorflow_model_server --port=9000 --model_name=mnist --model_base_path=/tmp/mnist_model/

I know the model is exported to /tmp/mnist_model using mnist_saved_model.py. But looking at this source file, it is not clear how I can assign nodes to devices. Can you point out what I am missing? Example code where a trained mnist model is split between two GPUs and exported (to be fed to serving) would be exactly what I am looking for.

Thanks!

Jorge

Feb 2, 2018, 9:38:16 PM

to Nodir Kodirov, Discuss

When you save the graph, the devices assigned to the variables are saved with them. So in serving you restore exactly the same graph, including the devices.
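A minimal sketch of that point, assuming a TensorFlow install that exposes the TF 1.x graph API as `tf.compat.v1`. The tiny graph and the node names `x`, `w`, `h`, and `y` are hypothetical stand-ins, not the mnist model from the tutorial. It only shows that device assignments made with `tf.device` are part of the serialized graph definition; it does not start a session, so it should run on a machine without GPUs.

```python
import tensorflow as tf

tf1 = tf.compat.v1  # TF 1.x-style graph API

graph = tf.Graph()
with graph.as_default():
    # Serving input: a placeholder with no explicit device.
    x = tf1.placeholder(tf.float32, shape=[None, 3], name="x")
    # First part of the (hypothetical) model pinned to GPU 0.
    with tf.device("/device:GPU:0"):
        w = tf.constant([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], name="w")
        h = tf.matmul(x, w, name="h")
    # Second part pinned to GPU 1.
    with tf.device("/device:GPU:1"):
        y = tf.nn.softmax(h, name="y")

# Each node in the GraphDef carries a `device` field.
graph_def = graph.as_graph_def()
placed = {node.name: node.device for node in graph_def.node}

# Round-trip through the serialized form: the device fields survive.
restored = tf1.GraphDef.FromString(graph_def.SerializeToString())
restored_devices = {node.name: node.device for node in restored.node}

for name in ("x", "h", "y"):
    print(name, repr(restored_devices[name]))
```

This sketch stops at the graph definition. Whether an exported SavedModel keeps these fields depends on the export options (the TF 1.x `SavedModelBuilder` has a `clear_devices` argument, for example), and it does not check how tensorflow_model_server treats them at load time, so both are worth verifying for a real model.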
