TensorFlow inference speed compared to Caffe


msu...@yoobic.com

unread,
Jul 18, 2017, 7:00:41 AM7/18/17
to Discuss

Hi,

I noticed that TensorFlow is much slower than Caffe.

I have a Faster R-CNN model with a VGG16 feature extractor in Caffe. This model is based on this repository: https://github.com/rbgirshick/py-faster-rcnn/ ; input images are 1400x2000.
I have another network, made with TensorFlow, based on this Google blog post https://cloud.google.com/blog/big-data/2017/06/training-an-object-detector-using-cloud-machine-learning-engine , which is a Faster R-CNN with a ResNet101 feature extractor and input images of size 600x1000.

Based on figure 7 of this article https://arxiv.org/abs/1611.10012, with the same configuration, Faster R-CNN with ResNet101 should be faster than Faster R-CNN with VGG16. In my case the images for the Caffe model are ~4 times bigger, so the TensorFlow Faster R-CNN with ResNet101 should infer much faster than the Caffe network. But I can't understand why my inference times are both ~2 s.

Is it normal that TensorFlow is so slow?

Details

I am running inferences on Google compute server.
OS: Ubuntu 16.04
GPU: Tesla K80
NVIDIA driver: 375.66
CPU: 8 × Intel(R) Xeon(R) CPU @ 2.60GHz

Code used for TensorFlow inference (based on the notebook in the tensorflow/models repo):

model_ckpt = "output_inference_graph.pb"
detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(model_ckpt, 'rb') as fid:
        serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(od_graph_def, name='')
config = tf.ConfigProto(allow_soft_placement = True)
with detection_graph.as_default():
    with tf.Session(graph=detection_graph, config=config) as sess:
        image = Image.open(image_path)
        (im_width, im_height) = image.size
        image_np = np.array(image.getdata()).reshape((im_height, im_width, 3)).astype(np.uint8)
        # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
        image_np_expanded = np.expand_dims(image_np, axis=0)
        image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
        # Each box represents a part of the image where a particular object was detected.
        boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
        # Each score represent how level of confidence for each of the objects.
        # Score is shown on the result image, together with the class label.
        scores = detection_graph.get_tensor_by_name('detection_scores:0')
        classes = detection_graph.get_tensor_by_name('detection_classes:0')
        num_detections = detection_graph.get_tensor_by_name('num_detections:0')
        # Actual detection.
        (boxes, scores, classes, num_detections) = sess.run(
            [boxes, scores, classes, num_detections],
            feed_dict={image_tensor: image_np_expanded})

After timing the operations, the `sess.run` call takes ~2 s (I don't count the graph-loading time, which is also ~2 s).

Zan Huang

unread,
Jul 18, 2017, 12:03:36 PM7/18/17
to msu...@yoobic.com, Discuss
Yes, it is. TensorFlow is slightly slower than libraries such as Theano or Caffe. It's more powerful in the sense that you can visualize things easily. Another bonus is the dataflow computational graph it uses.

--
You received this message because you are subscribed to the Google Groups "Discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to discuss+unsubscribe@tensorflow.org.
To post to this group, send email to dis...@tensorflow.org.
To view this discussion on the web visit https://groups.google.com/a/tensorflow.org/d/msgid/discuss/a3b78be5-9185-4a9a-bcba-2a855f742fb2%40tensorflow.org.

Sebastian Raschka

unread,
Jul 18, 2017, 12:38:40 PM7/18/17
to Zan Huang, msu...@yoobic.com, TensorFlow Mailinglist
Hm, I think there are a couple of things to consider. First, are you using TF 1.2? I think the changes since >=1.0 have improved performance quite a bit. Secondly, it may also feel a bit slow running this example because there could be overhead in launching the session.

It would be interesting to set up a for-loop in the session and measure the time between iterations after the initial iteration.
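A minimal timing harness along those lines might look like the following sketch (framework-agnostic; `run_once` is a stand-in for whatever wraps the `sess.run` call):

```python
import time

def time_iterations(run_once, n_iters=10):
    """Call run_once() n_iters times and report the first (warm-up)
    timing separately from the average of the remaining iterations."""
    timings = []
    for _ in range(n_iters):
        start = time.perf_counter()
        run_once()
        timings.append(time.perf_counter() - start)
    warmup = timings[0]
    steady = sum(timings[1:]) / (n_iters - 1)
    return warmup, steady
```

You would then call something like `time_iterations(lambda: sess.run(fetches, feed_dict=...))`, which separates one-time session/graph setup cost from the steady-state per-image inference time.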


Third, and I am not sure if this makes a difference, but you have a lot of separate calls in the session; instead of calling `get_tensor_by_name` several times, you could pass the tensor names directly, e.g.,

sess.run(
    ['detection_boxes:0', 'detection_scores:0',
     'detection_classes:0', 'num_detections:0'],
    feed_dict={image_tensor: image_np_expanded})

to see if that improves the speed during inference.

Lastly, maybe `image = Image.open(image_path)` is another bottleneck. You could try TensorFlow's own image reading/processing utils: https://www.tensorflow.org/api_guides/python/image
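As a side note on the Python-side image handling: the `np.array(image.getdata()).reshape(...)` pattern in the original snippet is itself quite slow, since `getdata()` goes through a per-pixel sequence. A sketch of the faster equivalent (assumes Pillow and NumPy; `np.asarray` reads PIL's buffer directly):

```python
import numpy as np
from PIL import Image

def image_to_array(image):
    """Convert a PIL RGB image to a uint8 HxWx3 array without the
    slow per-pixel getdata() round-trip."""
    return np.asarray(image, dtype=np.uint8)

# Produces the same array as, but much faster than:
#   np.array(image.getdata()).reshape((h, w, 3)).astype(np.uint8)
```

This only affects preprocessing time, not the `sess.run` timing itself, but it matters for end-to-end throughput.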

Best,
Sebastian

msu...@yoobic.com

unread,
Jul 19, 2017, 5:20:04 AM7/19/17
to Discuss, msu...@yoobic.com
Sure, I have seen benchmarks saying that TensorFlow is slower than Caffe, but is it really that much slower?
I get the same inference time for Caffe and TensorFlow, even though the Caffe model is predicting an image 4 times bigger.

msu...@yoobic.com

unread,
Jul 19, 2017, 6:15:27 AM7/19/17
to Discuss, www.za...@gmail.com, msu...@yoobic.com
Point 1: I am using TensorFlow version 1.2.1.

Point 2: I was not totally explicit in the first post; I made a for loop to run multiple inferences and timed each one. Here are the results:
Reading graph: 1.8 s
sess.run in the first iteration of the loop: 3.5 s
sess.run in subsequent iterations: ~2 s (this is the time I mentioned in the first post)

Point 3: I tried not using `get_tensor_by_name` as you suggested; it didn't help.

Last point: I am not counting the image-reading time; I only time the `sess.run` call.

Martin Wicke

unread,
Jul 31, 2017, 12:08:36 PM7/31/17
to msu...@yoobic.com, Discuss, Zan Huang
Two things:

Feeding things is slow; it can often involve extra memory copies. Try putting the image reading and processing into the graph itself. That will mess with the comparison, though, since the reading will then also be part of the run call.

Are you using the CPU or GPU for this? Have you checked placement (say in tensorboard)?

Martin
