Question about keras inference latency measurement

Deepa Mohan

Feb 24, 2021, 6:17:47 PM
to Keras-users
Hello

Hoping to get some resolution/help on this observation.

I am measuring model prediction/evaluation time on a GPU for a Keras (TF 2.2) trained model. I see a large difference in the time measured when I use model.evaluate() vs. a custom callback vs. Python time(). Could you help explain this discrepancy, or forward this query to someone who can help?

The following INFO output reports the time for every query (a set of 9 test queries) using model.evaluate():

1/1 [==============================] - 0s 386us/step - loss: 0.0000e+00 - acc: 0.0000e+00

INFO:xtremedistil:Count of instances with label 0.0 is 32

1/1 [==============================] - 0s 249us/step - loss: 0.0000e+00 - acc: 0.0000e+00

INFO:xtremedistil:Count of instances with label 0.0 is 32

1/1 [==============================] - 0s 275us/step - loss: 0.0000e+00 - acc: 0.0000e+00

INFO:xtremedistil:Count of instances with label 0.0 is 32

1/1 [==============================] - 0s 280us/step - loss: 0.0000e+00 - acc: 0.0000e+00

INFO:xtremedistil:Count of instances with label 0.0 is 32

1/1 [==============================] - 0s 250us/step - loss: 0.0000e+00 - acc: 0.0000e+00

INFO:xtremedistil:Count of instances with label 0.0 is 32

1/1 [==============================] - 0s 249us/step - loss: 0.0000e+00 - acc: 0.0000e+00

INFO:xtremedistil:Count of instances with label 0.0 is 32

1/1 [==============================] - 0s 252us/step - loss: 0.0000e+00 - acc: 0.0000e+00

INFO:xtremedistil:Count of instances with label 0.0 is 32

1/1 [==============================] - 0s 255us/step - loss: 0.0000e+00 - acc: 0.0000e+00

INFO:xtremedistil:Count of instances with label 0.0 is 32

1/1 [==============================] - 0s 260us/step - loss: 0.0000e+00 - acc: 0.0000e+00


The following shows the Python time() difference measured around model.evaluate():


python time [2.9290642738342285, 0.07118463516235352, 0.15517759323120117, 0.07076430320739746, 0.06884241104125977, 0.06838345527648926, 0.07424783706665039, 0.06981348991394043, 0.06886076927185059]


The following shows callback.logs from a custom callback function that I wrote and passed to model.evaluate():

callback time [2.894454002380371, 0.039789676666259766, 0.12280726432800293, 0.03963470458984375, 0.0384669303894043, 0.03844141960144043, 0.04326891899108887, 0.039290428161621094, 0.03845667839050293]


As you can see, the time reported is in microseconds in the first case, but is much larger in the second and third cases.

Some details on the checkpoint: it's in .h5 format.

I load the model using model.load_weights() and then call model.evaluate().

with strategy.scope():
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=3e-5, epsilon=1e-08),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=[tf.keras.metrics.SparseCategoricalAccuracy(name="acc")])
    logger.info(model.summary())
    model.load_weights(os.path.join(args["model_dir"], "x.h5"))



# Python time measure (cb is the TimingCallback instance defined below)
start = time()
Y = model.evaluate(X, batch_size=1, callbacks=[cb])
y = np.argmax(Y, axis=-1)
end = time()


# Callback function
from time import time

class TimingCallback(tf.keras.callbacks.Callback):
  def __init__(self):
    super().__init__()
    self.logs = []
  def on_test_begin(self, logs=None):
    self.starttime = time()
  def on_test_end(self, logs=None):
    self.logs.append(time() - self.starttime)
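
For reference, cb in the timing snippet above is an instance of this class, and the "callback time" list shown earlier comes from its logs attribute:

cb = TimingCallback()
# ... Y = model.evaluate(X, batch_size=1, callbacks=[cb]) as above ...
print("callback time", cb.logs)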


Any help appreciated. Thanks

Deepa

Lance Norskog

Feb 24, 2021, 9:39:12 PM
to Deepa Mohan, Keras-users
Two suggestions:
1) Call model.evaluate() after loading the model weights. There is a certain "warm-up" time for your computer, memory, and GPU.
2) Add a separate time check for np.argmax, which requires scanning memory. Call it twice, one after the other, and measure both times.
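
Something like this, as a rough sketch (reusing the Y from your snippet):

from time import time
import numpy as np

t0 = time()
y1 = np.argmax(Y, axis=-1)  # first call may pay cold-cache / page-in costs
t1 = time()
y2 = np.argmax(Y, axis=-1)  # second call scans memory that is already warm
t2 = time()
print("argmax 1st:", t1 - t0, "2nd:", t2 - t1)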

Cheers,

Lance



--
Lance Norskog
lance....@gmail.com
Redwood City, CA

Deepa Mohan

Feb 26, 2021, 4:23:48 PM
to Keras-users
Hi Lance
Thanks for your reply. Yes, I do have it in this sequence:
1. model.compile()
2. model.load_weights()
3. model.evaluate()

I did time np.argmax separately; it didn't make much difference.

Lance Norskog

Feb 28, 2021, 9:37:30 PM
to Deepa Mohan, Keras-users
No, I meant:
load weights
start timer
model.evaluate()
end timer
start timer
model.evaluate()
end timer
start timer
model.evaluate()
end timer

This "warms up" the all of the data in memory. You should find the second and third a little faster than the first.

What you are trying to do is called 'micro-benchmarking', and it is really hard to do well. You need a deep understanding of software and hardware interactions, how all of the pieces fit together, and the performance tricks at play in each subsystem.

Cheers,

Lance
