Question about keras inference latency measurement

Deepa Mohan

Feb 24, 2021, 6:17:47 PM
to Keras-users
Hello

Hoping to get some resolution/help on this observation.

I am measuring model prediction/evaluation time on a GPU for a Keras (TF 2.2) trained model. I see a large difference in the time measured when I use model.evaluate() vs. a custom callback vs. Python time(). Could you help explain this discrepancy, or forward this query to someone who can help?

The following INFO output reports the time for every query (a set of 9 test queries) using model.evaluate():

1/1 [==============================] - 0s 386us/step - loss: 0.0000e+00 - acc: 0.0000e+00

INFO:xtremedistil:Count of instances with label 0.0 is 32

1/1 [==============================] - 0s 249us/step - loss: 0.0000e+00 - acc: 0.0000e+00

INFO:xtremedistil:Count of instances with label 0.0 is 32

1/1 [==============================] - 0s 275us/step - loss: 0.0000e+00 - acc: 0.0000e+00

INFO:xtremedistil:Count of instances with label 0.0 is 32

1/1 [==============================] - 0s 280us/step - loss: 0.0000e+00 - acc: 0.0000e+00

INFO:xtremedistil:Count of instances with label 0.0 is 32

1/1 [==============================] - 0s 250us/step - loss: 0.0000e+00 - acc: 0.0000e+00

INFO:xtremedistil:Count of instances with label 0.0 is 32

1/1 [==============================] - 0s 249us/step - loss: 0.0000e+00 - acc: 0.0000e+00

INFO:xtremedistil:Count of instances with label 0.0 is 32

1/1 [==============================] - 0s 252us/step - loss: 0.0000e+00 - acc: 0.0000e+00

INFO:xtremedistil:Count of instances with label 0.0 is 32

1/1 [==============================] - 0s 255us/step - loss: 0.0000e+00 - acc: 0.0000e+00

INFO:xtremedistil:Count of instances with label 0.0 is 32

1/1 [==============================] - 0s 260us/step - loss: 0.0000e+00 - acc: 0.0000e+00


The following shows the Python time() difference measured around model.evaluate():


python time [2.9290642738342285, 0.07118463516235352, 0.15517759323120117, 0.07076430320739746, 0.06884241104125977, 0.06838345527648926, 0.07424783706665039, 0.06981348991394043, 0.06886076927185059]


The following shows callback.logs from a custom callback function that I wrote and passed to model.evaluate():

callback time [2.894454002380371, 0.039789676666259766, 0.12280726432800293, 0.03963470458984375, 0.0384669303894043, 0.03844141960144043, 0.04326891899108887, 0.039290428161621094, 0.03845667839050293]


As you can see, the time reported is in microseconds in the first case, but is much larger in the second and third cases.

Some details on the checkpoint: it's in .h5 format.

I load the model using model.load_weights() and then call model.evaluate().

with strategy.scope():
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=3e-5, epsilon=1e-08),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=[tf.keras.metrics.SparseCategoricalAccuracy(name="acc")])
    logger.info(model.summary())
    model.load_weights(os.path.join(args["model_dir"], "x.h5"))



# Python time measure (cb is the TimingCallback instance defined below)
start = time()
Y = model.evaluate(X, batch_size=1, callbacks=[cb])
y = np.argmax(Y, axis=-1)
end = time()


# Callback function
from time import time

class TimingCallback(tf.keras.callbacks.Callback):
  def __init__(self):
    super().__init__()
    self.logs = []
  def on_test_begin(self, logs=None):
    self.starttime = time()
  def on_test_end(self, logs=None):
    self.logs.append(time() - self.starttime)
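
For reference, cb in the timing snippet above is an instance of this class, and the "callback time" list shown earlier comes from its logs attribute:

cb = TimingCallback()
# ... Y = model.evaluate(X, batch_size=1, callbacks=[cb]) as above ...
print("callback time", cb.logs)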


Any help appreciated. Thanks

Deepa

Lance Norskog

Feb 24, 2021, 9:39:12 PM
to Deepa Mohan, Keras-users
Two suggestions:
1) Call model.evaluate() after loading the model weights. There is a certain "warm-up" time for your computer, memory, and GPU.
2) Add a separate time check for np.argmax, which requires scanning memory. Call it twice, one after the other, and measure both times.
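
Something like this, as a rough sketch (reusing the Y from your snippet):

from time import time
import numpy as np

t0 = time()
y1 = np.argmax(Y, axis=-1)  # first call may pay cold-cache / page-in costs
t1 = time()
y2 = np.argmax(Y, axis=-1)  # second call scans memory that is already warm
t2 = time()
print("argmax 1st:", t1 - t0, "2nd:", t2 - t1)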

Cheers,

Lance



--
Lance Norskog
lance....@gmail.com
Redwood City, CA

Deepa Mohan

Feb 26, 2021, 4:23:48 PM
to Keras-users
Hi Lance
Thanks for your reply. Yes, I do have it in this sequence:
1. model.compile()
2. model.load_weights()
3. model.evaluate()

I did time np.argmax separately; it didn't make much difference.

Lance Norskog

Feb 28, 2021, 9:37:30 PM
to Deepa Mohan, Keras-users
No, I meant:
load weights
start timer
model.evaluate()
end timer
start timer
model.evaluate()
end timer
start timer
model.evaluate()
end timer

This "warms up" the all of the data in memory. You should find the second and third a little faster than the first.

What you are trying to do is called 'micro-benchmarking', and it is really hard to do well. You need a deep understanding of software and hardware interactions, how all of the pieces fit together, and the performance tricks at play in each subsystem.

Cheers,

Lance
