Loading a pre-trained CCT leads to different accuracy values


René Larisch

Feb 9, 2022, 11:50:24 AM
to Keras-users
Hello everyone,

I wanted to use the implementation of the Compact Convolutional Transformer (CCT), as shown in the Keras documentation (https://keras.io/examples/vision/cct/).
To test the implementation, I set up a little notebook in Google Colab and started playing with it. The problem is: if I create a completely new model and use the
load_weights() function to load the weights from the last checkpoint,
the accuracy on the test set is different each time I load a new model.
You can find the Colab notebook here:


At the bottom of the notebook, in the last cell, you can find a little for-loop that three times
creates a new CCT network and loads the pre-trained weights.
As you can see, the accuracy is different each time.

In the cell above you can see how the original model (the one that was trained) performs exactly the same all three times.
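The reload-and-evaluate loop described above can be sketched like this (a minimal sketch; `create_model`, `weights_path`, `x_test`, `y_test`, and `trials` are placeholder names, not from the original notebook):

```python
def reload_and_evaluate(create_model, weights_path, x_test, y_test, trials=3):
    """Create a fresh model, load the checkpoint, and evaluate -- repeatedly.

    If the checkpoint covers every variable in the model, all trials
    should report the same accuracy.
    """
    accuracies = []
    for _ in range(trials):
        model = create_model()            # fresh, randomly initialised model
        model.load_weights(weights_path)  # should overwrite ALL variables
        _, accuracy = model.evaluate(x_test, y_test, verbose=0)
        accuracies.append(accuracy)
    return accuracies
```

If the accuracies differ across trials, some source of randomness (an unloaded variable, or a layer that is still stochastic at inference time) remains in the pipeline.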

Can someone explain what I did wrong and what causes this behavior?

Thank you for your time and help!
Best regards

Sayak Paul

Feb 9, 2022, 1:19:32 PM
to René Larisch, Keras-users
Hi Rene,

I am the primary author of that example. Could you send an email to me instead, so that I don't miss out on it?

Thanks,
Sayak | sayak.dev

--
To view this discussion on the web, visit https://groups.google.com/d/msgid/keras-users/cf61bc5f-f2e1-4ddc-8819-6eb96627de40n%40googlegroups.com.

Lance Norskog

Feb 9, 2022, 6:42:50 PM
to Sayak Paul, René Larisch, Keras-users
As to the problem, I would guess that one or more of a few things is not deterministic:
keras.utils.to_categorical()
the Embedding layer

Also, maybe the layer dropping or the augmentations are being called during inference?
Maybe these layers need a smart_cond wrapper to be reloadable?

Frankly, model reloadability in Keras seems to be a case of Zeno's Paradox: you never quite finish!
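The failure mode being guessed at here can be illustrated in plain Python (a hypothetical sketch, not Keras code): a layer that gates its randomness on the training flag stays deterministic at inference, while one that ignores the flag does not.

```python
import random

class GatedNoise:
    """Adds noise only when training=True; deterministic at inference."""
    def __call__(self, x, training=False):
        if training:
            return x + random.random()
        return x

class LeakyNoise:
    """Buggy variant: ignores the training flag entirely."""
    def __call__(self, x, training=False):
        return x + random.random()
```

With GatedNoise, repeated inference calls on the same input agree; with LeakyNoise they do not, which would show up as a slightly different test accuracy on every evaluation.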




--
Lance Norskog
lance....@gmail.com
Redwood City, CA

Sayak Paul

Feb 9, 2022, 8:18:43 PM
to Lance Norskog, René Larisch, Keras-users
Since the Stochastic Depth layer has a training argument in its call(), one would expect it to behave accordingly during inference. Keras sets the training argument automatically during inference, and it is expected to be propagated to the sublayers as well.

The same goes for the augmentation layers: they also have fixed inference behaviour.
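The intended contract can be sketched in plain Python (a simplified, hypothetical stand-in for the StochasticDepth layer, not its actual Keras implementation):

```python
import random

class StochasticDepthSketch:
    """Simplified stand-in: randomly drop the residual branch during
    training; behave as a deterministic identity at inference."""

    def __init__(self, drop_prob):
        self.keep_prob = 1.0 - drop_prob

    def __call__(self, x, training=False):
        if training:
            # keep the branch with probability keep_prob, rescaled
            if random.random() < self.keep_prob:
                return x / self.keep_prob
            return 0.0
        # inference: deterministic pass-through
        return x
```

As long as the training flag actually reaches the layer, inference is deterministic; if it did not, reloaded models would score differently on every evaluation.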

It's worth posting this on discuss.tensorflow.org.

Thanks,
Sayak | sayak.dev

Lance Norskog

Feb 10, 2022, 2:42:09 AM
to Sayak Paul, René Larisch, Keras-users
I have never seen this in examples:

    def call(self, x, training=None):
        if training:
            x = something(x)
        return x
But I have seen smart_cond used, and have used it myself. smart_cond takes a boolean predicate and two branch functions, and passes on the result of one of them. Here is the source for Dropout:


It would be helpful to have this called out in the Keras docs. smart_cond seems to be an important part of the training/inference runtime behavior, yet it does not appear anywhere in the Keras docs index.
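The idea behind smart_cond can be approximated in plain Python (a sketch of the concept only; the real utility also accepts a tensor predicate and falls back to a graph-level cond in that case):

```python
def smart_cond_sketch(pred, true_fn, false_fn):
    """Sketch: with a statically known boolean, call exactly one branch.

    The real smart_cond additionally accepts a tensor predicate and
    builds a graph-level conditional in that case.
    """
    if isinstance(pred, bool):
        return true_fn() if pred else false_fn()
    raise TypeError("dynamic predicates need a graph-level cond")
```

This is how a layer like Dropout can select its training-time branch or its inference-time branch from a single call() body.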

Cheers,

Lance Norskog

Sayak Paul

Feb 10, 2022, 2:48:50 AM
to Lance Norskog, René Larisch, Keras-users
Well, as you can guess, this was reviewed, and I have used this pattern in other examples too (VQ-VAE, for example).

If you have a better fix, you're welcome to raise a PR :)
 
Sayak Paul | sayak.dev


Lance Norskog

Feb 10, 2022, 12:57:42 PM
to Sayak Paul, René Larisch, Keras-users
Should this include the optional "training" parameter?

class CCTTokenizer(layers.Layer):
    ...
    def call(self, images):
        outputs = self.conv_model(images)
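What the proposed change would look like can be sketched with a dummy inner model (hypothetical names; `InnerModel` merely stands in for the real conv_model):

```python
class InnerModel:
    """Dummy stand-in for the tokenizer's inner conv_model."""
    def __call__(self, images, training=False):
        tag = "train" if training else "infer"
        return [(tag, img) for img in images]

class TokenizerSketch:
    def __init__(self):
        self.conv_model = InnerModel()

    def call(self, images, training=None):
        # explicitly forward the training flag to the inner model
        return self.conv_model(images, training=bool(training))
```

In practice, Keras threads the training argument through nested layer calls automatically, which is why the explicit parameter is likely unnecessary here.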

Sayak Paul

Feb 10, 2022, 8:27:10 PM
to Lance Norskog, René Larisch, Keras-users
Probably not. It might be worth discussing this on discuss.tensorflow.org.

Thanks,
Sayak | sayak.dev

René Larisch

Feb 11, 2022, 3:59:49 AM
to Keras-users
Hello,

Thank you @Sayak Paul @Lance Norskog for your help and suggestions.

Here is a little update:
I removed the data_augmentation layer, and yes, the differences seem to be smaller. But I still get slightly different accuracy values each time I load the weights.
So I created a subnetwork consisting only of the Input layer and the CCTTokenizer, to see if the output differs each time I create the network and load the weights.
To make it clearer, the steps are:

1. A completely new CCT network is created.
2. The pre-trained weights are loaded.
3. A smaller network is created out of the CCT network, containing only the Input layer and the CCTTokenizer.
4. The first 10 samples of the test set are presented to the new, smaller network.
5. Some outputs are printed.

This is done, again, three times. And all three times, the output is identical.

If I instead create a smaller network consisting of the first three layers of the CCT network (according to .summary(), up to and including an add operation) and print the output values, they show small differences across the three trials.
You can find two more cells in the colab notebook for this.
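The trial-by-trial output comparison described above can be sketched as follows (`get_outputs` is a hypothetical helper standing in for rebuilding the sub-network, loading the weights, and running it on the samples):

```python
import numpy as np

def outputs_match(get_outputs, samples, trials=3, atol=1e-6):
    """Rebuild and run the same sub-network `trials` times and check
    that it produces identical outputs on the same samples."""
    reference = np.asarray(get_outputs(samples))
    return all(
        np.allclose(np.asarray(get_outputs(samples)), reference, atol=atol)
        for _ in range(trials - 1)
    )
```

Bisecting the network layer by layer with a check like this points at the first layer where the reloaded models diverge.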

But I will also post this on discuss.tensorflow.org.

Again, thanks for your help!

Best regards,

Sayak Paul

Feb 11, 2022, 4:31:32 AM
to René Larisch, Keras-users
This is indeed weird. 

I wonder if this behavior stems from the layer level. So, it's probably best to see what a Keras team member has to say about it.

Thanks,
Sayak | sayak.dev
