TFJS Layers Model Not Loading Weights Correctly


Josh

Oct 20, 2019, 5:16:07 AM
to TensorFlow.js Discussion
Hi all,

I've converted a Keras .h5 model to a TFJS Layers Model to run inference on a Node.js server, and I am seeing wildly different results between the two. The model is used for image classification and uses a slightly modified Xception architecture. Both model.summary() outputs match, so the conversion of the architecture seems to have been successful.

After looking into it further, I found that the weights for the first layer of the model are completely different between the two, which would explain the different results. The input image tensor is consistent between both, so the problem should lie with the weights.


Here is a link to a comparison of the same layer's weights in TFJS and Keras...


I set the output of my Express HTTP file server to verbose to check whether all of the weight files were being fetched. They were all being fetched, but in a non-sequential order, like this:



GET /model.json 200 91907 - 5.775 ms
GET /group1-shard20of20.bin 200 3770536 - 32.726 ms
GET /group1-shard3of20.bin 200 4194304 - 26.487 ms
GET /group1-shard1of20.bin 200 4194304 - 26.892 ms
GET /group1-shard7of20.bin 200 4194304 - 26.408 ms
GET /group1-shard13of20.bin 200 4194304 - 24.671 ms
GET /group1-shard10of20.bin 200 4194304 - 24.925 ms
GET /group1-shard15of20.bin 200 4194304 - 24.931 ms
GET /group1-shard5of20.bin 200 4194304 - 24.827 ms
GET /group1-shard2of20.bin 200 4194304 - 25.037 ms
GET /group1-shard6of20.bin 200 4194304 - 24.942 ms
GET /group1-shard18of20.bin 200 4194304 - 24.893 ms
GET /group1-shard16of20.bin 200 4194304 - 25.468 ms
GET /group1-shard9of20.bin 200 4194304 - 25.111 ms
GET /group1-shard11of20.bin 200 4194304 - 25.136 ms
GET /group1-shard12of20.bin 200 4194304 - 26.097 ms
GET /group1-shard8of20.bin 200 4194304 - 23.290 ms
GET /group1-shard4of20.bin 200 4194304 - 26.133 ms
GET /group1-shard14of20.bin 200 4194304 - 27.097 ms
GET /group1-shard17of20.bin 200 4194304 - 26.265 ms
GET /group1-shard19of20.bin 200 4194304 - 33.396 ms


It seems to fetch the weight files in a different order each time it is executed. Does anyone know if this is expected behaviour with the tf.loadLayersModel function? Could my weights be out of order?


I noticed that some issues like this arose almost a year ago, but the bug was identified and an update to TensorFlow solved it for those users.


I am using TensorFlow.js 1.2.11 to load the model on a Node.js v10.16.3 server, and I used TensorFlow 1.14 under Python 3.7 to convert the model. I used tfjs.converters.save_keras_model() to convert the model from Keras to TFJS, and I am using this code to get the model from the HTTP server:


model = await tf.loadLayersModel('http://127.0.0.1:3001/model.json');


Thanks for your time, and please let me know if any more information is needed,


Josh

model.json

Nikhil Thorat

Oct 21, 2019, 4:54:06 PM
to Josh, TensorFlow.js Discussion, Ann Yuan, Shanqing Cai
Different results are not expected, though the order of network requests is non-deterministic (and that's okay! we make sure to load them in the right order in-memory).
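
To illustrate why out-of-order requests need not scramble the weights, here is a minimal sketch in plain JavaScript. It is not the TensorFlow.js implementation: the `assembleShards` helper and the shard names are hypothetical. Responses are indexed by path as they arrive, and the final buffer is built by walking the manifest order instead of the arrival order.

```javascript
// Sketch only: `assembleShards` is a hypothetical helper, not a TensorFlow.js API.
// Concatenate shard buffers in manifest order, whatever order they arrived in.
function assembleShards(manifestPaths, arrivals) {
  // Index every response by its path as it "arrives".
  const byPath = new Map();
  for (const { path, bytes } of arrivals) {
    byPath.set(path, bytes);
  }

  // Walk the manifest, not the arrival list, to fix the final order.
  const ordered = manifestPaths.map((path) => {
    if (!byPath.has(path)) {
      throw new Error(`missing shard: ${path}`);
    }
    return byPath.get(path);
  });

  // Copy each shard into one contiguous buffer.
  const total = ordered.reduce((n, b) => n + b.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const b of ordered) {
    out.set(b, offset);
    offset += b.length;
  }
  return out;
}

// Three tiny made-up shards that arrive in the order 3, 1, 2.
const manifest = ['shard1of3.bin', 'shard2of3.bin', 'shard3of3.bin'];
const arrivals = [
  { path: 'shard3of3.bin', bytes: Uint8Array.from([5, 6]) },
  { path: 'shard1of3.bin', bytes: Uint8Array.from([1, 2]) },
  { path: 'shard2of3.bin', bytes: Uint8Array.from([3, 4]) },
];
const assembled = assembleShards(manifest, arrivals);
console.log(Array.from(assembled));
```

With this kind of bookkeeping, the order of lines in the server log says nothing about the order of bytes in memory.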

Ann or Shanqing, do you think you could take a look at this? (Shanqing, it's okay if it's EOW.)

--
You received this message because you are subscribed to the Google Groups "TensorFlow.js Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tfjs+uns...@tensorflow.org.
To view this discussion on the web visit https://groups.google.com/a/tensorflow.org/d/msgid/tfjs/6c53de9e-d7b0-437a-8bd9-7bbc325b3277%40tensorflow.org.

Josh

Oct 23, 2019, 10:06:31 AM
to TensorFlow.js Discussion, joshne...@gmail.com, ann...@google.com, ca...@google.com
Thanks for your reply and clarification on how tf.loadLayersModel() works. Here is a link to a ZIP file with the two models, along with a sample image, so you can try to recreate the problem. For 17.jpg, my Keras model predicts [0.49672168, 0.5032783] and my TFJS model predicts [0.7255099415779114, 0.2744901478290558]. The problem seems to lie with the weights, but I don't know whether it happens when converting between Keras and TFJS models or when the model is being loaded...

Thanks again for your reply, and feel free to ask for any more information!
Josh

Shanqing Cai

Oct 25, 2019, 10:15:02 AM
to Josh, TensorFlow.js Discussion, Ann Yuan
Hi Josh,

Thanks for providing us with the model and data. I tried reproducing your Python/TFJS mismatch, and so far I haven't had any luck. This is what I did to load your model and run it on your data (17.jpg) in Python:

```
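import json

import cv2
import numpy as np
import tensorflow as tf

# cv2.imread returns a (height, width, 3) array in BGR channel order.
# Dividing by 255.0 scales the pixel values to [0, 1].
imx = cv2.imread("./17.jpg") / 255.0
# Swap height and width, then add a leading batch dimension.
imx = np.expand_dims(np.transpose(imx, axes=[1, 0, 2]), axis=0)
# Save the exact input array so it can be reloaded in JavaScript.
with open('17.jpg.json', 'wt') as f:
  json.dump(imx.tolist(), f)

model = tf.keras.models.load_model('./Keras/50_epochs.h5')
imy = model.predict(imx)
print(imy)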
```

Note that 
1) I have to use np.transpose() to make sure that the input array size matches the model's input shape
2) I normalize the image array to [0, 1]
3) I dump it out to a JSON file so I can load it in JavaScript later.
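
As a quick sanity check on step 3, the shape of the parsed JSON can be inspected on the JavaScript side before it is turned into a tensor. The sketch below is plain JavaScript with a hypothetical `shapeOf` helper; the small inline JSON string stands in for the contents of 17.jpg.json.

```javascript
// Sketch only: `shapeOf` is a hypothetical helper.
// Infer the shape of a rectangular nested array, such as the one json.dump() wrote.
function shapeOf(nested) {
  const shape = [];
  let level = nested;
  while (Array.isArray(level)) {
    shape.push(level.length);
    level = level[0];
  }
  return shape;
}

// Stand-in for JSON.parse(fs.readFileSync('17.jpg.json', 'utf8')):
// a batch of one "image" with dimensions 2 x 3 and 3 channels.
const parsed = JSON.parse(
  '[[[[0,0,0],[0,0,0],[0,0,0]],[[0,0,0],[0,0,0],[0,0,0]]]]'
);
console.log(shapeOf(parsed));
```

For the real file, the printed shape should match the model's input shape with a leading batch dimension of 1; if it does not, the mismatch is in the input pipeline and not in the weights.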

Here are the numbers I got from three different environments:
1) Python tf.keras (tf.__version__: 2.1.0-dev20191023): [[0.38062853 0.6193714 ]]
2) tfjs-node (v1.2.10): [[0.3806291, 0.6193709],]
3) tfjs browser (v1.2.10, Linux Chrome): [[0.3806289, 0.6193711],]

So there doesn't seem to be any significant mismatch to me. But I also noticed that these output numbers don't match the ones you provided.
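
To make "no significant mismatch" concrete, the outputs can be compared by their largest element-wise difference. This is a minimal sketch in plain JavaScript; the `maxAbsDiff` helper and the 1e-5 tolerance are assumptions chosen for illustration, while the numbers are the ones quoted in this thread.

```javascript
// Sketch only: `maxAbsDiff` is a hypothetical helper for comparing predictions.
function maxAbsDiff(a, b) {
  if (a.length !== b.length) {
    throw new Error('length mismatch');
  }
  let worst = 0;
  for (let i = 0; i < a.length; i++) {
    worst = Math.max(worst, Math.abs(a[i] - b[i]));
  }
  return worst;
}

// Outputs quoted in this thread.
const keras = [0.38062853, 0.6193714];
const tfjsNode = [0.3806291, 0.6193709];
const tfjsBrowser = [0.3806289, 0.6193711];
// TFJS prediction reported earlier in the thread for the same image.
const reported = [0.7255099415779114, 0.2744901478290558];

const tolerance = 1e-5; // assumed threshold for float32 round-off

console.log('tfjs-node within tolerance:', maxAbsDiff(keras, tfjsNode) < tolerance);
console.log('browser within tolerance:', maxAbsDiff(keras, tfjsBrowser) < tolerance);
console.log('reported within tolerance:', maxAbsDiff(keras, reported) < tolerance);
```

Differences around 1e-6 are consistent with float32 round-off across backends; the earlier reported prediction differs in the first decimal place, which points to a different input or different weights.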

So if you can provide further details regarding how you load the image data and run it, I can look at it further.

Best,
Shanqing
--
---
Shanqing Cai
Software Engineer
Google

Josh

Oct 26, 2019, 6:22:54 AM
to TensorFlow.js Discussion, joshne...@gmail.com, ann...@google.com
Thanks Shanqing,

I tried using your Python code for loading an image and got slightly different results under Tensorflow-GPU, but the exact same results using Tensorflow-CPU. The only difference I can see between your code and mine is how we added the batch dimension: I used image.reshape((1, image.shape[1], image.shape[0], image.shape[2])). Anyway, I re-ran my Python and Node code on 17.jpg and managed to get [[0.38062847, 0.6193716]] with Python and [[0.303010613, 0.6193716]] with Node, which isn't a million miles off. But with some other images the difference is much more pronounced, e.g. with 18.jpg, which I have attached ([[0.25102922, 0.74897075]] in Python and [[0.557361125946044, 0.442638874053955]] in Node).
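
One thing worth checking here: reshape and transpose are not interchangeable. A reshape keeps the elements in their original memory order and only regroups them, while a transpose actually swaps the axes. The sketch below shows the difference on a tiny 2 x 3 matrix in plain JavaScript; both helpers are hypothetical, and whether this accounts for any of the mismatch in this thread is an assumption, not something confirmed.

```javascript
// Sketch only: both helpers are hypothetical and work on a row-major flat array.

// "Reshape": keep the flat data as-is and just regroup it into rows of `cols`.
function reshape2d(flat, rows, cols) {
  const out = [];
  for (let r = 0; r < rows; r++) {
    out.push(flat.slice(r * cols, (r + 1) * cols));
  }
  return out;
}

// "Transpose": move element (r, c) of a rows x cols matrix to (c, r).
function transpose2d(flat, rows, cols) {
  const out = [];
  for (let c = 0; c < cols; c++) {
    const row = [];
    for (let r = 0; r < rows; r++) {
      row.push(flat[r * cols + c]);
    }
    out.push(row);
  }
  return out;
}

// A 2 x 3 "image" stored row-major: [[0, 1, 2], [3, 4, 5]].
const pixels = [0, 1, 2, 3, 4, 5];
const reshaped = reshape2d(pixels, 3, 2); // regrouped into 3 rows of 2
const transposed = transpose2d(pixels, 2, 3); // axes swapped, also 3 x 2
console.log(JSON.stringify(reshaped));
console.log(JSON.stringify(transposed));
```

Both results have shape 3 x 2, so a model accepts either, but only the transposed one preserves which pixel sits next to which.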

To load the images in using Node, I use a canvas object to load and resize the image to 320x240 and then convert this to a Tensor using tf.browser.fromPixels(). Here is the code for some context:

readLocalImage(img_path) {
    const { Image, createCanvas } = require('canvas');
    const canvas = createCanvas(320, 240);
    const ctx = canvas.getContext('2d');

    var img = new Image();
    img.onload = () => ctx.drawImage(img, 0, 0);
    img.onerror = err => { throw err };
    img.src = img_path;
    var tensor = tf.browser.fromPixels(canvas).toFloat().expandDims(); // Converted to float for more accurate processing during prediction
    tensor = tf.transpose(tensor, [0, 2, 1, 3]); // Reorder the tensor so the batch dimension is first, followed by width, height and depth dimensions
    tensor = tensor.div(255.0); // Normalise pixel values between 0 and 1
    tensor.print();
    return tensor;
}

async predictImage(image_tensor) {
    var predictions = await model.predict(image_tensor).data();
    console.log(predictions);
}

I think this should (in theory) do the same thing as the Python code but I'm not sure if there are some subtleties that I'm missing out? I know this results in different predictions for images that need to be resized down to 320x240, but for images that are already the correct resolution, it seems that it should work fine...
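
One subtlety that may be relevant: cv2.imread returns pixels in BGR channel order, whereas pixels read from a canvas are in RGB order, so the two pipelines can feed the model different channel layouts even for the same file. The sketch below reverses the channel order of a flat, interleaved 3-channel array in plain JavaScript. The `swapRedBlue` helper is hypothetical, and whether channel order explains the mismatch here is an assumption that would need checking against how the model was trained.

```javascript
// Sketch only: `swapRedBlue` is a hypothetical helper.
// Reverse the channel order of a flat, interleaved 3-channel pixel array
// (RGB -> BGR or BGR -> RGB). The input array is not modified.
function swapRedBlue(flat) {
  if (flat.length % 3 !== 0) {
    throw new Error('expected 3 channels per pixel');
  }
  const out = Array.from(flat);
  for (let i = 0; i < out.length; i += 3) {
    const first = out[i];
    out[i] = out[i + 2];
    out[i + 2] = first;
  }
  return out;
}

// Two pixels in RGB order.
const rgb = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6];
const bgr = swapRedBlue(rgb);
console.log(bgr);
```

On a tensor, the equivalent operation would be reversing the last axis, for example with tf.reverse; the plain-array version above is only meant to show what changes.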

Thanks again for your time and support,
Josh

Josh

Oct 31, 2019, 2:07:33 PM
to TensorFlow.js Discussion, joshne...@gmail.com, ann...@google.com
I reckon there might be some discrepancy in how I'm loading the images, so I've just tried predicting from the 17.jpg.json file that was generated by your Python code and got the even closer result of [0.3194672763347626, 0.680532693862915], but it's still not bang on. Can you share the Node code you were using to predict, and also which version of Node you used to get the correct predictions? I am on Node v10.16.3 with TensorFlow.js v1.3.1 and TFJS-Node v1.3.1. I've attached my JSON file in case there are any differences.

Thanks again,
Josh
17.jpg.json