Hi,
I have tried to find an answer to this question for quite some time now, but it seems like there might be a missing piece, not addressed by the upcoming TensorFlow 2.0 release, that is absolutely fundamental for many deep learning approaches.
A common use-case is fine-tuning from Imagenet. At a first glance it seems like Tensorflow Hub takes over from TF-Slim in providing a model zoo for such things. However, TensorFlow Hub defines so-called "signatures" that dictate what you can do with the models available in the model zoo there. This means for a classifier you can typically only replace the last layer of the CNN. The rest of the CNN is an opaque blob and you can not access or modify its internals.
If I am right about TensorFlow Hub, then I can list numerous use-cases where the classifier signature is completely useless:
1: You cannot train with multiple learning rates (say, 1/10th of the learning rate for the base network and the full learning rate for the final layer).
2: You cannot extract intermediate layers from the CNN, since the accessible outputs are fixed by the signature that the network provider defined for you. (This essentially means you have to do the ImageNet training yourself in order to get that level of control, which makes it pointless to use TensorFlow Hub at all.)
3: If I am interested in, say, only the first 3 layers of a CNN pretrained on ImageNet because I think they provide good initial feature extraction, and I then want to design a completely different CNN for some other purpose, this is not possible with TensorFlow Hub either.
4: If I want to finetune more than the last layer, I cannot do that either. For a very deep CNN with 100+ layers, it might make sense to finetune the last 10 layers rather than only the last one. This does not seem possible...
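To make use-case 1 concrete, this is roughly the kind of two-learning-rate training step I mean (a minimal sketch with made-up layer sizes, assuming direct access to the base network's variables):

```python
import tensorflow as tf

# Hypothetical two-part model: a pretrained base and a fresh final layer
# (layer sizes are made up for illustration).
base = tf.keras.Sequential([tf.keras.layers.Dense(16, activation='relu')])
head = tf.keras.layers.Dense(10)

base_opt = tf.keras.optimizers.SGD(learning_rate=0.001)  # 1/10th rate for the base network
head_opt = tf.keras.optimizers.SGD(learning_rate=0.01)   # full rate for the final layer

x = tf.random.normal([8, 32])
y = tf.random.uniform([8], maxval=10, dtype=tf.int32)

with tf.GradientTape() as tape:
    logits = head(base(x))
    loss = tf.reduce_mean(
        tf.keras.losses.sparse_categorical_crossentropy(y, logits, from_logits=True))

# Split the gradient list between the two optimizers.
grads = tape.gradient(loss, base.trainable_variables + head.trainable_variables)
n = len(base.trainable_variables)
base_opt.apply_gradients(zip(grads[:n], base.trainable_variables))
head_opt.apply_gradients(zip(grads[n:], head.trainable_variables))
```

This only works if the base network exposes its trainable variables, which is exactly the access that an opaque signature denies.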
We use such techniques on a daily basis and REALLY need that flexibility.
That flexibility was provided with TF-Slim, but if I am right it will no longer be provided in TensorFlow 2.0,
which puts the framework at a serious disadvantage compared to other frameworks such as PyTorch, MXNet etc.
Therefore I am hoping that I am either wrong about how TensorFlow Hub works,
or that there is an upcoming, more flexible kind of model zoo that I am not yet aware of.
I am very much looking forward to an answer about what the plan is to support such use-cases, since this seems quite urgent to solve if people are to migrate to TensorFlow 2.0 at all.
Thanks in advance,
/Niclas
--
You received this message because you are subscribed to the Google Groups "TensorFlow Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to developers+...@tensorflow.org.
Visit this group at https://groups.google.com/a/tensorflow.org/group/developers/.
To view this discussion on the web visit https://groups.google.com/a/tensorflow.org/d/msgid/developers/1551088280942.41080%40axis.com.
Hi Aakash,
Yes, Keras seems to be a much better API than TF-Slim, but that is not what the question is about.
The problem is this:
The currently available model zoo with common classifiers (Inception, ResNets, VGG16, MobileNet, etc.) is provided via the TF-Slim API in the graph formulation.
In TF 2.0 it seems obvious that defining CNNs in Keras in the eager formulation is the way to go. But there are no ImageNet-pretrained models available in this "eager Keras" form, except possibly in TensorFlow Hub, and I do not see the kind of support I need there. So I see no way to get a model zoo unless I train the models myself.
Such model zoos are available for other frameworks, since this has become a standard offering.
For instance, here is the source code for the ResNet models in PyTorch, defined in "eager format":
https://pytorch.org/docs/stable/_modules/torchvision/models/resnet.html#resnet18
The definitions in TensorFlow eager format can be written in a VERY similar way, apart from minor naming replacements (such as inheriting from tf.keras.Model instead of nn.Module, etc.).
I was expecting a similar model zoo for TensorFlow, to make it meaningful to use for developers who do not have arbitrarily huge datasets available.
So the question really is: where is the new model zoo to use when TF-Slim becomes deprecated?
BR,
/Niclas
Hi,
This certainly looks more like what I was looking for.
However, in TF 2.0 eager mode will be the default. These models still seem to be defined in graph mode.
For instance:
https://github.com/keras-team/keras-applications/blob/master/keras_applications/vgg16.py
Are these models going to be available in eager formulation as well when TF 2.0 arrives, or what is the plan?
Also, I did not see any reference to this when I was searching the TensorFlow/Keras documentation, but I guess that is because the framework is still at an early stage of development?
BR,
/Niclas
Yes, it is good that the models can be used, but they are still in a fixed graph formulation.
In eager execution you can have completely dynamic forward passes via the imperatively defined "call" function, etc.
I still do not know how to use such a graph-style model for retraining in eager mode.
That is, following the design paradigm illustrated here:
https://www.tensorflow.org/tutorials/eager/custom_layers
Will the model zoo be available with eager models like that as well in TF 2.0? As you can see in the example below (from the link above), this design paradigm is very similar to the PyTorch example I provided in the earlier email: just change nn.Module to tf.keras.Model, rename the forward function from "forward" to "call", and replace the PyTorch layer class definitions with their TensorFlow Keras equivalents. It is exactly that way of writing models for finetuning etc. that I am looking for. So the question is, will these also be available?
class ResnetIdentityBlock(tf.keras.Model):
  def __init__(self, kernel_size, filters):
    super(ResnetIdentityBlock, self).__init__(name='')
    filters1, filters2, filters3 = filters

    self.conv2a = tf.keras.layers.Conv2D(filters1, (1, 1))
    self.bn2a = tf.keras.layers.BatchNormalization()

    self.conv2b = tf.keras.layers.Conv2D(filters2, kernel_size, padding='same')
    self.bn2b = tf.keras.layers.BatchNormalization()

    self.conv2c = tf.keras.layers.Conv2D(filters3, (1, 1))
    self.bn2c = tf.keras.layers.BatchNormalization()

  def call(self, input_tensor, training=False):
    x = self.conv2a(input_tensor)
    x = self.bn2a(x, training=training)
    x = tf.nn.relu(x)

    x = self.conv2b(x)
    x = self.bn2b(x, training=training)
    x = tf.nn.relu(x)

    x = self.conv2c(x)
    x = self.bn2c(x, training=training)

    x += input_tensor
    return tf.nn.relu(x)


block = ResnetIdentityBlock(1, [1, 2, 3])
print(block(tf.zeros([1, 2, 3, 3])))
print([x.name for x in block.trainable_variables])
But they are not exactly the same.
You use a "predict" function, which seems to be implicitly provided by the VGG16(...) function call, whereas the TensorFlow example explicitly states I should implement the "call" function myself.
Also, even if the graph RUNS in eager mode, the CNN is still defined in terms of the VGG16(...) function, which explicitly specifies the inputs and outputs of all the layers in the CNN and thus fixes it, just like a graph setup function fixes a computation graph.
I cannot see that they are the same. Do you mean the models in tf.keras.applications are exactly equivalent to the example at
https://www.tensorflow.org/tutorials/eager/custom_layers in every respect?
Can I override the "predict" function in the same way that I can override the "call" function?
Can I do conditional inference passes using the tf.keras.applications models? Say, for instance, that I execute some of the layers in the pretrained model only if a condition given by some preprocessing function is fulfilled, and otherwise route that inference pass through some new layers instead?
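For concreteness, the kind of conditional forward pass I mean can be sketched with a subclassed model (the layer names and sizes here are made up for illustration):

```python
import tensorflow as tf

class ConditionalNet(tf.keras.Model):
    def __init__(self):
        super(ConditionalNet, self).__init__()
        self.shared = tf.keras.layers.Dense(8, activation='relu')
        self.branch_a = tf.keras.layers.Dense(4)  # e.g. layers kept from a pretrained model
        self.branch_b = tf.keras.layers.Dense(4)  # e.g. new replacement layers

    def call(self, x, use_pretrained=True):
        x = self.shared(x)
        # Imperative branching: choose a path per call, decided in plain Python.
        if use_pretrained:
            return self.branch_a(x)
        return self.branch_b(x)

net = ConditionalNet()
out_a = net(tf.zeros([1, 8]), use_pretrained=True)
out_b = net(tf.zeros([1, 8]), use_pretrained=False)
```

This per-call branching is trivial with a subclassed model, but I do not see how to do it with a model whose graph of layers is fixed at construction time.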
Am I missing something here? I really cannot see that the current pretrained models fulfil the requirements I am asking for, at least not in a way that is on par with what is available in PyTorch, which is the framework I am benchmarking against (and currently using while waiting for something similar to become available in TensorFlow).
One reason I think this is important, rather than just "being able to run" the model in eager mode, is that we would really like to see TensorFlow as the more flexible framework, but that requires that working with and modifying CNN models is as flexible as in the other frameworks out there. Otherwise there is a tendency for developers to pick up frameworks such as PyTorch simply because they seem more intuitive, straightforward and user-friendly.
BR,
/Niclas
Hi again Aakash and others,
Regarding the custom loading of pretrained weights in Tensorflow 2.0 that we talked about before,
I finally got some more time to sit down and try to produce a customization example based on CNNs pretrained on Imagenet,
but I still find the process much less straightforward than it was when using TF-Slim,
and also I still get stuck on the loading of an arbitrary subset of weights...
I will try to show you where I ended up, and maybe you can tell me whether I got something completely wrong, or whether this IS less supported than it first seemed in the new TensorFlow 2.0?
Firstly, it is obviously true, as you wrote in your example, that you can download and initialize the model with a 2-liner like this:
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
vgg_model = VGG16(input_shape=(224,224,3), weights='imagenet')
However, for the purpose of extracting an arbitrary subset of layers with pretrained weights to use as a building block for a new model this has the following drawbacks:
1: I don't have the model description available locally; it lives in the Keras repository. I need local access to it in order to customize it and load only the unchanged layers from the pretrained weights file.
2: I need the weights stored locally as well, since I cannot assume to always have access to the git repository where the weights are hosted (and for future-proofing of the model I don't want to risk that something happens to the remote repo, which would render my code useless without access to the pretrained weights, so storing them locally is a definite must).
3: The loading of the weights is integrated into the model definition, which I find strange, since it makes it more complicated to use the subset of VGG16 CNN layers together with other code. It means I would have to break this code out of the VGG16 definition in my customized CNN. TF-Slim had none of these awkwardnesses.
4: It is not immediately clear exactly what happens with these module injections, but it seems they could later interfere with custom layers that I add later, defined outside of this Keras model.
5: The model is defined here:
https://github.com/tensorflow/tensorflow/blob/r2.0/tensorflow/python/keras/applications/vgg16.py
But this is just a wrapper layer around the Keras repository, with some module-injection decorators that do some magic. Apparently I need this code copied into my local code as well, since these module injections only refer to the Keras repository, so I cannot use them with my custom VGG16-based CNN.
SOLUTION:
-----------------------------------------------------------------------------------------------------------------------
So I copied the CNN model from the Keras repository and added the magical module injections (and the utilities file that goes with the CNN as well).
This is what I get in the main file (the other files are not listed here, but they are the same as in the Keras repository):
from tensorflow.python.keras.applications import keras_modules_injection
from tensorflow.python.util.tf_export import keras_export

@keras_export('keras.applications.vgg16.VGG16',
              'keras.applications.VGG16')
@keras_modules_injection
def VGG16(*args, **kwargs):
    return vgg16.VGG16(*args, **kwargs)

@keras_export('keras.applications.vgg16.decode_predictions')
@keras_modules_injection
def decode_predictions(*args, **kwargs):
    return vgg16.decode_predictions(*args, **kwargs)

@keras_export('keras.applications.vgg16.preprocess_input')
@keras_modules_injection
def preprocess_input(*args, **kwargs):
    return vgg16.preprocess_input(*args, **kwargs)

# This is the original vgg16 wrapper around downloading the models from the Keras git:
# from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
import vgg16  # This now imports my customized VGG16 model from a local file in the same folder.

vgg_model = VGG16(input_shape=(224, 224, 3), weights='vgg16_weights_tf_dim_ordering_tf_kernels.h5')
Now, I still get stuck here, because the saved weights are in .h5 binary format and I only have 2 options:
1: Load the whole CNN including the ImageNet final layer (vgg16_weights_tf_dim_ordering_tf_kernels.h5),
2: or load all layers except the last (vgg16_weights_tf_dim_ordering_tf_kernels_notop). If I modify the CNN by removing some layers, I get the error message: ValueError: You are trying to load a weight file containing 16 layers into a model with 13 layers.
This has the same limitations as TensorFlow Hub and is not really helpful if I want to load fewer layers than "all except the last layer".
In TF-Slim one COULD remove layers, and it would only load those that still had matching names, which was extremely useful for customization purposes.
This is also what one can do in PyTorch via the "non-strict" option of the weight-loading command, so PyTorch still has what I cannot reproduce in the new, "improved" TensorFlow.
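For reference, the PyTorch pattern I am referring to looks roughly like this (toy layer sizes; strict=False tolerates missing and unexpected keys and loads only the parameters whose names match):

```python
import torch
import torch.nn as nn

# Toy stand-ins for a pretrained network and a customized one.
class Pretrained(nn.Module):
    def __init__(self):
        super(Pretrained, self).__init__()
        self.block1 = nn.Linear(3, 4)
        self.block2 = nn.Linear(4, 2)

class Customized(nn.Module):
    def __init__(self):
        super(Customized, self).__init__()
        self.block1 = nn.Linear(3, 4)    # matching name: weights get loaded
        self.new_head = nn.Linear(4, 7)  # new layer: left at its random init

src, dst = Pretrained(), Customized()
# strict=False skips block2 (unexpected in dst) and new_head (missing in src),
# loading only block1 by name, much like TF-Slim's name matching did.
dst.load_state_dict(src.state_dict(), strict=False)
```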
So the fact that this weight-loading option seems to have been removed leaves me a bit stuck.
Is there a more straightforward way to load the weights into my customized VGG16 CNN, one that does not involve copying the "module injection" code explicitly into my own CNN setup code and that allows me to load only the weights I am interested in from the .h5 weights file?
I would appreciate some kind of example, since it seems like a serious flaw if one has to go through the mess I had to create above (if it works at all).
Thanks in advance,
/Niclas
# Import the model definition
from vgg import VGG16
import h5py

# Instantiate and load weights layer by layer
vgg_model = VGG16(input_shape=(224, 224, 3))
weights_file = h5py.File('filename.h5', 'r')
for layer in vgg_model.layers:
    weight_arrays = [weights_file[layer.name][layer.name + '_W_1:0'].value,
                     weights_file[layer.name][layer.name + '_b_1:0'].value]
    layer.set_weights(weight_arrays)
Thanks,
I think I got it to work now. Thanks for all the good feedback, it really helped in finding the relevant things to use for my use-case.
Just wondering one last thing... :-)
The weights file is named with 'tf' in the name, whereas the default preprocessing mode for VGG16 is 'caffe'.
This confused me a bit initially, since I assumed I should set 'tf' explicitly in the Keras preprocessing function. Do you have any idea why this is so? The file is named:
'vgg16_weights_tf_dim_ordering_tf_kernels.h5'
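If I read the Keras imagenet_utils source correctly, the 'caffe' mode amounts to an RGB-to-BGR channel flip plus subtraction of the ImageNet channel means; a small numpy sketch of my understanding:

```python
import numpy as np

def caffe_mode_preprocess(x):
    # 'caffe' mode as I understand it: RGB -> BGR, then subtract the
    # ImageNet per-channel means (given in BGR order).
    x = x[..., ::-1].astype(np.float32)
    x -= np.array([103.939, 116.779, 123.68], dtype=np.float32)
    return x

img = np.full((1, 224, 224, 3), 128, dtype=np.uint8)  # dummy mid-gray image
out = caffe_mode_preprocess(img)
```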
Other than that, I think I got it right, though the number of modifications needed when breaking out the code was still not that small (which does make sense, considering that I want to break out the model definition for customization like this). I include the code below, together with the changes I had to make.
Let me know whether my implementation makes sense to you.
I used the new Udacity course for TF 2.0 with Paige Bailey as a reference to check that the CNN calculates the right thing with the right preprocessing; in particular the finetuning Colab exercise, which uses an image of a military uniform when doing an inference test with the MobileNet classifier downloaded from TensorFlow Hub (which curiously has 1001 classes, whereas Keras has the usual 1000). See the included section below for details.
The course:
https://eu.udacity.com/course/intro-to-tensorflow-for-deep-learning--ud187
BR,
/Niclas
Code changes needed to break out the CNN definition:
-----------------------------------------------------------------------------
In the vgg16 model definition file:
- Changed weights input to None
- Removed the function get_submodules_from_kwargs.
Instead, specify explicitly: data_format = 'channels_last'
- Also need to import the layers explicitly (this is part of what was "injected" before):
import tensorflow as tf
import tensorflow.keras.layers as layers
import tensorflow.keras.models as models
- Removed all preprocessing before model definition:
# backend, layers, models, keras_utils = get_submodules_from_kwargs(kwargs)
#
# if not (weights in {'imagenet', None} or os.path.exists(weights)):
#     raise ValueError('The `weights` argument should be either '
#                      '`None` (random initialization), `imagenet` '
#                      '(pre-training on ImageNet), '
#                      'or the path to the weights file to be loaded.')
#
# if weights == 'imagenet' and include_top and classes != 1000:
#     raise ValueError('If using `weights` as `"imagenet"` with `include_top`'
#                      ' as true, `classes` should be 1000')
#
# # Determine proper input shape
# input_shape = _obtain_input_shape(input_shape,
#                                   default_size=224,
#                                   min_size=32,
#                                   data_format=backend.image_data_format(),
#                                   require_flatten=include_top,
#                                   weights=weights)
- Kept:
if input_tensor is None:
    img_input = layers.Input(shape=input_shape)
but removed:
else:
    if not backend.is_keras_tensor(input_tensor):
        img_input = layers.Input(tensor=input_tensor, shape=input_shape)
    else:
        img_input = input_tensor
- Removed this block:
# Ensure that the model takes into account
# any potential predecessors of `input_tensor`.
if input_tensor is not None:
    inputs = keras_utils.get_source_inputs(input_tensor)
else:
    inputs = img_input
- Changed to the matching input image name here: model = models.Model(img_input, x, name='vgg16')
- Removed the weights loading, since we will do that outside the model definition.
- In the imagenet_utils file:
backend, _, _, _ = get_submodules_from_kwargs(kwargs)
changed to:
import tensorflow.keras.backend as backend
The main file, which seems to work properly, is listed below. (I used OpenCV for loading images, though I noted there is a recommended Keras preprocessing repository that uses PIL instead; I wanted to make it fit my usual preprocessing flow.)
-------------------------------------------------------------------------------
# https://www.tensorflow.org/community/style_guide
# All code needs to be compatible with Python 2 and 3
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf
import numpy as np
import cv2
import h5py

from vgg16 import VGG16
from imagenet_utils import preprocess_input

# This list has the background class as the 0th class and thus 1001 classes in total
# (TensorFlow Hub pretrained CNNs are trained like that, whereas the Keras pretrained
# models have the usual 1000 classes). We correct for that below.
labels_path = tf.keras.utils.get_file('ImageNetLabels.txt', 'https://storage.googleapis.com/download.tensorflow.org/data/ImageNetLabels.txt')
print(labels_path)
imagenet_labels = open(labels_path).read().splitlines()

# Print the ImageNet labels
for idx, label in enumerate(imagenet_labels):
    # Don't print the background class, and subtract 1 for the indexing offset
    # (start vector indices at 0)
    if idx > 0:
        print("{}: {}".format(idx - 1, label))

# Load the model without doing weight initialization
model = VGG16(input_shape=(224, 224, 3), weights=None)

# ****************************************************
# Load the weights with a higher level of control
# ****************************************************
# The weights file expects integer input in the 255 range.
weights_file = h5py.File('vgg16_weights_tf_dim_ordering_tf_kernels.h5', 'r')
for layer in model.layers:
    # Skip layers that do not have weights to be loaded.
    if not any(item in layer.name for item in ["input", "pool", "flatten"]):
        # Used the prints to identify the names of layers that did not have weights in the given format.
        print(layer.name)
        print(weights_file[layer.name].keys())  # This line gives the keys the weights are stored with.
        # Useful if the weights are not just conv layers, so one needs to figure out how to access them.
        # Example output: <KeysViewHDF5 ['block5_conv2_W_1:0', 'block5_conv2_b_1:0']>
        weight_arrays = [weights_file[layer.name][layer.name + '_W_1:0'].value,
                         weights_file[layer.name][layer.name + '_b_1:0'].value]
        layer.set_weights(weight_arrays)

# ****************************************************
# Load the input image and preprocess
# ****************************************************
# The image used in the TensorFlow 2.0 course with Paige Bailey at Udacity in this exercise:
# https://colab.research.google.com/github/tensorflow/examples/blob/master/courses/udacity_intro_to_tensorflow_for_deep_learning/l06c01_tensorflow_hub_and_transfer_learning.ipynb
grace_hopper = tf.keras.utils.get_file('image.jpg', 'https://storage.googleapis.com/download.tensorflow.org/example_images/grace_hopper.jpg')
frame = cv2.imread(grace_hopper)
frame = cv2.resize(frame, (224, 224), interpolation=cv2.INTER_NEAREST)
frame = frame[:, :, ::-1]  # OpenCV has flipped channel order vs PIL. This line corrects for that.
frame = np.expand_dims(frame, axis=0)
frame = preprocess_input(frame, data_format='channels_last')

# Run inference (should give 652 as class index; the TF 2.0 course gets 653, since it has the added background class)
print(np.argmax(model.predict(frame)))
Hi again Aakash,
I have one more issue, which appears when I try to use a more modern model than VGG16 in the way you suggested.
MobileNet v2 dynamically needs the dimensions of the input tensor in order to set up the CNN,
but even though the tensors are supposedly all eager in TF 2.0, I cannot get the values out of them.
Basically I get the error:
AttributeError: 'Tensor' object has no attribute 'numpy'
I fetch the model from here:
https://github.com/keras-team/keras-applications/blob/master/keras_applications/mobilenet_v2.py
And in the function
def _inverted_res_block(...)
the line that causes problems is originally:
in_channels = backend.int_shape(inputs)[channel_axis]
Now, I obviously do not have the backend object since, as you suggested, I am not using any TensorFlow "injections"; instead I should replace this line with something equivalent that uses TensorFlow explicitly.
It seemed most reasonable to get the shape and convert it to numpy values like this (but it obviously did not work):
in_channels = tf.shape(img_input).numpy()[channel_axis]
The tensor I want to get the channels value from looks like this when printed:
Tensor("Conv1_relu/Identity:0", shape=(None, 112, 112, 32), dtype=float32)
I created a short colab code snippet that illustrates the steps that go wrong:
https://colab.research.google.com/drive/1wPHqn-hc-itmWxy_TScTN_OUb9VdVz-Q
Is there any way to get those values (the value 32, which I can SEE in the print above but cannot access)?
Since it works with the wrapped Keras Applications inside TensorFlow when used with the usual download API, it should be possible, but I cannot figure out how. The backend.int_shape() command does not seem to have any direct correspondence in TensorFlow either...
Furthermore, is there anywhere one can read about the conditions under which one CAN get a numpy value out of an (eager) tensor? It seems very random when this command works. Earlier in these threads it turned out that the numpy() command did not work when I wrapped the function with the @tf.function decorator in training loops, since the tensors seem to stop being eager then, and this makes the attribute vanish from the tensor object. But none of this is documented (other than in my old email threads here).
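To illustrate the behaviour I am describing (the exact error type seems to vary between versions, so I only show the working cases):

```python
import tensorflow as tf

t = tf.constant([1, 2, 3])
eager_values = t.numpy()  # works: t is an eager tensor

@tf.function
def add_one(x):
    # During tracing, x is a symbolic tensor and x.numpy() is NOT
    # available here; everything inside must stay as TensorFlow ops.
    return x + 1

result = add_one(t)  # the tensor returned to the caller is eager again
print(result.numpy())
```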
Thanks in advance!
/Niclas
Paige Bailey | Product Manager (TensorFlow) | @DynamicWebPaige
print(img_input.shape) # prints the shape of the tensor
last_dim = img_input.shape[-1] # returns 3 or 32 depending on the number of channels
Hi Aakash,
Thanks for the feedback. Using tensor.shape works much better. Though the documentation (the TF 2.0 docs) confusingly says it returns a tf.TensorShape object which "represents a possibly-partial shape specification for a Tensor", what I get back does in fact seem to behave like a tuple of integers.
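A quick check of that behaviour (tensor sizes made up to mirror the MobileNet print above):

```python
import tensorflow as tf

x = tf.zeros([1, 112, 112, 32])
print(type(x.shape))  # a tf.TensorShape, as the documentation says
print(x.shape[-1])    # yet it indexes like a tuple of ints
channels = int(x.shape[-1])
```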
I also finally realized, which in hindsight should have been obvious, that the backend object IS present in the tf.keras API. That is, replacing:
backend, layers, models, keras_utils = get_submodules_from_kwargs(kwargs)
with
import tensorflow.keras.backend as backend
import tensorflow.keras.utils as keras_utils
import tensorflow.keras.layers as layers
import tensorflow.keras.models as models
did the trick with far fewer code modifications. Maybe something to document in the framework? :-)
---------------------------------------------------
Now, regarding the .numpy() issue, I still believe this is something that needs to be addressed. Let me explain why:
-----------------------------------------------------
As I mentioned before, what I wanted to achieve was to use the code available in Keras Applications with as little modification as possible, but kept as a local copy so that I can make whatever customizations I wish (similar, again, to what was possible with TF-Slim).
This obviously meant I needed to handle, in some way, the fact that the code is written for multiple backends.
Mainly it means dealing with the imports described above.
I did not (initially) find much information about what some of these objects corresponded to in native TensorFlow, so I tried to replace the affected code with TF-native equivalents. That is where the .numpy() command did not work on what looked very much like eager tensors to me, and that is where I think there might be a bug, but where at the very least the documentation is clearly lacking.
In particular, the fact that .numpy() does not work when you add the tf.function decorator to functions containing TensorFlow operations highlights a general mechanism that is obviously there, but whose consequences are not fully documented. It is unusual to have programming patterns where attributes of objects can vanish without any warning, or where there are different "run modes" (i.e. eager vs graph mode) in which objects seem to be converted to something else in an opaque way under the hood, sometimes due to secondary code that I, as a framework user, might not have written myself.
So in my opinion, clear documentation is needed at each place where an attribute or function affected by eager versus graph execution is described, as well as some conceptual explanation in a top-level tutorial about how the opaque graph mode affects tensors etc.
In particular, if you are new to TensorFlow 2.0 without any prior experience of earlier versions, this "vanishing of attributes" will seem very confusing, since the graph mode of execution is now kept opaque to the user.
With that said, I think in general the direction of TF 2.0 is indeed the right one. I am happy to see that so many things in general are becoming much more streamlined and "Pythonic" in the framework. :-)
BR,
/Niclas