tf Estimator input_fn using external Python reader

Roy

May 4, 2017, 12:41:33 PM
to Discuss
I am trying to run the tf Estimator (high-level module) with an input_fn that is fed by an external Python reader.
To simplify the question, I wrote a small simulation that anyone can run; it is attached and pasted below.
How do I make the input_fn pull NEW 'samples' from the external reader on each step?

Please note:
  • The architecture is not important; I used a trivially simple net.
  • The external reader outputs a dictionary of numpy matrices.
  • The input_fn uses this reader.
  • To verify that the reader "pulls new values", I both
    • save the most recent value to self.status (should be > 1.0)
    • save a summary, to be viewed in TensorBoard.

Thanks to anyone who can help!


import tensorflow as tf
import numpy as np

modekeys = tf.contrib.learn.ModeKeys
tf.logging.set_verbosity(tf.logging.DEBUG)


class inputExample:

    def __init__(self):
        self.status = 0.0  # tracing which value was recently 'pushed' to the net
        self.model_dir = 'temp_dir'
        self.get_estimator()

    def input_fn(self):
        batch_data = self.reader()
        for kk in batch_data.keys():
            batch_data[kk] = tf.constant(batch_data[kk])
        features_dict = dict(data=batch_data.pop('data'))
        labels_dict = batch_data
        return features_dict, labels_dict

    def model_fn(self, features, labels, mode):
        features_in = features['data']
        labels_in = labels['labels']
        pred_layer = tf.layers.conv2d(name='pred', inputs=features_in, filters=1, kernel_size=3)
        tf.summary.scalar(name='label', tensor=tf.squeeze(labels_in))
        tf.summary.scalar(name='pred', tensor=tf.squeeze(pred_layer))
        loss = None
        if mode != modekeys.INFER:
            loss = tf.losses.mean_squared_error(labels=labels_in, predictions=pred_layer)
        train_op = None
        if mode == modekeys.TRAIN:
            train_op = tf.contrib.layers.optimize_loss(
                loss=loss,
                learning_rate=0.01,
                optimizer='SGD',
                global_step=tf.contrib.framework.get_global_step()
            )
        predictions = {'estim_exp': pred_layer}
        return tf.contrib.learn.ModelFnOps(mode=mode, predictions=predictions, loss=loss, train_op=train_op)

    def reader(self):
        self.status += 1
        return dict(
            data=np.ones([1, 3, 3, 1], dtype=np.float32) * self.status,
            labels=np.exp(np.ones([1, 1, 1, 1], dtype=np.float32) * self.status)
        )

    def get_estimator(self):
        self.Estimator = tf.contrib.learn.Estimator(
            model_fn=self.model_fn,
            model_dir=self.model_dir,
            config=tf.contrib.learn.RunConfig(
                save_checkpoints_steps=10,
                save_summary_steps=10,
                save_checkpoints_secs=None
            )
        )


if __name__ == '__main__':
    ex = inputExample()
    ex.Estimator.fit(input_fn=ex.input_fn)


input_fn_example.py

Martin Wicke

May 4, 2017, 12:54:29 PM
to Roy, Discuss
You have to wrap your reader function in a py_func (https://www.tensorflow.org/api_docs/python/tf/py_func). Make sure to set stateful=True.

As it is now, reader() will be called exactly once. The input_fn is called once when you call fit(), to create a graph. The graph is then executed in a loop. 

By wrapping your reader in a py_func, you create an op (once) which is then run in the loop as part of the graph.
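
The build-once, run-many behaviour described above can be illustrated without TensorFlow. The following is a plain-Python analogy, not TensorFlow code: a closure stands in for the graph, and `Reader`, `build_graph_constant`, and `build_graph_deferred` are hypothetical names chosen for this sketch.

```python
class Reader:
    """Stand-in for an external Python reader with internal state."""

    def __init__(self):
        self.status = 0.0

    def read(self):
        self.status += 1
        return self.status


def build_graph_constant(reader):
    # Analogous to calling reader() inside input_fn and wrapping the result
    # in tf.constant: the reader runs once, while the "graph" is being built.
    value = reader.read()
    return lambda: value


def build_graph_deferred(reader):
    # Analogous to wrapping the reader in py_func: the "graph" keeps a
    # reference to the function and calls it on every step.
    return lambda: reader.read()


frozen_step = build_graph_constant(Reader())
frozen = [frozen_step() for _ in range(3)]

live_step = build_graph_deferred(Reader())
live = [live_step() for _ in range(3)]

print(frozen)  # [1.0, 1.0, 1.0]
print(live)    # [1.0, 2.0, 3.0]
```

In the first case every step sees the value captured at build time; in the second, each step triggers a fresh read.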

--
You received this message because you are subscribed to the Google Groups "Discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to discuss+unsubscribe@tensorflow.org.
To post to this group, send email to dis...@tensorflow.org.
To view this discussion on the web visit https://groups.google.com/a/tensorflow.org/d/msgid/discuss/152ef246-99e0-4634-8453-70dc5a6733b0%40tensorflow.org.

Roy

May 4, 2017, 1:07:28 PM
to Discuss
Thanks Martin.
 
I tried to add py_func as you recommended, but it expects numpy arrays as both inputs and outputs, whereas my reader returns a dictionary with many keys, covering all the features and labels of each (random) sample. In the simplified example, one key is 'data', which is a 3x3 array, and another key is 'labels', which is a scalar.

So I haven't yet worked out how to bridge this gap ...

Roy

May 4, 2017, 1:14:15 PM
to Discuss
(As far as I understand, tf expects us to move to giving input through input_fn dictionaries rather than through x, y inputs, which makes more sense because it gives the flexibility to work with any type of input and output, not only columns.)

Martin Wicke

May 4, 2017, 1:16:09 PM
to Roy, Discuss
You don't need any inputs to the reader, so that should be simple.

I'm assuming your output would be a dict with a fixed set of keys (say, data1, data2, and labels). So your py_func would return three numpy arrays, and you have to agree with your consumer (in your input_fn implementation) that you will attach the first output to the key data1, the second to data2, and the third to labels (in a different dict).

In other words, py_func can only return a fixed set of outputs, but you can make them into a dict in input_fn.
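
The agreement between producer and consumer described above could look like the following sketch. It uses plain Python lists in place of numpy arrays and leaves out the py_func call itself; `OUTPUT_KEYS`, `LABEL_KEYS`, `reader_as_tuple`, and `to_dicts` are hypothetical names, and the keys data1, data2, and labels are the ones assumed in this message.

```python
# Fixed key order shared by the producer (the function handed to py_func)
# and the consumer (input_fn). All names here are hypothetical.
OUTPUT_KEYS = ('data1', 'data2', 'labels')
LABEL_KEYS = ('labels',)


def reader():
    # Stand-in for the external reader; real code would return numpy arrays.
    return {'labels': [2.0], 'data2': [[0.5]], 'data1': [[1.0, 1.0]]}


def reader_as_tuple():
    # Producer side: flatten the dict into a tuple in the agreed order.
    batch = reader()
    return tuple(batch[key] for key in OUTPUT_KEYS)


def to_dicts(outputs):
    # Consumer side: reattach each positional output to its key, then
    # split the result into a features dict and a labels dict.
    named = dict(zip(OUTPUT_KEYS, outputs))
    labels = {key: named.pop(key) for key in LABEL_KEYS}
    return named, labels


features, labels = to_dicts(reader_as_tuple())
print(sorted(features))  # ['data1', 'data2']
print(labels)            # {'labels': [2.0]}
```

Keeping the key order in one shared constant means the two sides cannot drift apart when a key is added or reordered.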

Roy

May 4, 2017, 1:30:17 PM
to Discuss
Thanks again.
This is what I changed in input_fn. What is wrong?
def input_fn(self):
    data, labels = tf.py_func(func=self.input_fn_np(), inp=[], Tout=[tf.float32, tf.float32], stateful=True)
    return dict(data=data), dict(labels=labels)

def input_fn_np(self):
    batch_data = self.reader()
    return batch_data['data'], batch_data['labels']

Martin Wicke

May 4, 2017, 1:37:13 PM
to Roy, Discuss
You call self.input_fn_np() when building the py_func -- you need to pass the function reference (func=self.input_fn_np), not its result.
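
The difference between passing a function and passing its result can be seen in plain Python. In this sketch, `run_steps` is a hypothetical stand-in for whatever calls the function repeatedly (py_func in this thread), and `next_sample` stands in for input_fn_np.

```python
counter = {'calls': 0}


def next_sample():
    counter['calls'] += 1
    return counter['calls']


def run_steps(func, steps):
    # Calls func once per step, the way the wrapped op is run in the loop.
    return [func() for _ in range(steps)]


# Passing the reference: next_sample is invoked on every step.
values = run_steps(next_sample, 3)
print(values)  # [1, 2, 3]

# Passing the result: next_sample() has already run by the time run_steps
# starts, and its return value (an int here) is not callable.
try:
    run_steps(next_sample(), 3)
    error = None
except TypeError as exc:
    error = exc
print(type(error).__name__)  # TypeError
```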

Roy

May 4, 2017, 1:49:45 PM
to Discuss
Fixed, but it is still not working. I attach the updated code so the error can be reproduced. Current error: Cannot take the length of Shape with unknown rank.

One thing I can think of: when using tf.py_func with an output that is a list of 4D numpy arrays, is this the right way to define Tout?
... Tout=[tf.float32, tf.float32] ...
input_fn_example.py

Martin Wicke

May 4, 2017, 2:14:06 PM
to Roy, Discuss
Do you have a stack trace?

Roy

May 4, 2017, 2:22:46 PM
to Discuss
I guess not, as I don't know what you are referring to.

Martin Wicke

May 4, 2017, 2:33:10 PM
to Roy, Discuss
The problem appears to be the following: the tensors that py_func returns do not have static shape information (because TensorFlow does not know a priori what the shape of the returned numpy arrays will be).

It's an easy fix: In input_fn, add

data.set_shape( ... the shape of your data tensor ... )
labels.set_shape( ... the shape of your label tensor ... )

before you return them in the dict.

Martin


Roy

May 4, 2017, 2:48:36 PM
to Discuss
Yaso!!
THANKS Martin!!

Roy

May 4, 2017, 2:53:12 PM
to Discuss
Martin, following your discussion with Sebastian: if you want, I can tidy up this simple example to serve as a usage example for others, as it has many of the components required for a full flow through the high-level modules.

Martin Wicke

May 4, 2017, 3:35:12 PM
to Roy, Discuss
I think it is a great example. A good first step would be to attach it to your StackOverflow question; it's likely that quite a few people will find it there.

Then, you could also extend the input_fn tutorial on the TensorFlow page. The source code for that is here: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/docs_src/get_started/input_fn.md

It would be very helpful to have an example of how to properly integrate a Python reader. I do expect the need for this to go away once generator_input_fn is released (it exists at head).

Martin
