tf Estimator input_fn using external Python reader

Roy

May 4, 2017, 12:41:33 PM

to Discuss

I am trying to run the tf Estimator (high-level module) with an input_fn that is fed by an external Python reader.
To simplify the question, I wrote a small simulation that anyone can run; it is attached and pasted below.
How do I make the input_fn pull NEW 'samples' from the external reader on each step?

Please note:

The architecture is not important; I used a trivially simple net.
The external reader outputs a dictionary of numpy arrays.
The input_fn uses this reader.
To verify that the reader "pulls new values", I both

save the most recent value to self.status (it should end up > 1.0), and
save a summary, to be viewed in TensorBoard.

Thanks to anyone who can help!

import tensorflow as tf
import numpy as np
modekeys = tf.contrib.learn.ModeKeys
tf.logging.set_verbosity(tf.logging.DEBUG)

class inputExample:
    def __init__(self):
        self.status = 0.0 # tracing which value was recently 'pushed' to the net
        self.model_dir = 'temp_dir'
        self.get_estimator()

    def input_fn(self):
        batch_data = self.reader()
        for kk in batch_data.keys():
            batch_data[kk] = tf.constant(batch_data[kk])
        features_dict = dict(data=batch_data.pop('data'))
        labels_dict = batch_data
        return features_dict, labels_dict

    def model_fn(self, features, labels, mode):
        features_in = features['data']
        labels_in = labels['labels']
        pred_layer = tf.layers.conv2d(name='pred', inputs=features_in, filters=1, kernel_size=3)
        tf.summary.scalar(name='label', tensor=tf.squeeze(labels_in))
        tf.summary.scalar(name='pred', tensor=tf.squeeze(pred_layer))
        loss = None
        if mode != modekeys.INFER:
            loss = tf.losses.mean_squared_error(labels=labels_in, predictions=pred_layer)
        train_op = None
        if mode == modekeys.TRAIN:
            train_op = tf.contrib.layers.optimize_loss(
                loss=loss,
                learning_rate = 0.01,
                optimizer = 'SGD',
                global_step = tf.contrib.framework.get_global_step()
            )
        predictions = {'estim_exp': pred_layer}
        return tf.contrib.learn.ModelFnOps(mode=mode, predictions=predictions, loss=loss, train_op=train_op)

    def reader(self):
        self.status += 1
        return dict(
            data = np.ones([1,3,3,1], dtype=np.float32)*self.status,
            labels = np.exp(np.ones([1,1,1,1], dtype=np.float32)*self.status)
        )

    def get_estimator(self):
        self.Estimator = tf.contrib.learn.Estimator(
            model_fn = self.model_fn,
            model_dir = self.model_dir,
            config = tf.contrib.learn.RunConfig(
                save_checkpoints_steps = 10,
                save_summary_steps = 10,
                save_checkpoints_secs = None
            )
        )

if __name__ == '__main__':
    ex = inputExample()
    ex.Estimator.fit(input_fn=ex.input_fn)

input_fn_example.py

Martin Wicke

May 4, 2017, 12:54:29 PM

to Roy, Discuss

You have to wrap your reader function in a py_func (https://www.tensorflow.org/api_docs/python/tf/py_func). Make sure to set stateful=True.

As it is now, reader() will be called exactly once. The input_fn is called once when you call fit(), to create a graph. The graph is then executed in a loop.

By wrapping your reader in a py_func, you create an op (once) which is then run in the loop as part of the graph.
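
A plain-Python analogy may help show the difference. No TensorFlow is involved, and `Reader`, `build_with_constant`, and `build_with_callback` are hypothetical names used only for illustration. In the first builder the reader is called while the "graph" is being built and its value is frozen, much like tf.constant in the original input_fn. In the second, only the reader function is captured and it is called on every step, which is roughly what a stateful py_func op does.

```python
class Reader(object):
    """Stand-in for the external reader; counts how often it is called."""

    def __init__(self):
        self.status = 0.0

    def read(self):
        self.status += 1
        return self.status


def build_with_constant(reader):
    # The reader runs once, at build time; the value is baked in.
    frozen = reader.read()

    def step():
        return frozen

    return step


def build_with_callback(reader):
    # Only the function reference is stored; it runs on every step.
    def step():
        return reader.read()

    return step


constant_step = build_with_constant(Reader())
callback_step = build_with_callback(Reader())

constant_values = [constant_step() for _ in range(3)]
callback_values = [callback_step() for _ in range(3)]
print(constant_values)  # [1.0, 1.0, 1.0]
print(callback_values)  # [1.0, 2.0, 3.0]
```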

--
You received this message because you are subscribed to the Google Groups "Discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to discuss+unsubscribe@tensorflow.org.
To post to this group, send email to dis...@tensorflow.org.
To view this discussion on the web visit https://groups.google.com/a/tensorflow.org/d/msgid/discuss/152ef246-99e0-4634-8453-70dc5a6733b0%40tensorflow.org.

Roy

May 4, 2017, 1:07:28 PM

to Discuss

Thanks Martin.

I tried adding the py_func as you recommended. However, it expects numpy arrays as both inputs and outputs, whereas my reader "spits out" a dictionary with many keys, covering all the features and labels of each (random) sample. In the simplified example, one key is 'data', which is a 3x3 array, and another key is 'labels', which is a scalar.

So I don't yet see how to bridge this gap.

Roy

May 4, 2017, 1:14:15 PM

to Discuss

(As far as I understand, tf expects us to move to providing input through input_fn dictionaries rather than through x/y inputs. That makes more sense, as it gives the flexibility to work with any type of input and output, not only columns.)

Martin Wicke

May 4, 2017, 1:16:09 PM

to Roy, Discuss

You don't need any inputs to the reader, so that should be simple.

I'm assuming your output would be a dict with a fixed set of keys (say, data1, data2, and labels). So your py_func would return three numpy arrays, and you have to agree with your consumer (in your input_fn implementation) that you will attach the first output to the key data1, the second to data2, and the third to labels (in a different dict).

In other words, py_func can only return a fixed set of outputs, but you can make them into a dict in input_fn.
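
The convention described above can be sketched without TensorFlow. `OUTPUT_KEYS`, `LABEL_KEYS`, `reader_as_tuple`, and `rebuild_dicts` are hypothetical names, and plain Python lists stand in for the numpy arrays; the only point is that producer and consumer share one fixed key order.

```python
# Fixed key order agreed between the producer (the py_func body) and the
# consumer (input_fn). Hypothetical names, for illustration only.
OUTPUT_KEYS = ("data1", "data2", "labels")
LABEL_KEYS = ("labels",)


def reader():
    # Stand-in for the external reader: a dict with a fixed set of keys.
    return {"labels": [2.0], "data2": [0.5, 0.5], "data1": [1.0, 1.0, 1.0]}


def reader_as_tuple():
    # What the wrapped function would return: values only, in OUTPUT_KEYS order.
    batch = reader()
    return tuple(batch[key] for key in OUTPUT_KEYS)


def rebuild_dicts(outputs):
    # What input_fn would do with the op's outputs: reattach keys by position,
    # then split into a features dict and a labels dict.
    named = dict(zip(OUTPUT_KEYS, outputs))
    labels = {key: named.pop(key) for key in LABEL_KEYS}
    return named, labels


features, labels = rebuild_dicts(reader_as_tuple())
print(sorted(features))  # ['data1', 'data2']
print(sorted(labels))    # ['labels']
```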

Roy

May 4, 2017, 1:30:17 PM

to Discuss

Thanks again.
This is what I changed in input_fn. What is wrong?

def input_fn(self):
    data, labels = tf.py_func(func=self.input_fn_np(), inp=[], Tout=[tf.float32, tf.float32], stateful=True)
    return dict(data=data), dict(labels=labels)

def input_fn_np(self):
    batch_data = self.reader()
    return batch_data['data'], batch_data['labels']

Martin Wicke

May 4, 2017, 1:37:13 PM

to Roy, Discuss

You are calling self.input_fn_np() when you build the py_func. You need to pass the function reference (self.input_fn_np), not its result.
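
The difference is easy to see in plain Python. In this sketch `input_fn_np` is a simplified stand-in for the method in the post above: with parentheses the function runs immediately and what gets passed on is its return value (a tuple), which cannot be called again later.

```python
def input_fn_np():
    # Simplified stand-in for self.input_fn_np in the post above.
    return "data", "labels"


passed_result = input_fn_np()   # what func=self.input_fn_np() hands over
passed_reference = input_fn_np  # what func=self.input_fn_np hands over

print(callable(passed_result))     # False: it is a tuple
print(callable(passed_reference))  # True: it can be called on every step
```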

Roy

May 4, 2017, 1:49:45 PM

to Discuss

Fixed, but it is still not working. I attach the updated code so the error can be reproduced. Current error: Cannot take the length of Shape with unknown rank.

One thing I can think of: when using tf.py_func with an output that is a list of 4D numpy arrays, is this the right way to define Tout?

... Tout=[tf.float32, tf.float32] ...

input_fn_example.py

Martin Wicke

May 4, 2017, 2:14:06 PM

to Roy, Discuss

Do you have a stack trace?

Roy

May 4, 2017, 2:22:46 PM

to Discuss

I guess not; I don't know what you are referring to.

Martin Wicke

May 4, 2017, 2:33:10 PM

to Roy, Discuss

The problem appears to be the following: the tensors that py_func returns do not have static shape information, because TensorFlow cannot know a priori what the shape of the returned numpy arrays will be.

It's an easy fix: In input_fn, add

data.set_shape( ... the shape of your data tensor ... )

labels.set_shape( ... the shape of your label tensor ... )

before you return them in the dict.
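
Putting the pieces of this thread together, a corrected input_fn might look like the sketch below. It is not the attached file. Assumptions: the shapes [1, 3, 3, 1] and [1, 1, 1, 1] are taken from the simulated reader in the first post; `Reader` is a hypothetical stand-alone version of that reader; and the import uses `tensorflow.compat.v1` so the sketch can run on newer installs (on the TensorFlow release used in this thread, plain `import tensorflow as tf` would be the equivalent and the `disable_eager_execution` line would be dropped).

```python
import numpy as np
# Assumption: an install that ships tensorflow.compat.v1.
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # build a graph and run it in a session, as in the thread


class Reader(object):
    """Hypothetical stand-alone version of the simulated reader."""

    def __init__(self):
        self.status = 0.0

    def read(self):
        self.status += 1
        data = np.ones([1, 3, 3, 1], dtype=np.float32) * self.status
        labels = np.exp(np.ones([1, 1, 1, 1], dtype=np.float32) * self.status)
        return data, labels  # fixed order: data first, labels second


def input_fn(reader):
    # Pass the function reference (no parentheses) so the op calls it per step.
    data, labels = tf.py_func(
        func=reader.read, inp=[], Tout=[tf.float32, tf.float32], stateful=True)
    # py_func outputs carry no static shape, so declare it explicitly.
    data.set_shape([1, 3, 3, 1])
    labels.set_shape([1, 1, 1, 1])
    return dict(data=data), dict(labels=labels)


with tf.Graph().as_default():
    features, labels = input_fn(Reader())
    with tf.Session() as sess:
        first = sess.run(features["data"])
        second = sess.run(features["data"])

# The two values differ when the reader is re-invoked on each run.
print(first[0, 0, 0, 0], second[0, 0, 0, 0])
```

Each sess.run re-executes the py_func op, so the reader is invoked again and returns a new value.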

Martin

Roy

May 4, 2017, 2:48:36 PM

to Discuss

Yaso!!
THANKS Martin!!

Roy

May 4, 2017, 2:53:12 PM

to Discuss

Martin, following your discussion with Sebastian:
if you want, I can tidy up this simple example to serve as a usage example for others, as it has many of the components required for a full flow through the high-level modules.

Martin Wicke

May 4, 2017, 3:35:12 PM

to Roy, Discuss

I think it is a great example. A good first step would be to attach it to your StackOverflow question; it's likely that quite a few people will find it there.

Then, you could also extend the input_fn tutorial on the TensorFlow page. The source code for that is here: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/docs_src/get_started/input_fn.md

It would be very helpful to have an example of how to properly integrate a python reader. I do expect the need for this to go away once generator_input_fn is released (it exists at head).

Martin
