Where to find Street View House Number example for CNN


rob...@perchinteractive.com

Mar 10, 2016, 3:27:17 PM
to Discuss

At the end of the Convolutional Neural Network example there is an exercise to modify the program to use the Street View House Numbers (SVHN) data set. I have been working on this for the past few days and am finding it quite difficult to modify, especially since the data formats are different and the SVHN data set has variably sized images.


This example would be extremely valuable to build an image classifier for use in a production environment. Any ideas where I could get this completed exercise?

Yaroslav Bulatov

Mar 10, 2016, 4:33:56 PM
to rob...@perchinteractive.com, Discuss
The SVHN dataset has a 32x32 cropped-digit version; it's Format 2 on the official page.
It's in MATLAB .mat format, which can be loaded with scipy in Python.


Robert Grzesik

Mar 11, 2016, 5:37:59 PM
to Discuss
Ok, so I've made some headway on this. I'm using the scipy.io.loadmat function, but I can't figure out how to convert its output into a format readable by TensorFlow. From what I can tell so far, loadmat returns a dictionary with "X" and "y" keys, where y holds the labels (straightforward) and X holds the image data. I've searched Google for some time now but can't find how to convert X into a format readable by TensorFlow. Any ideas how to do that? Here's some code I'm messing around with:

import scipy.io as sio

file_contents = sio.loadmat(file_name)

# 'y' holds the class of each image (aka label); shape (N, 1)
print(file_contents['y'][1][0])

# 'X' holds the RGB pixel data (aka uint8image?); shape (32, 32, 3, N)
print(file_contents['X'][1][0][0][0])
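In case it helps, here is one way (my own sketch, with a stand-in dict in place of the real loadmat result) to turn that dictionary into arrays TensorFlow can consume directly via feed_dict. In Format 2, X has shape (32, 32, 3, N) and y has shape (N, 1) with labels 1-10, where 10 stands for the digit 0:

```python
import numpy as np

def svhn_to_arrays(mat_dict):
    """Convert the dict returned by scipy.io.loadmat on an SVHN
    Format-2 file into TensorFlow-friendly numpy arrays."""
    # Move the sample axis first: (32, 32, 3, N) -> (N, 32, 32, 3),
    # and scale pixel values into [0, 1].
    images = np.transpose(mat_dict['X'], (3, 0, 1, 2)).astype(np.float32) / 255.0
    # Flatten (N, 1) labels to (N,), and remap SVHN's "10" to digit 0.
    labels = mat_dict['y'].flatten().astype(np.int32)
    labels[labels == 10] = 0
    return images, labels

# Stand-in for sio.loadmat(file_name), so this sketch is self-contained:
fake = {'X': np.zeros((32, 32, 3, 5), dtype=np.uint8),
        'y': np.array([[10], [1], [2], [3], [4]])}
imgs, lbls = svhn_to_arrays(fake)
print(imgs.shape, list(lbls))  # (5, 32, 32, 3) [0, 1, 2, 3, 4]
```

With arrays in this layout you can feed them straight into placeholders, with no file readers involved.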



Here's the code from the CIFAR-10 example, which I presume is similar to what must be done with the MATLAB data.

def read_cifar10(filename_queue):
  class CIFAR10Record(object):
    pass
  result = CIFAR10Record()

  # Dimensions of the images in the CIFAR-10 dataset.
  # See http://www.cs.toronto.edu/~kriz/cifar.html for a description of the
  # input format.
  label_bytes = 1  # 2 for CIFAR-100
  result.height = 32
  result.width = 32
  result.depth = 3
  image_bytes = result.height * result.width * result.depth
  # Every record consists of a label followed by the image, with a
  # fixed number of bytes for each.
  record_bytes = label_bytes + image_bytes

  # Read a record, getting filenames from the filename_queue.  No
  # header or footer in the CIFAR-10 format, so we leave header_bytes
  # and footer_bytes at their default of 0.
  reader = tf.FixedLengthRecordReader(record_bytes=record_bytes)
  result.key, value = reader.read(filename_queue)

  # Convert from a string to a vector of uint8 that is record_bytes long.
  record_bytes = tf.decode_raw(value, tf.uint8)

  # The first bytes represent the label, which we convert from uint8->int32.
  result.label = tf.cast(
      tf.slice(record_bytes, [0], [label_bytes]), tf.int32)

  # The remaining bytes after the label represent the image, which we reshape
  # from [depth * height * width] to [depth, height, width].
  depth_major = tf.reshape(tf.slice(record_bytes, [label_bytes], [image_bytes]),
                           [result.depth, result.height, result.width])
  # Convert from [depth, height, width] to [height, width, depth].
  result.uint8image = tf.transpose(depth_major, [1, 2, 0])

  return result
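A side note from me (the sketch below is mine, not part of the tutorial): the FixedLengthRecordReader pipeline above exists because CIFAR-10 ships as raw fixed-length binary records on disk. Since loadmat hands you complete numpy arrays, you can skip the reader/queue machinery entirely and feed batches through placeholders with feed_dict, using a plain Python batcher like this:

```python
import numpy as np

def minibatches(images, labels, batch_size, shuffle=True):
    """Yield (image_batch, label_batch) pairs from in-memory arrays,
    playing the role the filename queue + Reader plays for CIFAR-10."""
    n = len(labels)
    order = np.random.permutation(n) if shuffle else np.arange(n)
    for start in range(0, n, batch_size):
        idx = order[start:start + batch_size]
        yield images[idx], labels[idx]

# Dummy data standing in for the loaded SVHN arrays:
imgs = np.zeros((10, 32, 32, 3), dtype=np.float32)
lbls = np.arange(10, dtype=np.int32)
batches = list(minibatches(imgs, lbls, 4, shuffle=False))
# 10 samples at batch size 4 -> batches of 4, 4, and 2
```

Each yielded pair would then go into `sess.run(train_op, feed_dict={x: image_batch, y_: label_batch})`.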

mukul arora

Mar 27, 2016, 4:57:50 AM
to Discuss, rob...@perchinteractive.com




Hi, I am trying the same thing: implementing an SVHN classifier using a CNN in TensorFlow.
Here is my code so far: https://github.com/codemukul95/SVHN-classification-using-Tensorflow
My approach: the architecture follows the MNIST CNN tutorial rather than CIFAR-10, as I found the CIFAR-10 tutorial a bit tough to understand.
The challenge I am facing is very low accuracy, around 45%.
Any suggestions?
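Two things worth double-checking when accuracy stalls at that level (guesses on my part, I have not read the repo): SVHN Format-2 labels run 1-10 with 10 standing for the digit 0, so feeding them unmapped into a 10-class one-hot encoder corrupts every "0" example; and per-image normalization (the CIFAR-10 tutorial does this with tf.image.per_image_whitening) often helps a lot. A numpy sketch of that normalization:

```python
import numpy as np

def preprocess(images):
    """Per-image mean/std normalization over a batch shaped
    (N, H, W, C), similar in spirit to per-image whitening."""
    images = images.astype(np.float32)
    mean = images.mean(axis=(1, 2, 3), keepdims=True)
    std = images.std(axis=(1, 2, 3), keepdims=True)
    # Guard against division by zero on constant images.
    return (images - mean) / np.maximum(std, 1e-6)

batch = np.random.randint(0, 256, size=(8, 32, 32, 3)).astype(np.float32)
normed = preprocess(batch)
```

After this each image has roughly zero mean and unit variance, which usually makes training noticeably more stable than raw 0-255 pixels.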

Yuxin Wu

Mar 27, 2016, 11:51:40 AM
to Discuss, rob...@perchinteractive.com
I wrote an example here using my own TF-based frontend, tensorpack: https://github.com/ppwwyyxx/tensorpack/blob/master/examples/svhn_digit_convnet.py
The project is still not polished for serious use, but you can take a look at the architecture, etc.
With simple augmentation it reaches about 3% validation error, and with the slow "GaussianDeform" augmentation I invented it can reach 2.7-2.8%.
One thing to note: I resize the input images to 40x40.
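For anyone reproducing this outside tensorpack: the 32x32 to 40x40 resize can be done up front with plain numpy. This is a nearest-neighbour sketch of mine; a real pipeline would more likely use bilinear interpolation (e.g. tf.image.resize_images):

```python
import numpy as np

def resize_nn(img, new_h, new_w):
    """Nearest-neighbour resize of an (H, W, C) image using index
    arithmetic only - a stand-in for a proper bilinear resize."""
    h, w = img.shape[:2]
    rows = np.arange(new_h) * h // new_h  # source row for each output row
    cols = np.arange(new_w) * w // new_w  # source col for each output col
    return img[rows][:, cols]

img = np.zeros((32, 32, 3), dtype=np.uint8)
big = resize_nn(img, 40, 40)  # shape (40, 40, 3)
```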