Hi,
I posted a question on Stack Overflow on Monday about how to use the Dataset API, and it hasn't received any comments.
In short, I'm having trouble reading image files encoded in TFRecords. For this example, I've encoded the MNIST dataset into TFRecord files (one for training, one for validation, and one for testing). You can find a gist that reproduces my error, plus links to the data files, here.
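For context, here is a minimal, self-contained sketch of the pattern I'm describing: writing a few MNIST-shaped images into a TFRecord file and reading them back with the Dataset API. The feature keys (`image_raw`, `label`) and the fixed 28x28 shape are assumptions for illustration, not necessarily what the gist's records use.

```python
import numpy as np
import tensorflow as tf

def serialize_example(image_bytes, label):
    # Pack one image and its label into a tf.train.Example.
    # Feature names "image_raw" and "label" are illustrative assumptions.
    feature = {
        "image_raw": tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[image_bytes])),
        "label": tf.train.Feature(
            int64_list=tf.train.Int64List(value=[label])),
    }
    return tf.train.Example(
        features=tf.train.Features(feature=feature)).SerializeToString()

# Write a tiny TFRecord file with four blank 28x28 "images".
path = "mnist_sample.tfrecord"
with tf.io.TFRecordWriter(path) as writer:
    for label in range(4):
        img = np.zeros((28, 28), dtype=np.uint8)
        writer.write(serialize_example(img.tobytes(), label))

# Schema used to parse each serialized Example back out.
feature_spec = {
    "image_raw": tf.io.FixedLenFeature([], tf.string),
    "label": tf.io.FixedLenFeature([], tf.int64),
}

def parse(record):
    parsed = tf.io.parse_single_example(record, feature_spec)
    image = tf.io.decode_raw(parsed["image_raw"], tf.uint8)
    image = tf.reshape(image, [28, 28])  # assumed fixed image shape
    return image, parsed["label"]

dataset = tf.data.TFRecordDataset(path).map(parse).batch(2)
images, labels = next(iter(dataset))  # first batch of 2 images
```

The same `parse` function generalizes to many shard files by passing a list of filenames (or a file-pattern dataset) to `TFRecordDataset`.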
I have this sort of code working with the "old" file and batch-queue API. I'm curious, though, whether my approach here is philosophically correct in general. What is the recommended way to consume very large "image" datasets (many millions of images)? My actual application domain is physics, where it is very easy to put together, for example, HDF5 files containing millions of images (they are very sparse), but not very efficient to consume them with TensorFlow. I've had decent success converting HDF5 files into TFRecord files, but I have to break each HDF5 file up into many TFRecord files: there is a practical limit of about 20k of my images per TFRecord file. It is easy to use the file APIs to loop over the shards, so that part is fine.
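The sharding arithmetic I mean is trivial but worth pinning down; here is a stdlib-only sketch (no h5py or TensorFlow, and the 20k shard size is just my empirical limit) of how a big HDF5 dataset gets split into index ranges, one per TFRecord file.

```python
def shard_indices(n_items, shard_size):
    """Yield (shard_id, start, stop) ranges so that each shard
    holds at most shard_size items; the last shard may be short."""
    for shard_id, start in enumerate(range(0, n_items, shard_size)):
        yield shard_id, start, min(start + shard_size, n_items)

# e.g. 50k images at a practical limit of 20k per TFRecord file
# gives three shards; each (start, stop) slice would be read from
# the HDF5 file and written to its own TFRecord file.
shards = list(shard_indices(50_000, 20_000))
```

Each `(start, stop)` pair then drives one read-from-HDF5 / write-to-TFRecord pass, and the resulting filenames can be fed back to the Dataset API as a list.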
Any thoughts on my specific question on SO, and/or on the more general question of recommended practice, will be deeply appreciated.
Thanks!
pax
Gabe