How to load a bunch of images for deep learning without making my computer freeze

662 views

Felipe Damasceno

Mar 31, 2020, 6:15:47 PM
to Discuss

I have a bunch of TIFF images, and my goal is to load them into a NumPy array to use in my Keras model in Python 3. The problem is that when I convert the images to arrays, my computer freezes. I tried converting the images to arrays and saving everything in HDF5 format; then I tried saving only 1000 images at a time, calling gc.collect() to free some memory, and repeating until no images were left, but that did not work either.

So I would like to know an efficient way to get these images into my model. I have about 50,000 images here. Can you help me? By the way, TensorFlow does not support TIFF images, and each image has 8 bands, all of which I would like to load.

Lance Norskog

Mar 31, 2020, 6:37:36 PM
to Felipe Damasceno, Discuss
The IOTensor API may be what you want. It supports TIFF:


I have no experience with it and don't know how it interacts with Keras.
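For what it's worth, a minimal sketch of what using tensorflow-io for this might look like. This assumes the `tensorflow-io` package is installed; the function name `make_tiff_dataset` is made up for illustration, and note that `tfio.experimental.image.decode_tiff` returns an RGBA (4-channel) tensor, so it may not preserve all 8 bands of this imagery. Verify on your own data first.

```python
# Hedged sketch: stream TIFF files into a tf.data pipeline using
# tensorflow-io, instead of loading everything into one numpy array.
# CAUTION: decode_tiff yields RGBA (4 channels); 8-band TIFFs may need
# a different reader.

def make_tiff_dataset(file_pattern, batch_size=32):
    # Imports are inside the function so the sketch is easy to adapt.
    import tensorflow as tf
    import tensorflow_io as tfio

    def _load(path):
        # Read the file lazily and decode it on the fly; nothing is
        # kept in memory beyond the current batch.
        return tfio.experimental.image.decode_tiff(tf.io.read_file(path))

    return (tf.data.Dataset.list_files(file_pattern)
            .map(_load, num_parallel_calls=tf.data.AUTOTUNE)
            .batch(batch_size)
            .prefetch(tf.data.AUTOTUNE))
```

A dataset built this way can be passed straight to `model.fit(...)` in Keras, which avoids ever materializing all 50,000 images at once.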

Cheers,

Lance Norskog

--
Lance Norskog
lance....@gmail.com
Redwood City, CA

Robert Lugg

Mar 31, 2020, 10:49:26 PM
to Discuss
You might consider this dataprep flow:

TIFF images -> sharded TFRecords -> datasets -> network

That is, take your n TIFF images and turn them into m .tfrecord files (which they call "shards": just breaking what would be one big file into many smaller ones). Once the images are in TFRecords, TensorFlow has nice mechanisms to load a few of them at a time into your network. All of this may seem unnecessary, but it pays off when you have a large dataset (which you have).
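A rough sketch of the writer side of that flow. The helper names (`shard_path`, `write_shards`), the use of the third-party `tifffile` library to read 8-band TIFFs, and the feature names `"image"`/`"shape"` are all my assumptions, not a fixed API:

```python
# Sketch: convert a list of TIFF paths into m sharded TFRecord files.
# Assumes tensorflow, numpy, and tifffile are installed.

def shard_path(out_dir, shard_idx, num_shards):
    # Conventional shard naming, e.g. images-00003-of-00050.tfrecord
    return "%s/images-%05d-of-%05d.tfrecord" % (out_dir, shard_idx, num_shards)

def write_shards(tiff_paths, out_dir, num_shards=50):
    import numpy as np
    import tensorflow as tf
    import tifffile  # reads multi-band TIFFs into numpy arrays

    writers = [tf.io.TFRecordWriter(shard_path(out_dir, i, num_shards))
               for i in range(num_shards)]
    for i, path in enumerate(tiff_paths):
        img = tifffile.imread(path).astype(np.float32)  # e.g. (H, W, 8)
        feature = {
            "image": tf.train.Feature(
                bytes_list=tf.train.BytesList(value=[img.tobytes()])),
            "shape": tf.train.Feature(
                int64_list=tf.train.Int64List(value=list(img.shape))),
        }
        example = tf.train.Example(features=tf.train.Features(feature=feature))
        # Round-robin images across shards so each file stays small.
        writers[i % num_shards].write(example.SerializeToString())
    for w in writers:
        w.close()
```

Only one image is ever held in memory at a time, which is the point of the exercise.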

If you have any ability to remove some of those "bands" (which ML folks call "channels"), that would make a big difference. Downsampling or cropping would also help.

What you implement is the TIFF-to-TFRecords converter using numpy, Pillow, etc., calling TF functions to write that structure to the TFRecord file(s). Then, on the other end, you need to write the tensorflow graph logic to convert a given element back to a TIFF. It is involved.

As an alternative, you could read sets of TIFF files (referred to as a "batch") from disk, then feed them into the training network. It wouldn't hurt to try this first just to get used to what is going on, but ultimately I think you will want the TFRecord flow described above.
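That simpler alternative can be sketched as a plain Python generator that Keras consumes batch by batch. Again, `tifffile` and the helper names are assumptions for illustration:

```python
# Sketch: feed batches of TIFFs straight from disk to model.fit,
# never holding more than one batch in memory.

def batch_paths(items, batch_size):
    # Yield successive lists of at most batch_size items.
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def batch_generator(paths, labels, batch_size=32):
    import numpy as np
    import tifffile
    while True:  # Keras expects the generator to loop indefinitely
        for chunk in batch_paths(list(zip(paths, labels)), batch_size):
            imgs = np.stack([tifffile.imread(p) for p, _ in chunk])
            ys = np.array([y for _, y in chunk])
            yield imgs, ys
```

You would then call something like `model.fit(batch_generator(paths, labels), steps_per_epoch=len(paths) // 32, ...)`.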

There are also lots of experienced people in this group and they may have better ideas in the morning.

Robert Lugg

Mar 31, 2020, 10:55:28 PM
to Discuss
I mistyped.  The following:

write the tensorflow graph logic to convert a given element back to a TIFF

should read something like:

write the tensorflow graph logic to convert a given element to an in-memory type compatible with TF Tensors (such as a numpy array).  There is no need to write to disk.
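That read side might look something like the sketch below. The feature names `"image"` and `"shape"` are assumptions here; whatever names the writer used must match exactly:

```python
# Sketch: parse serialized Examples from sharded TFRecords back into
# in-memory tensors, with no TIFF round-trip and no disk writes.

def parse_example(serialized):
    import tensorflow as tf
    # Feature spec must mirror what the converter wrote.
    spec = {
        "image": tf.io.FixedLenFeature([], tf.string),
        "shape": tf.io.FixedLenFeature([3], tf.int64),
    }
    parsed = tf.io.parse_single_example(serialized, spec)
    img = tf.io.decode_raw(parsed["image"], tf.float32)
    return tf.reshape(img, parsed["shape"])

def make_dataset(shard_glob, batch_size=32):
    import tensorflow as tf
    return (tf.data.TFRecordDataset(tf.io.gfile.glob(shard_glob))
            .map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
            .batch(batch_size)
            .prefetch(tf.data.AUTOTUNE))
```

The resulting dataset streams a few shards at a time and can be handed directly to Keras.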