Is fit_generator supposed to be slow?


Loser

Apr 25, 2017, 7:12:37 PM
to Keras-users
Is fit_generator supposed to be slow?

I am currently trying to train on a large dataset.
I made a data_generator so data batches can be fed to the model, but extracting and processing the data takes time and makes everything buggy.


pickle_safe = True seems to work a bit, but things are still very buggy.

How is Keras supposed to handle large datasets?

Daπid

Apr 26, 2017, 10:33:19 AM
to Loser, Keras-users
On 26 April 2017 at 01:12, Loser <nof...@gmail.com> wrote:
Is fit_generator supposed to be slow?

No.

I made a data_generator so data batches can be fed to the model, but extracting and processing the data takes time

You may need more workers, more optimised pre-processing, a faster database, a faster drive, or to save the processed batches beforehand. What is slow, exactly? Have you profiled it?
 
and makes everything buggy.

Define "buggy".
 

pickle_safe = True seems to work a bit, but things are still very buggy.

How is Keras supposed to handle large datasets?

I am training on a fairly large dataset, reading arrays from an HDF5 file on the fly with a generator, and my GPU utilisation is at 90%. It can definitely be fast.

Loser

Apr 26, 2017, 12:01:29 PM
to Keras-users, nof...@gmail.com
So my data_generator does minimal processing (only reshaping). I load my data from files saved on my drive as .h5 files.

This is the generator:

import h5py
import numpy as np

# train_files, numpy_train_input, numpy_train_output, splits,
# total_frames_with_deltas and window_height are defined elsewhere.
def train_generator(batch_size):
    while True:
        for input_name in train_files:
            # Derive the matching output file name: replace the last
            # underscore-separated token with "output.h5".
            output_name = input_name.split("_")
            output_name[-1] = "output.h5"
            output_name = "_".join(output_name)

            # Read the full input and output arrays for this batch.
            with h5py.File(numpy_train_input + '/' + input_name, 'r') as h5f:
                train_input = h5f['train_input'][:]

            with h5py.File(numpy_train_output + '/' + output_name, 'r') as h5f:
                train_output = h5f['train_output'][:]

            # Reshape and split into 33 separate model inputs.
            train_input = train_input.reshape(
                (batch_size, splits, total_frames_with_deltas, window_height, 3))
            train_input_list = np.split(train_input, 33, axis=1)

            for i in range(len(train_input_list)):
                train_input_list[i] = train_input_list[i].reshape(batch_size, 45, 8, 3)

            yield (train_input_list, train_output)


So, for instance, with a batch of 1000 examples, one epoch takes:

1000/1000 [==============================] - 12166s - loss: 3.5786 - categorical_accuracy: 0.1081 - val_loss: 4.2263 - val_categorical_accuracy: 0.0500

i.e. 12166 s, whereas I've previously completed 16000 in 2000 s.

I haven't tried profiling... 

Another thing is starting the first epoch... that takes time as well...

Isaac Gerg

Apr 26, 2017, 1:31:37 PM
to Keras-users, nof...@gmail.com
You are reading from disk on every call to the generator, which will incur overhead, especially if your reading pattern is random. I am not sure threading will buy you much here, as the h5py lib, I believe, is like the GIL in that everything passes through the lib, so you don't really get parallelism (you should try this, though, and hopefully prove me wrong). You could make sure your generator is multiprocess safe, try setting pickle_safe to True, and see if that helps the h5py locking issue.
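
A minimal sketch of what that call could look like (keyword names follow the Keras 2.0-era fit_generator signature; pickle_safe and max_q_size were later renamed use_multiprocessing and max_queue_size, and model stands for your already-compiled model):

model.fit_generator(
    train_generator(batch_size),
    steps_per_epoch=1000,   # batches per epoch; illustrative value
    epochs=10,              # illustrative value
    workers=4,              # parallel generator workers
    max_q_size=10,          # batches prefetched ahead of training
    pickle_safe=True)       # use processes instead of threads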

Also, the first epoch is always slow because your model is compiling. I am not sure exactly what model.compile() does, but at the first epoch Keras gets its first look at the input data sizes (mainly the batch size) and has to build the GPU code from the underlying graph before execution begins. The time this takes is the wait you are seeing on the first epoch.

hope that helps,
isaac

Carlton Banks

Apr 26, 2017, 1:47:44 PM
to Isaac Gerg, Keras-users
You are reading from disk on every call to the generator, which will incur overhead, especially if your reading pattern is random. I am not sure threading will buy you much here, as the h5py lib, I believe, is like the GIL in that everything passes through the lib, so you don't really get parallelism (you should try this, though, and hopefully prove me wrong). You could make sure your generator is multiprocess safe, try setting pickle_safe to True, and see if that helps the h5py locking issue.

I am currently using pickle_safe = True and workers = 4. According to http://docs.h5py.org/en/latest/mpi.html, h5py should work in parallel; I am using version 2.7.
So the generator should be thread safe.


Also, the first epoch is always slow because your model is compiling. I am not sure exactly what model.compile() does, but at the first epoch Keras gets its first look at the input data sizes (mainly the batch size) and has to build the GPU code from the underlying graph before execution begins. The time this takes is the wait you are seeing on the first epoch.


Makes sense. 

hope that helps,
isaac

Isaac Gerg

Apr 26, 2017, 1:50:29 PM
to Keras-users, isaac...@gergltd.com
Something else I've used to sanity-check my HDF5 read rates is Windows Resource Monitor (if you are using Windows).

Daπid

Apr 26, 2017, 3:18:20 PM
to Loser, Keras-users
So, you are taking 12 s/batch. How long does it take for the generator alone to yield each element? Time that.
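
For example, a rough way to time the generator in isolation (a sketch reusing train_generator and batch_size from your earlier message):

import time

gen = train_generator(batch_size)
start = time.time()
n_batches = 20                      # sample a handful of batches
for _ in range(n_batches):
    next(gen)
print("%.2f s per batch" % ((time.time() - start) / n_batches))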

I see that you have many small HDF5 files, each with small arrays. That is the worst structure for HDF5, which is designed around the use case of a few large arrays, all in the same file. My bet is that you are spending a lot of time opening files (but I cannot know, test it!). A slightly better structure would be to put each input and its corresponding output in the same file. Better still, if all your inputs have the same shape, stack them together and save them in a single, fat file. So, if your input has shape (10, 5), you stack 1000 of them into an array of shape (1000, 10, 5) and take indices from there (HDF5 supports this efficiently).
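
A rough sketch of that stacked layout (the file and dataset names are made up here, and all_inputs / all_outputs stand for your arrays):

import h5py
import numpy as np

# One-off conversion: stack everything into one large dataset per split.
with h5py.File('train_stacked.h5', 'w') as f:
    f.create_dataset('inputs', data=np.stack(all_inputs))    # e.g. shape (1000, 10, 5)
    f.create_dataset('outputs', data=np.stack(all_outputs))

# The generator then opens the file once and slices batches out of it;
# HDF5 reads contiguous slices like this efficiently.
def stacked_generator(batch_size):
    f = h5py.File('train_stacked.h5', 'r')
    n = f['inputs'].shape[0]
    while True:
        for i in range(0, n - batch_size + 1, batch_size):
            yield f['inputs'][i:i + batch_size], f['outputs'][i:i + batch_size]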

Another possibility is that your reading is slow because your drive is slow. How big are your chunks of data? Is the drive physically connected to the computer, or is it a network drive?

Also, unrelated to this issue, but you should shuffle the input files between epochs, so that they don't always come in the same order.
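
A minimal way to do that, modifying the train_generator posted earlier (load_pair is a hypothetical stand-in for the loading code already shown):

import random

def train_generator(batch_size):
    while True:
        random.shuffle(train_files)                 # in-place reshuffle, once per epoch
        for input_name in train_files:
            # load and yield exactly as in the generator above
            yield load_pair(input_name, batch_size)  # hypothetical helper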

