How to do oversampling in Keras?


Nad

Jul 17, 2018, 11:57:44 AM
to Keras-users

I want to build a deep learning model to classify images. My dataset has around 400 classes, and the classes have different numbers of images (15, 20, 30, 40, 60... images).

  • So I plan to apply oversampling.

Is there a way to do oversampling in Keras, or any way other than doing it manually?

  • Should I apply oversampling before or after splitting the images into training, validation, and test sets?

Thank you

Sergey O.

Jul 17, 2018, 12:27:29 PM
to Nad, Keras-users
By oversampling, do you mean you want to roughly sample the same number of images for each class (even if that means repeating some images from time to time)?

If so, you can use the fit_generator function with a generator that randomly picks a class and then randomly picks an image from that class.

For example, assuming "data" is a list of lists where the first dimension is the class and the second dimension is the list of images for that class (and "labels" has the same layout):

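import numpy as np

def get_data(batch_size=32):
  while True:
    # pick a random class per sample, then a random image within that class
    cs = np.random.randint(len(data), size=batch_size)
    idx = [(c, np.random.randint(len(data[c]))) for c in cs]
    # Keras generators must yield whole batches, not single samples
    yield (np.array([data[c][i] for c, i in idx]),
           np.array([labels[c][i] for c, i in idx]))

# steps_per_epoch (Keras 2) counts batches; Keras 1 called it samples_per_epoch and counted samples
model.fit_generator(get_data(), steps_per_epoch=1000)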
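
To see why picking the class first balances things out, here is a small sketch with made-up class sizes (the sizes, the seed, and the helper name `sample_class_counts` are assumptions for illustration, not part of the thread). Because the class is drawn uniformly, each class shows up about equally often no matter how many images it holds:

```python
import numpy as np

def sample_class_counts(class_sizes, n_draws, seed=0):
    """Draw classes uniformly at random and count how often each is picked."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(len(class_sizes), dtype=int)
    for _ in range(n_draws):
        c = rng.integers(len(class_sizes))  # uniform over classes, ignoring class size
        # an image index within class c would be drawn next, as in get_data()
        counts[c] += 1
    return counts

# Hypothetical imbalanced dataset: number of images per class.
class_sizes = [15, 20, 30, 40, 60]
counts = sample_class_counts(class_sizes, n_draws=10_000)
print(counts / counts.sum())  # each share should be close to 1/5
```

The flip side is that images in small classes get repeated more often, which is exactly what oversampling means here.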

--
You received this message because you are subscribed to the Google Groups "Keras-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to keras-users+unsubscribe@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/keras-users/92dc7b28-a2dc-4d67-939c-726f6be74421%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Dennis S

Jul 17, 2018, 2:14:28 PM
to Sergey O., Nad, Keras-users
I work with NNs a lot, but not much with image processing. Does 400 categories strike anyone else as a lot? Is that common when working with images?

Thanks

Dennis

Nad

Jul 17, 2018, 2:18:27 PM
to Keras-users
Thank you for replying, sokrypton.
Yes, I want all classes to have approximately the same number of images.
Do you mean that if I use this code, I do not have to duplicate images?
When will the while loop stop?



Nad

Jul 17, 2018, 2:22:40 PM
to Keras-users
Sorry, I did not understand your question.



Sergey O.

Jul 17, 2018, 2:50:44 PM
to Nad, Keras-users
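Yep! No need to duplicate the images.
"samples_per_epoch" controls how many samples are drawn from the generator in each epoch. The while loop itself never exits; Keras just pulls that much data per epoch and stops asking once training ends. (In Keras 2 the argument is called "steps_per_epoch" and counts batches rather than samples.)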
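
On the question of when the loop stops: a Python generator built around `while True` never finishes by itself; the consumer simply stops asking for more. A minimal sketch of that behaviour, using `itertools.islice` as a stand-in for Keras drawing a fixed amount per epoch (the `endless` generator and the numbers are illustrative assumptions):

```python
import itertools

def endless():
    """Infinite generator, like get_data(): it never returns on its own."""
    n = 0
    while True:
        yield n
        n += 1

gen = endless()
# The consumer decides how much to take, much as Keras draws a fixed
# amount from the generator for each epoch.
epoch_1 = list(itertools.islice(gen, 5))
epoch_2 = list(itertools.islice(gen, 5))
print(epoch_1)  # [0, 1, 2, 3, 4]
print(epoch_2)  # [5, 6, 7, 8, 9]
```

The generator picks up where it left off on the second call, so nothing is reset between epochs.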

I think Dennis is worried that you might not have enough data, causing the NN to overfit. 400 is a LOT of categories, given that you only have a few examples of each.

PS: an alternative way to address the class imbalance (a different number of samples per class) is to use "class_weight", which lets you up-weight the classes that have fewer samples.
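
A minimal sketch of the class_weight idea, assuming integer class labels in a hypothetical array `y_train` and inverse-frequency weighting (one common choice, not something specified in this thread). Keras takes a dict mapping class index to weight through the `class_weight` argument of `fit`:

```python
import numpy as np

def inverse_frequency_weights(y):
    """Map each class index to n_samples / (n_classes * class_count)."""
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return {int(c): float(w) for c, w in zip(classes, weights)}

# Hypothetical labels: class 0 has 15 images, class 1 has 60.
y_train = np.array([0] * 15 + [1] * 60)
class_weight = inverse_frequency_weights(y_train)
print(class_weight)  # {0: 2.5, 1: 0.625}

# Assumed usage with an already-compiled model and training arrays:
# model.fit(x_train, y_train, class_weight=class_weight)
```

Unlike oversampling, this leaves the data untouched and instead scales each sample's contribution to the loss.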
