On 23 July 2016 at 11:50, <facel...@gmail.com> wrote:
> I'm using VGG19 to train on about 150,000 images and classify them into 20
> classes. I don't know if VGG19 is the most appropriate model to use, because
> my amount of data is not as large as VGG19's original training data. Is there
> a better model for training 150,000 images into 20 classes?
If your images are natural images, similar to those VGG was designed
and trained for, you could take the pretrained VGG model and fine-tune
its top layers on your data. Here is a blog post that describes how to
do it:
http://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
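In rough outline it looks something like the untested sketch below
(this assumes a Keras version that ships keras.applications; for older
versions the blog post shows how to load the VGG weights manually, and
the input shape and layer sizes are just placeholders for your setup):

from keras.applications.vgg19 import VGG19
from keras.layers import Dense, Flatten
from keras.models import Model

# Pretrained convolutional base, without the ImageNet classifier on top.
# The shape assumes channels-last 187x100 RGB images; adjust to your dim ordering.
base = VGG19(weights='imagenet', include_top=False, input_shape=(187, 100, 3))
for layer in base.layers:
    layer.trainable = False  # freeze the pretrained layers

# New top: a small fully connected classifier for the 20 classes.
x = Flatten()(base.output)
x = Dense(256, activation='relu')(x)
predictions = Dense(20, activation='softmax')(x)

model = Model(base.input, predictions)
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
# then e.g. model.fit(...) on your 150,000 images

Once the new top has converged you can unfreeze the last convolutional
block and continue training with a small learning rate, as the blog
post describes.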
A simpler alternative would be to just train a simpler model from
scratch and see where it takes you.
The advantage here is that you can more easily tailor your
architecture to your data, for example by using branched architectures
to get different perspectives on the input.
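For example, something along these lines (a rough sketch in Keras 2
notation, with layer sizes as placeholders; this is just a small
convnet to start from, not the branched variant):

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

# Small convnet trained from scratch on 187x100 RGB images (channels last).
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(187, 100, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(20, activation='softmax'),  # 20 classes
])
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])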
> On the other hand, I ran into a memory error when I converted all the images
> into a numpy array. My CPU RAM is 64 GB, and this error suggests my RAM is
> still not big enough. Is that possible? The size of my images is 3*187*100.
The total amount of data, in GB, is:

np.empty((3, 187, 100), dtype=np.float32).nbytes * 150000 * 1e-9
33.66
So, it should fit. I can think of three reasons why it wouldn't:
- You are loading them as float64, which doubles the size.
- You are loading all the images in memory and then stacking them, so at
some point you are storing them twice, which requires about 67 GB.
- Your Python is 32-bit, which is limited to roughly 4 GB of address
space per process (often less in practice).
The second one can be solved by preallocating the full array and
loading each image into it, for example:
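A minimal sketch, assuming the images are 187x100 RGB files on disk
(the glob pattern is hypothetical):

import glob
import numpy as np
from PIL import Image

# 'images/*.jpg' is a hypothetical location for the 150,000 files.
filenames = sorted(glob.glob('images/*.jpg'))

# Preallocate the full float32 array once, then fill it image by image,
# so the data is never held in memory twice.
data = np.empty((len(filenames), 3, 187, 100), dtype=np.float32)
for i, path in enumerate(filenames):
    img = np.asarray(Image.open(path))  # (187, 100, 3) uint8 for an RGB image
    data[i] = img.transpose(2, 0, 1)    # to 3x187x100; cast to float32 on assignment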