Misfit (over- or underfit) with AlexNet

Neelesh

Nov 15, 2015, 6:36:13 PM
to torch7

Hello all,

I wanted to explore multi-GPU training.

I came across https://github.com/soumith/imagenet-multiGPU.torch

So I followed the instructions and trained the AlexNet model on the ImageNet dataset across 2 Nvidia GPUs.

I see a huge gap between training and validation loss (see the snapshot). I guess the model is overfitting.

Has anyone else experienced this? Could anyone shed some light on what's causing it?






Neelesh

Nov 19, 2015, 1:01:17 AM
to torch7
Has no one experienced this? I'm curious to know the cause. Is it because of the AlexNet model's parameters?

soumith

Nov 19, 2015, 1:59:10 AM
to torch7 on behalf of Neelesh
What's the accuracy like?

Here's the full log of my ImageNet run on alexnetowtbn.

It reports training loss per epoch as train_loss and validation loss per epoch as test_loss.
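A quick way to compare the two is to parse the log and print the gap per epoch. A minimal Python sketch, assuming (hypothetically) that each log line looks like `epoch 1 train_loss 5.12 test_loss 5.40` — the real log format may differ:

```python
import re

# Hypothetical log lines; the real log format may differ.
log = """\
epoch 1 train_loss 5.12 test_loss 5.40
epoch 2 train_loss 4.31 test_loss 4.90
epoch 3 train_loss 3.80 test_loss 4.65
"""

pattern = re.compile(r"epoch (\d+) train_loss ([\d.]+) test_loss ([\d.]+)")
rows = [(int(e), float(tr), float(te)) for e, tr, te in pattern.findall(log)]

for epoch, train_loss, test_loss in rows:
    # A steadily widening gap is the classic sign of overfitting.
    print(f"epoch {epoch}: gap = {test_loss - train_loss:.2f}")
```

A gap that grows epoch after epoch while the validation loss plateaus is the pattern to watch for.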

--
You received this message because you are subscribed to the Google Groups "torch7" group.
To unsubscribe from this group and stop receiving emails from it, send an email to torch7+un...@googlegroups.com.
To post to this group, send email to tor...@googlegroups.com.
Visit this group at http://groups.google.com/group/torch7.
For more options, visit https://groups.google.com/d/optout.

Neelesh

Nov 19, 2015, 2:53:41 AM
to torch7
I had used the AlexNet model, not AlexNetOWT. Let me try that and get back here.

Thanks
Neelakandan



Neelesh

Nov 19, 2015, 7:06:59 AM
to torch7
In your logs I see that the training and test accuracy after 29 epochs are 69.21% and 56.502% respectively, a gap of about 13 percentage points.

Can we consider this gap overfitting? If not, what threshold should the gap exceed before we can claim the model overfits?
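For concreteness, computing the gap from those numbers is trivial; what matters more than any fixed threshold is whether the gap keeps widening across epochs. A tiny sketch:

```python
# Accuracies quoted from the log discussed in this thread.
train_acc = 69.21    # train top-1 after 29 epochs
test_acc = 56.502    # validation top-1 after 29 epochs

gap = train_acc - test_acc
print(f"generalization gap: {gap:.3f} percentage points")
# → generalization gap: 12.708 percentage points
# A gap that keeps growing while validation accuracy stalls is a stronger
# overfitting signal than comparing against any fixed threshold.
```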

soumith

Nov 19, 2015, 9:52:25 AM
to torch7 on behalf of Neelesh
Yes, this is overfitting. All ImageNet models overfit.


Neelesh

Nov 19, 2015, 12:37:08 PM
to torch7
How are people using these models in the ImageNet challenge, then? Any suggestions to mitigate this?
Maybe the challenge is simply about faster prediction with higher accuracy.
I keep seeing the term "pretrained models" in the literature. What does that mean? Could anyone explain it here?



Neelesh

Nov 25, 2015, 3:48:43 AM
to torch7
Any suggestions or thoughts on how to overcome this?

Vislab

Nov 25, 2015, 7:30:55 PM
to torch7
An empirical observation about CNNs and accuracy is that the bigger the network, the less it overfits. And "pretrained" means the network has already been trained and is ready to use (some networks take days or weeks to train, so it's often best to use a pretrained network instead of training one from scratch).
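If you want to start from a pretrained checkpoint with this repo rather than from scratch, a run might look like the sketch below. The `-retrain` flag and the checkpoint path are assumptions — check opts.lua in imagenet-multiGPU.torch for the exact option names:

```shell
# Sketch: resume from a saved checkpoint instead of training from scratch.
# The -retrain flag name and checkpoint filename are assumptions; see
# opts.lua in imagenet-multiGPU.torch for the options it actually accepts.
th main.lua -data /path/to/imagenet \
            -nGPU 2 \
            -retrain checkpoints/model_29.t7
```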

Neelesh

Dec 12, 2015, 5:06:44 AM
to torch7
I couldn't move any further here. Has anyone been able to get past this issue?

wang xiaoqun

Dec 15, 2015, 11:09:32 PM
to torch7
I ran ImageNet training on alexnetowtbn. All the settings followed this link, but my result is much worse than yours.

I have two questions about the run parameters.

(1) "batchSize": 128

Is this the batch size across 4 GPUs (32 images per GPU), or per GPU (1024 images in total)?

(2) "learningRate": 0.25

Isn't that too large? The LR starts from 0.01 in https://github.com/soumith/imagenet-multiGPU.torch/blob/master/train.lua .
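On question (2): training scripts like this usually apply a step-decay schedule rather than a single constant LR, so the starting value alone doesn't tell the whole story. A sketch of step decay starting at 0.01 (the step length and decay factor here are illustrative assumptions, not the repo's exact regime):

```python
def learning_rate(epoch, base_lr=0.01, decay_factor=10.0, step=18):
    """Step decay: divide the base LR by decay_factor every `step` epochs."""
    return base_lr / (decay_factor ** (epoch // step))

for epoch in (0, 17, 18, 36):
    print(epoch, learning_rate(epoch))
```

The actual regime is defined per epoch range in train.lua, so it's worth reading that table directly rather than relying on any single headline LR value.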




Neelesh

Jan 11, 2016, 2:43:08 AM
to torch7
Hello Soumith and all,

Are there ways to regularize these ImageNet models to address the overfitting?

Is there a regularization parameter one could tune? Does dropout help?

With this much overfitting, is the training still useful?

There was a suggestion earlier to use pretrained models.

Has anybody done that? Could you provide the steps for using pretrained models with this multi-GPU example?
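On dropout: AlexNet already uses dropout in its fully connected layers. The idea is to zero random activations at train time and rescale so the expected activation is unchanged. A minimal numpy sketch of inverted dropout (not tied to any Torch API):

```python
import numpy as np

def dropout(x, p=0.5, rng=None):
    """Inverted dropout: zero each unit with probability p at train time,
    scaling survivors by 1/(1-p) so the expected activation is unchanged."""
    rng = rng or np.random.default_rng(0)
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

x = np.ones((4, 8))
y = dropout(x, p=0.5)
# Roughly half the entries are zero; surviving entries are scaled to 2.0.
```

At test time dropout is disabled (in Torch, by calling model:evaluate()), which is why the scaling at train time matters.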

