How to load trained weights?

Sravan Kumar Reddy Javaji

Feb 13, 2015, 4:12:31 PM
to tor...@googlegroups.com
Hello Everyone,

I am saving the weights of the model after 10 epochs as shown below; at that point the accuracy is around 78%.

torch.save("Model_Weights.t7", Weights)

I have also written a second script which loads the weights file (as shown below) and continues the training.

local model = require('./Models/' .. opt.network)
local weights, gradients = model:getParameters()
weights = torch.load("Model_Weights.t7")

But the initial accuracy starts from around 30%. I expected it to start from 78%, since I loaded the previously saved weights.

Also, I don't understand why we sometimes save only the weights instead of the entire model. Is it just to save space?

Could someone please let me know where I went wrong? Thanks for your time and help.

-
Regards,
Sravan



Jonathan Tompson

Feb 13, 2015, 4:35:45 PM
to torch7 on behalf of Sravan Kumar Reddy Javaji
Sravan,

If you save the network using torch.save(), the weights will definitely be saved as they are in memory. My guess is that there's a bug in your evaluation code somewhere higher up. Are you absolutely sure you're repeating the exact same evaluation of your model?

Alternatively, if you're saving the weights but not the optim data, then after the first resumed epoch very odd things can happen, because the optimization may take a step in weight space that destroys performance. If you're using momentum at all, then you absolutely need to save the optim data to resume training (otherwise the optimizer restarts from a different state).
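For example, alongside the flattened weights you would also serialize the state table that gets passed to optim.sgd on every step. A rough, untested sketch, assuming you keep that table in a variable called optimState and that feval/weights are your usual closure and flattened parameters:

require 'optim'

-- keep ONE state table and pass the same table to optim.sgd on every call;
-- sgd stashes its momentum buffer and step counter inside it
local optimState = {learningRate = 0.001, momentum = 0.9, weightDecay = 0.0001}

-- ... inside the training loop: optim.sgd(feval, weights, optimState) ...

-- at checkpoint time, save the optimizer state next to the weights
torch.save("Model_Weights.t7", weights)
torch.save("Optim_State.t7", optimState)

-- when resuming, reload it instead of creating a fresh table
optimState = torch.load("Optim_State.t7")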

I hope that helps.




Sravan Kumar Reddy Javaji

Feb 13, 2015, 5:24:08 PM
to tor...@googlegroups.com
Hello Jonathan,

Yes, I am using momentum but I am not saving the optim state; that's probably the issue. I will try to save and load the optim state and check the accuracy.

Now I am trying to save the model and load it back in a different script, as shown below.

torch.save("model.net", model)

local model =torch.load("model.net")

But I am getting the error below. What might be the reason?

/home/ubuntu/torch/install/share/lua/5.1/torch/File.lua:227: unknown Torch class <nn.Sequential>
stack traceback:
    [C]: in function 'error'
    /home/ubuntu/torch/install/share/lua/5.1/torch/File.lua:227: in function 'readObject'
    /home/ubuntu/torch/install/share/lua/5.1/torch/File.lua:271: in function 'load'
    main2.lua:39: in main chunk
    [C]: in function 'dofile'
    ...untu/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
    [C]: at 0x00406170

Please let me know. Thanks for your help.

-
Thanks,
Regards

Jonathan Tompson

Feb 13, 2015, 5:36:24 PM
to torch7 on behalf of Sravan Kumar Reddy Javaji
If I were to guess, I would say that you're not doing a "require 'nn'" before loading.
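i.e. something like this before the load (with 'cunn' as well if the saved model contains CUDA tensors):

require 'nn'     -- registers nn.Sequential and friends so the deserializer can find them
-- require 'cunn'  -- only needed if the model was saved with CUDA tensors inside

local model = torch.load("model.net")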

Sravan Kumar Reddy Javaji

Feb 13, 2015, 6:06:02 PM
to tor...@googlegroups.com
Thanks, adding the require statement fixed the error.

The size of my model is 3 GB. Now when I try to load it back, I get an "out of memory" error :(

I reduced the mini-batch size from 128 to 32 but I am still getting the out-of-memory error. I know that saving the model also saves all the intermediate buffers such as gradInput, output, etc., and from the link below I understood that I should save only the biases and weights to reduce the size of the network.

https://github.com/torch/DEPRECEATED-torch7-distro/issues/47

Since I am using momentum, do I also need to save the optim state along with the weights and biases? Please let me know.


-
Regards,
Sravan



Jonathan Tompson

Feb 13, 2015, 6:14:19 PM
to torch7 on behalf of Sravan Kumar Reddy Javaji
I just posted a new code snippet to that old issue for you. It describes how to zero out the tensors in a model before saving it to disk, to reduce the on-disk file size.

I would say that you only need to save the optim state if you want to continue training using SGD from where you left off.  If you just want to save the model in order to run FPROP through it at a later date then you don't need it.  Does that make sense?

Maybe have a look at the sgd method in optim to understand what state it actually uses and what you will need to restart training.
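For the FPROP-only case it can be as simple as this (a rough sketch; the input shape here is just a placeholder for whatever your network expects):

require 'nn'    -- plus 'cunn' if the saved model contains CUDA tensors

local model = torch.load("model.net"):float()   -- cast to CPU floats for inference
model:evaluate()                                -- put dropout/batchnorm into test mode

local input = torch.FloatTensor(3, 32, 32):uniform()  -- placeholder; use a real preprocessed image
local prediction = model:forward(input)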

Sravan Kumar Reddy Javaji

Feb 13, 2015, 6:21:38 PM
to tor...@googlegroups.com
Oh, now I get it. Actually I just want to run FPROP using the saved weights.

I felt that the result I got was not accurate, so I resumed training in the same script to check whether I had loaded the weights correctly or not.

Now I clearly understand: to resume SGD training we need to save both the weights and the optim state, but to just classify one image, saving the weights alone is sufficient. Right?

Thanks for your time Jonathan :)

-
Regards,
Sravan




Sravan Kumar Reddy Javaji

Feb 18, 2015, 7:49:53 PM
to tor...@googlegroups.com
Hello Jonathan,

I am not getting the expected results when I use the pre-trained weights. Just as a cross-check, I loaded the pre-trained weights and ran them on the whole validation data (which I used earlier while training the model). I got an error rate of 98%, which is not what I expected. Before saving the model the error rate was 72%, so I believe either the weights aren't being saved properly, or they aren't being loaded properly, or the weights alone are not sufficient to restore the pre-trained model state.

I am saving the weights as shown below,

torch.save(weights_filename, Weights)

Loading the weights as shown below,

torch.setdefaulttensortype('torch.CudaTensor')
local model = require('AlexNet')
local weights, gradients = model:getParameters()
weights = torch.load(weights_filename)


I know that setting the default tensor type to CUDA is not a good idea, but if I remove that line the whole system hangs, and it stays hung for more than 10 minutes until the weights finish loading. The weights file is not that big, just 280 MB.

Please let me know what I am missing here. Thanks for your time and help.

-
Regards,
Sravan

soumith

Feb 18, 2015, 9:05:17 PM
to torch7 on behalf of abes

Sravan, your code to load is wrong.

You have to do:
weights:copy(torch.load(...))
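In the context of your snippet, that would look something like this (a sketch reusing your variable names; the key point is copying the saved values into the existing flat parameter tensor instead of rebinding the local variable):

local model = require('AlexNet')
local weights, gradients = model:getParameters()
-- copy INTO the tensor that actually backs the model's parameters;
-- `weights = torch.load(...)` would only repoint the local and leave the model untouched
weights:copy(torch.load(weights_filename))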

Jonathan Tompson

Feb 18, 2015, 10:45:07 PM
to torch7 on behalf of smth chntla
Why don't you just serialize the entire model?  Sorry, I thought that's what you were doing.

Sravan Kumar Reddy Javaji

Feb 19, 2015, 1:00:08 AM
to tor...@googlegroups.com
Thanks Soumith :)

Yes Jonathan, I used your code to clean up the model and save it, and it is working fine. The size of the model also went down from 3 GB to 500 MB with your cleanup logic. I also had some pre-trained weights saved separately, so I tried loading those, and that's when I ran into this issue. Your cleanup code is very good :) Thanks Jonathan.

-
Regards,
Sravan



Paweł S

May 6, 2015, 10:03:46 AM
to tor...@googlegroups.com
Thanks for sharing the code, Jonathan. I was about to write something similar, so it saved my afternoon (it decreases the size of the output file 5x).

I would never have needed it if I hadn't hit a weird error: I was not able to load a model because "cuda ran out of memory". I guess it is because of how torch.load works. With the cleaned-up network it works perfectly.

Sravan Kumar Reddy Javaji

May 8, 2015, 7:23:13 PM
to tor...@googlegroups.com
Hello Jonathan/Soumith,

I need to continue training in the future, and I also want to test on images, so I decided to save the model.

In my case I have to save the model at every epoch. I am wondering: if I zero the tensors in the model (using Jonathan's code from https://github.com/torch/DEPRECEATED-torch7-distro/issues/47) between epochs, will it affect the accuracy or performance of the network?

Or do I need to copy the model into a temporary copy and clean that temporary copy before saving it?

Please let me know which approach I need to follow.

Thanks for your time and help.

-
Regards,
Sravan

Sravan Kumar Reddy Javaji

May 12, 2015, 4:26:41 PM
to tor...@googlegroups.com
I found a solution to my previous question in Jonathan's reply in the following thread: https://groups.google.com/forum/#!searchin/torch7/optimstate/torch7/uNxnrH-7C-4/pgIBdAFVaOYJ

Currently I am saving the optim state as well. To my surprise, the optim state and the weights files are the same size. So I loaded the optim state and checked the values it contains.

th> m=torch.load("optimstate.t7")
                                                                      [1.4413s]
th> m
{
  evalCounter : 5327
  learningRate : 0.001
  weightDecay : 0.0001
  learningRateDecay : 0
  dfdx : CudaTensor - size: 431080
  momentum : 0.9
}

What is dfdx here? Is it fine if I clear dfdx before saving the optim state, to save memory?

Please let me know. Thanks for your time :)

-
Regards,
Sravan

soumith

May 12, 2015, 4:46:20 PM
to torch7 on behalf of Sravan Kumar Reddy Javaji
dfdx is the momentum vector. You can clear it; the momentum will just be re-estimated from scratch.
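For example, right before saving you could do something like this (a sketch; optimState here is whatever table you pass to optim.sgd):

-- drop the momentum buffer before checkpointing; optim.sgd recreates it on the
-- next call, so momentum effectively restarts from zero after resuming
optimState.dfdx = nil
torch.save("optimstate.t7", optimState)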


Mrinal Haloi

Oct 19, 2015, 5:18:52 AM
to torch7
Hi Sravan,
  
I have a Torch model (model.t7) and it is very large: 1.6 GB. I want to compress it to 500 MB. Could you please help me out here?

Remi Cadene

Mar 2, 2016, 7:57:52 AM
to torch7
You can use this function to reduce its memory footprint
sanitize = function (net)
   local list = net:listModules()
   for _,val in ipairs(list) do
       for name,field in pairs(val) do
           if torch.type(field) == 'cdata' then val[name] = nil end
           if (name == 'output' or name == 'gradInput') then
               val[name] = field.new()
           end
       end
   end
end
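Usage is simply to call it on the model right before serializing; the output/gradInput buffers are replaced by fresh empty tensors, so they no longer bloat the file, and they get re-allocated on the next forward/backward pass (the filename below is just an example):

sanitize(model)
torch.save('model_sanitized.t7', model)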


soumith

Mar 2, 2016, 10:22:03 AM
to torch7 on behalf of Remi Cadene
If you have the latest Torch, you can now do:
model:clearState()
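clearState() empties the output/gradInput buffers in place and returns the model, so you can chain it straight into the save, e.g.:

torch.save('model.t7', model:clearState())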

Jitendra Bansal

Aug 16, 2016, 2:00:27 AM
to torch7
Hi Smth,
I have used clearState() before saving the model, but my saved model is still 1.4 GB.
Could you please tell me how to reduce this size? I am using the AlexNet model provided in the example (https://github.com/soumith/imagenet-multiGPU.torch/blob/master/models/alexnetowtbn.lua).

Regards,
Jitendra



Jon Pi

Apr 18, 2017, 8:17:32 AM
to torch7
Hi Sravan,

Were you able to solve this issue, i.e. the accuracy of 30% instead of 78%?

Thanks,

Jonathan