How to load trained weights?

Sravan Kumar Reddy Javaji

Feb 13, 2015, 4:12:31 PM
to tor...@googlegroups.com
Hello Everyone,

I am saving the weights of the model after 10 epochs as shown below; at that point the accuracy is around 78%.

torch.save("Model_Weights.t7", Weights)

I have also written a second script which loads the weights file (as shown below) and continues the training.

local model = require('./Models/' .. opt.network)
local weights, gradients = model:getParameters()
weights = torch.load("Model_Weights.t7")

But the initial accuracy starts from around 30%. I expected it to start from 78%, since I loaded the previously saved weights.

Also, I don't understand why we sometimes save only the weights instead of the entire model. Is it just to save space?

Could someone please let me know where I went wrong? Thanks for your time and help.

-
Regards,
Sravan



Jonathan Tompson

Feb 13, 2015, 4:35:45 PM
to torch7 on behalf of Sravan Kumar Reddy Javaji
Sravan,

If you save the network using torch.save(), the weights will definitely be saved as they are in memory. My guess is that there's a bug in your evaluation code somewhere higher up. Are you absolutely sure you're repeating the exact same evaluation of your model?

Alternatively, if you're saving the weights but not the optim data, then after the first resumed epoch very odd things can happen, because the optimization may take a step in weight space that destroys performance. If you're using momentum at all, then you absolutely need to save the optim data to resume training (otherwise the optimizer restarts from a different state).
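For example, alongside the flattened weights you would also serialize the state table that gets passed to optim.sgd on every step. A rough, untested sketch, assuming you keep that table in a variable called optimState and that feval/weights are your usual closure and flattened parameters:

require 'optim'

-- keep ONE state table and pass the same table to optim.sgd on every call;
-- sgd stashes its momentum buffer and step counter inside it
local optimState = {learningRate = 0.001, momentum = 0.9, weightDecay = 0.0001}

-- ... inside the training loop: optim.sgd(feval, weights, optimState) ...

-- at checkpoint time, save the optimizer state next to the weights
torch.save("Model_Weights.t7", weights)
torch.save("Optim_State.t7", optimState)

-- when resuming, reload it instead of creating a fresh table
optimState = torch.load("Optim_State.t7")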

I hope that helps.




Sravan Kumar Reddy Javaji

Feb 13, 2015, 5:24:08 PM
to tor...@googlegroups.com
Hello Jonathan,

Yes, I am using momentum but I am not saving the optim state; that's probably the issue. I will try to save and load the optim state and check the accuracy.

Now I am trying to save the model and load it back in a different script, as shown below.

torch.save("model.net", model)

local model =torch.load("model.net")

But I am getting the error below. What might be the reason?

/home/ubuntu/torch/install/share/lua/5.1/torch/File.lua:227: unknown Torch class <nn.Sequential>
stack traceback:
    [C]: in function 'error'
    /home/ubuntu/torch/install/share/lua/5.1/torch/File.lua:227: in function 'readObject'
    /home/ubuntu/torch/install/share/lua/5.1/torch/File.lua:271: in function 'load'
    main2.lua:39: in main chunk
    [C]: in function 'dofile'
    ...untu/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
    [C]: at 0x00406170

Please let me know. Thanks for your help.

-
Thanks,
Regards

Jonathan Tompson

Feb 13, 2015, 5:36:24 PM
to torch7 on behalf of Sravan Kumar Reddy Javaji
If I were to guess, I would say that you're not doing a "require 'nn'" before loading.
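i.e. something like this before the load (with 'cunn' as well if the saved model contains CUDA tensors):

require 'nn'     -- registers nn.Sequential and friends so the deserializer can find them
-- require 'cunn'  -- only needed if the model was saved with CUDA tensors inside

local model = torch.load("model.net")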

Sravan Kumar Reddy Javaji

Feb 13, 2015, 6:06:02 PM
to tor...@googlegroups.com
Thanks, adding the require statement fixed the error.

The size of my model is 3 GB. Now when I try to load it back, I get an "out of memory" error :(

I reduced the mini-batch size from 128 to 32 but I am still getting the out-of-memory error. I know that saving the model also saves all the intermediate buffers such as gradInput, output, etc., and from the link below I understood that I should save only the biases and weights to reduce the size of the network.

https://github.com/torch/DEPRECEATED-torch7-distro/issues/47

Since I am using momentum, do I also need to save the optim state along with the weights and biases? Please let me know.


-
Regards,
Sravan



Jonathan Tompson

Feb 13, 2015, 6:14:19 PM
to torch7 on behalf of Sravan Kumar Reddy Javaji
I just posted a new code snippet to that old issue for you. It describes how to zero out the tensors in a model before saving it to disk, to reduce the on-disk file size.

I would say that you only need to save the optim state if you want to continue training using SGD from where you left off.  If you just want to save the model in order to run FPROP through it at a later date then you don't need it.  Does that make sense?

Maybe have a look at the sgd method in optim to understand what state it actually uses and what you will need to restart training.
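For the FPROP-only case it can be as simple as this (a rough sketch; the input shape here is just a placeholder for whatever your network expects):

require 'nn'    -- plus 'cunn' if the saved model contains CUDA tensors

local model = torch.load("model.net"):float()   -- cast to CPU floats for inference
model:evaluate()                                -- put dropout/batchnorm into test mode

local input = torch.FloatTensor(3, 32, 32):uniform()  -- placeholder; use a real preprocessed image
local prediction = model:forward(input)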

Sravan Kumar Reddy Javaji

Feb 13, 2015, 6:21:38 PM
to tor...@googlegroups.com
Oh, now I get it. Actually I just want to run FPROP using the saved weights.

I felt that the result I got was not accurate, so I resumed training in the same script to check whether I had loaded the weights correctly or not.

Now I clearly understand: to resume SGD training we need to save both the weights and the optim state, but to just classify one image, saving the weights alone is sufficient. Right?

Thanks for your time Jonathan :)

-
Regards,
Sravan




Sravan Kumar Reddy Javaji

Feb 18, 2015, 7:49:53 PM
to tor...@googlegroups.com
Hello Jonathan,

I am not getting the expected results when I use the pre-trained weights. Just as a cross-check, I loaded the pre-trained weights and ran them on the whole validation data (which I used earlier while training the model). I got an error rate of 98%, which is not what I expected. Before saving the model the error rate was 72%, so I believe either the weights aren't being saved properly, or they aren't being loaded properly, or the weights alone are not sufficient to restore the pre-trained model state.

I am saving the weights as shown below,

torch.save(weights_filename, Weights)

Loading the weights as shown below,

torch.setdefaulttensortype('torch.CudaTensor')
local model = require('AlexNet')
local weights, gradients = model:getParameters()
weights = torch.load(weights_filename)


I know that setting the default tensor type to CUDA is not a good idea, but if I remove that line the whole system hangs, and it stays hung for more than 10 minutes until the weights finish loading. The weights file is not that big, just 280 MB.

Please let me know what I am missing here. Thanks for your time and help.

-
Regards,
Sravan

soumith

Feb 18, 2015, 9:05:17 PM
to torch7 on behalf of abes

Sravan, your code to load is wrong.

You have to do:
weights:copy(torch.load(...))
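In the context of your snippet, that would look something like this (a sketch reusing your variable names; the key point is copying the saved values into the existing flat parameter tensor instead of rebinding the local variable):

local model = require('AlexNet')
local weights, gradients = model:getParameters()
-- copy INTO the tensor that actually backs the model's parameters;
-- `weights = torch.load(...)` would only repoint the local and leave the model untouched
weights:copy(torch.load(weights_filename))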

Jonathan Tompson

Feb 18, 2015, 10:45:07 PM
to torch7 on behalf of smth chntla
Why don't you just serialize the entire model?  Sorry, I thought that's what you were doing.

Sravan Kumar Reddy Javaji

Feb 19, 2015, 1:00:08 AM
to tor...@googlegroups.com
Thanks Soumith :)

Yes Jonathan, I used your code to clean up the model and save it, and it is working fine. The size of the model also went down from 3 GB to 500 MB with your cleanup logic. I also had some pre-trained weights saved separately, so I tried loading those, and that's when I ran into this issue. Your cleanup code is very good :) Thanks Jonathan.

-
Regards,
Sravan



Paweł S

May 6, 2015, 10:03:46 AM
to tor...@googlegroups.com
Thanks for sharing the code, Jonathan. I was about to write something similar, so it saved my afternoon (it decreases the size of the output file 5x).

I would never have needed it if I hadn't hit a weird error: I was not able to load a model because "cuda ran out of memory". I guess it is because of how torch.load works. With the cleaned-up network it works perfectly.

Sravan Kumar Reddy Javaji

May 8, 2015, 7:23:13 PM
to tor...@googlegroups.com
Hello Jonathan/Soumith,

I need to continue training in the future, and I also want to test on images, so I decided to save the model.

In my case I have to save the model at every epoch. I am wondering: if I zero the tensors in the model (using Jonathan's code from https://github.com/torch/DEPRECEATED-torch7-distro/issues/47) between epochs, will it affect the accuracy or performance of the network?

Or do I need to copy the model into a temporary copy and clean that temporary copy before saving it?

Please let me know which approach I need to follow.

Thanks for your time and help.

-
Regards,
Sravan

Sravan Kumar Reddy Javaji

May 12, 2015, 4:26:41 PM
to tor...@googlegroups.com
I found a solution to my previous question in Jonathan's reply in the following thread: https://groups.google.com/forum/#!searchin/torch7/optimstate/torch7/uNxnrH-7C-4/pgIBdAFVaOYJ

Currently I am saving the optim state as well. To my surprise, the optim state and the weights files are the same size. So I loaded the optim state and checked the values it contains.

th> m=torch.load("optimstate.t7")
                                                                      [1.4413s]
th> m
{
  evalCounter : 5327
  learningRate : 0.001
  weightDecay : 0.0001
  learningRateDecay : 0
  dfdx : CudaTensor - size: 431080
  momentum : 0.9
}

What is dfdx here? Is it fine if I clear dfdx before saving the optim state, to save memory?

Please let me know. Thanks for your time :)

-
Regards,
Sravan

soumith

May 12, 2015, 4:46:20 PM
to torch7 on behalf of Sravan Kumar Reddy Javaji
dfdx is the momentum vector. You can clear it; the momentum will just be re-estimated from scratch.
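For example, right before saving you could do something like this (a sketch; optimState here is whatever table you pass to optim.sgd):

-- drop the momentum buffer before checkpointing; optim.sgd recreates it on the
-- next call, so momentum effectively restarts from zero after resuming
optimState.dfdx = nil
torch.save("optimstate.t7", optimState)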


Mrinal Haloi

Oct 19, 2015, 5:18:52 AM
to torch7
Hi Sravan,
  
I have a Torch model (model.t7) and it is very large: 1.6 GB. I want to compress it to 500 MB. Could you please help me out here?

Remi Cadene

Mar 2, 2016, 7:57:52 AM
to torch7
You can use this function to reduce its memory footprint
sanitize = function (net)
   local list = net:listModules()
   for _,val in ipairs(list) do
       for name,field in pairs(val) do
           if torch.type(field) == 'cdata' then val[name] = nil end
           if (name == 'output' or name == 'gradInput') then
               val[name] = field.new()
           end
       end
   end
end
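Usage is simply to call it on the model right before serializing; the output/gradInput buffers are replaced by fresh empty tensors, so they no longer bloat the file, and they get re-allocated on the next forward/backward pass (the filename below is just an example):

sanitize(model)
torch.save('model_sanitized.t7', model)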


soumith

Mar 2, 2016, 10:22:03 AM
to torch7 on behalf of Remi Cadene
If you have the latest Torch, you can now do:
model:clearState()
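clearState() empties the output/gradInput buffers in place and returns the model, so you can chain it straight into the save, e.g.:

torch.save('model.t7', model:clearState())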

Jitendra Bansal

Aug 16, 2016, 2:00:27 AM
to torch7
Hi Smth,
I have used clearState() before saving the model, but my saved model is still 1.4 GB.
Could you please tell me how to reduce this size? I am using the AlexNet model provided in the example (https://github.com/soumith/imagenet-multiGPU.torch/blob/master/models/alexnetowtbn.lua).

Regards,
Jitendra



Jon Pi

Apr 18, 2017, 8:17:32 AM
to torch7
Hi Sravan,

Were you able to solve this issue, i.e. the accuracy of 30% instead of 78%?

Thanks,

Jonathan