Lua FFI Memory Leak


Heini Korn

Jan 18, 2016, 11:19:11 PM
to torch7
Hey there,

so I'm working with speech and wrote a simple script to do frame extension for a given context window. I wrote it in native Lua, but it was far too slow, so I switched to the FFI.
Using the FFI it is ~100 times faster, but somehow a memory leak occurs.

Basically, what I do is concatenate the left and right context frames of a given frame (e.g. 5 left + 5 right + the current one = 11 frames).
I based my code on this answer.
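
In other words, every frame is replaced by its clamped context window. Here is a minimal standalone sketch (plain C++, no Torch, just to illustrate the intended output):

#include <cstdio>

int main() {
    const int n = 3, fext = 1;
    double frames[n] = { 1.0, 2.0, 3.0 };
    // every frame i becomes the window [i-fext, i+fext],
    // with out-of-range indices clamped to the first/last frame
    for (int i = 0; i < n; ++i) {
        for (int j = -fext; j <= fext; ++j) {
            int idx = i + j;
            if (idx < 0) idx = 0;           // repeat the first frame at the left edge
            if (idx > n - 1) idx = n - 1;   // repeat the last frame at the right edge
            std::printf("%g ", frames[idx]);
        }
        std::printf("\n");
    }
    return 0;
}

This prints the rows 1 1 2 / 1 2 3 / 2 3 3.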

Here is my code for the C++ part (I used C++11):

bool extendframes(THDoubleTensor* input, int fext, THDoubleTensor* output) {

    auto result = false;

    auto inputsize1d = input->size[0];
    auto frameextdim = 2 * fext + 1;
    auto tlen        = inputsize1d * frameextdim;

    double* inputdata = THDoubleTensor_data(input);
    double* storage   = new double[tlen];
    if (storage) {
        auto k = 0u;
        for (auto i = 0; i < inputsize1d; ++i) {
            for (auto j = -fext; j <= fext; j++) {
                // If it's smaller than 0, take the 0th element
                auto tmpidx = i + j > 0 ? (i + j) : 0;
                // Check if the extended index is in range; if it is, use the extended index,
                // otherwise use the last sample
                tmpidx = (tmpidx > inputsize1d - 1) ? inputsize1d - 1 : tmpidx;
                storage[k++] = inputdata[tmpidx];
            }
        }
    }

    THDoubleStorage* outputstorage = THDoubleStorage_newWithData(storage, tlen);
    if (outputstorage) {

        long sizedata[2]   = { inputsize1d, frameextdim };
        long stridedata[2] = { frameextdim, 1 };

        THLongStorage* size   = THLongStorage_newWithData(sizedata, 2);
        THLongStorage* stride = THLongStorage_newWithData(stridedata, 2);

        THDoubleTensor_setStorage(output, outputstorage, 0LL, size, stride);
        result = true;
    }
    return result;
}

extern "C" {
    bool extendframes_d(THDoubleTensor* input,int fext,THDoubleTensor* output){ return extendframes(input,fext,output);}
}


Moreover, the Lua side then looks like this:

require 'torch'
require 'xlua'
local ffi = require 'ffi'
package.cpath = package.cpath .. ";./extendframes/?.so"

ffi.cdef[[
    bool extendframes_d(THDoubleTensor* input,int fext,THDoubleTensor* output);
]]

local cflua = ffi.load(package.searchpath('libextendframes', package.cpath))

function extendframes(input,fext)
    if input:type() ~= 'torch.FloatTensor' and input:type() ~= 'torch.DoubleTensor' then
        xlua.error("Error, currently only torch.FloatTensors or torch.DoubleTensors are supported!")
    end
    local out = nil
    if input:type() == 'torch.DoubleTensor' then
        out = torch.DoubleTensor()
        cflua.extendframes_d(input:contiguous():cdata(),fext,out:cdata())
    end
    return out
end

Now if I test the script with a simple loop to check whether the tensors are correctly garbage collected, I can observe that collectgarbage('count') does not significantly increase, but my machine's memory does. (That counter only tracks the Lua heap, so the C-side allocation would not show up in it anyway.)

while true do
    local tic = torch.tic()
    local extended = extendframes(torch.Tensor(1000000):fill(5),5)
    collectgarbage()
    print(collectgarbage('count'),torch.toc(tic))
end

Does anybody know why there is a memory leak here? How can I avoid it?


Francisco Vitor Suzano Massa

Jan 19, 2016, 2:02:32 AM
to torch7
Are you on Linux? If yes, then this issue might be relevant to you:
https://github.com/torch/torch7/issues/229

Heini Korn

Jan 19, 2016, 3:16:07 AM
to torch7
Hey there, thanks for the suggestion. Yes, I am on Linux, but unfortunately I do not experience the same problem as in issue 229.

Running the code from issue 229 works flawlessly; however, if I modify it to use my example, I still get the memory leak.


local function foo()
    for i=1,1000 do
        local a = extendframes(torch.randn(1000),5)
    end
end

local function foo2()
    foo()
    collectgarbage(); collectgarbage()
    os.execute('ps ax -o rss,user,command | grep luajit | sort -nr')
end


while true do
    -- local tic = torch.tic()
    foo2()
    -- local extended = extf(torch.randn(100000),5)
    -- collectgarbage()
    -- print(extended)
    -- print(collectgarbage('count'),torch.toc(tic))
end

Unfortunately, even when using jemalloc it does not free any memory; I get a log like this for my used memory (RSS in KB):

99832
185848
280152
371916
459676
548520
636800
724244
811304
898152
984868
1071580
1158248
1244792
1331340
1417884

It does increase steadily until the computer crashes.

Francisco Vitor Suzano Massa

Jan 19, 2016, 5:00:22 AM
to torch7
Just a guess here, but maybe you should try allocating the memory using THAlloc (https://github.com/torch/torch7/blob/master/lib/TH/THGeneral.c#L204-L225) instead of new?
Memory deallocation is done using free, and it's probably not a good idea to mix new with free.
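
I.e., replacing the allocation line with something like this (untested sketch):

// before: double* storage = new double[tlen];
double* storage = (double*)THAlloc(tlen * sizeof(double)); // malloc-compatible; release with THFree/free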

Heini Korn

Jan 19, 2016, 5:58:08 AM
to torch7
Thanks for the suggestion, I modified it to:

void* outputstorage = THAlloc(sizeof(storage));
memcpy(outputstorage, storage, sizeof(storage));
if (outputstorage) {
    long sizedata[2]   = { inputsize1d, frameextdim };
    long stridedata[2] = { frameextdim, 1 };

    THLongStorage* size   = THLongStorage_newWithData(sizedata, 2);
    THLongStorage* stride = THLongStorage_newWithData(stridedata, 2);

    THFloatTensor_setStorage(output, (THFloatStorage*)outputstorage, 0LL, size, stride);
    result = true;
}


But still no improvement, no garbage is collected at all! :(

Francisco Vitor Suzano Massa

Jan 19, 2016, 6:33:31 AM
to torch7
In this case you need to delete the original buffer after the memcpy, right?

Francisco Vitor Suzano Massa

Jan 19, 2016, 6:39:27 AM
to torch7
But what I had in mind was doing something like:

double* storage = (double*)malloc(tlen * sizeof(double));

Heini Korn

Jan 19, 2016, 7:15:57 AM
to torch7
Finally!

Yes, this was the answer; there is no need to deallocate it manually, though.

Thanks a lot!

Heini Korn

Jan 19, 2016, 7:25:25 AM
to torch7
Oh no, I'm sorry, I didn't check the output.

So one of two things happens:

  1. Either I free/delete the buffer at the end, and the output is garbled but the memory is reclaimed, or
  2. I don't free/delete, and the output is correct but the memory keeps growing.

Francisco Vitor Suzano Massa

Jan 19, 2016, 8:33:45 AM
to torch7
Just to make sure I understand correctly: if, in your very first example, you just replace

double* storage = new double[tlen];

with

double* storage = (double*)malloc(tlen * sizeof(double));

it still doesn't deallocate the memory, right?

Heini Korn

Jan 19, 2016, 9:21:16 PM
to torch7
Correct, it does not change the behavior; the deallocation is still never performed.

Btw, I'm on Xubuntu 14.04, kernel 3.13.0-74.

soumith

Jan 20, 2016, 8:26:49 AM
to torch7 on behalf of Heini Korn
Hi Heini,

I'm late to the party, but you don't free the THStorage; that's why the memory is not getting deallocated.

THDoubleStorage* outputstorage = THDoubleStorage_newWithData(storage, tlen);
// This makes the refcount of outputstorage 1

THDoubleTensor_setStorage(output, outputstorage, 0LL, size, stride);
// This makes the refcount of outputstorage 2

// Next, you need to do
THDoubleStorage_free(outputstorage);
// so that its refcount is correctly back to 1 (so that when the tensor is deallocated, the storage is also deallocated)

Also, don't allocate the data with C++ new; use THAlloc or malloc, because new/delete are not always compatible with malloc/free.
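
Putting it all together, your function would look something like this (a quick sketch, untested; I also build the size/stride storages with THLongStorage_newWithSize2 so they can be freed safely instead of wrapping stack arrays):

bool extendframes(THDoubleTensor* input, int fext, THDoubleTensor* output) {
    long inputsize1d = input->size[0];
    long frameextdim = 2 * fext + 1;
    long tlen        = inputsize1d * frameextdim;

    double* inputdata = THDoubleTensor_data(input);
    // malloc-compatible allocation: TH releases storage data with free()
    double* storage = (double*)THAlloc(tlen * sizeof(double));
    if (!storage)
        return false;

    long k = 0;
    for (long i = 0; i < inputsize1d; ++i) {
        for (long j = -fext; j <= fext; ++j) {
            long tmpidx = (i + j < 0) ? 0 : (i + j);                          // clamp left edge
            tmpidx = (tmpidx > inputsize1d - 1) ? (inputsize1d - 1) : tmpidx; // clamp right edge
            storage[k++] = inputdata[tmpidx];
        }
    }

    THDoubleStorage* outputstorage = THDoubleStorage_newWithData(storage, tlen); // refcount 1
    THLongStorage* size   = THLongStorage_newWithSize2(inputsize1d, frameextdim);
    THLongStorage* stride = THLongStorage_newWithSize2(frameextdim, 1);

    THDoubleTensor_setStorage(output, outputstorage, 0LL, size, stride); // refcount 2

    THDoubleStorage_free(outputstorage); // back to 1: the output tensor now owns the data
    THLongStorage_free(size);            // setStorage copies the sizes/strides,
    THLongStorage_free(stride);          // so these temporaries can be released
    return true;
}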


Heini Korn

Jan 20, 2016, 8:56:21 PM
to torch7
Yes that's it!

Thank you very much Soumith!

