Lua FFI Memory Leak


Heini Korn

Jan 18, 2016, 11:19:11 PM
to torch7
Hey there,

so I'm working with speech and wrote a simple script to do frame extension for a given context window. I wrote it in native Lua, but it was far too slow, so I switched to the FFI.
Using the FFI it is ~100 times faster, but somehow a memory leak occurs.

Basically, what I do is concatenate the left and right context frames of a given frame (e.g. 5 left + 5 right + the current one = 11 frames).
I based my code on this answer.
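
In other words, every frame is replaced by its clamped context window. Here is a minimal standalone sketch (plain C++, no Torch, just to illustrate the intended output):

#include <cstdio>

int main() {
    const int n = 3, fext = 1;
    double frames[n] = { 1.0, 2.0, 3.0 };
    // every frame i becomes the window [i-fext, i+fext],
    // with out-of-range indices clamped to the first/last frame
    for (int i = 0; i < n; ++i) {
        for (int j = -fext; j <= fext; ++j) {
            int idx = i + j;
            if (idx < 0) idx = 0;           // repeat the first frame at the left edge
            if (idx > n - 1) idx = n - 1;   // repeat the last frame at the right edge
            std::printf("%g ", frames[idx]);
        }
        std::printf("\n");
    }
    return 0;
}

This prints the rows 1 1 2 / 1 2 3 / 2 3 3.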

Here is my code for the C++ part (I used C++11):

bool extendframes(THDoubleTensor* input, int fext, THDoubleTensor* output) {

    auto result = false;

    auto inputsize1d = input->size[0];
    auto frameextdim = 2 * fext + 1;
    auto tlen        = inputsize1d * frameextdim;

    double* inputdata = THDoubleTensor_data(input);
    double* storage   = new double[tlen];
    if (storage) {
        auto k = 0u;
        for (auto i = 0; i < inputsize1d; ++i) {
            for (auto j = -fext; j <= fext; j++) {
                // If it's smaller than 0, take the 0th element
                auto tmpidx = i + j > 0 ? (i + j) : 0;
                // Check if the extended index is in range; if it is, use the extended index,
                // otherwise use the last sample
                tmpidx = (tmpidx > inputsize1d - 1) ? inputsize1d - 1 : tmpidx;
                storage[k++] = inputdata[tmpidx];
            }
        }
    }

    THDoubleStorage* outputstorage = THDoubleStorage_newWithData(storage, tlen);
    if (outputstorage) {

        long sizedata[2]   = { inputsize1d, frameextdim };
        long stridedata[2] = { frameextdim, 1 };

        THLongStorage* size   = THLongStorage_newWithData(sizedata, 2);
        THLongStorage* stride = THLongStorage_newWithData(stridedata, 2);

        THDoubleTensor_setStorage(output, outputstorage, 0LL, size, stride);
        result = true;
    }
    return result;
}

extern "C" {
    bool extendframes_d(THDoubleTensor* input,int fext,THDoubleTensor* output){ return extendframes(input,fext,output);}
}


Moreover, the Lua side then looks like this:

require 'torch'
require 'xlua'
local ffi = require 'ffi'
package.cpath = package.cpath .. ";./extendframes/?.so"

ffi.cdef[[
    bool extendframes_d(THDoubleTensor* input,int fext,THDoubleTensor* output);
]]

local cflua = ffi.load(package.searchpath('libextendframes', package.cpath))

function extendframes(input,fext)
    if input:type() ~= 'torch.FloatTensor' and input:type() ~= 'torch.DoubleTensor' then
        xlua.error("Error, currently only torch.FloatTensors or torch.DoubleTensors are supported!")
    end
    local out = nil
    if input:type() == 'torch.DoubleTensor' then
        out = torch.DoubleTensor()
        cflua.extendframes_d(input:contiguous():cdata(),fext,out:cdata())
    end
    return out
end

Now if I test the script with a simple loop to check whether the tensors are correctly garbage collected, I can observe that collectgarbage('count') does not significantly increase, but my machine's memory does. (That counter only tracks the Lua heap, so the C-side allocation would not show up in it anyway.)

while true do
    local tic = torch.tic()
    local extended = extendframes(torch.Tensor(1000000):fill(5),5)
    collectgarbage()
    print(collectgarbage('count'),torch.toc(tic))
end

Does anybody know why there is a memory leak here? How can I avoid it?


Francisco Vitor Suzano Massa

Jan 19, 2016, 2:02:32 AM
to torch7
Are you on Linux? If yes, then this issue might be relevant to you:
https://github.com/torch/torch7/issues/229

Heini Korn

Jan 19, 2016, 3:16:07 AM
to torch7
Hey there, thanks for the suggestion. Yes, I am on Linux, but unfortunately I do not experience the same problem as in issue 229.

Running the code from issue 229 works flawlessly; however, if I modify it to use my example, I still get the memory leak.


local function foo()
    for i=1,1000 do
        local a = extendframes(torch.randn(1000),5)
    end
end

local function foo2()
    foo()
    collectgarbage(); collectgarbage()
    os.execute('ps ax -o rss,user,command | grep luajit | sort -nr')
end


while true do
    -- local tic = torch.tic()
    foo2()
    -- local extended = extf(torch.randn(100000),5)
    -- collectgarbage()
    -- print(extended)
    -- print(collectgarbage('count'),torch.toc(tic))
end

Unfortunately, even when using jemalloc it does not free any memory; I get a log like this for my used memory (RSS in KB):

99832
185848
280152
371916
459676
548520
636800
724244
811304
898152
984868
1071580
1158248
1244792
1331340
1417884

It does increase steadily until the computer crashes.

Francisco Vitor Suzano Massa

Jan 19, 2016, 5:00:22 AM
to torch7
Just a guess here, but maybe you should try allocating the memory using THAlloc (https://github.com/torch/torch7/blob/master/lib/TH/THGeneral.c#L204-L225) instead of new?
Memory deallocation is done using free, and it's probably not a good idea to mix new with free.
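
I.e., replacing the allocation line with something like this (untested sketch):

// before: double* storage = new double[tlen];
double* storage = (double*)THAlloc(tlen * sizeof(double)); // malloc-compatible; release with THFree/free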

Heini Korn

Jan 19, 2016, 5:58:08 AM
to torch7
Thanks for the suggestion, I modified it to:

void* outputstorage = THAlloc(sizeof(storage));
memcpy(outputstorage, storage, sizeof(storage));
if (outputstorage) {
    long sizedata[2]   = { inputsize1d, frameextdim };
    long stridedata[2] = { frameextdim, 1 };

    THLongStorage* size   = THLongStorage_newWithData(sizedata, 2);
    THLongStorage* stride = THLongStorage_newWithData(stridedata, 2);

    THFloatTensor_setStorage(output, (THFloatStorage*)outputstorage, 0LL, size, stride);
    result = true;
}


But still no improvement, no garbage is collected at all! :(

Francisco Vitor Suzano Massa

Jan 19, 2016, 6:33:31 AM
to torch7
In this case you need to delete the original buffer after the memcpy, right?

Francisco Vitor Suzano Massa

Jan 19, 2016, 6:39:27 AM
to torch7
But what I had in mind was doing something like:

double* storage = (double*)malloc(tlen * sizeof(double));

Heini Korn

Jan 19, 2016, 7:15:57 AM
to torch7
Finally!

Yes, this was the answer; there is no need to deallocate it manually, though.

Thanks a lot!

Heini Korn

Jan 19, 2016, 7:25:25 AM
to torch7
Oh no, I'm sorry, I didn't check the output.

So one of two things happens:

  1. Either I free/delete the buffer at the end, and the output is garbled but the memory is reclaimed, or
  2. I don't free/delete, and the output is correct but the memory keeps growing.

Francisco Vitor Suzano Massa

Jan 19, 2016, 8:33:45 AM
to torch7
Just to make sure I understand correctly: if, in your very first example, you just replace

double* storage = new double[tlen];

with

double* storage = (double*)malloc(tlen * sizeof(double));

it still doesn't deallocate the memory, right?

Heini Korn

Jan 19, 2016, 9:21:16 PM
to torch7
Correct, it does not change the behavior; the deallocation is still never performed.

Btw, I'm on Xubuntu 14.04, kernel 3.13.0-74.

soumith

Jan 20, 2016, 8:26:49 AM
to torch7 on behalf of Heini Korn
Hi Heini,

I'm late to the party, but you don't free the THStorage; that's why the memory is not getting deallocated.

THDoubleStorage* outputstorage = THDoubleStorage_newWithData(storage, tlen);
// This makes the refcount of outputstorage 1

THDoubleTensor_setStorage(output, outputstorage, 0LL, size, stride);
// This makes the refcount of outputstorage 2

// Next, you need to do
THDoubleStorage_free(outputstorage);
// so that its refcount is correctly back to 1 (so that when the tensor is deallocated, the storage is also deallocated)

Also, don't allocate the data with C++ new; use THAlloc or malloc, because new/delete are not always compatible with malloc/free.
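
Putting it all together, your function would look something like this (a quick sketch, untested; I also build the size/stride storages with THLongStorage_newWithSize2 so they can be freed safely instead of wrapping stack arrays):

bool extendframes(THDoubleTensor* input, int fext, THDoubleTensor* output) {
    long inputsize1d = input->size[0];
    long frameextdim = 2 * fext + 1;
    long tlen        = inputsize1d * frameextdim;

    double* inputdata = THDoubleTensor_data(input);
    // malloc-compatible allocation: TH releases storage data with free()
    double* storage = (double*)THAlloc(tlen * sizeof(double));
    if (!storage)
        return false;

    long k = 0;
    for (long i = 0; i < inputsize1d; ++i) {
        for (long j = -fext; j <= fext; ++j) {
            long tmpidx = (i + j < 0) ? 0 : (i + j);                          // clamp left edge
            tmpidx = (tmpidx > inputsize1d - 1) ? (inputsize1d - 1) : tmpidx; // clamp right edge
            storage[k++] = inputdata[tmpidx];
        }
    }

    THDoubleStorage* outputstorage = THDoubleStorage_newWithData(storage, tlen); // refcount 1
    THLongStorage* size   = THLongStorage_newWithSize2(inputsize1d, frameextdim);
    THLongStorage* stride = THLongStorage_newWithSize2(frameextdim, 1);

    THDoubleTensor_setStorage(output, outputstorage, 0LL, size, stride); // refcount 2

    THDoubleStorage_free(outputstorage); // back to 1: the output tensor now owns the data
    THLongStorage_free(size);            // setStorage copies the sizes/strides,
    THLongStorage_free(stride);          // so these temporaries can be released
    return true;
}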


Heini Korn

Jan 20, 2016, 8:56:21 PM
to torch7
Yes that's it!

Thank you very much Soumith!

