Reading, parsing and storing .txt files contents in Torch tensors efficiently


Amir

Oct 19, 2016, 10:58:03 AM
to torch7

I have a huge number of .txt files (maybe around 10 million), each with the same number of rows/columns. They are actually single-channel images, with the pixel values separated by spaces. Here's the code I've written to do this, but it's very slow. I wonder if someone can suggest a more optimized/efficient way of doing it:



    require 'torch'    -- provides torch.Tensor and the string split() helper used below

    -- Read one 64x64 text image into a 1x64x64 tensor, one value at a time.
    local f = assert(io.open(txtFilePath, 'r'))
    local tempTensor = torch.Tensor(1, 64, 64):fill(0)
    local i = 1
    for line in f:lines() do
        local l = line:split(' ')                    -- split the row on spaces
        for key, val in ipairs(l) do
            tempTensor[{1, i, key}] = tonumber(val)  -- pixel at row i, column key
        end
        i = i + 1
    end
    f:close()

Simon Niklaus

Oct 19, 2016, 5:54:23 PM
to torch7
Accessing elements like this is not very fast, see: https://github.com/torch/torch7/issues/474#issuecomment-159891806
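
For example, something along these lines (an untested sketch, reusing the variable names from your snippet) keeps the line-by-line reading but writes through the tensor's underlying storage with a single running index, instead of going through tempTensor[{1, i, key}] for every pixel:

    local f = assert(io.open(txtFilePath, 'r'))
    local tempTensor = torch.Tensor(1, 64, 64)   -- assumes the file supplies all 64*64 values
    local storage = tempTensor:storage()         -- contiguous, row-major, 1-based
    local idx = 1
    for line in f:lines() do
        for val in line:gmatch('%S+') do         -- iterate tokens without building a table
            storage[idx] = tonumber(val)
            idx = idx + 1
        end
    end
    f:close()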

The string operations might also be a bottleneck. Splitting large strings is not necessarily fast, and neither is the tonumber() function. You could iterate over the string character by character and do the parsing yourself to speed things up further.
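
Another option (again just a rough, untested sketch; loadTxtImage is a made-up name) is to avoid per-line work entirely: slurp the whole file with one read('*a'), pull the numbers out in a single gmatch pass, and build the tensor from a flat Lua table with one constructor call:

    -- Assumes each file holds exactly 64*64 whitespace-separated values.
    local function loadTxtImage(txtFilePath)
        local f = assert(io.open(txtFilePath, 'r'))
        local contents = f:read('*a')            -- whole file as one string
        f:close()

        local values = {}
        for token in contents:gmatch('%S+') do   -- one pass over the string
            values[#values + 1] = tonumber(token)
        end
        assert(#values == 64 * 64, 'unexpected pixel count in ' .. txtFilePath)

        -- torch.Tensor(table) copies the flat table into the tensor in one go.
        return torch.Tensor(values):view(1, 64, 64)
    end

And with 10 million files, I/O and parsing will dominate no matter what, so if you read the same files more than once it is probably worth converting each one to torch's binary format once with torch.save() and then loading that with torch.load() on later passes.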

- Simon