.th7 file format

Kevin Craft

Oct 31, 2013, 5:05:04 PM
to tor...@googlegroups.com
Several of the examples I am working with (e.g. https://github.com/clementfarabet/torch7-demos/blob/master/train-a-digit-classifier/dataset-mnist.lua) use .th7 files as data. I'm guessing these files contain lists of binary images. I have a directory with several thousand images in it and I would like to archive them into a .th7 file like the demos use. Is there a function to do this or do I need to write to the file manually in a specific format?

Thanks.

smth chntla

Oct 31, 2013, 5:06:52 PM
to tor...@googlegroups.com
use torch.save() and torch.load() :)
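
As a minimal sketch of that round trip (the random tensor and the file name are hypothetical stand-ins for real image data):

```lua
require 'torch'

-- Hypothetical stand-in for real image data: 10 RGB images of size 32x32.
local images = torch.rand(10, 3, 32, 32)

-- torch.save serializes a Torch object (tensor, table, model) to disk.
torch.save('images.th7', images)

-- torch.load reads it back with the same type and shape.
local loaded = torch.load('images.th7')
print(loaded:size())
```

The file extension is only a convention; torch.save does not depend on it.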

Kevin Craft

Oct 31, 2013, 5:16:01 PM
to tor...@googlegroups.com
Woohoo!

Thanks again. :)

Is there perhaps some better documentation somewhere that I am missing? I feel like I'm filling up this board with stupid questions.

smth chntla

Oct 31, 2013, 5:20:20 PM
to tor...@googlegroups.com
http://torch.ch/

Scroll down to the packages section; each package has its own documentation, which is pretty detailed.

Kevin Craft

Oct 31, 2013, 5:28:03 PM
to tor...@googlegroups.com
Thank you. :)

Jitendra Bansal

Oct 1, 2015, 8:29:08 AM
to torch7
Hi All,
I am in the same situation: I want to save thousands of images into a single .t7 file. When I use torch.save() and torch.load(), only one image is saved at a time.
How can I save all the images in one file?

I am using image.load to read the images and then trying to save the data to a file using torch.save(filename, imagedata).

Regards,
Jitendra

Francisco Vitor Suzano Massa

Oct 1, 2015, 8:37:44 AM
to torch7
The simplest thing you can do is store the images in a table and save the table:
images = {}
for i=1,10 do
  images[i] = image.load(...)
end
torch.save('myimages.t7',images)


That requires having all the images in memory, which might be prohibitive. Also, standard torch files are not compressed, so the file could become quite big. You could also compress the images in memory using image.compressJPEG before saving.
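
A rough sketch of the in-memory JPEG idea, assuming the image package is installed; the random tensors, the quality value of 90, and the file name are placeholders:

```lua
require 'torch'
require 'image'

local compressed = {}
for i = 1, 10 do
  local im = torch.rand(3, 48, 48)              -- stand-in for image.load(...)
  compressed[i] = image.compressJPEG(im, 90)    -- 1D ByteTensor of JPEG bytes
end
torch.save('myimages_jpeg.t7', compressed)

-- When reading, decode each entry back into an image tensor.
local stored = torch.load('myimages_jpeg.t7')
local first = image.decompressJPEG(stored[1])   -- lossy: close to, not equal to, the input
```

JPEG is lossy, so this trades some image fidelity for a smaller file.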

Another option is to save the images in a different format, such as hdf5 (https://github.com/deepmind/torch-hdf5), which lets you write images one at a time to the same file with compression, or LMDB (https://github.com/eladhoffer/lmdb.torch).

alban desmaison

Oct 1, 2015, 8:44:38 AM
to torch7
If your dataset is not too big (it fits in RAM), you can store everything as one single Tensor.

For example, if you have 100 images of size 3x48x48, you can do:

local all_data = torch.Tensor(100, 3, 48, 48)
for i=1,100 do
    local one_data = torch.load(data_paths[i])
    all_data[i]:copy(one_data)
end
torch.save(out_file_path, all_data)

Jitendra Bansal

Oct 1, 2015, 8:47:23 AM
to torch7
Hi Francisco,
Thanks for the suggestion.
Is there no function or method that appends image data to a file one image at a time, so that there is no need to hold the data in a temporary table?
I have 100,000+ images, each larger than 500x500, so storing them all in a table would take a lot of memory.

I am considering this because, when I train my model, loading the images takes a lot of time in every epoch, which makes training inefficient.

Is there a better approach for loading images faster, in batches, during training?

Regards,
Jitendra Bansal

Francisco Vitor Suzano Massa

Oct 1, 2015, 8:54:32 AM
to torch7
I'd advise you to try hdf5 or lmdb.
hdf5 lets you append images to the file, and it's also possible to use compression, which might make reading from disk faster.
Or you could load the images in several threads, as in https://github.com/soumith/imagenet-multiGPU.torch
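
For the hdf5 route, a small sketch based on my reading of the torch-hdf5 README; the dataset paths ('/images/00001' and so on), the file name, and the random tensors are made up for illustration, and compression options are left out:

```lua
require 'torch'
require 'hdf5'

-- Write each image as its own dataset inside one HDF5 file.
local out = hdf5.open('images.h5', 'w')
for i = 1, 10 do
  local im = torch.rand(3, 48, 48)               -- stand-in for image.load(...)
  out:write(string.format('/images/%05d', i), im)
end
out:close()

-- Read back a single image without loading the others.
local inp = hdf5.open('images.h5', 'r')
local third = inp:read('/images/00003'):all()
inp:close()
```

Only the requested dataset is read, so memory use stays at one image at a time.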

Francisco Vitor Suzano Massa

Oct 1, 2015, 9:00:39 AM
to torch7
But if you really want to use the standard torch save function, you can save the elements one by one by doing the following:

f = torch.DiskFile('images.t7','w')
for i=1,10 do
  im = image.load(...)
  im:write(f)
end
f:close()


But then you also need to read it back using torch.DiskFile.
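
A sketch of a matching write/read pair. Note that it uses writeObject and readObject rather than the im:write(f) call above, because readObject rebuilds the tensor without a preallocated target; the image count stored up front, the binary mode, and the random tensors are my own choices for illustration:

```lua
require 'torch'

local n = 10                          -- hypothetical number of images

-- Write: one serialized object per image, all in the same file.
local out = torch.DiskFile('images.t7', 'w')
out:binary()                          -- binary encoding instead of the ASCII default
out:writeInt(n)                       -- store the count so the reader knows when to stop
for i = 1, n do
  local im = torch.rand(3, 48, 48)    -- stand-in for image.load(...)
  out:writeObject(im)
end
out:close()

-- Read: objects come back in the order they were written.
local inp = torch.DiskFile('images.t7', 'r')
inp:binary()
local count = inp:readInt()
local read_back = 0
for i = 1, count do
  local im = inp:readObject()         -- only one image is in memory at a time
  read_back = read_back + 1
end
inp:close()
```

A file written this way is a sequence of objects, so torch.load would return only the leading count rather than the images.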

Jitendra Bansal

Oct 1, 2015, 9:08:51 AM
to torch7
Thanks for the suggestion.
Will it be slower than reading images using image.load()?

Francisco Vitor Suzano Massa

Oct 1, 2015, 9:19:52 AM
to torch7
I don't know, you'd need to try.

Another thing you could do is save batches of, say, 1000 images in compressed (or uncompressed) files and read them batch-wise. That would make training faster, but it would slightly bias the training distribution.
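
One way the batch-file idea could look, as a sketch only: the image count, the tiny 3x8x8 image size, and the shard file names are hypothetical. Shuffling once before writing and visiting the files in a new random order each epoch is an addition of mine that may soften the bias somewhat:

```lua
require 'torch'

local num_images, shard_size = 2500, 1000        -- hypothetical sizes
local num_shards = math.ceil(num_images / shard_size)

-- Shuffle once before sharding so each file holds a random mix of images.
local perm = torch.randperm(num_images)

for s = 1, num_shards do
  local first = (s - 1) * shard_size + 1
  local last = math.min(s * shard_size, num_images)
  local shard = torch.Tensor(last - first + 1, 3, 8, 8)
  for j = first, last do
    local idx = perm[j]                          -- index of the image to load
    -- stand-in for image.load(paths[idx])
    shard[j - first + 1]:copy(torch.rand(3, 8, 8))
  end
  torch.save(string.format('shard_%03d.t7', s), shard)
end

-- At training time, visit the shards in a fresh random order every epoch.
local order = torch.randperm(num_shards)
for k = 1, num_shards do
  local shard = torch.load(string.format('shard_%03d.t7', order[k]))
  -- iterate over mini-batches of shard here
end
```

Images that share a file are still always seen close together, which is the bias mentioned above.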

Sowjanya Boddeti

Nov 6, 2015, 4:22:21 AM
to torch7
This thread helped me a lot! But if I have labels to save along with the images in a .t7 file, what can be done? Maybe this is a stupid question, but please help. By the way, the labels are in an Excel file.
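
One possible approach, sketched under several assumptions: the spreadsheet is first exported to CSV with one "filename,label" row per image, labels are integers, and every image is 3x48x48. The function names, the column layout, and the data/labels field names are all hypothetical. Since torch.save accepts a table, images and labels can go into the same file:

```lua
-- Parse "filename,label" rows (a CSV export of the spreadsheet) into two
-- parallel arrays. Rows that do not match, such as a header, are skipped.
local function parse_labels(csv_text)
  local names, labels = {}, {}
  for line in csv_text:gmatch("[^\r\n]+") do
    local name, label = line:match("^%s*([^,]+)%s*,%s*(%-?%d+)%s*$")
    if name then
      names[#names + 1] = name
      labels[#labels + 1] = tonumber(label)
    end
  end
  return names, labels
end

-- Pack images and labels into one table and save it. Needs Torch and the
-- image package; assumes every image is 3x48x48 (hypothetical size).
local function save_dataset(names, labels, out_path)
  require 'torch'
  require 'image'
  local data = torch.Tensor(#names, 3, 48, 48)
  for i, name in ipairs(names) do
    data[i]:copy(image.load(name))
  end
  torch.save(out_path, {data = data, labels = torch.Tensor(labels)})
end

-- Parsing a small inline example; real use would read the exported CSV file.
local names, labels = parse_labels("filename,label\nimg_001.png,7\nimg_002.png,2\n")
print(#names, labels[1], labels[2])
```

After torch.load, the images would be available as the data field and the labels as the labels field of the loaded table.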