How long does it take to run the tutorial with the SVHN dataset?

Aziza Satkhozhina

Jul 24, 2014, 7:07:28 PM
to tor...@googlegroups.com
I installed Torch and am running the example from this tutorial: http://code.cogbits.com/wiki/doku.php?id=tutorial_sup

I use -size small, which means training uses 10,000 examples. I didn't alter the code at all, but it gets stuck at training: it stays at epoch 1, 1/10000. I am using my personal laptop, which has only 2 cores at 2.6 GHz. It's been several hours and there is no progress. Is this normal? Does it really take a long time to run this tutorial?

If not, what should I check?

Thank you in advance, I would appreciate any help.

-Aziza

Fei Shen

Aug 2, 2014, 6:10:49 AM
to tor...@googlegroups.com
I got this problem a few times with my own training code. It worked after restarting the computer. I suspect it's caused by OpenMP. You can try limiting OpenMP to a single thread with torch.setnumthreads(1).
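
A minimal sketch of that suggestion, assuming Torch7 is installed and the call is placed near the top of the training script, before any heavy computation starts. The pcall guard is an addition for illustration only, so the snippet also runs where Torch is missing; it is not part of the tutorial code.

```lua
-- Try to load Torch; pcall keeps the script from aborting if it is absent.
local ok, torch = pcall(require, 'torch')

if ok then
   -- Limit Torch's OpenMP-backed operations to a single thread.
   torch.setnumthreads(1)
   print('Torch threads: ' .. torch.getnumthreads())
else
   print('Torch is not available in this interpreter')
end
```

If the hang disappears with one thread, that points toward a multithreading problem rather than the data or the model.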

On Friday, July 25, 2014 at 7:07:28 AM UTC+8, Aziza Satkhozhina wrote:

Robert Stahr

Dec 18, 2014, 5:11:26 AM
to tor...@googlegroups.com
Hi Aziza,

I know your post is several months old, but since I'm currently having the same problem, I'm re-activating the thread.

Did you find a solution to your problem?

My situation is the following:

I have a virtual machine and a physical machine. Both run Kubuntu 14.04 and both were installed specifically for doing deep learning with torch, so only Torch and its dependencies are installed on them.

On the virtual machine, the tutorial runs without problems and I can definitely tell you that on your laptop, there should have been tons of progress during the time you waited.

On my physical machine, the tutorial (torch-tutorials/2_supervised run with "th -i doall.lua -size small") runs part of the first epoch and then freezes. When I stop it with Ctrl-C and start it again, it freezes (like yours) at epoch 1, 1/10000. It will continue to freeze at epoch 1, 1/10000 until I reboot. Then it'll run part of the first epoch and freeze again. And so on.

I have tried to re-download the data files, but this didn't solve the problem.

I'm currently debugging to find out what is happening, but if somebody has an idea, it would be great.

Thanks in advance for any help.

Cheers,
Robert Stahr

Robert Stahr

Dec 18, 2014, 5:54:38 AM
to tor...@googlegroups.com
Hi again,

I found something in another thread (Xubuntu results and hangs) that sounded promising, and it seems to work. Apparently there are well-known bugs around OpenBLAS and OpenMP on Ubuntu. So a solution is to ask OpenMP to use only one thread. You do this with the following command on your Linux command line:

export OMP_NUM_THREADS=1

This works for me. When I run the doall.lua thing, training merrily runs for epochs and epochs. 
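
Note that export only affects the current shell session and the programs started from it, so the variable has to be set in the same terminal that launches th. A small sketch for checking, from inside Lua, whether the variable actually reached the process; the helper name describeOmpThreads is hypothetical and used here only for illustration.

```lua
-- Hypothetical helper: turn the raw environment value into a readable message.
local function describeOmpThreads(value)
   if value == nil then
      return 'OMP_NUM_THREADS is not set; OpenMP will choose the thread count'
   end
   return 'OMP_NUM_THREADS=' .. value
end

-- os.getenv returns nil when the variable is not defined for this process.
print(describeOmpThreads(os.getenv('OMP_NUM_THREADS')))
```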

Cheers,
Robert Stahr

Toru Hironaka

Mar 3, 2015, 8:52:12 PM
to tor...@googlegroups.com
Hi.

I had the same problem and could fix it with what you wrote here. However, there is another problem: doall.lua runs forever. Are the last 4 lines of doall.lua an infinite loop? I set -max_iter 2 and -size small, but it seems to run forever.

while true do
   train()
   test()
end

soumith

Mar 3, 2015, 9:23:29 PM
to torch7 on behalf of Toru Hironaka
You can Ctrl + C the script and it will end. Or you can replace the while statement with a limited number of epochs.
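
A minimal sketch of the second option, assuming a fixed epoch count is acceptable. The name maxEpochs and its value are hypothetical, and the train() and test() bodies below are empty stand-ins so the snippet runs on its own; in doall.lua the tutorial's own train() and test() would be called instead.

```lua
-- Stand-ins for the tutorial's train() and test() (assumption: the real
-- functions are already defined by the scripts that doall.lua loads).
local epochsRun = 0
local function train() epochsRun = epochsRun + 1 end
local function test() end

-- Hypothetical limit; pick whatever number of epochs you want to run.
local maxEpochs = 5

-- Bounded replacement for "while true do ... end".
for epoch = 1, maxEpochs do
   train()
   test()
end

print('epochs run: ' .. epochsRun)
```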

Toru Hironaka

Mar 4, 2015, 1:40:11 PM
to tor...@googlegroups.com
Thanks, I thought so.




