Training the network from scratch on GTX 660 Ti


Alexey Abramov

Feb 26, 2016, 8:46:39 AM
to Caffe Users

Hello everyone,


I'm trying to train AlexNet from scratch (no transfer learning) on my own data on a GeForce GTX 660 Ti, but the training does not seem to make any progress. The output looks as follows:


I0219 11:42:25.889590 10926 solver.cpp:338] Iteration 0, Testing net (#0)
I0219 11:43:53.226636 10926 solver.cpp:406] Test net output #0: accuracy = 0.333406
I0219 11:43:53.226735 10926 solver.cpp:406] Test net output #1: loss = 1.23308 (* 1 = 1.23308 loss)
I0219 11:43:53.400267 10926 solver.cpp:229] Iteration 0, loss = 1.56923
I0219 11:43:53.400305 10926 solver.cpp:245] Train net output #0: loss = 1.56923 (* 1 = 1.56923 loss)
I0219 11:43:53.400331 10926 sgd_solver.cpp:106] Iteration 0, lr = 0.017
I0219 11:43:59.131695 10926 solver.cpp:229] Iteration 20, loss = 1.05981
I0219 11:43:59.131750 10926 solver.cpp:245] Train net output #0: loss = 1.0598 (* 1 = 1.0598 loss)
I0219 11:43:59.131763 10926 sgd_solver.cpp:106] Iteration 20, lr = 0.017
I0219 11:44:04.849448 10926 solver.cpp:229] Iteration 40, loss = 1.1099
I0219 11:44:04.849504 10926 solver.cpp:245] Train net output #0: loss = 1.1099 (* 1 = 1.1099 loss)
I0219 11:44:04.849516 10926 sgd_solver.cpp:106] Iteration 40, lr = 0.017
I0219 11:44:10.567837 10926 solver.cpp:229] Iteration 60, loss = 1.02336
I0219 11:44:10.567888 10926 solver.cpp:245] Train net output #0: loss = 1.02336 (* 1 = 1.02336 loss)
I0219 11:44:10.567900 10926 sgd_solver.cpp:106] Iteration 60, lr = 0.017
I0219 11:44:16.299978 10926 solver.cpp:229] Iteration 80, loss = 1.17229
I0219 11:44:16.300027 10926 solver.cpp:245] Train net output #0: loss = 1.17228 (* 1 = 1.17228 loss)
I0219 11:44:16.300040 10926 sgd_solver.cpp:106] Iteration 80, lr = 0.017
I0219 11:44:22.036911 10926 solver.cpp:229] Iteration 100, loss = 1.13178
I0219 11:44:22.036970 10926 solver.cpp:245] Train net output #0: loss = 1.13177 (* 1 = 1.13177 loss)
I0219 11:44:22.036983 10926 sgd_solver.cpp:106] Iteration 100, lr = 0.017
I0219 11:44:27.781303 10926 solver.cpp:229] Iteration 120, loss = 1.09592
I0219 11:44:27.781484 10926 solver.cpp:245] Train net output #0: loss = 1.09592 (* 1 = 1.09592 loss)
I0219 11:44:27.781505 10926 sgd_solver.cpp:106] Iteration 120, lr = 0.017
I0219 11:44:33.524081 10926 solver.cpp:229] Iteration 140, loss = 1.09416
I0219 11:44:33.524142 10926 solver.cpp:245] Train net output #0: loss = 1.09416 (* 1 = 1.09416 loss)


The loss does not go down. I presume that the small batch_size causes this problem. I'm running training and validation with batch_size = 32, stepsize = 100000*8, max_iter = 450000*8 and base_lr = base_lr*sqrt(8), since the batch size is decreased by a factor of 8 in my case. Is there any way to get the training running on this hardware (perhaps with some other adjustments), or should I give up and go straight for a GTX Titan X?
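
For concreteness, here is a minimal sketch of the scaling arithmetic in Python, just to document the calculation. The reference values are assumptions taken from the BVLC AlexNet solver and may not match my actual starting point; also, the usual heuristics scale the learning rate down, not up, when the batch shrinks, so I may have the direction wrong:

# Sketch of the hyperparameter scaling for batch_size 256 -> 32 (factor 8).
# The reference values below are the BVLC AlexNet defaults (assumptions on my side).
import math

scale = 256 // 32                            # batch size shrinks by a factor of 8

ref_base_lr  = 0.01
ref_stepsize = 100000
ref_max_iter = 450000

stepsize = ref_stepsize * scale              # 800000: same images seen per lr step
max_iter = ref_max_iter * scale              # 3600000: same images seen overall

lr_up     = ref_base_lr * math.sqrt(scale)   # base_lr * sqrt(8), as described above: ~0.028
lr_sqrt   = ref_base_lr / math.sqrt(scale)   # sqrt scaling rule: ~0.0035
lr_linear = ref_base_lr / scale              # linear scaling rule: 0.00125

print(stepsize, max_iter, lr_up, lr_sqrt, lr_linear)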


Many thanks in advance!


Best regards,
Alexey

Joshua Slocum

Feb 26, 2016, 11:58:23 AM
to Alexey Abramov, Caffe Users
Your loss has already dropped (1.09 < 1.23), so it looks to me like it is going down. Keep in mind that with *stochastic* gradient descent, your loss will not decrease monotonically. Also keep in mind that you've only run 140 out of the ~3,600,000 iterations you've configured.
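
If you want to see the trend through the noise, one thing you can do is smooth the logged loss. A minimal sketch in plain Python (nothing built into Caffe; the log file path is just a placeholder for wherever you captured the training output):

import re
from collections import deque

# Parse lines like "Iteration 140, loss = 1.09416" from the captured log.
pattern = re.compile(r"Iteration (\d+), loss = ([0-9.]+)")
losses = []
with open("caffe_train.log") as f:           # placeholder path
    for line in f:
        m = pattern.search(line)
        if m:
            losses.append((int(m.group(1)), float(m.group(2))))

# Print a moving average over the last 20 reported losses to expose the trend.
window = deque(maxlen=20)
for iteration, loss in losses:
    window.append(loss)
    print(iteration, sum(window) / len(window))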

