Hi,
While I was training a customized VGG16 network in Caffe, the process got killed partway through with no additional information. Here is part of the training log.
I1012 16:02:20.142881 16357 solver.cpp:218] Iteration 600 (1.45809 iter/s, 68.5831s/100 iters), loss = 0.307288
I1012 16:02:20.142946 16357 solver.cpp:237] Train net output #0: loss = 0.188233 (* 1 = 0.188233 loss)
I1012 16:02:20.142953 16357 sgd_solver.cpp:105] Iteration 600, lr = 0.001
I1012 16:03:28.003197 16357 solver.cpp:218] Iteration 700 (1.47367 iter/s, 67.8577s/100 iters), loss = 0.495435
I1012 16:03:28.003325 16357 solver.cpp:237] Train net output #0: loss = 0.215491 (* 1 = 0.215491 loss)
I1012 16:03:28.003334 16357 sgd_solver.cpp:105] Iteration 700, lr = 0.001
I1012 16:04:35.836854 16357 solver.cpp:218] Iteration 800 (1.47425 iter/s, 67.831s/100 iters), loss = 0.47339
I1012 16:04:35.836949 16357 solver.cpp:237] Train net output #0: loss = 0.00221339 (* 1 = 0.00221339 loss)
I1012 16:04:35.836957 16357 sgd_solver.cpp:105] Iteration 800, lr = 0.001
I1012 16:06:31.388617 16357 solver.cpp:218] Iteration 900 (0.865515 iter/s, 115.538s/100 iters), loss = 0.425094
I1012 16:06:31.449453 16357 solver.cpp:237] Train net output #0: loss = 0.21385 (* 1 = 0.21385 loss)
I1012 16:06:31.472916 16357 sgd_solver.cpp:105] Iteration 900, lr = 0.001
./examples/weighted_bilinear/ft_last_layer3.sh: line 9: 16357 Killed
The file ft_last_layer3.sh is given below.
#!/bin/bash
# first fine tune the last layer only
GLOG_logtostderr=0 GLOG_log_dir=/home/qy/documents/caffe/examples/weighted_bilinear/log/ \
./build/tools/caffe train \
-model "examples/weighted_bilinear/ft_last_layer3.prototxt" \
-solver "examples/weighted_bilinear/ft_last_layer3.solver" \
-weights "/home/qy/documents/CaffeModel/VGG_ILSVRC_16_layers.caffemodel" \
-gpu 0
Line 9 just contains one statement, "-gpu 0".
I was also monitoring the GPU state, as shown below. GPU memory seems to be sufficient (5120MiB of 8112MiB in use).
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.66 Driver Version: 375.66 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1080 Off | 0000:01:00.0 On | N/A |
| 52% 79C P2 127W / 180W | 5120MiB / 8112MiB | 96% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1127 G /usr/lib/xorg/Xorg 240MiB |
| 0 2049 G compiz 152MiB |
| 0 2420 G ...el-token=4D887FF09714CDAAFA04F7E91E9C165A 54MiB |
| 0 15970 G /usr/lib/firefox/firefox 2MiB |
| 0 16357 C ./build/tools/caffe 4664MiB |
+-----------------------------------------------------------------------------+
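One thing nvidia-smi does not show is host (CPU) RAM. A bare "Killed" from bash means the process received SIGKILL, and on Linux that is often, though not always, the kernel's out-of-memory killer reacting to host memory rather than GPU memory. The slowdown at iteration 900 would be consistent with swapping, but that is only a guess. Below is a minimal sketch for logging host memory next to the training run, assuming Linux with /proc mounted. The names field_kb and watch_host_mem are hypothetical helpers, not part of Caffe.

```shell
#!/bin/bash
# Hypothetical helpers (not part of Caffe) for watching host RAM on Linux.

# field_kb NAME FILE
# Print the numeric kB value from a "NAME:   12345 kB" line, the format used
# by /proc/meminfo and /proc/<pid>/status. Prints nothing if NAME is absent.
field_kb() {
    awk -v key="$1:" '$1 == key {print $2}' "$2"
}

# watch_host_mem PID [INTERVAL_SECONDS]
# While PID is alive, periodically print its resident set size (VmRSS) and
# the system-wide MemAvailable figure, both in kB.
watch_host_mem() {
    local pid="$1" interval="${2:-10}"
    while kill -0 "$pid" 2>/dev/null; do
        echo "$(date +%T) rss_kb=$(field_kb VmRSS "/proc/$pid/status")" \
             "avail_kb=$(field_kb MemAvailable /proc/meminfo)"
        sleep "$interval"
    done
}

# Example (run in a second terminal while training; 16357 was the PID above):
#   watch_host_mem 16357 10
```

If rss_kb keeps growing while avail_kb approaches zero just before the kill, host memory is the likely cause. In that case the kernel log (dmesg) would normally contain an out-of-memory message naming the killed process.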
So I am rather confused about this issue. Can anyone give me some advice?
Thanks.