While trying to train a few models, I discovered the symptoms of improper data labelling: the loss changes, but the accuracy barely moves. Here's the output from training one model:
```
I1130 13:40:19.760501 2041447168 solver.cpp:160] Solving CaffeNet
I1130 13:40:19.760577 2041447168 solver.cpp:247] Iteration 0, Testing net (#0)
I1130 14:18:50.135545 2041447168 solver.cpp:298] Test net output #0: accuracy = 0.343721
I1130 14:18:50.135606 2041447168 solver.cpp:298] Test net output #1: loss = 0.831318 (* 1 = 0.831318 loss)
...
I1130 19:30:21.624950 2041447168 solver.cpp:247] Iteration 2000, Testing net (#0)
I1130 20:08:46.390735 2041447168 solver.cpp:298] Test net output #0: accuracy = 0.343721
I1130 20:08:46.390794 2041447168 solver.cpp:298] Test net output #1: loss = 3.80082 (* 1 = 3.80082 loss)
```
The predicted scores are also hugely skewed. I use only two labels (0 and 1), yet with the BVLC reference CaffeNet the 1 label almost always scores about 0.999, and with the NIN model the 0 label's score is always 0. The same symptoms occur whether I train from scratch or finetune, and the problem persists if I use an IMAGE_DATA layer instead of a DATA layer backed by LMDB.
Yet my train.txt and test.txt appear to be formatted correctly:
Here's my train.txt:

```
01eggs.533span_0.jpg 1
01eggs.533span_1.jpg 1
01eggs.533span_10.jpg 1
01eggs.533span_11.jpg 1
01eggs.533span_12.jpg 1
...
n02093056_2211045646_a3df4790b8.jpg 0
n02093056_298841044_552ffd4061.jpg 0
n02093428_1851150959_e32c79c88a.jpg 0
n02093754_161646818_c922da8140.jpg 0
```

and my test.txt:

```
3417053458_e45d068b20_0.jpg 1
3422082793_72bdb5b2a2_0.jpg 1
3422082793_72bdb5b2a2_1.jpg 1
3422082793_72bdb5b2a2_2.jpg 1
3423114055_c5294b1832_0.jpg 1
3423114055_c5294b1832_1.jpg 1
3423114055_c5294b1832_2.jpg 1
...
n02251067_5119056.jpg 0
n02251067_k10880-1i.jpg 0
n02251233_5119056.jpg 0
n02251593_mealybug-1.jpg 0
```

I make sure that the data is shuffled during LMDB creation. When using an IMAGE_DATA layer instead of a DATA layer I used absolute filenames, but I still got the same results. This train.txt looks like the same format other users use. Maybe there's something wrong with whitespace characters?

Here's the code I use to generate train.txt and test.txt:

```python
from os import listdir
from os.path import join

# stage is 'test' or 'train'
positive_dir = join(wnid_dir, 'images', FLAGS.dataset, stage + '-positive')
negative_dir = join(wnid_dir, 'images', FLAGS.dataset, stage + '-negative')
with open(join(wnid_dir, 'images', FLAGS.dataset, stage + '.txt'), 'w') as f:
    for name in listdir(positive_dir):
        f.write(name + ' 1\n')
    for name in listdir(negative_dir):
        f.write(name + ' 0\n')
```

To explain this code: I store images in directories named train-positive, test-positive, train-negative, and test-negative, and then symlink the images in train-* into train/ and those in test-* into test/. I've attached the script I use to generate the LMDB database from train.txt, test.txt, train/, and test/.

What else can help diagnose the problem?