Question: why does the test_dnn_training test fail?

Xiaohai Lee

Dec 3, 2016, 2:16:32 AM
to CMU-OpenFace
Hi,

I'm building my own OpenFace Docker image, based on CUDA 8.0 and Ubuntu 16.04:
https://hub.docker.com/r/nightseas/openface/

One test case fails when I run the unit test script. It looks like the test asserts that the mean training loss is below 0.3, and that assertion fails.
Could anyone tell me the likely reason, and how to locate and fix the issue?

Thanks and BR.

Xiaohai Li


Logs:

```
root:~# cd openface/
root:~/openface# ./run-tests.sh
tests.openface_api_tests.test_pipeline ... ok
tests.openface_batch_represent_tests.test_batch_represent ... ok
tests.openface_demo_tests.test_compare_demo ... ok
tests.openface_demo_tests.test_classification_demo_pretrained ... ok
tests.openface_demo_tests.test_classification_demo_pretrained_multi ... ok
tests.openface_demo_tests.test_classification_demo_training ... ok
tests.openface_neural_net_training_tests.test_dnn_training ... FAIL

======================================================================
FAIL: tests.openface_neural_net_training_tests.test_dnn_training
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/root/openface/tests/openface_neural_net_training_tests.py", line 82, in test_dnn_training
    assert np.mean(trainLoss) < 0.3
AssertionError:
-------------------- >> begin captured stdout << ---------------------
=== /root/openface/data/lfw-subset/raw/Anna_Kournikova/Anna_Kournikova_0006.jpg ===
=== /root/openface/data/lfw-subset/raw/Ann_Veneman/Ann_Veneman_0005.jpg ===
=== /root/openface/data/lfw-subset/raw/Adrien_Brody/Adrien_Brody_0005.jpg ===
=== /root/openface/data/lfw-subset/raw/Ann_Veneman/Ann_Veneman_0011.jpg ===
=== /root/openface/data/lfw-subset/raw/Adrien_Brody/Adrien_Brody_0011.jpg ===
=== /root/openface/data/lfw-subset/raw/Anna_Kournikova/Anna_Kournikova_0009.jpg ===
=== /root/openface/data/lfw-subset/raw/Ann_Veneman/Ann_Veneman_0003.jpg ===
=== /root/openface/data/lfw-subset/raw/Adrien_Brody/Adrien_Brody_0003.jpg ===
=== /root/openface/data/lfw-subset/raw/Adrien_Brody/Adrien_Brody_0010.jpg ===
=== /root/openface/data/lfw-subset/raw/Ann_Veneman/Ann_Veneman_0001.jpg ===
=== /root/openface/data/lfw-subset/raw/Anna_Kournikova/Anna_Kournikova_0003.jpg ===
=== /root/openface/data/lfw-subset/raw/Anna_Kournikova/Anna_Kournikova_0005.jpg ===
=== /root/openface/data/lfw-subset/raw/Ann_Veneman/Ann_Veneman_0010.jpg ===
=== /root/openface/data/lfw-subset/raw/Anna_Kournikova/Anna_Kournikova_0002.jpg ===
=== /root/openface/data/lfw-subset/raw/Ann_Veneman/Ann_Veneman_0007.jpg ===
=== /root/openface/data/lfw-subset/raw/Adrien_Brody/Adrien_Brody_0006.jpg ===
=== /root/openface/data/lfw-subset/raw/Anna_Kournikova/Anna_Kournikova_0012.jpg ===
=== /root/openface/data/lfw-subset/raw/Anna_Kournikova/Anna_Kournikova_0010.jpg ===
=== /root/openface/data/lfw-subset/raw/Adrien_Brody/Adrien_Brody_0007.jpg ===
=== /root/openface/data/lfw-subset/raw/Adrien_Brody/Adrien_Brody_0002.jpg ===
=== /root/openface/data/lfw-subset/raw/Anna_Kournikova/Anna_Kournikova_0011.jpg ===
=== /root/openface/data/lfw-subset/raw/Adrien_Brody/Adrien_Brody_0012.jpg ===
=== /root/openface/data/lfw-subset/raw/Adrien_Brody/Adrien_Brody_0004.jpg ===
=== /root/openface/data/lfw-subset/raw/Anna_Kournikova/Anna_Kournikova_0007.jpg ===
=== /root/openface/data/lfw-subset/raw/Ann_Veneman/Ann_Veneman_0006.jpg ===
=== /root/openface/data/lfw-subset/raw/Anna_Kournikova/Anna_Kournikova_0004.jpg ===
=== /root/openface/data/lfw-subset/raw/Adrien_Brody/Adrien_Brody_0008.jpg ===
=== /root/openface/data/lfw-subset/raw/Anna_Kournikova/Anna_Kournikova_0008.jpg ===
=== /root/openface/data/lfw-subset/raw/Anna_Kournikova/Anna_Kournikova_0001.jpg ===
=== /root/openface/data/lfw-subset/raw/Ann_Veneman/Ann_Veneman_0004.jpg ===
=== /root/openface/data/lfw-subset/raw/Adrien_Brody/Adrien_Brody_0009.jpg ===
=== /root/openface/data/lfw-subset/raw/Adrien_Brody/Adrien_Brody_0001.jpg ===
=== /root/openface/data/lfw-subset/raw/Ann_Veneman/Ann_Veneman_0002.jpg ===
=== /root/openface/data/lfw-subset/raw/Ann_Veneman/Ann_Veneman_0009.jpg ===
=== /root/openface/data/lfw-subset/raw/Ann_Veneman/Ann_Veneman_0008.jpg ===


=== /root/openface/data/lfw-subset/raw/Anna_Kournikova/Anna_Kournikova_0009.jpg ===
=== /root/openface/data/lfw-subset/raw/Ann_Veneman/Ann_Veneman_0002.jpg ===
=== /root/openface/data/lfw-subset/raw/Adrien_Brody/Adrien_Brody_0012.jpg ===
=== /root/openface/data/lfw-subset/raw/Anna_Kournikova/Anna_Kournikova_0010.jpg ===
=== /root/openface/data/lfw-subset/raw/Anna_Kournikova/Anna_Kournikova_0007.jpg ===
=== /root/openface/data/lfw-subset/raw/Adrien_Brody/Adrien_Brody_0010.jpg ===
=== /root/openface/data/lfw-subset/raw/Anna_Kournikova/Anna_Kournikova_0004.jpg ===
=== /root/openface/data/lfw-subset/raw/Ann_Veneman/Ann_Veneman_0005.jpg ===
=== /root/openface/data/lfw-subset/raw/Ann_Veneman/Ann_Veneman_0001.jpg ===
=== /root/openface/data/lfw-subset/raw/Adrien_Brody/Adrien_Brody_0011.jpg ===
=== /root/openface/data/lfw-subset/raw/Ann_Veneman/Ann_Veneman_0004.jpg ===
=== /root/openface/data/lfw-subset/raw/Adrien_Brody/Adrien_Brody_0001.jpg ===
=== /root/openface/data/lfw-subset/raw/Ann_Veneman/Ann_Veneman_0006.jpg ===
=== /root/openface/data/lfw-subset/raw/Ann_Veneman/Ann_Veneman_0009.jpg ===
=== /root/openface/data/lfw-subset/raw/Adrien_Brody/Adrien_Brody_0008.jpg ===
=== /root/openface/data/lfw-subset/raw/Adrien_Brody/Adrien_Brody_0003.jpg ===
=== /root/openface/data/lfw-subset/raw/Anna_Kournikova/Anna_Kournikova_0003.jpg ===
=== /root/openface/data/lfw-subset/raw/Anna_Kournikova/Anna_Kournikova_0011.jpg ===
=== /root/openface/data/lfw-subset/raw/Adrien_Brody/Adrien_Brody_0002.jpg ===
=== /root/openface/data/lfw-subset/raw/Adrien_Brody/Adrien_Brody_0004.jpg ===
=== /root/openface/data/lfw-subset/raw/Adrien_Brody/Adrien_Brody_0006.jpg ===
=== /root/openface/data/lfw-subset/raw/Adrien_Brody/Adrien_Brody_0009.jpg ===
=== /root/openface/data/lfw-subset/raw/Anna_Kournikova/Anna_Kournikova_0006.jpg ===
=== /root/openface/data/lfw-subset/raw/Ann_Veneman/Ann_Veneman_0011.jpg ===
=== /root/openface/data/lfw-subset/raw/Anna_Kournikova/Anna_Kournikova_0002.jpg ===
=== /root/openface/data/lfw-subset/raw/Ann_Veneman/Ann_Veneman_0007.jpg ===
=== /root/openface/data/lfw-subset/raw/Ann_Veneman/Ann_Veneman_0008.jpg ===
=== /root/openface/data/lfw-subset/raw/Anna_Kournikova/Anna_Kournikova_0005.jpg ===
=== /root/openface/data/lfw-subset/raw/Ann_Veneman/Ann_Veneman_0003.jpg ===
=== /root/openface/data/lfw-subset/raw/Adrien_Brody/Adrien_Brody_0007.jpg ===
=== /root/openface/data/lfw-subset/raw/Adrien_Brody/Adrien_Brody_0005.jpg ===
=== /root/openface/data/lfw-subset/raw/Anna_Kournikova/Anna_Kournikova_0001.jpg ===
=== /root/openface/data/lfw-subset/raw/Anna_Kournikova/Anna_Kournikova_0012.jpg ===
=== /root/openface/data/lfw-subset/raw/Anna_Kournikova/Anna_Kournikova_0008.jpg ===
=== /root/openface/data/lfw-subset/raw/Ann_Veneman/Ann_Veneman_0010.jpg ===


{
  cudnn : false
  testBatchSize : 800
  embSize : 128
  cache : "/tmp/OpenFaceTrainingTest-Net-RLtNQP"
  cudnn_bench : false
  cuda : false
  modelDef : "../models/openface/nn4.def.lua"
  data : "/tmp/OpenFaceTrainingTest-Img-uMvdxF/aligned"
  epochSize : 1
  nDonkeys : -1
  save : "/tmp/OpenFaceTrainingTest-Net-RLtNQP/1"
  nGPU : 1
  device : 1
  epochNumber : 1
  manualSeed : 2
  testing : false
  alpha : 0.2
  nEpochs : 10
  peoplePerBatch : 3
  imagesPerPerson : 10
  lfwDir : "../data/lfw/aligned"
  imgDim : 96
  retrain : "none"
}
Saving everything to: /tmp/OpenFaceTrainingTest-Net-RLtNQP/1   
Creating train metadata   
{
  sampleSize :
    {
      1 : 3
      2 : 96
      3 : 96
    }
  split : 100
  verbose : true
  paths :
    {
      1 : "/tmp/OpenFaceTrainingTest-Img-uMvdxF/aligned"
    }
  samplingMode : "balanced"
  loadSize :
    {
      1 : 3
      2 : 96
      3 : 96
    }
}
running "find" on each class directory, and concatenate all those filenames into a single file containing all image paths for a given class   
now combine all the files to a single large file   
load the large concatenated list of sample paths to self.imagePath   
34 samples found...... 0/34 ....................]  ETA: 0ms | Step: 0ms        
Updating classList and imageClass appropriately   
 [==================== 3/3 ====================>]  Tot: 7ms | Step: 2ms        
Cleaning up temporary files   
nClasses:     3   
==> doing epoch on training data:   
==> online epoch # 1   
  + (nTrips, nTripsFound) = (135, 135)   
Epoch: [1][1/1]    Time 2.954    tripErr 2.06e-01   
Epoch: [1][TRAINING SUMMARY] Total Time(s): 2.99    average triplet loss (per batch): 0.21   

   
==> doing epoch on training data:   
==> online epoch # 2   
  + (nTrips, nTripsFound) = (135, 134)   
Epoch: [2][1/1]    Time 3.476    tripErr 3.18e-01   
Epoch: [2][TRAINING SUMMARY] Total Time(s): 3.51    average triplet loss (per batch): 0.32   

   
==> doing epoch on training data:   
==> online epoch # 3   
  + (nTrips, nTripsFound) = (135, 133)   
Epoch: [3][1/1]    Time 3.222    tripErr 1.75e-01   
Epoch: [3][TRAINING SUMMARY] Total Time(s): 3.25    average triplet loss (per batch): 0.17   

   
==> doing epoch on training data:   
==> online epoch # 4   
  + (nTrips, nTripsFound) = (135, 134)   
Epoch: [4][1/1]    Time 3.120    tripErr 6.38e-01   
Epoch: [4][TRAINING SUMMARY] Total Time(s): 3.15    average triplet loss (per batch): 0.64   

   
==> doing epoch on training data:   
==> online epoch # 5   
  + (nTrips, nTripsFound) = (135, 112)   
Epoch: [5][1/1]    Time 3.380    tripErr 1.72e-01   
Epoch: [5][TRAINING SUMMARY] Total Time(s): 3.41    average triplet loss (per batch): 0.17   

   
==> doing epoch on training data:   
==> online epoch # 6   
  + (nTrips, nTripsFound) = (135, 127)   
Epoch: [6][1/1]    Time 3.055    tripErr 4.27e-01   
Epoch: [6][TRAINING SUMMARY] Total Time(s): 3.08    average triplet loss (per batch): 0.43   

   
==> doing epoch on training data:   
==> online epoch # 7   
  + (nTrips, nTripsFound) = (135, 135)   
Epoch: [7][1/1]    Time 3.330    tripErr 2.51e-01   
Epoch: [7][TRAINING SUMMARY] Total Time(s): 3.36    average triplet loss (per batch): 0.25   

   
==> doing epoch on training data:   
==> online epoch # 8   
  + (nTrips, nTripsFound) = (135, 129)   
Epoch: [8][1/1]    Time 3.395    tripErr 2.71e-01   
Epoch: [8][TRAINING SUMMARY] Total Time(s): 3.42    average triplet loss (per batch): 0.27   

   
==> doing epoch on training data:   
==> online epoch # 9   
  + (nTrips, nTripsFound) = (135, 122)   
Epoch: [9][1/1]    Time 2.930    tripErr 2.35e-01   
Epoch: [9][TRAINING SUMMARY] Total Time(s): 2.96    average triplet loss (per batch): 0.24   

   
==> doing epoch on training data:   
==> online epoch # 10   
  + (nTrips, nTripsFound) = (135, 135)   
Epoch: [10][1/1]    Time 2.794    tripErr 3.50e-01   
Epoch: [10][TRAINING SUMMARY] Total Time(s): 2.82    average triplet loss (per batch): 0.35   

   



--------------------- >> end captured stdout << ----------------------

----------------------------------------------------------------------
Ran 7 tests in 83.837s
```
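
As a quick check on the numbers in the log above, here is a minimal sketch that averages the ten per-epoch `tripErr` values and compares the result with the 0.3 threshold from the failing assertion. It assumes `trainLoss` in the test holds exactly these per-epoch values, which the traceback does not confirm; the names `trip_err` and `THRESHOLD` are made up for this illustration.

```python
# Per-epoch "tripErr" values copied from the log above (epochs 1-10).
# Assumption: the test's trainLoss array contains these same values.
trip_err = [0.206, 0.318, 0.175, 0.638, 0.172,
            0.427, 0.251, 0.271, 0.235, 0.350]

# Threshold taken from the failing line: assert np.mean(trainLoss) < 0.3
THRESHOLD = 0.3

mean_loss = sum(trip_err) / len(trip_err)

print("mean triplet loss: %.4f" % mean_loss)
print("below threshold: %s" % (mean_loss < THRESHOLD))
```

Under that assumption the mean comes out at about 0.304, so the run misses the threshold only by a small margin rather than diverging outright.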