Floating point exception after first training

goldst...@gmail.com

May 9, 2014, 3:04:44 PM
to ebl...@googlegroups.com
Hello everyone,

Sorry for the near-duplicate, as there is a similar topic here: https://groups.google.com/forum/#!msg/eblearn/t6KE76O0PhQ/MBX-A-w09fgJ
The difference is that my problem is permanent: I cannot get through the first epoch.

I followed the MNIST tutorial and it worked without trouble: all 20 epochs completed and gave decent results.

I also generated (using dscompile) training/testing datasets composed of 120x120 greyscale images, using 7x7 preprocessing kernels (I also tried 25x25; I don't really understand the difference). I think it worked fine, as the samples look good in dsdisplay.
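In case it helps to reproduce, an invocation along these lines is what I mean (flag names from memory and paths are placeholders, so treat this as a sketch and check dscompile -help for the exact options):

    # sketch only: as far as I understand, -kernelsz sets the preprocessing
    # normalization kernel (7x7 here; 25x25 would normalize over a wider window)
    dscompile /path/to/images -outdir ./prepared -dname cherenkov \
      -dims 120x120x1 -kernelsz 7x7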

I created a .conf file using the MNIST one as a template, and changed the convolution kernel sizes to 21, 19, 17 (which seem to fit the input size; see the size check below), but training gives this error:

computing 2nd order derivatives on 100 samples...
parameters: (x:(579614 min -0.0900183 max 0.5),dx: (579614 min 0 max 0),ddx: (579614 min 0 max 0))
Floating point exception (core dumped)
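For what it's worth, this is the back-of-the-envelope size arithmetic behind the 21, 19, 17 choice, assuming valid convolutions (output = input - kernel + 1), the 2x2 subsampling from the MNIST template, and the 2-pixel zero padding that zpad0 applies. The machine-sizes trace in the complete output below reports somewhat larger maps, so take this only as the nominal calculation:

    input:        120x120 -> zpad0 (2 per side) -> 124x124
    conv0 21x21:  124 - 21 + 1 = 104
    subs1 2x2:    104 / 2      = 52
    conv2 19x19:  52 - 19 + 1  = 34
    subs3 2x2:    34 / 2       = 17
    conv5 17x17:  17 - 17 + 1  = 1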

As suggested in the other post, I'm also using "training_precision = double #float". I tried different training/testing set sizes and played with different parameters, but never got past that step... Comparing my output with the MNIST tutorial's, the next step should be the "diaghessian" output:

diaghessian inf: 0.0500128 sup: 46.7299 diaghessian_minutes=0.0333333
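For context, my understanding is that this "diaghessian" step estimates the diagonal second derivatives (the ddx values) on ndiaghessian samples and uses them to scale each parameter's learning rate, in the spirit of stochastic diagonal Levenberg-Marquardt:

    eta_i = eta / (mu + h_ii)    # h_ii: estimated diagonal Hessian term for parameter i

which would explain why the MNIST run prints the inf/sup range of those estimates right after this step.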

Why is this step giving me trouble?

I'm running 64-bit Ubuntu 12.04.

Thank you for your help!

Tarek



Complete output:

* Generic trainer
Using random seed: -340745579
Setting conf directory to: ./
_____________________ Configuration _____________________
add_features_dimension : 1
addc0_weights :
addc1_weights :
addc2_weights :
addc3_weights :
anneal_period : 0
anneal_value : 0.0
answer : class_answer
arch : zpad0,conv0,addc0,tanh,abs0,wstd0,subs1,addc1,tanh,conv2,addc2,tanh,abs2,wstd2,subs3,addc3,tanh,conv5,addc5,tanh,linear7,addc7,tanh
arch_detect : conv0,addc0,tanh,abs0,wstd0,subs1,addc1,tanh,conv2,addc2,tanh,abs2,wstd2,subs3,addc3,tanh,conv5,addc5,tanh,linear7,addc7,tanh
arch_fprop : conv0,addc0,tanh,abs0,wstd0,subs1,addc1,tanh,conv2,addc2,tanh,abs2,wstd2,subs3,addc3,tanh
arch_name : conv0,addc0,tanh,abs0,wstd0,subs1,addc1,tanh,conv2,addc2,tanh,abs2,wstd2,subs3,addc3,tanh,conv5,addc5,tanh,linear7,addc7,tanh
arch_name_fprop : cscs
arch_name_train : conv0,addc0,tanh,abs0,wstd0,subs1,addc1,tanh,conv2,addc2,tanh,abs2,wstd2,subs3,addc3,tanh,conv5,addc5,tanh,linear7,addc7,tanh
arch_train : conv0,addc0,tanh,abs0,wstd0,subs1,addc1,tanh,conv2,addc2,tanh,abs2,wstd2,subs3,addc3,tanh,conv5,addc5,tanh,linear7,addc7,tanh
balanced_training : 1
binary_target : 0
c0 : conv0,addc0,tanh,abs0,wstd0
c2 : conv2,addc2,tanh,abs2,wstd2
c5 : conv5,addc5,tanh
camera : directory
classes : /home/tarekh/Paquetes/DNN/eblearn/demos/cherenkov/trained/cherenkov_train_classes.mat
classification : 1
classifier : conv5,addc5,tanh,linear7,addc7,tanh
classifier_c : conv5,addc5,tanh
classifier_cf : conv5,addc5,tanh,linear7,addc7,tanh
classifier_hidden : 16
classifier_type : cf
conv0_kernel : 21x21
conv0_stride : 1x1
conv0_table :
conv0_table_in : 1
conv0_table_out : 6
conv0_weights :
conv2_kernel : 19x19
conv2_stride : 1x1
conv2_table : /home/tarekh/Paquetes/DNN/eblearn/tools/data/tables//table_6_16_connect_60.mat
conv2_table_in : thickness
conv2_table_out :
conv2_weights :
conv5_kernel : 17x17
conv5_stride : 1x1
conv5_table_in : thickness
conv5_table_out : 120
current_dir : ./
data_coeff : .01
display : 1
ebl : /home/tarekh/Paquetes/DNN/eblearn
epoch_mode : 1
epoch_show_modulo : 400
eta : .0001
f6 : linear6,addc6,tanh
f7 : linear7,addc7,tanh
features : conv0,addc0,tanh,abs0,wstd0,subs1,addc1,tanh,conv2,addc2,tanh,abs2,wstd2,subs3,addc3,tanh
features_cscs : conv0,addc0,tanh,abs0,wstd0,subs1,addc1,tanh,conv2,addc2,tanh,abs2,wstd2,subs3,addc3,tanh
features_name : cscs
features_type : cscs
gradient_threshold : 0.0
hardest_focus : 1
ignore_correct : 0
inertia : 0.0
input_dir : /home/tarekh/Paquetes/DNN/eblearn/tools/data/mnist/
input_gain : .01
iterations : 20
keep_outputs : 1
l2pool1_kernel : 2x2
l2pool1_stride : 2x2
l2pool3_kernel : 2x2
l2pool3_stride : 2x2
linear5_in :
linear5_out : noutputs
linear6_in : thickness
linear6_out : 16
linear7_in : thickness
linear7_out : noutputs
manual_load : 0
max_testing : 0
min_sample_weight : 0
name : cherenkov
ndiaghessian : 100
no_testing_test : 0
no_training_test : 0
nonlin : tanh
norm : 1
norm0_0 :
norm0_1 : wstd0
norm2_0 :
norm2_1 : wstd2
per_class_norm : 1
pool : subs
pp : zpad0
pp_detect : zpad0
pp_fprop : zpad0
pp_train : zpad0
pp_y : rgb_to_y0
pp_yp : rgb_to_yp0
pp_ypuv : rgb_to_ypuv0
pp_yuv : rgb_to_yuv0
random_class_order : 0
reg : 0
reg_l1 : 0
reg_l2 : 0
reg_time : 0
resizepp0_pp : rgb_to_ypuv0
resizepp0_zpad : 2x2x2x2
rgb_to_ypuv0_kernel : 7x7
root : /home/tarekh/Paquetes/DNN/eblearn/demos/cherenkov
run_type : train
s1 : subs1,addc1,tanh
s3 : subs3,addc3,tanh
sample_probabilities : 0
save_pickings : 0
save_weights : 1
scaling_type : 4
show_conf : 1
show_hsample : 5
show_train : 1
show_train_correct : 1
show_train_errors : 1
show_train_ninternals : 1
show_val_correct : 1
show_val_errors : 1
show_wait_user : 0
show_wsample : 18
shuffle_passes : 1
subs1_kernel : 2x2
subs1_stride : 2x2
subs3_kernel : 2x2
subs3_stride : 2x2
tblroot : /home/tarekh/Paquetes/DNN/eblearn/tools/data/tables/
test_display_modulo : 0
test_only : 0
train : /home/tarekh/Paquetes/DNN/eblearn/demos/cherenkov/trained/cherenkov_train_data.mat
train_labels : /home/tarekh/Paquetes/DNN/eblearn/demos/cherenkov/trained/cherenkov_train_labels.mat
train_size : 200
trainable_module1_energy : l2_energy
trainer : trainable_module1
training_precision : double
val : /home/tarekh/Paquetes/DNN/eblearn/demos/cherenkov/tested/cherenkov_test_data.mat
val_labels : /home/tarekh/Paquetes/DNN/eblearn/demos/cherenkov/tested/cherenkov_test_labels.mat
val_size : 100
weights : mnist_net00020.mat
wstd0_kernel : 21x21
wstd2_kernel : 19x19
zpad0_dims : 5x5
_________________________________________________________
Training precision: double
Varied variables:
Not using Intel IPP.
No category names found, using numbers.
No jitter information loaded.
Limiting val to 100 samples.
val: Shuffling of samples (training only) after each pass is activated.
val: Weighing of samples (training only) based on classification is activated.
val: learning is focused on hardest misclassified samples
Sample picking probabilities are normalized per class with minimum probability 0
val: Each training epoch sees 100 samples.

No scale information loaded.
val: Setting training as balanced (taking class distributions into account).
val: Setting training as balanced (taking class distributions into account).
val: This is a testing set only.
val: classification dataset "val" contains 100 samples of dimension 1x120x120 and defines an epoch as 100 samples.
val: It has 2 classes: 5816 "0" 2841 "1"
val: no scales information.
val: Setting data coefficient to 0.01
val: Print training count every 400 samples.
Keeping model outputs for each sample.
No category names found, using numbers.
No jitter information loaded.
Limiting train to 200 samples.
train: Shuffling of samples (training only) after each pass is activated.
train: Weighing of samples (training only) based on classification is activated.
train: learning is focused on hardest misclassified samples
Sample picking probabilities are normalized per class with minimum probability 0
train: Each training epoch sees 200 samples.

No scale information loaded.
train: Setting training as balanced (taking class distributions into account).
train: Setting training as balanced (taking class distributions into account).
train: Setting training as balanced (taking class distributions into account).
Classes order is not random.
train: Weighing of samples (training only) based on classification is deactivated.
train: Shuffling of samples (training only) after each pass is activated.
train: Setting epoch mode to 1 (see all samples at least once)
train: Print training count every 400 samples.
train: classification dataset "train" contains 200 samples of dimension 1x120x120 and defines an epoch as 200 samples.
train: It has 2 classes: 3140 "0" 2516 "1"
train: no scales information.
train: Setting data coefficient to 0.01
Keeping model outputs for each sample.
Targets: 2x2
[[ 1 -1 ] [ -1 1 ]]
Using max confidence formula with normalization ratio 2
smoothing kernel:
[[ -0.00524478 -0.0105816 -0.0163916 -0.0204344 -0.0217705 -0.0204344 -0.0163916 -0.0105816 -0.00524478 ]
[ -0.0105816 -0.0190743 -0.0251934 -0.0262128 -0.0254891 -0.0262128 -0.0251934 -0.0190743 -0.0105816 ]
[ -0.0163916 -0.0251934 -0.0239492 -0.0118684 -0.00420322 -0.0118684 -0.0239492 -0.0251934 -0.0163916 ]
[ -0.0204344 -0.0262128 -0.0118684 0.0194892 0.0368525 0.0194892 -0.0118684 -0.0262128 -0.0204344 ]
[ -0.0217705 -0.0254891 -0.00420322 0.0368525 0.059015 0.0368525 -0.00420322 -0.0254891 -0.0217705 ]
[ -0.0204344 -0.0262128 -0.0118684 0.0194892 0.0368525 0.0194892 -0.0118684 -0.0262128 -0.0204344 ]
[ -0.0163916 -0.0251934 -0.0239492 -0.0118684 -0.00420322 -0.0118684 -0.0239492 -0.0251934 -0.0163916 ]
[ -0.0105816 -0.0190743 -0.0251934 -0.0262128 -0.0254891 -0.0262128 -0.0251934 -0.0190743 -0.0105816 ]
[ -0.00524478 -0.0105816 -0.0163916 -0.0204344 -0.0217705 -0.0204344 -0.0163916 -0.0105816 -0.00524478 ]]
Targets: 2x2
[[ 1 -1 ] [ -1 1 ]]
Answering module: class_answer module class_answer with 2 classes, confidence type 2 and targets 2x2.
Creating a network with 2 outputs and 23 modules (input thickness is -1): zpad0,conv0,addc0,tanh,abs0,wstd0,subs1,addc1,tanh,conv2,addc2,tanh,abs2,wstd2,subs3,addc3,tanh,conv5,addc5,tanh,linear7,addc7,tanh
arch 0: Added zpad module zpad0 is padding with: [ 2x2x2x2 ] (#params 0, thickness -1)
arch 1: Using a full table for conv0_table: 1 -> 6 (6x2)
Added convolution module conv0 with 6 kernels with size 21x21, stride 1x1 and table 6x2 (1->6) (#params 2646, thickness 6)
arch 2: Added bias module addc0 with 6 biases (#params 2652, thickness 6)
arch 3: Added tanh module tanh with linear coefficient 0 (#params 2652, thickness 6)
arch 4: Added abs (#params 2652, thickness 6)
arch 5: Added contrast_norm module with subtractive_norm module with fixed mean weighting and kernel (x:(6x21x21 min 2.69921e-11 max 0.00505258),dx:,ddx:), across features, not using global normalization, and same convolution and divisive_norm module wstd0_divnorm with kernel 21x21, using zero padding, across features, using fixed filter (21x21 min 1.61953e-10 max 0.0303155) (#params 2652, thickness 6)
arch 6: Random seed initialized to 0
Added subsampling module subs1 with thickness 6, kernel 2x2 and stride 2x2 (#params 2658, thickness 6)
arch 7: Added bias module addc1 with 6 biases (#params 2664, thickness 6)
arch 8: Added tanh module tanh with linear coefficient 0 (#params 2664, thickness 6)
arch 9: Loaded conv2_table (60x2) from /home/tarekh/Paquetes/DNN/eblearn/tools/data/tables//table_6_16_connect_60.mat
Added convolution module conv2 with 60 kernels with size 19x19, stride 1x1 and table 60x2 (6->16) (#params 24324, thickness 16)
arch 10: Added bias module addc2 with 16 biases (#params 24340, thickness 16)
arch 11: Added tanh module tanh with linear coefficient 0 (#params 24340, thickness 16)
arch 12: Added abs (#params 24340, thickness 16)
arch 13: Added contrast_norm module with subtractive_norm module with fixed mean weighting and kernel (x:(16x19x19 min 8.22531e-11 max 0.00209419),dx:,ddx:), across features, not using global normalization, and same convolution and divisive_norm module wstd2_divnorm with kernel 19x19, using zero padding, across features, using fixed filter (19x19 min 1.31605e-09 max 0.033507) (#params 24340, thickness 16)
arch 14: Random seed initialized to 0
Added subsampling module subs3 with thickness 16, kernel 2x2 and stride 2x2 (#params 24356, thickness 16)
arch 15: Added bias module addc3 with 16 biases (#params 24372, thickness 16)
arch 16: Added tanh module tanh with linear coefficient 0 (#params 24372, thickness 16)
arch 17: Using a full table for conv5_table: 16 -> 120 (1920x2)
Added convolution module conv5 with 1920 kernels with size 17x17, stride 1x1 and table 1920x2 (16->120) (#params 579252, thickness 120)
arch 18: Added bias module addc5 with 120 biases (#params 579372, thickness 120)
arch 19: Added tanh module tanh with linear coefficient 0 (#params 579372, thickness 120)
arch 20: Added linear module linear7 120 -> 2 (#params 579612, thickness 2)
arch 21: Added bias module addc7 with 2 biases (#params 579614, thickness 2)
arch 22: Added tanh module tanh with linear coefficient 0 (#params 579614, thickness 2)
arch: loaded 23 modules.
Targets: 2x2
[[ 1 -1 ] [ -1 1 ]]
Training with: trainer module : energy l2_energy is the euclidean distance between inputs, class_answer module class_answer with 2 classes, confidence type 2 and targets 2x2.
Setting progress file to "progress"
Random seed initialized to 0
Initializing weights from random.
Gradient parameters: eta 0.0001 stopping threshold 0 decay_l1 0 decay_l2 0 decay_time 0 inertia 0 anneal_value 0 annueal_period 0 gradient threshold 0
Testing on 100 samples...machine sizes: [ 1x120x120 ] -> zpad0 -> [ 1x124x124 ] -> conv0 -> [ (0,-2,-2)6x108x108 ] -> addc0 -> [ (0,-2,-2)6x108x108 ] -> tanh -> [ (0,-2,-2)6x108x108 ] -> abs -> [ (0,-2,-2)6x108x108 ] -> wstd0 -> [ (0,-2,-2)6x108x108 ] -> subs1 -> [ (0,-2,-2)6x54x54 ] -> addc1 -> [ (0,-4,-4)6x54x54 ] -> tanh -> [ (0,-4,-4)6x54x54 ] -> conv2 -> [ (0,-4,-4)16x36x36 ] -> addc2 -> [ (0,-4,-4)16x36x36 ] -> tanh -> [ (0,-4,-4)16x36x36 ] -> abs -> [ (0,-4,-4)16x36x36 ] -> wstd2 -> [ (0,-4,-4)16x36x36 ] -> subs3 -> [ (0,-4,-4)16x18x18 ] -> addc3 -> [ (0,-8,-8)16x18x18 ] -> tanh -> [ (0,-8,-8)16x18x18 ] -> conv5 -> [ (0,-8,-8)120x2x2 ] -> addc5 -> [ (0,-8,-8)120x2x2 ] -> tanh -> [ (0,-8,-8)120x2x2 ] -> linear7 -> [ (0,-8,-8)2x2x2 ] -> addc7 -> [ (0,-8,-8)2x2x2 ] -> tanh -> [ (0,-8,-8)2x2x2 ]
trainable parameters: (x:[ 579614 ],dx:[ 579614 ],ddx:[ 579614 ])
training: 0 / 200, elapsed: 1s, ETA: 3m 20s, remaining: 0: 200 1: 0
i=0 name=train [0] sz=200 energy=0.971096 (class-normalized) errors=29.5% uerrors=29.5% rejects=0% (class-normalized) correct=70.5% ucorrect=70.5%
errors per class: 0_samples=200 0_errors=29.5%
success per class: 0_samples=200 0_success=70.5%

testing: 0 / 100, elapsed: 1s, ETA: 1m 40s
i=0 name=val [0] sz=100 test_energy=1.00066 (class-normalized) test_errors=32% test_uerrors=32% test_rejects=0% (class-normalized) test_correct=68% test_ucorrect=68%
errors per class: test_0_samples=100 test_0_errors=32%
success per class: test_0_samples=100 test_0_success=68%

testing_time=56s
Displaying training...
saving net to cherenkov_net00000.mat
saved=cherenkov_net00000.mat
save_pattern=cherenkov_net00000
Training network with 200 training samples and 100 val samples for 20 iterations:
__ epoch 1 ______________________________________________________________________
training on 200 samples and recomputing 2nd order derivatives on 100 samples after every 4000 trained samples...
computing 2nd order derivatives on 100 samples...
parameters: (x:(579614 min -0.0900183 max 0.5),dx: (579614 min 0 max 0),ddx: (579614 min 0 max 0))
Floating point exception (core dumped)