Multiple Digit Recognition Keras

Ritchie Ng

Sep 30, 2016, 5:48:33 PM

to Keras-users

Hi,

I'm trying to use the convolutional layers as a shared input and attach 5 fully connected layers to recognize 5 digits in the SVHN dataset. Does anyone know how to do this in Keras? I'm stuck at the convolution layer, where the model branches out. I can't find anything in the documentation. Thanks a lot. I've been struggling.

model = Sequential()

model.add(Convolution2D(32, 3, 3, border_mode='same',
                        input_shape=(img_channels, img_rows, img_cols)))
model.add(Activation('relu'))
model.add(Convolution2D(32, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Convolution2D(64, 3, 3, border_mode='same'))
model.add(Activation('relu'))
model.add(Convolution2D(64, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

# So what happens after this to have 5 independent fully connected layers to recognise the multiple digits?

Klemen Grm

Sep 30, 2016, 6:35:04 PM

to Keras-users

The model has multiple outputs, so you should specify it with the functional API, something like this:

x = Input((img_channels, img_rows, img_cols))

y = Convolution2D(32, 3, 3, activation="relu", border_mode="same")(x)
y = Convolution2D(32, 3, 3, activation="relu")(y)
y = MaxPooling2D((2, 2))(y)
y = Dropout(0.25)(y)

y = Convolution2D(64, 3, 3, border_mode="same", activation="relu")(y)
y = Convolution2D(64, 3, 3, activation="relu")(y)
y = MaxPooling2D((2, 2))(y)
y = Dropout(0.25)(y)


y = Flatten()(y)
y = Dense(1024, activation="relu")(y)

length = Dense(4, activation="softmax")(y)
digit1 = Dense(10, activation="softmax")(y)
digit2 = Dense(10, activation="softmax")(y)
digit3 = Dense(10, activation="softmax")(y)

model = Model(input=x, output=[length, digit1, digit2, digit3])

See documentation at https://keras.io/getting-started/functional-api-guide/ for more details.

Ritchie Ng

Oct 1, 2016, 7:55:28 PM

to Keras-users

Thank you so so much for your help. I've been working for the whole day struggling with an error:

Exception: The model expects 5 input arrays, but only received one array. Found: array with shape (188602, 5, 11)

My prepared data has the following shapes:

X_train: (188602, 32, 32, 1)

y_train_dummy: (188602, 5, 11)

X_val: (47151, 32, 32, 1)

y_val_dummy: (47151, 5, 11)

My code:

batch_size = 128
nb_classes = 10
nb_epoch = 2

_, img_rows, img_cols, img_channels = X_train.shape

model_input = Input(shape=(img_rows, img_cols, img_channels))
x = Convolution2D(32, 3, 3, border_mode='same')(model_input)
x = Activation('relu')(x)
x = Convolution2D(32, 3, 3)(x)
x = Activation('relu')(x)
x = MaxPooling2D(pool_size=(2, 2))(x)
x = Dropout(0.25)(x)
x = Flatten()(x)

x = Dense(1024, activation="relu")(x)


x1 = Dense(nb_classes, activation='softmax')(x)
x2 = Dense(nb_classes, activation='softmax')(x)
x3 = Dense(nb_classes, activation='softmax')(x)
x4 = Dense(nb_classes, activation='softmax')(x)
x5 = Dense(nb_classes, activation='softmax')(x)

lst = [x1, x2, x3, x4, x5]

model = Model(input=model_input, output=lst)

sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd)
model.fit(X_train, y_train_dummy, batch_size=batch_size, nb_epoch=nb_epoch, verbose=1, validation_data=(X_val, y_val_dummy))

Klemen Grm

Oct 1, 2016, 8:50:33 PM

to Keras-users

Your model's outputs are a list, so the training/testing output tensors should be split into five arrays and provided as a list as well, as in

model.fit(x, [y1, y2, y3, y4, y5], validation_data=(vx, [vy1, vy2, vy3, vy4, vy5]))

Where each of the output tensors has the same number of samples and appropriate further dimensions for the model's output shapes.
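A minimal sketch of that split, assuming the labels are stored as one one-hot array of shape (samples, 5, 11) as in the question. The small random array is only a stand-in for the real data, and the name y_train_lst is borrowed from later in the thread:

```python
import numpy as np

# Stand-in for y_train_dummy: 8 samples, 5 digit positions, 11 one-hot classes.
rng = np.random.RandomState(0)
class_ids = rng.randint(0, 11, size=(8, 5))
y_train_dummy = np.eye(11)[class_ids]          # shape (8, 5, 11)

# One array per output branch, each of shape (samples, 11).
y_train_lst = [y_train_dummy[:, i, :] for i in range(y_train_dummy.shape[1])]

print(len(y_train_lst), y_train_lst[0].shape)
```

The resulting list is what gets passed as the targets to model.fit, and the same split applies to the validation labels. Each output layer needs as many units as the last label dimension (11 for these shapes).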

Ritchie Ng

Oct 1, 2016, 10:37:44 PM

to Keras-users

You're a lifesaver! Yeah. I split the data right and it works now.

Although I'm a bit confused about the accuracy evaluation. When I run model.fit, it prints the following (it works!):

Train on 188602 samples, validate on 47151 samples 

Epoch 1/10 188602/188602 [==============================] - 229s - loss: 4.2374 - dense_14_loss: 1.4432 - dense_15_loss: 1.6796 - dense_16_loss: 0.9792 - dense_17_loss: 0.1342 - dense_18_loss: 0.0013 - dense_14_acc: 0.5091 - dense_15_acc: 0.3580 - dense_16_acc: 0.3715 - dense_17_acc: 0.4727 - dense_18_acc: 0.0039 - val_loss: 2.3544 - val_dense_14_loss: 0.7367 - val_dense_15_loss: 0.9298 - val_dense_16_loss: 0.5922 - val_dense_17_loss: 0.0947 - val_dense_18_loss: 0.0010 - val_dense_14_acc: 0.7791 - val_dense_15_acc: 0.6632 - val_dense_16_acc: 0.4265 - val_dense_17_acc: 0.1512 - val_dense_18_acc: 0.0154

Subsequently I ran model.evaluate, and it also returns that many accuracies (5):

score = model.evaluate(X_val, y_val_lst, verbose=1) 
print('Validation error:', 100 - score[1]*100)


There are multiple accuracy scores on top (5 actually for 5 branches/output). Does it mean Keras automatically calculates the accuracy for say "712" from an image showing "712" after training through model.evaluate?

How do we calculate the accuracy for the entire number "712" instead of for the individual digits?

Klemen Grm

Oct 2, 2016, 9:15:51 AM

to Keras-users

Keras won't do this for you, as each output is treated independently. Their loss functions can be added up to make up the overall model loss, but it would be inaccurate to do the same with metrics such as accuracy. You can, however, calculate it yourself. If I understand your problem correctly, the model's prediction is correct if every softmax output is correct. Therefore, you need something like this:

y_pred_list = model.predict(x_val)

correct_preds = 0
for i in range(x_val.shape[0]):           # iterate over sample dimension
    pred_list_i = [y_pred_list[i] for y_pred in y_pred_list]
    val_list_i  = [y_val_list[i] for y_val in y_val_list]
    matching_preds = [pred.argmax(-1) == val.argmax(-1) for pred, val in zip(pred_list_i, val_list_i)]
    correct_preds += int(np.all(matching_preds))   # count samples where every output matches

total_acc = correct_preds / float(x_val.shape[0])

Ritchie Ng

Oct 2, 2016, 12:18:14 PM

to Keras-users

I've been working on this since you gave the suggestion thanks :)

The process seems to be running forever, I think the code is computationally intensive.

Is there something wrong when my prediction values are not integers? They are small decimal values, and I'm not sure why.

Link to python notebook.

Klemen Grm

Oct 2, 2016, 12:43:31 PM

to Keras-users

I've just noticed an error. It should be

for i in range(x_val.shape[0]):
    pred_list_i = [y_pred[i] for y_pred in y_pred_list]
    val_list_i  = [y_val[i] for y_val in y_val_list]

    matching_preds = [pred.argmax(-1) == val.argmax(-1) for pred, val in zip(pred_list_i, val_list_i)]
    correct_preds += int(np.all(matching_preds))   # count samples where every output matches

Also, your softmax output will always be real-valued - the output of a softmax classifier is a probability distribution over the possible output classes. You extract the maximum likelihood prediction from individual sample outputs by argmax over the final axis of the output tensor.
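To illustrate the point about real-valued outputs, here is a small NumPy sketch of a softmax followed by argmax. The scores are made up, and the softmax helper is written out by hand rather than taken from Keras:

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Two samples, four classes (made-up scores for illustration).
logits = np.array([[2.0, 1.0, 0.1, -1.0],
                   [0.0, 0.5, 3.0, 0.2]])
probs = softmax(logits)

# Each row is a probability distribution; argmax picks the most likely class.
predicted_classes = probs.argmax(axis=-1)
print(probs.sum(axis=-1))
print(predicted_classes)
```

Each row of probs sums to 1, and the integer class labels only appear after the argmax step.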

Ritchie Ng

Oct 2, 2016, 3:16:42 PM

to Keras-users

I tried running this for 10 epochs (took quite long on my laptop), it gave 0% accuracy. Are we doing something entirely wrong here? :X

model_input = Input(shape=(img_rows, img_cols, img_channels))
x = Convolution2D(32, 3, 3, border_mode='same')(model_input)
x = Activation('relu')(x)

x = Convolution2D(32, 3, 3, border_mode='same')(x)

x = Activation('relu')(x)

x = MaxPooling2D((2,2), strides=(2,2))(x)
x = Dropout(0.5)(x)

x = Flatten()(x)
x = Dense(1024, activation="relu")(x)

# length = Dense(4, activation='softmax')(x)
digit_1 = Dense(nb_classes, activation='softmax')(x)
digit_2 = Dense(nb_classes, activation='softmax')(x)
digit_3 = Dense(nb_classes, activation='softmax')(x)
digit_4 = Dense(nb_classes, activation='softmax')(x)
digit_5 = Dense(nb_classes, activation='softmax')(x)
branches = [digit_1, digit_2, digit_3, digit_4, digit_5]
model = Model(input=model_input, output=branches)
# let's train the model using SGD + momentum

sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)

model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['categorical_accuracy'])
history = model.fit(X_train, y_train_lst, batch_size=batch_size, nb_epoch=nb_epoch, verbose=1, validation_data=(X_val, y_val_lst))
correct_preds = 0
# Iterate over sample dimension
for i in range(X_val.shape[0]):
    pred_list_i = [y_pred[i] for y_pred in y_pred_list]
    val_list_i = [y_val[i] for y_val in y_val_lst]

    matching_preds = [pred.argmax(-1) == val.argmax(-1) for pred, val in zip(pred_list_i, val_list_i)]
    correct_preds = int(np.all(matching_preds))

total_acc = (correct_preds / float(X_val.shape[0]))*100
print(total_acc)

Output at last epoch

Epoch 8/10
188602/188602 [==============================] - 248s - loss: 0.9182 - dense_8_loss: 0.2776 - dense_9_loss: 0.3641 - dense_10_loss: 0.2361 - dense_11_loss: 0.0395 - dense_12_loss: 8.5433e-04 - dense_8_categorical_accuracy: 0.9130 - dense_9_categorical_accuracy: 0.8242 - dense_10_categorical_accuracy: 0.5283 - dense_11_categorical_accuracy: 0.1643 - dense_12_categorical_accuracy: 0.1027 - val_loss: 0.9377 - val_dense_8_loss: 0.2888 - val_dense_9_loss: 0.3668 - val_dense_10_loss: 0.2378 - val_dense_11_loss: 0.0434 - val_dense_12_loss: 8.4473e-04 - val_dense_8_categorical_accuracy: 0.9160 - val_dense_9_categorical_accuracy: 0.8329 - val_dense_10_categorical_accuracy: 0.5340 - val_dense_11_categorical_accuracy: 0.1562 - val_dense_12_categorical_accuracy: 0.0739
 
Epoch 9/10
188602/188602 [==============================] - 248s - loss: 0.8456 - dense_8_loss: 0.2573 - dense_9_loss: 0.3367 - dense_10_loss: 0.2145 - dense_11_loss: 0.0362 - dense_12_loss: 7.9260e-04 - dense_8_categorical_accuracy: 0.9189 - dense_9_categorical_accuracy: 0.8320 - dense_10_categorical_accuracy: 0.5320 - dense_11_categorical_accuracy: 0.1652 - dense_12_categorical_accuracy: 0.1109 - val_loss: 0.9151 - val_dense_8_loss: 0.2821 - val_dense_9_loss: 0.3587 - val_dense_10_loss: 0.2317 - val_dense_11_loss: 0.0417 - val_dense_12_loss: 8.2378e-04 - val_dense_8_categorical_accuracy: 0.9185 - val_dense_9_categorical_accuracy: 0.8358 - val_dense_10_categorical_accuracy: 0.5308 - val_dense_11_categorical_accuracy: 0.1610 - val_dense_12_categorical_accuracy: 0.0831
 
Epoch 10/10
188602/188602 [==============================] - 247s - loss: 0.7835 - dense_8_loss: 0.2402 - dense_9_loss: 0.3115 - dense_10_loss: 0.1978 - dense_11_loss: 0.0332 - dense_12_loss: 7.1053e-04 - dense_8_categorical_accuracy: 0.9236 - dense_9_categorical_accuracy: 0.8406 - dense_10_categorical_accuracy: 0.5379 - dense_11_categorical_accuracy: 0.1640 - dense_12_categorical_accuracy: 0.1093 - val_loss: 0.9023 - val_dense_8_loss: 0.2776 - val_dense_9_loss: 0.3548 - val_dense_10_loss: 0.2266 - val_dense_11_loss: 0.0425 - val_dense_12_loss: 8.0618e-04 - val_dense_8_categorical_accuracy: 0.9197 - val_dense_9_categorical_accuracy: 0.8368 - val_dense_10_categorical_accuracy: 0.5357 - val_dense_11_categorical_accuracy: 0.1713 - val_dense_12_categorical_accuracy: 0.1190

Daπid

Oct 2, 2016, 3:34:05 PM

to Ritchie Ng, Keras-users

A random predictor would have an accuracy of 0.1; your accuracies are 0.9197, 0.8368, 0.5357, 0.1713 and 0.1190. They are raw values, not percentages (so, 91% accuracy for the first, and so on). You are definitely learning, but there is some problem with the last two digits. Possibly, longer training and an adaptive learning rate (for example, Adam instead of SGD) will help you here.

What is the size of your images? MaxPooling may or may not be your friend here.

Aside, deep learning is not computationally cheap, I strongly recommend you get access to a GPU, or at the very least, a proper computer you can leave things running for hours.


Ritchie Ng

Oct 2, 2016, 3:51:19 PM

to Daπid, Keras-users

Hi,

Thanks for your help, really, I’ve been struggling.

1. My images are 32 x 32.

2. I modified the code to use the Adam optimiser.

3. I’m currently using a GPU 750M (300+ CUDA cores with 2GB GDDR) running about 300s per epoch of 188602 images.

So my code is fine? I’m confused as to whether it’s my code, or that I’m not running long enough. Because I’m trying to calculate the final validation accuracy and test accuracy.

Even with 10 epochs, does it make sense it’s 0?

My notebook (if you scroll all the way down, you can find the model, the rest are data pre-processing which has been done well):

https://github.com/ritchieng/gloqo/blob/master/NumNum/project.ipynb

correct_preds = 0
# Iterate over sample dimension
for i in range(X_val.shape[0]):
    pred_list_i = [y_pred[i] for y_pred in y_pred_list]
    val_list_i = [y_val[i] for y_val in y_val_lst]
    matching_preds = [pred.argmax(-1) == val.argmax(-1) for pred, val in zip(pred_list_i, val_list_i)]
    correct_preds = int(np.all(matching_preds))
total_acc = (correct_preds / float(X_val.shape[0]))*100
print(total_acc)

Daπid

Oct 2, 2016, 4:01:54 PM

to Ritchie Ng, Keras-users

On 2 October 2016 at 21:51, Ritchie Ng <ritchi...@gmail.com> wrote:

> So my code is fine? I’m confused as to whether it’s my code, or that I’m not running long enough.

Your accuracies keep increasing and the loss keeps decreasing, so the model can still learn more. But you should add more regularisation: dropout everywhere, and especially before each dense layer (it doesn't help so much for convolutional layers).

> Because I’m trying to calculate the final validation accuracy and test accuracy.
>
> Even with 10 epochs, does it make sense it’s 0?

It is not 0; Keras is telling you it is between 0.9 and 0.11.

Ritchie Ng

Oct 2, 2016, 4:08:55 PM

to Daπid, Keras-users

1. I’ll add dropouts everywhere, particularly before each dense layer.

2. I understand accuracy is not 0 but between 0.9 and 0.11.

However, what I’m trying to do here is to predict the whole number, say “712”, from a Google image in the dataset.

And according to another helpful user, Klemen Grm, I used the following code to calculate the overall prediction accuracy instead of the individual digits’ accuracy. When I run it, it’s 0%. It means not even one prediction was right. Is there some issue?

correct_preds = 0
# Iterate over sample dimension
for i in range(X_val.shape[0]):         
    pred_list_i = [y_pred[i] for y_pred in y_pred_list]
    val_list_i  = [y_val[i] for y_val in y_val_lst]
    matching_preds = [pred.argmax(-1) == val.argmax(-1) for pred, val in zip(pred_list_i, val_list_i)]
    correct_preds = int(np.all(matching_preds))

total_acc = (correct_preds / float(X_val.shape[0]))*100
print(total_acc)

Thanks for your help. I’m letting this project be open-source so others can learn how to use Keras for the SVHN dataset for multi-digit prediction. But I’m stuck here.

Notebook: https://github.com/ritchieng/gloqo/blob/master/NumNum/project.ipynb

Daπid

Oct 2, 2016, 4:27:11 PM

to Ritchie Ng, Keras-users

On 2 October 2016 at 22:08, Ritchie Ng <ritchi...@gmail.com> wrote:
>
> However what I’m trying to do here is to predict the number say “712” from
> the google image from the dataset.
> And according to another helpful user, Klemen Grm, I used the following code
> to calculate the overall prediction instead of the individual number’s
> accuracy. When I run it. It’s 0%. It means not even one prediction was
> right. Is there some issue?

Ah, I understand what you mean now. Assuming the errors are
independent, your total accuracy would be the product of the digit
accuracies, or 0.8%. Not surprising you don't get anything right.
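The arithmetic behind that estimate can be checked in a few lines. This sketch uses the per-digit validation accuracies quoted earlier in the thread and relies on the same independence assumption:

```python
# Per-digit validation accuracies from the last epoch quoted in the thread.
digit_accuracies = [0.9197, 0.8368, 0.5357, 0.1713, 0.1190]

# Under the independence assumption, the chance that all five digits are
# right at once is the product of the individual accuracies.
whole_number_accuracy = 1.0
for acc in digit_accuracies:
    whole_number_accuracy *= acc

print("{:.2%}".format(whole_number_accuracy))
```

The product comes out at roughly 0.84%, which is consistent with the 0.8% figure.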

I recommend you play with the architectures (more/less layers,
less/more maxpool, more/less filters per convolutional layer), and
more importantly, see the cases you are getting wrong. My guess is
that the ones with very bad resolution or contrast are the ones you
are getting wrong, and some enhancement preprocessing may be helpful.

Plotting confusion matrices for each digit may reveal some insight here as well.

Klemen Grm

Oct 2, 2016, 4:29:49 PM

to Keras-users, david...@gmail.com

If you are unsure about the total accuracy calculation, try looking at the predictions themselves - look at which house number your network would predict and see whether it matches entirely with the test set output.
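One detail worth checking in the loop Ritchie posted: it assigns `correct_preds = int(np.all(matching_preds))` on every iteration instead of accumulating with `+=`, so after the loop the variable only reflects the last sample. A vectorised sketch that avoids the loop and also exposes the predicted digits for inspection is shown below. The function name and the toy arrays are invented for illustration:

```python
import numpy as np

def whole_number_accuracy(y_pred_list, y_true_list):
    """Fraction of samples for which every output branch is correct.

    Both arguments are lists with one (samples, classes) array per branch.
    """
    pred_digits = np.stack([p.argmax(axis=-1) for p in y_pred_list], axis=1)
    true_digits = np.stack([t.argmax(axis=-1) for t in y_true_list], axis=1)
    all_match = (pred_digits == true_digits).all(axis=1)   # one bool per sample
    return all_match.mean(), pred_digits, true_digits

# Toy data: 3 samples, 2 branches, 3 classes (invented for illustration).
y_true_list = [np.eye(3)[[0, 1, 2]], np.eye(3)[[2, 2, 0]]]
y_pred_list = [np.eye(3)[[0, 1, 1]], np.eye(3)[[2, 0, 0]]]

acc, pred_digits, true_digits = whole_number_accuracy(y_pred_list, y_true_list)
print(acc)
for pred_row, true_row in zip(pred_digits, true_digits):
    print(pred_row, true_row)   # predicted digits next to the true digits
```

With real data, y_pred_list would be the list returned by model.predict and y_true_list the list of validation targets. Printing the rows side by side is one way to look at the predictions themselves.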

Also, since you're predicting individual digits, how are you masking superfluous digits for samples that are shorter than the number of model outputs? I recommend inventing a dummy class for them, reformatting your output tensors and increasing the number of classes by 1 in all digit outputs accordingly.
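A minimal sketch of the dummy-class idea, assuming five digit outputs and using index 10 for the "no digit" class. The index, the constant names and the helper function are all hypothetical choices, not something the thread fixes:

```python
import numpy as np

MAX_DIGITS = 5      # number of digit outputs in the model
BLANK = 10          # hypothetical index for the "no digit" class
NUM_CLASSES = 11    # digits 0-9 plus the blank class

def encode_number(digits):
    """Pad a list of digits to MAX_DIGITS with BLANK and one-hot encode it."""
    padded = list(digits) + [BLANK] * (MAX_DIGITS - len(digits))
    return np.eye(NUM_CLASSES)[padded]          # shape (MAX_DIGITS, NUM_CLASSES)

encoded = encode_number([7, 1, 2])
print(encoded.shape)
print(encoded.argmax(axis=-1))
```

With this layout every digit output would need 11 units instead of 10, so that the blank class has its own softmax entry.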

Ritchie Ng

Oct 2, 2016, 4:41:25 PM

to Daπid, Keras-users

The weird thing is that this works with an implementation on TensorFlow. But I can’t figure out what’s going wrong here.

It seems that the last digits, digits 4 and 5, which have low accuracies, are not learning. Their losses did not decrease regardless of the architecture. Their initial losses are already almost 0. How is this possible?
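One possible explanation, offered here as an assumption that the thread does not confirm: if positions without a digit end up with an all-zero target row (for example because a blank column was dropped from the 11-class labels), categorical cross-entropy contributes exactly zero for those rows, whatever the network predicts. The sketch below writes the loss out by hand in NumPy to show the effect; it is not the Keras implementation:

```python
import numpy as np

def categorical_crossentropy(target, probs, eps=1e-7):
    # Standard form: -sum_k target_k * log(prob_k), with clipping to avoid log(0).
    probs = np.clip(probs, eps, 1.0)
    return -np.sum(target * np.log(probs), axis=-1)

uniform = np.full(10, 0.1)      # an untrained, uniform prediction

one_hot = np.eye(10)[3]         # a position that holds the digit 3
all_zero = np.zeros(10)         # a position with no class marked (assumed case)

loss_with_digit = categorical_crossentropy(one_hot, uniform)
loss_without_digit = categorical_crossentropy(all_zero, uniform)

print(loss_with_digit)          # -log(0.1), about 2.30
print(loss_without_digit)       # zero: nothing to learn from this row
print(all_zero.argmax())        # argmax of an all-zero row is index 0
```

Under that assumption, a branch where most rows are empty would start with a loss near 0, and its reported accuracy would mostly measure how often it happens to predict class 0. Checking the row sums of the targets for digits 4 and 5 would confirm or rule this out.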


Qixianbiao Qixianbiao

Oct 2, 2016, 9:32:46 PM

to Keras-users, david...@gmail.com

The following content is based on my guess.

The Google Street View house numbers should have 1-5 digits (I am not sure, please check it), so you should use five dense classification outputs, but you also need to determine whether to output 1, 2, 3, 4, or 5 digits.

If the last two outputs are classified with low probabilities, you should output only three digits. If the last four classifiers give low probabilities, you should consider outputting only one digit.

That is why the training losses decrease progressively from classifier one to classifier five,

and also the reason why the val accuracies are 0.9197, 0.8368, 0.5357, 0.1713 and 0.1190.

On Monday, October 3, 2016 at 4:41:25 AM UTC+8, Ritchie Ng wrote:

whet...@gmail.com

Nov 20, 2016, 3:32:27 AM

to Keras-users

Ritchie, I am not sure I follow. How does

model.fit(x, [y1, y2, y3, y4, y5], validation_data=(vx, [vy1, vy2, vy3, vy4, vy5]))

solve your problem? What does your code look like prior to the start of the training procedure?

whet...@gmail.com

Nov 20, 2016, 3:49:48 AM

to Keras-users

Can you help me here: history = model.fit(X_train, y_train_lst, batch_size=batch_size, nb_epoch=nb_epoch, verbose=1, validation_data=(X_val, y_val_lst))? How do you go from y_train_dummy: (188602, 5, 11) to y_train_lst? Per Klemen's response it seems this is critical to your fitting process, but can you provide some example code of how to actually go from the original y_train_dummy to the important y_train_lst?

Thank you!

gru...@gmail.com

Jan 1, 2017, 7:00:12 PM

to Keras-users

I, too, am trying to implement a model whose outputs are a list, and have split the labels into separate y1, y2, etc., but am getting the following error:

Exception: Error when checking model target: expected length to have shape (None, 4) but got array with shape (111897, 1)

My (admittedly lengthy) model is reproduced below:

# Layer 0: Input
x = Input((img_rows, img_cols, img_channels))

# Layer 1: 48-unit maxout convolution
y = Convolution2D(nb_filter = 48, nb_row = 5, nb_col = 5, border_mode="same", name="1conv")(x)
y = MaxPooling2D(pool_size = (2, 2), strides = (2, 2), border_mode="same", name="1maxpool")(y)
# y = SubtractiveNormalization((3,3))(y)
y = Dropout(0.25, name="1drop")(y)
# y = MaxoutDense(output_dim = 48, nb_feature=3)(y)
y = Activation('relu', name="1activ")(y)

# Layer 2: 64-unit relu convolution
y = Convolution2D(nb_filter = 64, nb_row = 5, nb_col = 5, border_mode="same", name="2conv")(y)
y = MaxPooling2D(pool_size = (2, 2), strides = (1, 1), border_mode="same", name="2maxpool")(y)
# y = SubtractiveNormalization((3,3))(y)
y = Dropout(0.25, name="2drop")(y)
y = Activation('relu', name="2activ")(y)

# Layer 3: 128-unit relu convolution
y = Convolution2D(nb_filter = 128, nb_row = 5, nb_col = 5, border_mode="same", name="3conv")(y)
y = MaxPooling2D(pool_size = (2, 2), strides = (2, 2), border_mode="same", name="3maxpool")(y)
# y = SubtractiveNormalization((3,3))(y)
y = Dropout(0.25, name="3drop")(y)
y = Activation('relu', name="3activ")(y)

# Layer 4: 160-unit relu convolution
y = Convolution2D(nb_filter = 160, nb_row = 5, nb_col = 5, border_mode="same", name="4conv")(y)
y = MaxPooling2D(pool_size = (2, 2), strides = (1, 1), border_mode="same", name="4maxpool")(y)
# y = SubtractiveNormalization((3,3))(y)
y = Dropout(0.25, name="4drop")(y)
y = Activation('relu', name="4activ")(y)

# Layer 5: 192-unit relu convolution
y = Convolution2D(nb_filter = 192, nb_row = 5, nb_col = 5, border_mode="same", name="5conv")(y)
y = MaxPooling2D(pool_size = (2, 2), strides = (2, 2), border_mode="same", name="5maxpool")(y)
# y = SubtractiveNormalization((3,3))(y)
y = Dropout(0.25, name="5drop")(y)
y = Activation('relu', name="5activ")(y)

# Layer 6: 192-unit relu convolution
y = Convolution2D(nb_filter = 192, nb_row = 5, nb_col = 5, border_mode="same", name="6conv")(y)
y = MaxPooling2D(pool_size = (2, 2), strides = (1, 1), border_mode="same", name="6maxpool")(y)
# y = SubtractiveNormalization((3,3))(y)
y = Dropout(0.25, name="6drop")(y)
y = Activation('relu', name="6activ")(y)

# Layer 7: 192-unit relu convolution
y = Convolution2D(nb_filter = 192, nb_row = 5, nb_col = 5, border_mode="same", name="7conv")(y)
y = MaxPooling2D(pool_size = (2, 2), strides = (2, 2), border_mode="same", name="7maxpool")(y)
# y = SubtractiveNormalization((3,3))(y)
y = Dropout(0.25, name="7drop")(y)
y = Activation('relu', name="7activ")(y)

# Layer 8: 192-unit relu convolution
y = Convolution2D(nb_filter = 192, nb_row = 5, nb_col = 5, border_mode="same", name="8conv")(y)
y = MaxPooling2D(pool_size = (2, 2), strides = (1, 1), border_mode="same", name="8maxpool")(y)
# y = SubtractiveNormalization((3,3))(y)
y = Dropout(0.25, name="8drop")(y)
y = Activation('relu', name="8activ")(y)

# Layer 9: Flatten
y = Flatten()(y)

# Layer 10: Fully-Connected Layer
y = Dense(3072, activation=None, name="fc1")(y)

# Layer 11: Fully-Connected Layer
y = Dense(3072, activation=None, name="fc2")(y)

length = Dense(4, activation="softmax", name="length")(y)
digit1 = Dense(10, activation="softmax", name="digit1")(y)
digit2 = Dense(10, activation="softmax", name="digit2")(y)
digit3 = Dense(10, activation="softmax", name="digit3")(y)
digit4 = Dense(10, activation="softmax", name="digit4")(y)
digit5 = Dense(10, activation="softmax", name="digit5")(y)

model = Model(input=x, output=[length, digit1, digit2, digit3, digit4, digit5])

My compilation / fit / evaluation is as follows:

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])


model.fit(X_train,
          [y0_train, y1_train, y2_train, y3_train, y4_train, y5_train],
          validation_data=(X_test,
                           [y0_test, y1_test, y2_test, y3_test, y4_test, y5_test]),
          nb_epoch=10,
          batch_size=200,
          verbose=2)
model.evaluate(X_test,
               [y0_test, y1_test, y2_test, y3_test, y4_test, y5_test],
               verbose=0)

where y_train is split in the following way:

>>> y0_train[0], y1_train[0], y2_train[0], y3_train[0], y4_train[0], y5_train[0]
(3, 6.0, 8.0, 2.0, None, None)

gru...@gmail.com

Jan 1, 2017, 7:16:10 PM

to Keras-users, gru...@gmail.com

Fixed it

Change

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

to:

model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
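The reason this change helps is that the two losses expect different target formats: the categorical variant wants one-hot rows, while the sparse variant wants integer class indices, which is what a target of shape (samples, 1) contains. The NumPy sketch below writes both losses out by hand (these are not the Keras functions) to show that they agree once the targets are in the matching format:

```python
import numpy as np

def categorical_crossentropy(one_hot_targets, probs):
    # Expects targets of shape (samples, classes).
    return -np.sum(one_hot_targets * np.log(probs), axis=-1)

def sparse_categorical_crossentropy(int_targets, probs):
    # Expects integer class indices of shape (samples,).
    return -np.log(probs[np.arange(len(int_targets)), int_targets])

# Made-up predictions for 2 samples over 4 classes.
probs = np.array([[0.7, 0.2, 0.05, 0.05],
                  [0.1, 0.1, 0.6, 0.2]])
int_targets = np.array([0, 2])
one_hot_targets = np.eye(4)[int_targets]

dense_loss = categorical_crossentropy(one_hot_targets, probs)
sparse_loss = sparse_categorical_crossentropy(int_targets, probs)
print(dense_loss, sparse_loss)
```

The alternative would be to keep categorical_crossentropy and one-hot encode the integer labels before training.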

m.n.m...@gmail.com

Jan 27, 2017, 4:45:09 PM

to Keras-users, gru...@gmail.com

Hello gru...@gmail.com,

I am trying to solve the same problem using Keras, but I am stuck on how to handle the different lengths. I saw that you used "None" as indicated below:

>>> y0_train[0], y1_train[0], y2_train[0], y3_train[0], y4_train[0], y5_train[0]

(3, 6.0, 8.0, 2.0, None, None)

However, Keras does not seem to understand "None" and requires an integer in the range [0,10) in the architecture you used. Can you let me know how you handled this case using "None"? Did it work, and if not, what other method did you use to account for variable sequence lengths?

Regards,
Maher

gru...@gmail.com

Jan 27, 2017, 8:03:17 PM

to Keras-users, gru...@gmail.com, m.n.m...@gmail.com

Maher,

I think you are correct in asserting that Keras does not understand None, although for the purposes of this problem, having a None code to signify that no digit is present there would be ideal.

My attempted solution was to simply use another digit to signify the absence of digits.

Recall that the SVHN dataset uses the integers [1..10] to correspond to the digits 1, 2, ..., 9, 0. That being the case, we can, in theory, label "no digit" with the integer "0".

Then, if an image contains the sequence "123", which is of length 3, we would expect our list of labels to look like [3,1,2,3,0,0], meaning that the image has 3 digits, being 1, 2, and 3 in order, and that there is no 4th or 5th digit.
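A small sketch of that encoding, assuming raw SVHN labels (1-9, with 10 standing for the digit 0) and at most five digits per image. The function and constant names are made up for this example:

```python
MAX_DIGITS = 5
NO_DIGIT = 0   # label reserved for "no digit", as proposed above

def encode_svhn_sequence(svhn_labels):
    """Build [length, d1, ..., d5] from raw SVHN labels (1-9, 10 meaning digit 0)."""
    if len(svhn_labels) > MAX_DIGITS:
        raise ValueError("sequence longer than MAX_DIGITS")
    padding = [NO_DIGIT] * (MAX_DIGITS - len(svhn_labels))
    return [len(svhn_labels)] + list(svhn_labels) + padding

print(encode_svhn_sequence([1, 2, 3]))   # the "123" example from the text
print(encode_svhn_sequence([7, 10]))     # "70": the digit 0 keeps its SVHN label 10
```

Since the digit labels then run from 0 to 10, each digit output would need 11 units when they are used as integer class indices.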

For a discussion of this, see my GitHub at https://github.com/mgruben/digit-recognition, especially the svhn.ipynb, the section entitled "The Training Data", subsection "Creating and Saving".

Now, when I ran my training, I mistakenly encoded "no digit" with the integer "10", so my results are poor, since this rendered a lot of the input as nonsense (e.g. invalidated a lot of the lengths, and conflated the 0-digit label with the no-digit label).

That said, I think there may be inherent problems with this by-digit encoding, since there are only a few images in the entire dataset that have a 5th digit, meaning that it will be ~99% accurate for the 5th-digit identifier to just guess "no digit", meaning that it probably will never guess otherwise. There may be clever ways to handle this within the loss function, but I felt like I should point out that potential pitfall.
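To make the pitfall concrete, the sketch below computes the accuracy of always guessing the most common class, together with inverse-frequency class weights, which are one common way to counter such imbalance in a loss function. The label counts are synthetic and chosen only to mimic the "~99% no digit" situation, and whether weighting actually helps on SVHN is untested here:

```python
from collections import Counter

# Synthetic fifth-position labels, invented for illustration only:
# 0 means "no digit"; 990 of 1000 samples have no fifth digit.
fifth_digit_labels = [0] * 990 + [3] * 4 + [7] * 6

counts = Counter(fifth_digit_labels)
majority_class, majority_count = counts.most_common(1)[0]
baseline_accuracy = majority_count / float(len(fifth_digit_labels))

# Inverse-frequency weights: rarer classes get proportionally larger weights.
n_samples = len(fifth_digit_labels)
class_weights = {label: n_samples / float(len(counts) * count)
                 for label, count in counts.items()}

print(majority_class, baseline_accuracy)
print(class_weights)
```

A classifier has to beat the baseline figure, not 10%, before its accuracy on that position says anything about real digits.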

--mg
