Keras - SGD learning rate decay is not clear to me


André L

Feb 9, 2017, 8:39:00 AM
to Keras-users

According to the code at https://github.com/fchollet/keras/blob/master/keras/optimizers.py#L113:


lr = self.lr
if self.initial_decay > 0:
    lr *= (1. / (1. + self.decay * self.iterations))
    self.updates.append(K.update_add(self.iterations, 1))





I have a framework that offers both Keras and Lasagne/Theano options to the user, for the sake of reproducibility.
So I wanted to add an option that uses the same learning rate schedule that Keras does.


I tried the code below, but I think it's incorrect. Would someone clarify?

if decay > 0:
    lr = shared_learning_rate.get_value() * (1. / (1. + decay * epoch))
    shared_learning_rate.set_value(np.float32(lr))

Klemen Grm

Feb 10, 2017, 3:59:37 AM
to Keras-users
The optimizer's "self.iterations" variable is incremented once per training mini-batch, whereas the "epoch" variable in your formula refers to the index of the training epoch.
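
For illustration, a minimal runnable sketch of how far the two schedules drift apart (the values base_lr = 0.01, decay = 1e-6 and 500 batches per epoch are made-up examples, not anything prescribed by Keras):

base_lr, decay, batches_per_epoch = 0.01, 1e-6, 500

for epoch in range(0, 31, 10):
    iterations = epoch * batches_per_epoch                      # global mini-batch counter
    lr_per_batch = base_lr * (1. / (1. + decay * iterations))   # decay driven by the mini-batch count, as in Keras
    lr_per_epoch = base_lr * (1. / (1. + decay * epoch))        # decay driven by the epoch index, as in the snippet above
    print(epoch, lr_per_batch, lr_per_epoch)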

André L

Feb 10, 2017, 6:09:58 AM
to Keras-users
Thanks for the reply. I made some changes... Would you please check it out? :)

By the way, I'm using the same decay parameter as in Keras, so I'm passing 1e-6 as the decay.

Here's a bigger snippet of the code:

print("Starting training...")

training_start_time = time.clock()

total_train_batches = 0

for epoch in range(0, max_epochs):

    # In each epoch, do a full pass over the training data:
    train_err = 0
    train_batches = 0
    start_time = time.time()
    for batch in self.minibatch_iterator(X, Y, self.batch_size):
        inputs, targets = batch
        train_err += self.train_fn(inputs, targets)
        train_batches += 1
    total_train_batches = total_train_batches + train_batches

    # And a full pass over the validation data:
    val_err = 0
    val_acc = 0
    val_batches = 0
    for batch in self.minibatch_iterator(X_val, Y_val, self.batch_size):
        inputs, targets = batch
        err, acc = self.val_fn(inputs, targets)
        val_err += err
        val_acc += acc
        val_batches += 1

    # Calculate results
    trainingLoss = (train_err / train_batches)
    validationLoss = (val_err / val_batches)
    validationAccuracy = (val_acc / val_batches * 100)

    # Then print the results for this epoch:
    print("Epoch {} of {} took {:.3f}s".format(epoch + 1, max_epochs, time.time() - start_time))
    print("  Training Loss:\t\t{:.6f}".format(trainingLoss))
    print("  Validation Loss:\t\t{:.6f}".format(validationLoss))
    print("  Validation Accuracy:\t\t{:.4f} %".format(validationAccuracy))

    # Apply the decay once per epoch, based on the total number of mini-batches seen so far:
    if decay > 0:
        lr = shared_learning_rate.get_value() * (1. / (1. + decay * total_train_batches))
        shared_learning_rate.set_value(np.float32(lr))

    print("...Current Learning Rate : {}".format(np.float32(shared_learning_rate.get_value())))

Klemen Grm

Feb 10, 2017, 6:14:58 AM
to Keras-users
You're calculating the lr decay correctly now, but you only update it once per epoch, whereas in Keras the decay is applied on every mini-batch.

André L

Feb 10, 2017, 6:19:59 AM
to Keras-users
Like this?

# Finally, launch the training loop.
print("Starting training...")

training_start_time = time.clock()

total_train_batches = 0

for epoch in range(0, max_epochs):

    # In each epoch, do a full pass over the training data:
    train_err = 0
    train_batches = 0
    start_time = time.time()
    for batch in self.minibatch_iterator(X, Y, self.batch_size):
        inputs, targets = batch
        train_err += self.train_fn(inputs, targets)
        train_batches += 1
        total_train_batches = total_train_batches + 1   # now counted per mini-batch

Klemen Grm

Feb 10, 2017, 6:26:33 AM
to Keras-users
That looks right.

André L

Feb 10, 2017, 6:28:56 AM
to Keras-users
Thanks for the help :)

Do you know what this learning rate decay is based on?

André L

Feb 10, 2017, 9:00:30 AM
to Keras-users
Are you sure the decay is applied at each mini-batch?
I'm seeing the learning rate decay too fast, even with a decay of 1e-6...


muni...@gmail.com

Jul 24, 2017, 9:48:22 PM
to Keras-users
Hi, @André L

Did you find the answer? I have the same question. In my program, one epoch has 800 mini-batches, so if the learning rate decays per mini-batch, it decays very fast even with decay = 1e-6.
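
As a rough sanity check of the scale (assuming 800 mini-batches per epoch and decay = 1e-6; the base learning rate of 0.01 is just a placeholder):

base_lr, decay, batches_per_epoch = 0.01, 1e-6, 800

for epoch in (1, 10, 50, 100):
    iterations = epoch * batches_per_epoch
    # learning rate after `iterations` parameter updates under the 1/(1 + decay*t) schedule
    print(epoch, base_lr / (1. + decay * iterations))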


On Friday, February 10, 2017 at 9:00:30 AM UTC-5, André L wrote:

François Chollet

Jul 25, 2017, 12:20:53 AM
to muni...@gmail.com, Keras-users
You can use a lower decay rate, or if you want a specific per-epoch schedule, use the LearningRateScheduler callback.
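
For example, a minimal sketch of the callback route (the schedule shape, the decay_per_epoch value, and the model / X_train / Y_train names are placeholders for this illustration, not anything prescribed by Keras):

from keras.callbacks import LearningRateScheduler

initial_lr = 0.01
decay_per_epoch = 0.1  # hypothetical per-epoch decay factor

def schedule(epoch):
    # 1/t-style decay, evaluated once per epoch instead of once per mini-batch
    return initial_lr / (1. + decay_per_epoch * epoch)

model.fit(X_train, Y_train, epochs=20,
          callbacks=[LearningRateScheduler(schedule)])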


Chong Wang

Jul 26, 2017, 6:28:05 PM
to François Chollet, Keras-users
I am using train_on_batch() instead of fit(), so I assume I cannot use a callback. How can I do step decay and change the learning rate between epochs? Should I do "model.optimizer.lr.assign(0.01)"?
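
One pattern that is often used with train_on_batch() is to set the optimizer's learning-rate variable directly through the backend with K.set_value. The sketch below is only an illustration: it assumes the optimizer was built with decay=0, and model, initial_lr, num_epochs, batch_size and get_minibatches are placeholders:

from keras import backend as K

for epoch in range(num_epochs):
    # hypothetical step decay: halve the learning rate every 10 epochs
    K.set_value(model.optimizer.lr, initial_lr * (0.5 ** (epoch // 10)))
    for X_batch, Y_batch in get_minibatches(X_train, Y_train, batch_size):
        model.train_on_batch(X_batch, Y_batch)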