loss exploding at each epoch start


Taebin Ha

May 9, 2018, 10:43:44 AM
to DyNet Users
I'm new to DyNet. I'm implementing a dependency parser with it and updating the parameters after every sentence.

I have about 60,000 training sentences. When I train on all of the sentences for one epoch it seems to work well, but when I train over the whole corpus for multiple epochs, the loss reported for the first sentence is very large at the start of every epoch after the first.

My architecture is a single bi-directional LSTM layer feeding an MLP: the MLP input layer takes the bi-LSTM output, followed by an MLP hidden layer and an MLP output layer.

The results are below, and my code is here.

Should I rebuild the LSTM architecture at every epoch? I'm confused about where I made a mistake. (I expected the loss after the first sentence of epoch 2 to be lower than after the 60,000th sentence of epoch 1.)

Please help and advise me. Thanks.

C:\Python36\python.exe C:/Users/Unix/PycharmProjects/dp-bi-lstm_dynet/main.py
[dynet] random seed: 2290627514
[dynet] allocating memory: 512MB
[dynet] memory allocation done.
corpus load complete
average loss after 1 epoch, 1 sentence is: 17.636943817138672
average loss after 1 epoch, 2000 sentence is: 10.381901719802292
average loss after 1 epoch, 4000 sentence is: 9.691037579214026
average loss after 1 epoch, 6000 sentence is: 9.34041221818515
average loss after 1 epoch, 8000 sentence is: 9.174292319717788
average loss after 1 epoch, 10000 sentence is: 8.982679172636592
average loss after 1 epoch, 12000 sentence is: 8.862404185863815
average loss after 1 epoch, 14000 sentence is: 8.693763274312326
average loss after 1 epoch, 16000 sentence is: 8.59225816030492
average loss after 1 epoch, 18000 sentence is: 8.53699938847947
average loss after 1 epoch, 20000 sentence is: 8.486847541568116
average loss after 1 epoch, 22000 sentence is: 8.424586500192984
average loss after 1 epoch, 24000 sentence is: 8.332845027186082
average loss after 1 epoch, 26000 sentence is: 8.295914618985405
average loss after 1 epoch, 28000 sentence is: 8.242919974412992
average loss after 1 epoch, 30000 sentence is: 8.193468335582656
average loss after 1 epoch, 32000 sentence is: 8.15046729143771
average loss after 1 epoch, 34000 sentence is: 8.092765799846925
average loss after 1 epoch, 36000 sentence is: 8.053647511591553
average loss after 1 epoch, 38000 sentence is: 8.02021125482164
average loss after 1 epoch, 40000 sentence is: 7.982615008230304
average loss after 1 epoch, 42000 sentence is: 7.942969187132986
average loss after 1 epoch, 44000 sentence is: 7.903952183374621
average loss after 1 epoch, 46000 sentence is: 7.87706035587622
average loss after 1 epoch, 48000 sentence is: 7.842469107484902
average loss after 1 epoch, 50000 sentence is: 7.812751507776459
average loss after 1 epoch, 52000 sentence is: 7.791445268832929
average loss after 1 epoch, 54000 sentence is: 7.75892588466848
average loss after 1 epoch, 56000 sentence is: 7.739763741061106
total training time: 8197.93693971634 seconds
average loss after 2 epoch, 1 sentence is: 440044.837590178
average loss after 2 epoch, 2000 sentence is: 226.91213007738756
average loss after 2 epoch, 4000 sentence is: 116.79550837108651
average loss after 2 epoch, 6000 sentence is: 80.09304963260531
average loss after 2 epoch, 8000 sentence is: 61.79701959766746
average loss after 2 epoch, 10000 sentence is: 50.777645399660614
average loss after 2 epoch, 12000 sentence is: 43.442065049668564
average loss after 2 epoch, 14000 sentence is: 38.155572901612906
average loss after 2 epoch, 16000 sentence is: 34.21518854491877
average loss after 2 epoch, 18000 sentence is: 31.184143254367957
average loss after 2 epoch, 20000 sentence is: 28.759303646917814
average loss after 2 epoch, 22000 sentence is: 26.762809004899083
average loss after 2 epoch, 24000 sentence is: 25.063089194166267
average loss after 2 epoch, 26000 sentence is: 23.667318932765042
average loss after 2 epoch, 28000 sentence is: 22.453625818527744
average loss after 2 epoch, 30000 sentence is: 21.40158775735465


Miguel Ballesteros

May 13, 2018, 10:39:40 PM
to Taebin Ha, DyNet Users
It would be helpful if you showed what your code looks like where you finish an epoch, i.e. where you print "total training time: 8197.93693971634 seconds".



Taebin Ha

May 14, 2018, 5:29:05 AM
to DyNet Users
My code computes a loss for every sentence and trains over all the sentences for 100 epochs.
The part of the code that computes the loss is below.
 
m = dy.ParameterCollection()
mlp_W = m.add_parameters((mlp_hidden, mlp_input_dim))   # 1000 x 100
mlp_W2 = m.add_parameters((mlp_out, mlp_hidden))
mlp_b = m.add_parameters((100))
mlp_b2 = m.add_parameters((1))

final_mlp = m.add_parameters((1, 1000))

total_loss = 0

lookup = m.add_lookup_parameters((max_word_len, 200))
trainer = dy.AdagradTrainer(m)

builders = [
    dy.LSTMBuilder(1, init_input_dim, lstm_hidden, m),
    dy.LSTMBuilder(1, init_input_dim, lstm_hidden, m)
]

def create_lstm_network(inputs, y_output, index_arr, builders):
    # per sentence
    dy.renew_cg(immediate_compute=True, check_validity=True)
    seq_len = len(inputs)
    f_init, b_init = [b.initial_state() for b in builders]

    for w_index in range(seq_len):
        lookup.init_row(w_index, inputs[w_index])

    w_embs = [dy.nobackprop(lookup[i]) for i in range(seq_len)]

    fw = [x.output() for x in f_init.add_inputs(w_embs)]
    bw = [x.output() for x in b_init.add_inputs(reversed(w_embs))]

    bi = [dy.concatenate([f, b]) for f, b in zip(fw, reversed(bw))]

    p_mlp_W = dy.parameter(mlp_W)
    p_mlp_W2 = dy.parameter(mlp_W2)
    p_mlp_b = dy.parameter(mlp_b)
    p_mlp_b2 = dy.parameter(mlp_b2)

    errs = []
    for index_element, y in zip(index_arr, y_output):
        final_mlp_input = np.zeros(0)
        for index in index_element:
            if index == -1:
                final_mlp_input = np.append(final_mlp_input, mlp_empty_vec)
            else:
                final_mlp_input = np.append(final_mlp_input, bi[index].value())
        final_mlp = dy.vecInput(len(final_mlp_input))
        final_mlp.set(final_mlp_input)
        y_hat = dy.logistic(p_mlp_W2 * dy.tanh(p_mlp_W * final_mlp + p_mlp_b) + p_mlp_b2)
        # err = dy.pickneglogsoftmax(y_hat, dy.scalarInput(y))
        err = dy.binary_log_loss(err_y_hat := y_hat, dy.scalarInput(y))
        errs.append(err)

    return dy.esum(errs)

start = time.time()
for epoch in range(1, epochs + 1):
    total_sen_num = 1
    for sentence in sentences:
        X, Y, index_arr = create_training_instance(sentence)

        loss = create_lstm_network(X, Y, index_arr, builders)
        total_loss += loss.value()
        loss.backward()
        trainer.update()
        if total_sen_num == 1 or total_sen_num % 2000 == 0:
            print("average loss after {} epoch, {} sentence is: {}".format(epoch, total_sen_num, total_loss / total_sen_num))

        total_sen_num += 1

    print("total training time: {} seconds".format(time.time() - start))




Brian Lester

Jun 7, 2018, 2:33:19 PM
to DyNet Users
You never reset your `total_loss` the way you reset `total_sen_num`, so when you hit a new epoch you print the accumulated total_loss / 1.
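
In other words, at the first print of a new epoch, total_loss still holds the sum over every previous sentence. The epoch-1 log above implies a running sum on the order of 7.74 × 56,000 ≈ 4.3 × 10^5, the same order as the 440044.8 printed for epoch 2, sentence 1; likewise 440045 / 2000 ≈ 220 plus a true per-sentence loss of about 7 matches the 226.9 printed at sentence 2000. A minimal sketch of the fix, applied to the training loop from the earlier post (the only change is the reset line):

start = time.time()
for epoch in range(1, epochs + 1):
    total_loss = 0       # reset the running sum here, alongside total_sen_num
    total_sen_num = 1
    for sentence in sentences:
        X, Y, index_arr = create_training_instance(sentence)
        loss = create_lstm_network(X, Y, index_arr, builders)
        total_loss += loss.value()
        loss.backward()
        trainer.update()
        if total_sen_num == 1 or total_sen_num % 2000 == 0:
            print("average loss after {} epoch, {} sentence is: {}".format(
                epoch, total_sen_num, total_loss / total_sen_num))
        total_sen_num += 1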