The printed loss value never goes down. That's the issue I mentioned at the top of my first reply: the reported number is a running tally since the start of the `train()` call, not a per-epoch value.
So the numbers that actually matter are the differences between consecutive printed values. Those per-epoch deltas would be the more informative thing to print; here I've calculated them in a spreadsheet:
| Printed Loss | Last Epoch Loss |
|-------------:|----------------:|
|        71964 |           71964 |
|       109095 |           37131 |
|       167496 |           58401 |
|       234446 |           66950 |
|       300772 |           66327 |
|       367244 |           66472 |
|       433101 |           65857 |
|       488015 |           54914 |
|       553470 |           65455 |
|       621009 |           67540 |
|       686966 |           65957 |
|       744763 |           57797 |
|       811454 |           66691 |
|       875197 |           63743 |
|       940066 |           64869 |
|      1004814 |           64748 |
|      1068990 |           64176 |
|      1133891 |           64900 |
|      1196838 |           62948 |
|      1262514 |           65676 |
|      1328977 |           66463 |
Those are quite strange: rather than improving for a number of epochs, they really only improve on the 2nd epoch, then just jitter within a tight range. That pattern is more typical near the end of training, and suggests the model has already learned as much as it can from the data.
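For reference, the per-epoch subtraction doesn't need a spreadsheet; the same calculation in Python, with the first few cumulative values from the table above hardcoded, looks like:

```python
# Cumulative losses as printed after each epoch (first 4 values above).
cumulative = [71964, 109095, 167496, 234446]

# Each epoch's loss is the difference between consecutive cumulative values.
per_epoch = [cur - prev for prev, cur in zip([0] + cumulative, cumulative)]
print(per_epoch)  # [71964, 37131, 58401, 66950]
```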
Are you sure this output is from the metaparameters and training code you showed earlier?
Are you sure your `GetSentences()` code is working properly, providing text that can be learned from?
For example, what does the following print after your `sentences` is defined:
```python
print(sum(1 for _ in sentences))  # total count of training examples
first = next(iter(sentences))     # get the 1st item
print(len(first))                 # 1st item's length in words
print(first[0:3])                 # 1st item's 1st 3 words
```
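If those diagnostics look fine, you could also report the per-epoch (rather than cumulative) loss directly during training with an epoch-end callback. This is just a sketch: in real gensim use you'd subclass `gensim.models.callbacks.CallbackAny2Vec` and pass an instance via the `callbacks` parameter (with `compute_loss=True`); here it's a plain class so the delta logic is visible on its own:

```python
# Sketch of an epoch-level loss reporter. In real use, subclass
# gensim.models.callbacks.CallbackAny2Vec and pass an instance, e.g.:
#   Word2Vec(sentences, compute_loss=True, callbacks=[EpochLossLogger()])
class EpochLossLogger:
    def __init__(self):
        self.previous_total = 0.0
        self.last_epoch_loss = None

    def on_epoch_end(self, model):
        # get_latest_training_loss() is cumulative since train() began,
        # so subtract the previous total to get this epoch's loss.
        total = model.get_latest_training_loss()
        self.last_epoch_loss = total - self.previous_total
        self.previous_total = total
        print(f"epoch loss: {self.last_epoch_loss}")
```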
As a separate note, none of the total or per-epoch loss numbers actually seem 'huge' to me: each is a tally over all examples in a large dataset. If you train a model on a larger dataset, this kind of summed loss will go even higher, and will still be higher when it reaches its best value, than on a smaller dataset, even if the final model is better, simply because more examples have been tallied together to get the number.
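To put runs on different dataset sizes on a comparable scale, you can divide the summed loss by the number of examples. A minimal sketch (the numbers here are made up purely for illustration):

```python
# Hypothetical runs: the larger run's summed loss is 10x bigger only
# because its dataset has 10x as many examples, not because it's worse.
small_run = {"epoch_loss": 64_000.0, "n_examples": 10_000}
large_run = {"epoch_loss": 640_000.0, "n_examples": 100_000}

for run in (small_run, large_run):
    # Per-example loss puts both runs on the same scale.
    print(run["epoch_loss"] / run["n_examples"])  # 6.4 both times
```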
- Gordon