Yes, lstmeval is manual but easy to automate. I use a script like this:
./train.sh $NAME 100
./train.sh $NAME 300
./train.sh $NAME 400
./train.sh $NAME 500
./train.sh $NAME 750
./train.sh $NAME 1000
./train.sh $NAME 1200
...
It runs a series of short trainings, saves each model into a folder, and runs lstmeval on it. At the end I get a report like this:
ext1-g_100: Eval Char error rate=1.4585826, Word error rate=13.347458
ext1-g_300: Eval Char error rate=0.97829078, Word error rate=8.4745763
ext1-g_400: Eval Char error rate=0.75069704, Word error rate=7.6271186
ext1-g_500: Eval Char error rate=0.68842175, Word error rate=7.2033898
ext1-g_750: Eval Char error rate=0.63577665, Word error rate=6.779661
ext1-g_1000: Eval Char error rate=0.50223788, Word error rate=5.0847458
ext1-g_1200: Eval Char error rate=0.47848338, Word error rate=5.5084746
ext1-g_1400: Eval Char error rate=0.50223788, Word error rate=5.9322034
ext1-g_1600: Eval Char error rate=0.47848338, Word error rate=5.0847458
ext1-g_1800: Eval Char error rate=0.42583829, Word error rate=4.6610169
ext1-g_2000: Eval Char error rate=0.4264803, Word error rate=4.2372881
ext1-g_2250: Eval Char error rate=0.44124661, Word error rate=5.0847458
ext1-g_2500: Eval Char error rate=0.42134419, Word error rate=4.2372881
ext1-g_3000: Eval Char error rate=0.42583829, Word error rate=3.9548023
ext1-g_3500: Eval Char error rate=0.3545748, Word error rate=2.9661017
ext1-g_4000: Eval Char error rate=0.42070218, Word error rate=2.9661017
ext1-g_4500: Eval Char error rate=0.38218138, Word error rate=2.9661017
ext1-g_5000: Eval Char error rate=0.42070218, Word error rate=3.3898305
ext1-g_5500: Eval Char error rate=0.37768728, Word error rate=2.1186441
ext1-g_6000: Eval Char error rate=0.38731748, Word error rate=2.5423729
ext1-g_6500: Eval Char error rate=0.34879668, Word error rate=2.1186441
ext1-g_7000: Eval Char error rate=0.40529386, Word error rate=2.6836158
and I can choose which model to use. Here I would pick the 3500 or the 6500 model; I usually prefer an earlier one so as not to risk overfitting. I could also train a little longer (8000, 9000, ...) to see if it improves further, but it is already oscillating around a certain value.
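Picking the winner from such a report can itself be scripted. A minimal sketch, assuming the report lines have exactly the format shown above (the function name and the report file are my own choices, not part of any tool):

```shell
#!/bin/sh
# Print the name of the model with the lowest character error rate from a
# report whose lines look like:
#   ext1-g_3500: Eval Char error rate=0.3545748, Word error rate=2.9661017
best_model() {
  # Reduce each line to "name cer", sort numerically by cer, keep the best.
  sed 's/: Eval Char error rate=/ /; s/,.*//' "$1" |
    sort -k2 -g | head -n1 | cut -d' ' -f1
}
```

On the report above this would print ext1-g_6500, the checkpoint with the lowest character error rate.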
One note: the evaluation score is only a reference unless you have a lot of real-world data. If you are using synthetic data, it will likely differ from real-world data, so it is important not to overfit to it.
You can improve the script by adding a loop that stops when the improvement over the best result stays below a threshold for a few evaluations. I found no real advantage in doing this, as the training is quite fast and I have no problem letting it run while I do something else.
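That stopping check could look something like this sketch (the helper name, threshold, and patience values are my own illustrative choices, not part of Tesseract): it reads one character error rate per evaluation and signals a stop once the best value has not improved by more than the threshold for a few rounds in a row.

```shell
#!/bin/sh
# Read one CER per line on stdin; print "stop" once the best CER has not
# improved by more than thr for patience consecutive evaluations,
# otherwise print "continue". thr and patience are arbitrary examples.
should_stop() {
  awk -v thr=0.01 -v patience=3 '
    {
      if (best == "" || best - $1 > thr) { best = $1; since = 0 }
      else since++
    }
    END { print (since >= patience ? "stop" : "continue") }'
}
```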
Lorenzo