On Mon, 26 Jul 2021 at 06:59, Inductiveload <
induct...@gmail.com> wrote:
> What is the correct way to generate the required data for running lstmeval manually in this case?
I did actually figure this out in the end, so in case anyone else in
future is as dumb as me, and to avoid anyone trying to answer a solved
problem, here's my solution (cross-posted at Stack Overflow [2]).
You can generate the .lstmf files needed for the evaluation like this,
assuming the evaluation ground truth is in
tesstrain/data/eval-ground-truth:
cd tesstrain
make lists MODEL_NAME=eval
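Before running `make lists`, it can save a confusing failure later to check that every line image has a matching transcription. Here is a minimal sketch of such a check; it runs on a synthetic directory for illustration (the `.png`/`.gt.txt` pairing follows tesstrain conventions, but point GT_DIR at tesstrain/data/eval-ground-truth for real use):

```shell
# Sanity-check that every line image in the eval ground-truth directory
# has a matching .gt.txt transcription before running `make lists`.
# Synthetic directory for illustration; line-002.png is deliberately unpaired.
GT_DIR=$(mktemp -d)
touch "$GT_DIR/line-001.png" "$GT_DIR/line-001.gt.txt" "$GT_DIR/line-002.png"

missing=0
for img in "$GT_DIR"/*.png; do
  gt="${img%.png}.gt.txt"
  if [ ! -f "$gt" ]; then
    echo "missing transcription: $gt"
    missing=$((missing + 1))
  fi
done
echo "$missing missing transcription(s)"
```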
This will generate a file data/eval/all-lstmf, which contains a list
of all the .lstmf files generated. The file data/eval/list.eval
contains only a subset, as the ground-truth corpus is partitioned into
training and evaluation sets (according to RATIO_TRAIN).
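To make the RATIO_TRAIN split concrete, here is a rough sketch of the partitioning: the list of .lstmf files is cut so that a RATIO_TRAIN fraction goes to list.train and the remainder to list.eval. This runs on a synthetic ten-entry list (RATIO_TRAIN=0.90 is the tesstrain default; tesstrain's own Makefile logic may differ in detail, e.g. ordering):

```shell
# Illustrative split of an .lstmf list into train/eval parts.
# Synthetic stand-in for data/eval/all-lstmf: ten .lstmf paths.
workdir=$(mktemp -d)
ALL="$workdir/all-lstmf"
for i in $(seq 1 10); do
  echo "data/eval-ground-truth/line$i.lstmf"
done > "$ALL"

RATIO_TRAIN=0.90   # tesstrain default
total=$(wc -l < "$ALL")
ntrain=$(awk -v t="$total" -v r="$RATIO_TRAIN" 'BEGIN { printf "%d", t * r }')
head -n "$ntrain" "$ALL" > "$workdir/list.train"       # first 90% -> training
tail -n +"$((ntrain + 1))" "$ALL" > "$workdir/list.eval"  # rest -> evaluation
wc -l "$workdir/list.train" "$workdir/list.eval"
```

Since lstmeval is pointed at all-lstmf above rather than list.eval, the whole corpus is evaluated, which is what you want when the eval ground truth is entirely separate from the training data.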
You can then run lstmeval:
lstmeval \
--model data/your_model.traineddata \
--eval_listfile data/eval/all-lstmf
This produces something like the following (the mistake below was
deliberately introduced into the ground truth of one .gt.txt file to
provoke an error for demonstration purposes):
Warning: LSTMTrainer deserialized an LSTMRecognizer!
Truth:TThoſe hypocrites that live amongſt us,
OCR :Those hypocrites that live amongst us,
At iteration 0, stage 0, Eval Char error rate=1.282051, Word error rate=8.333333
If there are no errors (as was the case here before the mistake was
introduced), the output looks like:
Warning: LSTMTrainer deserialized an LSTMRecognizer!
At iteration 0, stage 0, Eval Char error rate=0.000000, Word error rate=0.000000
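If you want to track these numbers across training runs, the error rates can be pulled out of a captured lstmeval log with a small amount of shell. A hedged sketch, assuming the log line has the exact format shown above:

```shell
# Extract char/word error rates from an lstmeval summary line.
# The sample line is copied from the output above.
log='At iteration 0, stage 0, Eval Char error rate=1.282051, Word error rate=8.333333'
cer=$(echo "$log" | sed -n 's/.*Char error rate=\([0-9.]*\).*/\1/p')
wer=$(echo "$log" | sed -n 's/.*Word error rate=\([0-9.]*\).*/\1/p')
echo "CER=$cer WER=$wer"
# prints: CER=1.282051 WER=8.333333
```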
Cheers!
[2]
https://stackoverflow.com/questions/68523440/evaluation-of-a-trained-on-generated-images-tesseract-4-lstm-model-against-real