Lstmeval results

Augustin Fourcaud

Jun 22, 2023, 8:38:51 AM

to tesseract-ocr

Hello, I fine-tuned the eng.traineddata model because I wanted it to recognize the Greek lambda symbol (λ). I got the ocrlambda.traineddata file and I want to evaluate it using lstmeval.

When I evaluate a checkpoint file with the --traineddata parameter set to eng.traineddata, I get terrible results, with this error on every iteration where the lambda appears:

************************

Encoding of string failed! Failure bytes: ce bb 4d 4e 44
Can't encode transcription: 'LY O kcXλMND' in language ''
Truth:8Az7V I vUOs
OCR :i g8 A z. 7 V I vlU O0 s.l
Line BCER=1.000000, BWER=0.666667

*************************
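
As a side check, the failure bytes in the message can be decoded to see which character the encoder rejected. This is a minimal sketch, assuming the bytes are UTF-8; the variable names are only illustrative.

```python
import unicodedata

# Values copied from the lstmeval error message above.
failure_bytes = bytes.fromhex("ce bb 4d 4e 44")
transcription = "LY O kcX\u03bbMND"  # 'LY O kcXλMND'

# Assumption: the failure bytes are UTF-8 encoded text.
decoded = failure_bytes.decode("utf-8")
first_char = decoded[0]

print(decoded)                                     # λMND
print(f"U+{ord(first_char):04X}")                  # U+03BB
print(unicodedata.name(first_char))                # GREEK SMALL LETTER LAMDA
print(transcription.endswith(decoded))             # True
```

The bytes decode to the tail of the transcription, starting at the lambda. That would be consistent with the lambda not being part of the character set in the traineddata file used for this evaluation, although the message alone does not prove it.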

But when I run it with the --traineddata parameter set to ocrlambda.traineddata, I get really good results.

But in the documentation I saw that the --traineddata parameter should be set to the file that was given to the trainer.
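
For reference, the two evaluations being compared can be written out as a small sketch. The flags --model, --traineddata and --eval_listfile are lstmeval's own; the checkpoint and list-file paths are hypothetical placeholders, not the actual paths from this setup.

```python
import shlex


def build_lstmeval_cmd(model, traineddata, eval_listfile):
    """Return the lstmeval argument list for one evaluation run."""
    return [
        "lstmeval",
        "--model", model,
        "--traineddata", traineddata,
        "--eval_listfile", eval_listfile,
    ]


# Hypothetical paths, for illustration only.
checkpoint = "checkpoints/ocrlambda_checkpoint"
eval_list = "eval/list.eval"

for traineddata in ("eng.traineddata", "ocrlambda.traineddata"):
    cmd = build_lstmeval_cmd(checkpoint, traineddata, eval_list)
    # Print the command; pass `cmd` to subprocess.run to actually execute it.
    print(shlex.join(cmd))
```

Only the --traineddata value differs between the two runs, so any difference in the reported error rates comes from that file.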

Is this an error, or am I misunderstanding or doing something wrong?

Thanks a lot.

Augustin Fourcaud

Jun 26, 2023, 10:07:05 AM

to tesseract-ocr

Do you need more information about my installation and my training command? Or should I just evaluate using the --traineddata parameter?