lstmeval does not perform eval

Usamah Jundi

Apr 15, 2020, 6:47:09 AM

to tesseract-ocr

Hi, sorry for the brief title. Let me explain my situation.

Here is what I've done:

1. Used tesstrain.sh to generate the training files (the .lstmf, .txt and that one other format)

2. Fine-tuned the best model on those training files

3. Ran lstmeval using one of the resulting checkpoints on the same files I trained on

Expected result:

- The program loads every file listed in the .txt generated in (1) and performs validation, printing the Truth and OCR text of every sample.

What I got:

- The program prints the file names listed in the .txt generated in (1) but does not appear to perform validation. No Truth and OCR results are printed, which seems to indicate that it fails to read the lines.

Things to note:

- Other checkpoints trained on the same data do not work either.

- It works when I change the data list (to one I did not use for training) or use the .lstm file extracted from an off-the-shelf .traineddata file.

Is this intended behaviour, or am I doing something wrong?
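
For readers trying to reproduce the setup, the three steps above might look roughly like the following. This is only a sketch: the thread does not show the actual commands, so every path, the language code, the iteration count, and the list-file name are hypothetical placeholders. The script only builds and prints the command lines; it does not run any Tesseract tool.

```python
import shlex

# Hypothetical placeholders; none of these values come from the thread.
LANG = "eng"
TRAIN_DIR = "train_data"                                   # tesstrain.sh output directory
LIST_FILE = f"{TRAIN_DIR}/{LANG}.training_files.txt"       # assumed name of the generated .txt list
BEST_TRAINEDDATA = f"tessdata_best/{LANG}.traineddata"     # off-the-shelf "best" model
OUT_DIR = "finetuned"


def build_commands():
    """Return one argv list per step, without executing anything."""
    return {
        # Step 1: generate .lstmf line data and the list file.
        "generate": [
            "tesstrain.sh", "--lang", LANG, "--linedata_only",
            "--noextract_font_properties",
            "--langdata_dir", "langdata",
            "--tessdata_dir", "tessdata_best",
            "--output_dir", TRAIN_DIR,
        ],
        # Extract the LSTM network from the off-the-shelf model.
        "extract": [
            "combine_tessdata", "-e", BEST_TRAINEDDATA,
            f"{OUT_DIR}/{LANG}.lstm",
        ],
        # Step 2: fine-tune, continuing from the extracted network.
        "finetune": [
            "lstmtraining",
            "--continue_from", f"{OUT_DIR}/{LANG}.lstm",
            "--model_output", f"{OUT_DIR}/ft",
            "--traineddata", BEST_TRAINEDDATA,
            "--train_listfile", LIST_FILE,
            "--max_iterations", "400",
        ],
        # Step 3: evaluate a checkpoint on the SAME list used for training.
        "evaluate": [
            "lstmeval",
            "--model", f"{OUT_DIR}/ft_checkpoint",
            "--traineddata", BEST_TRAINEDDATA,
            "--eval_listfile", LIST_FILE,
        ],
    }


for name, argv in build_commands().items():
    print(f"{name}: {shlex.join(argv)}")
```

The point to notice is that the `finetune` and `evaluate` steps share the same list file, which is the situation described in this post.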

Shree Devi Kumar

Apr 15, 2020, 7:53:19 AM

to tesseract-ocr

lstmeval has different verbosity levels. Which one did you use?

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/4b75e0be-1bdc-4c1c-a108-8055aefc737b%40googlegroups.com.

--

____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

Usamah Jundi

Apr 15, 2020, 10:11:59 PM

to tesseract-ocr

Hello Shree, I did not specify any verbosity level. The exact same command works fine when the eval list argument points to other lists.

Shree Devi Kumar

Apr 16, 2020, 4:43:22 AM

to tesseract-ocr

--verbosity Amount of diagnostic information to output (0-2). (type:int default:1)

Try verbosity 2; it should show each line even when the ground truth and the OCR text are the same.

I think your model is correctly recognizing all the lines it was trained on, but not the eval lines.

It should also show you the accuracy percentage at the end.
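
As an illustration of the suggestion above, here is a small sketch that builds an lstmeval command line with the verbosity raised to 2. The helper name `lstmeval_argv` and all file paths are hypothetical; the flags `--model`, `--traineddata`, `--eval_listfile` and `--verbosity` are the ones lstmeval documents. The script only prints the command and does not run it.

```python
import shlex


def lstmeval_argv(model, traineddata, eval_listfile, verbosity=1):
    """Build an lstmeval argv list (hypothetical helper, not part of Tesseract)."""
    # The help text quoted above gives the valid range as 0-2 with a default of 1.
    if verbosity not in (0, 1, 2):
        raise ValueError("verbosity must be 0, 1, or 2")
    return [
        "lstmeval",
        "--model", model,
        "--traineddata", traineddata,
        "--eval_listfile", eval_listfile,
        "--verbosity", str(verbosity),
    ]


# Placeholder paths, chosen only for illustration.
argv = lstmeval_argv(
    "finetuned/ft_checkpoint",
    "tessdata_best/eng.traineddata",
    "train_data/eng.training_files.txt",
    verbosity=2,
)
print(shlex.join(argv))
```

With the default verbosity, an evaluation in which every line is recognized correctly could print little beyond the file names, which would match the behaviour reported at the start of the thread.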

I am assuming that the model is correctly recognizing the lines it was tr

Usamah Jundi

Apr 16, 2020, 5:57:57 AM

to tesser...@googlegroups.com

Thank you, Shree, for the answers. It was actually a misunderstanding on my part. I thought there was no way it would reach 0 error when the error in the training log doesn't even reach 0. Where does this disparity come from?
