Dictionary?


Des Bw

Nov 19, 2023, 12:37:56 PM
to tesseract-ocr
Does Tesseract actually use the dictionary (word list) included in the model (traineddata file)?

- I see no difference in the output when I include a dictionary (word list) in the file.

Has anybody experimented with a dictionary setup?

Zdenko Podobny

Nov 19, 2023, 1:15:42 PM
to tesser...@googlegroups.com
AFAIR there were tests with the legacy engine in which dictionaries were measured to improve result quality by 10-15% for common text.
However, adding a word to a dictionary has never guaranteed that Tesseract recognizes that word correctly.
For non-word input (e.g. serial numbers), the advice was always to turn dictionaries off.
IMO the results depend on the input image quality (for good image quality there seems to be no effect). If you need more details or experiences, dig into the history of this forum (especially the period after the first release of version 3).

I have never heard of anybody running such a test for the LSTM engine.

Zdenko
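
For reference, here is a minimal sketch of how dictionaries are usually turned off on the command line, wrapped in Python. It assumes the config variables `load_system_dawg` and `load_freq_dawg` control dictionary loading in your build (check `tesseract --print-parameters`); the helper names `tesseract_argv` and `run_ocr` are made up for illustration.

```python
import subprocess

# Assumption: these config variables disable the main word list and the
# frequent-words list. Verify with `tesseract --print-parameters`.
DICT_OFF = ["-c", "load_system_dawg=0", "-c", "load_freq_dawg=0"]


def tesseract_argv(image, out_base, lang="eng", use_dictionary=True):
    """Build the tesseract command line (hypothetical helper)."""
    argv = ["tesseract", image, out_base, "-l", lang]
    if not use_dictionary:
        argv += DICT_OFF
    return argv


def run_ocr(image, out_base, **kwargs):
    """Run tesseract and return the recognized text from <out_base>.txt."""
    subprocess.run(tesseract_argv(image, out_base, **kwargs), check=True)
    with open(out_base + ".txt", encoding="utf-8") as f:
        return f.read()
```

Calling `run_ocr("page.png", "out_nodict", use_dictionary=False)` would then give the text recognized without the word lists, which can be compared against a default run.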



Des Bw

Nov 19, 2023, 1:39:49 PM
to tesseract-ocr
That is very interesting. I was expecting the dictionary to have a significant impact on the output, but I am getting no impact at all. Yes, my images are pretty good: a regular scanned book (300 dpi), and I am on Tesseract 5. Sure, I will dig into this forum and also keep experimenting.

If my results are consistent, I will report back. We might need to update our assumptions (and the wiki).

Thank you for the clarification, dear Zdenko.
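
One way to make such an experiment reportable is to list exactly which words change between a run with the dictionary and a run without it. The sketch below is only an illustration, not an established evaluation method: it does a word-level diff with Python's standard `difflib`, and the function name `changed_words` is made up.

```python
import difflib


def changed_words(text_a, text_b):
    """Return (from_a, from_b) pairs for word runs that differ between two OCR outputs."""
    a, b = text_a.split(), text_b.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Insertions and deletions show up with an empty string on one side.
            changes.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes
```

An empty list over many pages would back up the "no impact" observation, while a non-empty one shows precisely where the dictionary made a difference.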
