Trying to use --oem 0 but now cannot load languages

Kyle Foley

Jul 15, 2019, 12:00:10 AM

to tesseract-ocr

I'm trying to set tessedit_char_whitelist, but it does not work in Tesseract 4. I read in this comment from amitdo

https://github.com/tesseract-ocr/tesseract/issues/751#issuecomment-423521780

that I need to use --oem 0. I used the following call:

str4 = pytesseract.image_to_string(Image.open(str3),
    config='--oem 0 -c tessedit_char_whitelist=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabdefghijklmnopqrstuvwxyzḥś')

and now I get the following error message:

pytesseract.pytesseract.TesseractError: (1, "Failed loading language 'eng' Tesseract couldn't load any languages! Could not initialize tesseract.")
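
For readers who hit the same error, here is a minimal sketch of the call above with the config string built separately and the failure caught. The helper names `build_config` and `ocr_with_whitelist` are hypothetical, not part of pytesseract. The hint in the error message is an assumption about the likely cause: the `tessdata_fast` and `tessdata_best` language files are LSTM-only, so `--oem 0` has no legacy model to load from them, whereas the files in the main `tessdata` repository include both engines.

```python
# Whitelist copied verbatim from the post above.
WHITELIST = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabdefghijklmnopqrstuvwxyzḥś"


def build_config(whitelist, oem=0):
    """Build the config string passed to pytesseract (hypothetical helper)."""
    return f"--oem {oem} -c tessedit_char_whitelist={whitelist}"


def ocr_with_whitelist(image_path, whitelist=WHITELIST, oem=0):
    """Run OCR with a character whitelist and explain a language-load failure."""
    # Imported here so build_config can be used without the OCR stack installed.
    from PIL import Image
    import pytesseract

    try:
        return pytesseract.image_to_string(
            Image.open(image_path), config=build_config(whitelist, oem)
        )
    except pytesseract.pytesseract.TesseractError as err:
        # Assumption: the usual cause with --oem 0 is an LSTM-only
        # eng.traineddata that carries no legacy model.
        raise RuntimeError(
            "Tesseract could not initialize with --oem %d; check that the "
            "installed traineddata file includes the legacy engine." % oem
        ) from err
```

If the hint applies, replacing `eng.traineddata` with the copy from the main `tessdata` repository is the thing to try before changing the Tesseract version.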

Kyle Foley

Jul 15, 2019, 12:02:43 AM

to tesseract-ocr

I solved this problem, but when I reverted to an older Tesseract the accuracy dropped from 99% to a shocking 75%. I can't believe this is happening. Why would anyone remove such a useful feature from their software? Do I really have to spend 10 hours learning how to train this thing to recognize new characters? I've never done that before, and I would prefer the solution I almost had, which required only one line of code. Further, if I do have to train it, how many images will I need?
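
One stopgap that stays close to a one-line fix is to keep the Tesseract 4 engine and filter its output afterwards. This is a different technique from `tessedit_char_whitelist`: a whitelist constrains what the recognizer may choose, while a post-filter only deletes characters outside the allowed set, so a misread character is dropped rather than corrected. The sketch below assumes that trade-off is acceptable; `keep_allowed` is a hypothetical helper name.

```python
import unicodedata

# Allowed characters copied verbatim from the whitelist in the first post.
ALLOWED = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabdefghijklmnopqrstuvwxyzḥś"


def keep_allowed(text, allowed=ALLOWED):
    """Drop every character that is neither whitespace nor in `allowed`."""
    # Normalize both sides to NFC so that a letter plus a combining mark
    # compares equal to its precomposed form.
    allowed_set = set(unicodedata.normalize("NFC", allowed))
    normalized = unicodedata.normalize("NFC", text)
    return "".join(ch for ch in normalized if ch in allowed_set or ch.isspace())
```

Usage would be a single extra call around the OCR result, for example `keep_allowed(pytesseract.image_to_string(image))`.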
