Trying to use --oem 0 but now cannot load languages

Kyle Foley

Jul 15, 2019, 12:00:10 AM
to tesseract-ocr
I'm trying to set tessedit_char_whitelist, but it does not work in Tesseract 4. I read in a post from amitdo that I need to use --oem 0. I used the following syntax:

str4 = pytesseract.image_to_string(Image.open(str3),
config='--oem 0 -c tessedit_char_whitelist=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabdefghijklmnopqrstuvwxyzḥś')

and now I get the following error message:


pytesseract.pytesseract.TesseractError: (1, "Failed loading language 'eng' Tesseract couldn't load any languages! Could not initialize tesseract.")
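
The thread does not say what fixed the loading error. One common cause of this message with --oem 0 is that the installed eng.traineddata contains only the LSTM model (as the tessdata_fast and tessdata_best files do), so the legacy engine has nothing to load. The files in the main tessdata repository also include the legacy model. The sketch below assumes that cause and shows one way to point pytesseract at a directory holding such a file. The helper names (build_config, has_traineddata, ocr_with_whitelist) and the directory path are hypothetical and do not come from the thread.

```python
import os


def build_config(whitelist, tessdata_dir=None, oem=0):
    """Assemble a Tesseract config string (hypothetical helper).

    tessdata_dir is assumed to contain no spaces, since the string is
    split into command-line arguments before Tesseract sees it.
    """
    parts = []
    if tessdata_dir is not None:
        # --tessdata-dir tells Tesseract where to find <lang>.traineddata
        parts += ["--tessdata-dir", tessdata_dir]
    # --oem 0 selects the legacy engine only
    parts += ["--oem", str(oem)]
    parts += ["-c", "tessedit_char_whitelist=" + whitelist]
    return " ".join(parts)


def has_traineddata(tessdata_dir, lang="eng"):
    """Return True if <lang>.traineddata exists in tessdata_dir."""
    return os.path.isfile(os.path.join(tessdata_dir, lang + ".traineddata"))


def ocr_with_whitelist(image_path, whitelist, tessdata_dir):
    """Run OCR with the legacy engine; needs pytesseract and Pillow installed."""
    import pytesseract
    from PIL import Image

    if not has_traineddata(tessdata_dir):
        raise FileNotFoundError("eng.traineddata not found in " + tessdata_dir)
    config = build_config(whitelist, tessdata_dir=tessdata_dir)
    return pytesseract.image_to_string(Image.open(image_path), lang="eng", config=config)
```

Checking for the file first only confirms that it exists. It cannot tell whether the file actually contains the legacy model, so the same error can still appear if the wrong variant of eng.traineddata is in that directory.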

Kyle Foley

Jul 15, 2019, 12:02:43 AM
to tesseract-ocr
I solved this problem, but when I reverted to an older Tesseract the accuracy dropped from 99% to a shocking 75%. I can't believe this is happening. Why would anyone remove such a useful feature from their software? Do I really have to spend 10 hours learning how to train this thing to recognize new characters? I've never done that before, and I would prefer the solution I almost had, which required only one line of code. Further, if I do have to train it, how many images will I need?