Whitelist is not accepting special characters


Shadya S.

Aug 27, 2023, 12:25:25 AM

to tesseract-ocr

I'm using Tesseract (version 5.3.1) in Windows to recognize characters from a text that includes special characters like ñüá. Most of these characters are within the Latin script, so I've declared this in the command line.

In this image, the special characters are ñ,Ñ,á,é.

The command line I'm using is

tesseract text.png stdout --psm 6 -l Latin -c tessedit_char_whitelist=aáeéiocfhklmnñtÑ

However, the output text is missing white spaces between words, and the special characters are being completely ignored, resulting in:

aoloaalcalmoo

okonioniachillalif

Do you know why Tesseract is not taking into account the characters I've declared in the whitelist? Maybe I'm not specifying the special characters correctly.

Any help is greatly appreciated.

Zdenko Podobny

Aug 27, 2023, 3:24:32 PM

to tesser...@googlegroups.com

IMO there is no need to use psm or a whitelist:

tesseract text.png - -l fast/script/Latin
Estimating resolution as 274
Ñato ñelo ñaña álca moño

Ñoko niño niña chillňa élif

For Windows I guess there could be a problem with UTF-8 in the terminal...
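One way to test the terminal theory is to keep the terminal out of the loop entirely: put the whitelist in a UTF-8 config file instead of typing it on the command line, and have Tesseract write to a file instead of stdout. The following is only a minimal sketch, assuming Python is available and `tesseract` is on the PATH. The helper names (`write_whitelist_config`, `build_command`, `run_ocr`) and the file name `whitelist.cfg` are made up for illustration, and whether this fixes the missing characters in this particular case has not been verified.

```python
import subprocess
import tempfile
from pathlib import Path


def write_whitelist_config(path, chars):
    """Write a Tesseract config file holding the whitelist, encoded as UTF-8.

    Hypothetical helper. Config files use "name value" lines; writing bytes
    directly keeps the encoding independent of the console code page.
    """
    path = Path(path)
    path.write_bytes(f"tessedit_char_whitelist {chars}\n".encode("utf-8"))
    return path


def build_command(image, output_base, lang="Latin", psm=6, config=None):
    """Build the argument list: tesseract image outputbase [options] [configfile]."""
    cmd = ["tesseract", str(image), str(output_base), "--psm", str(psm), "-l", lang]
    if config is not None:
        cmd.append(str(config))  # a config file can be given by path
    return cmd


def run_ocr(image, chars=None, lang="Latin", psm=6):
    """Run Tesseract and return the recognized text, read back as UTF-8."""
    with tempfile.TemporaryDirectory() as tmp:
        config = None
        if chars:
            config = write_whitelist_config(Path(tmp) / "whitelist.cfg", chars)
        out_base = Path(tmp) / "out"
        subprocess.run(build_command(image, out_base, lang, psm, config), check=True)
        # Tesseract appends ".txt" to the output base for plain-text output.
        return (Path(tmp) / "out.txt").read_text(encoding="utf-8")
```

Calling `run_ocr("text.png", "aáeéiocfhklmnñtÑ")` and inspecting the result with `repr()` shows the actual code points regardless of how the console renders them. If the accented characters are present there but not in the terminal output, the display is the problem rather than the recognition.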

Zdenko

On Sun, Aug 27, 2023 at 6:25, Shadya S. <shadyas...@gmail.com> wrote:

