Re: [tesseract-ocr] How to set whitelist for non-English characters?


童虎

Jul 31, 2020, 2:27:25 AM
to tesser...@googlegroups.com
Maybe you can try something like '-c tessedit_char_whitelist="我愛你"'.

un C <enya.oh...@gmail.com> wrote on Wed, Jul 29, 2020 at 5:27 PM:
I am using tesseract v5.0.0-alpha.20200328.

I tried '-c tessedit_char_whitelist=0123456789,' and it works.
But for Chinese characters, neither '-c tessedit_char_whitelist=我愛你' nor the Unicode-escaped form '-c tessedit_char_whitelist=\u6211\u611b\u4f60' works.

Can anyone give me a hint? Thanks a lot.


un C

Jul 31, 2020, 2:46:38 AM
to tesseract-ocr
Thank you for your reply. I tried your advice, but it still does not work.
I'm grateful for your help.

童虎

Aug 2, 2020, 11:20:39 PM
to tesser...@googlegroups.com
Haha, you can also try -c tessedit_char_whitelist='我愛你', with single quotes instead of double quotes.
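
One way to take shell quoting out of the picture entirely is to build the argument list in Python and hand it to tesseract without going through a shell. The following is only a sketch under assumptions: the helper name build_tesseract_args, the file names input.png and out, and the chi_tra language code are placeholders that do not come from this thread. It only rules out quoting as the cause; whether the whitelist is honored still depends on the Tesseract version and engine in use.

```python
import shlex


def build_tesseract_args(image, outbase, whitelist, lang="chi_tra"):
    """Return an argv list for the tesseract CLI with a character whitelist.

    Hypothetical helper: the default language is an assumption, adjust it
    to whatever traineddata is actually installed.
    """
    # Each list element reaches tesseract as exactly one argument, so
    # neither single nor double shell quotes are involved.
    return [
        "tesseract", image, outbase,
        "-l", lang,
        "-c", f"tessedit_char_whitelist={whitelist}",
    ]


# Python decodes \uXXXX escapes inside string literals, so this is the same
# string as the three literal characters. A plain shell argument such as
# \u6211 is not decoded this way.
escaped = "\u6211\u611b\u4f60"

args = build_tesseract_args("input.png", "out", escaped)
print(args[-1])          # the whitelist argument as tesseract would receive it
print(shlex.join(args))  # an equivalent shell-quoted command line
```

To actually run it, pass the list to subprocess.run(args, check=True). Because no shell is involved, the whitelist value arrives byte for byte as written, which makes it easier to tell a quoting problem apart from a Tesseract-side one.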