Funny results with vowels in Portuguese for Tesseract 4.0alpha

Paulo Scardine

Aug 16, 2017, 1:10:49 PM
to tesseract-ocr
I have the following image:

For version 3.04 I get the correct result: "Declaração de Nascido Vivo".

For 4.0 I get "Declªrªç㺠de Nªscidº Vivº".

What I have tried so far:
  • everything in the Improving the Quality wiki article
  • messing with `tessedit_char_whitelist` and `tessedit_char_blacklist`
  • custom user word and pattern files
Nothing made a difference, so I am starting to think this may be a bug.

I would appreciate advice on how to diagnose this further.

Thanks in advance,
--
Paulo

Marcello Galvão

Nov 17, 2017, 2:49:04 PM
to tesseract-ocr
Hi, I have the same problem.
Did you find a solution?
Thank you!

Quan Nguyen

Nov 29, 2017, 10:10:12 AM
to tesseract-ocr
Did you try the latest .traineddata files, either the fast or the best versions?
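If you download both the fast and the best model sets, the tesseract CLI's `--tessdata-dir` option lets you point each run at a different directory, so the same image can be checked against both. The following is a minimal sketch that only builds the commands. The directory paths below are hypothetical, and each directory is assumed to contain a `por.traineddata` file. The helper name `model_comparison_cmds` is also hypothetical.

```python
def model_comparison_cmds(image_path: str, model_dirs: dict[str, str],
                          lang: str = "por") -> dict[str, list[str]]:
    """Return one tesseract argv per labeled tessdata directory.

    '--tessdata-dir' is placed before the image so the language data is
    loaded from that directory instead of the default install location.
    """
    return {
        label: ["tesseract", "--tessdata-dir", directory,
                image_path, "stdout", "-l", lang]
        for label, directory in model_dirs.items()
    }


# Hypothetical download locations for the two model sets.
cmds = model_comparison_cmds("declaracao.png", {
    "fast": "/opt/tessdata_fast",
    "best": "/opt/tessdata_best",
})
```

Each entry can be run with `subprocess.run(cmds[label], capture_output=True, text=True)`, and the resulting `.stdout` strings can be compared directly to see whether the newer models still emit ª and º.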
