Getting alternative options for OCR results

Daniel Rembiszewski

Jan 2, 2019, 1:56:18 AM
to tesseract-ocr
Hey,
I have a use case where I can detect when a specific word is definitely wrong (it can't appear in a specific context). This is mostly useful for numbers, where 8 and 0 are often confused.
In that case, I would like to display the next best option given by Tesseract.

Is there a way to get a descending list of top X options, with their confidences, using the programmatic API?
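
To make the use case concrete, here is a rough sketch of the fallback, assuming per-symbol alternatives with confidences are already available from somewhere. The names `top_words`, `next_best`, and `is_valid`, and the choice of mean confidence as the word score, are made up for illustration and are not part of Tesseract.

```python
import heapq
from itertools import product
from typing import Callable, List, Optional, Tuple

# One alternative for a symbol: (text, confidence in percent).
Choice = Tuple[str, float]


def top_words(symbols: List[List[Choice]], limit: int) -> List[Tuple[str, float]]:
    """Return up to `limit` candidate words, best first.

    The word score is the mean of the per-symbol confidences
    (an assumption for this sketch, not how Tesseract scores words).
    """
    if not symbols or any(not alternatives for alternatives in symbols):
        return []
    candidates = []
    # Every combination of one alternative per symbol is a candidate word.
    for combo in product(*symbols):
        word = "".join(text for text, _ in combo)
        score = sum(conf for _, conf in combo) / len(combo)
        candidates.append((word, score))
    return heapq.nlargest(limit, candidates, key=lambda c: c[1])


def next_best(
    symbols: List[List[Choice]],
    is_valid: Callable[[str], bool],
    limit: int = 10,
) -> Optional[Tuple[str, float]]:
    """Return the highest-scoring candidate accepted by `is_valid`, or None."""
    for word, score in top_words(symbols, limit):
        if is_valid(word):
            return word, score
    return None
```

For example, with `symbols = [[("1", 95.0)], [("8", 60.0), ("0", 55.0)]]` and a check that rejects any word containing "8", `next_best` would return `("10", 75.0)`. The number of combinations grows quickly with word length, so this only makes sense for short words such as numbers.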

Zdenko Podobny

Jan 2, 2019, 9:24:51 AM
to tesser...@googlegroups.com

PS: I am not sure how it works with 4.00, but in the 3.0x era it provided alternative options for symbols...

Zdenko



Daniel Rembiszewski

Jan 2, 2019, 11:35:54 AM
to tesseract-ocr
I'm using the pytesseract wrapper, which I believe wraps the CLI.

So I guess a better question is: can I get these options via the binary?

Zdenko Podobny

Jan 2, 2019, 12:59:13 PM
to tesser...@googlegroups.com
It is not available from the binary.
Maybe you can try to use the C-API from Python: https://github.com/tesseract-ocr/tesseract/wiki/APIExample#c-api-in-python
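
As a rough illustration of the API route, the sketch below walks the symbols and asks for each one's alternatives. It assumes the third-party tesserocr binding, which is not named in this thread and is used here instead of raw ctypes calls. The helper names `rank_choices` and `symbol_choices` are made up for this example, and on 4.x the choice iterator may return only a single entry per symbol.

```python
from typing import List, Optional, Tuple

# One alternative for a symbol: (text, confidence in percent).
Choice = Tuple[Optional[str], float]


def rank_choices(choices: List[Choice]) -> List[Choice]:
    """Drop empty alternatives and sort the rest by descending confidence."""
    kept = [(text, conf) for text, conf in choices if text]
    return sorted(kept, key=lambda c: c[1], reverse=True)


def symbol_choices(image_path: str) -> List[List[Choice]]:
    """Return the ranked alternatives for every recognized symbol.

    Assumes the tesserocr package is installed; it is imported lazily so
    that rank_choices stays usable without it.
    """
    from tesserocr import PyTessBaseAPI, RIL, iterate_level

    results = []
    with PyTessBaseAPI() as api:
        api.SetImageFile(image_path)
        # Ask Tesseract to keep per-symbol alternatives. SetVariable
        # returns False if the running version does not know the variable.
        api.SetVariable("save_blob_choices", "T")
        api.Recognize()
        iterator = api.GetIterator()
        for symbol in iterate_level(iterator, RIL.SYMBOL):
            # The choice iterator yields itself at each step, so the text
            # and confidence are read immediately inside the comprehension.
            choices = [
                (choice.GetUTF8Text(), choice.Confidence())
                for choice in symbol.GetChoiceIterator()
            ]
            results.append(rank_choices(choices))
    return results
```

Calling `symbol_choices("page.png")` (a placeholder path) would give one list per symbol, each ordered from most to least confident.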

Zdenko



Lorenzo Bolzani

Jan 2, 2019, 2:26:12 PM
to tesser...@googlegroups.com

I use a Python wrapper and I can ask for alternative characters, but with 4.x I always get just one. With 3.x I used to get multiple ones.

As far as I know, 4.x currently does not provide this feature.


Lorenzo
