Is it possible to get the position of every character as output?


Renan Neri Pereira

Apr 2, 2020, 2:23:58 PM
to tesseract-ocr
I want to know if it is possible to get output with the position of every character recognized by the OCR. I know that with the TSV flag I can get the position of words.

Thanks for your attention.

Shree Devi Kumar

Apr 2, 2020, 9:51:16 PM
to tesseract-ocr


James Peterson

Apr 3, 2020, 2:47:39 AM
to tesseract-ocr
The "makebox" option lets you see the position of each character. For example, this is how I run tesseract:
 
> tesseract page.tiff box/page makebox

which produces a file called box/page.box with a separate line for each character showing the character and its position:

T 987 3805 1016 3847 0
o 1000 3805 1016 3847 0
r 1015 3805 1016 3847 0

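Each line reads: character, left, bottom, right, top, page. So for T, the recognized character is in a box with lower left corner at (987,3805) and upper right corner at (1016,3847) on page 0. Note that box-file coordinates are measured from the bottom-left corner of the image, so y increases upward.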
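
If you want to use those positions in a script, here is a minimal Python sketch for reading the box lines. It assumes each line has the form "char left bottom right top page" as in the sample above. The names `CharBox`, `parse_box_text` and `to_top_left`, and the image height of 4000 pixels, are made up for illustration and are not part of tesseract. The conversion helper is there because most image libraries put the origin at the top-left rather than the bottom-left.

```python
from typing import List, NamedTuple, Tuple


class CharBox(NamedTuple):
    """One line of a box file (hypothetical helper type)."""
    char: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_text(text: str) -> List[CharBox]:
    """Parse box-file text into a list of CharBox records."""
    boxes = []
    for line in text.splitlines():
        line = line.rstrip()
        if not line:
            continue
        # Split from the right: the last five fields are always numbers,
        # whatever is left over is the recognized symbol.
        parts = line.rsplit(" ", 5)
        if len(parts) != 6:
            continue  # skip lines that do not match the assumed layout
        char, left, bottom, right, top, page = parts
        boxes.append(CharBox(char, int(left), int(bottom),
                             int(right), int(top), int(page)))
    return boxes


def to_top_left(box: CharBox, image_height: int) -> Tuple[int, int, int, int]:
    """Convert to (x, y, width, height) with the origin at the top-left."""
    return (box.left,
            image_height - box.top,
            box.right - box.left,
            box.top - box.bottom)


sample = """T 987 3805 1016 3847 0
o 1000 3805 1016 3847 0
r 1015 3805 1016 3847 0"""

boxes = parse_box_text(sample)
for b in boxes:
    # 4000 is an assumed image height; use the real height of page.tiff.
    print(b.char, to_top_left(b, image_height=4000))
```

To read the real output you would pass the contents of box/page.box to `parse_box_text` instead of the sample string.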