OCR of a freehand photo of a book


Borneq

Jan 30, 2024, 8:00:17 PM
to tesseract-ocr
First I tested Tesseract on a file generated as a flat image.
I generated Lorem Ipsum text:

5 paragraphs, 452 words, 2978 bytes, 24 lines + 4 blank lines; the longest line in my editor was 135 characters.

Result: 100% accurate except for two full stops, fantastic.

Next, I rotated the image. A rotation of only 0.7 degrees caused a lot of confusion, and smaller rotations of 0.1-0.6 degrees made Tesseract read some "m" as "n".

In my book photos, the pages are often rotated by up to 3.5 degrees.

Worse, the lines of text are bent into curves, shaped like an F-distribution

("What function looks like the edge of a paper book sideways?" on math.stackexchange.com)

How can I work with real photos of books? Is this possible with some option, or is it something that is missing in Tesseract?

Zdenko Podobny

Jan 31, 2024, 3:20:19 PM
to tesser...@googlegroups.com
Tesseract is an OCR engine, and the user is responsible for preprocessing; see the documentation.
IMO there is already an app (using Tesseract) for what you are trying to do: Text Fairy [1]


Zdenko


Borneq

Feb 1, 2024, 1:29:35 PM
to tesseract-ocr
I use Linux and prefer batch processing. I found https://gist.github.com/endolith/334196bac1cac45a4893 (linked from https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html#examples). It correctly recognizes a 1.00 degree rotation. How do I combine it with Tesseract?

Borneq

Feb 1, 2024, 3:20:23 PM
to tesseract-ocr
I found mzucker/page_dewarp on GitHub, a tool that dewarps photos of book pages and converts color to black and white.