Trouble upgrading Tesseract OCR (v4 to v5) together with a Python version upgrade (Python 3.6 to Python 3.10)


Prashant Sharma

Feb 27, 2023, 2:03:13 AM
to tesseract-ocr
Hi All,

I am trying to upgrade the software versions of an in-house text extraction application built with Python, the tesserocr Python module, and Tesseract OCR, as follows:


  • Existing software versions (outdated) : Python (v3.6.5) + tesserocr (v2.4.0) + Tesseract OCR (v4)
  • Target software versions   (latest)   : Python (v3.10.7) + tesserocr (v2.5.2) + Tesseract OCR (v5)

However, the two sets of versions above give different results when calling the GetHOCRText method: the bounding box coordinates, the extracted text (minor changes), and other numerical metadata all differ.

I need the upgraded software to produce exactly the same metadata (e.g. bounding boxes), because steps that run after the text extraction depend on it.

Could you please advise?

Regards,
Prashant Sharma

Zdenko Podobny

Mar 11, 2023, 1:03:26 PM
to tesser...@googlegroups.com
First of all: it is good manners to provide a test case (working code + input & output).
Next: there were improvements (e.g. https://github.com/tesseract-ocr/tesseract/commit/3a5e5089343798932d9952628acfdf56f3108c43) that give better bounding boxes, so you will need to make a custom build that reverts the respective commits.
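
A test case of that kind could start from something like the sketch below, which pulls the word-level bounding boxes out of two hOCR strings and lists the words whose text or coordinates differ. It is only an illustration: the helper names `word_boxes` and `diff_boxes` are hypothetical, the regular expression assumes the usual `ocrx_word` span layout that Tesseract emits in hOCR, and the `tol` parameter (pixels of allowed drift) is an assumption added for convenience.

```python
import html
import re
from itertools import zip_longest

# Assumed hOCR word layout, e.g.:
# <span class='ocrx_word' id='word_1_1' title='bbox 36 92 96 116; x_wconf 95'>Hello</span>
WORD_RE = re.compile(
    r"<span[^>]*class=['\"]ocrx_word['\"][^>]*"
    r"title=['\"][^'\"]*?bbox (\d+) (\d+) (\d+) (\d+)[^'\"]*['\"][^>]*>(.*?)</span>",
    re.DOTALL,
)


def word_boxes(hocr):
    """Return [(text, (x0, y0, x1, y1)), ...] for every ocrx_word span."""
    words = []
    for x0, y0, x1, y1, inner in WORD_RE.findall(hocr):
        # Drop nested markup such as <strong>/<em> and decode HTML entities.
        text = html.unescape(re.sub(r"<[^>]+>", "", inner)).strip()
        words.append((text, (int(x0), int(y0), int(x1), int(y1))))
    return words


def diff_boxes(old, new, tol=0):
    """List (index, old_word, new_word) where the text differs or any
    coordinate moved by more than `tol` pixels. A missing word is None."""
    diffs = []
    for i, (a, b) in enumerate(zip_longest(old, new)):
        if (
            a is None
            or b is None
            or a[0] != b[0]
            or any(abs(p - q) > tol for p, q in zip(a[1], b[1]))
        ):
            diffs.append((i, a, b))
    return diffs
```

The two hOCR strings would come from running GetHOCRText on the same input image under each installation and saving the result to a file; posting those files together with the image and the calling code would make a complete test case. Words are paired by position, so a word that is split or merged in one version shifts every later pair; a real comparison may need smarter alignment.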

Zdenko


On Mon, Feb 27, 2023 at 8:03, Prashant Sharma <prashants...@gmail.com> wrote: