Exception on using ctypes with tesserac-ocr TessPageIteratorBoundingBox

82 views
Skip to first unread message

Philip Lee

unread,
Feb 12, 2020, 7:39:02 AM2/12/20
to tesseract-ocr
import ctypes
import os
os.putenv("PATH", r'C:\Program Files\Tesseract-OCR')
os.environ["TESSDATA_PREFIX"] = r'C:\Program Files\Tesseract-OCR\tessdata'

liblept = ctypes.cdll.LoadLibrary('liblept-5.dll')
pix = liblept.pixRead('test.png'.encode())
print(pix)

tesseractLib = ctypes.cdll.LoadLibrary('libtesseract-5.dll')

tesseractHandle = tesseractLib.TessBaseAPICreate()

tesseractLib.TessBaseAPIInit3(tesseractHandle, '.', 'eng')

tesseractLib.TessBaseAPISetImage2(tesseractHandle, pix)

# text_out = tesseractLib.TessBaseAPIGetUTF8Text(tesseractHandle)
# print(ctypes.string_at(text_out))

tessPageIterator = tesseractLib.TessResultIteratorGetPageIterator(tesseractHandle)
iteratorLevel = 3  # RIL_BLOCK,  RIL_PARA,  RIL_TEXTLINE,  RIL_WORD,  RIL_SYMBOL
tesseractLib.TessPageIteratorBoundingBox(tessPageIterator, iteratorLevel, ctypes.c_int(0), ctypes.c_int(0), ctypes.c_int(0), ctypes.c_int(0))

I got exceptions :

Traceback (most recent call last):
  File "D:\BaiduYunDownload\programming\Python\CtypesOCR.py", line 25, in <module>
    tesseractLib.TessPageIteratorBoundingBox(tessPageIterator, iteratorLevel, ctypes.c_int(0), ctypes.c_int(0), ctypes.c_int(0), ctypes.c_int(0))
OSError: exception: access violation reading 0x00000018

So what's wrong ? The aim of this program is to get bounding rectangle of each word. I know projects like tesserocr and PyOCR

P.S. Specifying the required argument types (function prototypes) for the DLL functions doesn't matter here. One could uncoment the commented lines and comment the last three lines to test it. 

Reply all
Reply to author
Forward
0 new messages