Failure loading language 'osd' in python3

Reuben Cummings

May 25, 2016, 9:35:15 AM
to tesseract-ocr
Using the C API (capi) through ctypes under python3, the 'osd' language file fails to load. The same script works fine with python2. I'm on Mac OS X 10.9.5.

code:

#!/usr/bin/python
# -*- coding: utf-8 -*-

from __future__ import print_function, division, absolute_import

from os import environ
from ctypes import CDLL, POINTER, Structure, c_char_p, c_bool
from ctypes.util import find_library

LIBTESS = find_library('libtesseract')
LIBLEPT = find_library('liblept')
TESSDATA_PREFIX = environ.get('TESSDATA_PREFIX', '/opt/local/share')


class TessBaseAPI(Structure):
    pass


class Pix(Structure):
    pass


class TessPageIterator(Structure):
    pass


def create_tess_api(prefix=TESSDATA_PREFIX):
    tesseract = CDLL(LIBTESS)
    leptonica = CDLL(LIBLEPT)
    base_api = POINTER(TessBaseAPI)
    p_iter = POINTER(TessPageIterator)
    argtypes = [base_api, c_char_p, c_char_p]

    tesseract.TessBaseAPICreate.restype = base_api
    tesseract.TessBaseAPIInit3.argtypes = argtypes
    tesseract.TessBaseAPIInit3.restype = c_bool
    tesseract.TessBaseAPISetImage2.restype = None
    tesseract.TessBaseAPISetImage2.argtypes = [base_api, POINTER(Pix)]
    tesseract.TessBaseAPIAnalyseLayout.argtypes = [base_api]
    tesseract.TessBaseAPIAnalyseLayout.restype = p_iter

    api = tesseract.TessBaseAPICreate()
    tesseract.TessBaseAPIInit3(api, prefix.encode('utf-8'), b'eng')

    leptonica.pixRead.argtypes = [c_char_p]
    leptonica.pixRead.restype = POINTER(Pix)
    return tesseract, leptonica, api

tesseract, leptonica, api = create_tess_api()
path = b'eurotext.tif'
tesseract.TessBaseAPISetPageSegMode(api, 1)
pix = leptonica.pixRead(path)
tesseract.TessBaseAPISetImage2(api, pix)
print(tesseract.TessBaseAPIAnalyseLayout(api))

tesseract 3.04.00
 leptonica-1.71
  libgif 4.2.3 : libjpeg 9a : libpng 1.6.21 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.5.0 : libopenjp2 2.1.0
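
A side note on the two find_library calls in the script: find_library returns None when it cannot resolve a name, and on POSIX systems CDLL(None) then opens the running program itself instead of failing, so a wrong library name only shows up later as a missing symbol. The sketch below is one way to fail early. locate_library is a hypothetical helper, and the candidate names in the usage comment are assumptions that depend on the platform and on how the libraries were installed.

```python
from ctypes.util import find_library


def locate_library(candidates, finder=find_library):
    """Return the first result the finder resolves, or raise if none resolves."""
    for name in candidates:
        found = finder(name)
        if found:
            return found
    raise OSError('none of %r could be located' % (candidates,))


# Hypothetical usage; the names tried here are assumptions:
# LIBTESS = locate_library(['libtesseract', 'tesseract'])
# LIBLEPT = locate_library(['liblept', 'lept'])
```

The finder argument exists only so the lookup can be exercised without the real libraries installed.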


Reuben Cummings

May 25, 2016, 9:46:01 AM
to tesseract-ocr
Output when the file is run with python2:

<__main__.LP_TessPageIterator object at 0x102369950>

Output when the file is run with python3:

Failed loading language 'osd'
Tesseract couldn't load any languages!
Warning: Auto orientation and script detection requested, but osd language failed to load
<__main__.LP_TessPageIterator object at 0x10194b950>
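
Because the failure names 'osd' specifically, one cheap thing to rule out is whether both data files are visible under the prefix the script passes in. This is only a sanity check, not a diagnosis of the python3 difference. It assumes the layout implied by Tesseract's own error text, i.e. <prefix>/tessdata/<lang>.traineddata, and missing_traineddata is a hypothetical helper, not part of Tesseract.

```python
import os


def missing_traineddata(prefix, langs=('eng', 'osd')):
    """Return the languages whose .traineddata file is absent under prefix/tessdata."""
    tessdata = os.path.join(prefix, 'tessdata')
    return [
        lang for lang in langs
        if not os.path.isfile(os.path.join(tessdata, lang + '.traineddata'))
    ]


if __name__ == '__main__':
    # Same default prefix as the script in this thread.
    prefix = os.environ.get('TESSDATA_PREFIX', '/opt/local/share')
    print('missing:', missing_traineddata(prefix))
```

Running it under both interpreters would show whether they see the same environment and the same files.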

Also, I know it is finding the tessdata folder because when I change the prefix value, it gives the following error in both python2 and 3:

Error opening data file ./tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Segmentation fault: 11
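
The segmentation fault here follows a failed init whose result the script never checks. As far as I know, capi.h declares TessBaseAPIInit3 as returning an int (0 on success, non-zero on failure) rather than a bool, so the c_bool restype in the script would read a failure code as True. A minimal sketch of checking the result before going further, assuming that return convention; init_or_raise is a hypothetical helper, and it expects the argtypes to be set as in the script above.

```python
from ctypes import c_int


def init_or_raise(tesseract, api, prefix, lang=b'eng'):
    """Call TessBaseAPIInit3 and raise instead of continuing after a failed init."""
    # Assumption: the C function returns int, with 0 meaning success.
    tesseract.TessBaseAPIInit3.restype = c_int
    rc = tesseract.TessBaseAPIInit3(api, prefix.encode('utf-8'), lang)
    if rc != 0:
        raise RuntimeError(
            'TessBaseAPIInit3 failed (rc=%d) for prefix %r' % (rc, prefix))
    return api
```

With a check like this, a bad prefix would surface as a Python exception instead of a crash in the later calls.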