Standalone Self-contained Tesseract-OCR for Mac


Peter Reid

Mar 24, 2016, 6:49:03 AM
to tesseract-ocr
I have a standalone version of tesseract-ocr for Windows that can be run from a folder located anywhere in the Windows filing system without having to do an installation. For the Mac, the user has to install Homebrew or MacPorts first and then tesseract-ocr afterwards. This ties tesseract-ocr to particular parts of the OS X filing system, preventing it from being relocated and used elsewhere on the Mac.

I'm looking for a standalone, self-contained version of tesseract-ocr for the Mac that can be located anywhere and run without requiring any installation. Can someone please point me to such a version of tesseract-ocr, or give instructions on how I can build one?

Thanks

Peter Reid

Apr 18, 2017, 4:22:34 AM
to tesseract-ocr
I've done some further searching and found several versions of shell scripts that are supposed to generate a standalone version of Tesseract.  However, they all fail at the last part of the process, namely building Tesseract itself!  The script builds the libraries for zlib (v1.2.8), libpng (v1.6.13), libjpeg (9b) and leptonica (v1.73), but fails with the following error:

  checking for leptonica... yes
  checking for pixCreate in -llept... no
  configure: error: leptonica library missing

I can't find a way to correct this! Here are the configuration details that lead to this error:

export CXXFLAGS="-I$BUILD_DIR/include -I$BUILD_DIR/include/libpng16 -I$BUILD_DIR/include/leptonica -lpng -ljpeg -lz"
export CPPFLAGS="-I$BUILD_DIR/include -I$BUILD_DIR/include/libpng16 -I$BUILD_DIR/include/leptonica -lpng -ljpeg -lz"
export LDFLAGS="-L$BUILD_DIR/lib"
export LIBLEPT_HEADERSDIR="$BUILD_DIR/include/leptonica"
       
./configure --prefix=$TESSERACT_DIR --with-extra-libraries=$BUILD_DIR/lib

[Note: I added the CXXFLAGS as well as the CPPFLAGS as I wasn't sure which was needed]
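A hedged sketch of how the same configuration might be split up, assuming the $BUILD_DIR layout from the script: `-I` options are preprocessor flags (CPPFLAGS is used for both C and C++, so CXXFLAGS is not needed for them), `-L` options go in LDFLAGS, and `-l` libraries conventionally go in LIBS, which autoconf places after the test program on the link line. The `checking for pixCreate in -llept... no` line means configure could not link a small test program against liblept; one common cause, when liblept is a static library, is that the libraries it was built against (libpng, libjpeg, zlib) are missing from that link line. The function name `configure_tesseract` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: configure Tesseract 3.x against dependencies installed under one prefix.
# Arguments: $1 = dependency prefix (BUILD_DIR), $2 = install prefix (TESSERACT_DIR).
# Run from the Tesseract source directory.
configure_tesseract() {
  local deps="$1" prefix="$2"
  # Header search paths are preprocessor flags (used for C and C++).
  export CPPFLAGS="-I$deps/include -I$deps/include/libpng16 -I$deps/include/leptonica"
  # Library search paths are linker flags.
  export LDFLAGS="-L$deps/lib"
  # Libraries to link go in LIBS; these are the ones liblept was built against.
  export LIBS="-lpng -ljpeg -lz"
  export LIBLEPT_HEADERSDIR="$deps/include/leptonica"
  if ! ./configure --prefix="$prefix" --with-extra-libraries="$deps/lib"; then
    # The real compiler/linker error behind "pixCreate in -llept... no" is in config.log.
    grep -n -A 20 'pixCreate' config.log | head -n 40
    return 1
  fi
}
```

For example, `cd tesseract-3.04.01 && configure_tesseract "$BUILD_DIR" "$TESSERACT_DIR"`. If it still fails, the lines printed from config.log show the underlying error rather than the generic "leptonica library missing".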

I have attached the latest version of the shell script I'm using so you can see the context.

Can anyone fix my script or tell me another way of generating a standalone version of Tesseract for the Mac?

Thanks
buildSA.sh

ShreeDevi Kumar

Apr 18, 2017, 4:40:14 AM
to tesser...@googlegroups.com
Use the latest version of Leptonica (1.74.1).


ShreeDevi
____________________________________________________________
Bhajan - Kirtan - Aarti (devotional songs) @ http://bhajans.ramparivar.com

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscribe@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/e6dbc1e0-1314-47e9-b76c-627db8b6afc4%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Peter Reid

Apr 18, 2017, 4:55:11 AM
to tesseract-ocr
Hi ShreeDevi

I have tried the latest version of Leptonica, but I get numerous warnings (38 of them, mainly about implicit function declarations) and a fatal error: 'endian.h' not found. The build finishes saying that Leptonica has been built OK, and its library appears in the lib folder. However, when I try to build Tesseract, I get the following error:

checking for leptonica... yes
checking for pixCreate in -llept... no
configure: error: leptonica library missing
Configuration done, now Building
make: Nothing to be done for `install'.
Tesseract build failed. Exiting.

So I'm not better off with the latest version.  At least with version 1.73 I don't get the warnings and error messages when building Leptonica even though the Tesseract build fails.

Thanks

Peter


ShreeDevi Kumar

Apr 18, 2017, 5:15:50 AM
to tesser...@googlegroups.com


If you are building Tesseract 4.0, you need Leptonica 1.74.

ShreeDevi

Peter Reid

Apr 18, 2017, 6:06:42 AM
to tesseract-ocr
I'm trying to build Tesseract 3, namely version 3.05.00 or thereabouts.

In fact, I started by trying to build with the latest versions of all the libs, but had several failures, so I've backtracked to earlier versions in order to get successful builds. I think the latest release versions are the following:

ZLIB_VERSION=1.2.11
LIBJPEG_VERSION=9b
LIBPNG_VERSION=1.6.29
LEPTONICA_VERSION=1.74.01
TESSERACT_VERSION=3.05.00

My latest attempt succeeded in building zlib and libjpeg but failed with libpng:

Undefined symbols for architecture x86_64:
  "_inflateValidate", referenced from:
      _png_inflate_claim in pngrutil.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[1]: *** [libpng16.la] Error 1
make: *** [check] Error 2
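`inflateValidate` was added in zlib 1.2.9, and libpng 1.6.29 appears to call it only when the zlib.h it is compiled against reports 1.2.9 or later. So this error usually means libpng was compiled with the zlib 1.2.11 headers but linked against an older libz, such as the system copy in /usr/lib. A minimal sketch, assuming the script installs zlib into $BUILD_DIR: check that the built libz exports the symbol, then configure libpng with the built zlib's headers and library directory searched first. The helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: make sure libpng compiles AND links against the zlib in $BUILD_DIR.

# Succeeds if the given libz (.a or .dylib) exports inflateValidate (zlib >= 1.2.9).
zlib_has_inflate_validate() {
  nm -g "$1" 2>/dev/null | grep -q 'inflateValidate'
}

# Run from the libpng source directory; $1 = BUILD_DIR.
configure_libpng() {
  local build_dir="$1"
  if ! zlib_has_inflate_validate "$build_dir/lib/libz.a" &&
     ! zlib_has_inflate_validate "$build_dir/lib/libz.dylib"; then
    echo "zlib in $build_dir/lib is missing or older than 1.2.9" >&2
    return 1
  fi
  # -L puts the built zlib ahead of the system copy on the link line.
  ./configure --prefix="$build_dir" \
    CPPFLAGS="-I$build_dir/include" \
    LDFLAGS="-L$build_dir/lib"
}
```

Passing the variables on the configure command line (rather than only exporting them) makes configure record them, so the later `make` uses the same paths.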

I've looked at the page about compiling Tesseract, but for the Mac it only describes MacPorts and Homebrew, which do not generate standalone tesseract binaries. My apps need to include a runnable version of tesseract that doesn't require any installation. I can do this for Windows, as the compiling page gives the details, but for the Mac I'm having to build the standalone version myself. This is why I'm going through this process!

Thanks again

Peter

ShreeDevi Kumar

Apr 18, 2017, 8:03:22 AM
to tesser...@googlegroups.com
I haven't built 3.05, so I cannot help. I would suggest trying older commits of the Tesseract 3.05 branch to see which one works.

I hope that those who have built 3.05 on a Mac will help.

Peter Reid

Apr 20, 2017, 4:58:15 AM
to tesseract-ocr
Hi ShreeDevi

Thanks for your advice anyway; it got me thinking about the problem in other ways. I'm pleased to say that I managed to get a build script that works using the following versions of the libs:

LIBJPEG_VERSION=9b
ZLIB_VERSION=1.2.11
LIBPNG_VERSION=1.6.29
LEPTONICA_VERSION=1.71
TESSERACT_VERSION=3.04.01

I couldn't get it to work with any later versions of Leptonica (1.74.1 is the latest) or Tesseract (3.05.00 is the latest v3).  Despite this, I now have a working standalone Mac version of Tesseract that I can drive from my own code!

Apart from the actual files themselves, it's necessary to set 2 environment variables:

TESSDATA_PREFIX - set to the parent folder for the tessdata folder
DYLD_LIBRARY_PATH - set to the path up to and including the lib folder
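A minimal sketch of a launcher that sets both variables relative to its own location, so the whole folder can be moved anywhere. The layout (bin/, lib/ and tessdata/ next to the script) and the function names are hypothetical. Setting DYLD_LIBRARY_PATH inside the script, right before starting tesseract, may also be more reliable than setting it in the calling app, because macOS strips DYLD_* variables when it launches protected system binaries such as /bin/sh.

```bash
#!/usr/bin/env bash
# Hypothetical bundle layout:
#   <bundle>/run_tesseract.sh      (this script)
#   <bundle>/bin/tesseract
#   <bundle>/lib/*.dylib
#   <bundle>/tessdata/*.traineddata

# Point Tesseract 3.x and dyld at a bundle rooted at $1.
bundle_env() {
  local root="$1"
  # For 3.x, TESSDATA_PREFIX is the folder that CONTAINS tessdata.
  export TESSDATA_PREFIX="$root/"
  # Prepend the bundled lib folder, keeping any existing value.
  export DYLD_LIBRARY_PATH="$root/lib${DYLD_LIBRARY_PATH:+:$DYLD_LIBRARY_PATH}"
}

# Resolve the script's own folder, set the environment, then run tesseract.
run_bundled_tesseract() {
  local root
  root="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
  bundle_env "$root"
  exec "$root/bin/tesseract" "$@"
}
```

Saved as, say, run_tesseract.sh with `run_bundled_tesseract "$@"` as its last line, it could be called from the app as `./run_tesseract.sh page.png out -l eng`.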

I've included my version of the build script in case anyone else needs to do something similar.

Thanks again

Peter
mac_standalone_tesseract_build_script.sh

shree

Apr 20, 2017, 5:24:24 AM
to tesseract-ocr
Glad you got it to work.

I have added issue with link to this discussion at https://github.com/tesseract-ocr/tesseract/issues/830 
 

shree

Apr 23, 2017, 2:31:23 PM
to tesseract-ocr
Hi Peter,
Stefan Weil has made changes to the 3.05 branch to address this issue. Please give it a try using the latest commit and, preferably, provide your feedback in the issue tracker where I have added this.

Peter Reid

Apr 30, 2017, 11:23:57 AM
to tesseract-ocr
Hi Shree

Sorry for the delay in replying, but I'm struggling to get a successful build now. I'm attaching my shell script for you to look at, but the failure seems to be related to aclocal being called inside autogen.sh.
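autogen.sh regenerates ./configure with the GNU autotools (aclocal, autoheader, automake, autoconf, libtoolize), which are not part of a stock macOS or Xcode install; when aclocal fails there, one of them is often missing or too old. A small preflight sketch that reports missing tools before the build starts. The tool list is an assumption about what the script needs (pkg-config included), and on macOS GNU libtoolize is often installed as glibtoolize. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: print the names of autotools that are not on PATH (empty = none missing).
missing_autotools() {
  local tool missing=()
  for tool in aclocal autoheader automake autoconf pkg-config; do
    command -v "$tool" >/dev/null 2>&1 || missing+=("$tool")
  done
  # GNU libtool's libtoolize may be installed under either name.
  if ! command -v libtoolize >/dev/null 2>&1 &&
     ! command -v glibtoolize >/dev/null 2>&1; then
    missing+=("libtoolize")
  fi
  echo "${missing[*]}"
}
```

Near the top of a build script: `missing=$(missing_autotools); [ -z "$missing" ] || { echo "Please install: $missing" >&2; exit 1; }`.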

To be honest, I'm not confident that things are building OK elsewhere, as I see a variety of error and warning messages even though the relevant script finishes with "success"! I'm not a C/C++ coder; I do all my programming in LiveCode. I'm just trying to get a reliable build of a portable version of Tesseract that I can drive through a command-line interface. The OCR capability I'm trying to provide using Tesseract is just one part of a much larger app. LiveCode allows me to build an app on my Mac to be deployed on Mac, Windows and Linux. In each case, the only code I need for each target deployment is a command line or two that runs Tesseract on a given set of source files that my app has extracted or created elsewhere. To make installation easy, I include a portable version of Tesseract among the resources for my app.

As I'm not a C etc. coder (I last wrote serious C several decades ago!) I'm not able to judge which error/warning messages are significant or figure out how to fix them.  I was hoping to follow a recipe that would reliably build a portable Tesseract for the Mac and Windows.  I'm just trying different combinations of sub-builds until I find one that works, which is why I ended up with a combination of older versions of the dependencies. So I'm not a good person to ask to build this and report errors etc!

Best regards

Peter
mac_standalone_tesseract_build_script.sh

shree

May 1, 2017, 3:58:55 AM
to tesseract-ocr

Stefan Weil

May 1, 2017, 10:00:26 AM
to tesseract-ocr

I had a look at that shell script. The published version does not work because it tries to build Tesseract without first building the dependencies (libjpeg, zlib, libpng, leptonica). This can easily be fixed in the first lines of the script, and then it builds those dependencies. It also starts building Tesseract 3.05, but fails when linking the tesseract executable because of a missing symbol _fmemopen in liblept.


This problem is caused by an unusual build of Leptonica: instead of the normal configure / make, the script calls ./make-for-local which uses a hand-built makefile (maybe made for Linux). I'm also surprised that TIFF support is disabled for the Leptonica build and doubt that this will work with Tesseract.
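A sketch of building Leptonica with its standard configure / make / make install instead of make-for-local, assuming the dependencies are already installed under the same prefix. Presumably configure would then detect that fmemopen is unavailable (older macOS releases, before 10.13 as far as I know, do not provide it) instead of assuming it exists. For TIFF support, libtiff would also have to be built into that prefix first, which the original script does not do. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: build Leptonica with its own configure script.
# $1 = Leptonica source dir, $2 = prefix where zlib, libpng, libjpeg
# (and libtiff, if TIFF support is wanted) are already installed.
build_leptonica() {
  local src_dir="$1" build_dir="$2"
  (
    # Subshell so the cd does not leak into the caller.
    cd "$src_dir" || exit 1
    ./configure --prefix="$build_dir" \
      CPPFLAGS="-I$build_dir/include -I$build_dir/include/libpng16" \
      LDFLAGS="-L$build_dir/lib" &&
    make &&
    make install
  )
}
```

Used as `build_leptonica "$SRC_DIR/leptonica-1.74.1" "$BUILD_DIR"`, where SRC_DIR is a hypothetical variable for the unpacked sources.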

Stefan Weil

May 1, 2017, 10:00:29 AM
to tesseract-ocr
On Thursday, 24 March 2016 11:49:03 UTC+1, Peter Reid wrote:
> I have a standalone version of tesseract-ocr for Windows that can be run from a folder located anywhere in the Windows filing system without having to do an installation.  For the Mac the user has to install HomeBrew/MacPort first and then tesseract-ocr afterwards.


Building Tesseract with Homebrew or MacPorts is much easier than with your script, and it simply works. End users who want to run Tesseract don't need Homebrew or MacPorts. They only need some libraries, which can be copied and distributed with the tesseract executable.
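A sketch of finding which libraries would have to be shipped with the executable: it follows `otool -L` output recursively and skips /usr/lib and /System, which every Mac already has. The function names are hypothetical, and it avoids associative arrays so that it also runs under the bash 3.2 that ships with macOS.

```bash
#!/usr/bin/env bash
# Print the dylib paths a Mach-O file links against.
# otool -L prints "file:" first, then one "<path> (compatibility ...)" per line.
deps_of() {
  otool -L "$1" | awk 'NR > 1 { print $1 }'
}

# Print every non-system dylib needed by $1, following dependencies of dependencies.
collect_dylibs() {
  local queue=("$1") seen=$'\n' file dep
  while [ "${#queue[@]}" -gt 0 ]; do
    file="${queue[0]}"
    queue=("${queue[@]:1}")
    while read -r dep; do
      case "$dep" in
        ""|/usr/lib/*|/System/*) continue ;;   # shipped with macOS
      esac
      [ "$dep" = "$file" ] && continue          # a dylib lists its own install name
      case "$seen" in *$'\n'"$dep"$'\n'*) continue ;; esac
      seen+="$dep"$'\n'
      echo "$dep"
      [ -f "$dep" ] && queue+=("$dep")          # recurse into files that exist
    done < <(deps_of "$file")
  done
}
```

For example, `collect_dylibs "$(command -v tesseract)"` on a Homebrew install lists the paths to copy alongside the executable; the copied files still need their references rewritten, as the last post in this thread describes.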

ShreeDevi Kumar

May 1, 2017, 11:48:58 AM
to tesser...@googlegroups.com

Stefan,
Please make the mac binaries available for both 3.05 and 4.00 similar to windows.
I noticed that you have posted the test version for standalone Tess.
Thanks!

PS: Are the Travis-built binaries available for download by users?


Shakir Zareen

Mar 26, 2019, 3:58:08 PM
to tesseract-ocr

Hi


None of the above worked for me with Tesseract 4.0, so I took an unorthodox approach:

- Set up a Mac virtual machine
- Installed Homebrew
- Installed Tesseract 4.0 using Homebrew
- Copied the whole Cellar folder (it contains all of Tesseract's dependencies)

Then comes the fun part.

- All libs in the various folders refer to each other via absolute paths such as /usr/local/opt/leptonica/lib/liblept.5.dylib
- libtesseract.4.dylib refers to Leptonica, and Leptonica refers to the JPEG, TIFF, etc. libs
- So every lib has to be updated so that references to /usr/local/opt/... become relative paths such as ../../../leptonica/...
- otool -L <dylib path> lists these references (otool is part of the Xcode command line tools)
- install_name_tool -change then changes a reference to another dylib
- It was a tedious process, but I did it one by one. Here are the commands (run from CopyOfCellar/tesseract/bin):

install_name_tool -change /usr/local/opt/leptonica/lib/liblept.5.dylib ../../../leptonica/1.78.0/lib/liblept.5.dylib  tesseract 

install_name_tool -change /usr/local/opt/leptonica/lib/liblept.5.dylib ../../../leptonica/1.78.0/lib/liblept.5.dylib  /LocalPathofCopyOfCellar/tesseract/4.0.0_1/lib/libtesseract.4.dylib

install_name_tool -change /usr/local/opt/libpng/lib/libpng16.16.dylib ../../../libpng/1.6.36/lib/libpng16.16.dylib /LocalPathofCopyOfCellar/leptonica/1.78.0/lib/liblept.5.dylib

install_name_tool -change /usr/local/opt/jpeg/lib/libjpeg.9.dylib ../../../jpeg/9c/lib/libjpeg.9.dylib /LocalPathofCopyOfCellar/leptonica/1.78.0/lib/liblept.5.dylib

install_name_tool -change /usr/local/opt/giflib/lib/libgif.7.dylib ../../../giflib/5.1.4_1/lib/libgif.7.dylib /LocalPathofCopyOfCellar/leptonica/1.78.0/lib/liblept.5.dylib

install_name_tool -change /usr/local/opt/libtiff/lib/libtiff.5.dylib ../../../libtiff/4.0.10_1/lib/libtiff.5.dylib /LocalPathofCopyOfCellar/leptonica/1.78.0/lib/liblept.5.dylib

install_name_tool -change /usr/local/opt/webp/lib/libwebp.7.dylib ../../../webp/1.0.2/lib/libwebp.7.dylib /LocalPathofCopyOfCellar/leptonica/1.78.0/lib/liblept.5.dylib

install_name_tool -change /usr/local/opt/openjpeg/lib/libopenjp2.7.dylib ../../../openjpeg/2.3.0/lib/libopenjp2.2.3.0.dylib /LocalPathofCopyOfCellar/leptonica/1.78.0/lib/liblept.5.dylib

install_name_tool -change /usr/local/opt/jpeg/lib/libjpeg.9.dylib ../../../jpeg/9c/lib/libjpeg.9.dylib /LocalPathofCopyOfCellar/libtiff/4.0.10_1/lib/libtiff.5.dylib
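After changes like these, `otool -L` can confirm that nothing still points into /usr/local. A sketch that lists the files under the copied Cellar that still do; each dylib's own install name (its ID, printed by `otool -D`) is ignored, since dyld does not use it to locate dependencies at load time. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: list Mach-O files under $1 (a copied Cellar) that still reference /usr/local.
remaining_usr_local_refs() {
  local root="$1" f id
  # Regular files only, so Homebrew's versioned symlinks are not checked twice.
  find "$root" -type f \( -name '*.dylib' -o -path '*/bin/tesseract' \) -print |
  while read -r f; do
    # otool -D prints "file:" then the dylib's own install name (nothing for executables).
    id=$(otool -D "$f" | awk 'NR == 2 { print $1 }')
    if otool -L "$f" | awk -v id="$id" 'NR > 1 && $1 != id { print $1 }' |
       grep -q '^/usr/local/'; then
      echo "$f"
    fi
  done
}
```

For example, `remaining_usr_local_refs /LocalPathofCopyOfCellar` prints nothing once every reference has been rewritten.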

Once all of the above is done, tesseract becomes standalone, provided you keep all the libs and includes in the same folder structure as in the original Cellar.

export TESSDATA_PREFIX=../share/tessdata

Now, if somebody could turn all that into a bash script, that would be great.
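One caveat, and a possible script. dyld resolves a plain relative install path such as ../../../leptonica/... against the current working directory, not the binary's folder, so the commands above only work when tesseract is started from its own bin directory. A hedged sketch of a script that instead copies tesseract and its non-system dylibs into a flat bin/ + lib/ folder and rewrites the references with @executable_path (from the executable) and @loader_path (between dylibs), which dyld resolves relative to the binary. It assumes every dependency is an absolute path, like the /usr/local/opt paths above; the function names and folder layout are hypothetical. install_name_tool invalidates code signatures, so each file is re-signed ad hoc, which Apple Silicon requires.

```bash
#!/usr/bin/env bash
# Sketch: make a relocatable copy of an installed tesseract:
#   <dest>/bin/tesseract
#   <dest>/lib/<each non-system dylib it needs, found recursively>

# Print the non-system dependencies of one Mach-O file, minus its own install name.
nonsystem_deps() {
  local id
  id=$(otool -D "$1" | awk 'NR == 2 { print $1 }')
  otool -L "$1" | awk -v id="$id" 'NR > 1 && $1 != id { print $1 }' |
    grep -v -e '^/usr/lib/' -e '^/System/'
}

make_portable() {
  local src_bin="$1" dest="$2" exe f dep name
  exe="$dest/bin/tesseract"
  mkdir -p "$dest/bin" "$dest/lib" || return 1
  cp "$src_bin" "$exe" || return 1       # cp follows a symlinked source
  chmod u+w "$exe"                       # Homebrew files may be read-only
  local queue=("$exe")
  while [ "${#queue[@]}" -gt 0 ]; do
    f="${queue[0]}"
    queue=("${queue[@]:1}")
    while read -r dep; do
      name=$(basename "$dep")
      if [ ! -e "$dest/lib/$name" ]; then
        cp -L "$dep" "$dest/lib/$name" || return 1
        chmod u+w "$dest/lib/$name"
        queue+=("$dest/lib/$name")     # its own dependencies are handled later
      fi
      if [ "$f" = "$exe" ]; then
        install_name_tool -change "$dep" "@executable_path/../lib/$name" "$f"
      else
        install_name_tool -change "$dep" "@loader_path/$name" "$f"
      fi
    done < <(nonsystem_deps "$f")
    # Editing a Mach-O file invalidates its signature; re-sign ad hoc.
    codesign --force --sign - "$f" 2>/dev/null ||
      echo "warning: could not re-sign $f" >&2
  done
}
```

For example, `make_portable /usr/local/bin/tesseract ~/PortableTesseract`, then copy share/tessdata next to it and point TESSDATA_PREFIX at that tessdata folder, as in the export above.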
