Building Tesseract for online hosting


Mobeen Ali

Sep 7, 2020, 4:48:01 AM
to tesseract-ocr
Hi!
I am trying to build Tesseract and upload it to an online hosting service. The tesseract binary is now found, but I'm getting this error:

"pytesseract.pytesseract.TesseractError: (127, '/home/dpsocr/mysite/tesseract-ara/bin/tesseract: error while loading shared libraries: libtesseract.so.4: cannot open shared object file: No such file or directory')"

It seems that tesseract is unable to find the libtesseract library,
even though the libraries are in the 'lib' directory, as follows:
|_mysite
      |_tesseract
                |_bin
                |_lib
                |_tesseract

So my question is this: I built Tesseract using Docker. Is there a way to build Tesseract and its libraries, compress them together, and upload them to the hosting service so that tesseract finds its required libraries alongside it?

Thanks for any advice.
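
One possible workaround, sketched under assumptions: the loader error above means the dynamic linker is not searching the bundle's 'lib' directory, so that directory can be added to LD_LIBRARY_PATH before pytesseract launches the binary. The helper name bundled_env and the bundle path are hypothetical; pytesseract.pytesseract.tesseract_cmd is the pytesseract setting for the binary location.

```python
import os
from pathlib import Path


def bundled_env(bundle_root, base_env=None):
    """Return a copy of the environment with <bundle_root>/lib prepended
    to LD_LIBRARY_PATH, so the loader can find libtesseract there."""
    env = dict(os.environ if base_env is None else base_env)
    lib_dir = str(Path(bundle_root) / "lib")
    existing = env.get("LD_LIBRARY_PATH")
    # Keep any existing search path, but look in the bundle first.
    env["LD_LIBRARY_PATH"] = (
        lib_dir if not existing else lib_dir + os.pathsep + existing
    )
    return env


# Hypothetical usage in the web app, before the first OCR call:
#   import pytesseract
#   bundle = "/home/dpsocr/mysite/tesseract"   # assumed bundle location
#   os.environ.update(bundled_env(bundle))
#   pytesseract.pytesseract.tesseract_cmd = bundle + "/bin/tesseract"
```

This only helps if every library the binary needs is actually present in 'lib'; anything missing will produce the same kind of loader error for the next library.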

Zdenko Podobny

Sep 7, 2020, 7:01:29 AM
to tesser...@googlegroups.com
Try building a static version of Tesseract.

Zdenko



Mobeen Ali

Sep 7, 2020, 9:41:47 AM
to tesseract-ocr
Thanks @zdenop for your response. Is there a how-to for building a static version of Tesseract on macOS or Ubuntu?

mit

Sep 7, 2020, 10:32:56 AM
to tesseract-ocr
I am also looking for this: the process for creating a static Tesseract build.

Mobeen Ali

Sep 8, 2020, 2:42:08 AM
to tesseract-ocr
@mit, did you find a solution?

Mobeen Ali

Sep 8, 2020, 2:57:01 AM
to tesseract-ocr
I tried using a static version of Tesseract, but now I get this error:

pytesseract.pytesseract.TesseractError: (127, '/home/dpsocr/mysite/tesseract-static/bin/tesseract: symbol lookup error: /home/dpsocr/mysite/tesseract-static/bin/tesseract: undefined symbol: pixaDisplayTiledInColumns')

Please help!!
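
A "symbol lookup error" is raised by the dynamic loader at run time, so this binary is still resolving Leptonica symbols from a shared library rather than having them linked in. One plausible cause (an assumption, not confirmed in this thread) is that the host's liblept is older than the one used at build time. A minimal sketch for checking whether a given shared library exports the symbol, using only the standard ctypes module; the helper name has_symbol and the library path in the usage comment are hypothetical:

```python
import ctypes
import ctypes.util


def has_symbol(library, symbol):
    """Return True if the shared library at `library` exports `symbol`.

    ctypes.CDLL raises OSError if the library itself cannot be loaded.
    """
    lib = ctypes.CDLL(library)
    # Attribute lookup on a CDLL object resolves the symbol and raises
    # AttributeError when it is missing, which hasattr turns into False.
    return hasattr(lib, symbol)


# Hypothetical usage on the hosting machine:
#   has_symbol("/path/to/liblept.so.5", "pixaDisplayTiledInColumns")
```

If the check returns False for the library that the loader picks up, shipping the Leptonica build that Tesseract was compiled against is one thing to try.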

mit

Sep 8, 2020, 3:04:00 AM
to tesseract-ocr

No. How did you build a static version of Tesseract? Do you have a link?

Thanks

Mobeen Ali

Sep 8, 2020, 3:08:16 AM
to tesseract-ocr
@mit, here's the link to the GitHub page:


The author has a single script for the whole task, but when I build the static version of Tesseract with it and run the result, I get this error:

pytesseract.pytesseract.TesseractError: (127, '/home/dpsocr/mysite/tesseract-static/bin/tesseract: symbol lookup error: /home/dpsocr/mysite/tesseract-static/bin/tesseract: undefined symbol: pixaDisplayTiledInColumns')

Give it a try and share what you get.
Best of Luck!

Zdenko Podobny

Sep 8, 2020, 3:11:02 AM
to tesser...@googlegroups.com
Static building is a general topic, not a Tesseract-specific one.
You need to consult the documentation for your build chain.
E.g. a good starting point is ./configure --help if you use autotools.


Linux (and maybe macOS too) prefers to build only shared versions of libraries, so you also need to rebuild all Leptonica dependencies as static.

Zdenko



mit

Sep 8, 2020, 6:37:29 AM
to tesseract-ocr
The tesseract binary it creates still needs the Leptonica files to run, which defeats the core idea of static linking.

Zdenko Podobny

Sep 8, 2020, 7:56:44 AM
to tesser...@googlegroups.com
As I mentioned in a previous email, you need to build a static Leptonica library with all its dependencies (image libraries) as static libraries.
It may require more tweaking ;-):

> ldd /usr/bin/tesseract
        linux-vdso.so.1 (0x00007ffdd44e2000)
        libtesseract.so.5 => /usr/lib64/libtesseract.so.5 (0x00007faa42f5d000)
        liblept.so.5 => /usr/lib64/liblept.so.5 (0x00007faa42ac9000)
        libpng16.so.16 => /usr/lib64/libpng16.so.16 (0x00007faa42886000)
        libjpeg.so.8 => /usr/lib64/libjpeg.so.8 (0x00007faa4261d000)
        libgif.so.7 => /usr/lib64/libgif.so.7 (0x00007faa42414000)
        libtiff.so.5 => /usr/lib64/libtiff.so.5 (0x00007faa4219b000)
        libwebpmux.so.2 => /usr/lib64/libwebpmux.so.2 (0x00007faa41f91000)
        libwebp.so.6 => /usr/lib64/libwebp.so.6 (0x00007faa41d33000)
        libopenjp2.so.7 => /usr/lib64/libopenjp2.so.7 (0x00007faa41ae1000)
        libz.so.1 => /lib64/libz.so.1 (0x00007faa418ca000)
        libarchive.so.13 => /usr/lib64/libarchive.so.13 (0x00007faa41615000)
        libcurl.so.4 => /usr/lib64/libcurl.so.4 (0x00007faa4138c000)
        librt.so.1 => /lib64/librt.so.1 (0x00007faa41184000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007faa40f65000)
        libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007faa40b84000)
        libm.so.6 => /lib64/libm.so.6 (0x00007faa4084c000)
        libgomp.so.1 => /usr/lib64/libgomp.so.1 (0x00007faa40614000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007faa403fc000)
        libc.so.6 => /lib64/libc.so.6 (0x00007faa40041000)
        liblzma.so.5 => /usr/lib64/liblzma.so.5 (0x00007faa3fe07000)
        libjbig.so.2 => /usr/lib64/libjbig.so.2 (0x00007faa3fbfb000)
        libcrypto.so.1.1 => /usr/lib64/libcrypto.so.1.1 (0x00007faa3f769000)
        libacl.so.1 => /lib64/libacl.so.1 (0x00007faa3f560000)
        libbz2.so.1 => /usr/lib64/libbz2.so.1 (0x00007faa3f343000)
        libxml2.so.2 => /usr/lib64/libxml2.so.2 (0x00007faa3efdb000)
        libnghttp2.so.14 => /usr/lib64/libnghttp2.so.14 (0x00007faa3edb3000)
        libidn2.so.0 => /usr/lib64/libidn2.so.0 (0x00007faa3eb96000)
        libssh.so.4 => /usr/lib64/libssh.so.4 (0x00007faa3e915000)
        libpsl.so.5 => /usr/lib64/libpsl.so.5 (0x00007faa3e705000)
        libssl.so.1.1 => /usr/lib64/libssl.so.1.1 (0x00007faa3e499000)
        libgssapi_krb5.so.2 => /usr/lib64/libgssapi_krb5.so.2 (0x00007faa3e24d000)
        libldap_r-2.4.so.2 => /usr/lib64/libldap_r-2.4.so.2 (0x00007faa3dff9000)
        liblber-2.4.so.2 => /usr/lib64/liblber-2.4.so.2 (0x00007faa3ddea000)
        /lib64/ld-linux-x86-64.so.2 (0x00007faa434b4000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007faa3dbe6000)
        libattr.so.1 => /lib64/libattr.so.1 (0x00007faa3d9e1000)
        libunistring.so.2 => /usr/lib64/libunistring.so.2 (0x00007faa3d65f000)
        libkrb5.so.3 => /usr/lib64/libkrb5.so.3 (0x00007faa3d383000)
        libk5crypto.so.3 => /usr/lib64/libk5crypto.so.3 (0x00007faa3d151000)
        libcom_err.so.2 => /lib64/libcom_err.so.2 (0x00007faa3cf4d000)
        libkrb5support.so.0 => /usr/lib64/libkrb5support.so.0 (0x00007faa3cd40000)
        libresolv.so.2 => /lib64/libresolv.so.2 (0x00007faa3cb29000)
        libsasl2.so.3 => /usr/lib64/libsasl2.so.3 (0x00007faa3c90c000)
        libkeyutils.so.1 => /usr/lib64/libkeyutils.so.1 (0x00007faa3c708000)
        libselinux.so.1 => /lib64/libselinux.so.1 (0x00007faa3c4df000)
        libpcre.so.1 => /usr/lib64/libpcre.so.1 (0x00007faa3c252000)
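
A listing like the one above can be turned into a checklist of what a bundle or static build has to account for. The following is a minimal sketch that parses ldd-style text; the function name parse_ldd is hypothetical, and the sample lines are taken from the output above.

```python
def parse_ldd(output):
    """Map each library name in ldd-style output to its resolved path.

    Lines without '=>' (such as linux-vdso and the dynamic loader itself)
    are skipped. Libraries reported as 'not found' map to None.
    """
    deps = {}
    for line in output.splitlines():
        if "=>" not in line:
            continue
        name, _, rest = line.partition("=>")
        # Drop the load address, e.g. ' (0x00007faa42ac9000)'.
        path = rest.strip().split(" (")[0].strip()
        deps[name.strip()] = None if path == "not found" else path
    return deps


sample = """\
        linux-vdso.so.1 (0x00007ffdd44e2000)
        libtesseract.so.5 => /usr/lib64/libtesseract.so.5 (0x00007faa42f5d000)
        liblept.so.5 => /usr/lib64/liblept.so.5 (0x00007faa42ac9000)
        /lib64/ld-linux-x86-64.so.2 (0x00007faa434b4000)
"""
deps = parse_ldd(sample)
```

Running ldd on the uploaded binary on the hosting machine and looking for None entries would show which libraries the loader cannot find there.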


I would suggest building a minimal Leptonica and Tesseract library to avoid extra linking work. E.g. if you open the image with another library (Python PIL [1], OpenCV [2], ...), you do not need any image format support in Leptonica: just pass the image data to Leptonica/Tesseract.


Zdenko



Zdenko Podobny

Sep 8, 2020, 8:14:01 AM
to tesser...@googlegroups.com
I have not tried it on Linux, but you can try using Microsoft vcpkg [1] to build a static Leptonica [2] (it works on Windows).


Zdenko



Zdenko Podobny

Sep 8, 2020, 8:42:59 AM
to tesser...@googlegroups.com
A vcpkg static build is not supported on Linux:

Error: invalid triplet: x64-linux-static
Available architecture triplets
VCPKG built-in triplets:
  x64-osx
  x64-windows
  arm-uwp
  arm64-windows
  x64-linux
  x86-windows
  x64-uwp
  x64-windows-static

VCPKG community triplets:
  s390x-linux
  x64-windows-static-md
  arm64-mingw-dynamic
  x86-ios
  x86-windows-static
  arm-mingw-static
  wasm32-emscripten
  arm64-uwp
  arm64-ios
  arm-ios
  x86-windows-static-md
  x86-uwp
  x86-mingw-static
  arm64-linux
  arm64-mingw-static
  arm-windows
  x64-ios
  x86-windows-v120
  x64-osx-dynamic
  arm-linux
  x64-mingw-static
  arm64-osx
  x86-mingw-dynamic
  x64-mingw-dynamic
  arm-mingw-dynamic
  arm64-windows-static


Zdenko



mit

Sep 8, 2020, 9:09:56 AM
to tesseract-ocr
Thanks for the suggestions. I will look through them. I mainly need a static Tesseract to use directly in AWS Lambda, so I will have to find a way to build it on Linux.

Tom Morris

Sep 8, 2020, 11:57:50 AM
to tesseract-ocr
On Tuesday, September 8, 2020 at 9:09:56 AM UTC-4 mit wrote:
Thanks for the suggestions. I will look through them. I mainly need a static Tesseract to use directly in AWS Lambda, so I will have to find a way to build it on Linux.

Rather than going to all the trouble of creating static versions of all of Tesseract's dependent libraries (and their dependencies), if this is just for AWS Lambda, why not use a deployment package to bundle up all the dependencies? There doesn't appear to be a specific page for C++, but here's the C# equivalent: https://docs.aws.amazon.com/lambda/latest/dg/csharp-package.html
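
As a rough illustration of the bundling idea, the sketch below zips a directory that already contains the tesseract binary and its shared libraries, keeping paths relative to the bundle root. The function name zip_bundle and the directory layout are assumptions made for illustration, not an AWS-prescribed procedure.

```python
import zipfile
from pathlib import Path


def zip_bundle(bundle_root, archive_path):
    """Write every file under bundle_root into a zip archive, using
    paths relative to bundle_root (e.g. 'bin/tesseract').

    archive_path should live outside bundle_root so the archive does
    not try to include itself.
    """
    root = Path(bundle_root)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                zf.write(path, path.relative_to(root).as_posix())
    return archive_path


# Hypothetical usage:
#   zip_bundle("build/tesseract", "tesseract-bundle.zip")
```

The binary and libraries still have to be built on a system compatible with the target runtime, otherwise the loader errors discussed earlier in the thread will reappear.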

Tom

Mobeen Ali

Sep 9, 2020, 1:50:10 AM
to tesseract-ocr
Maybe I left out why I was trying to build static libraries: I'm building them to deploy a Flask web app. The hosting provider does not allow root access for installation, so it is not possible to build the libraries on the hosting console. I'm also building for Linux, so I cannot use vcpkg. Is there any other way to build this library statically and link the two?

Mobeen Ali

Sep 9, 2020, 1:57:19 AM
to tesseract-ocr
@mit

I was working on AWS Lambda and built the libraries for the OCR functions. Just follow the Docker part of this post/tutorial by Amine Tamasna:

You will be good to go...

Александр Поздняков

Sep 9, 2020, 2:01:03 PM
to tesser...@googlegroups.com
Hi.
Alternatively, use an AppImage (Ubuntu >= 16.04).
1. Download
wget https://github.com/AlexanderP/tesseract-appimage/releases/download/v5.0.0-alpha-773-gd33ed/tesseract-5.0.0-alpha-773-gd33ed-x86_64.AppImage
chmod +x tesseract-5.0.0-alpha-773-gd33ed-x86_64.AppImage
2. Run
./tesseract-5.0.0-alpha-773-gd33ed-x86_64.AppImage -l eng page.jpg -
or
./tesseract-5.0.0-alpha-773-gd33ed-x86_64.AppImage --appimage-extract
./squashfs-root/AppRun -l eng page.jpg -


Instructions for creating AppImage: https://github.com/AlexanderP/tesseract-appimage
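
For a Python web app, the AppImage (or the extracted AppRun) could be called through subprocess. This is a minimal sketch, assuming the runner accepts the usual tesseract arguments as in the commands above; the function names appimage_command and ocr_with_appimage are hypothetical.

```python
import subprocess


def appimage_command(runner, image, lang="eng"):
    """Build the argument list for 'RUNNER IMAGE - -l LANG'.

    The '-' output base asks tesseract to write the text to stdout.
    """
    return [str(runner), str(image), "-", "-l", lang]


def ocr_with_appimage(runner, image, lang="eng"):
    """Run the AppImage (or extracted AppRun) and return the recognized text.

    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    result = subprocess.run(
        appimage_command(runner, image, lang),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Hypothetical usage after '--appimage-extract':
#   text = ocr_with_appimage("./squashfs-root/AppRun", "page.jpg")
```

Running the extracted AppRun instead of the AppImage file itself may be the more practical option on hosts where FUSE is not available, since mounting an AppImage normally relies on it.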



Shree Devi Kumar

Sep 9, 2020, 11:09:01 PM
to tesseract-ocr
Thanks, Alex.

I suggest that you also add this to the Tesseract documentation in the tessdoc repo.

mit

Sep 14, 2020, 4:39:18 AM
to tesseract-ocr
Thanks, Alex. Does this work only on Ubuntu, or can it also be used on Amazon Linux AMIs?

Александр Поздняков

Sep 14, 2020, 4:27:47 PM
to tesser...@googlegroups.com
Hi.
I have no way to check on Amazon Linux.
I have tested it on:
Debian: ≥ 9
Fedora: ≥ 29
Ubuntu: ≥ 16.04
CentOS: ≥ 8
openSUSE: ≥ 42.3
openSUSE Tumbleweed
