Compiling tesseract 4 in Debian


lundissimo

Jan 29, 2020, 11:38:20 PM
to tesseract-ocr
I've downloaded the tesseract 4.0 Git repository on a system running Debian Linux and run autogen.sh and configure, then make and 'make install'. On a Mac, where I installed tesseract successfully using Homebrew, the 'tesseract -v' command produces:
tesseract 4.1.1
 leptonica-1.79.0
  libgif 5.2.1 : libjpeg 9c : libpng 1.6.37 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 1.1.0 : libopenjp2 2.3.1
 Found AVX2
 Found AVX
 Found FMA
 Found SSE
However, although no errors are displayed, the resulting tesseract on Debian after 'make install' outputs just:
tesseract 4.0.1
 leptonica-1.80.0
  zlib 1.2.8
I found the various libxxx files mentioned in the Mac output above in the directory /usr/lib/x86_64-linux-gnu, so I provided it as an option to configure:
./configure "LDFLAGS=-L/usr/lib/x86_64-linux-gnu"
to no effect.

Can anyone tell me how to get tesseract to compile with the required libraries?

Zdenko Podobny

Jan 30, 2020, 2:16:26 PM
to tesser...@googlegroups.com
It looks like another version of tesseract is installed. Uninstall the old version and reinstall the one you compiled.


Zdenko



lundissimo

Jan 30, 2020, 4:45:30 PM
to tesseract-ocr
Please explain what leads you to believe there's another version of tesseract installed. I had installed it using apt-get, but the repository provided tesseract 3, so I did 'apt-get remove tesseract' before starting this build process.



Zdenko Podobny

Jan 31, 2020, 4:07:39 AM
to tesser...@googlegroups.com
If you installed tesseract from git, it cannot report version 4.0.1; there was never such a version.

Zdenko



lundissimo

Jan 31, 2020, 3:49:56 PM
to tesseract-ocr
Do you suppose that closes the discussion? I did indeed install tesseract using the tarball I downloaded from https://github.com/tesseract-ocr/tesseract/tree/4.0. Typing 'tesseract -v' gives the output I provided above. Deal with it. Talk to the developers about maintaining consistency between their Git repo labels and the version reported by the code. But telling me it "can not report" what I report it reported is less than helpful, borders on trolling.

lundissimo

Jan 31, 2020, 3:53:10 PM
to tesseract-ocr
On the Debian server, after passing LDFLAGS to 'configure', I realized I had built the leptonica library myself earlier, so I went back and rebuilt leptonica as well with
./configure "LDFLAGS=-L/usr/lib/x86_64-linux-gnu"
and then rebuilt tesseract. Now the 'tesseract -v' command outputs the same, but also includes libpng in the list.

Does anyone know how to get the required libs included when building tesseract?

Zdenko Podobny

Jan 31, 2020, 4:48:09 PM
to tesser...@googlegroups.com
Tesseract needs only leptonica.
To get tesseract, please use https://github.com/tesseract-ocr/tesseract/releases
The reported image libraries are the output of a leptonica function, so I suggest having a look at your leptonica build.

If you need help, please provide the exact steps: how you got leptonica, which commands you used for building, the output of configure, etc.
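One way to see which image libraries the installed leptonica was actually linked against is to run `ldd` on its shared library and filter the result. A minimal sketch, assuming a GNU/Linux `ldd` and the default autotools prefix (so the library is `/usr/local/lib/liblept.so`); `LEPT_LIB` and `image_libs_in_ldd` are hypothetical names made up for this example:

```shell
#!/bin/sh
# Sketch: show which image libraries leptonica's shared library is linked against.
# LEPT_LIB is a hypothetical override; the default assumes the /usr/local prefix.

# Reads `ldd` output on stdin and prints each image-library soname once.
image_libs_in_ldd() {
    grep -oE 'lib(z|png[0-9]*|jpeg|tiff|gif|webp|openjp2)\.so[.0-9]*' | sort -u
}

lept="${LEPT_LIB:-/usr/local/lib/liblept.so}"
if [ -e "$lept" ]; then
    ldd "$lept" | image_libs_in_ldd
else
    echo "not found: $lept (set LEPT_LIB to the installed liblept)" >&2
fi
```

A leptonica build that only detected libpng and zlib would be expected to list just those two, matching what `tesseract -v` reports.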


Zdenko



lundissimo

Jan 31, 2020, 7:18:08 PM
to tesseract-ocr
Thank you for that link. I hadn't retrieved the file from the releases section. I have now re-obtained and re-installed both leptonica and tesseract. Because I found the files lib{gif,jpeg,png,tiff,webp,openjp2}* in the system folder /usr/lib/x86_64-linux-gnu, I passed that folder to the linker. The steps I followed are:
wget https://github.com/DanBloomberg/leptonica/archive/1.79.0.zip
unzip 1.79.0.zip
cd leptonica-1.79.0/
./autogen.sh
./configure "LDFLAGS=-L/usr/lib/x86_64-linux-gnu"
make
make install
cd ..
wget https://github.com/tesseract-ocr/tesseract/archive/4.1.1.tar.gz
zcat 4.1.1.tar.gz |tar xvf -
cd tesseract-4.1.1
./autogen.sh
./configure "LDFLAGS=-L/usr/lib/x86_64-linux-gnu"
make
make install
ldconfig
make training
make training-install


I've attached the config.log files produced by both configure commands, in case they contain any clues.

After this new install, the 'tesseract -v' command outputs:
tesseract 4.1.1
 leptonica-1.79.0
  libpng 1.2.50 : zlib 1.2.8

And of course, it does not work. Testing on an image of a page of text:
 tesseract 6305_6.jpg 6305_6
Error in pixReadMemTiff: function not present
Error in pixReadMem: tiff: no pix returned
Error in pixaGenerateFontFromString: pix not made
Error in bmfCreate: font pixa not made
Tesseract Open Source OCR Engine v4.1.1 with Leptonica
Error in pixReadStreamJpeg: function not present
Error in pixReadStream: jpeg: no pix returned
Error in pixRead: pix not read
Error during processing.

There must be something missing in the system, or in the build options.

leptonica-config.log
tesseract-config.log

Shree Devi Kumar

Feb 1, 2020, 12:26:20 AM
to tesseract-ocr
The version of leptonica that you have

leptonica-1.79.0
  libpng 1.2.50 : zlib 1.2.8

It only has support for png; all other image formats will fail.

You need to change the leptonica build to include libtiff, etc.
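Reading the version output this way can be scripted: every image library leptonica was built with appears in the `tesseract -v` listing, so a name that is absent marks a format it cannot read. A small sketch; `missing_image_libs` is a hypothetical helper, and the library names are taken from the Homebrew output at the top of this thread:

```shell
#!/bin/sh
# Sketch: list image libraries missing from `tesseract -v` output.
# Names come from the Homebrew build's version output; zlib is left out
# because it is a compression library, not an image format.
missing_image_libs() {
    report=$(cat)
    for lib in libgif libjpeg libpng libtiff libwebp libopenjp2; do
        case "$report" in
            *"$lib "*) ;;        # listed: leptonica was built with it
            *) echo "$lib" ;;    # not listed: reading that format will fail
        esac
    done
}

if command -v tesseract >/dev/null 2>&1; then
    tesseract -v 2>&1 | missing_image_libs
fi
```

Fed the Debian output quoted above (`libpng 1.2.50 : zlib 1.2.8`), it prints libgif, libjpeg, libtiff, libwebp and libopenjp2.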


Shree Devi Kumar

Feb 1, 2020, 2:07:47 AM
to tesseract-ocr
For Debian you can also get the latest packages from https://notesalexp.org/tesseract-ocr/

Zdenko Podobny

Feb 1, 2020, 5:29:54 AM
to tesser...@googlegroups.com
If you are building any software from source, you should be familiar with your system and know the build process.
What you wrote indicates that your system has only the runtime libraries installed and is missing the devel packages.
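On Debian, runtime packages typically ship only versioned shared objects such as `libtiff.so.5`; the unversioned `libtiff.so` that `-ltiff` links against, together with the headers, comes from the matching `-dev` package. That is likely also why adding `LDFLAGS=-L/usr/lib/x86_64-linux-gnu` had no effect. A sketch that lists which `-dev` packages look absent, assuming the x86_64 multiarch directory and package names from a Debian release of that period (older releases use names such as `libpng12-dev` or `libtiff5-dev` and may lack `libopenjp2-7-dev`); `LIBDIR`, `dev_pkg` and `missing_dev_pkgs` are hypothetical names:

```shell
#!/bin/sh
# Sketch: find image libraries whose unversioned libNAME.so (shipped by the
# Debian -dev package) is absent, and suggest packages to install.
# LIBDIR and the package names are assumptions; adjust for your release.

# Map a library name to an assumed Debian -dev package.
dev_pkg() {
    case "$1" in
        z)       echo zlib1g-dev ;;
        png)     echo libpng-dev ;;
        jpeg)    echo libjpeg-dev ;;
        tiff)    echo libtiff-dev ;;
        gif)     echo libgif-dev ;;
        webp)    echo libwebp-dev ;;
        openjp2) echo libopenjp2-7-dev ;;
    esac
}

# Print the -dev package for every library lacking libNAME.so in directory $1.
missing_dev_pkgs() {
    for name in z png jpeg tiff gif webp openjp2; do
        [ -e "$1/lib$name.so" ] || dev_pkg "$name"
    done
}

missing=$(missing_dev_pkgs "${LIBDIR:-/usr/lib/x86_64-linux-gnu}")
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line
    echo "try: apt-get install" $missing
else
    echo "all image-library dev links found"
fi
```

After installing the reported packages, leptonica has to be reconfigured and rebuilt (and tesseract after it) so that configure can detect the libraries.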

Zdenko



lundissimo

Feb 1, 2020, 4:12:14 PM
to tesseract-ocr
Thank you. I'm realizing that this is the most likely problem. I am NOT very familiar with the system; it is a web server managed by others, and it is no surprise that the system would have been configured with only the components necessary to serve the websites, not the devel packages. I'll consult with the hosting service to see if they can help.