Installation of Tesseract and some of its dependencies from source on CentOS


Fatih Ertinaz

Oct 16, 2018, 11:46:27 PM
to tesseract-ocr
Hello all

I could not find proper documentation that explains each step explicitly for compiling from source, including leptonica and some other image libs. 

Therefore I created my own scripts and wanted to share them here. Hopefully others might benefit as well.

I started with libpng, libjpeg and libtiff (it looks like the libz I had was already the most recent version). 
Otherwise I was having exactly the same issue as:
  • For libjpeg:
#!/bin/sh
# Stop at the first failing command (e.g. a wrong source path)
set -e

# Download the package:
#   wget http://www.ijg.org/files/jpegsrc.v9c.tar.gz
# then extract it into the $base/jpeg/ folder.
# This script takes care of the rest.

# Package details
base=$HOME/image-libs
pkg=jpeg
vrs=9c

# Go to the source folder
cd "$base/$pkg/$pkg-$vrs"

# Run configure (default install prefix: /usr/local)
./configure

# Compile, run the self-test and install
make
make test
sudo make install
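The "download, then extract to $base/jpeg/" step described in the script comments can be wrapped in two small helpers. This is a minimal sketch: fetch_src and extract_src are hypothetical names, and it assumes the tarball unpacks into a jpeg-9c/ directory, so that the result matches the $base/$pkg/$pkg-$vrs path used above.

```shell
#!/bin/sh
# Hypothetical helpers for the "download, then extract to $base/<dir>/" step.

# fetch_src URL DEST_DIR: download a tarball into DEST_DIR with wget
fetch_src() {
    mkdir -p "$2" && wget -P "$2" "$1"
}

# extract_src TARBALL DIR: unpack a .tar.gz into DIR, creating DIR if needed
extract_src() {
    mkdir -p "$2" && tar -xzf "$1" -C "$2"
}
```

For libjpeg this would be fetch_src http://www.ijg.org/files/jpegsrc.v9c.tar.gz "$HOME/downloads" followed by extract_src "$HOME/downloads/jpegsrc.v9c.tar.gz" "$HOME/image-libs/jpeg".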
  • For libpng:
#!/bin/sh
# Stop at the first failing command
set -e

# Download the package:
#   wget https://download.sourceforge.net/libpng/libpng-1.6.35.tar.gz
# then extract it into the $base/png/ folder.
# This script takes care of the rest.

# Package details
base=$HOME/image-libs
pkg=libpng
vrs=1.6.35

# Go to the source folder
cd "$base/png/$pkg-$vrs"

# Run configure
./configure

# Compile, run the tests and install
make
make check
sudo make install
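Before building a replacement, it can be worth checking whether the installed library is already new enough, as turned out to be the case for libz. A minimal sketch, assuming GNU sort (for -V version ordering) and a library that ships a pkg-config file; installed_version and version_ge are hypothetical helper names.

```shell
#!/bin/sh
# installed_version MODULE: print the version pkg-config reports for MODULE
# (prints nothing if the module is unknown)
installed_version() {
    pkg-config --modversion "$1" 2>/dev/null
}

# version_ge A B: succeed if version A >= version B
version_ge() {
    # If "B, then A" is already in version order, A is at least B
    printf '%s\n%s\n' "$2" "$1" | sort -V -C
}
```

For example: if version_ge "$(installed_version libpng)" 1.6.35; then echo "system libpng is recent enough"; fi. An empty version (module not found) compares as older than anything.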

  • For libtiff:
#!/bin/sh
# Stop at the first failing command
set -e

# Download the package:
#   wget http://download.osgeo.org/libtiff/tiff-4.0.9.tar.gz
# then extract it into the $base/tiff/ folder.
# This script takes care of the rest.

# Package details
base=$HOME/image-libs
pkg=tiff
vrs=4.0.9

# Go to the source folder
cd "$base/$pkg/$pkg-$vrs"

# Run configure
./configure

# Compile and install
make
sudo make install
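The three scripts above differ only in folder, package name, version and test target, so they could be folded into one function. A sketch under that assumption: build_lib, run, DRY_RUN and SUDO are hypothetical names introduced here. DRY_RUN=1 only prints the commands, and setting SUDO to an empty string runs the install step without sudo.

```shell
#!/bin/sh
# run CMD...: execute CMD, or only print it when DRY_RUN=1
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# build_lib DIR PKG VRS [TEST_TARGET]
# Builds $base/DIR/PKG-VRS the same way as the per-library scripts.
# The body runs in a subshell so "cd" and "set -e" do not leak to the caller.
build_lib() (
    set -e
    src="${base:-$HOME/image-libs}/$1/$2-$3"
    run cd "$src"
    run ./configure
    run make
    if [ -n "${4-}" ]; then
        run make "$4"
    fi
    # SUDO unset: use sudo; SUDO set to "": install without it
    run ${SUDO-sudo} make install
)
```

With base=$HOME/image-libs, the three builds become build_lib jpeg jpeg 9c test, build_lib png libpng 1.6.35 check and build_lib tiff tiff 4.0.9.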
  • Assuming that everything went fine, you can move on to the next step, which is compiling leptonica:
#!/bin/sh
# Stop at the first failing command (important before "rm -rf")
set -e

# Get the leptonica source (e.g. a clone of the git repo).
# I located it in $HOME/leptonica

# Package details
base=$HOME
pkg=leptonica

# Go to the leptonica repo
cd "$base/$pkg"

# Start from scratch each time this script is called
rm -rf build && mkdir build && cd build

# Configure (also building the leptonica programs), compile and install
cmake .. -DBUILD_PROG=1
make
sudo make install
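To avoid mixing CMake (leptonica) with autotools (tesseract), leptonica could be built with autotools too. A minimal sketch, assuming a git checkout provides autogen.sh to generate ./configure while a release tarball already ships ./configure; build_leptonica, run, DRY_RUN and SUDO are hypothetical names.

```shell
#!/bin/sh
# run CMD...: execute CMD, or only print it when DRY_RUN=1
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# build_leptonica SRC_DIR [PREFIX]
# Subshell body: "cd" and "set -e" stay local to this call.
build_leptonica() (
    set -e
    cd "$1"
    # Only generate ./configure when it is missing (git checkout)
    if [ ! -x ./configure ]; then
        run ./autogen.sh
    fi
    if [ -n "${2-}" ]; then
        run ./configure --prefix="$2"
    else
        run ./configure
    fi
    run make
    run ${SUDO-sudo} make install
)
```

For the layout above this would be build_leptonica "$HOME/leptonica", optionally with an install prefix as the second argument.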
  • Now you should be able to start compiling tesseract-ocr:
#!/bin/sh
# Stop at the first failing command
set -e

# Cloned the git repo under $base,
# then installed it to $prf

# Package details
base=$HOME/Tesseract
pkg=tesseract
prf=$base/$pkg/install

# Go to the tesseract repo
cd "$base/$pkg"

# Generate the configure script
./autogen.sh

# Run configure
./configure --prefix="$prf" --enable-debug

# Compile and install. "make check" is optional (remove it to skip the tests).
# I'm not sure how necessary ldconfig is when a custom prefix ($prf) is used.
make
make check
make install
sudo ldconfig
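The --enable-debug flag is only useful if you plan to step through the code in a debugger, so it could be made a switch instead of being hard-coded. A minimal sketch: configure_tesseract is a hypothetical helper, and the CONFIGURE variable exists only so the command can be replaced (for example by echo) to inspect the arguments.

```shell
#!/bin/sh
# configure_tesseract PREFIX [debug|release]
# Runs ./configure with --prefix, adding --enable-debug only for debug builds.
configure_tesseract() {
    prefix=$1
    build_type=${2:-release}
    set -- --prefix="$prefix"
    if [ "$build_type" = debug ]; then
        set -- "$@" --enable-debug
    fi
    # CONFIGURE can be overridden (e.g. CONFIGURE=echo) to print the arguments
    ${CONFIGURE:-./configure} "$@"
}
```

In the script above, the ./configure line would become configure_tesseract "$prf" debug (or just configure_tesseract "$prf" for a release build).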

  • Since this script installs tesseract to a specific location ($prf), you need to set the following variable:
export TESSDATA_PREFIX=$HOME/Tesseract/tesseract/install/share/tessdata
  • You can put it in your run scripts or in ~/.bashrc. That's up to you.
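If the export goes into ~/.bashrc from a script, it helps to append it only once so repeated runs don't pile up duplicates. A minimal sketch; add_line_once is a hypothetical helper name.

```shell
#!/bin/sh
# add_line_once FILE LINE: append LINE to FILE unless an identical line
# already exists, so running the setup repeatedly does not duplicate it.
add_line_once() {
    touch "$1" || return 1
    grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}
```

Usage: add_line_once "$HOME/.bashrc" 'export TESSDATA_PREFIX=$HOME/Tesseract/tesseract/install/share/tessdata'. The single quotes keep $HOME literal, so it is expanded when ~/.bashrc is read.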

  • One thing I forgot to mention: I believe the following files need to be copied to "$prf/share/tessdata":
~/Tesseract$ ls lang-packs/
eng.traineddata  eng.user-patterns  eng.user-words
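Copying those files can be scripted as well. A minimal sketch, assuming the lang-packs layout from the listing above; install_lang_pack is a hypothetical helper that requires LANG.traineddata and copies any LANG.user-* files when present.

```shell
#!/bin/sh
# install_lang_pack SRC_DIR TESSDATA_DIR LANG
# Copies LANG.traineddata (required) and any LANG.user-* files (optional).
install_lang_pack() {
    src=$1
    dest=$2
    lang=$3
    mkdir -p "$dest" || return 1
    cp "$src/$lang.traineddata" "$dest/" || return 1
    for extra in "$src/$lang".user-*; do
        # If nothing matches, the glob stays literal and -e is false
        [ -e "$extra" ] && cp "$extra" "$dest/"
    done
    return 0
}
```

For the setup above: install_lang_pack "$HOME/Tesseract/lang-packs" "$HOME/Tesseract/tesseract/install/share/tessdata" eng.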

Hope this helps others who want to compile from scratch.

// Fatih

Zdenko Podobny

Oct 17, 2018, 2:37:20 AM
to tesser...@googlegroups.com
  1. Why are you building a debug tesseract?
  2. Why are you mixing build tools (CMake for leptonica and autotools for the rest)? There was a reported issue regarding this mix in the leptonica->tesseract case...
  3. jpeg, png and tiff are common libs heavily used by desktop systems. Replacing system-wide libs with custom builds is a very bad idea, because you can break things you have no clue about. Using the system-provided libs is a much better solution.

Zdenko


On Wed, 17 Oct 2018 at 5:46, Fatih Ertinaz <fert...@gmail.com> wrote:

Fatih Ertinaz

Oct 17, 2018, 8:32:25 AM
to tesseract-ocr
Hello Zdenko

Thank you for the feedback.

> Why are you building a debug tesseract?

No particular reason, just out of curiosity. I'd like to debug at some point.

> Why are you mixing build tools (CMake for leptonica and autotools for the rest)? There was a reported issue regarding this mix in the leptonica->tesseract case...

I cloned the leptonica git repo, which doesn't contain a generated configure script. I presume it is included in the release tar.gz. I didn't run into an issue in my initial tests, so a problem there would surprise me, but it's good to know.

> jpeg, png and tiff are common libs heavily used by desktop systems. Replacing system-wide libs with custom builds is a very bad idea, because you can break things you have no clue about. Using the system-provided libs is a much better solution.

I was too lazy to compile them with prefixes and update the environment for each one. With the default versions I had the problem mentioned in the first post, which is why I compiled them too. Nothing is broken so far, but thank you again.
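Compiling with prefixes needs less environment juggling if all custom libs share one prefix (for example ./configure --prefix="$HOME/local" for each) and a single function points the toolchain at it. A sketch, assuming autoconf-generated configure scripts that honour CPPFLAGS/LDFLAGS and a pkg-config that honours PKG_CONFIG_PATH; prepend_path and use_prefix are hypothetical names. LD_LIBRARY_PATH is set because a prefix under $HOME is not in the dynamic linker's default search path.

```shell
#!/bin/sh
# prepend_path VAR DIR: put DIR at the front of the colon-separated list in VAR
# (no-op if DIR is already in the list)
prepend_path() {
    eval "cur=\${$1-}"
    case ":$cur:" in
        *":$2:"*) ;;
        *) if [ -n "$cur" ]; then eval "$1=\$2:\$cur"; else eval "$1=\$2"; fi ;;
    esac
    export "$1"
}

# use_prefix PREFIX: make libraries installed under PREFIX visible to
# configure/pkg-config at build time and to the dynamic linker at run time
use_prefix() {
    prepend_path PATH "$1/bin"
    prepend_path LD_LIBRARY_PATH "$1/lib"
    prepend_path PKG_CONFIG_PATH "$1/lib/pkgconfig"
    CPPFLAGS="-I$1/include${CPPFLAGS:+ $CPPFLAGS}"
    LDFLAGS="-L$1/lib${LDFLAGS:+ $LDFLAGS}"
    export CPPFLAGS LDFLAGS
}
```

The file has to be sourced (. ./prefix-env.sh, with a name of your choice) and use_prefix "$HOME/local" called before configuring leptonica and tesseract, so that the variables land in the current shell.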

One correction to the title: this is an Ubuntu 16.04 system, not CentOS. 

In general I am not suggesting compiling from scratch; I am just practicing and playing around out of curiosity. Also, on Ubuntu 16.04 the package manager installed older versions, which is why I started compiling from source. Additionally, I'd like to test various versions, and this way I keep my environment organised. 

Thank you

// Fatih