Issue 641 in tesseract-ocr: tesseract segfaults on Centos 6.2


tesser...@googlecode.com

Mar 1, 2012, 4:05:58 PM
to tesserac...@googlegroups.com
Status: New
Owner: ----

New issue 641 by easternc...@gmail.com: tesseract segfaults on Centos 6.2
http://code.google.com/p/tesseract-ocr/issues/detail?id=641

What steps will reproduce the problem?

1. Run the following command:
   /usr/local/bin/tesseract /dir/logo.gif.tif /dir/logo.gif

What is the expected output? What do you see instead?

The image should be converted to text. Instead, tesseract segfaults.


What version of the product are you using? On what operating system?

OS Info:
# uname -a
Linux 2.6.32-220.el6.i686 #1 SMP Tue Dec 6 16:15:40 GMT 2011 i686 i686 i386
GNU/Linux

#cat /etc/issue
CentOS release 6.2 (Final)
Kernel \r on an \m

# tesseract -v
tesseract 3.01

leptonica version is: leptonica-1.68


The input file is of the following type:

# identify /dir/logo.gif.tif
/dir/logo.gif.tif TIFF 754x144 754x144+0+0 8-bit Grayscale DirectClass
23.3kb

Please provide any additional information below.
The following is the output from gdb:

Program received signal SIGSEGV, Segmentation fault.
tesseract::Classify::ComputeIntCharNormArray (this=0x809c798,
NormFeature=0x80ddc10, Templates=0x0, CharNormArray=0xbfffb738 "") at
float2int.cpp:84
84 for (i = 0; i < Templates->NumClasses; i++) {
Missing separate debuginfos, use: debuginfo-install
glibc-2.12-1.47.el6_2.5.i686 libgcc-4.4.6-3.el6.i686 libjpeg-6b-46.el6.i686
libstdc++-4.4.6-3.el6.i686 libtiff-3.9.4-1.el6_0.3.i686
zlib-1.2.3-27.el6.i686
(gdb) where
#0 tesseract::Classify::ComputeIntCharNormArray (this=0x809c798,
NormFeature=0x80ddc10, Templates=0x0, CharNormArray=0xbfffb738 "") at
float2int.cpp:84
#1 0x002b32b2 in tesseract::Classify::GetIntCharNormFeatures
(this=0x809c798, Blob=0x80ab4a0, Templates=0x0, IntFeatures=0xbfffd738,
CharNormArray=0xbfffb738 "",
BlobLength=0xb7fb3008, FeatureOutlineArray=0x0) at adaptmatch.cpp:2066
#2 0x002b33a7 in tesseract::Classify::GetCharNormFeatures (this=0x809c798,
Blob=0x80ab4a0, Templates=0x0, IntFeatures=0xbfffd738,
CharNormArray=0xbfffb738 "",
BlobLength=0xb7fb3008, FeatureOutlineIndex=0x0) at adaptmatch.cpp:1916
#3 0x002b4940 in tesseract::Classify::CharNormClassifier (this=0x809c798,
Blob=0x80ab4a0, Templates=0x0, Results=0xb7fb3008) at adaptmatch.cpp:1389
#4 0x002b54ad in tesseract::Classify::DoAdaptiveMatch (this=0x809c798,
Blob=0x80ab4a0, Results=0xb7fb3008) at adaptmatch.cpp:1626
#5 0x002b7aa2 in tesseract::Classify::AdaptiveClassifier (this=0x809c798,
Blob=0x80ab4a0, Choices=0x80ddc00, CPResults=0x0) at adaptmatch.cpp:183
#6 0x002ad310 in tesseract::Wordrec::call_matcher (this=0x809c798,
tessblob=0x80ab4a0) at tface.cpp:179
#7 0x002ad9ff in tesseract::Wordrec::classify_blob (this=0x809c798,
blob=0x80ab4a0, string=0x34f802 "chop_word:", color=Green) at
wordclass.cpp:71
#8 0x00298d34 in tesseract::Wordrec::chop_word_main (this=0x809c798,
word=0x80ab240) at chopper.cpp:510
#9 0x002ad444 in tesseract::Wordrec::cc_recog (this=0x809c798,
word=0x80ab240) at tface.cpp:121
#10 0x001d3add in tesseract::Tesseract::recog_word_recursive
(this=0x809c798, word=0x80ab240, blob_choices=0x80aa568) at tfacepp.cpp:114
#11 0x001d4b9d in tesseract::Tesseract::recog_word (this=0x809c798,
word=0x80ab240, blob_choices=0x80aa568) at tfacepp.cpp:55
#12 0x001c92e2 in tesseract::Tesseract::tess_segment_pass1 (this=0x809c798,
word=0x80ab240, blob_choices=0x80aa568) at tessbox.cpp:56
#13 0x001a6bce in tesseract::Tesseract::classify_word_pass1
(this=0x809c798, word=0x80ab240, row=0x80d4fd0, block=0x80ab6d0) at
control.cpp:490
#14 0x001a8aa6 in tesseract::Tesseract::recog_all_words (this=0x809c798,
page_res=0x80aa810, monitor=0x0, target_word_box=0x0, word_config=0x0,
dopasses=0) at control.cpp:264
#15 0x00193db1 in tesseract::TessBaseAPI::Recognize (this=0xbffff614,
monitor=0x0) at baseapi.cpp:559
#16 0x00196a35 in tesseract::TessBaseAPI::ProcessPage (this=0xbffff614,
pix=0x80a85f0, page_index=0,
filename=0xbffff880 "/var/www/vhosts/ocrconvert.com/httpdocs/processed/4f4fe0e4cb68e/logo.gif.tif",
retry_config=0x0, timeout_millisec=0, text_out=0xbffff664)
at baseapi.cpp:732
#17 0x00196d02 in tesseract::TessBaseAPI::ProcessPages (this=0xbffff614,
filename=0xbffff880 "/var/www/vhosts/ocrconvert.com/httpdocs/processed/4f4fe0e4cb68e/logo.gif.tif",
retry_config=0x0, timeout_millisec=0, text_out=0xbffff664) at
baseapi.cpp:648
#18 0x08048fc2 in main (argc=3, argv=0xbffff734)
at ../api/tesseractmain.cpp:138
(gdb)


tesser...@googlecode.com

Mar 2, 2012, 2:25:56 AM
to tesserac...@googlegroups.com

Comment #1 on issue 641 by zde...@gmail.com: tesseract segfaults on Centos
6.2
http://code.google.com/p/tesseract-ocr/issues/detail?id=641

Can you please provide a test image? I can try it with tesseract from svn.

tesser...@googlecode.com

Mar 2, 2012, 4:07:11 AM
to tesserac...@googlegroups.com

Comment #2 on issue 641 by easternc...@gmail.com: tesseract segfaults on
Centos 6.2
http://code.google.com/p/tesseract-ocr/issues/detail?id=641

I've attached the image that causes the segfault.

Attachments:
logo.gif.tif 23.3 KB

tesser...@googlecode.com

Mar 2, 2012, 8:46:27 AM
to tesserac...@googlegroups.com

Comment #3 on issue 641 by zde...@gmail.com: tesseract segfaults on Centos
6.2
http://code.google.com/p/tesseract-ocr/issues/detail?id=641

I cannot reproduce the problem. I tried it on Windows XP SP3 and it ran
without problems. I also tried it on CentOS 5.7 (I do not have CentOS 6.2)
with tesseract 3.01 and 3.00, and it worked...
So it does not seem to be a tesseract problem.

Did you build tesseract yourself, or did you install it from a
package?
Can you send the output of 'ldd /usr/local/bin/tesseract'?

tesser...@googlecode.com

Mar 2, 2012, 3:39:29 PM
to tesserac...@googlegroups.com

Comment #4 on issue 641 by easternc...@gmail.com: tesseract segfaults on
Centos 6.2
http://code.google.com/p/tesseract-ocr/issues/detail?id=641

Yes, I compiled it.

The following is the output from ldd:

$ ldd /usr/local/bin/tesseract
linux-gate.so.1 => (0x00c3d000)
libtesseract.so.3 => /usr/local/lib/libtesseract.so.3 (0x00292000)
liblept.so.2 => /usr/local/lib/liblept.so.2 (0x00110000)
libpthread.so.0 => /lib/libpthread.so.0 (0x00ad0000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x007bb000)
libm.so.6 => /lib/libm.so.6 (0x00560000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x0058a000)
libc.so.6 => /lib/libc.so.6 (0x005a8000)
libtiff.so.3 => /usr/lib/libtiff.so.3 (0x00738000)
/lib/ld-linux.so.2 (0x00f5c000)
libjpeg.so.62 => /usr/lib/libjpeg.so.62 (0x00ba6000)
libz.so.1 => /lib/libz.so.1 (0x0079a000)
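
For reference, a small sketch of how an ldd listing like the one above could be scanned for unresolved libraries. `missing_libs` is a hypothetical helper name, and the sketch assumes the usual glibc ldd format, where an unresolved library is reported as `=> not found`. In the listing above every library resolves, so it would print nothing.

```shell
#!/bin/sh
# Hypothetical helper: read ldd output on stdin and print the name of
# every library that the dynamic linker could not resolve.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Example call (binary path taken from the thread):
# ldd /usr/local/bin/tesseract | missing_libs
```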


tesser...@googlecode.com

Mar 18, 2012, 12:49:40 AM
to tesserac...@googlegroups.com

Comment #5 on issue 641 by courtney...@gmail.com: tesseract segfaults on
Centos 6.2
http://code.google.com/p/tesseract-ocr/issues/detail?id=641

Just to confirm: have you installed the required language data files (which
were updated for v3.01)? I can reproduce this exact error (i.e., a
SEGFAULT in ComputeIntCharNormArray) if the "XXX.traineddata" (and
associated files) aren't present (or aren't readable in the expected
location). The language-specific training files are available from the
downloads page.
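
A minimal sketch of a pre-flight check for the situation described above. It assumes the English pack (`eng`) and a data directory of /usr/local/share/tessdata; both are assumptions, so adjust them to your install prefix and language. `have_traineddata` is a hypothetical helper, not part of tesseract.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if <dir>/<lang>.traineddata is a
# regular file that the current user can read.
have_traineddata() {
    dir=$1
    lang=$2
    [ -f "$dir/$lang.traineddata" ] && [ -r "$dir/$lang.traineddata" ]
}

# Assumed defaults; override via the environment if your layout differs.
TESSDATA_DIR=${TESSDATA_DIR:-/usr/local/share/tessdata}
LANG_CODE=${LANG_CODE:-eng}

if have_traineddata "$TESSDATA_DIR" "$LANG_CODE"; then
    echo "found $TESSDATA_DIR/$LANG_CODE.traineddata"
else
    echo "missing or unreadable: $TESSDATA_DIR/$LANG_CODE.traineddata" >&2
fi
```

Running a check like this before calling tesseract turns the crash into a readable message when the data file is absent.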

tesser...@googlecode.com

Mar 18, 2012, 6:48:41 AM
to tesserac...@googlegroups.com

Comment #6 on issue 641 by easternc...@gmail.com: tesseract segfaults on
Centos 6.2
http://code.google.com/p/tesseract-ocr/issues/detail?id=641

Hi,

Thanks, works like a charm after installing the language pack.

Can you please tell me how I can install all the language packs?

tesser...@googlecode.com

Mar 18, 2012, 9:36:04 AM
to tesserac...@googlegroups.com
Updates:
Status: No-longer-an-issue

Comment #7 on issue 641 by zde...@gmail.com: tesseract segfaults on Centos
6.2
http://code.google.com/p/tesseract-ocr/issues/detail?id=641

If you downloaded/used tesseract-3.01.tar.gz, then you have to download and
install all the relevant files manually.

Or you can use the svn version: install svn and then run:
$ svn checkout http://tesseract-ocr.googlecode.com/svn/trunk/ tesseract-ocr
$ cd tesseract-ocr

and follow the instructions in INSTALL.SVN [1]. But be aware: the language
data files are more than 600MB!

[1] http://code.google.com/p/tesseract-ocr/source/browse/trunk/INSTALL.SVN

tesser...@googlecode.com

Mar 18, 2012, 4:03:34 PM
to tesserac...@googlegroups.com

Comment #8 on issue 641 by zde...@gmail.com: tesseract segfaults on Centos
6.2
http://code.google.com/p/tesseract-ocr/issues/detail?id=641

If you downloaded/used tesseract-3.01.tar.gz, then you have to download all
