New issue 641 by easternc...@gmail.com: tesseract segfaults on Centos 6.2
http://code.google.com/p/tesseract-ocr/issues/detail?id=641
What steps will reproduce the problem?
1. Just run the following command:
2. /usr/local/bin/tesseract /dir/logo.gif.tif /dir/logo.gif
What is the expected output? What do you see instead?
The file should be converted into text format.
What version of the product are you using? On what operating system?
OS Info:
# uname -a
Linux 2.6.32-220.el6.i686 #1 SMP Tue Dec 6 16:15:40 GMT 2011 i686 i686 i386
GNU/Linux
#cat /etc/issue
CentOS release 6.2 (Final)
Kernel \r on an \m
# tesseract -v
tesseract 3.01
leptonica version is: leptonica-1.68
The input file is of the following type:
# identify /dir/logo.gif.tif
/dir/logo.gif.tif TIFF 754x144 754x144+0+0 8-bit Grayscale DirectClass
23.3kb
Please provide any additional information below.
The following is the output from the debugger:
Program received signal SIGSEGV, Segmentation fault.
tesseract::Classify::ComputeIntCharNormArray (this=0x809c798,
NormFeature=0x80ddc10, Templates=0x0, CharNormArray=0xbfffb738 "") at
float2int.cpp:84
84 for (i = 0; i < Templates->NumClasses; i++) {
Missing separate debuginfos, use: debuginfo-install
glibc-2.12-1.47.el6_2.5.i686 libgcc-4.4.6-3.el6.i686 libjpeg-6b-46.el6.i686
libstdc++-4.4.6-3.el6.i686 libtiff-3.9.4-1.el6_0.3.i686
zlib-1.2.3-27.el6.i686
(gdb) where
#0 tesseract::Classify::ComputeIntCharNormArray (this=0x809c798,
NormFeature=0x80ddc10, Templates=0x0, CharNormArray=0xbfffb738 "") at
float2int.cpp:84
#1 0x002b32b2 in tesseract::Classify::GetIntCharNormFeatures
(this=0x809c798, Blob=0x80ab4a0, Templates=0x0, IntFeatures=0xbfffd738,
CharNormArray=0xbfffb738 "",
BlobLength=0xb7fb3008, FeatureOutlineArray=0x0) at adaptmatch.cpp:2066
#2 0x002b33a7 in tesseract::Classify::GetCharNormFeatures (this=0x809c798,
Blob=0x80ab4a0, Templates=0x0, IntFeatures=0xbfffd738,
CharNormArray=0xbfffb738 "",
BlobLength=0xb7fb3008, FeatureOutlineIndex=0x0) at adaptmatch.cpp:1916
#3 0x002b4940 in tesseract::Classify::CharNormClassifier (this=0x809c798,
Blob=0x80ab4a0, Templates=0x0, Results=0xb7fb3008) at adaptmatch.cpp:1389
#4 0x002b54ad in tesseract::Classify::DoAdaptiveMatch (this=0x809c798,
Blob=0x80ab4a0, Results=0xb7fb3008) at adaptmatch.cpp:1626
#5 0x002b7aa2 in tesseract::Classify::AdaptiveClassifier (this=0x809c798,
Blob=0x80ab4a0, Choices=0x80ddc00, CPResults=0x0) at adaptmatch.cpp:183
#6 0x002ad310 in tesseract::Wordrec::call_matcher (this=0x809c798,
tessblob=0x80ab4a0) at tface.cpp:179
#7 0x002ad9ff in tesseract::Wordrec::classify_blob (this=0x809c798,
blob=0x80ab4a0, string=0x34f802 "chop_word:", color=Green) at
wordclass.cpp:71
#8 0x00298d34 in tesseract::Wordrec::chop_word_main (this=0x809c798,
word=0x80ab240) at chopper.cpp:510
#9 0x002ad444 in tesseract::Wordrec::cc_recog (this=0x809c798,
word=0x80ab240) at tface.cpp:121
#10 0x001d3add in tesseract::Tesseract::recog_word_recursive
(this=0x809c798, word=0x80ab240, blob_choices=0x80aa568) at tfacepp.cpp:114
#11 0x001d4b9d in tesseract::Tesseract::recog_word (this=0x809c798,
word=0x80ab240, blob_choices=0x80aa568) at tfacepp.cpp:55
#12 0x001c92e2 in tesseract::Tesseract::tess_segment_pass1 (this=0x809c798,
word=0x80ab240, blob_choices=0x80aa568) at tessbox.cpp:56
#13 0x001a6bce in tesseract::Tesseract::classify_word_pass1
(this=0x809c798, word=0x80ab240, row=0x80d4fd0, block=0x80ab6d0) at
control.cpp:490
#14 0x001a8aa6 in tesseract::Tesseract::recog_all_words (this=0x809c798,
page_res=0x80aa810, monitor=0x0, target_word_box=0x0, word_config=0x0,
dopasses=0) at control.cpp:264
#15 0x00193db1 in tesseract::TessBaseAPI::Recognize (this=0xbffff614,
monitor=0x0) at baseapi.cpp:559
#16 0x00196a35 in tesseract::TessBaseAPI::ProcessPage (this=0xbffff614,
pix=0x80a85f0, page_index=0,
filename=0xbffff880 "/var/www/vhosts/ocrconvert.com/httpdocs/processed/4f4fe0e4cb68e/logo.gif.tif",
retry_config=0x0, timeout_millisec=0, text_out=0xbffff664)
at baseapi.cpp:732
#17 0x00196d02 in tesseract::TessBaseAPI::ProcessPages (this=0xbffff614,
filename=0xbffff880 "/var/www/vhosts/ocrconvert.com/httpdocs/processed/4f4fe0e4cb68e/logo.gif.tif",
retry_config=0x0, timeout_millisec=0, text_out=0xbffff664) at
baseapi.cpp:648
#18 0x08048fc2 in main (argc=3, argv=0xbffff734)
at ../api/tesseractmain.cpp:138
(gdb)
Can you please provide a test image? I can try it with tesseract from svn.
I've attached the image that is causing the segfault.
Attachments:
logo.gif.tif 23.3 KB
I cannot reproduce the problem. I tried it on Windows XP SP3 and it ran
without any problem. I tried it on CentOS 5.7 (I do not have CentOS 6.2) with
tesseract 3.01 and 3.00 and it worked...
So it does not seem to be a tesseract problem.
Did you build tesseract yourself or did you install it from a
package?
Can you send the output of 'ldd /usr/local/bin/tesseract'?
Yes, I compiled it myself.
The following is the output from ldd:
$ ldd /usr/local/bin/tesseract
linux-gate.so.1 => (0x00c3d000)
libtesseract.so.3 => /usr/local/lib/libtesseract.so.3 (0x00292000)
liblept.so.2 => /usr/local/lib/liblept.so.2 (0x00110000)
libpthread.so.0 => /lib/libpthread.so.0 (0x00ad0000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x007bb000)
libm.so.6 => /lib/libm.so.6 (0x00560000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x0058a000)
libc.so.6 => /lib/libc.so.6 (0x005a8000)
libtiff.so.3 => /usr/lib/libtiff.so.3 (0x00738000)
/lib/ld-linux.so.2 (0x00f5c000)
libjpeg.so.62 => /usr/lib/libjpeg.so.62 (0x00ba6000)
libz.so.1 => /lib/libz.so.1 (0x0079a000)
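In the ldd output above, libtesseract and liblept both resolve to /usr/local/lib, which is consistent with a source build, while the remaining libraries come from the system directories. A minimal sketch of picking that out automatically, assuming the usual "name => path (address)" line format; the helper names parse_ldd and under_prefix are hypothetical, and SAMPLE is a shortened copy of the output above:

```python
def parse_ldd(text):
    """Map each shared library name to the path ldd resolved it to."""
    libs = {}
    for line in text.splitlines():
        parts = line.split("=>")
        if len(parts) != 2:
            continue  # e.g. the dynamic loader line, which has no "=>"
        name = parts[0].strip()
        # Drop the trailing load address, e.g. " (0x00292000)".
        target = parts[1].strip().split(" (")[0].strip()
        # Keep only entries that resolved to a real path (skips linux-gate).
        if target.startswith("/"):
            libs[name] = target
    return libs


def under_prefix(libs, prefix="/usr/local/"):
    """Return only the libraries whose resolved path lies under prefix."""
    return {name: path for name, path in libs.items() if path.startswith(prefix)}


# Shortened copy of the ldd output posted in this thread.
SAMPLE = """\
linux-gate.so.1 => (0x00c3d000)
libtesseract.so.3 => /usr/local/lib/libtesseract.so.3 (0x00292000)
liblept.so.2 => /usr/local/lib/liblept.so.2 (0x00110000)
libtiff.so.3 => /usr/lib/libtiff.so.3 (0x00738000)
/lib/ld-linux.so.2 (0x00f5c000)
"""

if __name__ == "__main__":
    for name, path in sorted(under_prefix(parse_ldd(SAMPLE)).items()):
        print(name, "->", path)
```

This only tells you where each library was loaded from; it says nothing about whether the language data is installed.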
Just to confirm: have you installed the required language data files (which
were updated for v3.01)? I can reproduce this exact error (i.e., a
SEGFAULT in ComputeIntCharNormArray) if the "XXX.traineddata" (and
associated files) aren't present (or aren't readable in the expected
location). The language-specific training files are available from the
downloads page.
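A quick way to rule this out before running tesseract is to check that the language file exists and is readable. The following is only a sketch under assumptions: that tesseract 3.x looks in $TESSDATA_PREFIX/tessdata/, and that a source build without that variable set falls back to /usr/local/share/ (this depends on the configure prefix, so adjust it for your install). The function names are hypothetical.

```python
import os


def traineddata_path(lang="eng", prefix=None):
    """Return the path where <lang>.traineddata is expected to live."""
    # Assumption: data is read from <prefix>/tessdata/, where <prefix> is
    # $TESSDATA_PREFIX or, failing that, the build-time default guessed here.
    if prefix is None:
        prefix = os.environ.get("TESSDATA_PREFIX", "/usr/local/share/")
    return os.path.join(prefix, "tessdata", lang + ".traineddata")


def check_traineddata(lang="eng", prefix=None):
    """Return (ok, message) describing whether the language file is usable."""
    path = traineddata_path(lang, prefix)
    if not os.path.isfile(path):
        return False, "missing: " + path
    if not os.access(path, os.R_OK):
        return False, "not readable: " + path
    return True, "found: " + path


if __name__ == "__main__":
    ok, message = check_traineddata("eng")
    print(message)
```

If this reports the file as missing or unreadable, that matches the situation in which I can reproduce the crash.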
Hi,
Thanks, it works like a charm after installing the language pack.
Can you please tell me how I can install all the language packs?
Comment #7 on issue 641 by zde...@gmail.com: tesseract segfaults on Centos
6.2
http://code.google.com/p/tesseract-ocr/issues/detail?id=641
If you downloaded/used tesseract-3.01.tar.gz, then you have to download all
the relevant files and install them manually.
Or you can use the svn version: install svn and then run:
$ svn checkout http://tesseract-ocr.googlecode.com/svn/trunk/ tesseract-ocr
$ cd tesseract-ocr
and follow the instructions in INSTALL.SVN [1]. But be aware: the language data
files are more than 600MB!
[1] http://code.google.com/p/tesseract-ocr/source/browse/trunk/INSTALL.SVN
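For the manual route, once the language archives have been downloaded and unpacked somewhere, the install step comes down to copying the *.traineddata files into the tessdata directory. A rough sketch, assuming the unpacked files sit under one directory that is separate from the destination; install_traineddata is a hypothetical helper and both paths in the usage note are placeholders:

```python
import os
import shutil


def install_traineddata(src_dir, tessdata_dir):
    """Copy every *.traineddata file found under src_dir into tessdata_dir.

    Returns the sorted list of file names that were copied.
    Assumes tessdata_dir is not located inside src_dir.
    """
    os.makedirs(tessdata_dir, exist_ok=True)
    installed = []
    # Walk the whole tree, since unpacked archives may nest the files.
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            if name.endswith(".traineddata"):
                shutil.copy2(os.path.join(root, name),
                             os.path.join(tessdata_dir, name))
                installed.append(name)
    return sorted(installed)
```

For example, install_traineddata("/path/to/unpacked", "/usr/local/share/tessdata") would be run as a user with write access to the destination; the destination shown is a guess for a default source build.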