New issue 595 by chef...@gmail.com: Crashbug in tesseract 3.0.1
http://code.google.com/p/tesseract-ocr/issues/detail?id=595
I am using Tesseract with custom OCR-B training data to decode the
machine-readable zone (MRZ) of ID cards and passports. It usually works well,
but decoding about 50,000 images triggered this crash a few times.
What steps will reproduce the problem?
tesseract /tmp/char.bmp tmp -l ocrb -psm 10 && cat tmp.txt
Tesseract Open Source OCR Engine v3.01 with Leptonica
Segmentation fault
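Since the crash only shows up on a handful of images out of roughly 50,000, a small loop can help isolate the offending inputs. The following is a minimal sketch, not part of the original report: the function name find_crashers, the TESSERACT override variable, and the crashcheck output base name are hypothetical, and it assumes a shell such as bash or dash that reports a child killed by SIGSEGV as exit status 139 (128 + 11).

```shell
#!/bin/sh
# Hypothetical helper: run the reported command once per .bmp file in a
# directory and print the path of every image on which it segfaults.
find_crashers() {
    dir=$1
    for img in "$dir"/*.bmp; do
        # Skip the unexpanded pattern when the directory has no .bmp files.
        [ -e "$img" ] || continue
        # Same options as in the report; TESSERACT is an assumed override
        # for the binary, and the output base name is a placeholder.
        "${TESSERACT:-tesseract}" "$img" "${TMPDIR:-/tmp}/crashcheck" \
            -l ocrb -psm 10 >/dev/null 2>&1
        status=$?
        # 139 = 128 + 11 (SIGSEGV) in bash and dash.
        if [ "$status" -eq 139 ]; then
            echo "$img"
        fi
    done
}
```

A possible use, with an illustrative directory name, would be `find_crashers /path/to/images > crashers.txt`; each listed file can then be rerun under gdb as shown in this report.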
Running with gdb:
Program received signal SIGSEGV, Segmentation fault.
0xb7e72332 in restore_underlined_blobs (block=0x81650c8) at underlin.cpp:65
65 &chop_cells);
(gdb) bt
#0 0xb7e72332 in restore_underlined_blobs (block=0x81650c8) at
underlin.cpp:65
#1 0xb7e22bc0 in tesseract::Textord::cleanup_rows_fitting (this=0x80a78e8,
page_tr=..., block=0x81650c8, gradient=0, rotation=..., block_edge=0,
testing_on=1 '\001') at makerow.cpp:623
#2 0xb7e22e15 in tesseract::Textord::fit_rows (this=0x80a78e8, gradient=0,
page_tr=..., blocks=0xbfffd984) at makerow.cpp:225
#3 0xb7e55ccd in tesseract::Textord::TextordPage (this=0x80a78e8,
pageseg_mode=tesseract::PSM_SINGLE_CHAR, width=24, height=33,
pix=0x8164760, blocks=0x80aa570,
to_blocks=0xbfffd984) at textord.cpp:306
#4 0xb7da0db0 in tesseract::Tesseract::SegmentPage (this=0x809c798,
input_file=0x80aa508, blocks=0x80aa570, osd_tess=0x0, osr=0xbfffd9fc) at
pagesegmain.cpp:177
#5 0xb7d794cc in tesseract::TessBaseAPI::FindLines (this=0xbffff334) at
baseapi.cpp:1413
#6 0xb7d798e0 in tesseract::TessBaseAPI::Recognize (this=0xbffff334,
monitor=0x0) at baseapi.cpp:523
#7 0xb7d7c185 in tesseract::TessBaseAPI::ProcessPage (this=0xbffff334,
pix=0x80aa4c0, page_index=0, filename=0xbffff607 "/tmp/char.bmp",
retry_config=0x0,
timeout_millisec=0, text_out=0xbffff384) at baseapi.cpp:732
#8 0xb7d7c4e2 in tesseract::TessBaseAPI::ProcessPages (this=0xbffff334,
filename=0xbffff607 "/tmp/char.bmp", retry_config=0x0, timeout_millisec=0,
text_out=0xbffff384) at baseapi.cpp:659
#9 0x08048fc2 in main (argc=7, argv=0xbffff454)
at ../api/tesseractmain.cpp:138
What is the expected output? What do you see instead?
What version of the product are you using? On what operating system?
Tesseract 3.01 running on Linux
Attachments:
char.bmp 1.8 KB
ocrb.traineddata 307 KB
Confirmed; the crash is still present.
It also crashes on this image with the English language data.