Segmentation fault 3.04.01


MiguelArmas

Mar 14, 2016, 3:25:16 AM
to tesseract-ocr
Hello,

I'm getting the following error when running tesseract --list-langs:

fseek(data_file_, static_cast<size_t>(offset_table_[tessdata_type]), SEEK_SET) == 0:Error:Assert failed:in file ../ccutil/tessdatamanager.h, line 173
Segmentation fault

I have compiled tesseract 3.04.01 and leptonica 1.73 on Amazon EC2 Linux. eng.traineddata, downloaded from https://github.com/tesseract-ocr/tessdata, is placed inside tessdata/

Any idea what's wrong?

Thanks,

Miguel

Tom Morris

Mar 14, 2016, 12:53:41 PM
to tesseract-ocr
I'd double-check where your TESSDATA_PREFIX points. It should be the *parent* of the tessdata directory. I get a similar symptom if I run tesseract --list-langs with the variable set wrong (which is a bug, but one that's on a pretty rare path).
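
For anyone checking the same thing, here is a minimal Python sketch (not part of tesseract) that lists the language files found under the tessdata directory of a given prefix, along with their sizes. The function name list_traineddata is hypothetical, and the sketch assumes the convention described above, where TESSDATA_PREFIX points at the parent of tessdata.

```python
import os


def list_traineddata(prefix):
    """Return sorted (language, size_in_bytes) pairs found in <prefix>/tessdata.

    `prefix` is expected to be the parent of the tessdata directory,
    i.e. the value TESSDATA_PREFIX should have.
    """
    tessdata_dir = os.path.join(prefix, "tessdata")
    if not os.path.isdir(tessdata_dir):
        raise FileNotFoundError(
            "no tessdata directory under %r; the prefix should be its parent" % prefix
        )

    suffix = ".traineddata"
    found = []
    for name in sorted(os.listdir(tessdata_dir)):
        if name.endswith(suffix):
            size = os.path.getsize(os.path.join(tessdata_dir, name))
            found.append((name[: -len(suffix)], size))
    return found
```

Calling list_traineddata(os.environ["TESSDATA_PREFIX"]) raises if the prefix does not contain a tessdata directory, and otherwise shows which language files are present and how large each one is.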

Tom 

MiguelArmas

Mar 14, 2016, 9:35:36 PM
to tesseract-ocr
Thanks Tom,

TESSDATA_PREFIX is set to the parent of tessdata:

[user@server ~]$ echo $TESSDATA_PREFIX
/usr/local/share/
[user@server ~]$ tesseract --list-langs
fseek(data_file_, static_cast<size_t>(offset_table_[tessdata_type]), SEEK_SET) == 0:Error:Assert failed:in file ../ccutil/tessdatamanager.h, line 173
Segmentation fault

I haven't been able to find what the problem is, and nothing comes up when googling the line 173 error.
That's why I decided to post here and hope someone can help.

Thanks again,

Miguel

Tom Morris

Mar 15, 2016, 2:48:48 AM
to tesser...@googlegroups.com
After I sent my note, I realized that the real problem wasn't that it was pointing to the wrong place, but rather that it was pointing to a place with an invalid/corrupt (i.e. from a different version) tessdata file. Any chance you've got a mismatched language file, or that it got corrupted/truncated during the download?

Tom

ShreeDevi Kumar

Mar 15, 2016, 3:45:07 AM
to tesser...@googlegroups.com
Try specifying the directory on the command line:

tesseract --list-langs --tessdata-dir /usr/local/share/

I have tessdata in two different places and I can list them as follows:


User@HP MINGW32 ~/tesseract-ocr
$ tesseract --list-langs --tessdata-dir ./
List of available languages (17):
ara
deu
deu_frak
eng
equ
guj
hin
iast
kan
mar
osd
rus
san
san1
san2
san3
tam

User@HP MINGW32 ~/tesseract-ocr
$ tesseract --list-langs --tessdata-dir /mingw32/share/
List of available languages (8):
ara
deu
eng
heb
hin
iast
osd
san


ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com


MiguelArmas

Mar 15, 2016, 6:05:24 PM
to tesseract-ocr
Tom, you were absolutely right, my language file was corrupted (only 33K instead of about 21M).
I downloaded the file again (from https://github.com/tesseract-ocr/tessdata) and checked it was the right size... voila!!!
Now the language is being detected by tesseract as expected.
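
A quick way to catch a truncated download like this one is to compare the file's size on disk with the size you expect. Below is a minimal Python sketch, assuming you supply the expected size yourself (for example from the download page). looks_truncated is a hypothetical helper, not part of tesseract, and a size check alone will not catch a file that is the right length but from a mismatched version.

```python
import os


def looks_truncated(path, expected_bytes):
    """Return True if the file at `path` is smaller than `expected_bytes`.

    `expected_bytes` is an assumption supplied by the caller, e.g. the
    size shown on the page the file was downloaded from.
    """
    return os.path.getsize(path) < expected_bytes
```

With the numbers from this thread, a 33K eng.traineddata checked against an expected size of roughly 21M would be flagged as truncated.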

Thank you very much for your tips and help!

Shree: Thanks for your answer. As you can see above, the problem was with the language file and not with the location.

Miguel