Italian - Missing special-words


bácsi Kazi

Jan 10, 2016, 10:46:13 AM
to tesseract-ocr
Hi,

I finally managed to build my portable 3.05dev install with Cygwin (without the training tools, because I got errors while building them; ideas welcome). I'm now using the Italian language files from GitHub, but I keep getting the error "failed to load .../special-words".
The output seems fine, but the message is annoying. I can see this file among the training files on GitHub, and if I place it in the tessdata folder, the error message disappears. Is this a build error in langdata?
Is "Detected n diacritics" a normal warning? (There was no such warning in 3.02.)
Greetings:

Kazi

John Muccigrosso

May 28, 2016, 8:33:52 AM
to tesseract-ocr
I'm a newcomer to tesseract and I'm having this same problem. The process ran and created a good file, but I got multiple instances of this error. I do not see any such file in the GitHub repository.

I installed tesseract via Homebrew on Mavericks and linked in the Italian language files from GitHub.

Thanks for any help.

Marco Atzeri

May 28, 2016, 8:49:01 AM
to tesser...@googlegroups.com
For hints on building, look at the Cygwin tesseract 3.04.01 source package.
You can download it with Cygwin setup.

The content list, reported on:
https://cygwin.com/packages/x86_64/tesseract-ocr/tesseract-ocr-3.04.01-1-src

is
tesseract-ocr-3.04.01-1.src/tesseract-3.04.01.tar.gz
tesseract-ocr-3.04.01-1.src/tesseract-ocr.cygport
tesseract-ocr-3.04.01-1.src/tesseract-training.patch
tesseract-ocr-3.04.01-1.src/tesseract-undefined.patch

There is a patch for training and one for building shared libs.
The cygport contains the build setup.


Regards
Marco

cygwin package maintainer

John Muccigrosso

May 29, 2016, 2:41:04 PM
to tesseract-ocr
Just to be clear: the error I got when running on OS X was the "missing special-words" one. 

John Muccigrosso

Jun 5, 2017, 9:59:38 AM
to tesseract-ocr
Checking in on this. It's still occurring for me with Italian on OS X, Tesseract Open Source OCR Engine v3.05.00 with Leptonica.

Error: failed to load /usr/local/Cellar/tesseract/3.05.00_1/share/tessdata/ita.special-words
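
For anyone who wants to see exactly which files these messages refer to, here is a minimal Python sketch. It assumes the message wording is the one quoted above ("failed to load <path>") and that the path contains no spaces; other Tesseract versions may phrase it differently. The function name `missing_files` is made up for this example and is not part of Tesseract.

```python
import re
from pathlib import Path

# Assumption: the error line looks like the one quoted in this thread,
# i.e. "failed to load <path>", with no spaces inside the path.
ERROR_RE = re.compile(r"failed to load (?P<path>\S+)")


def missing_files(log_text):
    """Return the distinct paths named in 'failed to load' lines
    that do not exist on disk, in order of first appearance."""
    seen = []
    for match in ERROR_RE.finditer(log_text):
        path = Path(match.group("path"))
        if path not in seen:
            seen.append(path)
    return [path for path in seen if not path.exists()]
```

Feeding it the captured stderr of a tesseract run collapses the repeated messages into one entry per missing file.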

ShreeDevi Kumar

Jun 5, 2017, 10:07:59 AM
to tesser...@googlegroups.com
The file is there in langdata and is referred to in the language config file.

ShreeDevi


John Muccigrosso

Jun 5, 2017, 10:32:53 AM
to tesseract-ocr
Thanks.

I'm doing this by installing tesseract via Homebrew and then keeping a local copy of tessdata from GitHub. The tessdata repository doesn't have the special-words file (which in this case is only two lines anyway). Perhaps it should?
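
Until that changes, the workaround from the first post (placing the file in the tessdata folder) can be scripted. This is only a rough sketch: it assumes you have a local checkout of langdata, and that the file sits either at `<langdata>/<lang>/<lang>.special-words` or directly in the langdata folder. Both layouts are guesses on my part, and `install_special_words` is a made-up helper name, not anything shipped with Tesseract.

```python
import shutil
from pathlib import Path


def install_special_words(langdata_dir, tessdata_dir, lang="ita"):
    """Copy <lang>.special-words from a langdata checkout into tessdata.

    Returns the destination path, or None if no source file was found.
    An existing file in tessdata is left untouched.
    """
    name = f"{lang}.special-words"
    # Assumed layouts of the langdata checkout (not verified):
    candidates = [
        Path(langdata_dir) / lang / name,
        Path(langdata_dir) / name,
    ]
    source = next((p for p in candidates if p.is_file()), None)
    if source is None:
        return None
    destination = Path(tessdata_dir) / name
    if not destination.exists():
        shutil.copyfile(source, destination)
    return destination
```

Calling it with your langdata checkout and the tessdata folder named in the error message should be enough; for a Homebrew install you may need write permission on the Cellar directory.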