Unable to load unicharset file /root/Download/pytesser/tessdata/eng.unicharset

eugenwpg

Jun 11, 2009, 2:40:59 AM

to tesseract-ocr

The only thing in the README file about language packs says:
Tesseract ... can recognize 6 languages "out of the box."

Aha, perhaps you meant http://code.google.com/p/tesseract-ocr/wiki/ReadMe
which at least says something, although this is no longer all that
merry a chase. Finally in the ReleaseNotes for v2.01 it says "No new
data files for the original 6 languages. Use the files from v2.00".
So in order to get v2.03 to work, after installing it according to
directions, one then has to install an old version in order to obtain
these language files? This is so hard to believe, I wonder whether
I'm reading what I'm reading.

svaram

Jun 11, 2009, 5:11:29 AM

to tesseract-ocr

We do not have to install an old version.
The language data files are available as individual downloads :)

* tesseract-2.00.eng.tar.gz --- English
* tesseract-2.00.nld.tar.gz --- Dutch
* tesseract-2.00.spa.tar.gz --- Spanish
* tesseract-2.00.deu.tar.gz --- German
* tesseract-2.00.ita.tar.gz --- Italian
* tesseract-2.00.fra.tar.gz --- French
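
For anyone hitting the "Unable to load unicharset file" error from the subject line, here is a minimal Python sketch of unpacking one of the archives above and checking that the file Tesseract complained about is now present. The helper names are made up for illustration, and it assumes the archive unpacks into a tessdata/ subdirectory of the destination; check the archive contents before relying on that.

```python
import os
import tarfile


def unicharset_path(tessdata_dir, lang="eng"):
    """Path of the <lang>.unicharset file named in the error message."""
    return os.path.join(tessdata_dir, lang + ".unicharset")


def has_language_data(tessdata_dir, lang="eng"):
    """True if the unicharset file for `lang` exists in tessdata_dir."""
    return os.path.isfile(unicharset_path(tessdata_dir, lang))


def install_language_archive(archive_path, dest_dir):
    """Unpack a gzip-compressed language tarball into dest_dir.

    Assumption: the archive stores its files under a top-level
    tessdata/ directory, so they end up in dest_dir/tessdata.
    """
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest_dir)
```

With the path from the subject line, that would be something like install_language_archive("tesseract-2.00.eng.tar.gz", "/root/Download/pytesser") followed by has_language_data("/root/Download/pytesser/tessdata").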

The English data has worked for me in recognizing
text from a scan of a very old book [1932].

Am I missing something from your mail?

Ray Smith

Jun 11, 2009, 3:02:03 PM

to tesser...@googlegroups.com

I have made this clearer and bigger on the home page (which everybody merrily ignores anyway) and in the ReadMe wiki.

I also updated the FAQ to point to the wiki page. A lot of users have had trouble understanding this; hopefully it will be clearer now. It will be very important for 3.00, as there will be a lot more languages.

Ray
