Issue 970 in tesseract-ocr: --list-lang doesn't work without eng.traineddata


tesser...@googlecode.com

Aug 20, 2013, 5:37:04 AM
to tesserac...@googlegroups.com
Status: New
Owner: ----

New issue 970 by dsila...@gmail.com: --list-lang doesn't work without
eng.traineddata
http://code.google.com/p/tesseract-ocr/issues/detail?id=970

What steps will reproduce the problem?
1. Compile tesseract from current SVN
2. Install tesseract, libtesseract and language data for a language other
than English (e.g., rus.traineddata)
3. Try to launch 'tesseract --list-lang'

What is the expected output? What do you see instead?

I expect to see that Russian language support is installed; instead I get
an error:

Error opening data file /usr/share/tessdata/eng.traineddata
...
Failed loading language 'eng'
...

(/usr/share/tessdata/ is the correct location of the tessdata files, but in
my case it only contains rus.traineddata.)

What version of the product are you using? On what operating system?
SVN build (r866), ROSA Linux




tesser...@googlecode.com

Aug 22, 2013, 1:48:20 AM
to tesserac...@googlegroups.com

Comment #1 on issue 970 by gmv...@gmail.com: --list-lang doesn't work
without eng.traineddata
http://code.google.com/p/tesseract-ocr/issues/detail?id=970

You can use:
tesseract -l rus --list-lang
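Building on this workaround, the language code can be taken from whatever is installed rather than typed by hand. The sketch below uses a hypothetical helper, `installed_langs`, which is not part of tesseract. It assumes that language codes are the base names of the `*.traineddata` files in the tessdata directory and defaults to the `/usr/share/tessdata/` path from the report.

```shell
#!/bin/sh
# Hypothetical helper (not part of tesseract): print one language code per
# installed *.traineddata file, assuming the code is the file's base name.
installed_langs() {
  dir=${1:-/usr/share/tessdata}   # default path taken from the report
  for f in "$dir"/*.traineddata; do
    [ -e "$f" ] || continue       # the glob matched no files
    basename "$f" .traineddata    # /x/rus.traineddata -> rus
  done
}
```

For example, `tesseract -l "$(installed_langs | head -n 1)" --list-lang` passes the first installed language explicitly, following the same pattern as `-l rus` above.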

tesser...@googlecode.com

Aug 22, 2013, 2:45:59 AM
to tesserac...@googlegroups.com

Comment #2 on issue 970 by dsila...@gmail.com: --list-lang doesn't work
without eng.traineddata
http://code.google.com/p/tesseract-ocr/issues/detail?id=970

Hm, indeed, 'tesseract --list-lang -l rus' works fine in my case. But I
would expect --list-lang to work without specifying '-l'.

Also note that the '--print-parameters' option suffers from the same problem.

tesser...@googlecode.com

Sep 2, 2013, 2:21:52 PM
to tesserac...@googlegroups.com
Updates:
Status: WontFix

Comment #3 on issue 970 by zde...@gmail.com: --list-lang doesn't work
without eng.traineddata
http://code.google.com/p/tesseract-ocr/issues/detail?id=970

The reason for this is simple:
the tesseract API has to be initialized to produce correct output. If you
don't specify a language, tesseract uses the default language, English
(i.e. 'tesseract --list-lang' = 'tesseract --list-lang -l eng').

The same applies to '--print-parameters'. In this case the language file
can modify parameters during initialization. Compare the output of these
commands for a better understanding:
tesseract --print-parameters -l eng >a
tesseract --print-parameters -l hin >b
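Because the default is English, a wrapper script that wants to avoid the missing eng.traineddata error can pass another installed language explicitly. The `pick_lang` function below is a hypothetical sketch, not tesseract behaviour. It prefers `eng` when eng.traineddata exists and otherwise returns the first `*.traineddata` base name. It again assumes the `/usr/share/tessdata/` location from the report.

```shell
#!/bin/sh
# Hypothetical helper (not part of tesseract): choose a value for -l.
# Prefer eng (tesseract's default language according to this thread),
# otherwise fall back to the first installed *.traineddata file.
pick_lang() {
  dir=${1:-/usr/share/tessdata}
  if [ -e "$dir/eng.traineddata" ]; then
    echo eng
    return 0
  fi
  for f in "$dir"/*.traineddata; do
    if [ -e "$f" ]; then
      basename "$f" .traineddata
      return 0
    fi
  done
  echo "no traineddata files in $dir" >&2
  return 1
}
```

With this helper, `tesseract --print-parameters -l "$(pick_lang)" >a` passes `-l eng` where English data is installed and `-l rus` on a Russian-only install like the reporter's.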