Failed loading language

Nuno Feliciano

Sep 9, 2019, 12:06:08 PM
to tesseract-ocr




Hi,

I am trying to make a model from scratch.
I created a language using 
combine_lang_model --input_unicharset D:\software\Tesseract-OCR-4.0\tessdata\Latin.unicharset --script_dir D:\software\Tesseract-OCR-4.0\tessdata --output_dir D:\software\Tesseract-OCR-4.0\training\output --lang ccy
Then I put the generated ccy.traineddata file in the tessdata folder and ran
tesseract --tessdata-dir D:\software\Tesseract-OCR-4.0\tessdata -l ccy <file> stdout
which gives me
Failed loading language 'ccy'
Tesseract couldn't load any languages!
Could not initialize tesseract.

tesseract --list-langs gives me
ccy
eng
osd
...


Can anyone help?

Thanks,
Nuno Feliciano
Latin.unicharset

Shree Devi Kumar

Sep 9, 2019, 12:09:39 PM
to tesseract-ocr
combine_lang_model only creates the starter traineddata. It is used as part of the LSTM training process and cannot be used for recognition.

Training from scratch requires running the lstmtraining command.

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/f0157ef9-7b83-4fa3-8cf5-3697514d6de0%40googlegroups.com.

Nuno Feliciano

Sep 10, 2019, 10:10:20 AM
to tesseract-ocr

Thanks for the quick reply. I first got the error after the training process, so I took a step back to reproduce it.

When I train the model
lstmtraining --traineddata D:/software/Tesseract-OCR-4.0/tessdata/ccy.traineddata -U D:/software/Tesseract-OCR/tessdate/Latin.unicharset --train_listfile D:/software/Tesseract-OCR/training/list.train --net_spec "[1,40,0,1 Ct5,5,64 Mp3,3 Lfys128 Lbx256 Lbx256 O1c1]" --model_output D:/software/Tesseract-OCR/training/model/output

I get a 200 MB file named output_checkpoint. I renamed it to ccy.traineddata and put it in the tessdata folder. Is this how it is supposed to be done?
Now, when I run the OCR, I get
Error opening data file D:\software\Tesseract-OCR-4.0\tessdata/ccy.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'ccy'
Tesseract couldn't load any languages!
Could not initialize tesseract.

The file exists, and I can open it in a text editor.

Is there a way to check if a traineddata file is valid?

Thanks,
Nuno


Shree Devi Kumar

Sep 10, 2019, 10:25:16 AM
to tesseract-ocr
>I get a file named output_checkpoint with 200MB. I renamed it to ccy.traineddata and put it in the tessdata folder. Is this how it's supposed to do?
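
On the first question, one possible next step (a sketch, not part of the original reply): the checkpoint written by lstmtraining is a training snapshot rather than a traineddata file, so renaming it is not enough. The Tesseract training documentation describes converting a checkpoint with lstmtraining --stop_training, passing the same starter traineddata that was used for training. The helper name finalize_checkpoint and the paths in the usage comment are made up for illustration, and the snippet assumes a POSIX shell (for example Git Bash on Windows) with lstmtraining on the PATH.

```shell
# finalize_checkpoint CHECKPOINT STARTER OUTPUT
# Hypothetical helper: converts an lstmtraining checkpoint into a
# traineddata file that tesseract can load for recognition.
#   CHECKPOINT  checkpoint file written during training
#   STARTER     starter traineddata produced by combine_lang_model
#   OUTPUT      traineddata file to write
finalize_checkpoint() {
  if [ "$#" -ne 3 ]; then
    echo "usage: finalize_checkpoint CHECKPOINT STARTER OUTPUT" >&2
    return 2
  fi
  lstmtraining --stop_training \
    --continue_from "$1" \
    --traineddata "$2" \
    --model_output "$3"
}

# Example call (placeholder paths, adjust to your own layout):
# finalize_checkpoint path/to/output_checkpoint path/to/starter/ccy.traineddata path/to/tessdata/ccy.traineddata
```

Note that the starter file is the one generated by combine_lang_model, so it should be kept somewhere other than the location where the final ccy.traineddata is written. In cmd.exe the same lstmtraining call can be typed on a single line without the backslashes.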


>Is there a way to check if a traineddata file is valid?


-d .traineddata FILE…: Lists directory of components from the .traineddata file.  

combine_tessdata -d tessdata/eng.traineddata 
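
To apply this to the file from the question, the same listing can be run on several files and the results compared by eye, for instance the new ccy.traineddata against a known-good eng.traineddata. This is a minimal sketch, assuming a POSIX shell with combine_tessdata on the PATH; list_components is a hypothetical helper name. It only checks that each file exists and is non-empty before listing it and does not interpret the listing itself.

```shell
# list_components FILE...
# Hypothetical helper: prints the component directory of each given
# traineddata file using combine_tessdata -d.
list_components() {
  for f in "$@"; do
    # Stop early if the file is missing or has zero size.
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      return 1
    fi
    echo "== $f"
    combine_tessdata -d "$f"
  done
}

# Example call (placeholder paths):
# list_components tessdata/eng.traineddata tessdata/ccy.traineddata
```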



--

____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

Nuno Feliciano

Sep 10, 2019, 11:28:25 AM
to tesseract-ocr
Thanks a lot, shree!