I do not have 'chi_tra' in my tessdata folder. What is it? I have seen language-specific.sh


이경준

Mar 10, 2018, 2:18:48 AM
to tesseract-ocr
Hi, I'm sorry for asking questions so often, and so many of them.

But I must use Tesseract 4.0 for my business.

Please understand my situation. I have a large family to support.


Earlier you gave me a bash script. In it, tesstrain.sh (of course) gives me an error like:


Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'chi_tra'
Error opening data file /usr/share/tesseract-ocr/4.00/tessdata/chi_tra.traineddata


Before, you gave me a reference: kor.config from the lang directory.


In it:

#Fixes https://github.com/tesseract-ocr/tesseract/issues/1009
preserve_interword_spaces 1

tessedit_load_sublangs chi_tra

# New Segmentation search params

So I guess "tessedit_load_sublangs chi_tra" causes the error when running "tesstrain.sh".
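A minimal shell sketch for checking this guess: check_sublangs is a hypothetical helper (not part of Tesseract or tesstrain.sh) that reads the tessedit_load_sublangs lines of a config such as kor.config and reports whether each named <lang>.traineddata file exists in a given tessdata directory. It assumes the value is a single language code or a +-separated list of codes.

```shell
# check_sublangs CONFIG TESSDATA_DIR  (hypothetical helper, not part of Tesseract)
# Prints "found: <lang>" or "missing: <lang>" for every language named on a
# tessedit_load_sublangs line, and returns 1 if any traineddata file is missing.
check_sublangs() {
  config_file=$1
  tessdata_dir=$2
  status=0
  # Assumption: the value is one language code or a '+'-separated list of codes.
  for lang in $(awk '$1 == "tessedit_load_sublangs" {
                       n = split($2, parts, "[+]")
                       for (i = 1; i <= n; i++) print parts[i]
                     }' "$config_file"); do
    if [ -f "$tessdata_dir/$lang.traineddata" ]; then
      echo "found: $lang"
    else
      echo "missing: $lang"
      status=1
    fi
  done
  return "$status"
}

# Example, using the tessdata path from the error message:
#   check_sublangs kor.config /usr/share/tesseract-ocr/4.00/tessdata
```

If it prints "missing: chi_tra", the two options are the ones discussed in the question: remove that line from the config, or put chi_tra.traineddata into the tessdata directory.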

So, as a solution, I am thinking of:

1) Delete that line. Is that right? What is the side effect?

I want to have one fine-tuned traineddata for two languages (Korean & English).

1-1) So is it possible to add a line like "tessedit_load_sublangs eng"? Is that right, or even possible?


In conclusion

1) I do not want to see this error:
"Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'chi_tra'
Error opening data file /usr/share/tesseract-ocr/4.00/tessdata/chi_tra.traineddata"


2) I want to use Tesseract 4.0 for two languages (e.g. Korean and English) with one fine-tuned traineddata.

Is that possible, and how do I make one fine-tuned traineddata for two languages (e.g. Korean and English)?

3) Is it possible to use tesseract like this?

$ tesseract (picture.png) -l kor+eng
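For reference, in Tesseract 4.0 the command line takes an output base after the image name (the command above omits it), and the special output base stdout prints the recognized text to the terminal. A minimal shell sketch with two hypothetical helpers, join_langs and ocr_langs (neither is part of Tesseract), that builds the -l value by joining language codes with +:

```shell
# join_langs LANG...  (hypothetical helper)
# Joins language codes with '+', the separator used by tesseract's -l option.
join_langs() {
  old_ifs=$IFS
  IFS=+
  joined="$*"   # "$*" joins the arguments with the first character of IFS
  IFS=$old_ifs
  printf '%s\n' "$joined"
}

# ocr_langs IMAGE LANG...  (hypothetical wrapper)
# Runs tesseract on one image with several languages and prints the text.
ocr_langs() {
  image=$1
  shift
  tesseract "$image" stdout -l "$(join_langs "$@")"
}

# Example: ocr_langs picture.png kor eng
#   runs: tesseract picture.png stdout -l kor+eng
```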

4) What is the kor_vert traineddata (in tessdata_best)?

How is it different from kor.traineddata?

5) Is it possible to fine-tune using existing images? How can I use the script you gave me to do that?

ShreeDevi Kumar

Mar 10, 2018, 2:24:01 AM
to tesser...@googlegroups.com
I hope someone who knows Korean can answer your questions.



--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/5427cba9-411f-42fa-91a0-989d983a3694%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

이경준

Mar 10, 2018, 7:56:49 AM
to tesseract-ocr
Sorry... I just want to know about Tesseract 4.0. Sorry.

Gonil Rho

Mar 10, 2018, 8:27:15 AM
to tesseract-ocr
2), 3):

I'm wondering about using Tesseract 4.0 for multiple languages, too.

After searching and testing for a while, I found that the old method from Tesseract 3 (e.g. running with the '-l lang1+lang2' option) does not seem to work.
Is there any other method I should try?

Or do I have to train Tesseract with two languages at the same time?


On Saturday, March 10, 2018 at 4:18:48 PM UTC+9, 이경준 wrote:

ShreeDevi Kumar

Mar 10, 2018, 8:36:28 AM
to tesser...@googlegroups.com
Lang1+lang2 should work. If it does not, please open an issue with an example image.

If lang2 is English, you may want to try the script-level traineddata, which includes English along with the other languages.

Please take a look at the README file in tessdata_fast, which explains the script-level files in more detail.
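Following this suggestion, a minimal shell sketch: pick_korean_lang is a hypothetical helper (not part of Tesseract) that uses the script-level Hangul model when it is installed and otherwise falls back to kor+eng. It assumes the script-level files sit in a script/ subdirectory of the tessdata directory and that the Hangul file is named Hangul.traineddata, as in the tessdata_fast repository; according to the reply above, that script-level model also covers English.

```shell
# pick_korean_lang TESSDATA_DIR  (hypothetical helper)
# Prints the value to pass to tesseract's -l option.
pick_korean_lang() {
  tessdata_dir=$1
  # Assumption: script-level models live in <tessdata>/script/<Script>.traineddata
  if [ -f "$tessdata_dir/script/Hangul.traineddata" ]; then
    echo "script/Hangul"
  else
    echo "kor+eng"
  fi
}

# Example:
#   tesseract picture.png stdout --tessdata-dir /path/to/tessdata \
#     -l "$(pick_korean_lang /path/to/tessdata)"
```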


이경준

Mar 11, 2018, 5:38:53 AM
to tesseract-ocr
Thank you for answering my questions. Thank you.