LSTM files


Zohreh Khosrobeygi

Aug 13, 2018, 6:16:09 AM
to tesseract-ocr
Hi, 
I have been training Tesseract for the Persian language. My text is too large, so I had to generate 18 box files and 18 tif files for one text. Then I made one unicharset for all 18 files. Now, when I try to make the lstmf files, only one is created, "fas.B_Mitra.exp0.lstmf", and nothing is created for the rest of the 18 pairs.
I also tested something else. I created "fas.B_Nazanin.exp0.lstmf" as well and ran lstmtraining, but it used only fas.B_Mitra.exp0.lstmf and did not use the other one.
How can I make an lstmf file for all my box files?
Thx.

zwwts...@gmail.com

Aug 14, 2018, 5:12:26 AM
to tesseract-ocr
You should run the tesseract command once for each of your box/tif pairs:
tesseract ${dir}/lang.font.exp0.tif ${dir}/lang.font.exp0 lstm.train
and then list all the resulting lstmf files together in training_files.txt.


Khosrobeigy.zohreh

Aug 14, 2018, 6:04:48 AM
to tesser...@googlegroups.com
Sorry, I couldn't understand. 
Could you please explain this part in more detail: "and then put all the lstm files together in training_files.txt"?

--
You received this message because you are subscribed to a topic in the Google Groups "tesseract-ocr" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/tesseract-ocr/928-Wfn5rGs/unsubscribe.
To unsubscribe from this group and all its topics, send an email to tesseract-ocr+unsubscribe@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/97a24af8-fb96-402a-a15b-1e6a7df405ca%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Zohreh Khosrobeygi
University of Tehran, 2016

zwwts...@gmail.com

Aug 14, 2018, 8:26:57 AM
to tesseract-ocr
I mean put the paths of all the lstmf files in this file, one per line, and then run lstmtraining:
# cat eng.training_files.txt
/home/tess-ocr/model_output/test//eng.Arial.exp0.lstmf
/home/tess-ocr/model_output/test//eng.Microsoft_YaHei.exp0.lstmf
/home/tess-ocr/model_output/test//eng.Times_New_Roman.exp0.lstmf
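A list file like the one above does not have to be written by hand. The following is only a minimal sketch, assuming all the .lstmf files sit in one directory; the function name `make_listfile` and the output file name are hypothetical and not part of Tesseract.

```shell
# Hypothetical helper (not part of Tesseract): write the path of every
# .lstmf file found in a directory into a list file, one path per line.
make_listfile() {
    dir=$1    # directory holding the .lstmf files
    out=$2    # list file to write, e.g. fas.training_files.txt
    : > "$out"                      # start from an empty list
    for f in "$dir"/*.lstmf; do
        [ -e "$f" ] || continue     # skip the literal pattern when nothing matches
        printf '%s\n' "$f" >> "$out"
    done
}
```

It could be called as `make_listfile /path/to/lstmf_dir fas.training_files.txt`. The resulting file is then given to lstmtraining, typically through its `--train_listfile` option, together with whatever other options your training setup needs.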



Khosrobeigy.zohreh

Aug 14, 2018, 9:11:37 AM
to tesser...@googlegroups.com
OK, but I have several tif and box files for each font, for example:
fas.B_Mitra.exp0.box
fas.B_Mitra.exp0.tif
fas.B_Mitra.exp1.box
fas.B_Mitra.exp1.tif
fas.B_Mitra.exp2.box
fas.B_Mitra.exp2.tif
.
.
.
How can I make an lstmf file for each of them?




zwwts...@gmail.com

Aug 14, 2018, 9:43:01 AM
to tesseract-ocr
tesseract fas.B_Mitra.exp0.tif fas.B_Mitra.exp0 lstm.train
tesseract fas.B_Mitra.exp1.tif fas.B_Mitra.exp1 lstm.train
.
.
.
You can try these.
I'm not quite sure they will work, since I haven't done it this way before.
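With 18 pairs, typing each command by hand is error-prone, so the per-file commands above could be wrapped in a loop. This is an untested sketch under the same caveat as the reply: it assumes every .tif has a .box file with the same base name in the same directory. The function name `make_lstmf_all` and the `TESSERACT` override variable are hypothetical additions for illustration.

```shell
# Hypothetical helper: run the lstm.train config once per .tif/.box pair.
make_lstmf_all() {
    dir=$1    # directory holding the .tif and .box files
    for tif in "$dir"/*.tif; do
        [ -e "$tif" ] || continue          # nothing matched the pattern
        base=${tif%.tif}                   # e.g. .../fas.B_Mitra.exp1
        if [ ! -e "$base.box" ]; then
            echo "skipping $tif: no matching .box file" >&2
            continue
        fi
        # Same command as in the thread, with a distinct output base per pair.
        # TESSERACT may be set to another command for a dry run.
        "${TESSERACT:-tesseract}" "$tif" "$base" lstm.train
    done
}
```

Calling `make_lstmf_all .` in the directory with the fas.B_Mitra.exp0, exp1, exp2, ... files runs one tesseract command per pair, each with its own output base name, so each pair should produce its own .lstmf file.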

