What is the difference between the "unicharset" file and the "lstm-unicharset" file?


이경준

Mar 1, 2018, 7:52:33 PM
to tesseract-ocr

Hi, and thank you for reading my questions.

1. What is the difference between 'unicharset' and 'lstm-unicharset'?

I know how to make a 'unicharset' with this command line: "$ tesseract (lang).(filename).exp(num).tif (lang).(filename).exp(num).box"

But I don't know how to make an 'lstm-unicharset'.
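For context, a unicharset is normally produced from box files with the unicharset_extractor tool, and at its core it lists the distinct symbols found in those files. The sketch below only illustrates that idea in Python. The helper name `symbols_from_box_lines` and the sample lines are made up for illustration, and the real tool records more than this (for example per-symbol properties), so treat it as a rough model rather than a replacement.

```python
def symbols_from_box_lines(lines):
    """Collect the distinct symbols from box-file lines, in first-seen order.

    Assumes the usual box layout: <symbol> <left> <bottom> <right> <top> <page>
    """
    seen = []
    for line in lines:
        # Split from the right so that only the last five numeric fields are cut off.
        parts = line.rstrip("\n").rsplit(" ", 5)
        if len(parts) != 6:
            continue  # skip lines that do not match the assumed layout
        symbol = parts[0]
        # Ignore empty or whitespace-only symbols; keep each symbol once.
        if symbol.strip() and symbol not in seen:
            seen.append(symbol)
    return seen


# Hypothetical box lines for the word "Tesst" on page 0.
sample = [
    "T 10 20 30 40 0",
    "e 32 20 45 35 0",
    "s 47 20 60 35 0",
    "s 62 20 75 35 0",
    "t 77 20 88 40 0",
]

if __name__ == "__main__":
    print(symbols_from_box_lines(sample))
```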

cf.) .tr -> .lstmf

I changed the command line "$ tesseract (lang).(filename).exp(num).tif (lang).(filename).exp(num) nobatch box.train" to "$ tesseract (lang).(filename).exp(num).tif (lang).(filename).exp(num) nobatch lstm.train".

2. Is this usage right?

Is it possible to use a 'unicharset' in place of an 'lstm-unicharset'?



3. The GitHub wiki contains this passage:

Overview of Training Process

The overall training process is similar to training 3.04.

Conceptually the same:

  1. Prepare training text.
  2. Render text to image + box file. (Or create hand-made box files for existing image data.)
  3. Make unicharset file. (Can be partially specified, ie created manually).
  4. Make a starter traineddata from the unicharset and optional dictionary data.
  5. Run tesseract to process image + box file to make training data set.
  6. Run training on training data set.
  7. Combine data files.
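
The steps above can be sketched as a dry run that only builds the command lines and never executes them. This is an illustration under assumptions, not the wiki's own script: the function name `training_commands`, every file and directory name, the font, the iteration count, and the `<net_spec>` placeholder are hypothetical, and the flags are the ones I understand the Tesseract 4 training tools to accept, so check them against your installed version. Step 1 (preparing the training text) has no command of its own here.

```python
import shlex


def training_commands(lang="foo", font="Arial", base="foo.Arial.exp0"):
    """Return one argv list per tool invocation; nothing is run."""
    starter = f"starter/{lang}/{lang}.traineddata"  # assumed output layout
    return [
        # Step 2: render the training text to an image plus a box file.
        ["text2image", f"--text={lang}.training_text", f"--outputbase={base}",
         f"--font={font}", "--fonts_dir=/usr/share/fonts"],
        # Step 3: make the unicharset from the box file.
        ["unicharset_extractor", "--output_unicharset", f"{lang}.unicharset",
         f"{base}.box"],
        # Step 4: make a starter traineddata from the unicharset.
        ["combine_lang_model", "--input_unicharset", f"{lang}.unicharset",
         "--script_dir", "langdata", "--output_dir", "starter", "--lang", lang],
        # Step 5: process image + box file into an .lstmf training file.
        ["tesseract", f"{base}.tif", base, "lstm.train"],
        # Step 6: run training (the net spec is left as a placeholder).
        ["lstmtraining", "--traineddata", starter, "--net_spec", "<net_spec>",
         "--model_output", f"output/{lang}",
         "--train_listfile", f"{lang}.training_files.txt",
         "--max_iterations", "5000"],
        # Step 7: combine the checkpoint and starter data into the final file.
        ["lstmtraining", "--stop_training",
         "--continue_from", f"output/{lang}_checkpoint",
         "--traineddata", starter, "--model_output", f"{lang}.traineddata"],
    ]


if __name__ == "__main__":
    for argv in training_commands():
        print(shlex.join(argv))
```

Keeping each command as an argv list makes it easy to review the whole sequence before passing any of it to `subprocess.run`.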

The key differences are:

  • The boxes only need to be at the textline level. It is thus far easier to make training data from existing image data.
  • The .tr files are replaced by .lstmf data files.
  • Fonts can and should be mixed freely instead of being separate.
  • The clustering steps (mftraining, cntraining, shapeclustering) are replaced with a single slow lstmtraining step.

I think a sentence such as "unicharset is replaced by lstm-unicharset" should be added to the key differences section.

Am I wrong?



I look forward to everyone's answers.

Thank you. Have a nice day!

ShreeDevi Kumar

Mar 1, 2018, 11:18:16 PM
to tesser...@googlegroups.com

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

