Facing some problem in understanding fine tuning


Jennil Thiyam

May 22, 2019, 4:21:37 AM
to tesseract-ocr
I am planning to fine-tune ben.traineddata.
According to the documented procedure, "The training requires a new unicharset/recoder, optional language models, and the old traineddata file containing the old unicharset/recoder." I have the old traineddata, but I don't know about this new unicharset/recoder. I want to add only one character to the existing ben.traineddata model. In the example that adds the ± (plus-minus) character and fine-tunes eng.traineddata, they just add the character to eng_training.text and run the fine-tuning command. Can anyone please explain the procedure for adding one character and fine-tuning? Do I need to create a new unicharset/recoder?

Shree Devi Kumar

May 22, 2019, 7:54:08 AM
to tesser...@googlegroups.com
> I want to add only one character to the existing ben.traineddata model.

What character do you want to add?

You should be able to follow the same process as the plus-minus (±) fine-tuning example for English, which adds a single character.
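
For readers following along, the plus-minus example in the Tesseract 4 training documentation comes down to two commands: extract the LSTM model from the existing traineddata, then continue training with both the new and the old traineddata. The sketch below only builds those command lines and does not run them. The helper name `plus_minus_commands`, all paths, and the iteration count are illustrative assumptions, and it assumes tesstrain.sh has already written a starter `ben/ben.traineddata` and `ben.training_files.txt` into the working directory.

```python
from pathlib import PurePosixPath


def plus_minus_commands(lang, old_traineddata, work_dir, max_iterations=3600):
    """Build (but do not run) the two commands for plus-minus fine tuning.

    Every path here is a hypothetical placeholder; adjust to your own setup.
    """
    work = PurePosixPath(work_dir)
    extracted_lstm = work / f"{lang}.lstm"

    # Step 1: extract the LSTM model from the existing traineddata.
    extract = ["combine_tessdata", "-e", str(old_traineddata), str(extracted_lstm)]

    # Step 2: continue training from the extracted model.
    # --traineddata is the starter file holding the NEW unicharset/recoder;
    # --old_traineddata is the original file holding the OLD one.
    train = [
        "lstmtraining",
        "--model_output", str(work / "plusminus"),
        "--continue_from", str(extracted_lstm),
        "--traineddata", str(work / lang / f"{lang}.traineddata"),
        "--old_traineddata", str(old_traineddata),
        "--train_listfile", str(work / f"{lang}.training_files.txt"),
        "--max_iterations", str(max_iterations),
    ]
    return extract, train


extract_cmd, train_cmd = plus_minus_commands(
    "ben", "tessdata_best/ben.traineddata", "tesstutorial/ben_plusminus"
)
print(" ".join(extract_cmd))
print(" ".join(train_cmd))
```

In the documented example, the starter traineddata passed via `--traineddata` is the file that carries the new unicharset/recoder, while `--old_traineddata` supplies the old one, so the two are provided as files rather than built by hand.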




Jennil Thiyam

May 22, 2019, 8:02:24 AM
to tesser...@googlegroups.com
We use the Bengali script, but with one extra character, and that is the character I want to add. Will it work if I put that character in ben_training.txt, as they did in the plus-minus training?

Jennil Thiyam

May 22, 2019, 8:46:11 AM
to tesser...@googlegroups.com
The text in ben_training.txt is laid out in a particular way (I have attached a screenshot). Could you please explain how I should put my character into this file?
training.png

Shree Devi Kumar

May 22, 2019, 8:54:54 AM
to tesser...@googlegroups.com
You have to add the character to the training text and then generate box/tiff pairs using that text and a Bengali font that supports your additional character.
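
A minimal sketch of those two steps, under stated assumptions: the training text is treated as a plain UTF-8 file with one text line per row, so the new character is added by appending lines that use it in realistic context. The second helper only builds a tesstrain.sh command line (it does not run it), using the flags from the English example in the Tesseract 4 training documentation. The helper names, the `NEWCHAR` placeholder, the font name, and every path are hypothetical; the file name follows this thread, and tesstrain.sh is assumed to read the training text from the langdata directory.

```python
import tempfile
from pathlib import Path


def add_character_lines(training_text_path, new_lines):
    """Append UTF-8 lines that contain the new character to the training text."""
    path = Path(training_text_path)
    existing = path.read_text(encoding="utf-8")
    # Start the appended lines on a fresh line if the file lacks a final newline.
    separator = "" if existing.endswith("\n") or not existing else "\n"
    path.write_text(existing + separator + "\n".join(new_lines) + "\n", encoding="utf-8")


def tesstrain_command(lang, font, fonts_dir, langdata_dir, tessdata_dir, output_dir):
    """Build (but do not run) a tesstrain.sh call that renders the training text."""
    return [
        "tesstrain.sh",
        "--fonts_dir", fonts_dir,
        "--lang", lang,
        "--linedata_only",
        "--noextract_font_properties",
        "--langdata_dir", langdata_dir,
        "--tessdata_dir", tessdata_dir,
        "--fontlist", font,  # must be a font that contains the new character
        "--output_dir", output_dir,
    ]


# Demonstration on a throwaway file; NEWCHAR stands in for the real character.
with tempfile.TemporaryDirectory() as tmp:
    text_file = Path(tmp) / "ben_training.txt"
    text_file.write_text("existing line", encoding="utf-8")
    add_character_lines(text_file, ["sample words with NEWCHAR", "more NEWCHAR samples"])
    updated_lines = text_file.read_text(encoding="utf-8").splitlines()

cmd = tesstrain_command(
    "ben", "Hypothetical Bengali Font", "/usr/share/fonts",
    "langdata", "tessdata", "tesstutorial/ben_plusminus",
)
print(updated_lines)
print(" ".join(cmd))
```

The new character should appear many times and in varied surrounding text, since the rendered lines are all the model sees of it during fine tuning.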

Jennil Thiyam

May 22, 2019, 9:39:51 AM
to tesser...@googlegroups.com
I think this TIFF file is generated by running tesstrain.sh. Am I right?