Adding new font to existing traineddata


Nalin Linux

Oct 24, 2016, 10:31:13 PM
to tesseract-ocr
I have tested combining traineddata files with the "+" operator. My question is whether there is a way to train one more font into existing traineddata, or whether training needs the entire set of box/tif pairs every time. If the full set is required, where can I get the box/tif training data for the current Malayalam and English traineddata?
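For readers unfamiliar with the "+" operator mentioned above, here is a minimal sketch of how the combined language argument is formed. The helper name and the file names are hypothetical; only the `-l` option and the `lang1+lang2` syntax come from the tesseract command line. The command is built, not executed.

```python
import shlex


def build_ocr_command(image_path, output_base, languages):
    """Return the argv for a tesseract run that loads several traineddata files.

    Hypothetical helper for illustration; it only assembles the command.
    """
    if not languages:
        raise ValueError("at least one language code is required")
    # "+" joins the language codes so each traineddata file is loaded
    # at recognition time. Nothing is retrained or merged on disk.
    lang_arg = "+".join(languages)
    return ["tesseract", image_path, output_base, "-l", lang_arg]


# Placeholder file names; shlex.join needs Python 3.8 or later.
cmd = build_ocr_command("page.tif", "page_out", ["mal", "eng"])
print(shlex.join(cmd))
```

This only combines existing models at recognition time, which is why it does not answer the question of adding a font to a model.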

ShreeDevi Kumar

Oct 25, 2016, 1:43:44 AM
to tesser...@googlegroups.com

There is no way to add a single font to an existing traineddata file; training has to be run on all fonts at the same time.

The box/tiff pairs are not provided by the Google developers who produced the traineddata (some of the fonts are proprietary).

You can attempt to recreate the training using the source files from the langdata repository. The list of fonts used for each language is in language-specific.sh, which is referenced from tesstrain.sh in tesseract/training.
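As a rough sketch of what "all fonts at the same time" means in practice, the snippet below assembles a tesstrain.sh command line whose font list is the existing fonts plus the new one. The helper name, the directory paths, and the font names are placeholders, not the actual lists from language-specific.sh. The flags (`--lang`, `--langdata_dir`, `--fonts_dir`, `--output_dir`, `--fontlist`) are the ones I recall from the tesstrain.sh usage header, so please check them against the script in your checkout. The command is only built, not run.

```python
def build_tesstrain_command(lang, existing_fonts, new_fonts,
                            langdata_dir, fonts_dir, output_dir):
    """Assemble a tesstrain.sh argv that retrains on the complete font list.

    Hypothetical helper: there is no incremental mode, so the new font is
    simply appended to the fonts already used for the language.
    """
    fontlist = list(existing_fonts)
    for font in new_fonts:
        if font not in fontlist:  # avoid training the same font twice
            fontlist.append(font)
    cmd = [
        "tesstrain.sh",
        "--lang", lang,
        "--langdata_dir", langdata_dir,
        "--fonts_dir", fonts_dir,
        "--output_dir", output_dir,
        "--fontlist",  # every following argument is one font name
    ]
    cmd.extend(fontlist)
    return cmd


# Placeholder font names and paths, for illustration only.
cmd = build_tesstrain_command(
    lang="mal",
    existing_fonts=["Existing Font A", "Existing Font B"],
    new_fonts=["My New Font"],
    langdata_dir="./langdata",
    fonts_dir="/usr/share/fonts",
    output_dir="./mal_out",
)
print(cmd)
```

The existing font names would have to be copied from language-specific.sh, and every one of them must be installed under the fonts directory, which may not be possible for the proprietary ones.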



