Training tesseract 4.00 for a new tricky font (PalaceScript) fails


Yuliana Zigangirova

Oct 31, 2019, 8:27:44 AM
to tesseract-ocr
Hi everyone,

I am trying to train Tesseract for some funny-looking fonts, such as Palace.
I tried the simple way: I produced the traineddata with http://trainyourtesseract.com/
and then made calls like

api->Init(".\\tessdata", "eng+Palace", OEM_TESSERACT_ONLY);
api->SetPageSegMode(PSM_SINGLE_LINE);
api->SetImage(image);
// Get OCR result
outText = api->GetUTF8Text();

The result for a line like

M P S T a o e h i l n p r s t u w y

is below; no glyph is recognized correctly:

.MDXXXo,XkX.n.mX.XnoX
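
To put a number on how far off this output is, one could compute a character error rate between the expected line and the recognized one. The following is only a minimal sketch, not something Tesseract provides: `edit_distance` and `char_error_rate` are hypothetical helper names, and the code assumes plain ASCII strings with the spaces already removed from the expected text.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Levenshtein distance: the minimum number of single-character insertions,
// deletions and substitutions needed to turn `a` into `b`.
// Hypothetical helper; compares bytes, so it assumes ASCII input.
std::size_t edit_distance(const std::string& a, const std::string& b) {
    // prev[j] is the distance between the first i-1 chars of a and the first j chars of b.
    std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            std::size_t subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            cur[j] = std::min({subst, prev[j] + 1, cur[j - 1] + 1});
        }
        std::swap(prev, cur);
    }
    return prev[b.size()];
}

// Character error rate: edit distance divided by the length of the expected text.
double char_error_rate(const std::string& expected, const std::string& actual) {
    if (expected.empty()) return actual.empty() ? 0.0 : 1.0;
    return static_cast<double>(edit_distance(expected, actual)) / expected.size();
}
```

For the case above, the call would be `char_error_rate("MPSTaoehilnprstuwy", ".MDXXXo,XkX.n.mX.XnoX")`. Running the same measurement with and without the custom traineddata would show whether the training changes anything.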

Does trainyourtesseract produce bad traineddata, or am I making the wrong calls?
And how does one handle such cases?

Actually, I have tried the same with less funny fonts,
but the recognition hardly improves there either.

I am attaching the TIFF file and my traineddata for Palace.

Thank you all in advance for your help,
Yuliana
PalaceScript.tiff
Palace.traineddata

Purushotham Rao Eravalli

Oct 31, 2019, 12:37:33 PM
to tesser...@googlegroups.com
Hi,
Can we retrain Tesseract for English with all the unwanted symbols and characters removed?
If so, could someone please share how to do it?


Thanks,
Purushotham

