Training tesseract 4.00 for a new tricky font (PalaceScript) fails



Yuliana Zigangirova


Oct 31, 2019, 8:27:44 AM

to tesseract-ocr

Hi everyone,

I am trying to train Tesseract for some unusual-looking fonts, Palace for example.

I tried a simple approach first: I produced traineddata with http://trainyourtesseract.com/

and then made calls like these:

// Init returns 0 on success; "Palace" needs .\tessdata\Palace.traineddata
api->Init(".\\tessdata", "eng+Palace", tesseract::OEM_TESSERACT_ONLY);

api->SetPageSegMode(tesseract::PSM_SINGLE_LINE);
api->SetImage(image);
// Get OCR result
outText = api->GetUTF8Text();

For a line like

M P S T a o e h i l n p r s t u w y

the result is below; not a single glyph is recognized correctly:

.MDXXXo,XkX.n.mX.XnoX
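One way to make "not a single glyph is recognized" measurable, and to compare fonts or models as training progresses, is a character error rate (CER): the edit distance between the expected line and the OCR output, divided by the length of the expected line. The sketch below is a generic Levenshtein-based CER, not part of Tesseract; the function names are hypothetical. It compares bytes, so it is only per-character for ASCII text like this test line, and it ignores whitespace because the test line is space-separated and `GetUTF8Text()` output may end with a newline.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Levenshtein distance between two byte strings: each insertion,
// deletion, or substitution costs 1. Uses two rolling rows of the DP table.
std::size_t EditDistance(const std::string& a, const std::string& b) {
  std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
  for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
  for (std::size_t i = 1; i <= a.size(); ++i) {
    cur[0] = i;
    for (std::size_t j = 1; j <= b.size(); ++j) {
      std::size_t subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
      cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, subst});
    }
    std::swap(prev, cur);
  }
  return prev[b.size()];
}

// Drops all whitespace (spaces, tabs, newlines) from the string.
std::string StripWhitespace(const std::string& s) {
  std::string out;
  for (char c : s) {
    if (!std::isspace(static_cast<unsigned char>(c))) out.push_back(c);
  }
  return out;
}

// Character error rate: edits needed to turn the OCR output into the
// reference, divided by the reference length (0.0 means a perfect read).
double CharErrorRate(const std::string& reference, const std::string& ocr) {
  std::string ref = StripWhitespace(reference);
  std::string hyp = StripWhitespace(ocr);
  if (ref.empty()) return hyp.empty() ? 0.0 : 1.0;
  return static_cast<double>(EditDistance(ref, hyp)) / ref.size();
}
```

For example, `CharErrorRate("M P S T a o e h i l n p r s t u w y", outText)` is 0.0 for a perfect read; for the output above it comes out at 1.0 or higher, since only a few characters can be aligned at all.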

Does trainyourtesseract produce bad traineddata, or am I making the wrong calls?

And how does one handle such cases?

Actually, I have tried the same with less unusual fonts,

but the recognition barely improves there either.

I am attaching the TIFF file and my traineddata for Palace.

Thanks in advance to everyone for your help,

Yuliana

PalaceScript.tiff

Palace.traineddata

Purushotham Rao Eravalli


Oct 31, 2019, 12:37:33 PM

to tesser...@googlegroups.com

Hi,

Can we retrain Tesseract for English so that all the unwanted symbols and characters are removed?

If so, could someone share how to do it?
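A rough sketch of one preparatory step, assuming a training workflow that builds the model's character set (unicharset) from the training text: if the unwanted symbols never appear in the training text, they should not end up in the new model. The helper `FilterToCharset` is hypothetical; it keeps only characters from an allowed set (plus newlines) and works byte by byte, so it is only safe for ASCII text. Rendering training images and running the actual training are separate steps not shown here. If retraining is not strictly needed, Tesseract also has a `tessedit_char_whitelist` variable that restricts output characters at recognition time; whether the LSTM engine honors it depends on the Tesseract version.

```cpp
#include <string>

// Returns `text` with every byte that is not in `allowed` removed.
// Newlines are kept so the line structure of the training text survives.
// Byte-based: multi-byte UTF-8 characters are not handled correctly.
std::string FilterToCharset(const std::string& text,
                            const std::string& allowed) {
  std::string out;
  out.reserve(text.size());
  for (char c : text) {
    if (c == '\n' || allowed.find(c) != std::string::npos) out.push_back(c);
  }
  return out;
}
```

For example, with `allowed` set to the letters, digits, and the few punctuation marks you actually need, `FilterToCharset(line, allowed)` strips symbols such as `|` or `~` from each training line before the training data is regenerated.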

Thanks,

Purushotham

