Bad results on custom Traditional Chinese training

laurent....@gmail.com

May 17, 2016, 11:21:12 AM

to tesseract-ocr

Hi,

Here's the situation: I need to train Tesseract on a new Traditional Chinese font. So far, none of my results have been good enough.
As a first test, I trained it on only a small set of characters using a single input image.
I then cropped a sample from that same image to test it.

The test image was attached inline to the original post.

The detected text is: 客戶服務置龍擇語言設交置社交
I'm using Tesseract 3.02 on Windows.

My questions are:
- What kind of machine learning approach does Tesseract use?
- How can I get better results with Tesseract?
- Do I need to train it on many different images?
- Are there parameters I can adjust during training?
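On training with more images: in the Tesseract 3.0x workflow, each training image produces its own .tr feature file, and mftraining and cntraining can take all of those files in a single call, so samples from every image are clustered together. The minimal Python sketch below only builds and prints that command sequence. The helper name `training_commands`, the font name `myfont` and the `<lang>.<font>.exp<N>.tif` naming are illustrative assumptions, and it assumes a matching .box file for every image plus a `font_properties` file already exist. The files written by mftraining and cntraining (such as inttemp, pffmtable and normproto) still have to be renamed with the `<lang>.` prefix before `combine_tessdata` runs.

```python
from __future__ import annotations

# Hypothetical helper: builds (but does not run) the Tesseract 3.0x training
# commands for several tif/box pairs of the same font.


def training_commands(lang: str, font: str, num_images: int) -> list[list[str]]:
    # One base name per image, e.g. "chi_tra.myfont.exp0" (assumed naming).
    bases = [f"{lang}.{font}.exp{i}" for i in range(num_images)]
    cmds: list[list[str]] = []
    # 1. Extract features from each image into its own .tr file.
    for base in bases:
        cmds.append(["tesseract", f"{base}.tif", base, "box.train"])
    # 2. Build the unicharset from all box files at once.
    cmds.append(["unicharset_extractor"] + [f"{b}.box" for b in bases])
    # 3. Cluster features over every .tr file together, so each image adds samples.
    trs = [f"{b}.tr" for b in bases]
    cmds.append(["mftraining", "-F", "font_properties", "-U", "unicharset",
                 "-O", f"{lang}.unicharset"] + trs)
    cmds.append(["cntraining"] + trs)
    # 4. After renaming the clustering outputs to "<lang>.*", pack everything
    #    (including <lang>.unicharambigs, if present) into one traineddata file.
    cmds.append(["combine_tessdata", f"{lang}."])
    return cmds


if __name__ == "__main__":
    for cmd in training_commands("chi_tra", "myfont", 3):
        print(" ".join(cmd))
```

With `num_images=3` it prints one `box.train` call per image followed by single extraction, clustering and packing steps. Adding more, varied images of the same font mainly gives the clustering steps more samples per character, which a one-image training set cannot provide.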

Thanks.

laurent....@gmail.com

May 18, 2016, 9:09:32 AM

to tesseract-ocr

A few more questions:
- How does Tesseract use the unicharambigs file?
- I get different results depending on whether I run recognition with PSM_SINGLE_WORD, PSM_SINGLE_BLOCK or PSM_SINGLE_LINE, and no single mode gives the best results on every image. Why?
- How can I make Tesseract read the third character of the image as one character rather than two (月艮)?
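Each page segmentation mode changes the layout Tesseract assumes before it classifies characters (a uniform block, a single line or a single word), so the same image can be cut into characters differently in each mode. Measuring per image is therefore a reasonable way to choose one. A minimal Python sketch, assuming the 3.0x command line `tesseract image outbase -l lang -psm N` (where 6, 7 and 8 correspond to PSM_SINGLE_BLOCK, PSM_SINGLE_LINE and PSM_SINGLE_WORD) and a hand-typed transcript to compare against. The file names `sample.tif` and `sample.gt.txt` and all helper names are hypothetical, and `difflib`'s similarity ratio stands in for a proper character error rate.

```python
from __future__ import annotations

import difflib
import shutil
import subprocess
from pathlib import Path

# Tesseract 3.0x page segmentation mode numbers, passed as "-psm N".
PSM_MODES = {"PSM_SINGLE_BLOCK": 6, "PSM_SINGLE_LINE": 7, "PSM_SINGLE_WORD": 8}


def psm_command(image: str, outbase: str, lang: str, psm: int) -> list[str]:
    # 3.0x spells the option "-psm"; Tesseract 4 and later use "--psm".
    return ["tesseract", image, outbase, "-l", lang, "-psm", str(psm)]


def similarity(text: str, truth: str) -> float:
    # difflib ratio in [0, 1]; 1.0 means identical strings.
    return difflib.SequenceMatcher(None, text, truth).ratio()


def sweep(image: str, lang: str, truth: str) -> dict[str, float]:
    scores: dict[str, float] = {}
    for name, psm in PSM_MODES.items():
        outbase = f"out_psm{psm}"
        subprocess.run(psm_command(image, outbase, lang, psm), check=True)
        # Tesseract writes <outbase>.txt; drop all whitespace before comparing,
        # since spaces between CJK characters are not meaningful here.
        text = Path(f"{outbase}.txt").read_text(encoding="utf-8-sig")
        scores[name] = similarity("".join(text.split()), truth)
    return scores


if __name__ == "__main__":
    # Hypothetical inputs: the test crop and a hand-typed transcript of it.
    image, truth_file = Path("sample.tif"), Path("sample.gt.txt")
    if shutil.which("tesseract") and image.exists() and truth_file.exists():
        truth = "".join(truth_file.read_text(encoding="utf-8-sig").split())
        for name, score in sweep(str(image), "chi_tra", truth).items():
            print(f"{name}: {score:.3f}")
```

Running the sweep over several test images, rather than one, shows whether a single mode is consistently better or whether the choice really depends on the layout of each image.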
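For the split character, one option in 3.0x is a mandatory rule in the unicharambigs file. `combine_tessdata` packs `<lang>.unicharambigs` into the traineddata, and a rule of type 1 is meant to make Tesseract always replace the source sequence with the target. The sketch below writes a "v1" file, in which each rule is five tab-separated fields: source length, source characters separated by spaces, target length, target characters, and type. The target 服 is an assumption about which character is being split into 月艮, and the helper names are hypothetical.

```python
from __future__ import annotations

# Hypothetical helpers that write a Tesseract 3.0x "v1" unicharambigs file.
# Each rule line has five tab-separated fields:
#   <n source chars> <source chars, space-separated>
#   <n target chars> <target chars, space-separated> <type>
# where type 1 marks a mandatory replacement and 0 an optional ambiguity.


def ambig_line(source: list[str], target: list[str], mandatory: bool) -> str:
    fields = [
        str(len(source)), " ".join(source),
        str(len(target)), " ".join(target),
        "1" if mandatory else "0",
    ]
    return "\t".join(fields)


def write_unicharambigs(path: str, rules: list[tuple[list[str], list[str], bool]]) -> None:
    # The first line is the format version; the file is written as UTF-8.
    lines = ["v1"] + [ambig_line(src, tgt, mand) for src, tgt, mand in rules]
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write("\n".join(lines) + "\n")


if __name__ == "__main__":
    # Assumption: the pair 月 艮 is the third character (服) split in two.
    write_unicharambigs("chi_tra.unicharambigs", [(["月", "艮"], ["服"], True)])
```

The file then needs to sit next to the other `chi_tra.*` training files when `combine_tessdata chi_tra.` runs. A mandatory rule like this would also rewrite any genuine 月 followed by 艮, so it only seems safe if that sequence never occurs in the target text.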
