Text2image Issues for Tamil Unicode Font Sundaram - 0807

Mugunthan

Aug 13, 2018, 2:33:42 PM
to tesseract-ocr
I've been making tif/box files for Tamil character recognition using text2image on Windows. I ran into some issues with the Unicode font Sundaram-0807 in text2image.

Issue 1: Some characters in the tif file don't match the text file.

Issue 2: Some characters in the tif file match the text file only in some instances.

I tried changing the size of the generated tif file, but it didn't help. Please see the attached screenshots and the files for the font Sundaram-0807, size 12.

PS: I've used other Tamil Unicode fonts such as Latha, Akshar and TheeneeUni, and they all worked perfectly.


Issue1.PNG
Issue2.PNG
tamil.txt
SUNDARAM-0807.ttf

shree

Aug 16, 2018, 10:37:43 AM
to tesseract-ocr
> PS: I've used other Tamil Unicode fonts such as Latha, Akshar and TheeneeUni, and they all worked perfectly.

As you note in the statement above, the problem is with the font, not Tesseract.

Mugunthan

Aug 16, 2018, 11:11:20 AM
to tesseract-ocr
Hi Shree,

Thanks for your reply. I've just tried the same procedure with the same font on Linux and it worked fine, so I believe there may be an issue with the Windows version of text2image. I'm using the UB Mannheim build, as suggested on GitHub.
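
To compare the two platforms it may help to run exactly the same text2image invocation on both. The following is a minimal sketch that only builds the argument lists and does not run anything. The font name `Sundaram-0807`, the `./fonts` directory, and the output base `tam.sundaram.exp0` are assumptions for illustration; the name the renderer actually reports for the font can be checked with `--list_available_fonts` and may differ.

```python
import shlex


def text2image_args(text_file, output_base, font_name, fonts_dir, ptsize=12):
    """Build (but do not run) an argument list for text2image.

    font_name must match the name reported by --list_available_fonts;
    the value used in the example below is only an assumption.
    """
    return [
        "text2image",
        f"--text={text_file}",
        f"--outputbase={output_base}",
        f"--font={font_name}",
        f"--fonts_dir={fonts_dir}",
        f"--ptsize={ptsize}",
    ]


def list_fonts_args(fonts_dir):
    """Build the argument list that asks text2image which fonts it can see."""
    return ["text2image", "--list_available_fonts", f"--fonts_dir={fonts_dir}"]


if __name__ == "__main__":
    # Hypothetical paths and names; adjust to the local setup.
    fonts_dir = "./fonts"
    print(shlex.join(list_fonts_args(fonts_dir)))
    print(shlex.join(text2image_args(
        "tamil.txt", "tam.sundaram.exp0", "Sundaram-0807", fonts_dir)))
```

Running the printed commands on Windows and on Linux with the same tamil.txt and the same font file should show whether the generated tif differs only on one platform.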
