Help with unicharambigs


rkvsraman

Sep 27, 2016, 9:34:15 PM
to tesseract-ocr

Hello,


I am testing the Tamil data set and I am getting the following issue:

பெயர் is getting recognised as டுபயர்

and யான is getting recognised as யஈன

I created the following unicharambigs file, but it isn't helping.

v2
டுப    பெ    1
ஈ    ா    1

I tried both 1 and 0 as the type indicator, but neither made a difference.

I have attached the source text, the image, and the OCRed text.

I tested it with the original tam.traineddata too, with the same result.



Any suggestions?


tam_ocred.txt
tam_source.text
tam.png

ShreeDevi Kumar

Sep 27, 2016, 11:51:06 PM
to tesser...@googlegroups.com
It has been a while since I looked at Tamil.


You may want to try the v1 style of unicharambigs file to see if that works.
Not sure if v2 is supported ...

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscribe@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/0f16d94f-8cc7-4c50-9988-3b1eaacb333e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Quan Nguyen

Sep 27, 2016, 11:52:25 PM
to tesseract-ocr

ShreeDevi Kumar

Sep 27, 2016, 11:59:29 PM
to tesser...@googlegroups.com
Quan,

The page also says ...

Version 3.03 and on supports a new, simpler format for the unicharambigs file:

v2
'' " 1
m rn 0
iii m 0
But it may not actually be supported, according to zdenko's post on the issue.
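
For anyone who wants to test the v1 suggestion, here is a minimal Python sketch that formats v1-style lines from the three documented v2 examples quoted above. It assumes the v1 layout described in the Tesseract training documentation: tab-separated fields holding the source unichar count, the space-separated source unichars, the target unichar count, the space-separated target unichars, and the type indicator. The helper name `v1_line` is made up for illustration. Each ambiguity is passed in already split into unichars, because the right split for Tamil depends on the entries in the language's unicharset and cannot be guessed from the text alone.

```python
def v1_line(wrong, right, mandatory):
    """Format one v1-style unicharambigs line (hypothetical helper).

    wrong, right: lists of unichars; each is assumed to match a unicharset entry.
    mandatory:    True for type indicator 1, False for type indicator 0.
    """
    fields = [
        str(len(wrong)),
        " ".join(wrong),
        str(len(right)),
        " ".join(right),
        "1" if mandatory else "0",
    ]
    return "\t".join(fields)


# The three v2 examples quoted above, split by hand into single unichars.
examples = [
    (["'", "'"], ['"'], True),
    (["m"], ["r", "n"], False),
    (["i", "i", "i"], ["m"], False),
]

# The first line of the file is the version identifier.
v1_text = "\n".join(["v1"] + [v1_line(w, r, t) for w, r, t in examples]) + "\n"
print(v1_text, end="")
```

For the Tamil pairs in the original post, the lists would have to be filled in after checking how the unicharset of the tam data actually splits டுப and பெ.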

ShreeDevi


rkvs...@gmail.com

Sep 28, 2016, 1:19:12 AM
to tesseract-ocr

I tried v1 too. It wasn't much use 😞

 


nkantan r

Aug 26, 2017, 9:41:40 AM
to tesseract-ocr
I am also interested in using Tesseract OCR for e-pub conversion of some scanned books that I have. I used Python for auto-correction; a dictionary is also required to drive the correction of mistakes like யஈன.

regards
rnk
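
A dictionary-guided correction along these lines could be sketched in Python as follows. This is not rnk's actual script, only an illustration: the confusion pairs are the two reported earlier in this thread, while `correct_word`, `correct_text`, and the tiny vocabulary are hypothetical. A substitution is accepted only when it turns an unknown word into a dictionary word, so a legitimate ஈ in a correct word is left alone.

```python
# Confusion pairs (misread, intended) taken from the examples in this thread.
CONFUSIONS = [("டுப", "பெ"), ("ஈ", "ா")]


def correct_word(word, vocabulary):
    """Return a vocabulary word reachable by one substitution, else the word unchanged."""
    if word in vocabulary:
        return word
    for wrong, right in CONFUSIONS:
        start = word.find(wrong)
        while start != -1:
            candidate = word[:start] + right + word[start + len(wrong):]
            if candidate in vocabulary:
                return candidate
            start = word.find(wrong, start + 1)
    return word


def correct_text(text, vocabulary):
    """Correct whitespace-separated words; runs of whitespace collapse to one space."""
    return " ".join(correct_word(w, vocabulary) for w in text.split())


# Toy vocabulary for illustration only; a real run needs a proper Tamil word list.
vocabulary = {"பெயர்", "யான"}
print(correct_text("டுபயர் யஈன", vocabulary))
```

The quality of the result depends almost entirely on the word list: words missing from it are never corrected, and only one substitution per word is attempted.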