Having issue with Italic characters


Muhammad Shamim

Mar 24, 2017, 3:55:27 AM
to tesseract-ocr
Hi,

I am using tesseract-ocr-setup-3.05.00dev.exe for OCR, and it works fine for me with the default training data files.
The only issue I am facing is with italic characters,
e.g.
     Italic "l"   => "/"
     Italic "i"   => "/"
Does anybody have an idea how to deal with this issue?
Are there any extra steps I need to take?

Thank you
t2.txt
tzone_2.png

ShreeDevi Kumar

Mar 24, 2017, 4:09:05 AM
to tesser...@googlegroups.com
Use Tesseract 4.0.0alpha with --oem 1 for the LSTM engine. It works OK with that.
--oem 0 (the legacy engine) gives / instead of i.

You could also test whether a higher-resolution image (300 dpi) works with the legacy engine.
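
The two engine choices above can be compared side by side. The following is only a minimal sketch: the helper name `tesseract_command` is hypothetical, and it assumes a Tesseract 4.x binary named `tesseract` is on the PATH. It just builds the argument list for `tesseract <image> <outputbase> --oem N`, which can then be passed to `subprocess.run`.

```python
def tesseract_command(image, output_base, oem=1):
    """Build the argument list for a Tesseract 4.x run.

    oem=0 selects the legacy engine, oem=1 the LSTM engine.
    The helper name and defaults are illustrative, not part of Tesseract.
    """
    if oem not in (0, 1, 2, 3):
        raise ValueError("oem must be 0, 1, 2 or 3")
    # CLI shape: tesseract imagename outputbase [options]
    return ["tesseract", image, output_base, "--oem", str(oem)]


if __name__ == "__main__":
    # Print both commands so the legacy and LSTM runs can be compared.
    for mode in (0, 1):
        print(" ".join(tesseract_command("tzone_2.png", f"out_oem{mode}", oem=mode)))
```

Running both commands on the same image and comparing the two output files should show whether the italic "i" and "l" still come out as "/".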

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscribe@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/0a801b3c-9dfd-48b0-ab81-af2d71e2ed91%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Muhammad Shamim

Mar 24, 2017, 2:18:09 PM
to tesseract-ocr
Thanks Shree,

That's a nice suggestion. I will certainly try it and let you know.
One more thing: do you know of any Java wrapper library on top of Tesseract 4.0.0?

Thanks again


Muhammad Shamim

Mar 30, 2017, 11:49:59 AM
to tesseract-ocr

I have tried tesseract-ocr-setup-4.00.00dev, downloaded from
But unfortunately, I got the following error. It seems the image has some problem, but the same image was processed successfully with the older version, tesseract-ocr-setup-3.05.00dev.exe.

G:\work_env\tzone_5>tesseract.exe tzone_10.png tt44
Tesseract Open Source OCR Engine v4.00.00alpha with Leptonica
Warning. Invalid resolution 0 dpi. Using 70 instead.

Any help?

Thanks

Muhammad Shamim

Mar 30, 2017, 12:07:34 PM
to tesseract-ocr

Hi,

Sorry for the previous message. It is actually generating the text file with the correct conversion, but the following message is shown in the command window.


G:\work_env\tzone_5>tesseract.exe tzone_2.png tz2444

Tesseract Open Source OCR Engine v4.00.00alpha with Leptonica
Warning. Invalid resolution 0 dpi. Using 70 instead.

tz2444 contains:

Clue:
Like Jillette

Thanks to Shree for the advice.
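
On the "Invalid resolution 0 dpi" warning: a likely cause (an assumption, not confirmed in this thread) is that the PNG carries no resolution metadata, so the reader reports 0. The sketch below checks this with the Python standard library only. The function name `png_dpi` is hypothetical; it looks for the PNG `pHYs` chunk and converts pixels per metre to DPI.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dpi(data):
    """Return (x_dpi, y_dpi) from a PNG's pHYs chunk, or None if it has none.

    `data` is the raw file content as bytes. CRCs are not verified.
    """
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if ctype == b"pHYs":
            x_ppu, y_ppu, unit = struct.unpack(">IIB", body)
            if unit != 1:  # unit 1 means pixels per metre; otherwise unknown
                return None
            return (round(x_ppu * 0.0254), round(y_ppu * 0.0254))
        if ctype in (b"IDAT", b"IEND"):
            # pHYs must precede the image data, so stop searching here.
            return None
        pos += 12 + length
    return None


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as f:
        print(png_dpi(f.read()))
```

If this prints `None` for the image, the file has no usable resolution stored. Since the text output is correct in this case, the warning can probably be ignored.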


Regards
Shamim