Reading dot matrix characters

up...@6paq.com

Oct 23, 2014, 2:37:03 AM

to tesser...@googlegroups.com

Hello.

I have images that contain characters made from individual dots, like the output of a dot matrix printer. I tried various operations on the images (binarization, edge detection, dilation, ...) and was able to make the dots bigger so that they are connected about 90% of the time. However, recognition is still very poor.
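For readers who want to see what the dilation step does to dotted strokes, here is a minimal pure-Python sketch. It assumes the image has already been binarized into rows of 0/1 values with 1 for ink; the function name `dilate`, the square neighbourhood, and the toy stroke are illustrative assumptions, not the operations used in the original post.

```python
def dilate(image, radius=1):
    """Binary dilation with a square (2*radius+1) neighbourhood.

    `image` is a list of rows of 0/1 values, where 1 marks an ink pixel.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    result = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if image[y][x]:
                # Switch on every pixel in the neighbourhood of this ink pixel.
                for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                    for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                        result[ny][nx] = 1
    return result


# Toy example: a vertical stroke printed as three dots with one-pixel gaps.
stroke = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
dilated = dilate(stroke, radius=1)
for row in dilated:
    print("".join("#" if v else "." for v in row))
```

A radius that is too small leaves gaps between dots, while one that is too large merges neighbouring characters, so the value usually has to be tuned to the dot pitch of the print.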

This image contains the characters A to L.

My modified version is

After recognition, Tesseract (3.02, using the .NET wrapper) with the standard English language data gives me the characters "FJBEDEFEHIJKL". Only the last 5 characters are right; the rest are wrong. Do you know of a way to improve recognition besides training a new font for this special case? Tesseract works quite well for my other projects, so I would love a solution that does not rely on a special font if possible.
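When comparing preprocessing variants, it can help to score each OCR result against the known text instead of judging it by eye. The sketch below uses `difflib` from the Python standard library to align the expected string with the result quoted above. The helper name `score` is an assumption, and an alignment-based count can credit more characters than a strict position-by-position comparison would.

```python
from difflib import SequenceMatcher


def score(expected, actual):
    """Return (number of aligned matching characters, list of edit opcodes)."""
    matcher = SequenceMatcher(None, expected, actual, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched, matcher.get_opcodes()


expected = "ABCDEFGHIJKL"
actual = "FJBEDEFEHIJKL"  # result reported for Tesseract 3.02 in this post

matched, opcodes = score(expected, actual)
print(f"{matched} of {len(expected)} expected characters aligned")
for tag, i1, i2, j1, j2 in opcodes:
    if tag != "equal":
        # Show only the spans where the OCR output diverges from the truth.
        print(f"{tag:8} expected {expected[i1:i2]!r} -> got {actual[j1:j2]!r}")
```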

ShreeDevi Kumar

Oct 23, 2014, 2:55:36 AM

to tesser...@googlegroups.com

Try the .NET wrapper with a newer version of Tesseract.

Invert the image, smooth/blur it, and convert it to greyscale. I tried this with VietOCR;

the output is 'QBCDEFGHIJKL'.
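The three operations named above can be sketched without any GUI. The following pure-Python version works on rows of (r, g, b) tuples. The function names, the luminance weights, the 3x3 mean filter, and the order of the steps are all assumptions made for illustration; the filter settings actually used in VietOCR are not stated in this thread.

```python
def to_greyscale(rgb_image):
    """Convert rows of (r, g, b) tuples to rows of 0-255 luminance values."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
        for row in rgb_image
    ]


def invert(grey_image):
    """Swap dark and light: 0 becomes 255 and vice versa."""
    return [[255 - value for value in row] for row in grey_image]


def box_blur(grey_image):
    """3x3 mean filter; border pixels average over the neighbours that exist."""
    height = len(grey_image)
    width = len(grey_image[0]) if height else 0
    blurred = []
    for y in range(height):
        row = []
        for x in range(width):
            neighbours = [
                grey_image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            row.append(sum(neighbours) // len(neighbours))
        blurred.append(row)
    return blurred


def preprocess(rgb_image):
    """Greyscale, then invert, then blur (one possible ordering)."""
    return box_blur(invert(to_greyscale(rgb_image)))
```

In an application, an imaging library would normally replace these loops; the sketch only shows what each step does to the pixel values.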

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at http://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/e6b8d4bb-ecc3-463c-9cc7-96f46a63be27%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

ShreeDevi Kumar

Oct 23, 2014, 2:59:40 AM

to tesser...@googlegroups.com

http://sourceforge.net/projects/vietocr/files/vietocr/4.0%20Beta/

Version 4.0 Beta (29 July 2014)
- Upgrade to Tesseract 3.03 RC (r1127)
- Upgrade Tess4J library
- Update JNA to v4.1.0
- Update Ghost4J to v0.5.1
- Add support for searchable PDF output in bulk/batch mode

ShreeDevi

up...@6paq.com

Nov 5, 2014, 4:24:32 AM

to tesser...@googlegroups.com

I tried it with version 3.03 and found no improvement. As you suggested, I inverted the image and tried blurring it, but could not improve recognition. VietOCR is not an option, as I have to integrate the recognition into an application and do this without a GUI.

Could you tell me the steps (and if available, parameters) you used to convert the image to get better results?

ShreeDevi Kumar

Nov 5, 2014, 6:27:57 AM

to tesser...@googlegroups.com

I asked you to try VietOCR because its Java 4.0 Beta uses a newer SVN build of Tesseract, and I find it easy to test under Windows with the GUI, since I can change the image filter settings in it.

You will have to choose the tools based on your platform and other requirements. You could use ImageMagick for preprocessing. You may still have a problem because of the shape of the 'A'.

I am attaching the results that I got using the latest version of Tesseract from git (I run it under msys2/mingw-w64 on Windows 8). I tried with the PNG and then with a modified TIFF, which I made in IrfanView: negative (invert image), blur, then resize/resample, saved as TIFF with LZW compression.

Both image files and results are attached.
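For a pipeline without a GUI, the IrfanView steps above could be approximated with the ImageMagick and Tesseract command-line tools. The sketch below only builds the argument lists and runs them if the tools are installed. The blur sigma and scale factor are placeholder assumptions, since no parameters were given in this thread, and the helper names are hypothetical. Note that ImageMagick 7 installs the tool as `magick` rather than `convert`.

```python
import shutil
import subprocess


def imagemagick_command(src, dst, blur_sigma=1.0, scale_percent=200):
    """Build an ImageMagick call: invert, blur, resize, save as LZW TIFF."""
    return [
        "convert", src,
        "-negate",                        # invert the image
        "-blur", f"0x{blur_sigma}",       # radius 0 lets ImageMagick pick one
        "-resize", f"{scale_percent}%",   # assumed scale factor
        "-compress", "LZW",
        dst,
    ]


def tesseract_command(image, output_base, lang="eng"):
    """Build a Tesseract call that writes <output_base>.txt."""
    return ["tesseract", image, output_base, "-l", lang]


def run_if_available(command):
    """Run the command only when its executable is on PATH."""
    if shutil.which(command[0]) is None:
        print("not installed:", command[0])
        return False
    subprocess.run(command, check=True)
    return True


# File names follow the attachments in this post.
print(" ".join(imagemagick_command("AL2.png", "AL2.tif")))
print(" ".join(tesseract_command("AL2.tif", "AL2.tif-eng")))
```

Calling `run_if_available` on each list in turn would execute the two steps; keeping command construction separate makes it easy to log or adjust the parameters between runs.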

BTW, I am using the English traineddata and other related files from https://code.google.com/p/tesseract-ocr/source/browse/?repo=tessdata

The file is 20.9 MB.

ShreeDevi

AL2.png-eng-psm3.txt

AL2.png

AL2.tif

AL2.tif-eng-psm3.txt

ShreeDevi Kumar

Nov 5, 2014, 9:16:28 AM

to tesser...@googlegroups.com

I have added it as an issue at https://code.google.com/p/tesseract-ocr/issues/detail?id=1374

Please attach an image there with the whole alphabet (upper and lower case) as well as numbers, so that we can identify whether there are any other issues.

ShreeDevi

Balaji Gurunathan

Jun 12, 2019, 1:32:11 AM

to tesseract-ocr

Hi,

I have a similar requirement and I'm new to Tesseract. Could you please share the steps required to implement this?

Thanks.

ameera3

Jul 9, 2019, 8:32:07 PM

to tesseract-ocr

Hello,

I recently completed a tesseract-ocr project with dot matrix fonts. I discovered that I could get reasonable accuracy quite quickly with an ensemble of fine-tuned models. Since this technique may help you, here is a link to a GitHub repository with full code and experiments:

https://github.com/ameera3/OCR_Expiration_Date
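One simple way to combine the outputs of several models is a majority vote over the recognized strings. The sketch below illustrates that general idea only; it is not taken from the linked repository, which may combine its models differently, and the function name and sample strings are made up for the example.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the string most models agree on; ties go to the earliest model."""
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    for prediction in predictions:
        if counts[prediction] == best:
            return prediction


# Hypothetical outputs of three fine-tuned models for the same image.
outputs = ["ABCDEFGHIJKL", "QBCDEFGHIJKL", "ABCDEFGHIJKL"]
print(majority_vote(outputs))
```

Voting on whole strings avoids having to align outputs of different lengths, at the cost of discarding partial agreement between models.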

Best wishes,

ameera3
