tesseractdotnetwrapper - Need to Increase quality of image


Sarel van der Merwe

Jul 11, 2011, 5:21:16 AM
to tesseract-ocr
What would be the best way to increase the quality of an image using
the technology inside Visual Studio C# 2008?

I need to prepare the image before I pass it to the Tesseract engine.

Please provide example code if possible.

Thanks

Sarel
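A minimal sketch of one common first step: converting a colour image to grayscale before any further clean-up. It is written in Python on plain nested lists of (r, g, b) tuples so the arithmetic is visible. The function name `to_grayscale` is hypothetical, and the ITU-R BT.601 luma weights (0.299, 0.587, 0.114) are one standard choice, not something Tesseract prescribes. The same per-pixel loop ports directly to C# using `System.Drawing.Bitmap`.

```python
def to_grayscale(rgb_rows):
    """Convert rows of (r, g, b) tuples (0-255 each) into rows of 0-255
    luminance values using the ITU-R BT.601 luma weights."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_rows
    ]


# Example: one white pixel and one pure-red pixel.
print(to_grayscale([[(255, 255, 255), (255, 0, 0)]]))
```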

Sarel van der Merwe

Jul 11, 2011, 6:43:51 AM
to tesseract-ocr
Example image attached.
example.jpg

dean...@gmail.com

Jul 11, 2011, 4:23:10 PM
to tesseract-ocr
I had awesome results using the automated tools in Adobe
Photoshop: simply apply the Photocopy filter from the Filter Gallery.
That, combined with adjusting the contrast levels, helped a ton. I was
able to extract approximately 80 percent of the numbers from 105 files.
Also, run Tesseract in digits-only mode.
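Photoshop's Photocopy filter isn't available from code, so as a swapped-in stand-in (not what the poster used) here is a sketch of a linear contrast stretch followed by Otsu's global threshold, which turns a grayscale image into black text on white. Images are assumed to be rows of 0-255 ints; `contrast_stretch`, `otsu_threshold` and `binarize` are hypothetical names. A single global threshold may struggle on a patterned background, so cropping to the paper first may help.

```python
def contrast_stretch(gray):
    """Linearly map the darkest pixel to 0 and the brightest to 255."""
    lo = min(min(row) for row in gray)
    hi = max(max(row) for row in gray)
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in gray]
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in gray]


def otsu_threshold(gray):
    """Return the threshold t maximising the between-class variance when
    pixels <= t are treated as dark and pixels > t as light (Otsu)."""
    hist = [0] * 256
    for row in gray:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    weight_dark, sum_dark = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_dark += hist[t]
        sum_dark += t * hist[t]
        if weight_dark == 0:
            continue
        weight_light = total - weight_dark
        if weight_light == 0:
            break
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        var_between = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray):
    """Pixels at or below the Otsu threshold become 0 (ink), others 255."""
    t = otsu_threshold(gray)
    return [[0 if p <= t else 255 for p in row] for row in gray]


print(binarize(contrast_stretch([[10, 10, 200, 200], [12, 11, 198, 205]])))
```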


Sarel van der Merwe

Jul 12, 2011, 2:05:31 AM
to tesser...@googlegroups.com
Hi,

Thanks for your feedback.

How do you set Tesseract to digits-only mode? I'm using C#.
One of the problems I'm having is that the numbers can be skewed. Any
suggestions?

Would it be possible to use something like the Photocopy filter from
the Filter Gallery inside C#? I was thinking of using AForge.NET filters:
http://www.aforgenet.com/framework/docs/html/cdf93487-0659-e371-fed9-3b216efb6954.htm

Any suggestions or example code would be appreciated.

Thanks


dean...@gmail.com

Jul 12, 2011, 11:12:24 AM
to tesseract-ocr
It's a command line switch.
tesseract <image> <outputbasename> [-l lang] digits


Dmitri Silaev

Jul 12, 2011, 11:57:59 AM
to tesser...@googlegroups.com
Don't be confused: it's not a switch, it's a command-line parameter
naming a config file, which you sometimes have to copy from somewhere
or create by hand. See
http://code.google.com/p/tesseract-ocr/wiki/FAQ#How_do_I_recognize_only_digits?

Warm regards,
Dmitri Silaev
www.CustomOCR.com
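A sketch of the config-file approach described above, assuming the Tesseract 3.0x convention of a plain-text file named `digits` in `tessdata/configs/` that sets the `tessedit_char_whitelist` variable. The `tessdata` location depends on your installation (and writing there may need extra permissions); the helper names are hypothetical. From C#, the same command line can be launched with `System.Diagnostics.Process`; this sketch does not assume any particular tesseractdotnetwrapper API for setting the variable directly.

```python
from pathlib import Path

# Assumed Tesseract 3.0x config format: one "variable value" pair per line.
DIGITS_CONFIG = "tessedit_char_whitelist 0123456789\n"


def write_digits_config(tessdata_dir):
    """Create <tessdata_dir>/configs/digits (hypothetical helper)."""
    configs_dir = Path(tessdata_dir) / "configs"
    configs_dir.mkdir(parents=True, exist_ok=True)
    config_path = configs_dir / "digits"
    config_path.write_text(DIGITS_CONFIG)
    return config_path


def tesseract_argv(image, output_base, lang="eng", config="digits"):
    """Build: tesseract <image> <outputbase> -l <lang> <configfile>."""
    return ["tesseract", image, output_base, "-l", lang, config]


# Example (not executed here; requires tesseract on the PATH):
#   import subprocess
#   subprocess.run(tesseract_argv("numbers.tif", "numbers"), check=True)
#   Tesseract then writes its result to numbers.txt
```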

Sarel van der Merwe

Jul 12, 2011, 5:55:13 PM
to tesser...@googlegroups.com
Thank you.

Sarel van der Merwe

Jul 13, 2011, 2:19:25 AM
to tesser...@googlegroups.com
How would you counteract the skew and re-align the text so that it is
readable by Tesseract?

Would it be possible to draw a virtual line along the bottom of the “word
/ sentence” and use that to re-align the text?
Would it be possible to change the angle of all the objects inside
"eng.traineddata", say to +30 degrees,
to compensate for the alignment?
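The "virtual line along the bottom" idea can be sketched as a least-squares fit through the lowest ink pixel of each column, which gives the skew angle of the baseline. This assumes a binarized image (0 = ink, 255 = paper) holding a single line of text, with row 0 at the top. Digits have no descenders, which makes the bottom points a reasonable baseline estimate, but stray specks of noise will pull the fit. The function names are hypothetical; the measured angle would then be undone by rotating the image.

```python
import math


def bottom_points(binary, ink=0):
    """For each column containing ink, return (x, y) of its lowest ink
    pixel. Row 0 is the top of the image, so 'lowest' means largest y."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    points = []
    for x in range(width):
        for y in range(height - 1, -1, -1):
            if binary[y][x] == ink:
                points.append((x, y))
                break
    return points


def baseline_angle_degrees(binary, ink=0):
    """Fit y = a + b*x through the bottom points by least squares and
    return atan(b) in degrees. With y pointing down, a positive angle
    means the text line drops towards the right."""
    pts = bottom_points(binary, ink)
    n = len(pts)
    if n < 2:
        return 0.0
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    if sxx == 0:
        return 0.0
    return math.degrees(math.atan(sxy / sxx))


# Example: a 45-degree diagonal stroke.
diagonal = [[0 if x == y else 255 for x in range(5)] for y in range(5)]
print(baseline_angle_degrees(diagonal))
```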

Dmitri Silaev

Jul 13, 2011, 5:19:12 AM
to tesser...@googlegroups.com
If I get you right, yes, this is possible. This can be done by using a
rotation function of an image manipulation library before sending your
image to Tesseract. E.g. the FreeImage library, the
FreeImage_RotateClassic function. Or, this can be done with
batch/shell script files by running ImageMagick to rotate images and
then sending them to Tesseract.

If that's not what you meant, please show us your images.
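For readers who would rather not pull in FreeImage or ImageMagick, here is a sketch of what such a rotation function does: inverse mapping about the image centre with nearest-neighbour sampling, keeping the original size and filling uncovered areas with white. It assumes grayscale rows of ints with row 0 at the top; `rotate` is a hypothetical name, and a library routine (or bicubic sampling) will give smoother edges than nearest neighbour.

```python
import math


def rotate(gray, angle_degrees, fill=255):
    """Rotate a grayscale image about its centre by angle_degrees,
    counter-clockwise as displayed (row 0 at the top), keeping the
    original size. Each output pixel looks up the source pixel that
    lands on it (inverse mapping, nearest neighbour); output pixels
    whose source lies outside the image get `fill`."""
    h = len(gray)
    w = len(gray[0]) if h else 0
    theta = math.radians(angle_degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = x - cx, y - cy
            # Inverse of the forward rotation, expressed in y-down coordinates.
            sx = round(dx * cos_t - dy * sin_t + cx)
            sy = round(dx * sin_t + dy * cos_t + cy)
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = gray[sy][sx]
    return out


# Example: the top-right pixel moves to the top-left after 90 degrees CCW.
print(rotate([[0, 0, 1], [0, 0, 0], [0, 0, 0]], 90))
```

If a line of text drops towards the right by some angle, calling `rotate(gray, angle)` with that same positive angle turns it counter-clockwise and levels it.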

Warm regards,
Dmitri Silaev


Sarel van der Merwe

Jul 13, 2011, 6:41:31 PM
to tesser...@googlegroups.com
Hi Dmitri,

This is the worst-case scenario.
I need to extract numbers printed on white paper that is attached to a
multi-pattern background.
The background can be any color or pattern.
The numbers range from 1 to 10000.

I'm using Visual Studio C# 2008 and the new tesseractdotnetwrapper.

I was thinking of using AForge.NET bicubic interpolation to increase
the size and to handle the rotation.
Would it be possible to have the app adjust the alignment
automatically?

What do you think would be the best approach to solve this problem?

Thanks

Sarel

numbers.jpg
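A sketch of bicubic upscaling as proposed above, using Keys' cubic convolution kernel with a = -0.5 (one common choice) on grayscale rows of 0-255 ints; the function names are hypothetical. Each output pixel is mapped back to source coordinates and blended from the surrounding 4x4 neighbourhood, with edge pixels repeated at the borders and results clamped to 0-255. AForge.NET appears to offer ready-made bicubic resize and rotate filters, so in C# this is mainly useful for understanding what they do.

```python
import math


def _cubic_weight(t, a=-0.5):
    """Keys' cubic convolution kernel."""
    t = abs(t)
    if t <= 1:
        return (a + 2) * t ** 3 - (a + 3) * t ** 2 + 1
    if t < 2:
        return a * t ** 3 - 5 * a * t ** 2 + 8 * a * t - 4 * a
    return 0.0


def upscale_bicubic(gray, factor):
    """Resize a grayscale image by `factor` using bicubic interpolation."""
    h, w = len(gray), len(gray[0])
    new_h, new_w = round(h * factor), round(w * factor)
    out = []
    for oy in range(new_h):
        # Map the output pixel centre back into source coordinates.
        sy = (oy + 0.5) / factor - 0.5
        y0 = math.floor(sy)
        row = []
        for ox in range(new_w):
            sx = (ox + 0.5) / factor - 0.5
            x0 = math.floor(sx)
            acc = 0.0
            for m in range(-1, 3):
                wy = _cubic_weight(sy - (y0 + m))
                yy = min(max(y0 + m, 0), h - 1)  # repeat edge rows
                for n in range(-1, 3):
                    wx = _cubic_weight(sx - (x0 + n))
                    xx = min(max(x0 + n, 0), w - 1)  # repeat edge columns
                    acc += wy * wx * gray[yy][xx]
            row.append(min(255, max(0, round(acc))))
        out.append(row)
    return out


# Example: double the size of a small two-tone patch.
for r in upscale_bicubic([[0, 255], [0, 255]], 2):
    print(r)
```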

Dmitri Silaev

Jul 15, 2011, 12:03:56 AM
to tesser...@googlegroups.com
Hi Sarel,

Does the "numbers.jpg" file you've attached show a collection of
images, or can each of them be recognized separately? Are these
runner paper tags?

