Re: Improve results from attached image


Sven Pedersen

Sep 12, 2012, 12:47:10 PM
to tesser...@googlegroups.com
The image is inverted. Use a free image program or library to make it
black on white. ImageMagick is a popular choice. Tesseract OCR does
not handle white on black well.
--Sven
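
If you would rather invert inside your own program than go through an external tool, a minimal sketch follows. It assumes the same layout as the SetImage() call quoted below: 8-bit grayscale, one byte per pixel. The function name InvertGray8 and the stride parameter are illustrative and not part of the Tesseract API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not a Tesseract function): inverts an 8-bit
// grayscale buffer in place so white-on-black text becomes black-on-white.
// `stride` is the number of bytes per row; it may exceed `width` when rows
// are padded, and padding bytes are left untouched.
void InvertGray8(std::uint8_t* pixels, int width, int height, int stride) {
    assert(pixels != nullptr && width >= 0 && height >= 0 && stride >= width);
    for (int y = 0; y < height; ++y) {
        std::uint8_t* row = pixels + static_cast<std::size_t>(y) * stride;
        for (int x = 0; x < width; ++x) {
            row[x] = static_cast<std::uint8_t>(255 - row[x]);
        }
    }
}
```

You would call it on the buffer just before handing it to SetImage(), passing the same value for stride that you pass as bytes per line.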

On Wed, Sep 12, 2012 at 10:55 AM, Mike <ml...@nds.com> wrote:
> Hi, me again,
>
> I attached another image which is quite similar, but in this case tesseract
> fails to produce the correct output at all, even when using tesseract.exe
> with psm set to 8 or left as auto.
> Has anybody got an idea why? Does this mean one has to teach tesseract to
> read this character correctly?
>
> Thanks,
> Mike
>
>
> On Monday, September 3, 2012 11:06:41 AM UTC+2, Mike wrote:
>>
>> Hi,
>>
>> Maybe someone can point me in the right direction.
>> I use Windows 7 32 bit.
>> When I take the attached image and load it with tesseract.exe (3.01)
>> via the following command: tesseract.exe OCR_MONO_DEBUG.jpg test -l eng -psm 8
>> the result is correct.
>> However, I use the following functions (where image is the attached file,
>> read internally by my program and converted to 1 byte mono):
>>
>> pTessBase->SetPageSegMode(tesseract::PSM_SINGLE_WORD);
>> pTessBase->SetImage(pImage, width, height, 1, width);
>> char* ocr_result = pTessBase->GetUTF8Text();
>>
>> Then oddly enough I do not get any results, all I get is an empty string.
>> Setting whitelist to only numbers does not help either. When I have 2
>> numbers to recognize such as 81 then all works fine.
>>
>> Thanks in advance.
>> Mike
>



--
“All that is gold does not glitter,
not all those who wander are lost;
the old that is strong does not wither,
deep roots are not reached by the frost.
From the ashes a fire shall be woken,
a light from the shadows shall spring;
renewed shall be blade that was broken,
the crownless again shall be king.”

Sven Pedersen

Jan 4, 2013, 12:07:22 PM
to tesser...@googlegroups.com
Tesseract does not work well for fewer than 4 chars, I think, and your image is very pixelated.
Sven

On Friday, January 4, 2013, Mike wrote:
Hi, I am still facing an issue where the number 8 is not detected.

Here is a way to reproduce the problem using binaries downloaded from the tesseract site.
I downloaded the tesseract portable (http://code.google.com/p/tesseract-ocr/downloads/detail?name=tesseract-ocr-3.02-win32-portable.zip&can=2&q=) and ran the following command line with the image attached to this post:
tesseract.exe -l eng -psm 8 OCR_MONO_DEBUG.jpg test
In test.txt I get the following string: "/"
I would expect "8". I would really appreciate it if anyone could verify this behaviour on their side.

Thanks in advance,
Mike

On Thursday, September 13, 2012 12:19:13 PM UTC+2, Mike wrote:
Hi,

Thanks for the info. I am using revision 700. I tried what "sventech" explained and it improved my results. I will integrate the latest revision and see if it gets even better.

On Wednesday, September 12, 2012 11:09:29 PM UTC+2, Stane wrote:
Does the example images work with your code?

If I use the tesseract 3.02 API on your image (white 8 on black background), it gets recognized without problems.
I am using the default PageSegMode and OEM_TESSERACT_ONLY.
Hope that helps somehow.