Help in read Blue and White image.

Lucas Alexandre

Aug 19, 2016, 3:33:28 PM
to tesser...@googlegroups.com

Hello,

I am a new member of this mailing list. I am working on a small project to read electronic screens through OCR. We have set up equipment that captures
the VGA output of computers and other devices and converts the signal to RCA composite video, so that I can take pictures and videos of the machine's screen. My idea is to capture
BIOS (Setup) screens and convert them to text that can be read by visually impaired users like me. The problem is that Tesseract does not seem to understand my images,
while commercial OCR products read almost 99% of the text, with amazing accuracy. Before purchasing a license for one of them, I would like to know whether there is anything I can
do to make Tesseract read my screens with reasonable accuracy. I have tried getting Tesseract to write out its tessinput.tif file, and the result is a 1 KB file
with poor picture quality and completely blurred, distorted letters. I believe this happens because Tesseract tries to improve the image internally but
ends up destroying it. If there were an option to stop Tesseract from modifying the image, I could certainly get better results. Even the commercial OCR products let me choose whether
the image is converted to black and white.

    Does anyone have any idea how I can do this with Tesseract? I have even thought about recompiling Tesseract so that it does not alter the original image.
Or is there a binary for Windows or Linux that already has this capability?

    Thank you very much.

    Sincerely,
    Lucas Alexandre
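
One way to keep the black-and-white conversion under your own control is to binarize the image yourself and give Tesseract the result. The sketch below is only an illustration in pure Python: it applies Otsu's global threshold to a grayscale image held as a list of rows, optionally inverts it so that light text on a dark (for example blue) background becomes dark text on white, and encodes the result as a binary PGM image. The function names and sample values are hypothetical, and nothing in this thread confirms that this step helps on these particular screens.

```python
def otsu_threshold(pixels):
    """Return the 0-255 threshold that maximizes between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # pixels with value <= candidate threshold
    sum_bg = 0  # sum of their intensities
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray, invert=False):
    """Map each pixel to 0 (black) or 255 (white).

    gray is a list of rows of 0-255 integers. With invert=True, pixels
    brighter than the threshold become black, which suits light text on
    a dark background.
    """
    t = otsu_threshold([p for row in gray for p in row])
    return [[255 if (p > t) != invert else 0 for p in row] for row in gray]


def encode_pgm(rows):
    """Encode rows of 0-255 integers as a binary (P5) PGM image."""
    height, width = len(rows), len(rows[0])
    header = b"P5\n%d %d\n255\n" % (width, height)
    return header + b"".join(bytes(row) for row in rows)
```

Writing `encode_pgm(binarize(gray, invert=True))` to a file opened in binary mode gives an image that could be passed to Tesseract in place of the original photo, assuming your build can read PNM files. Whether this improves recognition depends on the capture quality.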

Allistair C

Aug 19, 2016, 3:46:02 PM
to tesser...@googlegroups.com
Do you have a sample image?

Sent from my iPhone
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/57b75ef6.9a35ed0a.27a7a.1bb5%40mx.google.com.
For more options, visit https://groups.google.com/d/optout.

Allistair

Aug 19, 2016, 4:12:42 PM
to tesser...@googlegroups.com
Is that the actual input file you give to Tesseract? Is this the image on which you claim 99% accuracy with commercial OCR engines? I would be surprised if so; the quality is very bad.

On 19 August 2016 at 21:06, Lucas Alexandre <lucasale...@gmail.com> wrote:

    Hello,

    See the attached file.

    Regards,
    Lucas Alexandre

-----Original message-----
From: Allistair C <alli...@gmail.com>
To: tesser...@googlegroups.com
Date: Friday, August 19, 2016 20:45
Subject: Re: [tesseract-ocr] Help in read Blue and White image.

Do you have a sample image?

Allistair

Aug 19, 2016, 4:23:39 PM
to tesser...@googlegroups.com
Which free-to-try OCR service gets you 99% on this image? I have to see that. This image is of very low quality, and I would be surprised to see any OCR engine do well on it. I don't think the issue has anything to do with what Tesseract does to improve the image internally; it has more to do with the fuzziness and gappiness of the text. The image you sent is out of focus, and the fuzzy edges of the text make words run into each other. In my opinion, you would need higher-resolution photos to stand a chance.
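
To put a number on "out of focus", one common heuristic is the variance of a Laplacian filter response: sharp edges give large positive and negative responses, while a blurred image gives values close to zero. The sketch below is a rough illustration in pure Python on a grayscale image stored as a list of rows. The function name is hypothetical, and the score is only meaningful for comparing captures of the same screen, not as an absolute pass/fail threshold.

```python
def laplacian_variance(gray):
    """Return the variance of a 4-neighbour Laplacian over interior pixels.

    gray is a list of equal-length rows of 0-255 integers. A higher value
    suggests sharper edges; a lower value suggests a blurrier image.
    """
    height, width = len(gray), len(gray[0])
    if height < 3 or width < 3:
        raise ValueError("image must be at least 3x3 pixels")

    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            lap = (gray[y - 1][x] + gray[y + 1][x]
                   + gray[y][x - 1] + gray[y][x + 1]
                   - 4 * gray[y][x])
            responses.append(lap)

    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)
```

Computing this score for two captures of the same BIOS screen, for instance before and after refocusing the camera, gives a rough indication of which one has crisper text edges.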

On 19 August 2016 at 21:16, Lucas Alexandre <lucasale...@gmail.com> wrote:

    Hello,

For the commercial services, use the attached file.


    Regards,
    Lucas Alexandre

-----Original message-----
From: Allistair <alli...@gmail.com>
To: "tesseract-ocr@googlegroups.com" <tesseract-ocr@googlegroups.com>
Date: Friday, August 19, 2016 21:12
Subject: Re: [tesseract-ocr] Re: Help in read Blue and White image.
