image_to_string() always returns an empty string (Python)


anita josic

May 5, 2017, 3:10:49 AM
to tesseract-ocr

Hello

I am trying to extract text from a picture, but I always get an empty string.
The picture passed to image_to_string('temp2.jpg') in the code is attached below.
I tried thresholding with OpenCV, but that made only a slight difference to the picture attached below.

Is there a step missing? Is the JPG format the wrong choice? Or is it impossible because the white and black areas in the picture get treated as text?

I urgently need help and am hoping for a quick answer.

#!/usr/bin/env python
import os
import subprocess
from picamera.array import PiRGBArray
from time import *
from picamera import PiCamera
from datetime import datetime, timedelta
import cv2
try:
    import Image
except ImportError:
    from PIL import Image, ImageEnhance, ImageFilter
from pytesseract import *

#EXTRACT TEXT
print 'pytesser:'
#img = Image.open('/home/pi/camera/IMAGE-2017-05-04_141433.png')
img = Image.open('artikelbild-02.jpg')
im = img.convert('RGBA')
enhancer = ImageEnhance.Contrast(im)
im = enhancer.enhance(3)
im = im.convert('1')
im.save('temp2.jpg')

#use tesseract library to extract text from
text = pytesseract.image_to_string(Image.open('temp2.jpg'))

print "Text:"+text

#what the text contains
if "DHL" in text:
    print 'DHL Lieferant'
elif "Post" in text:
    print 'Postbote'
elif "GLS" in text:

....
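
The keyword check at the end of the script could also be written as a small lookup that ignores upper and lower case, since recognized text may not match the expected casing exactly. This is only a minimal sketch, not the poster's code: the names `classify_courier` and `COURIER_LABELS` are hypothetical, and only the two labels that actually appear in the post are listed (the GLS branch is elided in the original and is left out here).

```python
# Hypothetical helper: map keywords found in OCR output to a label.
# Only the two labels given in the post are listed; the rest is elided there.
COURIER_LABELS = [
    ("DHL", "DHL Lieferant"),
    ("Post", "Postbote"),
]


def classify_courier(text):
    """Return the label of the first keyword found in text, or None."""
    normalized = text.upper()
    for keyword, label in COURIER_LABELS:
        # Compare in upper case so "dhl" and "DHL" are treated the same.
        if keyword.upper() in normalized:
            return label
    return None


if __name__ == "__main__":
    print(classify_courier("dhl express"))
    print(classify_courier(""))
```

An empty OCR result gives `None`, which makes the "nothing recognized" case explicit instead of silently falling through every branch.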






Zdenko Podobný

May 5, 2017, 3:23:58 AM
to tesser...@googlegroups.com

Zdenko

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscribe@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/e97baa76-1ee5-49af-b824-766ab2ec0b03%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

anita josic

May 5, 2017, 4:25:15 AM
to tesseract-ocr

Using
tesseract --tessdata-dir /usr/share/tesseract-ocr temp2.jpg -l eng -psm 20 text

in the terminal, I get the output
‘33:;
in text.txt. Well, that is at least something, but far from what I was hoping to get.

Looking forward to answers.
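
One way to explore the page segmentation setting from the command above is to run Tesseract once per mode and compare the results. The following is a rough sketch under several assumptions: the helper names `build_tesseract_command` and `try_modes` are made up for illustration, the `-psm` spelling is the one used by Tesseract 3.x as in the post (newer releases spell it `--psm`), and modes 3 (fully automatic), 6 (single uniform block of text) and 7 (single text line) are picked only as examples. The documented modes are small integers (roughly 0 to 13 depending on the version), so it is worth checking `tesseract --help` on the installed version before relying on a value such as 20.

```python
import subprocess


def build_tesseract_command(image_path, output_base, psm, lang="eng",
                            tessdata_dir=None):
    """Build the argument list for one Tesseract 3.x run (nothing is executed)."""
    command = ["tesseract"]
    if tessdata_dir is not None:
        command += ["--tessdata-dir", tessdata_dir]
    # Tesseract writes its result to "<output_base>.txt".
    command += [image_path, output_base, "-l", lang, "-psm", str(psm)]
    return command


def try_modes(image_path, modes=(3, 6, 7)):
    """Run Tesseract once per mode and collect the recognized text per mode."""
    results = {}
    for psm in modes:
        output_base = "text_psm%d" % psm
        subprocess.check_call(
            build_tesseract_command(image_path, output_base, psm))
        with open(output_base + ".txt", encoding="utf-8") as handle:
            results[psm] = handle.read().strip()
    return results
```

Calling `try_modes("temp2.jpg")` needs the `tesseract` binary on the PATH and returns a dictionary from mode number to recognized text, so empty and non-empty results can be compared side by side.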




Zdenko Podobný

May 5, 2017, 4:30:09 AM
to tesser...@googlegroups.com

Zdenko


anita josic

May 5, 2017, 4:33:09 AM
to tesseract-ocr
Hi Zdenko

I have read it now, but I still don't know what I need to use. I have already read a lot, but I still don't know which part is missing. I am hoping for real feedback and help. As you can see, I am not making any progress trying things on my own.

anita josic

May 5, 2017, 5:31:41 AM
to tesseract-ocr

Hello again

I tried to follow these instructions for using the bazaar config:
https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc#config-files-and-augmenting-with-user-data

I now have:
/usr/share/tesseract-ocr/tessdata/eng.user-words: (contains DHL and other words that appear in the image)
/usr/share/tesseract-ocr/tessdata/eng.user-patterns: (contains only \n\*)
/usr/share/tesseract-ocr/tessdata/configs/bazaar: (contains the same lines as described in the source)

I executed the command
tesseract --tessdata-dir /usr/share/tesseract-ocr temp2.jpg -l eng text bazaar

The command ran, but text.txt is still empty.

I really don't know what else could help me get accurate output.



Zdenko Podobný

May 5, 2017, 12:37:56 PM
to tesser...@googlegroups.com
Really? And you think your image matches those examples?
E.g. the text is in straight lines, there is no noise (just the text), the DPI is OK, etc.?

You will never get good output from bad input.

Zdenko


anita josic

May 5, 2017, 1:58:16 PM
to tesseract-ocr
Thank you for such nice, positive and detailed help.
I really feel like I can handle it by myself, really. Thank you so much.

May the force be with you


anita josic

May 5, 2017, 2:26:48 PM
to tesseract-ocr
Hey guys

I have now got as far as having the picture in rich gray tones, so that not everything is so "noisy" (image.convert('L') instead of image.convert('1')).
But there is still no output.
I think I really need to crop out the text and then remove the background.
Maybe an expert can show me the best way to do this.

I think I could remove the background with a threshold after first cropping the text.
But I guess the crop always has to be done manually?

Cheers
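
The crop does not necessarily have to be manual. As an illustration of the idea, the sketch below finds the bounding box of all dark pixels and cuts the image down to it. Several things here are assumptions and not from the thread: the text is taken to be dark on a lighter background, the fixed cutoff of 128 is arbitrary, the function names are made up, and the order is swapped compared with the plan above (the threshold is applied first so that the box can be found). The image is represented as a plain list of rows of 0-255 gray values to keep the example free of dependencies; with a real photo, leftover background noise would enlarge the box, so some cleanup would likely be needed first.

```python
def threshold(gray_rows, cutoff=128):
    """Turn a grid of 0-255 gray values into 0 (dark) or 255 (light)."""
    return [[0 if value < cutoff else 255 for value in row]
            for row in gray_rows]


def dark_bounding_box(binary_rows):
    """Return (left, top, right, bottom) around all dark pixels, or None.

    right and bottom are exclusive, like the box passed to Pillow's crop().
    """
    xs, ys = [], []
    for y, row in enumerate(binary_rows):
        for x, value in enumerate(row):
            if value == 0:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None  # No dark pixel at all, nothing to crop to.
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


def crop(rows, box):
    """Cut the grid down to the given (left, top, right, bottom) box."""
    left, top, right, bottom = box
    return [row[left:right] for row in rows[top:bottom]]


if __name__ == "__main__":
    grid = [
        [250, 250, 250, 250, 250],
        [250,  40,  60, 250, 250],
        [250, 250,  30, 250, 250],
        [250, 250, 250, 250, 250],
    ]
    binary = threshold(grid)
    box = dark_bounding_box(binary)
    print(box)
    print(crop(binary, box))
```

The same box could be used to crop the original gray image instead of the thresholded one, so that the threshold is chosen afterwards on the text region alone.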

