Detecting a simple phone number

Flipz

Jul 4, 2009, 5:52:47 AM
to tesseract-ocr
Hi,

I installed tesseract to do some simple OCR on a very basic image: a
phone number in 11pt Arial, in PNG format (http://www.autohop.bg/OCR/phone.png),
which I convert to an uncompressed TIFF using:

convert -monochrome -normalize ./phone.png ./phone.tif

When I run it through tesseract with the following command, I get an
empty output file:

/usr/bin/tesseract ./phone.tif phone

I tried training the program, but I can't grasp the whole logic behind the
box files. I only need to detect the numbers in the PNG. Do I need to
train tesseract additionally for that, or is it built into the current
eng support? It looked very simple from the tesseract documentation, but
apparently I am missing something, because I get an empty
phone.txt file.

Any help will be appreciated.

Thanks,

CJ

Yury Tarasievich

Jul 4, 2009, 6:10:23 AM
to tesser...@googlegroups.com
For tesseract to produce anything, you'll need
an image with a resolution on the order of hundreds
of dpi. I find that 300 dpi works
satisfactorily, but 200 dpi also seems okay. Your
image looks like it is around 72 dpi.
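
A rough worked example of what that means for this image (an illustration, not something from the thread): going from 72 dpi to 300 dpi means enlarging by 300/72, about 4.17x, or roughly 417% in ImageMagick's percentage syntax. The function below is a hypothetical helper that does this arithmetic with shell integers, rounding to the nearest percent:

```shell
#!/bin/sh
# Hypothetical helper: percentage by which an image saved at src_dpi
# must be enlarged to reach dst_dpi, rounded to the nearest percent.
# Uses only POSIX shell integer arithmetic.
upscale_percent() {
  src_dpi=$1   # resolution the image has now, e.g. 72
  dst_dpi=$2   # resolution wanted for OCR, e.g. 300
  # Adding src_dpi/2 before dividing turns truncation into rounding.
  echo $(( (dst_dpi * 100 + src_dpi / 2) / src_dpi ))
}

# Example calls (each prints one number):
#   upscale_percent 72 300    -> 417
#   upscale_percent 150 300   -> 200
```

The second call corresponds to doubling a 150 dpi scan, the 2x case Yury describes later in the thread.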

Flipz

Jul 4, 2009, 6:12:16 AM
to tesseract-ocr
Do you know a way to upscale this PNG to 300 dpi with a Linux
command (maybe with convert)? As far as I know, web images are mostly
72 dpi, which would mean I can't use tesseract on any of them.
Is that correct?

If this image were at 300 dpi, do you think tesseract could manage
to extract the numbers?

Yury Tarasievich

Jul 4, 2009, 6:47:14 AM
to tesser...@googlegroups.com
I don't really know; I haven't yet tried the
full OCR cycle with a 4x scaled image. I speak
only from my limited experience processing scans and
digital photos in tesseract. For example, I
had some 150 dpi photos of book pages, I
scaled them 2x in Gimp with standard settings,
and tesseract processed those fairly well.

WHEAT

Jul 4, 2009, 11:40:04 PM
to tesseract-ocr
hi,
We're Sky Studio, a professional OCR development team. We have checked
http://www.autohop.bg/OCR/phone.png, and we can build OCR for simple
phone number pictures like this with a >=95% success rate within 1 day!
We will charge only 100 USD for this OCR development, with 1 year of free
technical support. You can choose a DLL or a command-line program, and we
will provide detailed samples of how to use the DLL or command-line
program in C#, VB, and Delphi.

Looking forward to your reply.

Best Regards
Richard
Sky Studio Inc.
7/5/2009

Yury Tarasievich

Jul 5, 2009, 3:27:46 AM
to tesser...@googlegroups.com
Flipz wrote:
> Do you know a way to upscale this png to 300dpi through a Linux
> command (maybe in convert?). As far as I know web images are mostly
> 72dpi and this means I won't be able to use tesseract for any of them,
> correct?
>
> If this image is at 300dpi, do you think it is manageable for tesseract
> to extract the numbers?

Can't see why not. And upscaling can indeed be
done with convert or, possibly, Gimp. I can't
recommend the best algorithm for the upscaling,
however (bilinear, bicubic, etc.).

--
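
To compare algorithms in practice: ImageMagick's convert selects the resampling algorithm with -filter. Its Triangle filter is bilinear interpolation, and Catrom (Catmull-Rom) is one of its bicubic filters; Lanczos is included because Flipz reports using it later in the thread. The function below is a hypothetical helper, assuming convert and tesseract are on PATH. It enlarges the same image with each filter, writes an uncompressed monochrome TIFF as in the first post, and runs tesseract on each result:

```shell
#!/bin/sh
# Hypothetical helper: upscale one image with several ImageMagick
# filters and OCR each result so the outputs can be compared.
#   $1 = source image (e.g. phone.png)
#   $2 = enlargement in percent (e.g. 417 for 72 -> 300 dpi)
compare_filters() {
  in=$1
  percent=$2
  base=${in%.*}                       # strip the extension: phone.png -> phone
  for filter in Triangle Catrom Lanczos; do
    out="${base}_${filter}.tif"
    # Resize with this filter, reduce to black and white, and save
    # an uncompressed TIFF; stop at the first conversion failure.
    convert "$in" -filter "$filter" -resize "${percent}%" \
      -monochrome -compress none "$out" || return 1
    # tesseract appends .txt to the output base name.
    tesseract "$out" "${base}_${filter}"
  done
}
```

For example, `compare_filters phone.png 417` should leave phone_Triangle.txt, phone_Catrom.txt and phone_Lanczos.txt side by side, so the filter that gives the cleanest digits can be picked by inspection.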

Flipz

Jul 5, 2009, 3:37:24 AM
to tesseract-ocr
I did it with convert. It has a very nice set of upscaling algorithms,
and I managed to upscale the image to 300 dpi, which gives a 98%+
accuracy rate.

Thanks for all your help, especially Yury's for pointing me in the
direction of upscaling the image. I was trying to create custom
tesseract box files for something already built into the eng
distribution of the software :)

zhi

Jul 8, 2009, 11:44:46 AM
to tesseract-ocr
Hi Flipz,

I've run into the same problem. Could you please tell me how to
increase the dpi of an image in Linux with convert, and how to check
the dpi of an image from the command line?

Thanks

Flipz

Jul 8, 2009, 11:59:08 AM
to tesseract-ocr
Here are the commands I use:

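<?php
// Upscale a ~72 dpi web image to 300 dpi, make a monochrome TIFF,
// then OCR it with tesseract (which writes test_phone.txt).
$phone_image   = "test.png";          // original web image
$phone_image_1 = "test_upscale.png";  // upscaled copy
$phone_image_2 = "test_upscale.tif";  // monochrome TIFF for tesseract
$phone_extract = "test_phone";        // tesseract appends ".txt"

// Resample to 300x300 dpi with the Lanczos filter, then enhance.
exec("/usr/bin/convert " . escapeshellarg($phone_image)
    . " -filter Lanczos -resample 300x300 -enhance "
    . escapeshellarg($phone_image_1));
// Reduce to black and white, then normalize.
exec("/usr/bin/convert " . escapeshellarg($phone_image_1)
    . " -monochrome -normalize " . escapeshellarg($phone_image_2));
// Run the OCR.
exec("/usr/bin/tesseract " . escapeshellarg($phone_image_2)
    . " " . escapeshellarg($phone_extract));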
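
For readers not using PHP, the same three commands can be run directly from a shell. This is a minimal sketch: the function name phone_ocr is hypothetical, it assumes convert and tesseract are on PATH rather than hard-coding /usr/bin, and each step runs only if the previous one succeeded. Note that -resample scales relative to the resolution recorded in the input file, so it is worth checking that value first.

```shell
#!/bin/sh
# Hypothetical helper mirroring Flipz's three PHP exec() calls.
#   $1 = source image, e.g. test.png
phone_ocr() {
  src=$1
  base=${src%.*}   # test.png -> test
  # 1. Resample to 300x300 dpi with the Lanczos filter, then enhance.
  convert "$src" -filter Lanczos -resample 300x300 -enhance "${base}_upscale.png" &&
  # 2. Reduce to black and white and normalize into a TIFF.
  convert "${base}_upscale.png" -monochrome -normalize "${base}_upscale.tif" &&
  # 3. OCR the TIFF; tesseract writes ${base}_phone.txt.
  tesseract "${base}_upscale.tif" "${base}_phone"
}
```

Running `phone_ocr test.png` should produce test_upscale.png, test_upscale.tif and test_phone.txt, the same file names as the PHP version.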

zhi

Jul 8, 2009, 12:09:16 PM
to tesseract-ocr
Flipz, thanks so much. And is there any way to check the dpi of an
image from the command line?
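
One possible answer, assuming ImageMagick is installed: identify's -format escapes %x and %U print the stored horizontal density and its unit. PNG files store pixels per metre, which ImageMagick usually reports as PixelsPerCentimeter, so a 72 dpi PNG tends to show up as about 28.35. The helper name dpi_of is hypothetical, and the exact text %x prints differs between ImageMagick versions, so treat this as a sketch:

```shell
#!/bin/sh
# Hypothetical helper: print the horizontal resolution of an image in dpi.
# Assumes ImageMagick's identify is on PATH. %x is the stored x density
# and %U its unit; per-centimetre values are converted (1 inch = 2.54 cm).
# Only the first frame is used for multi-frame files such as TIFFs.
dpi_of() {
  identify -format '%x %U\n' "$1" | head -n 1 | awk '
    $2 == "PixelsPerCentimeter" { printf "%.0f\n", $1 * 2.54; next }
    { printf "%.0f\n", $1 }   # PixelsPerInch or Undefined: print as stored
  '
}
```

For an image saved at 72 dpi, `dpi_of phone.png` should print 72. Without the helper, `identify -verbose phone.png` also lists Resolution and Units lines that can be read by eye.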