I installed tesseract to do some simple OCR on a very basic image: a
phone number in Arial 11pt in PNG format (http://www.autohop.bg/OCR/phone.png), which I convert to an uncompressed TIFF using:
When I run it through tesseract I get an empty file from the following
command:
/usr/bin/tesseract ./phone.tif phone
I tried training the program but I can't get the whole logic around the
box files. I only need to detect the numbers in the PNG. Do I need to
additionally train tesseract for that, or is it built into the current
eng support? It looked very simple from the tesseract documentation, but
apparently I am missing something because I get an empty
phone.txt file.
For tesseract to produce something, you'll need an image with a resolution in the hundreds of dpi. I find that 300 dpi works satisfactorily, and 200 dpi also seems okay. Your image is at something like 72 dpi.
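As a rough sketch of the arithmetic behind that advice (assuming the image really is at about 72 dpi and 300 dpi is the target; the function name scale_percent is made up for this example), the enlargement needed is just the ratio of the two resolutions:

```shell
#!/bin/sh
# scale_percent SRC_DPI TARGET_DPI
# Prints the enlargement, as a whole percentage rounded up, needed to take
# text rendered at SRC_DPI to roughly TARGET_DPI.
scale_percent() {
    src_dpi=$1
    target_dpi=$2
    echo $(( (target_dpi * 100 + src_dpi - 1) / src_dpi ))
}

scale_percent 72 300    # a 72 dpi web image needs a bit more than 4x
scale_percent 150 300   # a 150 dpi photo needs 2x
```

The percentage can be passed straight to an image tool's resize option.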
Do you know a way to upscale this PNG to 300 dpi with a Linux
command (maybe convert?). As far as I know, web images are mostly
72 dpi, and this means I won't be able to use tesseract for any of them,
correct?
If this image is at 300 dpi, do you think it is manageable for tesseract
to extract the numbers?
On Jul 4, 1:10 pm, Yury Tarasievich <yury.tarasiev...@gmail.com> wrote:
> For tesseract to produce something, you'll need
> an image with a resolution in the hundreds
> of dpi. I find that 300 dpi works
> satisfactorily, and 200 dpi also seems okay.
> Your image is at something like 72 dpi.
I don't really know; I haven't yet tried the full OCR cycle with a 4x-scaled image. I speak only from my limited experience processing scans and digital photos in tesseract. E.g., I had some 150 dpi photos of book pages, scaled them 2x in Gimp with standard settings, and tesseract processed those fairly well.
Hi, we're Sky Studio, a professional OCR development team. We have checked http://www.autohop.bg/OCR/phone.png and there is no problem at all: we can build an OCR for simple phone number pictures like this one, with a >=95% success rate, within 1 day. We will charge only 100 USD for this OCR development, with 1 year of free tech support. You can choose a DLL or a command-line program. We will provide detailed samples on how to use the DLL or command-line program in C#, VB, and Delphi.
Can't see why not. And upscaling may indeed be done with convert or, possibly, Gimp. I can't point out the best algorithm for the upscaling, however (you know, bilinear, bicubic, etc.).
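A minimal sketch of the convert route, assuming ImageMagick is installed. The 417% figure is 300/72 rounded up, the Lanczos default is only one reasonable choice rather than a known best, and upscale_for_ocr is a hypothetical helper name:

```shell
#!/bin/sh
# upscale_for_ocr INPUT OUTPUT [FILTER]
# Enlarges INPUT by 417% (72 dpi -> roughly 300 dpi) using the given
# ImageMagick resampling filter (default: Lanczos) and tags the result
# as 300 pixels per inch.
upscale_for_ocr() {
    in=$1
    out=$2
    filter=${3:-Lanczos}
    convert "$in" -filter "$filter" -resize 417% \
        -units PixelsPerInch -density 300 "$out"
}

# Example: produce two candidates and compare the OCR results afterwards.
# upscale_for_ocr phone.png phone-lanczos.tif Lanczos
# upscale_for_ocr phone.png phone-cubic.tif Cubic
```

Since nobody here can name the best filter, trying a couple and comparing the recognized text is probably the quickest way to decide.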
I did it with convert; it has a very nice set of upscaling algorithms,
and I managed to bring the image up to 300 dpi, which gives a 98%+
accuracy rate.
Thanks for all your help, especially to Yury for pointing me in the
direction of upscaling the image. I was trying to create a custom
tesseract box file for something already built into the eng distribution
of the software :)
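The exact commands are not given in this thread, so the following is only a sketch of what the OCR step could look like once an upscaled phone.tif exists. It assumes the stock eng data; keep_digits and ocr_phone are hypothetical helper names, and the digit filtering is a post-processing step added here for illustration, not something tesseract does on its own:

```shell
#!/bin/sh
# keep_digits: read text on stdin and drop everything except digits
# and newlines.
keep_digits() {
    tr -cd '0-9\n'
}

# ocr_phone IMAGE
# Runs tesseract on IMAGE (it writes <basename>.txt next to the image)
# and prints only the digits it recognized.
ocr_phone() {
    base=${1%.*}
    tesseract "$1" "$base" >/dev/null 2>&1 && keep_digits < "$base.txt"
}

# Example (requires tesseract and an upscaled image):
# ocr_phone ./phone.tif
```

Filtering after the fact keeps the example independent of any particular tesseract version.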
On 5 July, 10:27, Yury Tarasievich <yury.tarasiev...@gmail.com> wrote:
> Can't see why not. And upscaling may indeed be
> done with convert or, possibly, Gimp. I can't
> point out the best algorithm for the upscaling,
> however (you know, bilinear, bicubic, etc.)
Hi Flipz,
I ran into the same problem as you. Can you please tell me how to
increase the dpi of an image in Linux with convert, and how to check
the dpi of an image with a command?
Thanks
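For checking the resolution, one possible approach (assuming ImageMagick's identify is available; show_dpi is a made-up name for this sketch) is to print the resolution stored in the file. The exact output format differs between ImageMagick versions, so none is shown here:

```shell
#!/bin/sh
# show_dpi IMAGE
# Prints the horizontal and vertical resolution recorded in IMAGE,
# expressed in pixels per inch.
show_dpi() {
    identify -units PixelsPerInch -format '%x x %y\n' "$1"
}

# Example:
# show_dpi phone.png
```

Keep in mind that this value is only metadata. Changing it with -density alone adds no pixels, so for OCR the image also has to be enlarged with -resize.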
On Jul 5, 3:37 am, Flipz <svet...@icepique.com> wrote:
> I did it with convert, it has a very nice set of upscaling algorithms
> and I managed to pull it off to 300dpi which gives 98%+ accuracy
> rate..
> Thanks for all your help, especially Yury's for pointing me to the
> direction of upscaling the image, I was trying to create a custom
> tesseract box for something already built in the eng distribution of
> the software :)
On 8 July, 18:44, zhi <simonz...@gmail.com> wrote:
> Hi Flipz,
> I ran into the same problem as you. Can you please tell me how to
> increase the dpi of an image in Linux with convert, and how to check
> the dpi of an image with a command?
> Thanks