California License Plate font issues with OCR


ZIA

Jul 28, 2010, 5:56:24 PM
to tesseract-ocr
I am writing a license plate recognition application in C#. I am
almost done. I had started work on my own OCR, but then I decided to
use tesseract-ocr, which now partially works. I feed California
license plates to the OCR, but it misrecognizes some characters of the
font: for example, the letter "Z" becomes the number 2, the letter "O"
becomes "U", and the number 4 becomes something else. Any suggestions?
Is there a language file or font file that would solve this issue?
Besides that, in complex images I am having a hard time locating the
license plate, but my concern right now is the OCR, since I thought I
would save time by using Tesseract rather than writing my own neural
network. I would really appreciate any ideas or suggestions.

Andres

Jul 28, 2010, 10:23:53 PM
to zrah...@gmail.com, tesser...@googlegroups.com
Hello,

I'm working on the same thing as you, for licence plates from Argentina, where I live.

As you described, locating the licence plate was the hard part for me too.

Now I'm working on the OCR, and next I will work on making the images horizontal (deskewing), because the OCR fails if they are not completely horizontal. For example, today I was getting a 5 instead of a 6. When I straightened the image in Photoshop, everything came out fine.
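
As a rough illustration of the straightening step, the sketch below estimates the tilt of a plate from a few points sampled along the bottom edge of its characters and rotates coordinates back to horizontal. The function names and the way the points are obtained are hypothetical; this is only one simple way to estimate skew, not what Photoshop or Tesseract does internally.

```python
import math

def estimate_skew_degrees(points):
    """Fit a least-squares line through (x, y) baseline points; return its angle in degrees."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        raise ValueError("points must not all share the same x")
    return math.degrees(math.atan2(sxy, sxx))

def rotate_point(x, y, degrees, cx=0.0, cy=0.0):
    """Rotate (x, y) by -degrees around (cx, cy), undoing the estimated skew."""
    a = math.radians(-degrees)
    dx, dy = x - cx, y - cy
    return (cx + dx * math.cos(a) - dy * math.sin(a),
            cy + dx * math.sin(a) + dy * math.cos(a))
```

A real pipeline would apply the same rotation to the whole image with an imaging library; the point-wise version is only meant to show the geometry.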

I don't know how letters and numbers are laid out in California plates. Are they mixed? If you know whether a character should be a number or a letter from its position, you have two options (as far as I know):

- when recognizing character by character, tell Tesseract that you expect a number or a letter. I saw that somewhere inside the source code, but I don't remember where.
- make your own conversion, e.g., if you are expecting a number and you get a G, map it to a 6; if you are expecting a letter and you get a 2, map it to a Z.

I think I'll use the second option, but I'm not at that stage yet. I'm getting good results on images where the characters are large because the camera is close, but with small characters (13 pixels high) the results are poor.
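
The second option could look roughly like the sketch below. The template notation ('A' for a letter slot, '0' for a digit slot), the function name and the look-alike tables are all assumptions made for illustration; the tables should be built from the confusions actually observed.

```python
# Look-alike pairs; illustrative only, extend them from your own error logs.
TO_DIGIT = {"O": "0", "Q": "0", "D": "0", "I": "1", "L": "1",
            "Z": "2", "S": "5", "G": "6", "B": "8"}
TO_LETTER = {"0": "O", "1": "I", "2": "Z", "5": "S", "6": "G", "8": "B"}

def fix_by_position(text, template):
    """Correct letter/digit confusions using the expected slot type at each position."""
    if len(text) != len(template):
        return text  # unexpected length: leave the OCR output untouched
    out = []
    for ch, slot in zip(text.upper(), template):
        if slot == "0" and not ch.isdigit():
            ch = TO_DIGIT.get(ch, ch)   # a digit was expected here
        elif slot == "A" and not ch.isalpha():
            ch = TO_LETTER.get(ch, ch)  # a letter was expected here
        out.append(ch)
    return "".join(out)
```

For example, `fix_by_position("A2C12G", "AAA000")` returns `"AZC126"`. This only resolves letter/digit confusions; a letter read as another letter (such as "O" read as "U") needs a different fix, such as training.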

So I have a couple of ideas to test; perhaps somebody from the group could give me an opinion on them:
- follow the contours of the characters with a polygon approximation, draw an image from those contours, and run Tesseract (trained for that) on that image
- make an image with my font (one of each character in the alphabet), repeating the alphabet at different threshold levels. I think Tesseract thresholds images internally. It's hard to explain, but I think it may improve the quality.

If you want to keep talking about the specifics of licence plate recognition, we can continue privately, because it's off topic here. I'm interested in continuing. There are many things to talk about: camera prices, light filters, execution times, etc.

You can write me to andrej100 at gmail

Regards,

Andres



2010/7/28 ZIA <zrah...@gmail.com>

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to tesser...@googlegroups.com.
To unsubscribe from this group, send email to tesseract-oc...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en.


Andres

Jul 29, 2010, 12:24:42 AM
to zrah...@gmail.com, tesser...@googlegroups.com
Sorry, when I wrote this:

> - make an image with my font (one of each from the alphabet), and repeating the alphabet with different levels of threshold. I think that internally Tesseract thresholds the images. Hard to explain this, but I think that it may improve the quality.

I forgot to clarify that my intention is to train Tesseract with that image.




2010/7/28 Andres <andr...@gmail.com>

Jimmy O'Regan

Jul 29, 2010, 8:38:24 AM
to tesser...@googlegroups.com
On 29 July 2010 03:23, Andres <andr...@gmail.com> wrote:
> Hello,
>
> I'm working on the same as you, for the licence plates from Argentina, as I
> live in Argentina.
>
> Same as you described, the problem was to locate the licence plate.
>
> Now I'm working with the OCR and then I will work on horizontalizing the
> images, because if they are not completely horizontal, the OCR fails, for
> example today I was getting a 5 instead a of a 6. When I horizontalized the
> image with photoshop, everything turned to ok.
>
> I dont know how is the layout of the positions of letters and numbers in
> California plates, are they assorted ? ...if you know if the character
> should be a number or a letter according to its position, you have two
> options (as far as I know):
>
> - when recognizing char by char, tell Tesseract that you expect a number or
> a letter. I saw that in somewere inside the source code, don't remember
> where.

You were probably looking at the code that guesses among 1, l and i

Most of the code in the dict/ directory does some variation on this,
by 'permuting' the character possibilities.

> - make your own conversion, e.g., if you are expecting a number and you get
> a G, map it to a 6, if you expect a 2 map it to a Z.
>

Patrick may have more details on this approach.

According to Wikipedia
(http://en.wikipedia.org/wiki/Vehicle_registration_plates_of_Argentina),
the normal Argentinian license plates follow the template AAA 000, so
you could just generate the possible combinations, and use them in a
dawg.

perl -e 'for $a (65..90){for $b (65..90) {for $c (65..90) {printf
"%c%c%c\n", $a, $b, $c;}}}'
perl -e 'for $a (0..9){for $b (0..9) {for $c (0..9) {printf
"%d%d%d\n", $a, $b, $c;}}}'

Will get you the two lists you want.
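
The same lists can also be produced lazily, which matters once the templates get longer. The sketch below is a Python equivalent under an assumed template notation ('A' for a letter, '0' for a digit); the function names are hypothetical and nothing here is specific to Tesseract.

```python
import itertools
import string

ALPHABETS = {"A": string.ascii_uppercase, "0": string.digits}

def plates(template):
    """Yield every string matching a template such as 'AAA' or 'AAA000', one at a time."""
    pools = [ALPHABETS[slot] for slot in template]
    for combo in itertools.product(*pools):
        yield "".join(combo)

def count(template):
    """Number of strings the template generates, without enumerating them."""
    total = 1
    for slot in template:
        total *= len(ALPHABETS[slot])
    return total
```

`count("AAA")` is 17,576 and `count("000")` is 1,000, matching the two Perl lists; a combined `"AAA000"` list has 17,576,000 entries, so it is better streamed to a file than held in memory.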

For the original question, according to
http://en.wikipedia.org/wiki/Vehicle_registration_plates_of_California
this is the California scheme:
perl -e 'for $a (0..9){for $b (65..90){for $c (65..90) {for $d
(65..90) {for $e (0..9){for $f (0..9) {for $g (0..9) {printf
"%d%c%c%c%d%d%d\n", $a, $b, $c, $d, $e, $f, $g;}}}}}}}'

> I think that I'll use the last one, I'm not on that part yet. I'm getting
> good results on images where the characters are big because of the distance
> of the camera, but in small letters (13 pixels height) things are not good.
>
> So I have a pair of ideas to test, perhaps somebody from the group could
> give me opinions regarding them:
> - following the contour, with polygon approximation of the chars, making an
> image with that contours and running Tesseract on that image (trained for
> that)

Seems reasonable. Something like autotrace or potrace might be useful.

> - make an image with my font (one of each from the alphabet), and repeating
> the alphabet with different levels of threshold. I think that internally
> Tesseract thresholds the images. Hard to explain this, but I think that it
> may improve the quality.

Yes, Tesseract internally thresholds the image. I think Google did
something like this in the Tesseract 3 language packs, so it might be
worth doing.
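
To make the idea concrete, here is a small sketch of producing several binarized copies of one glyph at different threshold levels, which could then be laid out in a training image. It assumes dark characters on a light background and a grayscale image given as rows of 0-255 integers; the levels and function names are arbitrary choices for illustration and say nothing about how Tesseract itself picks its threshold.

```python
def threshold(gray, level):
    """Binarize a grayscale grid: 1 = ink (darker than level), 0 = background."""
    return [[1 if px < level else 0 for px in row] for row in gray]

def variants(gray, levels=(96, 128, 160)):
    """Return one binarized copy of the glyph per threshold level."""
    return {level: threshold(gray, level) for level in levels}
```

The grey pixels along the edge of a stroke flip between ink and background as the level changes, so each copy has slightly thinner or thicker strokes, which is the variation the training image is meant to capture.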

>
> If you want to continue speaking about specifics of licence plate
> recognition, we can continue privately because it's off topic. I'm

Well, you've earned my applause for recognising that, but if your
conversation turns up information that will save someone some time
later on, I'm all for it.

--
<Leftmost> jimregan, that's because deep inside you, you are evil.
<Leftmost> Also not-so-deep inside you.

ZIA

Jul 29, 2010, 6:26:36 PM
to tesseract-ocr
Hello,

Permuting may work, but I haven't tried it. I am also looking for a font
sample of the CA license plate, which would let me train my own OCR.
I really don't know where I can get a sample file with A to Z and 0
to 9 in the CA license plate font.

For LP extraction, I am trying to implement some kind of rectangular
window (the SCW concept from one paper). What I did was apply an edge
filter, which shows me the license plate clearly; I just need to
extract it. A simple histogram approach works if there is not a lot
of noise, but even reflections in the images cause problems.


Giuseppe Menga

Jul 30, 2010, 12:03:23 PM
to tesser...@googlegroups.com
I can contribute a tip for identifying the typeface:
have a look at http://new.myfonts.com/WhatTheFont/
Giuseppe

Andres

Jul 30, 2010, 2:26:58 PM
to tesser...@googlegroups.com, jor...@gmail.com
Hello Jimmy,

Thank you for your message.

My replies are inline below:

2010/7/29 Jimmy O'Regan <jor...@gmail.com>

On 29 July 2010 03:23, Andres <andr...@gmail.com> wrote:
> Hello,
>
> I'm working on the same as you, for the licence plates from Argentina, as I
> live in Argentina.
>
> Same as you described, the problem was to locate the licence plate.
>
> Now I'm working with the OCR and then I will work on horizontalizing the
> images, because if they are not completely horizontal, the OCR fails, for
> example today I was getting a 5 instead a of a 6. When I horizontalized the
> image with photoshop, everything turned to ok.
>
> I dont know how is the layout of the positions of letters and numbers in
> California plates, are they assorted ? ...if you know if the character
> should be a number or a letter according to its position, you have two
> options (as far as I know):
>
> - when recognizing char by char, tell Tesseract that you expect a number or
> a letter. I saw that in somewere inside the source code, don't remember
> where.

You were probably looking at the code that guesses among 1, l and i

I think I saw somewhere that it was possible to configure whether you expect numbers or letters, but I'm not sure anymore.
 

Most of the code in the dict/ directory does some variation on this,
by 'permuting' the character possibilities.

> - make your own conversion, e.g., if you are expecting a number and you get
> a G, map it to a 6, if you expect a 2 map it to a Z.
>

Patrick may have more details on this approach.

According to Wikipedia
(http://en.wikipedia.org/wiki/Vehicle_registration_plates_of_Argentina),
the normal Argentinian license plates follow the template AAA 000, so
you could just generate the possible combinations, and use them in a
dawg.

 perl -e 'for $a (65..90){for $b (65..90) {for $c (65..90) {printf
"%c%c%c\n", $a, $b, $c;}}}'
 perl -e 'for $a (0..9){for $b (0..9) {for $c (0..9) {printf
"%d%d%d\n", $a, $b, $c;}}}'

Will get you the two lists you want.

Thank you very much for this idea.
The resulting word list (for the six-character case, 26^3 x 10^3) would have 17,576,000 lines.
How does Tesseract access a list like this? Isn't it too big?
 
(For the original question, according to
http://en.wikipedia.org/wiki/Vehicle_registration_plates_of_California
this is the California scheme:
perl -e 'for $a (0..9){for $b (65..90){for $c (65..90) {for $d
(65..90) {for $e (0..9){for $f (0..9) {for $g (0..9) {printf
"%d%c%c%c%d%d%d\n", $a, $b, $c, $d, $e, $f, $g;}}}}}}}'

> I think that I'll use the last one, I'm not on that part yet. I'm getting
> good results on images where the characters are big because of the distance
> of the camera, but in small letters (13 pixels height) things are not good.
>
> So I have a pair of ideas to test, perhaps somebody from the group could
> give me opinions regarding them:
> - following the contour, with polygon approximation of the chars, making an
> image with that contours and running Tesseract on that image (trained for
> that)

Seems reasonable. Something like autotrace or potrace might be useful.

Glad to read that. Since I use OpenCV, I usually call cvFindContours() and then cvApproxPoly().
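
For readers without OpenCV at hand: cvApproxPoly() with CV_POLY_APPROX_DP applies the Douglas-Peucker algorithm. The sketch below is a plain-Python version of that algorithm for an open polyline, written for illustration only; the function names are hypothetical and it is not OpenCV's implementation.

```python
import math

def _distance_to_line(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    length = math.hypot(bx - ax, by - ay)
    if length == 0:
        return math.hypot(px - ax, py - ay)
    return abs((bx - ax) * (ay - py) - (ax - px) * (by - ay)) / length

def approx_poly(points, epsilon):
    """Douglas-Peucker: drop points that lie within epsilon of the simplified outline."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, worst = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _distance_to_line(points[i], first, last)
        if d > worst:
            index, worst = i, d
    if worst <= epsilon:
        return [first, last]
    # Keep the farthest point and simplify both halves recursively.
    left = approx_poly(points[:index + 1], epsilon)
    right = approx_poly(points[index:], epsilon)
    return left[:-1] + right
```

A larger epsilon gives a coarser polygon; for characters only about 13 pixels high, epsilon probably has to stay at a pixel or less to keep the shapes recognizable.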
 
> - make an image with my font (one of each from the alphabet), and repeating
> the alphabet with different levels of threshold. I think that internally
> Tesseract thresholds the images. Hard to explain this, but I think that it
> may improve the quality.

Yes, Tesseract internally thresholds the image. I think Google did
something like this in the Tesseract 3 language packs, so it might be
worth doing.

Do you know whether it picks the threshold level automatically, or is there somewhere to configure it?
 
>
> If you want to continue speaking about specifics of licence plate
> recognition, we can continue privately because it's off topic. I'm

Well, you've earned my applause for recognising that, but if your
conversation turns up information that will save someone some time
later on, I'm all for it.

Great, I will be glad to share if something good turns up.
 

Andres

Jul 30, 2010, 3:17:28 PM
to tesser...@googlegroups.com
Hello,

What is the height of the characters that you are having problems with?
But if you have not identified the font, I assume that you never trained Tesseract for it, so that is where your problem lies. I don't think you will get good results without training.
As Giuseppe suggested, WhatTheFont is the right place to go, and almost the only one. There is another site, but it works like a guided tree: you answer questions about the shape of your font and never upload an image, similar to the keys botanists use to identify plants from their leaves. I don't remember the name of that site.
This site: http://www.fontyukle.com/en/index.php doesn't charge for fonts. I've found fonts there that other sites charge for.

Regarding LP, out of curiosity: have you measured how long your plate detection takes, and at what image resolution?

Regards,

Andres



2010/7/29 ZIA <zrah...@gmail.com>

Jimmy O'Regan

Jul 30, 2010, 3:34:13 PM
to Andres, tesser...@googlegroups.com

Yeah, there's that too.

>>
>> Most of the code in the dict/ directory does some variation on this,
>> by 'permuting' the character possibilities.
>>
>> > - make your own conversion, e.g., if you are expecting a number and you
>> > get
>> > a G, map it to a 6, if you expect a 2 map it to a Z.
>> >
>>
>> Patrick may have more details on this approach.
>>
>> According to Wikipedia
>> (http://en.wikipedia.org/wiki/Vehicle_registration_plates_of_Argentina),
>> the normal Argentinian license plates follow the template AAA 000, so
>> you could just generate the possible combinations, and use them in a
>> dawg.
>>
>>  perl -e 'for $a (65..90){for $b (65..90) {for $c (65..90) {printf
>> "%c%c%c\n", $a, $b, $c;}}}'
>>  perl -e 'for $a (0..9){for $b (0..9) {for $c (0..9) {printf
>> "%d%d%d\n", $a, $b, $c;}}}'
>>
>> Will get you the two lists you want.
>>
> Thank you very much for this idea.
> The resulting set of words (in the case of the six characters) would have a
> size of 17,576,000 lines.
> How is the access that makes tesseract to this ? Isn't it too big for that ?
>

It'll probably hit the dawg size limit, but you can change it.

The preset is in a variable. I'll dig around for it when I get a chance.

>>
>> >
>> > If you want to continue speaking about specifics of licence plate
>> > recognition, we can continue privately because it's off topic. I'm
>>
>> Well, you've earned my applause for recognising that, but if your
>> conversation turns up information that will save someone some time
>> later on, I'm all for it.
>>
> great, I will be glad to share if something good appears.
>

--

Andres

Jul 30, 2010, 3:45:16 PM
to tesser...@googlegroups.com
By the way, the fonts used on licence plates in Argentina are not commercially available, so I had to build my training image from pictures that I took on the street with my own camera. If that's your case too, prepare yourself for a lot of Photoshop work to make the character sizes uniform. Tips:
- paste each character, press Ctrl+T (transform), and drag the edges while holding Shift to keep the proportions
- use the rulers to guide you vertically
- when you finish with all the characters, merge the visible layers (Shift+Ctrl+E) to avoid ending up with a multilayer TIFF file
- finally, decide whether you want to threshold the image

Question to the list:
The images that I use have a black background with white letters, and I trained Tesseract on that. Does it make any difference? Would I get better results by inverting the images (both the training image and the captured images)?

Regards,

Andres


2010/7/30 Andres <andr...@gmail.com>

Andres

Jul 30, 2010, 4:10:19 PM
to tesser...@googlegroups.com

2010/7/30 Jimmy O'Regan <jor...@gmail.com>
 
Do you know anything about the access time? I can't figure out whether Tesseract looks this up with a constant-time algorithm or not.
 
That's great. Thank you.

Jimmy O'Regan

Jul 30, 2010, 4:18:08 PM
to tesser...@googlegroups.com
On 30 July 2010 20:45, Andres <andr...@gmail.com> wrote:
> By the way, the fonts used in the licence plates in Argentina are not
> commercial. So I had to build my training image with pictures that I took
> with my own camera on the street. If that's your case, prepare yourself for
> a lot of photoshop work, to make the size of the characters uniform (tips:
> (paste) -> Ctrl+T (transform) -> drag the edges holding shift to keep
> proportions ---->when you finish with all fonts, merge visible layers
> (Shift+Ctrl+E) to avoid having a multilayer TIFF file------use the rulers to
> guide you vertically-----finally you might dicide if you want to threshold)
>
> Question to the list:
> The images that I use have black background and the letters are white. I
> trained Tesseract for that. Does that make any difference, should I get
> better results by inverting the image (in the training image and captured
> image) ?

Tesseract is supposed to handle that gracefully, though for training
it would be better to use black on white.

Andres

Jul 30, 2010, 4:36:20 PM
to tesser...@googlegroups.com
2010/7/30 Jimmy O'Regan <jor...@gmail.com>
On 30 July 2010 20:45, Andres <andr...@gmail.com> wrote:
Do you mean that I can train on black-on-white and then read white-on-black with no difference?
 

ZIA

Jul 31, 2010, 8:12:32 PM
to tesseract-ocr
Thanks, Andres and Jimmy.

The CA license plate font is available. I tried to find a sample file to
train my OCR, but haven't found anything yet. You are right, I may need
to use a lot of Photoshop, but I am not sure how many plates it will
take to cover the whole set of digits and letters. I didn't train
Tesseract because I thought the OCR would be able to figure it out,
since the images I provide have no noise. I will email you the final
images that I am feeding to the OCR. Most CA license plates are black
on white. There are colored and other types of plates too, but I am
ignoring those and assuming that most plate characters are black on a
light background.

Just out of curiosity, when you take the image, do you focus only on
the plate area or on the whole car? Some of my images have a
reflection that I need to get rid of somehow, but I haven't figured
out how.

I used the suggested site that is supposed to give the name of the
font, but when I provided the image it was not able to identify the
characters correctly, so it didn't work. I think Jimmy had the link.
I think I need to capture enough images, use Photoshop, and then read
up on how to train on my data. Quite a lot of work ahead.
Anyway, does either of you have any idea about scanning the image and
getting the plate? The image was filtered with an edge filter and I
can see the rectangular box of the plate; I just need to figure out
how to scan for it and extract it. The height-to-width ratio of a CA
plate is 1 to 2, i.e. 6 by 12 inches (height=6, width=12).

thanks


Austin Henderson

Jul 31, 2010, 9:20:02 PM
to tesser...@googlegroups.com

Just out of curiosity, do you have any sample images? I am curious what your source images look like.

Thanks

ZIA

Jul 31, 2010, 8:25:50 PM
to tesseract-ocr
Sorry, there is a typo in the first sentence: I meant to say that the
CA license plate font is not available, as far as I know.

Andres

Jul 31, 2010, 10:11:50 PM
to tesser...@googlegroups.com
Good news regarding your font; see my inline replies.

2010/7/31 ZIA <zrah...@gmail.com>

Thanks for Andre, Jimmy

CA license plate Font is available. I tired to find the sample file to
train my ocr, but haven't find anything yet. You are right, I may need
to use alot of photoshop, but again, not sure how many LP will give me
the whole set of numbers and characters. I didn't train the tesseract,
becausei thought OCR will be able to figure out, since the provided
images have no noise. I will email you the final images that I am
providing to OCR. Most of the CA license plate are black on white, but
there are color and other different type of LP there, but I am
ignoring those and assuming that most of the LP characters are black
on light background.

There is something wrong with your JPG images: Photoshop can't open them.
I converted them to BMP using MS Paint and then uploaded the file to www.whatthefont.com

The font is (or is very close to):
Penitentiary Gothic Fill

It costs $21.

The font:
http://new.myfonts.com/fonts/ephemera/penitentiary-gothic/fill/

Hint: write your text in CorelDraw so that you can adjust sizes and pitches. Use layers: put your real plate image in one layer, write over it in a second layer, and adjust everything until the two match. Then you can copy and paste to keep the parameters.


Just for curiosity, when you take the image, do you only focus on LP
area or the whole car? In some of my images, there was a reflection in
the image and I need to get rid of reflection some how, but haven't
figured out.
 

The whole car (in fact, the street: the camera is in free-run mode, with or without a car in view, without triggering).
To get rid of reflections, as an initial approach I recommend using polarizing filters on the cameras.
Optical image filtering is a huge topic; we can continue privately.

 

Jimmy O'Regan

Jul 31, 2010, 10:39:45 PM
to tesser...@googlegroups.com
On 30 July 2010 21:36, Andres <andr...@gmail.com> wrote:
>
>> Tesseract is supposed to handle that gracefully, though for training
>> it would be better to use black on white.
>>
> You mean that I can train on black on white and then read white on black
> with no difference ?

There *should* be almost no difference, except that the text will be
marked inverted.

Andres

Aug 1, 2010, 12:10:40 AM
to tesser...@googlegroups.com
Anyway, any of you have any idea, about scanning image and getting the
LP (image was filtered using edge filter, i can see the rectangle box
of LP, just need to figure out, how to scan and how to extract. The
ratio of CA LP is 1 to 2, or 6 to 12 inches (height=6, width=12)

I'm not sure I understand completely. Could you elaborate a little?


ZIA

Aug 1, 2010, 8:13:47 PM
to tesseract-ocr
Thanks, Andres, for finding the font. I will see how I can use it.
You suggested using CorelDraw, but I don't have that software, so I
will see if I can use some other software such as MS Word.

I was asking how to extract the license plate from the image. What I
do is take the image, resize it, convert it to a binary image, and
then run the Sobel edge filter. Now I have an image that clearly shows
the rectangular part of the plate, and I know the ratio of height to
width is 1 to 2. I just need to scan the image, look for rectangles,
and calculate their ratio to get the correct plate from the image. Or
is there a more efficient way? I hope this clears things up; if not,
I will send you the input image and the image after the edge filter.
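
The ratio check described above could be sketched as follows. It assumes the candidate rectangles have already been found (for example as bounding boxes of contours or connected components) and are given as (x, y, width, height) tuples; the function name, tolerance and minimum width are hypothetical values that would need tuning.

```python
def plate_candidates(boxes, target_ratio=2.0, tolerance=0.25, min_width=40):
    """Keep (x, y, w, h) boxes whose width/height is close to the plate's 2:1 ratio."""
    kept = []
    for x, y, w, h in boxes:
        if h <= 0 or w < min_width:
            continue  # degenerate or too small to hold readable characters
        ratio = w / h
        if abs(ratio - target_ratio) <= tolerance * target_ratio:
            kept.append((x, y, w, h))
    return kept
```

The tolerance is there because perspective and skew change the apparent ratio. Several boxes may pass, so a second check (for instance, whether the box contains several character-sized blobs) is probably still needed.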

Zia

Andres

Aug 1, 2010, 9:28:48 PM
to tesser...@googlegroups.com

2010/8/1 ZIA <zrah...@gmail.com>

Thanks Andre for finding the font. I will see how can i use that. As
you suggested using coreldraw, i don't have this software, i will try
to see if i can use some other software like MS word.
 
You are welcome.
Things will be very hard with MS Word. It would be better to get Corel or something like it.
 
I was asking how to extract license plate from image. What I am doing,
i get the image, re-sized, convert to binary image and then run the
sobel edge filter. So now i have an image that shows me the rectangle
part of LP clearly, and I know the ration of height to width is 1 to
2. I just need to scan the image and look for rectangle and calculate
their ration, to get the correct LP from image, or is there any other
more efficient way. I hope this clear things, if not, I will send you
the image as input image and after edge filter image.

A Sobel filter can only be a preprocessing step. It doesn't identify any graphic objects for you. I thought that you had already solved that part.

A good algorithm for finding the plate is very hard to write; I have been working on mine for several years.

If you don't need it for commercial purposes (in terms of speed, accuracy, light conditions, etc.), here is the source code for what you need. It's in C# (if I remember correctly, you work in C#), and it uses Tesseract. The whole project is a single file of 173 lines, and it works.

http://www.emgu.com/wiki/index.php/License_Plate_Recognition_in_CSharp

What's your situation?



Jimmy O'Regan

Aug 2, 2010, 8:30:24 AM
to tesser...@googlegroups.com
On 2 August 2010 02:28, Andres <andr...@gmail.com> wrote:
>
> 2010/8/1 ZIA <zrah...@gmail.com>
>>
>> Thanks Andre for finding the font. I will see how can i use that. As
>> you suggested using coreldraw, i don't have this software, i will try
>> to see if i can use some other software like MS word.
>
>
> You are welcome.
> Things will be very hard with MS Word. Try better to get Corel or something
> like it.
>

Well, Tesseract is Open Source, so let's promote other Open Source:
http://gimp-win.sourceforge.net/

Giuseppe Menga

Aug 2, 2010, 6:47:43 AM
to tesser...@googlegroups.com
Dear Zia,
I can offer a different idea for recovering the plate rectangle, one
that worked for me.
In my case I had to recover the outline of a medicine box in order to
find the expiration date on it.
I'm using Leptonica for this:
static const char *seed_sequence = "o3.3 + r11 + o10.1 + c15.10 + x4";
Pix *pix;          // the input gray picture
Pix *pixm, *pixM;  // the loci of the minima and maxima
//int w,h;
//pixGetDimensions(pix,&w,&h,NULL);
pixLocalExtrema(pix, 0, 0, &pixm, &pixM);  // get the minima and maxima
// some cleaning with opening and closing
Pix *pixMd2 = pixMorphSequence(pixM, seed_sequence, 0);
float skewAngle, conf;
Pix *pixMdd = pixFindSkewAndDeskew(pixMd2, 1, &skewAngle, &conf);  // deskew
// find the baselines (the horizontal lines)
PTA *pta;
NUMA *numa = pixFindBaselines(pixMdd, &pta, 0);
/* numa is an array of ordinates; pta is an array of points (x,y), the
   extreme points of the horizontal line found for each ordinate. */
Extracting the four key points (or just three) of the main rectangle with
some simple logic, I build the clipping box.
Among the different techniques I tried, this was the most reliable, as the
search for maxima is fairly insensitive to flares and reflections.
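
The seed_sequence string appears to follow Leptonica's morph-sequence notation, where operations are joined with "+": d, e, o and c stand for dilation, erosion, opening and closing with a width.height brick, r is a rank reduction by 2 for each digit given, and x is an expansion by the given factor. The sketch below is a hypothetical Python helper, not part of Leptonica, that spells a sequence out in words under that reading; the Leptonica documentation is the authority on the notation.

```python
OPS = {"d": "dilate", "e": "erode", "o": "open", "c": "close"}

def describe_sequence(sequence):
    """Turn a Leptonica-style morph sequence string into readable steps."""
    steps = []
    for token in sequence.replace(" ", "").lower().split("+"):
        op, arg = token[0], token[1:]
        if op in OPS:
            width, height = arg.split(".")
            steps.append(f"{OPS[op]} with a {width}x{height} brick")
        elif op == "r":
            steps.append(f"reduce 2x {len(arg)} time(s), rank levels {', '.join(arg)}")
        elif op == "x":
            steps.append(f"expand {arg}x by replication")
        elif op == "b":
            steps.append(f"add a {arg}-pixel border")
        else:
            raise ValueError(f"unknown operation: {token}")
    return steps
```

Read this way, the sequence removes specks with a 3x3 opening, shrinks the image 4x, keeps only horizontal runs (a 10x1 opening), merges nearby pieces (a 15x10 closing), and expands 4x back to the original size. The numbers are structuring-element sizes in pixels, not the dimensions of the box being searched for.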
Let me know.
Giuseppe


ZIA

Aug 5, 2010, 2:16:48 PM
to tesseract-ocr
Hi Andres,
Sorry for the late reply to the forum. I tried the link that you
posted, but somehow it didn't recognize the CA license plate in the
input image, so I gave up on that one. It says that the application
is for European plates. I changed the ratio to match the CA plate
ratio, but it still found nothing. At first I was very excited to
find this open source project, but I soon realized it needs a lot of
work to handle CA plates. My application is not commercial; it is for
my university project, so my first goal is to get an application that
works on CA plates. Once it works, I will definitely spend more time
improving the performance.

Zia


ZIA

Aug 5, 2010, 2:26:49 PM
to tesseract-ocr
Hello Giuseppe,

I had never heard of Leptonica; I am going to read about it and see how
it can help in my case. I went through your sample code and, to be
honest, it didn't make a lot of sense to me. I think I should read
about Leptonica first, and then it will make sense. Just a few
questions: you said that you use this to locate the outline of the
box. Can you explain what the *seed_sequence values are and what they
represent (the dimensions of the medicine box, or something else)? I
will read more about it and then ask you more, but once again, thanks
for giving me something that may work.

regards,
Zia

Giuseppe Menga

Aug 9, 2010, 2:52:38 PM
to tesser...@googlegroups.com
Dear Zia:
Leptonica can be easily integrated with Tesseract. As a matter of fact, the
new version of Tesseract will use Leptonica for layout analysis.
I'm currently on holiday (on a sailing boat around the island of Sardinia),
so I'm unable to answer you now, but I will give you all the technical
details when I return to my office on September 2nd.
As a starting point, the prog subdirectory of the Leptonica download
contains a lot of examples; that is where I learned.
Giuseppe
