On 29 July 2010 03:23, Andres <andr...@gmail.com> wrote:
> Hello,
>
> I'm working on the same thing as you, for licence plates from Argentina,
> as I live in Argentina.
>
> As you described, the problem was locating the licence plate.
>
> Now I'm working on the OCR, and then I will work on making the images
> horizontal, because if they are not completely horizontal the OCR fails.
> For example, today I was getting a 5 instead of a 6. When I straightened
> the image in Photoshop, everything turned out OK.
>
> I don't know the layout of the letter and number positions on California
> plates. Are they mixed? If you know whether a character should be a
> number or a letter according to its position, you have two options (as
> far as I know):
>
> - when recognizing char by char, tell Tesseract that you expect a number or
> a letter. I saw that somewhere inside the source code, but I don't
> remember where.
You were probably looking at the code that guesses among 1, l and i.
Most of the code in the dict/ directory does some variation on this,
by 'permuting' the character possibilities.
> - make your own conversion, e.g., if you are expecting a number and you get
> a G, map it to a 6; if you expect a 2 and get a Z, map it to a 2.
>
Patrick may have more details on this approach.
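The position-based conversion quoted above could be sketched roughly as follows. This is only an illustration, not anything from Tesseract itself: the confusion tables, the function name correct_plate and the template notation ('A' for a letter, '0' for a digit) are all assumptions made for the example.

```python
# Hypothetical confusion tables; extend them for your own font and camera.
LETTER_TO_DIGIT = {"O": "0", "I": "1", "L": "1", "Z": "2", "S": "5", "G": "6", "B": "8"}
DIGIT_TO_LETTER = {"0": "O", "1": "I", "2": "Z", "5": "S", "6": "G", "8": "B"}


def correct_plate(raw: str, template: str) -> str:
    """Coerce each character of `raw` to the class named in `template`.

    `template` uses 'A' for a letter position and '0' for a digit position,
    e.g. "AAA000" (Argentina) or "0AAA000" (California).
    """
    if len(raw) != len(template):
        raise ValueError("OCR result and template differ in length")
    fixed = []
    for ch, kind in zip(raw.upper(), template):
        if kind == "0" and not ch.isdigit():
            # A digit is expected here: swap a look-alike letter if we know one.
            ch = LETTER_TO_DIGIT.get(ch, ch)
        elif kind == "A" and not ch.isalpha():
            # A letter is expected here: swap a look-alike digit if we know one.
            ch = DIGIT_TO_LETTER.get(ch, ch)
        fixed.append(ch)
    return "".join(fixed)
```

For example, correct_plate("ABC1G3", "AAA000") turns the G in a digit position into a 6. Characters with no entry in the tables are left alone, so a caller would still need to reject results that do not fit the template.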
According to Wikipedia
(http://en.wikipedia.org/wiki/Vehicle_registration_plates_of_Argentina),
the normal Argentinian license plates follow the template AAA 000, so
you could just generate the possible combinations, and use them in a
dawg.
perl -e 'for $a (65..90){for $b (65..90) {for $c (65..90) {printf
"%c%c%c\n", $a, $b, $c;}}}'
perl -e 'for $a (0..9){for $b (0..9) {for $c (0..9) {printf
"%d%d%d\n", $a, $b, $c;}}}'
Will get you the two lists you want.
For the original question, according to
http://en.wikipedia.org/wiki/Vehicle_registration_plates_of_California,
this is the California scheme:
perl -e 'for $a (0..9){for $b (65..90){for $c (65..90) {for $d
(65..90) {for $e (0..9){for $f (0..9) {for $g (0..9) {printf
"%d%c%c%c%d%d%d\n", $a, $b, $c, $d, $e, $f, $g;}}}}}}}'
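As a lighter-weight complement to enumerating every plate, the same two templates can be written as regular expressions and used to check an OCR result after the fact. This is a different technique from the dawg approach described above, offered only as a sketch; the names PLATE_PATTERNS and matches_template are made up for the example, and the optional space in the Argentinian pattern is an assumption.

```python
import re

# Templates taken from the thread: AAA 000 (Argentina) and 0AAA000 (California).
PLATE_PATTERNS = {
    "argentina": re.compile(r"[A-Z]{3} ?[0-9]{3}"),
    "california": re.compile(r"[0-9][A-Z]{3}[0-9]{3}"),
}


def matches_template(text: str, region: str) -> bool:
    """Return True if the whole OCR string fits the plate layout for `region`."""
    candidate = text.strip().upper()
    return PLATE_PATTERNS[region].fullmatch(candidate) is not None
```

A result that fails the check is a candidate for re-running the OCR or for a character-by-character correction step.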
> I think that I'll use the last one, I'm not on that part yet. I'm getting
> good results on images where the characters are big because of the distance
> of the camera, but in small letters (13 pixels height) things are not good.
>
> So I have a pair of ideas to test, perhaps somebody from the group could
> give me opinions regarding them:
> - following the contour, with polygon approximation of the chars, making an
> image with that contours and running Tesseract on that image (trained for
> that)
Seems reasonable. Something like autotrace or potrace might be useful.
> - make an image with my font (one of each from the alphabet), and repeating
> the alphabet with different levels of threshold. I think that internally
> Tesseract thresholds the images. Hard to explain this, but I think that it
> may improve the quality.
Yes, Tesseract internally thresholds the image. I think Google did
something like this in the Tesseract 3 language packs, so it might be
worth doing.
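One way to picture the "different levels of threshold" idea is to binarize the same grayscale training image at several cut-offs and keep every variant. The sketch below works on a plain list of pixel rows so that it needs no imaging library. It is not how Tesseract thresholds internally, and the levels 96, 128 and 160 are arbitrary values picked for illustration.

```python
def threshold(gray, level):
    """Binarize a grayscale image (rows of 0-255 ints): 0 = ink, 255 = paper."""
    return [[0 if px < level else 255 for px in row] for row in gray]


def threshold_variants(gray, levels=(96, 128, 160)):
    """Return one binarized copy of `gray` per threshold level.

    The default levels are hypothetical; pick values that bracket the
    contrast you actually see in your plate images.
    """
    return {level: threshold(gray, level) for level in levels}
```

Each variant could then be saved as its own training page, so the trained font has seen thin, normal and heavy versions of every glyph.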
>
> If you want to continue speaking about specifics of licence plate
> recognition, we can continue privately because it's off topic. I'm
Well, you've earned my applause for recognising that, but if your
conversation turns up information that will save someone some time
later on, I'm all for it.
Yeah, there's that too.
>>
>> Most of the code in the dict/ directory does some variation on this,
>> by 'permuting' the character possibilities.
>>
>> > - make your own conversion, e.g., if you are expecting a number and you
>> > get a G, map it to a 6; if you expect a 2 and get a Z, map it to a 2.
>> >
>>
>> Patrick may have more details on this approach.
>>
>> According to Wikipedia
>> (http://en.wikipedia.org/wiki/Vehicle_registration_plates_of_Argentina),
>> the normal Argentinian license plates follow the template AAA 000, so
>> you could just generate the possible combinations, and use them in a
>> dawg.
>>
>> perl -e 'for $a (65..90){for $b (65..90) {for $c (65..90) {printf
>> "%c%c%c\n", $a, $b, $c;}}}'
>> perl -e 'for $a (0..9){for $b (0..9) {for $c (0..9) {printf
>> "%d%d%d\n", $a, $b, $c;}}}'
>>
>> Will get you the two lists you want.
>>
> Thank you very much for this idea.
> The resulting set of words (in the case of six characters) would have
> 17,576,000 lines.
> How does Tesseract access a list like this? Isn't it too big for that?
>
It'll probably hit the dawg size limit, but you can change it.
The preset is in a variable. I'll dig around for it when I get a chance.
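For anyone wanting to check the 17,576,000 figure or to experiment with a smaller list first, here is a rough Python sketch that counts the AAA000 combinations and writes them out lazily instead of holding them in memory. The function names and the limit parameter are invented for this example; nothing here touches Tesseract or its dawg tools.

```python
from itertools import islice, product
from string import ascii_uppercase, digits


def plate_count(n_letters=3, n_digits=3):
    """Number of plates with n_letters letters followed by n_digits digits."""
    return len(ascii_uppercase) ** n_letters * len(digits) ** n_digits


def iter_plates(n_letters=3, n_digits=3):
    """Yield plates lazily in the AAA000 layout, in alphabetical order."""
    for letters in product(ascii_uppercase, repeat=n_letters):
        prefix = "".join(letters)
        for nums in product(digits, repeat=n_digits):
            yield prefix + "".join(nums)


def write_word_list(path, limit=None):
    """Write one plate per line; `limit` truncates the list for quick tests."""
    with open(path, "w", encoding="ascii") as out:
        for plate in islice(iter_plates(), limit):
            out.write(plate + "\n")
```

plate_count() gives 26**3 * 10**3, which matches the number quoted above. Writing only the first few thousand entries with limit is a cheap way to see how the word list tooling behaves before committing to the full set.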
>>
>> >
>> > If you want to continue speaking about specifics of licence plate
>> > recognition, we can continue privately because it's off topic. I'm
>>
>> Well, you've earned my applause for recognising that, but if your
>> conversation turns up information that will save someone some time
>> later on, I'm all for it.
>>
> great, I will be glad to share if something good appears.
>
--
2010/7/30 Jimmy O'Regan <jor...@gmail.com>
Tesseract is supposed to handle that gracefully, though for training
it would be better to use black on white.
On 30 July 2010 20:45, Andres <andr...@gmail.com> wrote:
--
<Leftmost> jimregan, that's because deep inside you, you are evil.
<Leftmost> Also not-so-deep inside you.
Just out of curiosity, do you have any sample images? I am curious what your source looks like.
Thanks
> > > By the way, the fonts use...
Thanks, Andres and Jimmy.
The CA license plate font is available. I tried to find a sample file to
train my OCR, but haven't found anything yet. You are right, I may need
to use a lot of Photoshop, but again, I'm not sure how many LPs it will
take to give me the whole set of numbers and characters. I didn't train
Tesseract, because I thought the OCR would be able to figure it out,
since the provided images have no noise. I will email you the final
images that I am providing to the OCR. Most CA license plates are black
on white. There are colored and other types of LP out there, but I am
ignoring those and assuming that most LP characters are black on a
light background.
Just out of curiosity, when you take the image, do you focus only on the
LP area or on the whole car? In some of my images there was a reflection,
and I need to get rid of it somehow, but I haven't figured out how.
There *should* be almost no difference, except that the text will be
marked inverted.
Anyway, does any of you have an idea about scanning the image and getting
the LP? (The image was filtered using an edge filter. I can see the
rectangular box of the LP, and just need to figure out how to scan for it
and how to extract it.) The ratio of a CA LP is 1 to 2, or 6 by 12 inches
(height=6, width=12).
Thanks, Andres, for finding the font. I will see how I can use it. You
suggested using CorelDRAW, but I don't have that software, so I will
see if I can use some other software, like MS Word.
I was asking how to extract the license plate from the image. What I am
doing is this: I get the image, resize it, convert it to a binary image
and then run the Sobel edge filter. So now I have an image that shows the
rectangular part of the LP clearly, and I know the ratio of height to
width is 1 to 2. I just need to scan the image, look for rectangles and
calculate their ratios to get the correct LP from the image, or is there
a more efficient way? I hope this clears things up. If not, I will send
you the input image and the image after the edge filter.
Zia
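The "look for rectangles and calculate their ratio" step described above might look something like this once some earlier stage (not shown, and assumed here) has turned the edge image into a list of bounding boxes. The function name, the 25% tolerance and the minimum height of 20 pixels are all placeholder choices for illustration, and a plate photographed at an angle will not show an exact 1 to 2 ratio.

```python
def plate_candidates(boxes, target_ratio=2.0, tolerance=0.25, min_height=20):
    """Keep (x, y, w, h) boxes whose width/height is close to `target_ratio`.

    `tolerance` is relative (0.25 means within 25% of the target), and
    `min_height` drops boxes too small to hold readable characters.
    Both defaults are hypothetical.
    """
    kept = []
    for x, y, w, h in boxes:
        if h < min_height:
            continue
        ratio = w / h
        if abs(ratio - target_ratio) <= tolerance * target_ratio:
            kept.append((x, y, w, h))
    # Sort by area, largest first, so the caller can try big candidates first.
    return sorted(kept, key=lambda box: box[2] * box[3], reverse=True)
```

Each surviving box can then be cropped from the original image and handed to the OCR, with the template check deciding which crop really is the plate.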
On Jul 31, 9:10 pm, Andres <andrej...@gmail.com> wrote:
> > Anyway, any of you have any idea, about scanning image and getting the
> > LP (image was filtered using edge filter, i can see the rectangle box
> > of LP, just need to figure out, how to scan and how to extract. The
> > ratio of CA LP is 1 to 2, or 6 to 12 inches (height=6, width=12)
>
> I'm not sure I understand completely. Could you elaborate a little?
Well, Tesseract is Open Source, so let's promote other Open Source:
http://gimp-win.sourceforge.net/
----- Original Message -----
From: "ZIA" <zrah...@gmail.com>
To: "tesseract-ocr" <tesser...@googlegroups.com>
Sent: Monday, August 02, 2010 2:13 AM
Subject: Re: California License Plate font issues with OCR
Zia