Training for a new country

Antonio Carlos Censi

Jan 16, 2014, 12:20:03 PM

to open...@googlegroups.com

How do I use the miscellaneous utilities to train for a new country?

Thanks

ACC

Matt Hill

Jan 16, 2014, 12:31:39 PM

to open...@googlegroups.com

Which country are you interested in training?

Are the plate dimensions different than North American or European plates?

Is the plate region not reliably detected? Are the OCR characters accurate?

There are a few places you can tweak for better accuracy. Fully training
a country can be an involved process. I'm just trying to figure out what
you need so I can give you the best advice.

-Matt

Antonio Carlos Censi

Jan 16, 2014, 12:52:20 PM

to open...@googlegroups.com

Brazil

There are 2 types: one for cars and trucks and another for motorbikes. Images of both types are attached.

Plates have 3 letters and 4 numbers. The font is Mandatory or UKLicensePlate (a subset of Mandatory). You can download a copy for free from http://www.dafont.com/uk-number-plate.font

For cars the format is AAA-NNNN in one row; for bikes it is AAA in one row and NNNN in another.

They are standard across the whole country and vary in color depending on the license type: normally gray with black letters, and white letters on red, blue, or other dark backgrounds.

You can have a look by searching for "placas de carros Brasil" in Google Images.

Thanks for your attention.

ACC

placa1.jpg

placa16.jpg

m-ano-2012.jpg.JPG

Antonio Carlos Censi

Jan 16, 2014, 1:03:28 PM

to open...@googlegroups.com

Just to answer your other questions:

- Plates are being detected with a success rate of around 60%.

- Characters are recognized well in the easy cases.

- The number 1 is normally not detected.

- The last digit is sometimes missing, even after tweaking postprocessing to the base format @@@#### or @@@?####, for both the us and eu regions.
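
To make the pattern notation above concrete, here is a minimal sketch of how such a template could be checked against a candidate string. It assumes `@` stands for a letter, `#` for a digit, and `?` for any single character; that reading of the symbols is an assumption, not something confirmed in this thread. `pattern_to_regex` and `matches` are hypothetical helpers, not part of OpenALPR.

```python
import re

# Assumed meaning of the pattern symbols (not confirmed in this thread).
SYMBOLS = {"@": "[A-Z]", "#": "[0-9]", "?": "."}


def pattern_to_regex(pattern):
    """Translate a plate pattern such as '@@@####' into a compiled regex."""
    parts = [SYMBOLS.get(ch, re.escape(ch)) for ch in pattern]
    return re.compile("".join(parts))


def matches(pattern, plate):
    """Return True when the whole plate string fits the pattern."""
    return pattern_to_regex(pattern).fullmatch(plate) is not None


print(matches("@@@####", "BUS9021"))    # seven characters, 3 letters + 4 digits
print(matches("@@@####", "BUS902"))     # one digit short
print(matches("@@@?####", "BUS-9021"))  # '?' absorbs the dash
```

A template like this can only accept or reject what the OCR produced. It cannot restore a character that was dropped earlier in the pipeline.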

Plate dimensions (cars) are 400 mm x 130 mm. Character height is around 65 mm. Width varies from character to character.

Bike plates are 200 mm x 170 mm and the character height is 53 mm. I have not tried these yet.
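
Since a camera image has no millimetre scale, these measurements are mostly useful as proportions. The sketch below derives the aspect ratio and the share of the plate height taken up by the characters from the figures quoted above. The function and dictionary names are illustrative only and do not correspond to any OpenALPR setting.

```python
# Dimensions quoted in this thread, in millimetres.
PLATES = {
    "car": {"width": 400.0, "height": 130.0, "char_height": 65.0},
    "motorbike": {"width": 200.0, "height": 170.0, "char_height": 53.0},
}


def plate_ratios(width, height, char_height):
    """Return scale-free proportions derived from physical dimensions."""
    return {
        "aspect_ratio": width / height,
        "char_height_fraction": char_height / height,
    }


for name, dims in PLATES.items():
    ratios = plate_ratios(**dims)
    print(
        f"{name}: aspect ratio {ratios['aspect_ratio']:.2f}, "
        f"characters fill {ratios['char_height_fraction']:.0%} of the plate height"
    )
```

The car plate comes out at roughly 3:1, while the motorbike plate is close to square, which is one reason the two types probably need to be handled separately.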

Thanks

ACC

Matt Hill

Jan 17, 2014, 1:13:14 AM

to open...@googlegroups.com

Antonio,

Brazil license plates are different enough from EU and US plates (sizes and fonts) that you may see a big benefit by training OpenALPR for your country.

I have just added two new repositories (with instructions) that should help you train the OCR and the plate detection algorithm. The EU plates are included as an example. These repositories are located here:
https://github.com/openalpr/train-ocr
https://github.com/openalpr/train-detector

Training the detector should increase your plate detection rate above 60%. The character recognition (especially for the number 1) should improve as well when you train the OCR. Please reply if you have any issues with the instructions -- if something is unclear, I will need to update it.

Missing the last digit in the plate is more of an issue in the code itself (charactersegmenter.cpp). I imagine it's probably detecting the character as the edge of the plate and disregarding it. Turning on some of the debug options in openalpr.conf should help troubleshoot those cases.

Currently OpenALPR supports single-line license plates -- it would take code enhancements to support two-line license plates.

-Matt

Antonio Carlos Censi

Jan 17, 2014, 8:16:16 AM

to open...@googlegroups.com

Thanks, Matt

I will be looking at the new material.

I have already activated debug to process a short sample of images gathered from the internet. I will report the results later.

Something that seems to happen is that, although the detected plate area is OK and contains all the characters, later processing cuts it down to just 6 of the 7 characters, missing the first or the last. I have adjusted the dimensions, but that did not seem to affect the cropped area.

Regarding the Tesseract training, in the Google Code repository there is a program (training/text2image) that takes a TTF file and generates the TIFF image with the letter boxes. I will try it.

In 2012 there was a change to apply a reflective material over the background. I think that will need some adjustments in the preprocessing of the images too.

Cordially

ACC

Matt

Jan 17, 2014, 11:33:43 AM

to open...@googlegroups.com

OpenALPR processes the plate image first just to verify that it is, in fact, a license plate and to figure out the edges/rotation. It also figures out whether the characters are inverted (i.e., dark-on-light or light-on-dark characters). After that, it straightens and deskews the image so that it looks like a head-on capture.
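
As a rough illustration of the inversion check, one simple heuristic is to assume that the characters cover less than half of the plate, so the majority brightness class is the background. This is only a sketch of that idea on a grayscale image stored as nested lists. It is not OpenALPR's actual method, and `is_dark_on_light` and `invert` are hypothetical names.

```python
def is_dark_on_light(gray, threshold=128):
    """Guess polarity of a grayscale plate (rows of 0-255 integers).

    Assumes the background covers more area than the characters, so a
    mostly bright image is taken to mean dark characters on a light plate.
    """
    pixels = [p for row in gray for p in row]
    bright = sum(1 for p in pixels if p >= threshold)
    return bright * 2 > len(pixels)


def invert(gray):
    """Flip brightness so later steps can always expect one polarity."""
    return [[255 - p for p in row] for row in gray]


# Tiny example: a bright plate with a few dark character pixels.
plate = [
    [200, 200, 200, 200],
    [200, 30, 30, 200],
    [200, 200, 200, 200],
]
print(is_dark_on_light(plate))
print(is_dark_on_light(invert(plate)))
```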

Later, in charactersegmenter.cpp, it creates a histogram from this straightened image. This is what determines the likely character positions. Anywhere there is appreciable character content followed by whitespace is treated as a character. There are filters here for character height (e.g., ignore dashes) and character width (e.g., ignore long, skinny lines at the left and right edges, since these could be plate boundaries).
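
The histogram idea can be sketched in a few lines. The example below is a simplified stand-in, not the code in charactersegmenter.cpp: it counts foreground pixels per column of a binary image, treats each run of non-empty columns as a candidate, and then applies a width filter and a height filter. The thresholds `min_width` and `min_height_frac` are made-up illustrative values.

```python
def column_histogram(binary):
    """Count foreground pixels (1s) in each column of a binary image."""
    return [sum(col) for col in zip(*binary)]


def segment_columns(binary, min_width=2, min_height_frac=0.5):
    """Return (start, end) column ranges that look like characters."""
    hist = column_histogram(binary)
    height = len(binary)

    # Collect runs of non-empty columns; the trailing 0 closes a run
    # that touches the right edge.
    runs, start = [], None
    for x, count in enumerate(hist + [0]):
        if count > 0 and start is None:
            start = x
        elif count == 0 and start is not None:
            runs.append((start, x))
            start = None

    kept = []
    for x0, x1 in runs:
        rows = [y for y in range(height) if any(binary[y][x0:x1])]
        run_height = rows[-1] - rows[0] + 1
        if x1 - x0 < min_width:
            continue  # skinny vertical line, e.g. a plate border
        if run_height < min_height_frac * height:
            continue  # too short, e.g. a dash
        kept.append((x0, x1))
    return kept


# Border line, a character, a dash, and another character.
ROWS = [
    "#.###.......",
    "#.#.#....###",
    "#.#.#.##.#.#",
    "#.#.#.##.#.#",
    "#.#.#....#.#",
    "#.###....###",
]
image = [[1 if c == "#" else 0 for c in row] for row in ROWS]
print(segment_columns(image))  # [(2, 5), (9, 12)]
```

Note the trade-off in the width filter: any threshold that removes a thin border line will also remove a character that is drawn as a thin vertical stroke.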

I just downloaded one of your images and ran it on debug using these settings in openalpr.conf:

[debug]
general = 1
timing = 0
state_id = 0
plate_lines = 0
plate_corners = 0
char_regions = 0
char_segment = 1
char_analysis = 0
color_filter = 0
ocr = 0
postprocess = 0
show_images = 1

The image is attached.

Here are the results from OpenALPR (notice the missing "1" at the end).

plate0: 10 results -- Processing Time = 270.193ms.
- BUS902 confidence: 84.979 template_match: 0
- BUS9D2 confidence: 81.8859 template_match: 0
- BUS9020 confidence: 80.6593 template_match: 0
- BUS902D confidence: 80.1146 template_match: 0
- BUSS02 confidence: 78.9764 template_match: 0
- BUS9Q2 confidence: 78.507 template_match: 0
- BUS902Q confidence: 78.2764 template_match: 0
- BUS902H confidence: 78.1059 template_match: 0
- BUS902G confidence: 78.0941 template_match: 0
- BUS902U confidence: 77.8298 template_match: 0
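
For a larger batch of images it may help to process this output with a script. The sketch below parses result lines in the exact layout shown above (the layout in other OpenALPR versions may differ) and lists the seven-character candidates. `parse_candidates` is a hypothetical helper.

```python
import re

# Output copied from this thread.
OUTPUT = """\
plate0: 10 results -- Processing Time = 270.193ms.
- BUS902 confidence: 84.979 template_match: 0
- BUS9D2 confidence: 81.8859 template_match: 0
- BUS9020 confidence: 80.6593 template_match: 0
- BUS902D confidence: 80.1146 template_match: 0
- BUSS02 confidence: 78.9764 template_match: 0
- BUS9Q2 confidence: 78.507 template_match: 0
- BUS902Q confidence: 78.2764 template_match: 0
- BUS902H confidence: 78.1059 template_match: 0
- BUS902G confidence: 78.0941 template_match: 0
- BUS902U confidence: 77.8298 template_match: 0
"""

LINE = re.compile(r"^\s*-\s+(\S+)\s+confidence:\s+([0-9.]+)\s+template_match:\s+(\d+)")


def parse_candidates(text):
    """Return (plate, confidence) pairs from OpenALPR-style result lines."""
    candidates = []
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            candidates.append((match.group(1), float(match.group(2))))
    return candidates


candidates = parse_candidates(OUTPUT)
seven_chars = [plate for plate, _ in candidates if len(plate) == 7]
print(seven_chars)
print(any(plate.endswith("1") for plate in seven_chars))  # False
```

None of the seven-character candidates ends in "1", so no amount of post-processing on this list would recover the correct plate.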

It's using EU as the country. Notice that the plate area is way too wide. This is because the European plate detector is looking for wider plates than the Brazilian ones. If you retrain your detector for Brazil, this should get you a much tighter bounding box.

Secondly, it looks like the character segmenter has got it correct. It has disqualified all of the characters that aren't part of your plate content (for a variety of reasons), leaving you with BUS9021. Each of these characters will go through to OCR. However, the "1" is not making it through to the final recognition since it's an unrecognized character. The European plates have no character that is a straight line like this: neither the "1" fonts nor the "I" fonts look like that. So, OCR cannot recognize it. If you retrain the OCR for your Brazilian font, that should help as well.

I wouldn't mess with the Tesseract text2image utility. The OpenALPR misc utility does all of that for you. It helps you tag characters in the plate, then it puts together a proper Tesseract tiff and box file automatically. Using other programs or doing this manually (yuck!) would be very difficult.

The changes in plate style (reflective and non-reflective) should not be a problem. Just train the detector for both by including lots of images of both styles. As long as the dimensions are the same, that should not be a problem.

Selection_081.png

Antonio Carlos Censi

Jan 17, 2014, 12:26:30 PM

to open...@googlegroups.com

OK

Thanks again

ACC

Tu Phuong

Mar 7, 2014, 9:08:40 AM

to open...@googlegroups.com

Hi Matt,

I'm from Vietnam. Our plates are two-line plates (please see the attachment). I'm waiting for your update to support two-line plates. Please notify me when you finish, or could you guide me on how to code it, and I'll contribute the update.

Thank you so much.

Tu Phuong

Mar 7, 2014, 9:09:58 AM

to open...@googlegroups.com

This is the attachment

index2.jpg

Matt

Mar 8, 2014, 8:38:15 AM

to open...@googlegroups.com

Two-line plates should be possible to support, but will require code changes.

First, the detector needs to be trained to recognize two-line plates. This means lots of samples need to be collected and cropped, and then the LBP cascade detector must be trained with those samples.

After that, the detector would be run twice for countries that have two-line plates. The first pass would find 1-line plates, the second pass would find 2-line plates.

Next, the characteranalysis.cpp file would need to be tweaked to recognize two lines of text. Currently it looks for large "blobs" located centrally in the plate. This would just need to be changed to support two rows of text (located towards the top and bottom of the plate). I think charactersegmenter.cpp would also need to be changed to differentiate these rows.
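
One possible shape for the row-differentiation step, sketched in Python rather than in the OpenALPR C++ code: given character bounding boxes, assign each box to the top or bottom row by comparing its vertical centre with the plate's mid-height, then read each row left to right. The function name, the box format `(x, y, w, h)`, and the sample coordinates are all assumptions made for illustration.

```python
def split_into_rows(boxes, plate_height):
    """Group character boxes (x, y, w, h) into a top row and a bottom row.

    y is measured from the top of the plate image. A box belongs to the
    top row when its vertical centre lies above the plate's mid-height.
    """
    mid = plate_height / 2.0
    top, bottom = [], []
    for box in boxes:
        _, y, _, h = box
        (top if y + h / 2.0 < mid else bottom).append(box)
    # Reading order within each row is left to right.
    top.sort(key=lambda b: b[0])
    bottom.sort(key=lambda b: b[0])
    return top, bottom


# Illustrative layout: 3 characters on top, 4 below, given out of order.
boxes = [
    (110, 95, 30, 53), (20, 20, 30, 53), (150, 95, 30, 53),
    (60, 20, 30, 53), (30, 95, 30, 53), (100, 20, 30, 53),
    (70, 95, 30, 53),
]
top, bottom = split_into_rows(boxes, plate_height=170)
print(len(top), len(bottom))          # 3 4
print([b[0] for b in top + bottom])   # x positions in reading order
```

A fixed mid-height split only works once the plate has been straightened; a skewed plate would need the row boundary estimated from the image itself.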

Raul Batalha

Jun 22, 2015, 3:05:03 PM

to open...@googlegroups.com

Hello Antonio Carlos,

I read your conversation with Matt. I am having the same difficulty you had, but I believe you have already solved it! I would appreciate a little help with the training, given that you understood the tutorial well.

On Thursday, January 16, 2014 at 1:20:03 PM UTC-4, Antonio Carlos Censi wrote:

> How do I use the miscellaneous utilities to train for a new country?
>
> Thanks
>
> ACC

Alverne Paiva

Nov 28, 2015, 4:16:39 PM

to OpenALPR

Raul, did you manage to complete your training for Brazilian plates? I am thinking of doing it myself, but if someone has already done it and is willing to share the files, that would save the work.
