Using Tesseract to scan floor plans of a ship


Rutger Rozendal

Aug 21, 2015, 3:55:29 AM
to tesseract-ocr
Dear People,

We are using Tesseract to recognise room numbers on a floor plan of a ship deck.
Two examples are attached to this email.

We are trying different methods and get mixed results: between 20% and 70% of the room numbers are recognised.

Because the images are in colour, we are wondering whether results improve if we take out the colours up front,
that is, convert them to black and white or grayscale.
Can Tesseract do this colour conversion itself with a certain profile, or do we need an external program for that?

We could also search for specific patterns, as the room numbers usually consist of 4 digits.
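
A minimal sketch of that pattern search, applied to the raw text that comes back from OCR. It assumes a room number is exactly four digits and is not part of a longer run of digits; the function name is made up for illustration.

```python
import re

# Assumption: a room number is exactly four digits and is not embedded
# in a longer digit sequence (so "12345" is rejected as a whole).
ROOM_NUMBER = re.compile(r"(?<!\d)\d{4}(?!\d)")


def extract_room_numbers(ocr_text):
    """Return every standalone four-digit token found in raw OCR output."""
    return ROOM_NUMBER.findall(ocr_text)
```

Anything that does not fit the pattern (short fragments, letters, longer numbers) is simply dropped, so this only filters results and cannot repair a misread digit.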

Any tips on the best configuration would be helpful.

Thanks in advance,

Rutger
rci_hm_DECK09.jpg
cel_rf_DECK08.jpg

Allistair C

Aug 21, 2015, 4:25:53 AM
to tesser...@googlegroups.com
The way I would do this is to use a rectangle-by-colour extraction phase that crops out all the coloured rectangles containing numbers, and then perform OCR on each one, which should have a good success rate given the quality of the text.

Sent from my iPhone
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at http://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/bf5d8e06-7e42-464d-ab50-61951a89447e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Rutger Rozendal

Aug 21, 2015, 7:38:42 AM
to tesser...@googlegroups.com
Ok, thanks

We will try this method
So first extract the rectangles as cropped pictures and then do the character recognition on each separate picture.

For rectangle-by-colour extraction we are thinking of using OpenCV, as it seems that Tesseract is not really meant for that, is it?

We used OpenCV before to find the rooms (rectangular boxes), but that was based on the walls and their radius, and the analysis was done on an inverted black-and-white picture of the floor plan.
We will now try with colour as the input source, but as some adjacent rooms have the same colour, we are wondering what will happen to those.

And then for the last step, the Tesseract recognition on the cropped picture of one room: is it advisable to use a grayscale image there?
And can we feed Tesseract a kind of target list? For us it is important to find the location of the room (x and y on the picture); that is the overall goal of the assignment.
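
On the x/y question: Tesseract can write hOCR output (for example `tesseract crop.png out hocr`), in which each recognised word carries a `bbox x0 y0 x1 y1` in pixel coordinates of the image it was given. When OCR runs on a crop, the crop's top-left corner has to be added back to get floor-plan coordinates. A rough sketch, assuming the usual `ocrx_word` span layout; the function name and the `crop_x`/`crop_y` parameters are illustrative only.

```python
import re

# Matches one hOCR word span such as:
# <span class='ocrx_word' id='word_1_1' title='bbox 12 8 58 26; x_wconf 91'>8008</span>
WORD = re.compile(
    r"<span[^>]*class=['\"]ocrx_word['\"][^>]*"
    r"title=['\"]bbox (\d+) (\d+) (\d+) (\d+)[^'\"]*['\"][^>]*>(.*?)</span>",
    re.S,
)


def locate_words(hocr, crop_x=0, crop_y=0):
    """Return (text, centre_x, centre_y) per word, in floor-plan coordinates.

    crop_x / crop_y are the top-left corner of the crop inside the full image
    (hypothetical parameters; use 0, 0 when OCR ran on the whole plan).
    """
    results = []
    for x0, y0, x1, y1, inner in WORD.findall(hocr):
        text = re.sub(r"<[^>]+>", "", inner).strip()  # drop nested markup
        centre_x = crop_x + (int(x0) + int(x1)) // 2
        centre_y = crop_y + (int(y0) + int(y1)) // 2
        results.append((text, centre_x, centre_y))
    return results
```

A regular expression is enough for a quick experiment; a proper HTML parser would be more robust against variations in the hOCR markup.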

Thanks again for any tips on this challenge.

Rutger

--
Drs. Ing. R.D. Rozendal

Noterik B.V.

Tel. +31-(0)20-5929966
Fax. +31-(0)20-5929969

Check out the demos of our tools:
http://www.noterik.nl/video

Allistair

Aug 21, 2015, 9:18:32 AM
to tesser...@googlegroups.com
Hi,

Yes, OpenCV would be the way to go: finding contours and filtering by colour, shape, etc.

I am not sure how you would separate rooms sharing the same colour but you have the benefit of thick black borders, so your method will need to somehow use those borders. 
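
To illustrate how the borders can do the separating, here is a plain-Python stand-in for the contour step (this is a flood fill, not OpenCV's contour finder). It assumes the image is a list of rows of `(r, g, b)` tuples with exact colours and pure black walls; a real JPEG would need a colour tolerance, and the function name is made up.

```python
from collections import deque


def colour_regions(pixels, border=(0, 0, 0)):
    """Return (colour, (x0, y0, x1, y1)) for each 4-connected same-colour region.

    Border-coloured pixels are never entered, so two rooms of the same colour
    that are separated by a wall come out as two regions. Boxes are inclusive.
    """
    height, width = len(pixels), len(pixels[0])
    seen = [[False] * width for _ in range(height)]
    regions = []
    for y in range(height):
        for x in range(width):
            if seen[y][x] or pixels[y][x] == border:
                continue
            colour = pixels[y][x]
            x0 = x1 = x
            y0 = y1 = y
            queue = deque([(x, y)])
            seen[y][x] = True
            while queue:  # breadth-first flood fill from the seed pixel
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and not seen[ny][nx] and pixels[ny][nx] == colour):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            regions.append((colour, (x0, y0, x1, y1)))
    return regions
```

Each bounding box can then be cropped and sent to OCR on its own. Tiny regions would need to be filtered out by size in practice.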

There is no real benefit to grayscaling the images beforehand; Tesseract already does this internally. You may, however, want to remove the coloured background first: since you know the colour of the rectangle for each output, you can remove it quite easily, replacing it either with white (for black text) or black (for white text).
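
The background replacement could look roughly like this in plain Python. The per-channel `tolerance` is an assumption to absorb JPEG noise, and the function name is illustrative; with OpenCV the same idea is usually expressed as a colour-range mask.

```python
def replace_background(pixels, background, fill=(255, 255, 255), tolerance=30):
    """Replace every pixel close to `background` with `fill`.

    `pixels` is a list of rows of (r, g, b) tuples. A pixel counts as
    background when each channel is within `tolerance` of the known colour.
    """
    def close(a, b):
        return all(abs(p - q) <= tolerance for p, q in zip(a, b))

    return [[fill if close(px, background) else px for px in row]
            for row in pixels]
```

For white text on a dark room colour, pass `fill=(0, 0, 0)` instead.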

I would be very interested to hear how you get on.

Cheers

Robert Komar

Aug 21, 2015, 2:07:51 PM
to tesser...@googlegroups.com
I believe that tesseract operates on black and white
images. All grayscale and colour images are converted
internally to black and white if necessary. In your
case, you could probably do the conversion yourself,
turning every pixel that is not black to white, since
all of the text is black.
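
A minimal sketch of that do-it-yourself conversion, assuming the image has already been reduced to a grid of grayscale values (0 = black, 255 = white). The threshold of 80 is a guess that would need tuning on the real scans.

```python
def keep_only_dark(gray, threshold=80):
    """Map near-black pixels to 0 and everything else to 255.

    `gray` is a list of rows of integer intensities in the range 0..255.
    """
    return [[0 if value < threshold else 255 for value in row] for row in gray]
```

Note that the black wall lines survive this step along with the text, since both are dark.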

Many people have converted numeric text, and there
are many posts in the archive about that. I think
some used a whitelist of numeric characters, and
others created dictionaries containing valid combinations
of numbers to search against. Tesseract does not
just try to recognize each character, it also tries
to recognize each "word" against dictionaries, so
it helps to let tesseract know that "8008" is a
better answer than "BOOB".
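
For reference, the numeric whitelist is set through the `tessedit_char_whitelist` variable. The sketch below builds a command line for the `tesseract` executable and assumes it is on the PATH; page segmentation mode 7 (single text line) is an assumption that suits a cropped room label. Older 3.x releases spell the option `-psm`, and how strictly the whitelist is honoured differs between versions, so check against the one installed.

```python
import subprocess


def tesseract_digits_command(image_path, psm=7):
    """Build a tesseract command line restricted to the digits 0-9."""
    return [
        "tesseract", image_path, "stdout",
        "--psm", str(psm),  # "-psm" on Tesseract 3.x
        "-c", "tessedit_char_whitelist=0123456789",
    ]


def ocr_digits(image_path):
    """Run tesseract on one cropped image and return the recognised text."""
    result = subprocess.run(
        tesseract_digits_command(image_path),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```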

Cheers,
Rob Komar

P.S. Does anyone know if the whitelist applies to
the dictionary search, as well? If not, I think it
would be a useful addition to make to the code.

Rutger Rozendal

Aug 22, 2015, 1:51:58 AM
to tesser...@googlegroups.com

>
> I believe that tesseract operates on black and white
> images. All grayscale and colour images are converted
> internally to black and white if necessary. In your
> case, you could probably do the conversion yourself,
> turning every pixel that is not black to white, since
> all of the text is black.
>
> Many people have converted numeric text, and there
> are many posts in the archive about that. I think
> some used a whitelist of numeric characters, and
> others created dictionaries containing valid combinations
> of numbers to search against. Tesseract does not
> just try to recognize each character, it also tries
> to recognize each "word" against dictionaries, so
> it helps to let tesseract know that "8008" is a
> better answer than "BOOB".
>
> Cheers,
> Rob Komar
>

OK, cool, very good to know. So what we will try then is to make a target list of the rooms that we want to find and feed this list to Tesseract as a 'numeric dictionary'.
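
Independently of Tesseract's own dictionary handling, a target list can also be applied after OCR by snapping each result to the closest known room number. This is a post-processing sketch using Python's `difflib`, not a Tesseract feature; the function name and the 0.7 cutoff are assumptions.

```python
import difflib


def snap_to_known_room(ocr_text, known_rooms, cutoff=0.7):
    """Return the known room number closest to the OCR result, or None.

    `known_rooms` is the target list of valid room numbers as strings.
    """
    matches = difflib.get_close_matches(
        ocr_text.strip(), known_rooms, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None
```

With four-digit numbers a single wrong character still matches, while a result with no characters in common is rejected. Neighbouring room numbers that differ by one digit can be confused, so the result should be checked against the expected position on the deck.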

We will keep you updated on the results, sometime next week.

Thanks again,

Rutger

Art Rhyno.

Aug 23, 2015, 9:59:09 AM
to tesser...@googlegroups.com

If you have a list of background colours for the image, you could extract those individually. I tried a simple Gaussian blur to deal with them ([1] and [2]). Not perfect, but you get the idea. Once the background colours are out of the equation, there is a horizontal line removal example included with Leptonica that might help you [3]. I tried it on the two images ([4] and [5]), and was then able to get fairly good results.

art

---

1. http://imgur.com/NvNW9do

2. http://imgur.com/uThhJPQ

3. http://www.leptonica.com/line-removal.html

4. http://imgur.com/Bnat2U0

5. http://imgur.com/8ydMinC
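
The core idea behind horizontal line removal can be sketched in plain Python: on a binary image, erase every horizontal run of foreground pixels that is at least a given length. This is a simplified stand-in and not the Leptonica example itself, which is more elaborate; the function name and `min_length` are illustrative.

```python
def remove_horizontal_lines(binary, min_length):
    """Erase horizontal runs of foreground (1) pixels at least `min_length` long.

    `binary` is a list of rows of 0/1 values; a cleaned copy is returned.
    """
    cleaned = [row[:] for row in binary]
    for y, row in enumerate(binary):
        x = 0
        while x < len(row):
            if row[x] == 1:
                start = x
                while x < len(row) and row[x] == 1:
                    x += 1  # advance to the end of this run
                if x - start >= min_length:
                    for i in range(start, x):
                        cleaned[y][i] = 0
            else:
                x += 1
    return cleaned
```

Choosing `min_length` longer than the widest digit keeps the room numbers intact. Pixels where a digit touches a wall are erased with the line, and vertical walls would need the same pass over columns.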
