Older fonts


Geo Past

Aug 22, 2016, 7:06:35 AM

to tesseract-ocr

Hi everyone,

I'm running a Tesseract install for my project and I'm hoping to improve the OCR results. At the moment I convert each image to grayscale to get better results.

Here are a few examples of the content I'm dealing with:
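For context, the grayscale step amounts to something like the sketch below. It is only a minimal pure-Python illustration, assuming the common ITU-R BT.601 luma weights and an image held as rows of (R, G, B) tuples. The function name `to_grayscale` is hypothetical, and an imaging library may weight the channels differently.

```python
def to_grayscale(rgb_rows):
    """Convert rows of (R, G, B) tuples (0-255) to rows of luma values (0-255).

    Assumption: ITU-R BT.601 weights (0.299, 0.587, 0.114), applied with
    integer arithmetic so that pure white maps to exactly 255.
    """
    gray_rows = []
    for row in rgb_rows:
        gray_rows.append(
            [(299 * r + 587 * g + 114 * b) // 1000 for r, g, b in row]
        )
    return gray_rows


if __name__ == "__main__":
    # A tiny 1x3 "image": white, black, pure red.
    print(to_grayscale([[(255, 255, 255), (0, 0, 0), (255, 0, 0)]]))
```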

https://www.flickr.com/photos/sarniahistoricalsociety/24629031156/in/album-72157663444855040/

https://www.flickr.com/photos/sarniahistoricalsociety/27218048966/in/album-72157663444855040/

https://www.flickr.com/photos/sarniahistoricalsociety/27251861635/in/album-72157663444855040/

https://www.flickr.com/photos/sarniahistoricalsociety/27251771435/in/album-72157663444855040/

As you can see, each example text is quite different. I'm wondering what the best approach would be - trying to find matching fonts for Tesseract? All comments welcome :)

Thanks,

Neil

Tom Morris

Aug 23, 2016, 1:35:28 AM

to tesseract-ocr

On Monday, August 22, 2016 at 7:06:35 AM UTC-4, Geo Past wrote:

I'm running a Tesseract install for my project and I'm hoping to improve the OCR results. At the moment I convert each image to grayscale to get better results.

Here are a few examples of the content I'm dealing with:

https://www.flickr.com/photos/sarniahistoricalsociety/24629031156/in/album-72157663444855040/

This one is hand printed. Probably not a good candidate for Tesseract...

https://www.flickr.com/photos/sarniahistoricalsociety/27218048966/in/album-72157663444855040/
https://www.flickr.com/photos/sarniahistoricalsociety/27251861635/in/album-72157663444855040/
https://www.flickr.com/photos/sarniahistoricalsociety/27251771435/in/album-72157663444855040/

As you can see, each example text is quite different. I'm wondering what the best approach would be - trying to find matching fonts for Tesseract? All comments welcome :)

The remaining ones look like real fonts, although the first is almost completely obscured by the background and the last is bifurcated by the image edge. I suspect if you can successfully deal with those issues, the font issues will fade into the background (and, if not, you can enhance the training to include the fonts you are interested in).
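As one possible way to attack the background problem, the text can be separated from the background by binarizing the grayscale image before it goes to OCR. The sketch below uses Otsu's method to pick a global threshold. It is an illustration under assumptions, not a tested recipe for these scans: it assumes dark text on a lighter background, an image held as rows of 0-255 integers, and the hypothetical helper names `otsu_threshold` and `binarize`.

```python
def otsu_threshold(gray_rows):
    """Return the threshold (0-255) that maximizes between-class variance."""
    hist = [0] * 256
    for row in gray_rows:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # count of pixels at or below the candidate threshold
    sum_bg = 0  # sum of the intensities of those pixels
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue  # no pixels in the dark class yet
        w_fg = total - w_bg
        if w_fg == 0:
            break     # every pixel is already in the dark class
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray_rows, threshold):
    """Map pixels at or below the threshold to 0 (ink), the rest to 255 (paper)."""
    return [[0 if v <= threshold else 255 for v in row] for row in gray_rows]


if __name__ == "__main__":
    # Toy image: three dark "ink" pixels and three bright "paper" pixels.
    image = [[20, 30, 200], [210, 25, 220]]
    t = otsu_threshold(image)
    print(t, binarize(image, t))
```

A single global threshold tends to struggle when the background brightness varies across the page, so a locally adaptive threshold may be worth trying if this does not clean things up.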