Python OCR challenge

Ian Ozsvald
Jul 11, 2010, 8:09:17 AM
to ai...@googlegroups.com
I'm close to starting a challenge using optical character recognition
to read English Heritage Plaques for the http://openplaques.org/
project. The challenge is to write some routines that help an OCR tool
to accurately extract text from images of plaques.

I'll be running the challenge over a number of months, sharing results
and working to make an automatic plaque transcriber for OpenPlaques
(otherwise humans have to do the transcription by hand...). I have a
prototype in Python and I'll be using Python for my entry; details
and source are here:
http://aicookbook.com/wiki/Automatic_plaque_transcription

If you'd like to stretch your knowledge of OCR, vision, image
preparation and data mining then perhaps you'd like to join the
challenge? Keep an eye on the Google Group:
http://groups.google.com/group/aicookbook
for details of the start. I'll be awarding a prize each month and open
sourcing everything.

Hoping this interests some of you,
Ian.

--
Ian Ozsvald (A.I. researcher, screencaster)
i...@IanOzsvald.com

http://IanOzsvald.com
http://MorConsulting.com/
http://blog.AICookbook.com/
http://TheScreencastingHandbook.com
http://FivePoundApp.com/
http://twitter.com/IanOzsvald

Ian Ozsvald
Aug 29, 2010, 7:41:40 PM
to ai...@googlegroups.com
This is an update: one of my collaborators has won this month's
challenge and the cash prize. He's taken the average error in the test
set down from hundreds to just 33.4. There's still a way to go before
I present this to the OpenPlaques hack day next month (funded by the
Royal Society of Arts) up in London.
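
For anyone wondering how an "average error" over a test set might be
computed, here is a minimal sketch. It assumes the error for one plaque
is the Levenshtein (edit) distance between the OCR output and the
human transcription, averaged over all plaques. That metric is an
assumption for illustration (this post does not define it), and the
names edit_distance and average_error are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character insertions,
    deletions and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def average_error(pairs):
    """Mean edit distance over (ocr_text, true_text) pairs.
    Assumed metric, not necessarily the one used in the challenge."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one (ocr_text, true_text) pair")
    return sum(edit_distance(ocr, truth) for ocr, truth in pairs) / len(pairs)
```

With a metric like this, lower is better and a perfect transcription
of every plaque scores 0.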

If you'd like to get involved there's another cash prize and the
chance for your name to get announced at the hack day. Here's a
write-up of the current state along with links to Jonathan's Python
source:
http://blog.aicookbook.com/2010/08/automatic-plaque-transcription-pytesseract-average-error-down-to-33-4/

There are vision-cleanup, OCR, dictionary and other challenges in the
way of good recognition so there's plenty you could tackle if you
fancy an interesting open-source brain teaser.
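
As a rough illustration of the dictionary side, the sketch below snaps
each OCR'd word to its closest match in a word list, using only the
standard library's difflib. The function name correct_words, the
cutoff value and the lower-casing are assumptions for illustration;
this is not the approach used in the linked source.

```python
import difflib


def correct_words(ocr_text, vocabulary, cutoff=0.8):
    """Replace each whitespace-separated word with its closest
    vocabulary entry, if one is similar enough; otherwise keep it.
    Matching is done in lower case (an illustrative simplification)."""
    vocab = [w.lower() for w in vocabulary]
    fixed = []
    for word in ocr_text.split():
        matches = difflib.get_close_matches(word.lower(), vocab,
                                            n=1, cutoff=cutoff)
        fixed.append(matches[0] if matches else word)
    return " ".join(fixed)


# Hypothetical usage: a misread letter in a name is pulled back to
# the dictionary form.
print(correct_words("charlcs dickens lived here",
                    ["charles", "dickens", "lived", "here"]))
```

A stricter cutoff makes fewer corrections but is less likely to turn
a correct, unusual name into a common word.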

Ian.