OCR ground-truthing tools

Bill Janssen

Jun 4, 2011, 2:34:21 PM
to ocropus
So, what are people using for OCR ground-truthing tools? I've found
TrueViz, which is a decade old, but perhaps that's the state of the
art? Anything that takes hOCR as input?

Tom

Jun 29, 2011, 5:55:50 PM
to ocr...@googlegroups.com
OCRopus doesn't need line-accurate transcriptions.  It's fine if you just take each page and type in its text in reading order.  The training tools will figure out how to align that with the input.
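
To illustrate the general idea of aligning a page-level transcription with per-line recognizer output, here is a minimal sketch. It is not the OCRopus training code; it only assumes that a rough OCR result is available for each line and uses Python's standard difflib to carry the line breaks over to the typed text. The function name align_transcript_to_lines and the sample strings are hypothetical.

```python
from difflib import SequenceMatcher


def align_transcript_to_lines(ocr_lines, transcript):
    """Split a page-level transcript into per-line pieces.

    ocr_lines: noisy recognizer output, one string per text line (assumed input).
    transcript: the page text typed in reading order, without line breaks.
    """
    ocr_text = " ".join(ocr_lines)
    matcher = SequenceMatcher(None, ocr_text, transcript, autojunk=False)
    blocks = matcher.get_matching_blocks()

    def map_offset(pos):
        # Translate a character offset in ocr_text to an offset in transcript.
        for a, b, size in blocks:
            if a <= pos <= a + size:
                return b + (pos - a)  # offset lies inside a matching block
            if a > pos:
                return b  # offset lies in a mismatch; snap to the next match
        return len(transcript)

    pieces = []
    start = 0
    offset = 0
    for i, line in enumerate(ocr_lines):
        offset += len(line)
        if i == len(ocr_lines) - 1:
            end = len(transcript)  # the last line takes whatever is left
        else:
            end = map_offset(offset)
        pieces.append(transcript[start:end].strip())
        start = end
        offset += 1  # account for the joining space
    return pieces


# Hypothetical sample data: OCR output with typical recognition errors.
ocr_lines = ["Tbe quick brwn", "fox jumps ovcr", "the lazy dog"]
transcript = "The quick brown fox jumps over the lazy dog"
aligned = align_transcript_to_lines(ocr_lines, transcript)
for ocr, truth in zip(ocr_lines, aligned):
    print(repr(ocr), "->", repr(truth))
```

A character-level match like this only works when the rough OCR output is already reasonably close to the true text; a real training pipeline would more likely align against the recognizer's own output probabilities.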

In some cases you may need additional or different labeling tools:

  • You can use ocropus-cedit and related tools inside OCRopus
  • We're helping another group write a proposal for web-based correction tools (as part of a larger project).
  • The TextGrid project (that we're part of) is also developing tools based on Eclipse.
  • We're also developing some tools for using Mechanical Turk for transcriptions and verification.

Tom

Raj Julha

Jul 4, 2011, 6:11:38 AM
to ocropus
Hi

I'm very much interested in the web-based correction tool and the
Mechanical Turk bit. Is there any link to these projects?

Cheers

Raj

Tom

Jul 4, 2011, 6:09:05 PM
to ocr...@googlegroups.com
No, not yet.  The web-based correction tools will hopefully be out next year (depending on funding).  The Mechanical Turk tools are a student project/thesis and should be done in a few months.

Tom

Tom

Jul 4, 2011, 6:09:38 PM
to ocr...@googlegroups.com
There are some new ground-truthing tools now.  See the YouTube video channel for how they work.

Tom

Bill Janssen

Aug 15, 2011, 5:24:47 PM
to ocropus
"the YouTube video channel"? Really?

Bill

Bill Janssen

Aug 15, 2011, 5:45:59 PM
to ocropus
Looks pretty good.

Bill