Get Tesseract OCR to ignore or replace images with whitespace


Richard Arnold

Jul 16, 2014, 5:23:10 PM
to tesser...@googlegroups.com
I want to send scan output directly to a text file, minus any images. Is it possible to get Tesseract OCR to either ignore images in the scan or replace them with whitespace?

If so, how would this be possible?

Are there any coding examples available, e.g. a function that recognizes an image, draws a rectangle around it, and excludes that region from the scan results?

Thanks

Traun Leyden

Jul 20, 2014, 3:33:22 AM
to tesser...@googlegroups.com

You might check out Stroke Width Transform.

I recently added this feature to OpenOCR -- the idea is that it can remove the non-text elements from an image, leaving only the text.

Check out:

Richard Arnold

Jul 20, 2014, 4:51:32 PM
to tesser...@googlegroups.com, Rarno...@neo.rr.com
Hello Traun and thanks for replying to my post.

Stroke Width Transform looks very interesting. However, I have some questions regarding its use in what I'm doing.
I'm writing a Desktop application and OpenOCR appears to use a web service call??

  1. Can OpenOCR (and Stroke Width Transform) be used with a Desktop application?
  2. Is OpenOCR the same as CuneiForm OpenOCR from Cognitive Technologies?
There are multiple projects called OpenOCR, e.g. one on SourceForge and one from Cognitive Technologies.

Thank you again for clarification on this tool.

Nick White

Aug 6, 2014, 11:15:18 AM
to tesser...@googlegroups.com
Hi Richard,

On Sun, Jul 20, 2014 at 01:51:32PM -0700, Richard Arnold wrote:
> Stroke Width Transform looks very interesting. However, I have some questions
> regarding its use in what I'm doing.
> I'm writing a Desktop application and OpenOCR appears to use a web service
> call??

Stroke Width Transform does look interesting indeed. But yes, a web
service may well not be appropriate.

It looks (from a very cursory search and read) like it's an easy
algorithm to implement, and there are several reasonable-sounding
examples using OpenCV on Stack Overflow:

http://stackoverflow.com/questions/22425545/stroke-width-transform-opencv-using-python
http://stackoverflow.com/questions/11116199/stroke-width-transform-swt-implementation-python
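
For the original question (replacing images with whitespace before OCR), here is a minimal sketch of the general idea. To be clear, this is not Stroke Width Transform; it is a much simpler size-based heuristic swapped in for illustration. It assumes the page is an 8-bit grayscale image held as a list of rows, that text glyphs are small connected dark regions, and that anything larger than a chosen size is a picture. The names find_components, blank_large_regions, max_glyph_size and ink_threshold are hypothetical, not part of Tesseract, OpenCV or OpenOCR.

```python
from collections import deque

WHITE = 255  # background value in an 8-bit grayscale image


def find_components(img, ink_threshold=128):
    """Return bounding boxes (top, left, bottom, right) of connected dark regions."""
    height, width = len(img), len(img[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if seen[y][x] or img[y][x] >= ink_threshold:
                continue
            # Breadth-first flood fill over 4-connected dark pixels.
            top, left, bottom, right = y, x, y, x
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx] and img[ny][nx] < ink_threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def blank_large_regions(img, max_glyph_size=40, ink_threshold=128):
    """Return a copy of img with every oversized dark region painted white."""
    cleaned = [row[:] for row in img]
    for top, left, bottom, right in find_components(img, ink_threshold):
        box_h, box_w = bottom - top + 1, right - left + 1
        # Assumption: anything taller or wider than a glyph is an image.
        if box_h > max_glyph_size or box_w > max_glyph_size:
            for y in range(top, bottom + 1):
                for x in range(left, right + 1):
                    cleaned[y][x] = WHITE
    return cleaned
```

In a desktop application you would presumably load the pixels with an image library, run something like blank_large_regions, and hand the cleaned image to Tesseract. Note that this paints the whole bounding box white, so text sitting inside a large frame or border would be lost too, and a light or textured photograph may not form a single dark region at all; the size threshold would need tuning to the scan resolution.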

Nick