Deskew waves in a document

Patrick Collins

May 6, 2011, 2:49:42 PM
to tesser...@googlegroups.com
Hi,
I am trying to scan a series of documents which have been badly skewed by the book's edge. Has anyone seen any commercial or open source implementations of deskewing software which can handle advanced deskews like this?

Patrick.

Robert Komar

May 6, 2011, 9:27:40 PM
to tesser...@googlegroups.com
On Fri, 6 May 2011, Patrick Collins wrote:

> Hi,I am trying to scan a series of documents which have been badly skewed by

It's hard to know whether you mean the whole page is scanned at an
angle, or that the text is badly distorted where the page curves
up into the spine when the book is squashed flat.

If the whole page is at an angle, then imagemagick 'convert' does
a reasonable job of deskewing.
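
For the whole-page case, a minimal sketch of scripting that step from
Python. It assumes ImageMagick is installed and relies on its `-deskew`
option; the 40% threshold is the value ImageMagick's documentation
suggests as a starting point. The function names and file names are
hypothetical, chosen only for illustration.

```python
import shutil
import subprocess


def build_deskew_command(src, dst, threshold="40%", tool="convert"):
    """Return the argument list for an ImageMagick deskew run.

    Assumption: `tool` is the ImageMagick executable ('convert' on
    version 6, 'magick' on version 7).
    """
    return [tool, src, "-deskew", threshold, dst]


def deskew(src, dst, threshold="40%"):
    """Straighten a page that was scanned at an angle.

    This only corrects rotation of the whole page; it does not
    flatten text that curves into the spine.
    """
    # Prefer the ImageMagick 7 entry point when it is available.
    tool = "magick" if shutil.which("magick") else "convert"
    subprocess.run(build_deskew_command(src, dst, threshold, tool), check=True)
```

Calling `deskew("page_001.png", "page_001_straight.png")` would write a
straightened copy next to the original, leaving the scan untouched.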

I don't know about software to correct curved text, but I do my scanning
with a Plustek Opticbook scanner. It gets pretty close to the spine
of the book (about 1/4" for my 3600 model) without needing to try
to squash the book flat. So there is pretty well no skewing over
the scanning surface. If you're doing the scanning yourself and
money isn't too tight, you could look into getting one of these
scanners yourself.

Cheers,
Rob Komar

zdenko podobny

May 7, 2011, 3:19:52 AM
to tesser...@googlegroups.com
Hi,

I am not sure I understood your problem (e.g. whether you are looking for a "dewarp" ("straighten text lines") feature). In leptonica there are example programs for dewarping: dewarp_reg.c and dewarptest.c. I tried them on one of my projects, but they did not work on my images (I plan to play with them later ;-) ).

On http://diybookscanner.org I found references to the (commercial) program Book Restorer [1]. I had a chance to test it and I can confirm it worked perfectly, if what you need is straightened text lines in the output.

I have experience only with these. But if you google for "dewarp" or "Image straightening algorithm" you can find a lot of interesting suggestions for algorithms or programs ([2], [3]).


--
Zdenko

zdenko podobny

May 7, 2011, 3:23:15 AM
to tesser...@googlegroups.com
Here is the link to the leptonica dewarp documentation: http://tpgit.github.com/UnOfficialLeptDocs/leptonica/dewarping.html

Zdenko

Dmitri Silaev

May 7, 2011, 6:34:02 AM
to tesser...@googlegroups.com, pcol...@gmail.com
If you want to do large-scale scanning, the best option is indeed
something like the DIY Book Scanner.

If you'd rather solve it in software, then you should first decide
whether you want a "rectified" image (as if the book pages came
out of a flatbed scanner) or just need the text to be recognized.
There are some image processing algorithms and out-of-the-box products
that achieve the former, though only to some degree of
perfection. However, if the latter is enough for your goals, then you
may try passing such "rectified" images to an OCR system and will
probably get decent results.

Methods of rectifying and preparing such curved images for OCR are
very diverse, though. E.g. in my engine for processing sales receipt
photos I use a large mixture of traditional and self-devised methods,
and there is still room for improvement. You'll need to experiment with
different methods to determine what suits you best.

Warm regards,
Dmitri Silaev
www.CustomOCR.com
