Getting started - simple ocr of A4 pages

daniel....@gmail.com

Jan 22, 2013, 1:49:40 AM
to ocr...@googlegroups.com
Hi,

I'm attempting to OCR an image of a simple, relatively clear A4 page.
Cuneiform does an OK job, but I'm curious to see whether OCRopus does better.

I followed the installation instructions on the home page. However, when I run the following from a shell, I get some errors.
I type:
ocropus-recognize-book in.png -o out.html

Here is the output I get:
book directory ./_book-017587
in.png -> ./_book-017587/0001.png

=== preprocess

# ocropus-nlbin ./_book-017587/????.png
=== ./_book-017587/0001.png 1
estimating skew angle
estimating thresholds
rescaling
./_book-017587/0001.png lo-hi (0.42 1.45) angle  0.2  no-normalization
writing
Traceback (most recent call last):
  File "/usr/local/bin/ocropus-nlbin", line 181, in <module>
    process1((f,i+1))
  File "/usr/local/bin/ocropus-nlbin", line 161, in process1
    gray()
  File "/usr/lib/pymodules/python2.7/matplotlib/pyplot.py", line 2951, in gray
    im = gci()
  File "/usr/lib/pymodules/python2.7/matplotlib/pyplot.py", line 217, in gci
    return gcf()._gci()
  File "/usr/lib/pymodules/python2.7/matplotlib/pyplot.py", line 369, in gcf
    return figure()
  File "/usr/lib/pymodules/python2.7/matplotlib/pyplot.py", line 343, in figure
    **kwargs)
  File "/usr/lib/pymodules/python2.7/matplotlib/backends/backend_tkagg.py", line 80, in new_figure_manager
    window = Tk.Tk()
  File "/usr/lib/python2.7/lib-tk/Tkinter.py", line 1688, in __init__
    self.tk = _tkinter.create(screenName, baseName, className, interactive, wantobjects, useTk, sync, use)
_tkinter.TclError: no display name and no $DISPLAY environment variable
Traceback (most recent call last):
  File "/usr/local/bin/ocropus-recognize-book", line 73, in <module>
    run(args.preproc,book+"/????.png")
  File "/usr/local/bin/ocropus-recognize-book", line 57, in run
    assert subprocess.call(args)==0
AssertionError

If anyone can help me get past these errors, I'd greatly appreciate it. At the moment I can't find any output text from this run.

Thanks in advance,
Dan
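The final `AssertionError` above says nothing about the real cause: per the traceback, ocropus-recognize-book runs each stage via `assert subprocess.call(args)==0`, so the driver only learns that some stage exited non-zero. A minimal sketch of a more informative stage runner, assuming you drive the OCRopus command-line tools from your own Python 3 script; `run_stage` and `StageError` are hypothetical names, not part of OCRopus:

```python
import subprocess
import sys


class StageError(RuntimeError):
    """Raised when a pipeline stage exits with a non-zero status."""

    def __init__(self, cmd, returncode):
        self.cmd = cmd
        self.returncode = returncode
        super().__init__(
            "stage %r failed with exit status %d" % (cmd[0], returncode)
        )


def run_stage(*cmd):
    """Run one pipeline stage and raise StageError naming it if it fails.

    Unlike `assert subprocess.call(args) == 0`, this check is not stripped
    by `python -O`, and the error says which command failed.
    """
    # Echo the command the same way the OCRopus log does ("# ocropus-...").
    print("# " + " ".join(cmd), file=sys.stderr)
    returncode = subprocess.call(list(cmd))
    if returncode != 0:
        raise StageError(list(cmd), returncode)
    return returncode
```

The standard library's `subprocess.check_call` already raises `CalledProcessError` on a non-zero exit; the sketch only adds the stage name to the message and keeps the log-style echo. Either way, the real cause (here the Tk error) still has to be read from the stage's own output printed just above.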

Sriranga(78yrsold)

Jan 22, 2013, 2:02:59 AM
to ocr...@googlegroups.com
Have you checked the "Issues" list at http://code.google.com/p/ocropus/issues/list?
I presume you are testing with English, or another language? It would be better to post under "Issues" with a sample image attached.



--
You received this message because you are subscribed to the Google Groups "ocropus" group.
To post to this group, send email to ocr...@googlegroups.com.
To unsubscribe from this group, send email to ocropus+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msg/ocropus/-/GxgC1kqoP1UJ.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Daniel Dalton

Jan 23, 2013, 5:01:17 AM
to ocr...@googlegroups.com
On Tue, Jan 22, 2013 at 12:32:59PM +0530, Sriranga(78yrsold) wrote:
> Have you checked with the "Issues" of
> http://code.google.com/p/ocropus/issues/list ?
> Presumed you are testing with English lang or else? Better to post under
> "issues" with attached sample image.

Thanks. OK, I solved that first problem. Now I get either a blank
output.html file for some images or assertion errors for others.
I'll examine that page again, though.
Here are the assertion errors:
daniel@vostro-3350:~/Documents$ ocropus-recognize-book letter_monash.png -o output1.html
book directory ./_book-008701
letter_monash.png -> ./_book-008701/0001.png

=== preprocess

# ocropus-nlbin ./_book-008701/????.png
=== ./_book-008701/0001.png 1
estimating skew angle
estimating thresholds
rescaling
./_book-008701/0001.png lo-hi (0.00 1.00) angle 0.0 no-normalization
writing

=== page segmentation

# ocropus-gpageseg ./_book-008701/????.bin.png
./_book-008701/0001.bin.png
./_book-008701/0001.bin.png: scale (5.91608) less than --minscale; skipping

=== line recognition

# ocropus-lattices ./_book-008701/????/??????.bin.png
Traceback (most recent call last):
  File "/usr/local/bin/ocropus-lattices", line 56, in <module>
    args.files = ocrolib.glob_all(args.files)
  File "/usr/local/lib/python2.7/dist-packages/ocrolib/toplevel.py", line 204, in argument_checks
    result = f(*args,**kw)
  File "/usr/local/lib/python2.7/dist-packages/ocrolib/common.py", line 509, in glob_all
    raise Exception("%s: expansion did not yield any files"%arg)
Exception: ./_book-008701/????/??????.bin.png: expansion did not yield any files
Traceback (most recent call last):
  File "/usr/local/bin/ocropus-recognize-book", line 79, in <module>
    run(args.linerec,book+"/????/??????.bin.png",m=args.model)
  File "/usr/local/bin/ocropus-recognize-book", line 57, in run
    assert subprocess.call(args)==0
AssertionError


Thanks for your help.

Dan
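The first traceback in this log comes from `ocrolib.glob_all`, which raises when a pattern such as `./_book-008701/????/??????.bin.png` matches nothing. Here nothing matched because page segmentation skipped the only page, so no line images were ever written. A minimal sketch of an equivalent pre-flight check using Python's standard `glob` module, so an empty stage output is caught before the next stage starts; `expand_or_fail` is a hypothetical helper, not the ocrolib function itself:

```python
import glob


def expand_or_fail(*patterns):
    """Expand shell-style glob patterns; raise if any pattern matches nothing.

    The message mirrors the one in the log above. Matches are sorted so that
    page and line images come back in numeric filename order.
    """
    result = []
    for pattern in patterns:
        matches = sorted(glob.glob(pattern))
        if not matches:
            raise FileNotFoundError(
                "%s: expansion did not yield any files" % pattern
            )
        result.extend(matches)
    return result
```

For example, calling `expand_or_fail(book + "/????/??????.bin.png")` right after segmentation would fail at that point, which makes it clear that the problem lies in segmentation rather than in line recognition.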

Angelica

Mar 4, 2013, 9:13:32 AM
to ocr...@googlegroups.com
Did you manage to solve the assertion problem? I'm getting the same output.

Nathan K

Apr 6, 2013, 1:28:50 PM
to ocr...@googlegroups.com
It looks like the image is too small, so page segmentation skips it and produces no output. The subsequent stages then fail because they have nothing to work on:

# ocropus-gpageseg ./_book-008701/????.bin.png
./_book-008701/0001.bin.png
./_book-008701/0001.bin.png: scale (5.91608) less than --minscale; skipping
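The skip depends on a single number: the estimated text scale must reach `--minscale`. Assuming the scale is roughly the square root of a typical character's bounding-box area (as in the OCRopus sources this editor is aware of), it grows linearly with image resolution, and 5.91608 ≈ √35 suggests characters of only about 5×7 pixels. A minimal sketch of how much to enlarge the input before re-running; `upscale_factor` is a hypothetical helper, the `minscale=12.0` default is assumed to match `ocropus-gpageseg` (check `ocropus-gpageseg --help` on your install), and the 25% margin is an arbitrary choice:

```python
import math


def upscale_factor(scale, minscale=12.0, margin=1.25):
    """Return a whole-number factor by which to enlarge a page image so that
    its estimated text scale clears `minscale` with some headroom.

    Assumes the scale grows linearly with image size. The defaults
    (minscale=12.0, margin=1.25) are assumptions, not values from this thread.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    target = minscale * margin
    if scale >= target:
        return 1  # already large enough; no resizing needed
    # Round up to a whole factor so the resize stays simple (2x, 3x, ...).
    return math.ceil(target / scale)
```

With the value from this log, `upscale_factor(5.91608)` returns 3. Where possible, rescanning the page at a higher resolution is preferable to enlarging the existing file, because resizing cannot add detail that was never captured.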


Tom

Apr 10, 2013, 1:25:29 AM
to ocr...@googlegroups.com
You need to run OCRopus under an X11 window system (this is the default under Ubuntu); otherwise, the display functions used by the plotting library (matplotlib, per the first traceback) cannot initialize.
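The first traceback in this thread ends in `_tkinter.TclError: no display name and no $DISPLAY environment variable`: matplotlib's TkAgg backend tried to open a window. Besides running under X11, a common workaround in Python scripts you control is to select matplotlib's non-interactive `Agg` backend before `matplotlib.pyplot` is first imported. A minimal sketch, assuming matplotlib is installed; `choose_backend` is a hypothetical helper, and this does not change the installed `ocropus-nlbin` script, which imports pyplot itself:

```python
import os


def choose_backend(environ=None):
    """Pick a matplotlib backend name based on whether an X11 display exists.

    'TkAgg' is the backend seen in the traceback; 'Agg' only renders to
    files and never needs $DISPLAY.
    """
    if environ is None:
        environ = os.environ
    return "TkAgg" if environ.get("DISPLAY") else "Agg"
```

Usage, at the very top of your own script: `import matplotlib; matplotlib.use(choose_backend())`, and only then `import matplotlib.pyplot`. Passing the environment as a parameter keeps the decision testable without touching the real `$DISPLAY`.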

Tom