Create searchable pdfs


Harald Heigl

Apr 11, 2013, 6:34:28 AM
to ocr...@googlegroups.com
I want to create a searchable PDF on my Fedora machine. I've installed ocropus 0.7 from source, and as far as I can see it works (run_test successful).
I've found some very old threads about converting to searchable PDFs, like this one: https://groups.google.com/forum/?fromgroups=#!searchin/ocropus/searchable/ocropus/sBobkt8z_4w/zueGRpQoj68J

Can anyone explain, command by command, how to get from a scanned JPG to a searchable PDF?

Thanks in advance!
Harald

Harald Heigl

Apr 15, 2013, 8:17:58 AM
to ocr...@googlegroups.com
Okay, I think I'm getting closer. Since ocropus-recognize-book doesn't exist any more, I'm using a modified "run-test" script:

ocropus-nlbin "$1" -o temp
ocropus-sauvola "$1" -o temp
ocropus-gpageseg 'temp/????.bin.png'
ocropus-rpred 'temp/????/??????.bin.png'
ocropus-hocr 'temp/????.bin.png' -o temp.html
ocropus-visualize-results temp
ocropus-gtedit html temp/????/??????.bin.png -o temp-correction.html




But it fails at ocropus-rpred with:
Traceback (most recent call last):
  File "/usr/bin/ocropus-rpred", line 84, in <module>
    network = ocrolib.load_object(args.model,verbose=1)
  File "/usr/lib/python2.7/site-packages/ocrolib/common.py", line 428, in load_object
    fname = ocropus_find_file(fname)
  File "/usr/lib/python2.7/site-packages/ocrolib/common.py", line 673, in ocropus_find_file
    raise OcropusFileNotFound(fname)
ocrolib.common.OcropusFileNotFound: <OcropusFileNotFound uw3unlv.pyrnn.gz>


What am I missing?


Tom

Apr 19, 2013, 12:37:54 AM
to ocr...@googlegroups.com
It looks like you haven't downloaded and installed the model files. Did you follow the instructions in the README?

Tom

Harald Heigl

Apr 22, 2013, 6:30:36 AM
to ocr...@googlegroups.com
Oh, I did download the model files!
But that was a good point, so I started searching around ...

The install script seems to install the models into /usr/share/ocropus, but /usr/lib/python2.7/site-packages/ocrolib/common.py (line 647) contains:
data_paths = [
    ".",
    "./models",
    "./data",
    "./gui",
    "/usr/local/share/ocropus/models",
    "/usr/local/share/ocropus/data",
    "/usr/local/share/ocropus/gui",
    "/usr/local/share/ocropus",
]

If I change /usr/local/share to /usr/share, the models are found and it works so far.
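
To illustrate why the models were not found, here is a minimal sketch of a first-match lookup over that list. It is not the actual ocrolib code: `find_data_file` and `SEARCH_PATHS` are hypothetical names, and the two extra /usr/share entries are an assumption based on the Fedora install location described above.

```python
import os

# Hypothetical copy of the search list quoted above (not imported from ocrolib).
SEARCH_PATHS = [
    ".",
    "./models",
    "./data",
    "./gui",
    "/usr/local/share/ocropus/models",
    "/usr/local/share/ocropus/data",
    "/usr/local/share/ocropus/gui",
    "/usr/local/share/ocropus",
]


def find_data_file(fname, search_paths=SEARCH_PATHS):
    """Return the first existing path for fname, or None if no directory has it."""
    for directory in search_paths:
        candidate = os.path.join(directory, fname)
        if os.path.exists(candidate):
            return candidate
    return None


if __name__ == "__main__":
    model = "uw3unlv.pyrnn.gz"
    # Assumption: on Fedora the install script placed the files under /usr/share.
    extended = SEARCH_PATHS + ["/usr/share/ocropus/models", "/usr/share/ocropus"]
    print("default list :", find_data_file(model))
    print("extended list:", find_data_file(model, extended))
```

If the first line prints None and the second prints a path, the files are installed but simply sit outside the directories that get searched.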

Bug or feature?


But so far it seems to work, at least I can start playing around ...

Tom

Apr 22, 2013, 1:22:49 PM
to ocr...@googlegroups.com
The install script just uses the Python distutils target directory. Looks like Fedora makes a different choice there than Ubuntu.

Your best short-term solution is to make a symbolic link.
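
A minimal sketch of such a link in Python, assuming the models were installed under /usr/share/ocropus as reported above and that nothing exists at /usr/local/share/ocropus yet. `link_models` is a hypothetical helper, not part of OCRopus. The script assumes Python 3 and usually has to run as root.

```python
import os


def link_models(installed="/usr/share/ocropus", expected="/usr/local/share/ocropus"):
    """Create expected -> installed as a symlink unless something already exists there."""
    if os.path.lexists(expected):
        return False  # leave an existing file, directory, or link alone
    parent = os.path.dirname(expected)
    if parent:
        os.makedirs(parent, exist_ok=True)
    os.symlink(installed, expected)
    return True


if __name__ == "__main__":
    # Writing under /usr/local normally needs root privileges.
    print("link created" if link_models() else "path already exists, nothing done")
```

The helper refuses to overwrite anything, so running it twice is harmless.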

I'll add the path to the OCRopus search path.

Tom

cf...@web.de

Dec 10, 2013, 4:21:52 AM
to ocr...@googlegroups.com
Hello Tom,

How can I create a PDF after calling this:


ocropus-gtedit html temp/????/??????.bin.png -o temp-correction.html

Regards Claus