OCRopus 0.5

Tom

Jun 2, 2012, 5:24:36 PM
to ocr...@googlegroups.com
OCRopus 0.5 was released a few weeks ago on Google Code.  There are a lot of changes relative to older versions:


- OCRopus has been completely refactored and now consists of a set of Python modules, with some native code modules.

- Unicode and ligature support should be fully working now.

- Language modeling still uses finite state transducers, but all finite state transducer code has been refactored into ocrofst.

- There is a completely new recognizer that performs much better than the old recognizer and scales to millions of training samples.

- Databases for training/testing have been changed from SQLite format to HDF5 (using PyTables).

- You can pull over everything you need for an install using a single command ("hg clone https://code.google.com/p/ocropus")


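There are some videos on Google showing installation and training:

http://www.youtube.com/playlist?list=PL8B1A3C55DD915896&feature=mh_lolz

There is also some additional documentation here:

https://docs.google.com/a/iupr.com/document/d/1RxXeuuYJRhrOkK8zcVpYtT...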

Image preprocessing and layout analysis are still basically the old versions from OCRopus.  They are still fairly sensitive to noise and will be replaced in future releases.

Tom

sriranga(79yrsold)

Jun 3, 2012, 8:18:19 AM
to ocropus
Which version of Python should be used? Python 2.5, Python 2.7, and Python 3.2
are available for download for the Windows platform (Windows XP).


On Jun 3, 2:24 am, Tom <tmb...@gmail.com> wrote:
> OCRopus 0.5 was released a few weeks ago on Google Code.  There are a lot
> of changes relative to older versions:
>
> - OCRopus has been completely refactored and now consists of a set of
> Python modules, with some native code modules.
>
> - Unicode and ligature support should be fully working now.
>
> - Language modeling still uses finite state transducers, but all finite
> state transducer code has been refactored into ocrofst.
>
> - There is a completely new recognizer that performs much better than the
> old recognizer and scales to millions of training samples.
>
> - Databases for training/testing have been changed from SQLite format to
> HDF5 (using PyTables).
>
> - You can pull over everything you need for an install using a single
> command ("hg clone https://code.google.com/p/ocropus")
>
> There are some videos on Google showing installation and training:
>
>    http://www.youtube.com/playlist?list=PL8B1A3C55DD915896&feature=mh_lolz
>
> There is also some additional documentation here:
>
> https://docs.google.com/a/iupr.com/document/d/1RxXeuuYJRhrOkK8zcVpYtT...

Sriranga(78yrs)

Jun 3, 2012, 7:44:00 AM
to ocr...@googlegroups.com
Which version of Python should be used? Python 2.5, Python 2.7, and Python 3.2 are available for download for the Windows platform.

--
You received this message because you are subscribed to the Google Groups "ocropus" group.
To view this discussion on the web visit https://groups.google.com/d/msg/ocropus/-/VL6raX9pO5wJ.
To post to this group, send email to ocr...@googlegroups.com.
To unsubscribe from this group, send email to ocropus+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/ocropus?hl=en.

Tom

Jun 3, 2012, 11:01:24 AM
to ocr...@googlegroups.com
Python 2.7 should work.  OCRopus is not compatible with Python 3.

Porting to Windows requires a bit of work, though, since there is still some native code.

Tom

Sriranga(78yrsold)

Jun 3, 2012, 12:26:31 PM
to ocr...@googlegroups.com, M.N.S.Rao
Tom,
I appreciate your prompt reply. I hope that a developer or programmer may help port OCRopus to Windows in the interest of the community.
I am also interested in using OCRopus for a Kannada project under your valuable guidance.
With warmest regards,
-sriranga(79yrs)


Tom Morris

Jun 5, 2012, 10:28:23 AM
to ocr...@googlegroups.com
Congratulations on the release! It's great to see progress being made.

For anyone who wants to install on earlier versions of Ubuntu, you can
find the necessary edits for the package names in my repository
http://code.google.com/r/tfmorris-ocropus-ubuntu-11-10-install-fixes/source/checkout

Where does this version of the code stand relative to production
quality code? Is it getting close or still a long way or ...? I know
that much of the recent effort has been put into
refactoring/reimplementation, but I can't see from the web site what
the overall plan is and how much work is left.

One reason that I'm asking is that a crude comparison to tesseract
seems to indicate that ocropus, in its current state, requires an
order of magnitude more resources without any improvement in
recognition accuracy. The FST stage of the processing, in particular,
seems incredibly resource heavy without really doing much improvement
of the raw text generated by earlier stages.

Tom

Tom

Jun 5, 2012, 4:02:02 PM
to ocr...@googlegroups.com
> Where does this version of the code stand relative to production
> quality code?  Is it getting close or still a long way or ...?

Right now, preprocessing and layout analysis are the least reliable parts: they work well on the kinds of documents they were designed for (books and journal articles scanned at 300-600dpi on a flatbed scanner), but noise, distortions, and other resolutions make them fail fairly easily.  

Commercial OCR systems achieve their performance through careful engineering and testing, but we don't have engineers that can do that.  Instead, we're looking to machine learning to solve these problems, and that makes the problem harder, but hopefully in the long run leads to better solutions. 

In any case, there are already a number of much improved modules in the pipeline, planned for this year, that should help make OCRopus more robust and suitable for more applications.

> The FST stage of the processing, in particular,
> seems incredibly resource heavy without really doing much improvement
> of the raw text generated by earlier stages.

That's one of the reasons the language modeling has been refactored.  OCRopus 0.5 outputs its recognition lattices in a simple text file format now, making it easy to replace FST-based modeling entirely with other language modeling approaches.

Tom

Sriranga(78yrsold)

Jun 6, 2012, 4:25:06 AM
to ocr...@googlegroups.com
I tried to install on Ubuntu 12.04; the scripts are attached. However, alice.png was not displayed during the test, and I am unable to understand where I made a mistake. This is in fact my second attempt at reinstalling, using these commands in the terminal:

(1) sudo sh ./ocroinst packages
(2) sudo sh ./ocroinst install
(3) sudo sh ./ocroinst dl
(4) sudo ocropus ocropy/tests/alice.png

Unfortunately, alice.png was not displayed the way it is in the video http://www.youtube.com/playlist?list=PL8B1A3C55DD915896&feature=mh_lolz .

Further, how do I carry out the following instruction, which was displayed during the installation?

"IMPORTANT
You must add /usr/local/lib/ to your LD_LIBRARY_PATH variable and add /usr/local/bin to your PATH variable."

Where did I make a mistake? Since I am a newbie to the OCRopus project, I seek your valuable guidance.

With regards,
-sriranga(79yrs)

ocropus -terminal display
typescript

79yrsold

Jun 9, 2012, 6:09:50 AM
to ocropus, 79yrs old
I am waiting for your valuable guidance, for which I shall be thankful.
-sriranga(79yrs)


Tom

Jun 10, 2012, 11:08:52 PM
to ocr...@googlegroups.com
Well, the reason it's not working is that you didn't follow these steps:

add /usr/local/lib/ to your LD_LIBRARY_PATH variable 
add /usr/local/bin to your PATH variable

How you do that depends on how you log in.  Probably something like:

export LD_LIBRARY_PATH=/usr/local/lib:/usr/lib:/lib
export PATH="/usr/local/bin:$PATH"

will work.
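
To check that both variables are actually set in the shell you run ocropus from, something like the following sketch may help. It is not part of OCRopus; the helper name missing_dirs is made up for illustration, and the two directories are simply the ones named in the installer message.

```python
import os


def missing_dirs(value, required):
    """Return the directories in `required` that are absent from a
    colon-separated search path such as PATH or LD_LIBRARY_PATH."""
    # Normalize trailing slashes so "/usr/local/lib/" matches "/usr/local/lib".
    present = {entry.rstrip("/") for entry in value.split(":") if entry}
    return [d for d in required if d.rstrip("/") not in present]


if __name__ == "__main__":
    # Directories taken from the installer message; the helper is hypothetical.
    checks = [("PATH", ["/usr/local/bin"]),
              ("LD_LIBRARY_PATH", ["/usr/local/lib"])]
    for name, required in checks:
        missing = missing_dirs(os.environ.get(name, ""), required)
        status = "missing " + ", ".join(missing) if missing else "ok"
        print("%s: %s" % (name, status))
```

Run it in the same terminal session, after the export lines, so that it sees the same environment the ocropus command will see.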

Tom

Sriranga(78yrsold)

Jun 11, 2012, 5:44:17 AM
to ocr...@googlegroups.com
Tom,
Thanks for the valuable guidance. I followed your instructions, but alice.png still was not generated. Where did I make a mistake? typescript.txt is attached for your examination and further guidance.
With Warmest regards,
-sriranga(79yrs)

typesript.txt

Sriranga(78yrs)

Jun 13, 2012, 7:47:12 AM
to ocr...@googlegroups.com
Tom,
I tested again and got the same problem. An extract of the terminal output is reproduced below:
dell1@ubuntu:~/ocropus$ ocropus ocropy/tests/alice.png
book directory _book-001997
# ocropus-preproc -o _book-001997 ocropy/tests/alice.png
=== ocropy/tests/alice.png 1 (600, 1200)
(601, 1200) (601, 1200)
# writing _book-001997/0001 (601, 1200) (601, 1200)
# ocropus-prast _book-001997
=== _book-001997/0001.png
# loading _book-001997/0001.bin.png
# segmenting
# writing 12 lines
# ocropus-lattices _book-001997
adding 11 files from _book-001997
added 0 files directly
Traceback (most recent call last):
  File "/usr/local/bin/ocropus-lattices", line 82, in <module>
    cmodel = ocrolib.load_component(ocrolib.ocropus_find_file(options.model))
  File "/usr/local/lib/python2.7/dist-packages/ocrolib/common.py", line 716, in load_component
    return pickle.load(stream)
EOFError
exit 1
dell1@ubuntu:~/ocropus$ 

Awaiting further guidance.
-sriranga(79yrs)

sriranga(79yrsold)

Jun 7, 2012, 11:27:17 PM
to ocropus
Is no help or guidance forthcoming?


Sriranga(78yrs)

Jun 8, 2012, 3:04:16 AM
to ocr...@googlegroups.com
Early solution/guidance for my problem is requested.
-sriranga(79yrs)

Tom

Jul 18, 2012, 2:32:13 AM
to ocr...@googlegroups.com
An EOFError means that some file got truncated.  You probably interrupted the download of the model files somewhere (they are pretty big) or ran out of disk space.

Remove /usr/local/share/ocropus/*.cmodel and run the model downloading step again from ocroinst.
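
To see which model file is affected before downloading everything again, each file can be unpickled on its own. This is only a sketch: it assumes the .cmodel files are plain pickle streams (the traceback shows load_component calling pickle.load), the helper name pickle_is_damaged is hypothetical, and unpickling a real model presumably needs ocrolib to be importable. Only unpickle files you trust.

```python
import glob
import pickle


def pickle_is_damaged(path):
    """Return True if unpickling `path` fails the way a truncated or
    otherwise unreadable pickle stream does.

    Other failures (for example a class that cannot be imported) are
    left to propagate so they are not mistaken for truncation.
    """
    try:
        with open(path, "rb") as stream:
            pickle.load(stream)
    except (EOFError, pickle.UnpicklingError):
        return True
    return False


if __name__ == "__main__":
    # Directory taken from the message above; adjust it if yours differs.
    for path in sorted(glob.glob("/usr/local/share/ocropus/*.cmodel")):
        state = "damaged" if pickle_is_damaged(path) else "loads"
        print("%s: %s" % (path, state))
```

A file reported as damaged is a candidate for deletion and a fresh download; a file that loads here is unlikely to be the cause of the EOFError.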


Tom

Sriranga(78yrsold)

Jul 18, 2012, 3:47:28 AM
to ocr...@googlegroups.com
Tom,
Since I installed the latest version, 0.5.4, it now works fine; there is no problem at present. For my own knowledge: I presume that, to remove the cmodel files, the command line should be as follows:
"remove /usr/local/share/ocropus/*.cmodel". Could you kindly confirm this?
With regards,
-sriranga(79yrs)
