next release

Tom Breuel

Mar 3, 2010, 8:17:46 PM

to ocropus

We're preparing for the next release. The release consists of the
following components:

* iulib -- basic image processing
* ocropus -- OCR-specific functionality (libraries and some
command line programs)
* ocroswig -- bindings of iulib and ocropus to Python
* ocropy -- Python library and command line tools
* pyopenfst -- Python bindings of the OpenFST library

Please see the InstallTranscript for installation instructions.

There is plenty of new functionality:

* all recognition can now be carried out from Python
* there are top-level commands for recognition and training
written in Python
* classifiers can now cope with large character sets
* there are tools for clustering and correcting character shapes
* there is support for ligatures
* there are numerous bug fixes
* training is possible on very large datasets (many millions of
samples)

We will be calling this release 0.4.4, since there is still some
functionality missing for what we want to call 0.5:

* the Python tools do not yet do a good job at upper/lower case
modeling (but we have good prototype code that just needs to be
integrated)
* the language models need to be tested and improved
* we need to integrate the book-adaptive recognition tools into
the Python code
* Unicode support needs to be integrated into the Python loops
* the main loop of the RAST layout analysis will be rewritten in
Python
* there will be some new layout analysis that works for distorted
pages
* we need to integrate our orientation detection and text/image
segmentation code
* we want to get rid of the makefiles

Install instructions are here:

http://code.google.com/p/ocropus/wiki/InstallTranscript

Tom
We'll probably provide a single tarball

74yrs old

Mar 4, 2010, 12:15:56 AM

to ocr...@googlegroups.com

I trust it will work on Ubuntu 9.04? I hope the tarball will contain sample English datasets so that newcomers can get hands-on experience.


74yrs old

Mar 6, 2010, 9:28:21 PM

to ocr...@googlegroups.com

Prof. Tom Breuel,
Could you kindly let me know the approximate date on which OCRopus is likely to be released, and the address of the website from which the tarball can be downloaded?
I hope the tarball will contain sample English datasets for the benefit of users, so that I can use them as a model for a Kannada project.
Wishing you all the best and good luck,
-sriranga(77yrsold)

Tom

Mar 7, 2010, 8:58:37 AM

to ocr...@googlegroups.com

Hi,

essentially, the next release is already out; just follow the instructions on the web site:

http://code.google.com/p/ocropus/wiki/InstallTranscript

There will be some more Python scripts (e.g., for other ways of training the recognizer) and more comments in the Python code.

Documentation and examples will take a little longer, but I hope that the use of Python already helps.

Tom

Bob Gustafson

Mar 8, 2010, 2:13:20 AM

to ocr...@googlegroups.com

On Sun, Mar 7, 2010 at 7:58 AM, Tom <tmb...@gmail.com> wrote:

> Hi,
>
> essentially, the next release is already out; just follow the instructions on the web site:
>
> http://code.google.com/p/ocropus/wiki/InstallTranscript

Since the files are now kept in a Mercurial repository instead of Subversion, the file generate_version_cc.sh needs to be changed to take its version from the Mercurial repository information:

export VERSION_STAMP="`hg identify -n` `date +%Y-%m-%d` `uname -o -i`"
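
As a rough sketch only (this is not the actual content of generate_version_cc.sh), the same stamp could be built with fallbacks for machines where hg is not installed, the source tree is not a Mercurial checkout, or uname lacks the GNU -o and -i options. The variable names revision, build_date, and platform and the "unknown" fallback value are assumptions made for illustration:

```shell
#!/bin/sh
# Sketch: build a version stamp from Mercurial, with fallbacks.
# The helper variable names and the "unknown" fallback are assumptions,
# not part of the original script.

# Local revision number of the working directory (a trailing "+" marks
# uncommitted changes). Empty if hg is missing or this is not a checkout.
revision=$(hg identify -n 2>/dev/null) || revision=""
[ -n "$revision" ] || revision="unknown"

# Build date in YYYY-MM-DD form.
build_date=$(date +%Y-%m-%d)

# -o and -i are GNU uname options; fall back to the portable -s and -m.
platform=$(uname -o -i 2>/dev/null) || platform=$(uname -s -m)

VERSION_STAMP="$revision $build_date $platform"
export VERSION_STAMP
echo "$VERSION_STAMP"
```

With this variant, a build from an unpacked tarball would still get a stamp, just with "unknown" in place of the revision number.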

Bob G
