OCRopus Release 0.1.0 ("alpha release")

Thomas Breuel

Oct 23, 2007, 11:25:12 AM
to ocr...@googlegroups.com
Well, we have the first numbered release of OCRopus up on ocropus.org, the promised Alpha release.  As scheduled, it includes a lot of new functionality:
  • text/image segmentation
  • MLP-based character recognition
  • OpenFST-based statistical language modeling
  • more detailed layout information in the hOCR output
  • better testing and evaluation tools
  • some image cleanup, deskewing
  • Lua-based configuration and scripting
  • fast binary morphology
  • better code organization through namespaces, include file simplifications
  • code for alignment and training data generation from transcribed ground truth
This branch will be maintained as the 0.1 branch, and main development is moving to 0.2, eventually resulting in the 0.5 release (beta release), planned for the end of Q1 2008.  New functionality will go mostly into 0.2, but we will be back-porting smaller, useful pieces of functionality to 0.1.

Note that while the MLP-based recognizer and the OpenFST language modeling work, they do not perform very well yet; we have just focussed on getting the functionality in there for now.

For the beta release, we will be focusing less on new functionality and more on getting higher quality output, better command line tools for training and testing, and bug fixing.

I'd like to thank everybody for their feedback, suggestions, and contributions, and in particular Daniel, Hagen, Faisal, Ilya, and Christian for the large amount of pre-release work.  We all hope that OCRopus will become increasingly useful over the next year.

Cheers,
Thomas

for the OCRopus developers

Thomas Breuel

Oct 23, 2007, 12:15:13 PM
to ocr...@googlegroups.com
Thanks. 

I was hoping to keep things low key until the beta release, since I think the beta release will be more widely useful and a lot easier to install.  And, perhaps most importantly, by the beta release, there will be a lot more documentation, making it easier for people to contribute.  So, Slashdot may not necessarily be the best place right now.

Cheers,
Thomas.

On 10/23/07, cma...@googlemail.com <cma...@googlemail.com> wrote:

Hi Thomas,
I've sent a mail to Heise and Golem. Do you (or someone else here)
have a Slashdot account?

Cheers,
Christian



Étienne Bersac

Oct 23, 2007, 12:18:33 PM
to ocr...@googlegroups.com
Hi,

I would like to write a news item for Linuxfr.org, but I don't know what MLP
means (I can't find it through Google either).

Could you please explain MLP?

Congrats on 0.1.0!

Cheers,
Étienne.
--
E Ultreïa !

Étienne Bersac

Oct 23, 2007, 12:25:11 PM
to ocr...@googlegroups.com
OK, I won't post a news item on Linuxfr. Just a post ;)

Regards,
Étienne.
--

E Ultreïa !

Thomas Breuel

Oct 23, 2007, 12:39:49 PM
to ocr...@googlegroups.com
"MLP" stands for "multi-layer perceptron", a type of neural network.  OCRopus 0.1.0 contains a simple implementation, which serves as the first non-Tesseract character recognizer.

Cheers,
Thomas.

cma...@googlemail.com

Oct 24, 2007, 5:55:56 AM
to ocropus
Hi,
Here is the news item from Golem (in German):
http://www.golem.de/0710/55596.html

Cheers,
Christian

cma...@googlemail.com

Oct 24, 2007, 10:56:08 AM
to ocropus