OCRopus Release 0.1.0 ("alpha release")

Thomas Breuel

Oct 23, 2007, 3:25:12 PM

to ocr...@googlegroups.com

Well, we have the first numbered release of OCRopus up on ocropus.org, the promised Alpha release. As scheduled, it includes a lot of new functionality:

text/image segmentation
MLP-based character recognition
OpenFST-based statistical language modeling
more detailed layout information in the hOCR output
better testing and evaluation tools
some image cleanup, deskewing
Lua-based configuration and scripting
fast binary morphology
better code organization through namespaces, include file simplifications
code for alignment and training data generation from transcribed ground truth

This branch will be maintained as the 0.1 branch, and main development is moving to 0.2, eventually resulting in the 0.5 release (the beta release), planned for the end of Q1 2008. Most new functionality will go only into 0.2, but we will back-port smaller, useful pieces of functionality to 0.1.

Note that while the MLP-based recognizer and the OpenFST language modeling work, they do not perform very well yet; we have just focussed on getting the functionality in there for now.

For the beta release, we will be focusing less on new functionality and more on getting higher quality output, better command line tools for training and testing, and bug fixing.

I'd like to thank everybody for their feedback, suggestions, and contributions, and in particular Daniel, Hagen, Faisal, Ilya, and Christian for the large amount of pre-release work. We all hope that OCRopus will become increasingly useful over the next year.

Cheers,
Thomas

for the OCRopus developers

Thomas Breuel

Oct 23, 2007, 4:15:13 PM

to ocr...@googlegroups.com

Thanks.

I was hoping to keep things low key until the beta release, since I think the beta release will be more widely useful and a lot easier to install. And, perhaps most importantly, by the beta release, there will be a lot more documentation, making it easier for people to contribute. So, Slashdot may not necessarily be the best place right now.

Cheers,
Thomas.

On 10/23/07, cma...@googlemail.com <cma...@googlemail.com> wrote:

Hi Thomas,
I've sent a mail to Heise and Golem. Do you (or someone else here)
have a Slashdot account?

Cheers,
Christian

Étienne Bersac

Oct 23, 2007, 4:18:33 PM

to ocr...@googlegroups.com

Hi,

I would like to write a news item for Linuxfr.org, but I don't know what MLP
means (I can't find it through Google either).

Could you please explain MLP to me?

Congrats on 0.1.0!

Cheers,
Étienne.
--
E Ultreïa !

Étienne Bersac

Oct 23, 2007, 4:25:11 PM

to ocr...@googlegroups.com

OK, I won't post a news item on Linuxfr, just a post ;)

Regards,
Étienne.
--

E Ultreïa !

Thomas Breuel

Oct 23, 2007, 4:39:49 PM

to ocr...@googlegroups.com

"MLP" stands for "multi-layer perceptron", a type of neural network. OCRopus 0.1.0 contains a simple implementation as the first non-Tesseract character recognizer.

Cheers,
Thomas.
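
For readers new to the term, here is a minimal sketch of what a multi-layer perceptron computes when it scores a character image. It is not the OCRopus implementation: the layer sizes, the 4x4 glyph, the three classes, and all function names are hypothetical, and the weights are random rather than trained, so the scores carry no meaning beyond showing the data flow.

```python
import math
import random

def sigmoid(x):
    """Squash a real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """One fully connected layer: a weighted sum per unit, then a sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def mlp_forward(pixels, hidden_w, hidden_b, out_w, out_b):
    """Forward pass: input pixels -> hidden layer -> one score per class."""
    hidden = layer(pixels, hidden_w, hidden_b)
    return layer(hidden, out_w, out_b)

def random_matrix(rows, cols, rng):
    """Untrained weights, used only to make the sketch runnable."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
n_in, n_hidden, n_classes = 16, 8, 3  # hypothetical sizes: 4x4 glyph, 3 classes

hidden_w = random_matrix(n_hidden, n_in, rng)
hidden_b = [0.0] * n_hidden
out_w = random_matrix(n_classes, n_hidden, rng)
out_b = [0.0] * n_classes

# A crude 4x4 binary glyph, flattened row by row.
glyph = [0, 1, 1, 0,
         1, 0, 0, 1,
         1, 1, 1, 1,
         1, 0, 0, 1]

scores = mlp_forward(glyph, hidden_w, hidden_b, out_w, out_b)
best = max(range(n_classes), key=lambda i: scores[i])
print("class scores:", scores)
print("highest-scoring class index:", best)
```

In a real recognizer the weights would be learned from labeled character images, and the highest-scoring class would be reported as the recognized character.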

cma...@googlemail.com

Oct 24, 2007, 9:55:56 AM

to ocropus

Hi,
here is the news item from Golem (in German):
http://www.golem.de/0710/55596.html

Cheers,
Christian

cma...@googlemail.com

Oct 24, 2007, 2:56:08 PM

to ocropus

Hi,
here is another news post, from Ars Technica:
http://arstechnica.com/news.ars/post/20071024-hands-on-with-googles-ocropus-open-source-scanning-software.html

Cheers,
Christian
