Packaging OCRopus


Étienne Bersac

Jun 29, 2007, 10:57:03 AM
to ocr...@googlegroups.com
Hi all,

As you may know, I'm developing gnome-scan. I want to provide an OCR sink
plugin based on OCRopus by the end of August. As usual, when I depend on
software that is still in development, I build a deb package.

I got OCRopus up and running. This required a trivial patch to tesseract
SVN (see
http://code.google.com/p/tesseract-ocr/issues/detail?id=36&can=2&q= ).
It would be nice to get it fixed soon. (Why not release it?)

Also, OCRopus uses Autotools + Jam. However, I don't see any way to get a
tarball. Of course, make distcheck is useless here, but Jam does not
provide an equivalent :x. Nor does the OCRopus project publish any tarball.

So, before packaging, I suggest distributing OCRopus itself. :) I'm
really looking forward to integrating it well into gnome-scan!

Regards,
Étienne.
--
Verso l'Alto !

Thomas Breuel

Jun 30, 2007, 1:13:02 PM
to ocr...@googlegroups.com
Hi,

Thanks for your feedback. Keep in mind that OCRopus is still "pre-alpha"
precisely so that we can get feedback on the build system and
architecture.

On Jun 29, 2007, at 7:57 AM, Étienne Bersac wrote:

> I got OCRopus up and running. This required a trivial patch to
> tesseract SVN (see
> http://code.google.com/p/tesseract-ocr/issues/detail?id=36&can=2&q= ).
> It would be nice to get it fixed soon. (Why not release it?)

Ray Smith is the primary contact for Tesseract check-ins; could you
ping him again, please?

> Also, OCRopus uses Autotools + Jam. However, I don't see any way to
> get a tarball. Of course, make distcheck is useless here, but Jam does
> not provide an equivalent :x. Nor does the OCRopus project publish any
> tarball.

Well, there are two choices.

First, we could add the necessary targets to the Jamfile. What
would be needed in the Jamfile for easy Debian packaging?

Second, while we don't like using make for development work, creating
a separate automake-based build for the packaging should be pretty easy.

Which one would be better for you? Which one could you help with?

Another area we haven't decided on yet is how to turn OCRopus into a
shared library. There's the obvious, simple way of doing it on
Linux, but providing a separate plain-C interface and exposing that
as the shared library interface might be better (since it permits
direct calls from FFIs and avoids Windows DLL issues related to C++).
Any suggestions/input?
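
To make the plain-C interface option concrete, here is a minimal sketch of the usual pattern: an opaque handle plus extern "C" functions that hide the C++ class and turn exceptions into error codes. Every name in it (Recognizer, ocr_handle, ocr_create, ocr_recognize, ocr_destroy) is hypothetical and chosen only for illustration; none of them comes from the actual OCRopus sources, and the recognize() body is a placeholder.

```cpp
// Sketch only: Recognizer and all ocr_* names are hypothetical and are
// not part of the actual OCRopus API.
#include <cstddef>
#include <cstring>
#include <new>
#include <string>

// Stand-in for an internal C++ class that would do the real work.
class Recognizer {
public:
    std::string recognize(const unsigned char *pixels, int width, int height) {
        // Placeholder: a real implementation would run OCR on the image.
        (void)pixels;
        return "image " + std::to_string(width) + "x" + std::to_string(height);
    }
};

extern "C" {

// Opaque handle: C callers and FFIs never see the C++ type.
typedef struct ocr_handle ocr_handle;

ocr_handle *ocr_create(void) {
    // nothrow new returns NULL on allocation failure instead of throwing.
    return reinterpret_cast<ocr_handle *>(new (std::nothrow) Recognizer());
}

// Copies the NUL-terminated result into buf, truncating if necessary.
// Returns 0 on success and -1 on error.
int ocr_recognize(ocr_handle *h, const unsigned char *pixels,
                  int width, int height, char *buf, size_t buflen) {
    if (!h || !buf || buflen == 0) return -1;
    try {
        std::string text =
            reinterpret_cast<Recognizer *>(h)->recognize(pixels, width, height);
        std::strncpy(buf, text.c_str(), buflen - 1);
        buf[buflen - 1] = '\0';
        return 0;
    } catch (...) {
        return -1;  // no C++ exception may cross the C boundary
    }
}

void ocr_destroy(ocr_handle *h) {
    delete reinterpret_cast<Recognizer *>(h);
}

}  // extern "C"
```

A caller would pair ocr_create() with ocr_destroy() and pass its own buffer to ocr_recognize(). Because only C types and unmangled symbol names cross the library boundary, such an interface can be loaded from most FFIs and does not depend on the C++ ABI of the compiler that built the library.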

Cheers,
Thomas.


Étienne Bersac

Jun 30, 2007, 1:18:36 PM
to ocr...@googlegroups.com
Hi,

I discussed this with my mentor. Two solutions came to mind: either fix
the Jam build so that it can generate a tarball, or migrate to automake.

I searched for Jam documentation, and the official web site was very hard
to find, with no gain compared to the well-documented automake. I agree
that autoconf leads to a messy configure.ac; automake, however, is quite
good and complete (dist, distcheck, and friends).

I started writing a full autotools build system for OCRopus on top of SVN.
I will send the patch ASAP. Don't take it as an offense, but I find that
"make replacements" often overlook what automake provides, which leads to
this kind of situation where things have to be coded by hand. I don't mean
Jam is the wrong solution at all; it's just not yet suitable as an
autotools replacement.

Expect some patches in the near future. :)

Bill Janssen

Jun 30, 2007, 11:55:40 PM
to ocr...@googlegroups.com
Here's another vote for automake instead of Jam.

Bill

Thomas Breuel

Jul 1, 2007, 7:52:47 PM
to ocr...@googlegroups.com
I wrote the current automake configuration for Tesseract, so I'm familiar with automake. Believe me, I don't like using unusual tools for building, but automake just has too big a risk of producing incorrect output during day-to-day development for a project like OCRopus.

Jam can be a pita and it's not very well documented.  OTOH, it's simple, fast, mature, pretty widely used, and it usually does the right thing.  For example, you can change into a subdirectory, type "jam", and it will update the targets in that directory and anything they depend on, and it will do so quickly and correctly.

Right now, we aren't planning on "migrating" to automake; what we can do is try to keep an automake configuration in OCRopus for the benefit of packagers, in addition to the regular Jam-based builds, and see how that works.  This is, incidentally, also how automake is used in Tesseract: its primary developers use different build systems during development, and automake is just used by packagers.

So, any automake configuration for OCRopus should be kept extremely simple.  It should only produce the top-level targets that are of relevance to packagers and nothing else.  Please keep that in mind when trying to create an automake configuration--don't try to reproduce all the Jamfile functionality.

Cheers,
Thomas.


Étienne Bersac

Jul 2, 2007, 3:33:35 AM
to ocr...@googlegroups.com
Hi Thomas,

I started the patch that adds building of the libraries and ocropus. I have
two issues:

First, ocropus uses e.g. #include "imgio.h" instead of #include
"../imgio/imgio.h". I don't understand why this works, or when it does not.

Second, I have a problem linking ocropus with tesseract. I found some
odd "PartialLinking" in the Jamfile that I don't understand.

Also, did you notice the bug report I filed about the tesseract +
autoheader bug?
http://code.google.com/p/tesseract-ocr/issues/detail?id=39&can=2&q=

Please help.

[Attachment: automake.diff]

Ilya Mezhirov

Jul 3, 2007, 1:41:07 PM
to ocropus
Hi Étienne,

Thank you for the work!

> First, ocropus uses e.g. #include "imgio.h" instead of #include
> "../imgio/imgio.h". I don't understand why this works, or when it does not.

Yes. This works because the ImportDir directives in the Jamfiles provide
header paths to both Jam and gcc. ImportDirs have to be there anyway
to express dependencies between directories, so they're also used for
the headers. There's a plan to use ImportDirs for the libraries, too.

> Second, I have a problem linking ocropus with tesseract. I found some
> odd "PartialLinking" in the Jamfile that I don't understand.

It's an old hack made to cope with the abundance of Tesseract libraries.
It can be removed: just move all the -ltesseract_stuff into the top-level
Jamrules and delete everything about tesseract_all.o, replacing
    LibraryFromObjects libtesseract.a : tesseract_all.o ;
with
    Library libtesseract : tesseract.cc ;
I'd do that, but maybe it's better to simply merge the 11 Tesseract
libraries into one.

I'll have a closer look at your patch and bug report tomorrow.

Again, thank you, and good luck with your project.

Best wishes,
Ilya

Thomas Breuel

Jul 6, 2007, 1:41:55 AM
to ocr...@googlegroups.com

> I'd do that but maybe it's better to simply merge 11 Tesseract
> libraries into one.

Well, having OCRopus build systems contain 11 Tesseract libraries just doesn't make much sense.  Keep in mind that those 11 libraries not only need to be listed by every software package using Tesseract, they also need to be installed in /usr/lib.

Tesseract should be a single library, and the best thing to do is to change the Tesseract build system to create a single library.

Until then, let's not change the OCRopus Jamfiles.  The automake stuff can list the Tesseract libraries individually if it likes.


Cheers,
Tom
