Has the OCRopus alpha (or a later version) been released?


74yrsold

Oct 9, 2007, 3:18:52 AM
to ocropus
Has OCRopus been released, and if so, where? Is the source code
available to build on MS Windows (VC++ 6), as with Tesseract OCR?
Is there any provision for training it on one of the Indian (Indic)
languages?

Christian Kofler

Oct 9, 2007, 9:43:07 AM
to ocropus
Hi,

We are aiming for the alpha release on the 19th of October.
We do not plan to provide a Visual Studio project file as Tesseract
does.
A clean Windows port is planned after the alpha.

Cheers,

Christian

Craig

Oct 9, 2007, 7:10:52 PM
to ocropus
Hi,

I managed to get an earlier checkout of OCRopus to build using Visual
Studio 2005. However, I did not do it in a nice way; it was more of a
hack just to see how OCRopus went.

The biggest problem was getting Aspell to work on Windows. I
eventually found a project on SourceForge that had done most of the
work on Aspell: http://sourceforge.net/projects/descdatadiary
It did not build on the latest Visual Studio, but with a few changes
(modifying some STL headers...) I got it to work.

With the OCRopus code, the problems I encountered were:

1. The multiple-directory, multiple-library structure is really hard
   work to port. It is much easier to just copy all the source files
   into one dir and forget about trying to build multiple libs. There
   are not that many files, so it really is not a problem to have them
   all in one dir, and it is so much easier to set up and maintain.

2. In a dozen or so places OCRopus uses code like:
       int size = somecalc();
       char buf[size];

   which will not compile on Visual Studio. I had to change this to
   use 'new' and 'delete'.

3. Problems with min(), max(), overloads.

4. Problems with isalpha().

5. Includes of GNU-specific headers (e.g. unistd.h).


And a few others!!


Regards,
Craig Broadbear


Thomas Breuel

Oct 10, 2007, 3:21:23 AM
to ocr...@googlegroups.com
Thanks for reporting your experience porting OCRopus to Windows. In general, if you give us specific files and line numbers and/or patches, it will be much easier for us to accommodate Windows and incorporate the changes into the source tree.

> The biggest problem was getting Aspell to work on Windows.

Aspell will be removed from OCRopus in the near future since it's not needed anymore.

> With the OCRopus code, the problems I encountered:
>
> 1. The multiple-directory, multiple-library structure is really hard
>    work to port.

Well, this is the usual way of organizing big projects, and people rely on the directory structure in many ways during development.

Visual Studio has no problems dealing with multi-directory projects; someone just needs to create a build file. We'll be happy to include that in the distribution if you submit something.

> 2. In a dozen or so places OCRopus uses code like:
>        int size = somecalc();
>        char buf[size];
>
>    which will not compile on Visual Studio. I had to change this to
>    use 'new' and 'delete'.

Can you give us line numbers? There is no other easy way of finding those, since GNU C++ stopped warning about VLAs in C++ code some versions ago.
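For readers hitting the same compile error, here is a minimal sketch of one portable replacement for a runtime-sized stack array. It swaps the array for a std::vector rather than the raw 'new'/'delete' mentioned above, so the buffer is released automatically. The function copy_via_buffer is hypothetical and only stands in for code of the `char buf[size];` shape; it is not taken from the OCRopus sources.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper (not from OCRopus): copies a string through a
// temporary buffer whose size is only known at run time.
std::string copy_via_buffer(const std::string& text) {
    // Runtime-computed size, playing the role of somecalc().
    const std::size_t size = text.size() + 1;

    // Instead of `char buf[size];` (a GNU extension in C++), use a
    // heap-backed vector; it is freed when it goes out of scope.
    std::vector<char> buf(size);
    assert(buf.size() == size);

    // Copy the characters plus the terminating '\0'.
    std::memcpy(buf.data(), text.c_str(), size);
    return std::string(buf.data());
}
```

Compared with 'new' and 'delete', this form cannot leak if a function returns early or throws.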

> 3. Problems with min(), max(), overloads.
> 4. Problems with isalpha().

Again, for that, we need to know specifics: line numbers etc. It would also be useful if you could figure out whether the problem is a bug in Visual Studio's or GNU C++'s overload resolution, and what exactly the problem is, so that we can find a reasonable workaround that works everywhere (and submit the appropriate bug reports).
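The thread does not say what the min()/max() and isalpha() failures actually were, so the following is only a sketch of two commonly seen causes, not a diagnosis. It assumes (a) that min and max may be defined as macros by <windows.h>, which NOMINMAX and parenthesized calls both guard against, and (b) that isalpha() may be receiving a plain char with a negative value, which is undefined behaviour in standard C. The helper names clamp_to_range and is_alpha_char are hypothetical.

```cpp
// Only has an effect if <windows.h> is included later in this file.
#ifndef NOMINMAX
#define NOMINMAX
#endif

#include <algorithm>
#include <cassert>
#include <cctype>

// Hypothetical helper: the extra parentheses around std::min and
// std::max stop a function-like macro named min/max from expanding.
inline int clamp_to_range(int value, int lo, int hi) {
    assert(lo <= hi);
    return (std::max)(lo, (std::min)(value, hi));
}

// Hypothetical helper: std::isalpha requires a value representable as
// unsigned char (or EOF), so cast before calling it.
inline bool is_alpha_char(char c) {
    return std::isalpha(static_cast<unsigned char>(c)) != 0;
}
```

Both forms compile unchanged with GNU C++, so a fix of this shape would not need platform-specific branches.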

> 5. Includes of GNU-specific headers (e.g. unistd.h).

unistd.h is not a GNU-specific header; it's a POSIX/X/Open header and is supported on most operating systems. On Windows, it should be supported through its POSIX APIs or Cygwin. In most places, it doesn't look like unistd.h is needed; it would help if you could find out where it is included unnecessarily and submit those file names.

In some places, unistd.h is needed; in those cases, the solution will likely be to move all the POSIX-dependent code into a single source file and have two versions, a Windows version and a POSIX version. Again, if you can submit a patch that does that, it would speed things up.
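As a rough illustration of that idea, the sketch below hides one platform-dependent call behind a small interface. For brevity it uses a single file with a preprocessor branch rather than two separate source files, and the namespace platform and function process_id are hypothetical names, not part of OCRopus. It assumes the Microsoft C runtime's _getpid() from <process.h> on Windows and POSIX getpid() elsewhere.

```cpp
#include <cassert>

#ifdef _WIN32
#include <process.h>   // _getpid (Microsoft C runtime)
#else
#include <unistd.h>    // getpid (POSIX)
#endif

namespace platform {

// Hypothetical wrapper: callers include only this interface and never
// see unistd.h or process.h directly.
inline long process_id() {
#ifdef _WIN32
    const long pid = static_cast<long>(_getpid());
#else
    const long pid = static_cast<long>(getpid());
#endif
    assert(pid > 0);
    return pid;
}

}  // namespace platform
```

With this layout the rest of the code base calls platform::process_id() and stays free of platform headers; splitting the two branches into separate source files, as suggested above, would keep the same interface.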

Cheers,
Thomas.

lakshmesha

Oct 18, 2007, 7:34:44 PM
to ocropus
Hi Craig,
I am working on porting OCRopus to Windows using Microsoft Visual
Studio 2005.
Currently I am having problems porting ocr-utils to Windows.
I am not sure how to port the process-related code (e.g. pid_t,
signal, wait, fork) to Windows.
I am also finding it difficult to port the OCRopus dependencies
(ocropus-external).

You mentioned that you were able to compile it on Windows.
Can you help me with these two problems?

Regards,
Lakshmesha

Ilya Mezhirov

Oct 19, 2007, 4:53:05 AM
to ocropus
Hi Lakshmesha,

The wait/fork stuff is actually not used; langmod-ispell can be
removed.
The signal stuff that catches crashes of Tesseract can also be removed
in the Windows port.

Ilya

Thomas Breuel

Oct 19, 2007, 10:16:41 AM
to ocr...@googlegroups.com
Hi Ilya,

could you please remove that stuff or put it in as an issue?

Thanks,
Thomas.