OCRopus 0.6: possibility for command-line use only & for operating with Tesseract

Martin Reynaert

Sep 19, 2012, 9:11:06 AM

to ocr...@googlegroups.com

Hi,

In the framework of a project proposal I need to find out quickly:

whether OCRopus v. 0.6 can in fact still operate exclusively in command-line
mode (as on a non-GUI research server running Debian).

whether the option of choosing/employing Tesseract as the OCR engine is still a
possibility.

Any reply will be greatly appreciated!

Thank you!

Martin Reynaert
Researcher
Tilburg University
The Netherlands

Tom

Sep 21, 2012, 12:42:58 PM

to ocr...@googlegroups.com, reyn...@tilburguniversity.edu

On Wednesday, September 19, 2012 6:35:04 AM UTC-7, Martin Reynaert wrote:

Hi,

In the framework of a project proposal I need to find out quickly:

whether OCRopus v. 0.6 can in fact still operate exclusively in command-line
mode (as on a non-GUI research server running Debian).

It can operate completely at the command line in principle. Some of the scripts currently attempt to connect to an X server, but that's easy to fix, or you can simply give them a dummy framebuffer (Xvfb :55 & export DISPLAY=:55). Note the "&": Xvfb stays in the foreground otherwise, and the export would never run.
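The dummy-framebuffer workaround can also be wrapped in a small helper. This is only a sketch under assumptions: it presumes Xvfb is installed, the names `headless_env` and `run_headless` are made up for illustration, and the one-second wait is a crude stand-in for properly checking that the server is up.

```python
import os
import shutil
import subprocess
import time


def headless_env(display=":55", base_env=None):
    """Return a copy of the environment with DISPLAY pointing at the dummy server."""
    env = dict(os.environ if base_env is None else base_env)
    env["DISPLAY"] = display
    return env


def run_headless(command, display=":55"):
    """Start Xvfb on `display`, run `command` against it, then stop Xvfb."""
    if shutil.which("Xvfb") is None:
        raise RuntimeError("Xvfb is not installed on this machine")
    xvfb = subprocess.Popen(["Xvfb", display])  # runs in the background
    try:
        time.sleep(1)  # assumption: one second is enough for Xvfb to start
        return subprocess.call(command, env=headless_env(display))
    finally:
        xvfb.terminate()
        xvfb.wait()
```

A call would look like `run_headless(["some-ocropus-script", "page.png"])`, where the script name is a placeholder for whichever OCRopus script tries to reach the X server.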

whether the option of choosing/employing Tesseract as the OCR engine is still a
possibility.

Yes, in the same way as always: it is used for line recognition. Right now, that's just done via a shell script (because of the Tesseract API changes), but eventually we'll integrate it via Python again.
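To give a rough idea of what calling Tesseract on a single text line from outside can look like, here is a sketch that shells out to the `tesseract` command-line tool. It is not the script that ships with OCRopus; the function names are hypothetical, and it assumes a Tesseract 3.x binary on the PATH, where `-psm 7` means "treat the image as a single text line" (later releases spell the flag `--psm`).

```python
import subprocess


def tesseract_line_command(image_path, output_base, lang="eng", psm_flag="-psm"):
    """Build the argument list for recognizing one text-line image."""
    # Page segmentation mode 7 = treat the image as a single text line.
    return ["tesseract", image_path, output_base, "-l", lang, psm_flag, "7"]


def recognize_line(image_path, output_base, lang="eng"):
    """Run tesseract on a line image and return the recognized text."""
    subprocess.check_call(tesseract_line_command(image_path, output_base, lang))
    # tesseract writes its plain-text result to <output_base>.txt
    with open(output_base + ".txt", encoding="utf-8") as handle:
        return handle.read().strip()
```

Keeping the command construction separate from the call makes it easy to swap the flag spelling for a different Tesseract version.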

Tom

Sriranga(78yrs)

Sep 20, 2012, 11:24:05 PM

to ocr...@googlegroups.com

Reynaert,

According to the project home page (http://code.google.com/p/ocropus/): "OCRopus 0.6 features much simpler installation, fewer dependencies, and improved character recognition rates. This is the first all-Python release."

Yes, at present it still operates exclusively from the command line.

Re: whether choosing/employing Tesseract as the OCR engine is still a possibility. Since OCRopus 0.6 is written entirely in Python, whereas Tesseract is not, I doubt that it is possible.
However, OCRopus 0.6 can be trained using box files prepared for Tesseract.

Only Prof. Tom (who is busy now) can clarify whether the Tesseract OCR engine can be used.

With regards,
-sriranga(79yrs)

zdenko podobny

Sep 25, 2012, 5:26:23 AM

to ocr...@googlegroups.com, reyn...@tilburguniversity.edu

On Fri, Sep 21, 2012 at 6:42 PM, Tom <tmb...@gmail.com> wrote:

whether the option of choosing/employing Tesseract as the OCR engine is still a
possibility.

Yes, in the same way as always: it is used for line recognition. Right now, that's just done via a shell script (because of the Tesseract API changes), but eventually we'll integrate it via Python again.

In tesseract-ocr 3.02 (hopefully to be released soon) there will be a C API, so it should be possible to use Tesseract via ctypes (an example is in svn [1]).

There is also a (maintained ;-) ) Python wrapper for Tesseract 3.0x [2].
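As a minimal illustration of the ctypes route (not the demo from svn), the sketch below loads the shared library and asks it for its version string through `TessVersion`, a function exposed by the C API. The helper name `tesseract_version` is made up here, and the code assumes the library can be located under the name "tesseract"; it returns None when the library is missing or too old to provide the C API.

```python
import ctypes
import ctypes.util


def tesseract_version():
    """Return the Tesseract version string via the C API, or None if unavailable."""
    path = ctypes.util.find_library("tesseract")
    if path is None:
        return None  # shared library not found on this system
    try:
        lib = ctypes.CDLL(path)
        # TessVersion() returns a const char*, so declare that for ctypes.
        lib.TessVersion.restype = ctypes.c_char_p
        return lib.TessVersion().decode("ascii")
    except (OSError, AttributeError):
        # Library failed to load, or it predates the C API.
        return None
```

Actual recognition would go through further C API calls (creating a handle, initializing a language, setting an image), which are left out of this sketch.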

[1] http://tesseract-ocr.googlecode.com/svn/trunk/contrib/tesseract-c_api-demo.py

[2] http://code.google.com/p/python-tesseract/

--
Zdenko