Tesseract RPMs

admin

Jun 14, 2008, 8:16:35 PM
to tesser...@googlegroups.com
I've just built RPM packages of Tesseract 2.03 and Tessdata 2.00
(English only at the moment) for CentOS 5.1.
Both have installed OK and work, but the process hasn't been 100%
squeaky clean. So a few questions which someone may be able to answer:

- Is it absolutely necessary to install the dummy tessdata language
files in order for tesseract to run? Can the tessdata directory be left
empty in an initial install? Attempting to install real tessdata files
via an RPM package produces an error along the lines of "files already
installed and I'm not overwriting them". Using the --force option works
but isn't optimal.

- Are all the various *.a, *.h and *.cpp files installed necessary for
normal operation of tesseract, or could some be considered 'development'
files that could be separated into a tesseract-devel RPM? You can see
I'm no C programmer.

- The tesseract-2.00-eng.tar.gz packages (and presumably the other
language data packs) untar to a "tessdata" directory, which breaks the
rpmbuild process because of the change in directory name (tesseract ->
tessdata). There may be a way of fiddling this in the spec file, but do
the developers have any issues with me renaming tessdata packages along
these lines:

tesseract-2.00-eng(.tar.gz) -> tessdata-eng-2.00(.rpm)

- I'd like to include a useful man page on usage - any tips on where I
could find some up to date info to flesh this out?

- Finally, any other tips/comments/suggestions regarding tesseract and
RPM packages?

When I have quality packages available I am of course happy to share!

Thanks
Mick

admin

Jun 14, 2008, 11:47:14 PM
to tesser...@googlegroups.com
admin wrote:

With a bit more research I have answered some of these myself:


> - Is it absolutely necessary to install the dummy tessdata language
> files in order for tesseract to run? Can the tessdata directory be left
> empty in an initial install? Attempting to install real tessdata files
> via an RPM package produces an error along the lines of "files already
> installed and I'm not overwriting them". Using the --force option works
> but isn't optimal.
>

I'm now using %pre and %postun (pre install and post uninstall scripts)
in the RPM spec file to move the original files out of the way on
installation of real tessdata RPM, and back again if it is uninstalled.
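
A minimal sketch of what such scriptlets might look like. The tessdata
path (%{_datadir}/tessdata), the eng.* glob and the dummy-backup
directory name are assumptions for illustration, not taken from the
actual spec file. It also assumes the dummy files are not owned by
another installed RPM, since RPM detects file conflicts from its package
database rather than from what is on disk.

```spec
%pre
# Sketch only: paths and the backup directory name are assumptions.
# $1 is 1 on a first install and 2 on an upgrade.
if [ "$1" -eq 1 ]; then
    mkdir -p %{_datadir}/tessdata/dummy-backup
    for f in %{_datadir}/tessdata/eng.*; do
        if [ -f "$f" ]; then
            mv "$f" %{_datadir}/tessdata/dummy-backup/
        fi
    done
fi

%postun
# $1 is 0 when the package is removed completely and 1 during an upgrade.
if [ "$1" -eq 0 ] && [ -d %{_datadir}/tessdata/dummy-backup ]; then
    mv %{_datadir}/tessdata/dummy-backup/* %{_datadir}/tessdata/ 2>/dev/null || :
    rmdir %{_datadir}/tessdata/dummy-backup 2>/dev/null || :
fi
```

The checks on $1 keep an upgrade of the tessdata package from shuffling
the files a second time.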


> - The tesseract-2.00-eng.tar.gz packages (and presumably the other
> language data packs) untar to a "tessdata" directory, which breaks the
> rpmbuild process because of the change in directory name (tesseract ->
> tessdata). There may be a way of fiddling this in the spec file, but do
> the developers have any issues with me renaming tessdata packages along
> these lines:
>
> tesseract-2.00-eng(.tar.gz) -> tessdata-eng-2.00(.rpm)
>

The broken rpmbuild process can be fixed by passing the "-n" option to
the %setup macro in the %prep section of the spec file.
I'm still going with the modified package name.
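
For reference, a sketch of the relevant spec lines with the renamed
package. The Source0 file name follows the tarball name quoted above;
the rest of the layout is an assumption. By default %setup expects the
tarball to unpack into a directory named %{name}-%{version}, and -n
overrides that name.

```spec
# Sketch only: the layout is an assumption, not the actual spec file.
Name:     tessdata-eng
Version:  2.00
Source0:  tesseract-%{version}-eng.tar.gz

%prep
# The tarball unpacks into "tessdata", so name that directory explicitly.
%setup -q -n tessdata
```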

The other questions are still standing at this stage! :-)

Mick

admin

Jun 14, 2008, 11:59:53 PM
to tesser...@googlegroups.com
OK! I just discovered Andrew Ziem's tesseract.spec file in the tarball.
That answers all questions apart from the source for man page info.
I'll hack Andrew's spec file into shape for CentOS 5.1; it shouldn't
need much work at all.
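
On the earlier question about a tesseract-devel RPM: header files (*.h)
and static libraries (*.a) are only needed to build other programs
against tesseract, so they are the usual candidates for a -devel
subpackage. A hedged sketch of how that split might look follows. It is
not copied from Andrew Ziem's spec file, and the install paths are
assumptions.

```spec
# Sketch only: subpackage layout and paths are assumptions.
%package devel
Summary:  Development files for tesseract
Group:    Development/Libraries
Requires: %{name} = %{version}-%{release}

%description devel
Header files and static libraries for building programs against tesseract.

%files devel
%defattr(-,root,root,-)
%{_includedir}/tesseract
%{_libdir}/*.a
```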

Mick

Alberto Lusiani

Jun 25, 2008, 5:22:33 PM
to tesseract-ocr
I did not notice the .spec file in the sources, but I found Fedora
.src.rpm files for tesseract on the web, which I easily adapted for
CentOS 5. If it can be of any help for your work, please have a look at
http://sites.google.com/site/alusiani/software/tesseract

I edited the Fedora .src.rpm to remove the eng language from the
tesseract rpm, and I moved the eng language data to the common langpack
.src.rpm file, which produces a .rpm file for each language.
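
One way a single source package can produce one binary RPM per language
is with a subpackage per language, roughly as sketched here. The package
name tesseract-langpack, the Group value and the tessdata path are
hypothetical and may not match the Fedora or adapted spec files.

```spec
# Sketch only: assumes the main package is declared as
# "Name: tesseract-langpack", so this subpackage is built as
# tesseract-langpack-eng.
%package eng
Summary:  English language data for tesseract
Group:    Applications/File
Requires: tesseract

%description eng
English language data for the tesseract OCR engine.

%files eng
%defattr(-,root,root,-)
%{_datadir}/tessdata/eng.*
```

Each further language would repeat the three sections with its own
language code.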

On Jun 15, 5:59 am, admin <m...@mjhall.org> wrote:
> OK! I just discovered Andrew Ziem's tesseract.spec file in the tarball.
[...]

Greetings!
--
Alberto

admin

Jun 26, 2008, 9:33:43 AM
to tesser...@googlegroups.com
In the tesseract-2.03.tar.gz package I found a spec file named
"tesseract.spec". This worked for me with a little fiddling.

So I'm good now, but I'll try your packages when I next get a chance!

Thanks
