> I am forwarding this communication to developer forums.
> Contribution from somebody with experience in this area is to be
> warmly welcomed.
I'm not an expert in this area but I understand what it entails and the
benefits it brings. If there's one project that needs it badly, it's
this one so kudos for bringing the issue to the table. :)
One guy who is an expert is Diego "Flameeyes" Pettenò. He's extremely
busy with other projects, though, so I'd prefer that you didn't bother
him, but his blog is an invaluable resource on many topics, including
this one. Just search for "visibility".
Regards,
James
Just to get an idea of the size of the visibility issue, I tried the
suggestion from the GCC Visibility article [1], and got:
$ nm -C -D libtesseract.so | wc -l
5846
$ nm -C -D liblept.so | wc -l
2218
where the libtesseract was built a week or so ago. Do you really
need/want 5800+ dynamic symbols exported? For liblept, that number is
a direct reflection of what's listed in leptprotos.h, so presumably
all those symbols really are relevant.
I've started reading "How to Write Shared Libraries" [2]. It's pretty
slow going, and since Ulrich apparently isn't a native English
speaker, some sentences need to be read a few times to understand what
he means.
Other useful references are:
+ The GCC documentation on the -fvisibility option is at "3.18 Options for
Code Generation Conventions" [3]
"Set the default ELF image symbol visibility to the specified
option—all symbols will be marked with this unless overridden within
the code. Using this feature can very substantially improve linking
and load times of shared object libraries, produce more optimized
code, provide near-perfect API export and prevent symbol clashes. It
is strongly recommended that you use this in any shared objects you
distribute."
+ The GCC documentation on the "visibility" attribute is at "6.30
Declaring Attributes of Functions" [4].
-- Tom
[1] http://gcc.gnu.org/wiki/Visibility
[2] http://people.redhat.com/drepper/dsohowto.pdf
[3] http://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html
[4] http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html
By contrast, for Windows:
>dumpbin /exports libtesseract302.dll
Microsoft (R) COFF/PE Dumper Version 9.00.30729.01
Copyright (C) Microsoft Corporation. All rights reserved.
Dump of file libtesseract302.dll
File Type: DLL
Section contains the following exports for libtesseract302.dll
00000000 characteristics
4F45AA1B time date stamp Wed Feb 22 18:53:15 2012
0.00 version
1 ordinal base
128 number of functions
128 number of names
It's not surprising that only tesseract.exe can successfully link with
libtesseract302.dll, and that all the training apps can currently only
be linked with the static libraries.
-- Tom
Yes. That's why the changes needed are minor (except for the fact that
someone has to figure out which classes to mark). Remember that making
classes explicitly visible will of course change the list of "public"
header files that need to be copied to the "public" include directory
--- for those people (normally Windows developers) who don't have the
tesseract source or don't want to bother adding all the tesseract
sub-dirs to their list of include dirs.
The first example in [1] seems dangerous to me because it is marking
up declarations even for static libraries (which I recall reading
somewhere is a bad idea, probably the informative blog [2] James Le
Cuirot mentioned).
The second example in [1] seems clumsy to me since you have to #define
both FOX_DLL and FOX_DLL_EXPORTS. I like the current technique that
both tesseract and leptonica follow which is to say x_EXPORTS,
x_IMPORTS or neither to indicate static library creation/use.
So the following lines in api/baseapi.h:
#ifdef TESSDLL_EXPORTS
#define TESSDLL_API __declspec(dllexport)
#elif defined(TESSDLL_IMPORTS)
#define TESSDLL_API __declspec(dllimport)
#else
#define TESSDLL_API
#endif
need to be changed to something like (I'll test this tomorrow to be sure):
#if defined(_WIN32) || defined(__CYGWIN__)
#if defined(TESS_EXPORTS)
#define TESS_API __declspec(dllexport)
#elif defined(TESS_IMPORTS)
#define TESS_API __declspec(dllimport)
#else
#define TESS_API
#endif
#define TESS_LOCAL
#else
#if __GNUC__ >= 4
#if defined(TESS_EXPORTS) || defined(TESS_IMPORTS)
#define TESS_API __attribute__ ((visibility ("default")))
#define TESS_LOCAL __attribute__ ((visibility ("hidden")))
#else
#define TESS_API
#define TESS_LOCAL
#endif
#else
#define TESS_API
#define TESS_LOCAL
#endif
#endif
(Where I got rid of the DLL part of the macros because unix doesn't
use the term)
(I'm also not sure how this affects MinGW builds.)
We could commit the baseapi.h changes (including those below)
immediately to the repository because it doesn't really change
anything until you do the next steps.
In make files you either have to define TESS_EXPORTS when building a
DLL or Shared library, TESS_IMPORTS when linking with a DLL or shared
library, or neither when building or linking with a static library.
Additionally, on unix you then have to start using -fvisibility=hidden
and -fvisibility-inlines-hidden on shared library builds (still not
sure what effect, if any, those flags have on static libraries). Once
you do this then *only* objects marked with TESS_API will be visible
in shared libraries (on Windows this already happens automatically
which is what is causing all the portability problems).
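To make the three cases concrete, here is a minimal sketch (not
tesseract code; build_mode() is a made-up helper) of how a translation
unit could report which case it was compiled under, and refuse the
meaningless combination of both macros:

```cpp
// Sketch only: build_mode() is a hypothetical helper, not part of tesseract.
// TESS_EXPORTS / TESS_IMPORTS are the macros proposed in this thread.
#include <string>

#if defined(TESS_EXPORTS) && defined(TESS_IMPORTS)
#error "Define at most one of TESS_EXPORTS and TESS_IMPORTS"
#endif

// Returns a description of the case selected by the -D flags.
inline std::string build_mode() {
#if defined(TESS_EXPORTS)
  return "building shared library";
#elif defined(TESS_IMPORTS)
  return "linking against shared library";
#else
  return "static library";
#endif
}
```

Compiled with no -D flag this reports the static case; with
-DTESS_EXPORTS or -DTESS_IMPORTS it reports the corresponding shared
library case.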
Now, for *declarations* only in the header files (you don't have to
change definitions at all), you need to add TESS_API to everything you
want to make visible in shared libraries/DLLs.
So for example, in api/baseapi.h you just have to change:
class TESSDLL_API TessBaseAPI {
to
class TESS_API TessBaseAPI {
to make the entire TessBaseAPI Class visible in shared libraries. Use
the TESS_LOCAL macro with TessBaseAPI members you then *don't* want
exported.
and ccutil/strngs.h needs to go from:
class CCUTIL_API STRING
to
class TESS_API STRING
If you decide, for example, you want to make the entire PageIterator
Class visible, change ccmain/pageiterator.h from:
class PageIterator {
to:
class TESS_API PageIterator {
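As a rough, self-contained illustration of marking a class visible
while hiding one member (DEMO_API, DEMO_LOCAL and Recognizer are
made-up names standing in for TESS_API, TESS_LOCAL and a real class,
and only the GCC branch of the macros is reproduced):

```cpp
#include <string>

// Simplified stand-ins for the proposed macros (GCC-style branch only).
#if defined(__GNUC__) && __GNUC__ >= 4 && !defined(_WIN32)
#define DEMO_API __attribute__((visibility("default")))
#define DEMO_LOCAL __attribute__((visibility("hidden")))
#else
#define DEMO_API
#define DEMO_LOCAL
#endif

// Hypothetical class, not part of tesseract.
class DEMO_API Recognizer {
 public:
  // Visible: part of the exported interface.
  std::string Recognize(const std::string& text);

 private:
  // Hidden: usable inside the library, not meant to be exported.
  DEMO_LOCAL std::string Normalize(const std::string& text);
};

// Defined out of line so that client code never needs the hidden symbol.
std::string Recognizer::Recognize(const std::string& text) {
  return Normalize(text);
}

// Strips trailing spaces.
std::string Recognizer::Normalize(const std::string& text) {
  std::string::size_type end = text.find_last_not_of(' ');
  return end == std::string::npos ? std::string() : text.substr(0, end + 1);
}
```

If this were built into a shared library with -fvisibility=hidden, I
would expect nm -C -D to list Recognizer::Recognize but not
Recognizer::Normalize. Note that the public method is deliberately not
defined inline in the header: an inline body that calls a hidden member
would leave clients referencing a symbol they cannot link against.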
-- Tom
[1] http://gcc.gnu.org/wiki/Visibility
[2] http://blog.flameeyes.eu/tag/visibility and http://blog.flameeyes.eu/tag/elf
Actually api/baseapi.h is probably not the place to put these changes.
Maybe move it to api/apitypes.h instead? And then you also have to
include apitypes.h in any header that needs to use TESS_API (which is
why using baseapi wasn't a good idea).
I should reiterate that I'm no "expert" in any of this. Whoever does
this had better understand all the implications themselves. In
particular, I have no idea what impact the "Problems with C++
exceptions (please read!)" section of [1] will have.
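For illustration only, and as far as that section can be summarized
here: an exception type that is thrown in one shared object and caught
in another needs its type information to be visible, so exception
classes are usually given default visibility explicitly. A minimal
sketch with made-up names (DEMO_API, OcrError, FailIfEmpty are
assumptions, not tesseract code):

```cpp
#include <stdexcept>
#include <string>

#if defined(__GNUC__) && __GNUC__ >= 4 && !defined(_WIN32)
#define DEMO_API __attribute__((visibility("default")))
#else
#define DEMO_API
#endif

// Hypothetical exception type, marked visible so its typeinfo is exported
// even when the library is built with -fvisibility=hidden.
class DEMO_API OcrError : public std::runtime_error {
 public:
  explicit OcrError(const std::string& what) : std::runtime_error(what) {}
};

// Stand-in for a library function that throws.
DEMO_API void FailIfEmpty(const std::string& s) {
  if (s.empty()) throw OcrError("empty input");
}

// Stand-in for client code that catches by the library's exception type.
inline bool CaughtOcrError(const std::string& s) {
  try {
    FailIfEmpty(s);
  } catch (const OcrError&) {
    return true;
  }
  return false;
}
```

Inside a single executable this works whatever the visibility
settings are; the problem described in [1] only shows up when the throw
and the catch live in different shared objects.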
[1] http://gcc.gnu.org/wiki/Visibility
-- Tom
Since I'm not familiar with all the details of
autoconf/automake/libtool, I had a hard time figuring out how to
change the various Makefile.am files. I attached a diff of all my
changes, but someone else should really figure out the correct changes
to make. I essentially added the following to the AM_CPPFLAGS of all the
"convenience library" Makefile.am's:
-DTESS_EXPORTS -fvisibility=hidden -fvisibility-inlines-hidden
But doing this seems to not only affect building the shared libraries
but the static ones also:
/bin/bash ../libtool --tag=CXX --mode=compile g++ -DHAVE_CONFIG_H
-I. -I.. -DUSE_STD_NAMESPACE -I../ccutil -I../ccstruct -I../image
-I../viewer -I../classify -I../dict -I../wordrec -I../cutil
-I../neural_networks/runtime -I../cube -I../textord
-DTESS_EXPORTS -fvisibility=hidden -fvisibility-inlines-hidden
-I/usr/local/include/leptonica
-g -O2 -MT tfacepp.lo -MD -MP -MF .deps/tfacepp.Tpo
-c -o tfacepp.lo tfacepp.cpp
libtool: compile: g++ -DHAVE_CONFIG_H
-I. -I.. -DUSE_STD_NAMESPACE -I../ccutil -I../ccstruct -I../image
-I../viewer -I../classify -I../dict -I../wordrec -I../cutil
-I../neural_networks/runtime -I../cube -I../textord
-DTESS_EXPORTS -fvisibility=hidden -fvisibility-inlines-hidden
-I/usr/local/include/leptonica
-g -O2 -MT tfacepp.lo -MD -MP -MF .deps/tfacepp.Tpo
-c tfacepp.cpp -fPIC -DPIC -o .libs/tfacepp.o
libtool: compile: g++ -DHAVE_CONFIG_H
-I. -I.. -DUSE_STD_NAMESPACE -I../ccutil -I../ccstruct -I../image
-I../viewer -I../classify -I../dict -I../wordrec -I../cutil
-I../neural_networks/runtime -I../cube -I../textord
-DTESS_EXPORTS -fvisibility=hidden -fvisibility-inlines-hidden
-I/usr/local/include/leptonica
-g -O2 -MT tfacepp.lo -MD -MP -MF .deps/tfacepp.Tpo
-c tfacepp.cpp -o tfacepp.o >/dev/null 2>&1
mv -f .deps/tfacepp.Tpo .deps/tfacepp.Plo
What's the correct way to change the Makefiles to give separate
CPPFLAGS for shared library builds versus static library builds?
I get two tesseracts, one in api and another in api/.libs. Both
versions of tesseract correctly OCR eurotext.tif.
tesseract-3.02apha/api$ ls -agGF
total 840
drwxrwxr-x 4 4096 2012-02-28 22:44 ./
drwxrwxr-x 26 4096 2012-02-28 23:01 ../
-rw-rw-r-- 1 1392 2012-02-28 21:28 apitypes.h
-rw-rw-r-- 1 72417 2012-02-28 11:48 baseapi.cpp
-rw-rw-r-- 1 30466 2012-02-28 12:04 baseapi.h
drwxrwxr-x 2 4096 2012-02-28 22:41 .deps/
drwxrwxr-x 2 4096 2012-02-28 22:45 .libs/
-rw-rw-r-- 1 909 2012-02-28 22:38 libtesseract_api.la
-rw-rw-r-- 1 351 2012-02-28 22:38 libtesseract_api_la-baseapi.lo
-rw-rw-r-- 1 591420 2012-02-28 22:38 libtesseract_api_la-baseapi.o
-rw-rw-r-- 1 1167 2012-02-28 22:38 libtesseract.la
-rw-rw-r-- 1 28705 2012-02-28 22:44 Makefile
-rwxrw-rw- 1 2775 2012-02-28 22:40 Makefile.am*
-rw-rw-r-- 1 30797 2012-02-28 22:44 Makefile.in
-rwxrwxr-x 1 7482 2012-02-28 22:41 tesseract*
-rw-rw-r-- 1 9928 2012-02-26 11:25 tesseractmain.cpp
-rw-rw-r-- 1 1754 2012-02-22 18:15 tesseractmain.h
-rw-rw-r-- 1 32140 2012-02-28 22:41 tesseract-tesseractmain.o
tesseract-3.02apha/api$ ldd tesseract
not a dynamic executable
tesseract-3.02apha$ api/tesseract eurotext.tif eurotext-static
Tesseract Open Source OCR Engine v3.02 with Leptonica
Page 0
tesseract-3.02apha$ cat eurotext-static.txt
The (quick) [brown] {fox} jumps!
Over the $43,456.78 <lazy> #90 dog
& duck/goose, as 12.5% of E-mail
from aspa...@website.com is spam.
Der ,,schnelle” braune Fuchs springt
fiber den faulen Hund. Le renard brun
«rapide» saute par-dessus le chien
paresseux. La volpe marrone rapida
salta sopra il cane pigro. El zorro
marrén répido salta sobre el perro
perezoso. A raposa marrom rzipida
salta sobre o cfio preguicoso.
tesseract-3.02apha/api/.libs$ ls -agGF
total 68784
drwxrwxr-x 2 4096 2012-02-28 22:45 ./
drwxrwxr-x 4 4096 2012-02-28 22:44 ../
-rw-rw-r-- 1 46665172 2012-02-28 22:38 libtesseract.a
-rw-rw-r-- 1 611716 2012-02-28 22:38 libtesseract_api.a
lrwxrwxrwx 1 22 2012-02-28 22:38 libtesseract_api.la ->
../libtesseract_api.la
-rw-rw-r-- 1 603900 2012-02-28 22:38 libtesseract_api_la-baseapi.o
lrwxrwxrwx 1 18 2012-02-28 22:38 libtesseract.la -> ../libtesseract.la
-rw-rw-r-- 1 1168 2012-02-28 22:38 libtesseract.lai
lrwxrwxrwx 1 21 2012-02-28 22:38 libtesseract.so ->
libtesseract.so.3.0.2*
lrwxrwxrwx 1 21 2012-02-28 22:38 libtesseract.so.3 ->
libtesseract.so.3.0.2*
-rwxrwxr-x 1 22470351 2012-02-28 22:38 libtesseract.so.3.0.2*
-rwxrwxr-x 1 31539 2012-02-28 22:45 lt-tesseract*
-rwxrwxr-x 1 31539 2012-02-28 22:41 tesseract*
tesseract-3.02apha/api/.libs$ ldd tesseract
linux-gate.so.1 => (0x00e8f000)
libtesseract.so.3 => /usr/local/lib/libtesseract.so.3 (0x00110000)
liblept.so.2 => /usr/local/lib/liblept.so.2 (0x00757000)
libstdc++.so.6 => /usr/lib/i386-linux-gnu/libstdc++.so.6 (0x004ef000)
libgcc_s.so.1 => /lib/i386-linux-gnu/libgcc_s.so.1 (0x0044d000)
libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0x005da000)
libpthread.so.0 => /lib/i386-linux-gnu/libpthread.so.0 (0x0046b000)
libm.so.6 => /lib/i386-linux-gnu/libm.so.6 (0x00c09000)
libz.so.1 => /lib/i386-linux-gnu/libz.so.1 (0x00486000)
libpng12.so.0 => /lib/i386-linux-gnu/libpng12.so.0 (0x0049b000)
libjpeg.so.62 => /usr/lib/i386-linux-gnu/libjpeg.so.62 (0x0090c000)
libgif.so.4 => /usr/lib/libgif.so.4 (0x004c5000)
libtiff.so.4 => /usr/lib/i386-linux-gnu/libtiff.so.4 (0x00eae000)
/lib/ld-linux.so.2 (0x004cf000)
tesseract-3.02apha$ api/.libs/tesseract eurotext.tif eurotext-shared
Tesseract Open Source OCR Engine v3.02 with Leptonica
Page 0
tesseract-3.02apha$ cat eurotext-shared.txt
The (quick) [brown] {fox} jumps!
Over the $43,456.78 <lazy> #90 dog
& duck/goose, as 12.5% of E-mail
from aspa...@website.com is spam.
Der ,,schnelle” braune Fuchs springt
fiber den faulen Hund. Le renard brun
«rapide» saute par-dessus le chien
paresseux. La volpe marrone rapida
salta sopra il cane pigro. El zorro
marrén répido salta sobre el perro
perezoso. A raposa marrom rzipida
salta sobre o cfio preguicoso.
I now get:
tesseract-3.02apha/api/.libs$ nm -C -D libtesseract.so.3.0.2 | wc -l
448
vs the 5846 dynamic symbols before. So the basic visibility technique
does work as expected.
It seems a bit surprising that the statically linked tesseract is only
7482 bytes, while the version that links with the shared library is
31539 bytes?
For api/Makefile.am I tried setting AM_CPPFLAGS but discovered that it
seems to be ignored when I also have to do:
libtesseract_api_la_CPPFLAGS = -DTESS_EXPORTS
tesseract_CPPFLAGS = -DTESS_IMPORTS
So I changed those lines to:
libtesseract_api_la_CPPFLAGS = $(AM_CPPFLAGS) -DTESS_EXPORTS
tesseract_CPPFLAGS = $(AM_CPPFLAGS) -DTESS_IMPORTS
BTW,
include_HEADERS = \
apitypes.h baseapi.h tesseractmain.h
but tesseractmain.h isn't a public header so shouldn't it instead be
added to tesseract_SOURCES? Automake's "9.2 Header files" section [1]
says:
"Usually, only header files that accompany installed libraries need
to be installed. Headers used by programs or convenience libraries
are not installed. The noinst_HEADERS variable can be used for such
headers. However when the header actually belongs to a single
convenience library or program, we recommend listing it in the
program's or library's _SOURCES variable (see Program Sources)
instead of in noinst_HEADERS."
I had to remove the training directory from the main Makefile.am,
since as predicted, those applications fail to build when the
-fvisibility=hidden flag is used. As a temporary measure, I suppose
you could somehow force apps in that particular directory to only link
with the static library.
-- Tom
[1] http://www.gnu.org/software/automake/manual/html_node/Headers.html#index-g_t_005fHEADERS-646
Okay, I think the non-makefile changes are harmless to commit (since
they don't do anything on unix until the appropriate
macros are turned on via the makefiles). And checking those changes in
would make the VS2008 Solution maintenance easier since there I *do*
need to know what macros to define (and where it's really easy to set
them separately for each build configuration).
host.h has two lines in it defining DLLEXPORT & DLLIMPORT that should
be removed since they are redundant.
What should be done with DLLSYM? While obsolete maybe it's still a
hint to which classes/structs need to be visible? If not I can
globally remove it easily enough.
Ah! <blush> yes indeed the api/tesseract is a script that lets you
test tesseract before it is installed to /usr/local/bin. So the next
things I'd like to know are: is the statically linked executable ever
automatically made, what is it called, and where does it go?
Right now all the convenience libraries are putting *all* their header
files in include_HEADERS. At some point we need to figure out which of
these really need to be installed. Eventually the list of
include_HEADERS and my list of files that need to be copied to a
"public" include folder on Windows (currently 13 files) should become
the same thing.
> It seems a bit surprising that the statically linked tesseract is only
> 7482 bytes, while the version that links with the shared library is
> 31539 bytes?
That's because libtool hides the executable away for some reason I
don't understand, and the "statically linked tesseract" is in fact a 7k
shell script to execute the binary. I would love to learn why this is
the case, and why the shell script is so huge. Jimmy?
Applied in 692: only those header files that were identified by Tom's
Python script [2] are installed. If more files need to be installed, you
can adapt Makefile.am in the particular directory, or just let me know
what should be included in the installation.
> For api/Makefile.am I tried setting AM_CPPFLAGS but discovered that
> seems to be ignored when I also have to do:
> libtesseract_api_la_CPPFLAGS = -DTESS_EXPORTS
> tesseract_CPPFLAGS = -DTESS_IMPORTS
> So I changed those lines to:
> libtesseract_api_la_CPPFLAGS = $(AM_CPPFLAGS) -DTESS_EXPORTS
> tesseract_CPPFLAGS = $(AM_CPPFLAGS) -DTESS_IMPORTS
> BTW,
> include_HEADERS = \
> apitypes.h baseapi.h tesseractmain.h
> but tesseractmain.h isn't a public header so shouldn't it instead be
> added to tesseract_SOURCES? Automake's "9.2 Header files" section [1]
> says:
> "Usually, only header files that accompany installed libraries need
> to be installed. Headers used by programs or convenience libraries
> are not installed. The noinst_HEADERS variable can be used for such
> headers. However when the header actually belongs to a single
> convenience library or program, we recommend listing it in the
> program's or library's _SOURCES variable (see Program Sources)
> instead of in noinst_HEADERS."
Aha! Great information!
...libtool does several things for you: it links with the shared archive rather than the static archive. [1]
...This will choose either the static or shared archive from the `libshell.la' Libtool library depending on the target host and any Libtool mode switches mentioned in the `Makefile.am', or passed to configure. [2]
Looking at zdenko's latest r693, I was surprised that tesseractmain.h
still does:
#include "params.h"
#include "blobs.h"
#include "notdll.h"
because I know that in my APITest VS2008 Solution, I explicitly did
*not* include those headers since they are not required to build
tesseract and not in the "public" 13:
api\apitypes.h
api\baseapi.h
ccmain\thresholder.h
ccstruct\publictypes.h
ccutil\errcode.h
ccutil\fileerr.h
ccutil\host.h
ccutil\memry.h
ccutil\platform.h
ccutil\serialis.h
ccutil\strngs.h
ccutil\tesscallback.h
ccutil\unichar.h
I also wondered how he was able to correctly build, when he now uses
tprintf() in tesseract. The answer is blobs.h eventually includes
tprintf.h and api/Makefile.am is, IMO, incorrectly letting the gcc
compiler poke around in a bunch of tesseract-ocr subdirs looking for
headers.
If we are really going to be "eating what we cook", then we should be
building tesseract (*and* the training apps) in the same kind of
environment as any other project using libtesseract. We have to assume
that the only headers we can see are the "public" headers. This is
exactly analogous to only being able to see visible symbols in the
libtesseract shared library (instead of everything).
I'm admittedly not sure of the best way to do this. Do we make a new
include subdir, add it to the list of directories to search when
building libtesseract, and specify *only* that directory when building
apps that link with libtesseract?
In r693, zdenko added TESS_API visibility to tprintf() in
ccutil/tprintf.h. This is a good example of the impact of such a
change.
1) He should first of all, include platform.h (which is where
TESS_API is defined) inside tprintf.h.
2) He *also* has to make sure tprintf.h is a public
header. Unfortunately, tprintf.h includes params.h, params.h includes
genericvector.h (and so on). This is where things get a bit hairy.
Hopefully he really doesn't need to include params.h and can somehow
get around this by refactoring -- I haven't looked at tprintf.cpp
very closely.
3) He should explicitly include tprintf.h in either tesseractmain.cpp
(my preference) or tesseractmain.h.
4) He has to update my tesshelper.py program to add tprintf.h to the list of
public headers.
And, of course, this is still avoiding the issue that the TessBaseAPI
class currently refers to objects that the caller can do nothing
useful with. Just to give two examples from api/baseapi.h:
class PageIterator;
PageIterator* AnalyseLayout();
class ResultIterator;
ResultIterator* GetIterator();
I haven't finished my APIExamples Solution yet, so I don't know if
there are other ways to get the same information from other methods.
Either we make PageIterator and ResultIterator visible, or we should
remove them from TessBaseAPI. This problem has already come up in the
tesseract-ocr newsgroup.
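A small sketch of why a forward declaration alone leaves the caller
stuck (DemoIterator and DemoAnalyseLayout are made-up stand-ins, not
the real PageIterator or AnalyseLayout()):

```cpp
// --- What a client sees when only the forward declaration is public ---
class DemoIterator;                 // incomplete type: members unknown
DemoIterator* DemoAnalyseLayout();  // hypothetical stand-in for AnalyseLayout()

inline bool ClientGotIterator() {
  DemoIterator* it = DemoAnalyseLayout();
  // it->Next();        // would not compile here: DemoIterator is incomplete
  return it != nullptr;  // storing and comparing the pointer is all that works
}

// --- What becomes possible once the class definition is public too ---
class DemoIterator {
 public:
  DemoIterator() : index_(0) {}
  int Next() { return ++index_; }

 private:
  int index_;
};

DemoIterator* DemoAnalyseLayout() {
  static DemoIterator it;  // static storage keeps the sketch leak-free
  return &it;
}

inline int ClientUsesIterator() { return DemoAnalyseLayout()->Next(); }
```

The first client function is roughly the position a caller is in
today; the second only compiles because the full class definition is
in scope, which is what making the iterator headers public (and the
classes visible) would provide.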
No one said adding visibility support was going to be painless :)
-- Tom
Solving Issue 287 [1] and its concern with the number of exported
symbols was one of the motivating factors for addressing the visibility
problem (along with fixing undefined external errors when building the
Windows DLL).
However, just addressing libtesseract's external symbols probably
won't be enough for the original poster. The fact remains that liblept
also currently exports 2200+ symbols.
One option that comes to mind is to support building a shared library
libtesseract that *statically* links with liblept. This removes the
public dependence on liblept.so and doesn't increase libtesseract's
visible symbol count.
Of course, then users of this library will not be able to use any
liblept functions directly unless they also statically link with
liblept. I'm pretty sure this is safe, since there isn't any
initialization involved with using liblept functions. And from my own
experience, I know that programs that statically link with liblept and
its dependent image libraries can be surprisingly tiny.
The result would be, I believe, the smallest "working set" for the OP?
Given that, should we really contemplate not providing a static
library version of libtesseract? Maybe some other program would like
to link statically with it, in the same way that linking statically
with liblept is sometimes helpful?
I'm not sure what happens if a project links with a shared library
that statically links with libtesseract, at the same time it also
statically links with libtesseract directly?
-- Tom
[1] http://code.google.com/p/tesseract-ocr/issues/detail?id=287
As far as final size goes, perhaps the first answer to "C/C++ gcc & ld
- remove unused symbols" [1] will help?
-- Tom
[1] http://stackoverflow.com/questions/6687630/c-c-gcc-ld-remove-unused-symbols
+ fixes issue [1] where boolean was being compared to float
+ removes extra includes from tesseractmain.h
+ removes extra DLLEXPORT & DLLIMPORT from host.h
+ remove CCUTIL_IMPORTS & CCUTIL_EXPORTS from vs2008 *.vcproj.
+ tesseract prints full version info when -v arg used:
vs2008\DLL_Release>tesseract-dll.exe -v
tesseract 3.02
leptonica-1.68 (Feb 21 2012, 05:25:30) [MSC v.1500 DLL Release 32 bit]
libgif 4.1.6 : libjpeg 8c : libpng 1.4.3 : libtiff 3.9.4 : zlib 1.2.5
or:
vs2008\LIB_Release\tesseract.exe -v
tesseract 3.02
leptonica-1.68 (Feb 21 2012, 05:29:12) [MSC v.1500 LIB Release 32 bit]
libgif 4.1.6 : libjpeg 8c : libpng 1.4.3 : libtiff 3.9.4 : zlib 1.2.5
This is very helpful when answering support questions. IMO, all the
training apps should also do this.
More info on static linking: here's the size of the Windows
LIB_Release executables which are statically linked with libtesseract
& liblept:
1,092,096 ambiguous_words.exe
1,311,232 classifier_tester.exe
616,448 cntraining.exe
580,608 combine_tessdata.exe
593,408 dawg2wordlist.exe
952,832 mftraining.exe
878,080 shapeclustering.exe
2,349,568 tesseract.exe
585,216 unicharset_extractor.exe
677,376 wordlist2dawg.exe
And the size of the tesseract DLL version and the Release libraries:
12,288 tesseract-dll.exe
1,554,432 libtesseract302.dll
1,672,192 liblept168.dll
14,674,982 libtesseract302-static.lib
2,519,302 liblept168-static-mtdll.lib
70,450 giflib416-static-mtdll.lib
363,212 libjpeg8c-static-mtdll.lib
331,028 libpng143-static-mtdll.lib
1,777,404 libtiff394-static-mtdll.lib
199,940 zlib125-static-mtdll.lib
[1] http://code.google.com/p/tesseract-ocr/issues/detail?id=573
> However, just addressing libtesseract's external symbols probably
> won't be enough for the original poster. The fact remains that liblept
> also currently exports 2200+ symbols.
It would be worth bringing Dan Bloomberg into the conversation.
Although he wanted to keep his static makefiles, he was very grateful
for my autotools improvements and we have continued to discuss various
issues since.
> I'm not sure what happens if a project links with a shared library
> that statically links with libtesseract, at the same time it also
> statically links with libtesseract directly?
I'm not sure whether you get an error or whether the direct link takes
precedence but even if it's the latter, this is very bad. If the
library versions are different or are built with different options then
things will almost certainly break. I once tried to force a proprietary
program to use my system Qt libraries. It eventually crashed and burned
because my Qt was built against libpng 1.5 while its Qt was built
against 1.4, which it was also using directly.
Regards,
James
Helpful tip for determining header file dependencies (see [1] for details):
IDIRS="-I../api -I../ccmain -I../ccstruct -I../ccutil -I../classify
-I../cube -I../cutil -I../dict -I../image -I../neural_networks/runtime
-I../textord -I../viewer -I../wordrec"
tesseract-3.02apha/ccmain$ gcc $IDIRS -MM pageiterator.h
pageiterator.o: pageiterator.h ../ccstruct/publictypes.h
tesseract-3.02apha/ccmain$ gcc $IDIRS -MM resultiterator.h
resultiterator.o: resultiterator.h ltrresultiterator.h pageiterator.h \
../ccstruct/publictypes.h ../ccutil/unicharset.h ../ccutil/strngs.h \
../ccutil/platform.h ../ccutil/memry.h ../ccutil/host.h \
../ccutil/serialis.h ../ccutil/errcode.h ../ccutil/fileerr.h \
../ccutil/unichar.h ../ccutil/unicharmap.h ../ccutil/params.h \
../ccutil/genericvector.h ../ccutil/tesscallback.h ../ccutil/helpers.h \
../ccutil/ndminx.h ../ccutil/genericvector.h
tesseract-3.02apha/api$ gcc $IDIRS -MM baseapi.h
baseapi.o: baseapi.h apitypes.h ../ccstruct/publictypes.h \
../ccmain/thresholder.h ../ccutil/unichar.h ../ccutil/tesscallback.h \
../ccutil/host.h ../ccutil/platform.h
tesseract-3.02apha/ccutil$ gcc $IDIRS -MM strngs.h
strngs.o: strngs.h platform.h memry.h host.h serialis.h errcode.h \
fileerr.h
So, to add initial visibility for the PageIterator & ResultIterator
classes, to the original 13 public headers:
api\apitypes.h
api\baseapi.h
ccmain\thresholder.h
ccstruct\publictypes.h
ccutil\errcode.h
ccutil\fileerr.h
ccutil\host.h
ccutil\memry.h
ccutil\platform.h
ccutil\serialis.h
ccutil\strngs.h
ccutil\tesscallback.h
ccutil\unichar.h
we need to only add the following 6 headers:
ccutil/genericvector.h
ccutil/helpers.h
ccutil/ndminx.h
ccutil/params.h
ccutil/unicharmap.h
ccutil/unicharset.h
Of course, the PageIterator and ResultIterator header files forward
declare other classes that may also need to be made visible.
[1] http://gcc.gnu.org/onlinedocs/cpp/Invocation.html#Invocation
-- Tom
Ooops, should be:
we need to only add the following 8 headers:
ccmain/pageiterator.h
ccmain/resultiterator.h
ccutil/genericvector.h
ccutil/helpers.h
ccutil/ndminx.h
ccutil/params.h
ccutil/unicharmap.h
ccutil/unicharset.h
-- Tom
Arrrgh! One last time, it should be
we need to only add the following 9 headers:
ccmain/ltrresultiterator.h
ccmain/pageiterator.h
ccmain/resultiterator.h
ccutil/genericvector.h
ccutil/helpers.h
ccutil/ndminx.h
ccutil/params.h
ccutil/unicharmap.h
ccutil/unicharset.h
-- Tom
+ Remove visibility from protected members of tesseract::TessBaseAPI
class by applying TESS_LOCAL macro.
+ Make PageIterator & ResultIterator classes visible by applying TESS_API macro.
+ Fix api/Makefile.am & training/Makefile.am since build dir is not
same as source dir when building from "external" dir.
Tested on Ubuntu 11.10 via:
cd ~/Builds/tesseract-3.02apha/
./autogen.sh
cd ../Output/tesseract-3.02/
../../tesseract-3.02apha/configure --enable-visibility
make
api/tesseract ../../tesseract-3.02apha/eurotext.tif eurotext
(the training apps fail since they still have undefined references)
After protected members removed from TessBaseAPI class:
~/Builds/Output/tesseract-3.02$ nm -C -D --defined-only
api/.libs/libtesseract.so.3.0.2 | wc -l
173
After PageIterator & ResultIterator classes made visible:
~/Builds/Output/tesseract-3.02$ nm -C -D --defined-only
api/.libs/libtesseract.so.3.0.2 | wc -l
230
-- Tom
+ fix VS2008 warning about "non dll-interface class
tesseract::LTRResultIterator used as base for dll-interface class
tesseract::ResultIterator" by making LTRResultIterator also visible.
+ Changed Project preprocessor definition of WINDLLNAME, because
stringizing operator doesn't seem to work when initializing
tessedit_module_name in ccutil/ccutil.cpp (which was omitted in
previous fixes).
+ Update vs2008/tesshelper.py for new public header files.
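The shape of the base-class fix in the first item, as a minimal
sketch. DEMO_API, DEMO_EXPORTS, DEMO_IMPORTS and the two classes are
made-up stand-ins; MSVC reports the situation as warning C4275 when
only the derived class carries the export macro:

```cpp
// Simplified stand-in for TESS_API (names here are hypothetical).
#if defined(_WIN32)
#  if defined(DEMO_EXPORTS)
#    define DEMO_API __declspec(dllexport)
#  elif defined(DEMO_IMPORTS)
#    define DEMO_API __declspec(dllimport)
#  else
#    define DEMO_API
#  endif
#elif defined(__GNUC__) && __GNUC__ >= 4
#  define DEMO_API __attribute__((visibility("default")))
#else
#  define DEMO_API
#endif

// The base class is exported as well; leaving DEMO_API off here while the
// derived class keeps it is what triggers the warning in a DLL build.
class DEMO_API LtrIteratorDemo {
 public:
  virtual ~LtrIteratorDemo() {}
  virtual int Level() const { return 1; }
};

class DEMO_API ResultIteratorDemo : public LtrIteratorDemo {
 public:
  virtual int Level() const { return 2; }
};

// Client-style use through the base class interface.
inline int LevelThroughBase(const LtrIteratorDemo& it) { return it.Level(); }
```

With neither DEMO_EXPORTS nor DEMO_IMPORTS defined the macro expands to
nothing on Windows, which matches the static library case.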
-- Tom
Thanks. Actually, I sort of liked having Zdenko look over my patches
before they went live :) I'll start small initially to make sure I
don't mess the repository up.
OTOH, I have a number of changes to add for the vs2008/doc directory,
and being able to update directly will make that process easier.
-- Tom