On Fri, Sep 28, 2012 at 2:09 AM, zdenko podobny <zde...@gmail.com> wrote:
> On Mon, Sep 24, 2012 at 9:57 AM, troplin <tro...@gmail.com> wrote:
>>
>> Hello,
>>
>> since the installer for Windows will contain the libtesseract DLL in the
>> next version (v 3.02),
>
>
> I realised that there was no discussion about this yet and I thought that
> installer will include only statically linked tesseract... My understanding
> is that end users (interested in installer) do not need dll (maybe I am
> wrong).

I haven't tried to build tesseract in months, but isn't it still
impossible to build any of the training apps using the DLL version of
libtesseract [1]? As such, I see little point in releasing a DLL
version of tesseract.exe.

> On other hand (windows) programmers will need more than dll - for them there
> should be packages like tesseract-ocr-3.02-vs2008.zip (solution files),
> tesseract-3.02-win32-lib-include-dirs.zip (library with include files) ,
> maybe package with example how to use API.

I agree. And this brings up just one of the issues I mentioned months
ago [2]. There is a distinction between public & private headers.
Public headers need to be released in their own separate include
directory. I suggested that they go in BuildFolder\include\tesseract
and provided a python program called tesshelper.py to automatically
copy the relevant headers.
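
To make the idea concrete, here is a minimal sketch of that kind of copy step. It is not the actual tesshelper.py: the function name and the header list are assumptions for illustration only, and the real set of public headers would still have to be agreed on.

```python
import shutil
from pathlib import Path

# Hypothetical list of public headers, relative to the source root.
# Which headers are really public is exactly the open question.
PUBLIC_HEADERS = [
    "api/baseapi.h",
    "ccmain/resultiterator.h",
]


def copy_public_headers(source_root, build_folder, headers=PUBLIC_HEADERS):
    """Copy each listed header into <build_folder>/include/tesseract.

    Returns the destination paths in the order given. Raises
    FileNotFoundError if a listed header is missing, so that a stale
    list is noticed early instead of producing a partial include dir.
    """
    dest_dir = Path(build_folder) / "include" / "tesseract"
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for rel in headers:
        src = Path(source_root) / rel
        if not src.is_file():
            raise FileNotFoundError(src)
        dst = dest_dir / src.name  # flatten: public headers share one directory
        shutil.copy2(src, dst)
        copied.append(dst)
    return copied
```

Flattening into one directory is what lets client code add a single include path instead of one per source subdirectory.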
However, the current source tree spreads these headers out over
numerous directories and makes no distinction between headers that
need to be public and those that do not. This leads to errors such as
the one discussed in "Issue 362 - unresolved external symbol" [3].
Shouldn't we change things so that all public headers instead go into
a single directory? I am admittedly ignorant on how all this impacts
linux developers who plan to use libtesseract (shared or otherwise).
Now that we are closer to actually releasing v3.02 and people seem to
be actually addressing the outstanding issues (I was pleasantly
surprised at the recent flurry of source checkins), I would suggest
that the many questions I raised in [2] still need to be answered
(regardless of the separate work on the C-API, which I think was a
bit premature).
Sheepishly, I gave up on my overly ambitious plans mentioned in that
thread when it resulted in zero responses (and I got a little
overwhelmed at the complexity and frankly strange behavior of some of
the more esoteric baseapi functions). However, given the recent uptick
in activity, I guess I'll fire up TortoiseSVN, VS2008, and Ubuntu
again (have people been testing the --enable-visibility flag?) this
weekend and see if I can make more progress.
Slightly offtopic. Has anyone read:
"API Design for C++" by Martin Reddy
http://www.amazon.com/API-Design-C-Martin-Reddy/dp/0123850037/
Paperback: 472 pages
Publisher: Morgan Kaufmann; 1 edition (February 18, 2011)
http://APIBook.com/
Reading it is another thing that's been on my todo list for months :)
Certainly the TOC looks interesting [4].
[1] http://tesseract-ocr.googlecode.com/svn/trunk/vs2008/doc/building.html#building-the-training-applications
[2] "Visibility" support summary and future work
https://groups.google.com/forum/?fromgroups=#!topic/tesseract-dev/kcBEJY0s9H8
[3] https://groups.google.com/forum/?fromgroups=#!topic/tesseract-dev/S8w7cfzr4kE
[4] http://www.apibook.com/blog/contents
-- Tom
I would have thought that windows developers would want a tesseract DLL for integrating into some run-time application, but they have absolutely no real need to be able to build the training tools using the DLL, as they are used only for training.
Putting all the includes necessary for export to a DLL into a separate directory breaks the dependency hierarchy and creates circular dependencies.
Recognita is a prime example of this. Because all the low-level code uses the (Recognita) API, which depends on everything, everything depends on everything.
It took a lot of work to clean up the dependencies in Tesseract to a clean hierarchy.
A far better solution would be to have a separate header file in the api directory that includes all the headers needed by anything that uses the DLL. It would cut down on the number of includes that need to be made, though probably not on the long list of directories that has to be specified in the build tools for apps that use the DLL. For that there is copy-paste. Even this header should not be included in baseapi.h though, because of the namespace pollution problem.
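
As a rough illustration of such a single header, the sketch below generates one from a list of header names. The function name, the include guard, and the header names in the usage note are assumptions for the example, not an agreed list.

```python
def render_umbrella_header(header_names, guard="TESSERACT_API_PUBLIC_H_"):
    """Return the text of a header that only #includes the given headers.

    header_names: header file names in the order they should be included.
    guard: include-guard macro name (a made-up default, not a project name).
    """
    lines = [f"#ifndef {guard}", f"#define {guard}", ""]
    lines += [f'#include "{name}"' for name in header_names]
    lines += ["", f"#endif  // {guard}", ""]
    return "\n".join(lines)
```

Calling `render_umbrella_header(["baseapi.h", "resultiterator.h"])` yields a guarded header with two `#include` lines. Client code would include that one file, while the library's own headers stay where they are in the hierarchy.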
BaseAPI includes a lot of crap that isn't needed by real apps that use the DLL. The idea behind the ResultIterator and friends is that BaseAPI users shouldn't need a huge number of includes. DLL-based apps shouldn't be touching PAGE_RES for instance, even though access to it is exposed by some of the API functions.
On Friday, 28 September 2012 11:09:32 UTC+2, Zdenko Podobný wrote:
> On Mon, Sep 24, 2012 at 9:57 AM, troplin <tro...@gmail.com> wrote:
>> Hello,
>> since the installer for Windows will contain the libtesseract DLL in the next version (v 3.02),
>
> I realised that there was no discussion about this yet and I thought that installer will include only statically linked tesseract... My understanding is that end users (interested in installer) do not need dll (maybe I am wrong).
>
> On other hand (windows) programmers will need more than dll - for them there should be packages like tesseract-ocr-3.02-vs2008.zip (solution files), tesseract-3.02-win32-lib-include-dirs.zip (library with include files) , maybe package with example how to use API.
>
> What is your opinion?

Sorry to be annoying, but is there any chance that we could include the Windows DLL in the installer? Maybe in the next update? Or a minor patch to 3.02?
I would do it myself, but I have absolutely no idea how the windows installer is built. Maybe it would be good to add the installer templates to the repository, so that they are versioned too and can be modified by everyone.
On Thu, Dec 27, 2012 at 12:02 PM, troplin <tro...@gmail.com> wrote:
> Sorry to be annoying, but is there any chance that we could include the Windows DLL in the installer? Maybe in the next update? Or a minor patch to 3.02?

Can you please be more specific about what you mean by "include the Windows DLL in the installer"? Is there any problem with installing the tesseract libraries with the installer?

> I would do it myself, but I have absolutely no idea how the windows installer is built. Maybe it would be good to add the installer templates to the repository, so that they are versioned too and can be modified by everyone.

Committed as r815.
On Thursday, 27 December 2012 20:56:35 UTC+1, Zdenko Podobný wrote:
> On Thu, Dec 27, 2012 at 12:02 PM, troplin <tro...@gmail.com> wrote:
>> Sorry to be annoying, but is there any chance that we could include the Windows DLL in the installer? Maybe in the next update? Or a minor patch to 3.02?
>
> Can you please be more specific about what you mean by "include the Windows DLL in the installer"? Is there any problem with installing the tesseract libraries with the installer?

Oh, I didn't notice that. Previously, the answer seemed to be 'no', so I didn't even try.

But actually it doesn't work: I'm getting an error 404 when it tries to download the libtesseract library. liblept works, however.

Also, when installing all language data, I'm getting some additional errors (some 404s, and sometimes it tries to overwrite existing files).

Also, I think the installer is mainly designed for developers instead of users. I would group the contents in the following way:

(x) Install by default
(-) Mixed content
( ) Don't install by default

- (-) Components required for normal use
  - (x) Executable
  - (x) Release DLLs without headers (libtesseract302.dll, liblept168.dll). Only one bullet point, since libtesseract does not work without liblept
  - (-) Language data
    - (x) English
    - (x) OSD
    - ( ) Other languages
  - (x) Basic documentation for users
- ( ) Components required for advanced users (everything required for customization of recognition)
  - ( ) Training tools
  - ( ) Tools for manipulating language data
  - ( ) Documentation for those tools
  - ...
- ( ) Components required for developers that use tesseract in their products
  - ( ) Public tesseract and leptonica headers
  - ( ) Debug DLLs
  - ( ) Tesseract and leptonica stub libraries (libtesseract302.lib, liblept168.lib), used for linking to the DLLs
  - ( ) API documentation / doxygen
  - ...
- ( ) Components required for tesseract developers
  - ( ) Tesseract source
  - ( ) Static libraries
  - ( ) VS 2008 tools
  - ...

What do you think about it? Did I forget something?

If I find some time, I will get my hands dirty and try to modify the template.

>> I would do it myself, but I have absolutely no idea how the windows installer is built. Maybe it would be good to add the installer templates to the repository, so that they are versioned too and can be modified by everyone.
>
> Committed as r815.

Great! Some really minor nitpicks concerning the installer:
- I find the behavior of the feature checkboxes in the installer really counterintuitive. Normally you get the description by clicking on a feature, not by hovering over. But here clicking on the text toggles the selection.
- The installer succeeds even if there are errors, and you really don't know what actually succeeded and what did not.
- No repair feature, just uninstall and install.
- Executable instead of MSI

All of those points are probably because of the use of NSIS, which I'm not so fond of. However, I must acknowledge that you probably don't want to invest much time learning WiX. And I'm not a WiX expert either, so NSIS is probably the best solution currently.

The reason I tell you those points in the first place is that, with tesseract getting better and easier to use, it will attract professional users (professional in the sense of money, not competence). A standard MSI just looks more professional than a custom installer. And you can use the scripting functionality of msiexec.

Tobi
On Thu, Jan 3, 2013 at 9:35 AM, troplin <tro...@gmail.com> wrote:
> On Thursday, 27 December 2012 20:56:35 UTC+1, Zdenko Podobný wrote:
>> On Thu, Dec 27, 2012 at 12:02 PM, troplin <tro...@gmail.com> wrote:
>>> Sorry to be annoying, but is there any chance that we could include the Windows DLL in the installer? Maybe in the next update? Or a minor patch to 3.02?
>>
>> Can you please be more specific about what you mean by "include the Windows DLL in the installer"? Is there any problem with installing the tesseract libraries with the installer?
>
> Oh, I didn't notice that. Previously, the answer seemed to be 'no', so I didn't even try.
>
> But actually it doesn't work: I'm getting an error 404 when it tries to download the libtesseract library. liblept works, however.
>
> Also, when installing all language data, I'm getting some additional errors (some 404s, and sometimes it tries to overwrite existing files).
>
> Also, I think the installer is mainly designed for developers instead of users. I would group the contents in the following way:
>
> (x) Install by default
> (-) Mixed content
> ( ) Don't install by default
>
> - (-) Components required for normal use
>   - (x) Executable
>   - (x) Release DLLs without headers (libtesseract302.dll, liblept168.dll). Only one bullet point, since libtesseract does not work without liblept

I am not sure this is needed by default: tesseract is linked statically (because of the training programs; search the archive for the reason), so a common user does not need it.

>   - (-) Language data
>     - (x) English
>     - (x) OSD
>     - ( ) Other languages
>   - (x) Basic documentation for users

I forgot to include the manual pages (html files in the doc directory). Maybe downloading the pdf documentation files (from the svn repository) could be another option...

> - ( ) Components required for advanced users (everything required for customization of recognition)
>   - ( ) Training tools
>   - ( ) Tools for manipulating language data
>   - ( ) Documentation for those tools
>   - ...
> - ( ) Components required for developers that use tesseract in their products
>   - ( ) Public tesseract and leptonica headers
>   - ( ) Debug DLLs
>   - ( ) Tesseract and leptonica stub libraries (libtesseract302.lib, liblept168.lib), used for linking to the DLLs
>   - ( ) API documentation / doxygen
>   - ...
> - ( ) Components required for tesseract developers
>   - ( ) Tesseract source
>   - ( ) Static libraries
>   - ( ) VS 2008 tools
>   - ...
>
> What do you think about it? Did I forget something?

It looks like too many splits for me ;-) But my needs are different than yours ;-)

> If I find some time, I will get my hands dirty and try to modify the template.
>
>>> I would do it myself, but I have absolutely no idea how the windows installer is built. Maybe it would be good to add the installer templates to the repository, so that they are versioned too and can be modified by everyone.
>>
>> Committed as r815.
>
> Great! Some really minor nitpicks concerning the installer:
> - I find the behavior of the feature checkboxes in the installer really counterintuitive. Normally you get the description by clicking on a feature, not by hovering over. But here clicking on the text toggles the selection.
> - The installer succeeds even if there are errors, and you really don't know what actually succeeded and what did not.
> - No repair feature, just uninstall and install.
> - Executable instead of MSI
>
> All of those points are probably because of the use of NSIS, which I'm not so fond of. However, I must acknowledge that you probably don't want to invest much time learning WiX. And I'm not a WiX expert either, so NSIS is probably the best solution currently.
>
> The reason I tell you those points in the first place is that, with tesseract getting better and easier to use, it will attract professional users (professional in the sense of money, not competence). A standard MSI just looks more professional than a custom installer. And you can use the scripting functionality of msiexec.
>
> Tobi

Feel free to modify it (or bring something better). I just did it because there was nobody else ;-). I do not like MSI because (usually?) software installed with MSI requires the MSI file for uninstall (maybe this is a problem of the packager, but I hate this behavior because I fight for free space).

Here are some comments and explanations that may shed some light on the current behavior:
- I wanted to use something other than NSIS, but I came back to NSIS for the 3.02 release ;-). My criteria for an installer:
  - it should be free software, so anybody can check/improve my work
  - it should be able to download packages (e.g. the installer contains only the needed parts) through a proxy server with authorization
  - it should be able to uninstall the software without needing the installer
  - it should be able to use gzip and zip archives or run an external program
  - it should be able to compress the installer efficiently (I love lzma compression in NSIS)
- I wanted to use official packages: I did not want to split the leptonica library and upload it to the tesseract-ocr project.
- I wanted to include only "must have files" (from my point of view); it should be possible to download the other files.
- I did not want to create a lot of packages (for downloading).

I am only able to test it on Windows XP installations with power-user rights, so other combinations could cause unexpected behavior.