tesseract 3.04 can be downloaded as a package for msys2 (will work on windows)


Shree Devi Kumar

Aug 26, 2014, 3:36:11 AM
to tesser...@googlegroups.com, tesser...@googlegroups.com
Follow instructions on 


to set up msys2


---------- Forwarded message ----------
From: Alexx83 <lex...@users.sf.net>
Date: Tue, Aug 26, 2014 at 12:21 PM
Subject: [msys2:tickets] #71 tesseract-ocr build failed with bad reloc address 0x23
To: "[msys2:tickets]" <7...@tickets.msys2.p.re.sf.net>

Now tesseract-ocr can be installed via pacman.
In future, I would prefer to discuss issues with existing packages, or the addition of new packages, on GitHub:
https://github.com/Alexpux/MINGW-packages

For MSYS2 packages:
https://github.com/Alexpux/MSYS2-packages

You can clone the git repo with our scripts and create pull requests with fixes or new packages.


Shree Devi Kumar

Aug 26, 2014, 4:04:27 AM
to tesser...@googlegroups.com, tesser...@googlegroups.com
Please note that this does NOT install any language data.

Shree Devi Kumar
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

zdenko podobny

Aug 26, 2014, 7:09:44 AM
to tesser...@googlegroups.com, tesser...@googlegroups.com
Please stop with these releases!!!
3.04 has not been released! We are skipping the 3.03 release because some people decided to spread 3.03 on the internet and there was a need to change the API. AFAIK more API changes should come for 3.04!
You are definitely not helping this project.

Zdenko


--
You received this message because you are subscribed to the Google Groups "tesseract-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-de...@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at http://groups.google.com/group/tesseract-dev.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-dev/CAG2NduVZ4sEhonj8YXAZk5xh0S9pm8HrWA4RfLXSbJSbeSL%3DGA%40mail.gmail.com.

For more options, visit https://groups.google.com/d/optout.

shree

Aug 26, 2014, 9:46:04 PM
to tesser...@googlegroups.com, tesser...@googlegroups.com, zdenko podobny, Ray Smith
Zdenko,

Sorry, it was not meant to be a 'release' of 3.04. I just wanted to get the latest code compiled under msys2, so I asked the developers for help and suggested a package of tesseract and leptonica under msys2. I presume it is OK to label it as 3.03 with the revision 298e31465a44.

However, as I had asked you in an earlier post, your last commit of configure.ac does show tesseract version as 3.04 

>> 
# ----------------------------------------
# Initialization
# ----------------------------------------

AC_PREREQ(2.50)
AC_INIT([tesseract], [3.04], [http://code.google.com/p/tesseract-ocr/issues/list]) >>

FYI, the training tools did compile under msys2 on Windows 8.

Thanks,
Shree

zdenko podobny

Aug 27, 2014, 5:11:28 AM
to shree, tesser...@googlegroups.com, tesser...@googlegroups.com, Ray Smith
Anybody who is packaging tesseract and publicly sharing 3.03 (excluding -rc1) and 3.04 is lying. There are no such releases.
The repository is intended for developers and testers, not for packagers! And it is absolutely normal that there are changes of version within the repository. They are for developers and testers.

If packagers are not able to respect the project (there are reasons why there is no new release), then we should remove the public tesseract repository.

Zdenko

Janusz S. Bien

Aug 27, 2014, 5:25:23 AM
to tesser...@googlegroups.com
Quote/Cytat - zdenko podobny <zde...@gmail.com> (Wed 27 Aug 2014
11:10:57 AM CEST):

> Anybody who is packaging tesseract and publicly sharing 3.03 (excluding
> -rc1) and 3.04 is lying. There are no such releases.
> The repository is intended for developers and testers, not for packagers! And it
> is absolutely normal that there are changes of version within the repository.
> They are for developers and testers.
>
> If packagers are not able to respect the project (there are reasons why there
> is no new release), then we should remove the public tesseract
> repository.

It is quite easy to find via Google many packages which are built, e.g.,
every night from the current repository. My first hit was

http://aquamacs.org/nightlies.shtml

but there are definitely more of them.

I think such packages are very useful for users and indirectly also to
developers.

I see no reason for discouraging the packagers, it would be better to
help them to do it properly. For example, I've seen packages with the
word 'snapshot' and the date in the name.

Best regards

Janusz


--
Prof. dr hab. Janusz S. Bień - Uniwersytet Warszawski (Katedra
Lingwistyki Formalnej)
Prof. Janusz S. Bień - University of Warsaw (Formal Linguistics Department)
jsb...@uw.edu.pl, jsb...@mimuw.edu.pl, http://fleksem.klf.uw.edu.pl/~jsbien/

Shree Devi Kumar

Aug 27, 2014, 5:26:26 AM
to zdenko podobny, tesser...@googlegroups.com, tesser...@googlegroups.com, Ray Smith
What is the git clone command to get tesseract 3.03 rc1?

zdenko podobny

Aug 27, 2014, 5:56:24 AM
to tesser...@googlegroups.com
On Wed, Aug 27, 2014 at 11:21 AM, Janusz S. Bien <jsb...@mimuw.edu.pl> wrote:
> Quote/Cytat - zdenko podobny <zde...@gmail.com> (Wed 27 Aug 2014 11:10:57 AM CEST):
>
>> Anybody who is packaging tesseract and publicly sharing 3.03 (excluding
>> -rc1) and 3.04 is lying. There are no such releases.
>> The repository is intended for developers and testers, not for packagers! And it
>> is absolutely normal that there are changes of version within the repository.
>> They are for developers and testers.
>>
>> If packagers are not able to respect the project (there are reasons why there
>> is no new release), then we should remove the public tesseract
>> repository.
>
> It is quite easy to find via Google many packages which are built, e.g., every night from the current repository. My first hit was
>
> http://aquamacs.org/nightlies.shtml
>
> but there are definitely more of them.
>
> I think such packages are very useful for users and indirectly also to developers.

Are you serious? So there will be several 3.04.00 versions of tesseract created by different packagers with different APIs. Is this supposed to help users? Developers?

> I see no reason for discouraging the packagers, it would be better to help them to do it properly. For example, I've seen packages with the word 'snapshot' and the date in the name.

  1. If a packager is misleading users/developers, then that is not OK.
  2. The project should not be driven by packagers.
  3. If there is no release by the project, there is a reason for it. Respect it.

Zdenko


Shree

Aug 27, 2014, 7:31:08 AM
to tesser...@googlegroups.com
Zdenko,

This seems to be a case of misunderstanding. There is no packager conspiracy to release versions of tesseract on the internet.

As I had mentioned earlier, I was trying to compile tesseract along with the training tools under msys2 on Windows 8, using the latest code from the public repository, for testing Hindi. I got some errors during compilation and filed a bug report at msys2.

I found msys2 to be much more user-friendly than msys+mingw or cygwin. Its package management system makes it very easy to install any package along with its dependencies. Hence I suggested that they offer a package of tesseract and pointed them to the Google Code page at https://code.google.com/p/tesseract-ocr/source/checkout . Based on the version number in the config file, I also told them that the version was 3.04.

Well, you pointed out that was incorrect and I have asked them to change it to 3.03. The source is still pointing to the latest source.

Please let me know the link for the 'official 3.03 rc1 release' and I'll ask them to change it to that.

Prof. Bień had a good suggestion: adding some kind of "id" to the version would be useful for testers as well as developers, because when someone files an issue they can refer to the tesseract version as reported by the program they are using. It will help in confirming and troubleshooting the issue.

Thanks,
Shree

zdenko podobny

Aug 27, 2014, 4:52:28 PM
to Shree Devi Kumar, tesser...@googlegroups.com, tesser...@googlegroups.com, Ray Smith
There is no git command for it (well, maybe we could track down the revision number and tag it, but...).
If somebody wants to use -rc1, he/she should use the Google Drive package.

Zdenko

zdenko podobny

Aug 27, 2014, 5:43:14 PM
to tesser...@googlegroups.com
Shree,

There is ALWAYS a problem when people make suggestions without having information about the subject of the suggestion.

The 3.03-rc1 package should serve for wider testing, e.g. to check portability and the build system, and to prepare tesseract-based products for the new version.
IMO this should not be used for distribution! Why?
  1. This release candidate led to some feature requests that were accepted, and they caused API changes.
  2. Testers found several issues. Based on Ray's investigation, the fixes will lead to API changes...
In the meantime, some Linux distributions started to distribute this rc1 (and maybe some fixes) as tesseract 3.03...

So there was a choice:
  • to release a buggy 3.03 without API changes (e.g. without the new features), or
  • to increase the version to 3.04, keep the new features and fix the issues (changing the API).
Ray chose the second option. And what happened? Somebody started to generate 3.04 packages just based on the change in configure.ac!

My opinion: whoever takes their users/customers seriously will not distribute unstable releases (and it looks like 3.03 will never be released officially, so those who distribute it in stable/production channels should face their users/customers).



Zdenko



Shree Devi Kumar

Aug 27, 2014, 10:33:14 PM
to tesser...@googlegroups.com
Zdenko,

Thanks for your reply.

I think this calls for further discussion regarding versioning of tesseract.

> So there was a choice:
>   • to release a buggy 3.03 without API changes (e.g. without the new features), or
>   • to increase the version to 3.04, keep the new features and fix the issues (changing the API).
> Ray chose the second option.

OK, so the current code on git is 3.04. I assume from your comments so far that this is actually a development version for testers and developers. If that is the case then it should be labelled as such. You can decide what it will be.

In addition, I would suggest adding some kind of minor versioning, so that when testers compile the source and test with it, the program reports exactly which revision they are using.



git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1146 d0cd1f9f-072b-0410-8dd7-cf729c803f20

then when I run tesseract or give the -v option it should identify the above in some manner.

Git probably allows for short tags to identify versions.

This I think would be very useful to developers because the usual question you ask when someone files an issue is 'what version'? If testers are using the development version, it is good to know exactly which revision they are running.
I would request others on the list with more experience regarding these issues and git to suggest what will work best for tesseract.
---

Regarding the package under msys2, please let me know which source link I should ask them to use and what should that version be called.

Thanks,
Shree


Shree Devi Kumar

Aug 28, 2014, 1:26:37 AM
to tesser...@googlegroups.com



Tom Morris

Aug 28, 2014, 12:07:24 PM
to tesser...@googlegroups.com, Shree Devi Kumar, tesser...@googlegroups.com, Ray Smith
On Wed, Aug 27, 2014 at 4:51 PM, zdenko podobny <zde...@gmail.com> wrote:
> there is no git command for it (well maybe we could track down the revision number and tag it, but...)

Isn't tagging releases good software engineering practice regardless of all this other discussion?

Looks to me like the appropriate rev is:

Note that there have been almost 100 commits since that source drop was made in February, so it's quite out of date.

Tom

Paul

Aug 30, 2014, 6:04:06 AM
to tesser...@googlegroups.com, shree...@gmail.com, tesser...@googlegroups.com, thera...@gmail.com
I think a lot of this is caused by the project home page claiming that Tesseract 3.03 ships with Ubuntu 14.04. That makes it sound like a final release. I'd change or remove that statement.

Roadmap

Version 3.03 release candidate is now available (source only so far) for download and contains many new features. (See the ReleaseNotes for a full list.) Please check out the ReadMe before going to Downloads as you need more than one file. Even the windows executables tarball is incomplete as language files are required. Most notable new features:

  • PDF output.
  • New Renderer for extracting detailed recognition information at a document level.

Version 3.03 ships with recent Linux distributions such as Ubuntu 14.04.

Version 3.02 ships with Ubuntu 12.04

Paul

Shree

Aug 31, 2014, 10:28:56 PM
to tesser...@googlegroups.com, tesser...@googlegroups.com, shree...@gmail.com, thera...@gmail.com
Also

2014-02-04 v3.03
* Added new training tool text2image to generate box/tif file pairs from
  text and truetype fonts.
* Added support for PDF output with searchable text.
* Removed entire IMAGE class and all code in image directory.
* Tesseract executable: support for output to stdout; limited support for one 
  page images from stdin  (especially on Windows)
* Added Renderer to API to allow document-level processing and output
  of document formats, like hOCR, PDF.
* Major refactor of word-level recognition, beam search, eliminating dead code.
* Refactored classifier to make it easier to add new ones.
* Generalized feature extractor to allow feature extraction from greyscale.
* Improved sub/superscript treatment.
* Improved baseline fit.
* Added set_unicharset_properties to training tools.
* Many bug fixes.
* More training source data included.

zdenko podobny

Sep 2, 2014, 3:55:53 PM
to tesser...@googlegroups.com
The git (formerly svn) repository is for those who want to contribute to tesseract. E.g. if somebody wants to fix something or add a new feature, (s)he should do it against the current code in the repository. Code in the repository is subject to change; it could be changed at any time without warning. It is not intended for packaging.

Packages:
  • The stable version is 3.02.02 - this package should be available for any user in any distribution. If needed, some patches could be added, but then the packager should change the version number based on the rules of the particular distribution.
  • The testing version is 3.03-rc1 - this package could be available if a distribution offers something like a "bleeding edge" or "testing" version of the distribution. Because of the issues mentioned, I would be very careful about whether to offer this package.
Providing a package that was not released by the tesseract team: such a package should be clearly marked as an unofficial package, and the packager should provide an address where the packager will fix issues reported by its users (=> there will not be any support for such packages in the tesseract project).


Zdenko


On Thu, Aug 28, 2014 at 4:32 AM, Shree Devi Kumar <shree...@gmail.com> wrote:

zdenko podobny

Sep 2, 2014, 3:58:08 PM
to tesser...@googlegroups.com, Shree Devi Kumar, Ray Smith
Unfortunately, only project owners can modify the project page.

Zdenko



zdenko podobny

Sep 2, 2014, 4:02:06 PM
to tesser...@googlegroups.com, Shree Devi Kumar, Ray Smith
AFAIR the same procedure was used in past releases: when Ray started a new version, he added info to the changelog, and the changelog was modified until the public release.

Zdenko


zdenko podobny

Sep 2, 2014, 4:09:12 PM
to tesser...@googlegroups.com, Ray Smith
The svn repository was tagged (excluding 3.03-rc1).
It seems that the tags were not transferred to the git repository...
I will put it on my TODO list, but this is not a big priority for me at the moment...

Zdenko



Shree Devi Kumar

Sep 3, 2014, 8:21:04 AM
to tesser...@googlegroups.com
Thanks for the clarifications regarding packaging, Zdenko. I have forwarded the email to the msys2 developers.

Shree



Jeff Breidenbach

Sep 6, 2014, 1:22:01 AM
to tesser...@googlegroups.com
Hi everyone,

This is all my fault. 

I'm the person who took a source code snapshot and shipped it as version 3.03 
in several Linux distributions. I had two reasons to ship the code itself. First,
there were people who really needed the newly added PDF output from the 
command line tool. Second, Ray and I both thought the 3.03 release was 
imminent, and that my involvement would help get it done.

In retrospect it was a mistake. At the very least I should have labelled the 
program as a dated source code snapshot. Perhaps I should have ignored 
the various deadlines, and instead helped Ray make an official release before 
shipping anything. Bottom line is I made a mistake, I'm sorry. Zdenko in 
particular, I hope we can meet in person some day and that I can buy you
a beer.

Anyway, I learned my lesson and my goal now is to just help wherever I 
can towards the official source release. I have also removed the confusing 
portion of the homepage that Paul mentioned.

For reference here are some of the Linux usage numbers. (For comparison, 
Emacs23 to Tesseract installations are about 4:1 which is AMAZING.)
Also, I've checked and we are very lucky at least in Debian/Ubuntu. There
do not appear to be any programs using the portions of the API that are still
changing. So hopefully I will be able to get things to a better state without
causing too much pain.


Please let me know if you have any questions. Throwing rotten tomatoes is 
also acceptable. I can take it.

Cheers,
Jeff

Janusz S. Bien

Sep 6, 2014, 2:36:46 AM
to tesser...@googlegroups.com
Quote/Cytat - Jeff Breidenbach <breid...@gmail.com> (Sat 06 Sep
2014 07:22:01 AM CEST):

> Please let me know if you have any questions. Throwing rotten tomatoes is
> also acceptable. I can take it.

I've just discovered how packaging is handled in MythTV:

https://github.com/MythTV/packaging

I think it is a good example. By running e.g. build-debs.sh from the
deb directory you get packages named like this:
mythtv_0.27.0~master.20140906.35dca9e-0ubuntu1.dsc.

With such names any confusion is impossible.
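
To illustrate the naming scheme, here is a minimal sketch in Python that assembles the upstream part of such a version string (base version, branch, date, abbreviated commit hash). The function name `snapshot_version` and its parameters are made up for this example; it is not taken from the MythTV scripts, and the Ubuntu-specific suffix (`-0ubuntu1`) is left out.

```python
from datetime import date


def snapshot_version(base: str, branch: str, day: date, commit: str,
                     abbrev: int = 7) -> str:
    """Build a snapshot version such as '0.27.0~master.20140906.35dca9e'.

    All parameter names are hypothetical; only the output format follows
    the package name quoted above.
    """
    if len(commit) < abbrev:
        raise ValueError("commit hash is shorter than the requested abbreviation")
    # <base>~<branch>.<YYYYMMDD>.<abbreviated commit hash>
    return f"{base}~{branch}.{day:%Y%m%d}.{commit[:abbrev]}"


if __name__ == "__main__":
    print(snapshot_version("0.27.0", "master", date(2014, 9, 6), "35dca9e"))
```

In Debian-style version ordering the "~" sorts before everything else, so a snapshot named this way is treated as older than the final release with the same base version, and users would be upgraded to the official release once it is packaged.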

Thanks for your work and best regards

zdenko podobny

Sep 19, 2014, 6:51:22 PM
to tesser...@googlegroups.com
I tagged the master branch in the repository (AFAIK the initial code commit was 1.03). You can try:
    git tag -n1
or, on the tesseract source changes page [1], you can select a tag from the combobox for the master branch.


Zdenko

Shree Devi Kumar

Sep 21, 2014, 11:20:32 AM
to tesser...@googlegroups.com, tesser...@googlegroups.com
Thanks for tagging the releases, Zdenko.

Now it will be possible to mark the revisions automatically, so that code compiled from git does not report being '3.04'.

I posted the following as a response to issue 1317:

By using [m4_esyscmd_s([git describe --tags --long --always])] 
in configure.ac 
you can get a version number of the format 
"3.03-rc1-106-g9e8629d"

where
3.03-rc1 is the tag of the most recent tagged commit
106 is the number of commits from that tag up to the current revision
9e8629d is the abbreviated hash of the current revision (the leading "g" stands for git)
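
For anyone who wants to read such a string back in a script (for example when triaging a bug report), here is a minimal sketch in Python. The names `Described`, `parse_describe` and `is_exact_release` are invented for this example and are not part of tesseract or git. It assumes the `<tag>-<count>-g<hash>` form produced with `--long`; if the repository has no tags at all, `--always` prints only the bare hash and this sketch raises `ValueError`.

```python
import re
from typing import NamedTuple


class Described(NamedTuple):
    tag: str                # most recent tag, e.g. "3.03-rc1"
    commits_since_tag: int  # e.g. 106
    short_hash: str         # abbreviated commit hash, e.g. "9e8629d"


# The tag itself may contain hyphens (as "3.03-rc1" does), so the count and
# hash are matched from the end of the string.
_DESCRIBE_RE = re.compile(r"^(?P<tag>.+)-(?P<count>\d+)-g(?P<hash>[0-9a-f]+)$")


def parse_describe(text: str) -> Described:
    """Split 'git describe --tags --long' output into its three parts."""
    match = _DESCRIBE_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not in <tag>-<count>-g<hash> form: {text!r}")
    return Described(match["tag"], int(match["count"]), match["hash"])


def is_exact_release(described: Described) -> bool:
    """True only when the build sits exactly on a tagged commit."""
    return described.commits_since_tag == 0


if __name__ == "__main__":
    info = parse_describe("3.03-rc1-106-g9e8629d")
    print(info.tag, info.commits_since_tag, info.short_hash, is_exact_release(info))
```

A non-zero commit count tells you straight away that the reporter is running a development snapshot rather than the tagged code.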




