Tesseract 4.1.0 released


Zdenko Podobny

Jul 7, 2019, 1:34:37 PM
to tesser...@googlegroups.com, tesser...@googlegroups.com
Hello all,

I am proud to announce Tesseract OCR engine version 4.1.0, a bug-fix release with new renderers (an API extension): Alto, LSTMBox, and WordStrBox.
See online Release notes [1].
Source code can be downloaded from GitHub [2].



Zdenko

Abstract

Jul 8, 2019, 6:05:37 AM
to tesseract-ocr
Hi !

But what about a vcpkg update for this version? vcpkg is still at 4.0.0, while the --head version cannot compile due to incorrect C++11 changes.

Zdenko Podobny

Jul 8, 2019, 12:14:06 PM
to tesser...@googlegroups.com
We do not maintain vcpkg.
We officially support autotools, cmake (clang, msvc, g++), cppan (deprecated) and sw builds. Or, put the other way around, there are people who use these tools and contribute the necessary changes.

Zdenko



Zdenko Podobny

Jul 8, 2019, 12:33:04 PM
to tesser...@googlegroups.com
I do not know what you mean by:
cannot compile due to C++11 incorrect changes 

I just tried:
> mkdir build.msvc && cd build.msvc
> "c:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Auxiliary\Build\vcvars64.bat"
> set PKG_CONFIG_PATH=F:/win64/lib/pkgconfig/
> set INSTALL_DIR=F:\win64\bin\
> cmake .. -G "Visual Studio 15 2017 Win64" -DOPENMP_BUILD=OFF -DBUILD_TRAINING_TOOLS=OFF -DCPPAN_BUILD=OFF  -DCMAKE_INSTALL_PREFIX=%INSTALL_DIR% -DCMAKE_PREFIX_PATH=%INSTALL_DIR% -DCMAKE_BUILD_TYPE=Release
> cmake --build . --config Release

and the build was successful (1568 Warning(s), 0 Error(s))
AFAIK vcpkg uses cmake, ninja and msvc for building.

Zdenko



ElGato ElMago

Jul 12, 2019, 4:09:33 AM
to tesseract-ocr
Hello,

How do you use Alto, LSTMBox, and WordStrBox? Are they options for training or do you use them as command line options for tesseract?

ElMagoElGato


Shree Devi Kumar

Jul 12, 2019, 5:00:11 AM
to tesser...@googlegroups.com
See https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc

Lstmbox and wordstrbox create box files for training.

Alto creates XML output.

Hocr creates HTML output.
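
On the command line, each of these names is a config file given as the last argument to tesseract. A minimal sketch, assuming a tesseract 4.1 install with its standard configs; the helper name run_renderers and the file names are made up for illustration. Each config gets its own output base so that results cannot overwrite one another.

```shell
# Hypothetical helper: run tesseract once per output config.
# Usage: run_renderers IMAGE OUTPUT_BASE
run_renderers() {
    image=$1
    base=$2
    for cfg in alto hocr lstmbox wordstrbox; do
        # General form: tesseract IMAGE OUTPUTBASE CONFIG
        # The config name selects which renderer writes the output.
        tesseract "$image" "$base-$cfg" "$cfg" || return 1
    done
}
```

For example, run_renderers page.png out would call tesseract four times, with output bases out-alto, out-hocr, out-lstmbox and out-wordstrbox.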


Abstract

Jul 12, 2019, 8:10:05 AM
to tesseract-ocr
I mean that I tried to compile with the vcpkg tool, because building all the required libraries manually is a nightmare. (vcpkg is the only tool that produces short file names in native Windows style; the others add long prefixes with no way to prevent it.)
As I wrote, there is no package for version 4.1.0, so I can only use 4.0.0 or the current state of the repo. 4.0.0 builds fine. But when I try to build --head, I first get compile messages from MSVC 2015 about those changes in dawg...

After I switched to the 2017 version, it gets past that error but runs into new problems with the tiff library paths in the package (debug target). I have paused editing the build files to correct that and will try again later.



Joseph DiFrancisco

Jul 12, 2019, 2:11:07 PM
to tesseract-ocr
When will this release be available in Homebrew?  4.0.0 is still the current formula https://formulae.brew.sh/formula/tesseract

Abdou

Jul 27, 2019, 6:44:53 AM
to tesseract-ocr

Hello everyone. I tried to use OCRD-train with tesseract 4.1 but did not succeed. I noticed that for an RTL language, wordstrbox reversed the text and wrote it as if it were an LTR language. Is this a bug, or do I have to change some configuration for it to work properly? Thank you.




Shree Devi Kumar

Jul 28, 2019, 4:23:47 AM
to tesseract-ocr
It is not a bug but is intentional.



Alex Cohn

Jul 28, 2019, 9:12:22 AM
to tesseract-ocr
Hi everybody,

I am proud to announce Android support for the new 4.1.0 version of the tesseract OCR engine. This repo [1] includes both 3.05 and 4.1 branches, and lets you painlessly build a static command-line binary. In addition, it builds the Java binding, so libtess and liblept can be used from the Java code of your app.

This release is different from the desktop. E.g., instead of adding libtiff, I provide a dummy DebugPixa class.

I welcome comments and reviews, as well as more careful testing.

Enjoy,
Alex

Shree Devi Kumar

Jul 28, 2019, 10:11:08 AM
to tesseract-ocr
Thanks. Please add the info to Tesseract wiki page also.


Alex Cohn

Jul 28, 2019, 11:28:28 AM
to tesser...@googlegroups.com


René Hansen

Jul 30, 2019, 8:05:52 AM
to tesser...@googlegroups.com
A bit late to the party here, but I've just pushed changes that update build configs for tesseract 4 in https://github.com/rhardih/bad.

It now supports building 4.0.0 and 4.1.0. I've tested both versions on x86, armv7-a and arm64-v8a. All seems to be working just fine.

I'm using the default build tools of the project and *mostly* unmodified sources based on the official releases of the main repo. Relevant cmake command here:

https://github.com/rhardih/bad/blob/master/tesseract/tesseract-4.0.0.Dockerfile#L40

One thing to note, however: I've had to replace some code that relied on glob.h, which isn't available if you want to build for Android 6.0. At least afaik.

See the diff here:

https://github.com/tesseract-ocr/tesseract/compare/4.1.0...rhardih:4.1.0-rhardih

It seems fileio.h is only used in training, so perhaps this change isn't even needed if there's a way to avoid building the training stuff, when building for Android.

Can anyone give me any pointers on that?

In any case, would it be worthwhile mentioning https://github.com/rhardih/bad in the wiki as an alternative means of building for Android when all you want is .so files?


/René





Shree Devi Kumar

Jul 30, 2019, 9:30:32 PM
to tesseract-ocr
Please make a PR in the tesseract repo regarding the changes you needed for Android 6.0.

I am sure there is a way to build without training tools on Android. With autotools it is a separate step. 

Please update the wiki with a link to your repo as an alternative way to build on Android.


Alex Cohn

Jul 31, 2019, 5:57:12 AM
to tesseract-ocr
I don't build training, and I excluded fileio, following the path of Robyer (https://github.com/adaptech-cz/Tesseract4Android/commit/7852e08fa51ae1461883e5cf1dc858d531bb21c2).

That said, setting up and running ndk-build on any supported platform (Linux, Windows, macOS) is IMHO easier than using Docker.

BR,
Alex



René Hansen

Jul 31, 2019, 6:43:24 AM
to tesser...@googlegroups.com
Thanks Alex, I'll go and have a look. One would imagine that -D BUILD_TRAINING_TOOLS=OFF should be enough.

I know Docker is not everyone's cup of tea, but in my case, I've just become so used to trying to avoid installing anything on my host system if possible.

One thing that got me down this path, was the insane install size of the NDK. I think when I started out, r16b took up more than 3GB on my host system. And usually you'll need at least a few versions of the NDK for compatibility purposes and the different versions of compilers between releases.

In comparison, the docker image with the standalone toolchain for e.g. the r16b/android-23/arm-linux-androideabi-4.9 combo is about ~1.3GB. When you're done building, you just remove the image and reclaim the space.

Another thing Docker provides is the determinism that comes with running in the exact same environment, no matter what host you're on. In my experience, subtle differences between platforms always have a way of costing me headaches and time, due to some edge-case bug or incompatibility. With Docker I've never had that happen.

I think having more projects related to the same issue is inherently a good thing. It makes the surface area that much larger, and thus easier to find, when people go solution hunting on google.

I think it's just a matter of different strokes for different folks.

@Shree

I would much prefer to download/compile from official releases, but I think the solution of not building the training tools for Android is better than peppering the codebase with legacy cruft. I don't imagine running the tools on Android will ever be a use case?

I'll go ahead and update the wiki.


/René




Alex Cohn

Jul 31, 2019, 12:31:15 PM
to tesseract-ocr
Hi René,

thanks for your detailed post.

Let me try to explain why I prefer to use NDK 'directly'. 

When we need some libraries (like libtess) as parts of our apps, we need the library integrated into the app. Often, the library comes with a JNI layer and can be accessed (indirectly) from the app's Java/Kotlin code. Such a setup requires building libtess with extra C++ classes that are not present in the https://github.com/tesseract-ocr/tesseract repository. They only exist in the https://github.com/rmtheis/tess-two repository and its forks. Also, this requires building the Java/Kotlin side of the JNI wrapper, which needs the Android SDK, not only the NDK.

An alternative approach is to consume libtess in a C++ library that is used by the app's Java code. This time, we need the NDK to be integrated into Android Studio. Actually, we always want the NDK to be integrated into Android Studio, because this is not only a build environment, it's a debug environment, too. Android Studio 3.2 and higher lets you step from Java code into C++ code and back, set breakpoints in both Java and C++, and so on.

A library that is compiled externally, like in Docker, is harder to debug. I agree that for some libraries, in some cases, the hardship of building it properly outweighs the hardship of debugging it. But here and now, building libtess, both as a static library that can be seamlessly linked into your own C++ library and as a shared library that can be called through JNI from the app, is easy and even quite quick.

As for your complaint about keeping multiple versions of Android NDK for compatibility purposes, I beg to differ. The NDK team does a very good job of maintaining backwards compatibility, so there is no added value in using older NDK releases (as long as you can master the build process to fit the latest and greatest NDK release). That's what I took care of in my fork https://github.com/alexcohn/tess-two and now you don't need NDK r16 build anymore. I dare say, still using NDK r16 these days is unjustified. It has known runtime bugs, e.g. with crash reports for arm64, and may be considered a security risk.

Best Regards,
Alex

Alex Cohn

Jul 31, 2019, 1:32:00 PM
to tesseract-ocr
On Wednesday, July 31, 2019 at 1:43:24 PM UTC+3, René Hansen wrote:
Thanks Alex, I'll go and have a look. One would imagine that -D BUILD_TRAINING_TOOLS=OFF should be enough.

Disabling build of training is not enough. You must explicitly exclude fileio.cpp, too, because it's not a part of training, even though it is used only there.

Alex 

Sergei Sokolov

Jul 31, 2019, 1:51:22 PM
to tesseract-ocr
Is there a docker container with 4.1.0 version available on docker hub?

René Hansen

Aug 1, 2019, 7:01:08 AM
to tesser...@googlegroups.com
I can completely understand the reasons and need for the way tess-two does things. If I were working with Android Studio and Java/Kotlin, I would probably never have spent time on this. Last time I used tess-two it worked flawlessly.

I am coming at this from the perspective of Qt projects however. That means I'm already in C++ land for the most part, which also means, that I have a preference towards including and/or building libraries in a more traditional way.

The debugging aspect is probably not as nice as with Android Studio integrations, but I don't strip debug symbols by default in my builds, so using gdb or lldb should work without further ado.


/René



René Hansen

Aug 1, 2019, 7:04:01 AM
to tesser...@googlegroups.com
Good point, I see fileio.h referenced here:

unittest/fileio_test.cc
unittest/ligature_table_test.cc
unittest/include_gunit.h
unittest/pango_font_info_test.cc
src/training/boxchar.cpp
src/training/text2image.cpp
src/training/pango_font_info.cpp
src/training/lang_model_helpers.cpp
src/training/unicharset_training_utils.cpp
src/ccutil/fileio.cpp
src/ccutil/Makefile.am
src/ccutil/fileio.h


So perhaps it's not completely without reason, to modularise the build in such a way, that it isn't included at all. Otherwise including the patch might be a better option. I'll have to look into it some more.


/René
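
A list like the one above can be regenerated with a recursive grep. This is a minimal sketch, not necessarily the command that was actually used; the helper name find_header_users is made up for illustration.

```shell
# Hypothetical helper: list files under the given directories that mention
# a header name, sorted for stable output.
# Usage: find_header_users HEADER DIR...
find_header_users() {
    header=$1
    shift
    # -r: recurse, -l: print file names only, -F: match a fixed string
    grep -rlF -- "$header" "$@" | sort
}
```

Run from the root of a tesseract checkout, find_header_users fileio.h unittest src would print every file in those two trees that mentions fileio.h.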




Alex Cohn

Aug 1, 2019, 8:11:30 AM
to tesseract-ocr
It's nice that there are different ways to achieve (almost) same things with as little hassle as possible.

BTW, I also added reference to your B.A.D. in https://github.com/tesseract-ocr/tesseract/wiki/4.0-Docker-Containers.

Sincerely,
Alex

Zdenko Podobny

Aug 1, 2019, 9:16:30 AM
to tesser...@googlegroups.com
Thanks. The attached patch should fix it (it does not address the unittest part; @Shree, are you able to fix unittest?). Can you test it?

Zdenko


Attachment: fileio.patch

René Hansen

Aug 1, 2019, 1:06:02 PM
to tesser...@googlegroups.com
Thanks Alex.

Cool Zdenko,

I can't find any reference to the unittest sub-directory in the main CMakeLists.txt, so it seems to only be included in the autotools build. Guess that is not a problem then.

I've tested your patch; I'm building tag 4.1.0-rhardih-00 off my own branch, where I've applied your patch. Commit 8c4518.

Somehow I am still getting an object file for fileio.cpp, so the linker step still fails:

...
[100%] Linking CXX executable bin/tesseract
libtesseract.so: undefined reference to `glob'
libtesseract.so: undefined reference to `globfree'
clang70++: error: linker command failed with exit code 1 (use -v to see invocation)
...


And rightly so:

# nm -g ./CMakeFiles/libtesseract.dir/src/ccutil/fileio.cpp.o | grep -B 100 glob
0000000000000000 V DW.ref.__gxx_personality_v0
                 U _Unwind_Resume
                 U _Z7tprintfPKcz
000000000000034c T _ZN9tesseract11InputBuffer4ReadEPNSt6__ndk112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEE
0000000000000410 T _ZN9tesseract11InputBuffer9CloseFileEv
0000000000000628 T _ZN9tesseract11InputBufferC1EP7__sFILE
0000000000000674 T _ZN9tesseract11InputBufferC1EP7__sFILEm
0000000000000628 T _ZN9tesseract11InputBufferC2EP7__sFILE
0000000000000674 T _ZN9tesseract11InputBufferC2EP7__sFILEm
00000000000006c0 T _ZN9tesseract11InputBufferD1Ev
00000000000006c0 T _ZN9tesseract11InputBufferD2Ev
00000000000006f0 T _ZN9tesseract12OutputBuffer11WriteStringERKNSt6__ndk112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEE
000000000000070c T _ZN9tesseract12OutputBuffer9CloseFileEv
00000000000006d0 T _ZN9tesseract12OutputBufferC1EP7__sFILE
00000000000006d8 T _ZN9tesseract12OutputBufferC1EP7__sFILEm
00000000000006d0 T _ZN9tesseract12OutputBufferC2EP7__sFILE
00000000000006d8 T _ZN9tesseract12OutputBufferC2EP7__sFILEm
00000000000006e0 T _ZN9tesseract12OutputBufferD1Ev
00000000000006e0 T _ZN9tesseract12OutputBufferD2Ev
00000000000001bc T _ZN9tesseract4File16ReadFileToStringERKNSt6__ndk112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEEPS7_
0000000000000570 T _ZN9tesseract4File19DeleteMatchingFilesEPKc
00000000000000b0 T _ZN9tesseract4File22WriteStringToFileOrDieERKNSt6__ndk112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEES9_
0000000000000000 T _ZN9tesseract4File4OpenERKNSt6__ndk112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEES9_
0000000000000528 T _ZN9tesseract4File6DeleteEPKc
0000000000000440 T _ZN9tesseract4File8JoinPathERKNSt6__ndk112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEES9_
0000000000000184 T _ZN9tesseract4File8ReadableERKNSt6__ndk112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEE
0000000000000024 T _ZN9tesseract4File9OpenOrDieERKNSt6__ndk112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEES9_
                 U _ZNK7ERRCODE5errorEPKc16TessErrorLogCodeS1_z
0000000000000000 W _ZNKSt6__ndk121__basic_string_commonILb1EE20__throw_length_errorEv
                 U _ZNSt11logic_errorC2EPKc
                 U _ZNSt12length_errorD1Ev
0000000000000000 W _ZNSt6__ndk112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE6appendEPKcm
0000000000000000 W _ZNSt6__ndk112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE6assignEPKcm
0000000000000000 W _ZNSt6__ndk1plIcNS_11char_traitsIcEENS_9allocatorIcEEEENS_12basic_stringIT_T0_T1_EERKS9_PKS6_
0000000000000000 W _ZNSt6__ndk1plIcNS_11char_traitsIcEENS_9allocatorIcEEEENS_12basic_stringIT_T0_T1_EERKS9_SB_
                 U _ZTISt12length_error
                 U _ZTVSt12length_error
                 U _ZdlPv
                 U _Znwm
                 U __cxa_allocate_exception
                 U __cxa_free_exception
                 U __cxa_throw
                 U __gxx_personality_v0
                 U clearerr
                 U fclose
                 U ferror
                 U fopen
                 U fputs
                 U fread
                 U fseek
                 U ftell
                 U glob
                 U globfree


My quick grep skills haven't revealed why it's still included though ....

/René
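
A narrower variant of the nm check above prints only the undefined glob references instead of everything that precedes them. This is a sketch; the helper name needs_glob is made up for illustration, and it relies only on nm marking undefined symbols with "U", as in the output above.

```shell
# Hypothetical helper: succeed (and print the matching lines) if OBJECT has
# undefined references to glob or globfree.
# Usage: needs_glob OBJECT_FILE
needs_glob() {
    nm -g "$1" | grep -E '[[:space:]]U (glob|globfree)$'
}
```

For example, needs_glob ./CMakeFiles/libtesseract.dir/src/ccutil/fileio.cpp.o && echo "still depends on glob".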



Zdenko Podobny

Aug 1, 2019, 1:14:53 PM
to tesser...@googlegroups.com
Try running the build in a new directory. There should not be any ccutil/fileio.cpp.o; the file has been moved to the training part.

Zdenko
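
One way to make sure no object files from an earlier configuration are reused is to configure into a directory that does not exist yet. A minimal sketch, assuming a plain cmake build as elsewhere in this thread; the helper name fresh_build is made up, and any toolchain options needed for Android would have to be added to the cmake line.

```shell
# Hypothetical helper: configure and build in a brand-new directory.
# Usage: fresh_build SOURCE_DIR BUILD_DIR
fresh_build() {
    src=$(cd "$1" && pwd) || return 1   # absolute path to the source tree
    build=$2
    if [ -e "$build" ]; then
        echo "refusing to reuse existing directory: $build" >&2
        return 1
    fi
    mkdir -p "$build" || return 1
    (
        # Subshell, so the caller's working directory is left unchanged.
        cd "$build" || exit 1
        cmake "$src" -DBUILD_TRAINING_TOOLS=OFF &&
        cmake --build .
    )
}
```

For example, fresh_build . ../build-clean from the root of the source tree.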



JB Data31

Aug 2, 2019, 1:52:02 AM
to tesser...@googlegroups.com
$ git clone https://github.com/alexcohn/tess-two.git tess-two-git
Cloning into 'tess-two-git'...
...
$ ndk-build -C tess-two-git/tess-two tesseract APP_ABI=arm64-v8a APP_PLATFORM=android-24
Android NDK: WARNING: APP_PLATFORM android-24 is higher than android:minSdkVersion 1 in ./AndroidManifest.xml. NDK binaries will *not* be compatible with devices older than android-24. See https://android.googlesource.com/platform/ndk/+/master/docs/user/common_problems.md for more information.
make: Entering directory `.../tess-two-git/tess-two'
make: *** No rule to make target `tesseract'.  Stop.
make: Leaving directory `.../tess-two-git/tess-two'
$

I'd like to compare this with a painful build process. I followed the two-step wiki how-to, but it fails.
What am I doing wrong?

@JBΔ



Alex Cohn

Aug 2, 2019, 9:05:32 AM
to tesseract-ocr

It's my fault. I should have removed the **master** branch (done now). You should choose either 4.1 or 3.05.

Alex
 

JB Data31

Aug 5, 2019, 1:21:32 AM
to tesser...@googlegroups.com

$ git clone https://github.com/alexcohn/tess-two.git tess-two-git
Cloning into 'tess-two-git'...
...
Resolving deltas: 100% (7359/7359), done.

$ ndk-build -C tess-two-git/tess-two tesseract APP_ABI=arm64-v8a APP_PLATFORM=android-24
Android NDK: WARNING: APP_PLATFORM android-24 is higher than android:minSdkVersion 1 in ./AndroidManifest.xml. NDK binaries will *not* be compatible with devices older than android-24. See https://android.googlesource.com/platform/ndk/+/master/docs/user/common_problems.md for more information.
make: Entering directory `.../tess-two-git/tess-two'
make: *** No rule to make target `jni/../../tesseract/src/api/tesseractmain.cpp', needed by `obj/local/arm64-v8a/objs/tesseract/api/tesseractmain.o'.  Stop.

make: Leaving directory `.../tess-two-git/tess-two'
$

?

@JBΔ


Zdenko Podobny

Aug 5, 2019, 2:15:45 AM
to tesser...@googlegroups.com
I am sorry, I found the problem: the move of fileio.* was already staged, so it did not become part of the patch. It is now part of master, so you can cherry-pick it for 4.1 if needed.

Zdenko



René Hansen

Aug 5, 2019, 2:35:06 AM
to tesser...@googlegroups.com
Awesome! Thanks Zdenko.

Would it be possible to tag c5a50b93ce as something like 4.1.1?

That way I can target an official release and get rid of my own fork.


/René



Zdenko Podobny

Aug 5, 2019, 2:56:20 AM
to tesser...@googlegroups.com
I would like to create/release 4.1.1 (just cherry-pick fixes from master/5.0.0), but it requires time... Maybe end of August, just to see what happens in master repository.

Zdenko



Alex Cohn

Aug 5, 2019, 3:00:22 AM
to tesseract-ocr
On Monday, August 5, 2019 at 8:21:32 AM UTC+3, JB Data31 wrote:

$ date
Mon Aug  5 04:58:08 UTC 2019
$ git clone https://github.com/alexcohn/tess-two.git tess-two-git
Cloning into 'tess-two-git'...
...
Resolving deltas: 100% (7359/7359), done.

Which branch did you check out this time?
 
$ ndk-build -C tess-two-git/tess-two tesseract APP_ABI=arm64-v8a APP_PLATFORM=android-24
Android NDK: WARNING: APP_PLATFORM android-24 is higher than android:minSdkVersion 1 in ./AndroidManifest.xml. NDK binaries will *not* be compatible with devices older than android-24. See https://android.googlesource.com/platform/ndk/+/master/docs/user/common_problems.md for more information.
make: Entering directory `.../tess-two-git/tess-two'
make: *** No rule to make target `jni/../../tesseract/src/api/tesseractmain.cpp', needed by `obj/local/arm64-v8a/objs/tesseract/api/tesseractmain.o'.  Stop.
make: Leaving directory `.../tess-two-git/tess-two'

Also, there may be old .o.d files under tess-two-git/tess-two/obj; it sometimes helps to delete them.

BR, 
Alex 
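
Deleting those stale dependency files can be scripted. A minimal sketch; the helper name clean_stale_deps is made up for illustration, and it removes only files ending in .o.d, leaving the object files themselves alone.

```shell
# Hypothetical helper: delete stale dependency files (*.o.d) under OBJ_DIR.
# Usage: clean_stale_deps OBJ_DIR
clean_stale_deps() {
    obj_dir=$1
    [ -d "$obj_dir" ] || return 0      # no obj directory, nothing to clean
    find "$obj_dir" -type f -name '*.o.d' -exec rm -f {} +
}
```

For example, clean_stale_deps tess-two-git/tess-two/obj before re-running ndk-build.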

René Hansen

Aug 5, 2019, 3:23:59 AM
to tesser...@googlegroups.com
Alright, I'll keep the fork around till then. Thanks.


/René



Stefan Weil

Aug 7, 2019, 4:55:59 AM
to tesseract-ocr
It's a pity that I did not see this discussion earlier. I understand that old Android now builds fine. On the other hand, the Appveyor CI build for Windows is now broken, and unittest still does not build. That's not a good result. :-(

I therefore suggest going back to my commit which moved fileio.* from training to ccutil. Then conditional compilation for Android can be added to fileio.cpp. What should it look like?

René Hansen

Aug 7, 2019, 5:50:50 AM
to tesser...@googlegroups.com
Agreed. Maybe the real solution after all, is to drop the usage of glob, and go for a portable solution?

This is how I got around it initially. Not the best code though:
https://github.com/tesseract-ocr/tesseract/compare/4.1.0...rhardih:4.1.0-rhardih




Alex Cohn

Aug 8, 2019, 3:27:35 AM
to tesseract-ocr
I believe that there is no true need to change anything. To run unittest (and even training) on Android, it's enough to choose __ANDROID_API__=28 (or higher). Methinks that this is a reasonable restriction. The production version of the library can still be built with __ANDROID_API__=16 and with fileio.cpp excluded.

Alex

JB Data31

Aug 13, 2019, 6:31:20 AM
to tesseract-ocr
$ git clone -b "4.1" --single-branch https://github.com/alexcohn/tess-two.git tess-two-git-2
Cloning into 'tess-two-git-2'...
remote: Enumerating objects: 34, done.
remote: Counting objects: 100% (34/34), done.
remote: Compressing objects: 100% (34/34), done.
remote: Total 11423 (delta 1), reused 27 (delta 0), pack-reused 11389
Receiving objects: 100% (11423/11423), 157.20 MiB | 1.20 MiB/s, done.
Resolving deltas: 100% (6708/6708), done.

$ ndk-build -C tess-two-git-2/tess-two tesseract APP_ABI=arm64-v8a APP_PLATFORM=android-24

Android NDK: WARNING: APP_PLATFORM android-24 is higher than android:minSdkVersion 1 in ./AndroidManifest.xml. NDK binaries will *not* be compatible with devices older than android-24. See https://android.googlesource.com/platform/ndk/+/master/docs/user/common_problems.md for more information.
make: Entering directory `.../tess-two-git-2/tess-two'

make: *** No rule to make target `jni/../../tesseract/src/api/tesseractmain.cpp', needed by `obj/local/arm64-v8a/objs/tesseract/api/tesseractmain.o'.  Stop.
make: Leaving directory `.../tess-two-git-2/tess-two'

$ ll tess-two-git-2/tess-two/obj

ls: cannot access 'tess-two-git-2/tess-two/obj': No such file or directory

Branch: 4.1. There is no tess-two/obj directory, and creating it changes nothing.

Alex Cohn

Aug 13, 2019, 8:22:11 AM
to tesseract-ocr
Oh, now I understand the problem. You need git clone --recurse-submodules. To add the missing submodules to an existing clone, run:

git submodule init 
git submodule update
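
The same fix can be wrapped in a small check, sketched here under the assumption that the Tesseract sources live in a tesseract/ submodule at the repository root (which is what the path in the make error above resolves to). The function names are hypothetical.

```shell
# Return 0 if the Tesseract source file that ndk-build complained about exists.
sources_present() {
    [ -f "$1/tesseract/src/api/tesseractmain.cpp" ]
}

# Fetch missing submodules for an existing clone; do nothing if they are there.
ensure_submodules() {
    repo_dir=$1
    if sources_present "$repo_dir"; then
        echo "submodules already checked out"
    else
        # Equivalent to "git submodule init" followed by "git submodule update",
        # also covering nested submodules.
        git -C "$repo_dir" submodule update --init --recursive
    fi
}

# Example (not run here): ensure_submodules tess-two-git-2
```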

BR,
Alex

JB Data31

Aug 14, 2019, 4:01:44 AM
to tesser...@googlegroups.com
Build done.
...
[arm64-v8a] StaticLibrary  : libpngt_static.a
[arm64-v8a] Executable     : tesseract

Is this the static command-line tesseract executable that the wiki describes?
$ file tess-two-git-3/tess-two/obj/local/arm64-v8a/tesseract
tess-two-git-3/tess-two/obj/local/arm64-v8a/tesseract: ELF 64-bit LSB shared object, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /system/, BuildID[sha1]=d09c8cfe1013d63e57afeaaf0837a54905cbb7ef, with debug_info, not stripped

@JBΔ

Alex Cohn

Aug 14, 2019, 5:45:46 AM
to tesseract-ocr

Let me explain what I mean by static. This command-line executable does not depend on other user libraries, like libpng, liblept, or libtess. So, to run it on any Android device, you only need to copy a single file. 

But this does not include the system runtime code (libc, libm, et al.). Technically, it is possible to build an executable that also contains the system libraries, see e.g. https://eli.thegreenplace.net/2012/08/13/how-statically-linked-programs-run-on-linux. This would result in a bigger file, with no advantage over the current version. After all, the system calls are still performed by the kernel, so even a statically linked executable depends on the Android system where it runs.
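
One way to check this on the build machine, as a sketch that assumes binutils readelf is installed: list the shared libraries the executable still asks the dynamic linker for. The helper name needed_libs is hypothetical. If the description above holds, the list should contain only system libraries and none of libpng, liblept or libtess; I have not verified the exact output for this binary.

```shell
# Print the shared libraries listed as NEEDED in `readelf -d` output (stdin).
needed_libs() {
    sed -n 's/.*(NEEDED).*\[\([^]]*\)].*/\1/p'
}

# Example (not run here):
#   readelf -d tess-two-git-3/tess-two/obj/local/arm64-v8a/tesseract | needed_libs
```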

Sincerely,
Alex Cohn
 

NY C

Dec 7, 2019, 3:37:01 AM
to tesseract-ocr

Hi, I am using tess-two for OCR.


The version I use is: https://github.com/alexcohn/tess-two


Code:

        TessBaseAPI baseApi = new TessBaseAPI();
        baseApi.setDebug(true);
        baseApi.init(pathfiles, language);
        baseApi.setVariable(TessBaseAPI.VAR_CHAR_WHITELIST, "0123456789");
        baseApi.setPageSegMode(TessBaseAPI.PageSegMode.PSM_AUTO);
        baseApi.setImage(bmp);
        result = baseApi.getUTF8Text();
        baseApi.end();


The code runs perfectly when I use this tessdata: https://github.com/tesseract-ocr/tessdata


But when I use tessdata_fast (https://github.com/tesseract-ocr/tessdata_fast), the code crashes on baseApi.init.


There is no error message since the init method calls native C++. As far as I can trace, the init method crashes on this line:


boolean success = nativeInitOem(mNativeData, datapath, language, ocrEngineMode);


Is it possible to use tessdata_fast in tess-two?

Or did I miss something?


Shree Devi Kumar

Dec 7, 2019, 4:05:42 AM
to tesseract-ocr
tessdata supports both the legacy engine and the LSTM engine. tessdata_fast and tessdata_best support only the LSTM engine.

To use tessdata_fast, use OCR engine mode (OEM) 1.

On the command line this is --oem 1. Please look up the corresponding syntax for tess-two.
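
For the command-line case, a minimal sketch assuming a Tesseract 4.x binary on the PATH and a local directory holding the tessdata_fast files. The wrapper name ocr_lstm and the TESSERACT override variable are hypothetical; --tessdata-dir, --oem and -l are regular tesseract options.

```shell
# Run tesseract with the LSTM engine only (--oem 1).
# Arguments: image, output base name, tessdata directory, language (default eng).
# TESSERACT can be set to point at a different binary.
ocr_lstm() {
    image=$1
    out_base=$2
    tessdata_dir=$3
    lang=${4:-eng}
    "${TESSERACT:-tesseract}" "$image" "$out_base" \
        --tessdata-dir "$tessdata_dir" --oem 1 -l "$lang"
}

# Example (not run here): ocr_lstm page.png page /path/to/tessdata_fast eng
```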

Shree Devi Kumar

Dec 7, 2019, 4:06:08 AM
to tesseract-ocr
ocrEngineMode

NY C

Dec 7, 2019, 4:37:59 AM
to tesseract-ocr
I changed the OEM as you suggested:
baseApi.init(pathfiles, language, TessBaseAPI.OEM_CUBE_ONLY);
but it still crashes.

I tried all the parameters I could find
(OEM_TESSERACT_ONLY = 0, OEM_CUBE_ONLY = 1, OEM_TESSERACT_CUBE_COMBINED = 2, OEM_DEFAULT = 3).
They all crash on the same line.


NY C

Dec 8, 2019, 11:03:57 AM
to tesseract-ocr
Also, I think CUBE was removed in Tesseract 4.x.
I find it very strange that there is no suitable OEM value in tess-two 9.0.0.

Could somebody help me here? Am I missing something needed to make tessdata_fast work in tess-two?
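
For reference, a small sketch of how the numeric engine modes are named in Tesseract 4.x itself, as far as I know from its public headers. The function name oem_name is hypothetical. It only shows that the value 1, which the tess-two 9.0.0 constants label OEM_CUBE_ONLY, is the LSTM-only mode in Tesseract 4; whether a given tess-two build passes the number through unchanged is an assumption, and this does not explain the crash.

```shell
# Map a numeric OCR engine mode to its Tesseract 4.x name.
oem_name() {
    case "$1" in
        0) echo "OEM_TESSERACT_ONLY" ;;           # legacy engine only
        1) echo "OEM_LSTM_ONLY" ;;                # LSTM engine only
        2) echo "OEM_TESSERACT_LSTM_COMBINED" ;;  # legacy + LSTM
        3) echo "OEM_DEFAULT" ;;                  # based on what is available
        *) echo "unknown OEM: $1" >&2; return 1 ;;
    esac
}

# Example: oem_name 1
```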


