Re: Heads up: release of tesseract 4.0


Zdenko Podobny

Sep 30, 2018, 1:50:59 PM
to tesser...@googlegroups.com, tesser...@googlegroups.com
RC 1 [1] is ready.
Please test, test, test. Especially if you are wrapping tesseract and creating/providing packages.
Report problems ASAP in the issue tracker, so we can fix them before the final release.



Zdenko


On Sat, Sep 22, 2018 at 17:06, Zdenko Podobny <zde...@gmail.com> wrote:
Hello,

I would like to thank everyone who shared their thoughts about releasing a new version of tesseract [1]. I took my time and decided that we should make the release in mid-October 2018 (14-21...).

This means that no new features will be added to the current code: there is no time left to test them. Still, please feel free to send your patch/PR; it will be included after the 4.0 release.

There are several ways people can contribute to this process:
  • Developers: go through the open issues and try to fix them. Please leave a comment when you start to deal with an issue, so we can use our capacity efficiently.
  • Packagers: please test whether the build and packaging process works. If something is broken, try to fix and submit it quickly. Please let the forum or me know directly where users can find your "product", so we can add information about supported systems to the release notes.
  • "Wrappers": if you are producing a wrapper for tesseract, please let the forum or me know directly whether you support tesseract 4: I would like to promote your work.
  • "No code" developers:
    • Check open issues: test with the latest code whether the report is still valid, prepare a test case if one is missing, report duplicates, suggest labels, etc.
    • Improve documentation, release notes, man pages, etc.
    • Native English speakers: check documentation, release notes, etc.
Thanks to all who helped us get to this point. I really appreciate all kinds of support.


Zdenko

Zdenko Podobny

Oct 7, 2018, 3:18:49 PM
to tesser...@googlegroups.com, tesser...@googlegroups.com
RC 2 is ready [1].
Please test, test, test. Especially if you are wrapping tesseract and creating/providing packages.
If you have any patch for the current code, please provide it ASAP (via a pull request or by attaching the patch in the issue tracker) for evaluation.
Report problems ASAP in the issue tracker, so we can fix them before the final release.


On Sun, Sep 30, 2018 at 19:50, Zdenko Podobny <zde...@gmail.com> wrote:

Zdenko Podobny

Oct 14, 2018, 1:49:22 PM
to tesser...@googlegroups.com, tesser...@googlegroups.com
RC 3 is ready [1].

Please test, test, test.
Especially if you are building tesseract on a platform other than Linux or Windows (cppan+cmake).

If you have any patch for the current code, please provide it ASAP (via a pull request or by attaching the patch in the issue tracker) for evaluation.
Report problems ASAP in the issue tracker, so we can fix them before the final release.

On Sun, Oct 7, 2018 at 21:18, Zdenko Podobny <zde...@gmail.com> wrote:

universal reseller

Oct 14, 2018, 2:10:05 PM
to tesser...@googlegroups.com
Hi,
will there be another release candidate, or is this the last one?

Zdenko Podobny

Oct 14, 2018, 2:17:00 PM
to tesser...@googlegroups.com
It will depend on the number of (significant) commits and findings ;-)
E.g. just yesterday we got fixes for Mac, and it is still not clear whether a build from scratch will work on Mac...

Just some short statistics on the number of commits:
4.0.0-beta.3..4.0.0-beta.4    259 commits
4.0.0-beta.4..4.0.0-rc1       178 commits
4.0.0-rc1..4.0.0-rc2           51 commits
4.0.0-rc2..4.0.0-rc3           70 commits

So a lot of topics were improved during the last weeks.

Zdenko


On Sun, Oct 14, 2018 at 20:10, universal reseller <unire...@gmail.com> wrote:
Hi,
will there be another release candidate, or is this the last one?

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAC9ebroWFKLbBLNF2J2Rkr73BMdBmgz5Uw2T1Yrsizgm0d%2BOjA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Soumik Ranjan Dasgupta

Oct 15, 2018, 4:09:42 AM
to tesser...@googlegroups.com
Is there any way tesseract could be installed using pip for Ubuntu 16.04 systems and above?


--
Regards,
Soumik Ranjan Dasgupta

Zdenko Podobny

Oct 15, 2018, 4:30:13 AM
to tesser...@googlegroups.com
Are you familiar with the tools you are trying to use?
pip is for distributing Python modules, while tesseract is a C++ project, which is distributed with other tools (depending on the Linux distribution); on Ubuntu it should be apt.

Zdenko


On Mon, Oct 15, 2018 at 10:09, Soumik Ranjan Dasgupta <srd...@cse.jgec.ac.in> wrote:

Soumik Ranjan Dasgupta

Oct 15, 2018, 4:32:39 AM
to tesser...@googlegroups.com
Didn't know that, sorry. Thank you for the information.  
In that case, would it be possible to find a way to install tesseract via apt on Ubuntu 16.04 systems?    

Zdenko Podobny

Oct 15, 2018, 4:34:17 AM
to tesser...@googlegroups.com
Read the forum and the wiki ;-)
It is already there.

Zdenko


On Mon, Oct 15, 2018 at 10:32, Soumik Ranjan Dasgupta <srd...@cse.jgec.ac.in> wrote:

Zdenko Podobny

Oct 16, 2018, 3:48:05 AM
to tesser...@googlegroups.com, tesser...@googlegroups.com
Is there anybody here who builds and uses tesseract on Android?

On Sun, Oct 14, 2018 at 19:48, Zdenko Podobny <zde...@gmail.com> wrote:

avikam

Oct 21, 2018, 1:34:45 AM
to tesseract-ocr
I was able to compile and use 4.0.0-rc3 on Android (tested on a virtual device with API level 26 and on a real API level 29 device).
Basically, I used the NDK CMake toolchain to build it, but I had to make minor changes.

${SDK}/cmake/3.6.4111459/bin/cmake -G"Android Gradle - Ninja" \
    -DCMAKE_MAKE_PROGRAM=${SDK}/cmake/3.6.4111459/bin/ninja \
    -DCMAKE_TOOLCHAIN_FILE=${NDK}/build/cmake/android.toolchain.cmake \
    -DANDROID_ABI=${ARCH} \
    -DCMAKE_LIBRARY_OUTPUT_DIRECTORY=/opt/output/${ARCH}/ \
    -DCMAKE_INSTALL_PREFIX=/opt/${ARCH} \
    -DBUILD_TRAINING_TOOLS=OFF \
    -DBUILD_TESTS=OFF \
    -DANDROID_PLATFORM=android-26 \
    -DANDROID_STL=c++_static \
    ..


Out of the box, this build fails because of a dependency on the glob() libc call (in File::DeleteMatchingFiles).
While it might work by simply changing to -DANDROID_PLATFORM=android-28, I noticed that this function is only used for training, so I removed it and the build worked.

Zdenko Podobny

Oct 21, 2018, 7:39:43 AM
to tesser...@googlegroups.com
Thanks! Has anybody tried to build with ndk-build?
 
Zdenko


On Sun, Oct 21, 2018 at 7:34, avikam <avi...@gmail.com> wrote:

Zdenko Podobny

Oct 24, 2018, 3:04:25 AM
to tesser...@googlegroups.com, tesser...@googlegroups.com
RC 4 is ready [1].

I expect this to be the last RC. Any improvements to the docs and release notes are welcome.
Report problems ASAP in the issue tracker, so we can fix them before the final release.


On Sun, Oct 14, 2018 at 19:48, Zdenko Podobny <zde...@gmail.com> wrote:

flavi...@gmail.com

Oct 29, 2018, 9:17:09 AM
to tesseract-ocr
Hi Zdenko. Could you send me an example (a link, or anything else) showing how to compile Tesseract 4 so that I can use it in a VC++ project? I have a task to do that and my time is running out, and I haven't found a working sample of how to do it... I guess you already have a project that uses Tesseract in VC++... I will be grateful for any hint that leads me toward solving my task.

Regards,
Flaviu.

Zdenko Podobny

Oct 29, 2018, 1:50:05 PM
to tesser...@googlegroups.com
I already gave you step-by-step instructions (the same as on the wiki ;-) but with just the commands you need to type).
You replied that it does not work for you, without any explanation of what does not work. With this state of mind I do not know how to help you.




Zdenko


On Mon, Oct 29, 2018 at 14:17, <flavi...@gmail.com> wrote:

flavi...@gmail.com

Oct 30, 2018, 4:11:58 AM
to tesseract-ocr
I have now compiled the tesseract library with cppan,

and I have found a test app with this source code:

/*
dependencies:
    pvt.cppan.demo.google.tesseract.libtesseract: master
    pvt.cppan.demo.danbloomberg.leptonica: 1
*/

#include <iostream>
#include <memory>

#include <allheaders.h> // leptonica main header for image io
#include <baseapi.h> // tesseract main header

int main(int argc, char *argv[])
{
    tesseract::TessBaseAPI tess;

    // Init() returns 0 on success; the relative path "./tessdata" is
    // resolved against the current working directory
    if (tess.Init("./tessdata", "eng"))
    {
        std::cout << "OCRTesseract: Could not initialize tesseract." << std::endl;
        return 1;
    }

    // setup
    tess.SetPageSegMode(tesseract::PageSegMode::PSM_AUTO);
    tess.SetVariable("save_best_choices", "T");

    // the image path must be given as the first command-line argument
    if (argc < 2)
    {
        std::cout << "Usage: " << argv[0] << " <image file>" << std::endl;
        return 1;
    }

    // read image
    auto pixs = pixRead(argv[1]);
    if (!pixs)
    {
        std::cout << "Cannot open input file: " << argv[1] << std::endl;
        return 1;
    }

    // recognize
    tess.SetImage(pixs);
    tess.Recognize(0);

    // get result and delete[] returned char* string
    std::cout << std::unique_ptr<char[]>(tess.GetUTF8Text()).get() << std::endl;

    // cleanup
    tess.Clear();
    pixDestroy(&pixs);

    return 0;
}

and when I try to run it, I get:

Error opening data file ./tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
OCRTesseract: Could not initialize tesseract.

Compiling tesseract and its dependencies is evidently not enough. What else should I set up in order to run this code in a VC++ project?
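
The error output above says that ./tessdata/eng.traineddata could not be opened. Building the library does not produce the language data: the .traineddata files are distributed separately, and a relative path such as ./tessdata is resolved against the working directory of the process, which under an IDE may differ from the folder that holds the executable. A minimal sketch of a pre-flight check, using only the standard library, might look as follows. HasTrainedData, CandidateDirs and FirstUsableTessdataDir are hypothetical helper names, not part of the Tesseract API, and treating TESSDATA_PREFIX as the tessdata directory itself follows the wording of the error message.

```cpp
#include <cstdlib>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: true if <dir>/<lang>.traineddata can be opened for reading.
bool HasTrainedData(const std::string& dir, const std::string& lang) {
    std::ifstream file(dir + "/" + lang + ".traineddata", std::ios::binary);
    return file.good();
}

// Hypothetical helper: directories worth trying, in order of preference.
// Assumption: TESSDATA_PREFIX names the tessdata directory itself.
std::vector<std::string> CandidateDirs() {
    std::vector<std::string> dirs;
    const char* prefix = std::getenv("TESSDATA_PREFIX");
    if (prefix != nullptr && *prefix != '\0') {
        dirs.push_back(prefix);
    }
    dirs.push_back("./tessdata");  // relative to the current working directory
    return dirs;
}

// Hypothetical helper: first candidate that contains the language file,
// or an empty string if none does.
std::string FirstUsableTessdataDir(const std::vector<std::string>& candidates,
                                   const std::string& lang) {
    for (const std::string& dir : candidates) {
        if (HasTrainedData(dir, lang)) {
            return dir;
        }
    }
    return std::string();
}
```

The directory returned by FirstUsableTessdataDir(CandidateDirs(), "eng") could then be passed as the first argument of tess.Init(); an empty result suggests that eng.traineddata still has to be downloaded or that the working directory needs adjusting.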

Zdenko Podobny

Oct 30, 2018, 5:04:08 AM
to tesser...@googlegroups.com
First learn to write forum e-mails! Stop hijacking email threads.
Your questions/problems have nothing to do with the content of the original posting.

Zdenko


On Tue, Oct 30, 2018 at 9:12, <flavi...@gmail.com> wrote:

flavi...@gmail.com

Oct 30, 2018, 7:52:31 AM
to tesseract-ocr
Ok, sorry. I will update the original post.

Santosh Prasad Sah

Nov 19, 2018, 1:15:05 AM
to tesseract-ocr
Hey,
Can I have instructions for compiling and running tesseract 4.0 on Android devices?

Nirajan Pant

Dec 1, 2018, 8:58:53 AM
to tesseract-ocr
Hi, we are working on OCR for Android (for the visually impaired), but the accuracy of 3.04 is very low for Devanagari script. We are planning to use v4, knowing that it is not supported yet, because otherwise we are not able to meet our goals.

But I am happy to know that you compiled it successfully. Could you please share the instructions you followed?

Nirajan Pant
niraja...@ku.edu.np
