Include Tesseract in C++ code

Gustavo Souto

Mar 29, 2012, 10:52:55 AM
to tesser...@googlegroups.com
Hi everyone, I need your help...

I want to write a C++ program that uses Tesseract, but when I try to compile the source code I get errors. I don't really know how to link the libraries into my program. This is what I tried:

------------------------------------  CODE ---------------------------------------------------------------
#include <baseapi.h>
#include <allheaders.h>
#include <iostream>
#include <opencv/cv.h>
#include <opencv/highgui.h>

using namespace std;

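int main() {

    tesseract::TessBaseAPI tess;

    // Init returns 0 on success; stop early if the language data cannot be loaded.
    if (tess.Init(NULL, "eng", tesseract::OEM_DEFAULT) != 0) {
        cerr << "Could not initialize tesseract." << endl;
        return 1;
    }
    cv::Mat image = cv::imread("/home/souto3/Pictures/num.tif");
    if (image.empty()) {
        cerr << "Could not read the image." << endl;
        return 1;
    }
    tess.SetImage((uchar*)image.data, image.size().width, image.size().height, image.channels(), image.step1());
    tess.Recognize(0);
    char* out = tess.GetUTF8Text();

    cout << out;
    delete[] out;  // GetUTF8Text returns a buffer that the caller must delete[]

    return 0;
}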
------------------------------ END CODE -------------------

-------------------- ERROR --------------------------------
/tmp/ccAUGhHc.o: In function `main':
main.cpp:(.text+0x1b): undefined reference to `tesseract::TessBaseAPI::TessBaseAPI()'
main.cpp:(.text+0x75): undefined reference to `cv::imread(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)'
main.cpp:(.text+0x101): undefined reference to `tesseract::TessBaseAPI::SetImage(unsigned char const*, int, int, int, int)'
main.cpp:(.text+0x115): undefined reference to `tesseract::TessBaseAPI::Recognize(ETEXT_DESC*)'
main.cpp:(.text+0x124): undefined reference to `tesseract::TessBaseAPI::GetUTF8Text()'
main.cpp:(.text+0x15c): undefined reference to `tesseract::TessBaseAPI::~TessBaseAPI()'
main.cpp:(.text+0x1c8): undefined reference to `tesseract::TessBaseAPI::~TessBaseAPI()'
/tmp/ccAUGhHc.o: In function `tesseract::TessBaseAPI::Init(char const*, char const*, tesseract::OcrEngineMode)':
main.cpp:(.text._ZN9tesseract11TessBaseAPI4InitEPKcS2_NS_13OcrEngineModeE[tesseract::TessBaseAPI::Init(char const*, char const*, tesseract::OcrEngineMode)]+0x4f): undefined reference to `tesseract::TessBaseAPI::Init(char const*, char const*, tesseract::OcrEngineMode, char**, int, GenericVector<STRING> const*, GenericVector<STRING> const*, bool)'
/tmp/ccAUGhHc.o: In function `cv::Mat::release()':
main.cpp:(.text._ZN2cv3Mat7releaseEv[cv::Mat::release()]+0x4b): undefined reference to `cv::fastFree(void*)'
------------------------ END ERROR ----------------------------------------------------------------------


Mayur Mudigonda

Mar 29, 2012, 10:47:34 PM
to tesser...@googlegroups.com
You need to 

#include "tessbaseapi.h"

and make sure that, when you compile, the -I flag points to the tesseract include directories and the -L flag points to the directory with the compiled tesseract libraries. Read the documentation and the other posts for example code and more info!

Cheers,
M


--
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to tesser...@googlegroups.com
To unsubscribe from this group, send email to
tesseract-oc...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en



--

URL:
www.cse.msu.edu/~mudigon1
www.blindsight.com/team
Elegance is not a dispensable luxury but a factor that decides between success and failure.
Edsger Dijkstra

zdenko podobny

Mar 30, 2012, 2:23:43 AM
to tesser...@googlegroups.com
You did not mention how you are trying to compile the file (nor which version of tesseract you use). But I guess you forgot to link to the tesseract library (-ltesseract).

Zdenko
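
A note on why the program compiles but fails at the link step: the headers (baseapi.h, allheaders.h) only declare the classes, while the definitions live in the compiled libraries, which is what -ltesseract supplies. The log above also lists cv::imread and cv::fastFree, so the OpenCV libraries appear to be missing from the link line as well. The sketch below imitates the situation in a single file; demo::Engine and run_demo are hypothetical names and are not part of the Tesseract API.

```cpp
#include <string>

namespace demo {

// Plays the role of a header such as baseapi.h: declarations only.
// This is all the compiler needs in order to accept code that uses Engine.
class Engine {
 public:
  Engine();
  std::string Text() const;
};

// Plays the role of the compiled library: the definitions.
// If this part lived in a separate library that was not passed to the linker,
// the build would stop with "undefined reference to `demo::Engine::Engine()'".
Engine::Engine() {}
std::string Engine::Text() const { return "recognized text"; }

}  // namespace demo

// Uses the class the same way a client program would.
std::string run_demo() {
  demo::Engine engine;
  return engine.Text();
}
```

With both parts present, run_demo() returns the string from Text(); take the definitions away and the same code still compiles, but no longer links.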

TP

Mar 30, 2012, 9:07:41 AM
to tesser...@googlegroups.com

And using leptonica's pixRead() function to read images is easier (see the source of tesseractmain.cpp for an example [1]).

[1] http://code.google.com/p/tesseract-ocr/source/browse/trunk/api/tesseractmain.cpp#148

Pavel Mazniker

Apr 27, 2012, 9:01:16 AM
to tesser...@googlegroups.com

Hi,

I have a problem linking to tesseract in a Qt C++ project on Windows.

I've built the libraries (together with the ones already in tesseract-3.01-win_vs -> vs2008):
 
 
04/27/2012  08:47 AM         5,124,542 ccmain.lib
04/27/2012  08:48 AM         3,192,960 ccstruct.lib
04/27/2012  08:47 AM         1,364,840 ccutil.lib
04/27/2012  08:48 AM         3,061,008 classify.lib
04/27/2012  08:48 AM         5,364,110 cube.lib
04/27/2012  08:48 AM           586,500 cutil.lib
04/27/2012  08:48 AM         1,671,500 dict.lib
10/21/2011  07:23 PM           221,526 giflib-static-mtdll-debug.lib
10/21/2011  07:23 PM            70,450 giflib-static-mtdll.lib
04/27/2012  08:48 AM           179,362 image.lib
10/21/2011  07:23 PM         1,583,122 libjpeg-static-mtdll-debug.lib
10/21/2011  07:23 PM           363,212 libjpeg-static-mtdll.lib
10/21/2011  07:23 PM         7,522,574 liblept-static-mtdll-debug.lib
10/21/2011  07:23 PM         2,519,300 liblept-static-mtdll.lib
10/21/2011  07:23 PM         1,672,192 liblept.dll
10/21/2011  07:23 PM           444,944 liblept.lib
10/21/2011  07:23 PM         1,672,192 liblept168.dll
10/21/2011  07:23 PM         3,326,464 liblept168d.dll
10/21/2011  07:23 PM         3,326,464 libleptd.dll
10/21/2011  07:23 PM           446,954 libleptd.lib
10/21/2011  07:23 PM         1,003,572 libpng-static-mtdll-debug.lib
10/21/2011  07:23 PM           331,028 libpng-static-mtdll.lib
04/27/2012  08:47 AM            11,906 libtesseract_tessopt.lib
04/27/2012  08:47 AM           143,476 libtesseract_training.lib
10/21/2011  07:23 PM         3,800,010 libtiff-static-mtdll-debug.lib
10/21/2011  07:23 PM         1,777,404 libtiff-static-mtdll.lib
04/27/2012  08:47 AM           896,898 neural_networks.lib
04/27/2012  08:48 AM         7,634,678 textord.lib
04/27/2012  08:48 AM         1,459,832 viewer.lib
04/27/2012  08:48 AM         2,637,262 wordrec.lib
10/21/2011  07:23 PM           199,940 zlib-static-mtdll.lib
10/21/2011  07:23 PM           452,728 zlibd-static-mtdll-debug.lib
 
using VS2010 on a project that is in VS2008 format.

But when I link to the libraries from the QTesseract .pro file on Windows:
 
#tesseract includes:
INCLUDEPATH += C:\Users\Paul01\OCR\OCRProject\includes \
         += C:\Users\Paul01\OCR\OCRProject\includes\leptonica
#LIBS += -L"C:\Users\Paul01\OCR\OCRProject\libs" -ltesseract
#LIBS += -L"C:\Users\Paul01\OCR\OCRProject\libs" -llibtiff
LIBS += C:\Users\Paul01\OCR\OCRProject\libs\*.dll
#tesseract libs:
#LIBS += C:\Users\Paul01\OCR\OCRProject\libs\*.lib
#LIBS += C:\Users\Paul01\OCR\OCRProject\libs\*.dll
#LIBS += -L$$PWD/../../libs/ -lccmain
##PRE_TARGETDEPS += $$PWD/../../libs/ccmain.lib
#LIBS += -L$$PWD/../../libs/ -lccstruct
##PRE_TARGETDEPS += $$PWD/../../libs/ccstruct.lib
#LIBS += -L$$PWD/../../libs/ -lccutil
##PRE_TARGETDEPS += $$PWD/../../libs/ccutil.lib
#LIBS += -L$$PWD/../../libs/ -lclassify
##PRE_TARGETDEPS += $$PWD/../../libs/classify.lib
#LIBS += -L$$PWD/../../libs/ -lcube
##PRE_TARGETDEPS += $$PWD/../../libs/cube.lib
#LIBS += -L$$PWD/../../libs/ -lcutil
##PRE_TARGETDEPS += $$PWD/../../libs/cutil.lib
#LIBS += -L$$PWD/../../libs/ -ldict
##PRE_TARGETDEPS += $$PWD/../../libs/dict.lib
#LIBS += -L$$PWD/../../libs/ -lgiflib-static-mtdll
##PRE_TARGETDEPS += $$PWD/../../libs/giflib-static-mtdll.lib
#LIBS += -L$$PWD/../../libs/ -lgiflib-static-mtdll
##PRE_TARGETDEPS += $$PWD/../../libs/giflib-static-mtdll.lib
#LIBS += -L$$PWD/../../libs/ -llibjpeg-static-mtdll
##PRE_TARGETDEPS += $$PWD/../../libs/libjpeg-static-mtdll.lib
#LIBS += -L$$PWD/../../libs/ -limage
##PRE_TARGETDEPS += $$PWD/../../libs/image.lib
#LIBS += -L$$PWD/../../libs/ -lliblept
##PRE_TARGETDEPS += $$PWD/../../libs/liblept.lib
#LIBS += -L$$PWD/../../libs/ -lliblept-static-mtdll
##PRE_TARGETDEPS += $$PWD/../../libs/liblept-static-mtdll.lib
#LIBS += -L$$PWD/../../libs/ -llibtesseract_tessopt
##PRE_TARGETDEPS += $$PWD/../../libs/libtesseract_tessopt.lib
#LIBS += -L$$PWD/../../libs/ -llibtesseract_training
##PRE_TARGETDEPS += $$PWD/../../libs/libtesseract_training.lib
#LIBS += -L$$PWD/../../libs/ -lneural_networks
##PRE_TARGETDEPS += $$PWD/../../libs/neural_networks.lib
#LIBS += -L$$PWD/../../libs/ -ltextord
##PRE_TARGETDEPS += $$PWD/../../libs/textord.lib
#LIBS += -L$$PWD/../../libs/ -lviewer
##PRE_TARGETDEPS += $$PWD/../../libs/viewer.lib
#LIBS += -L$$PWD/../../libs/ -lwordrec
#LIBS += -L$$PWD/../../libs/ -ltesseract
#PRE_TARGETDEPS += $$PWD/../../libs/wordrec.lib
 
 
  I get linking problems:
 
 ./release\myqmainwindow.o:myqmainwindow.cpp:(.text$_ZN12ReaderThreadD1Ev[ReaderThread::~ReaderThread()]+0xad): undefined reference to `tesseract::TessBaseAPI::~TessBaseAPI()'
./release\myqmainwindow.o:myqmainwindow.cpp:(.text$_ZN12ReaderThreadD1Ev[ReaderThread::~ReaderThread()]+0x166): undefined reference to `tesseract::TessBaseAPI::~TessBaseAPI()'
./release\tesstext.o:tesstext.cpp:(.text+0x4ab): undefined reference to `WERD::bounding_box()'
./release\tesstext.o:tesstext.cpp:(.text+0xb7c): undefined reference to `STRING::string() const'
./release\tesstext.o:tesstext.cpp:(.text+0x166b): undefined reference to `WERD::bounding_box()'
./release\tesstext.o:tesstext.cpp:(.text+0x16d1): undefined reference to `STRING::length() const'
./release\tesstext.o:tesstext.cpp:(.text+0x16dd): undefined reference to `STRING::length() const'
./release\tesstext.o:tesstext.cpp:(.text+0x16f4): undefined reference to `STRING::string() const'
./release\tesstext.o:tesstext.cpp:(.text+0x1fc7): undefined reference to `ELIST::length() const'
./release\readerthread.o:readerthread.cpp:(.text+0x2336): undefined reference to `tesseract::TessBaseAPI::TessBaseAPI()'
./release\readerthread.o:readerthread.cpp:(.text+0x2557): undefined reference to `tesseract::TessBaseAPI::~TessBaseAPI()'
./release\readerthread.o:readerthread.cpp:(.text+0x264a): undefined reference to `tesseract::TessBaseAPI::TessBaseAPI()'
./release\readerthread.o:readerthread.cpp:(.text+0x286b): undefined reference to `tesseract::TessBaseAPI::~TessBaseAPI()'
./release\readerthread.o:readerthread.cpp:(.text+0x2969): undefined reference to `UNICHARSET::UNICHARSET()'
./release\readerthread.o:readerthread.cpp:(.text+0x2d35): undefined reference to `ELIST_ITERATOR::forward()'
./release\readerthread.o:readerthread.cpp:(.text+0x2d89): undefined reference to `UNICHARSET::id_to_unichar(int) const'
./release\readerthread.o:readerthread.cpp:(.text+0x2fb7): undefined reference to `C_BLOB::bounding_box()'
./release\readerthread.o:readerthread.cpp:(.text+0x3537): undefined reference to `ELIST_ITERATOR::forward()'
./release\readerthread.o:readerthread.cpp:(.text+0x354b): undefined reference to `CLIST_ITERATOR::forward()'
./release\readerthread.o:readerthread.cpp:(.text+0x3592): undefined reference to `ERRCODE::error(char const*, signed char, char const*, ...) const'
./release\readerthread.o:readerthread.cpp:(.text+0x35e3): undefined reference to `UNICHARSET::~UNICHARSET()'
./release\readerthread.o:readerthread.cpp:(.text+0x3613): undefined reference to `ERRCODE::error(char const*, signed char, char const*, ...) const'
./release\readerthread.o:readerthread.cpp:(.text+0x3647): undefined reference to `ERRCODE::error(char const*, signed char, char const*, ...) const'
./release\readerthread.o:readerthread.cpp:(.text+0x371f): undefined reference to `ERRCODE::error(char const*, signed char, char const*, ...) const'
./release\readerthread.o:readerthread.cpp:(.text+0x3760): undefined reference to `ERRCODE::error(char const*, signed char, char const*, ...) const'
./release\readerthread.o:readerthread.cpp:(.text+0x37b0): undefined reference to `ERRCODE::error(char const*, signed char, char const*, ...) const'
./release\readerthread.o:readerthread.cpp:(.text+0x37e2): more undefined references to `ERRCODE::error(char const*, signed char, char const*, ...) const' follow
./release\readerthread.o:readerthread.cpp:(.text+0x38c0): undefined reference to `UNICHARSET::~UNICHARSET()'
./release\readerthread.o:readerthread.cpp:(.text+0x3b2e): undefined reference to `ERRCODE::error(char const*, signed char, char const*, ...) const'
./release\readerthread.o:readerthread.cpp:(.text+0x3b6c): undefined reference to `ERRCODE::error(char const*, signed char, char const*, ...) const'
./release\readerthread.o:readerthread.cpp:(.text+0x3baa): undefined reference to `ERRCODE::error(char const*, signed char, char const*, ...) const'
./release\readerthread.o:readerthread.cpp:(.text+0x48c1): undefined reference to `WERD::bounding_box()'
./release\readerthread.o:readerthread.cpp:(.text+0x4c06): undefined reference to `STRING::length() const'
./release\readerthread.o:readerthread.cpp:(.text+0x4c12): undefined reference to `STRING::length() const'
./release\readerthread.o:readerthread.cpp:(.text+0x4c2c): undefined reference to `STRING::string() const'
./release\readerthread.o:readerthread.cpp:(.text+0x5215): undefined reference to `PAGE_RES_IT::start_page(bool)'
./release\readerthread.o:readerthread.cpp:(.text+0x522b): undefined reference to `PAGE_RES_IT::start_page(bool)'
./release\readerthread.o:readerthread.cpp:(.text+0x52e9): undefined reference to `PAGE_RES_IT::internal_forward(bool, bool)'
./release\readerthread.o:readerthread.cpp:(.text+0x5479): undefined reference to `STRING::length() const'
./release\readerthread.o:readerthread.cpp:(.text+0x548b): undefined reference to `STRING::length() const'
./release\readerthread.o:readerthread.cpp:(.text+0x54aa): undefined reference to `STRING::string() const'
./release\readerthread.o:readerthread.cpp:(.text+0x56b6): undefined reference to `tesseract::TessBaseAPI::SetImage(Pix const*)'
./release\readerthread.o:readerthread.cpp:(.text+0x57ce): undefined reference to `tesseract::TessBaseAPI::Recognize(ETEXT_DESC*)'
./release\readerthread.o:readerthread.cpp:(.text+0x589f): undefined reference to `PAGE_RES_IT::start_page(bool)'
./release\readerthread.o:readerthread.cpp:(.text+0x58ca): undefined reference to `PAGE_RES_IT::start_page(bool)'
./release\readerthread.o:readerthread.cpp:(.text+0x5bab): undefined reference to `STRING::length() const'
./release\readerthread.o:readerthread.cpp:(.text+0x5bc9): undefined reference to `STRING::length() const'
./release\readerthread.o:readerthread.cpp:(.text+0x5bf5): undefined reference to `STRING::string() const'
./release\readerthread.o:readerthread.cpp:(.text+0x5d3f): undefined reference to `WERD::bounding_box()'
./release\readerthread.o:readerthread.cpp:(.text+0x5f18): undefined reference to `PAGE_RES_IT::internal_forward(bool, bool)'
./release\readerthread.o:readerthread.cpp:(.text+0x5f88): undefined reference to `STRING::length() const'
./release\readerthread.o:readerthread.cpp:(.text+0x5faf): undefined reference to `STRING::length() const'
./release\readerthread.o:readerthread.cpp:(.text+0x5fe4): undefined reference to `STRING::string() const'
./release\readerthread.o:readerthread.cpp:(.text+0x6143): undefined reference to `tesseract::TessBaseAPI::GetUTF8Text()'
./release\readerthread.o:readerthread.cpp:(.text+0x6b65): undefined reference to `tesseract::TessBaseAPI::SetVariable(char const*, char const*)'
./release\readerthread.o:readerthread.cpp:(.text+0x6bbf): undefined reference to `tesseract::TessBaseAPI::Init(char const*, char const*, tesseract::OcrEngineMode, char**, int, GenericVector<STRING> const*, GenericVector<STRING> const*, bool)'
./release\readerthread.o:readerthread.cpp:(.text+0x6ca4): undefined reference to `tesseract::TessBaseAPI::SetPageSegMode(tesseract::PageSegMode)'
./release\readerthread.o:readerthread.cpp:(.text+0x6ce6): undefined reference to `tesseract::TessBaseAPI::End()'
./release\moc_readerthread.o:moc_readerthread.cpp:(.text$_ZN12ReaderThreadD0Ev[ReaderThread::~ReaderThread()]+0xad): undefined reference to `tesseract::TessBaseAPI::~TessBaseAPI()'
./release\moc_readerthread.o:moc_readerthread.cpp:(.text$_ZN12ReaderThreadD0Ev[ReaderThread::~ReaderThread()]+0x17c): undefined reference to `tesseract::TessBaseAPI::~TessBaseAPI()'
./release\moc_readerthread.o:moc_readerthread.cpp:(.rdata$_ZTV12ReaderThread[vtable for ReaderThread]+0x4c): undefined reference to `tesseract::TessBaseAPI::Threshold(Pix**)'
collect2: ld returned 1 exit status
mingw32-make.exe[1]: *** [release\ocr01.exe] Error 1
mingw32-make.exe: *** [release] Error 2
 
My question is: which .dll or .lib files are needed, and how do I link them, in order to use the tesseract API in a C++ project on Windows? And how do I build QTesseract on Windows? Thanks in advance for your help.

zdenko podobny

Apr 27, 2012, 10:16:25 AM
to tesser...@googlegroups.com
1. tesseract-ocr 3.01 does not (officially) support creating a DLL. As far as I remember, only static linking was successful. Use 3.02 (in svn).
2. Do not mix things. First try to create a program that uses the tesseract library without Qt, so that you know how to use tesseract.

--
Zdenko

Pavel Mazniker

Apr 27, 2012, 12:54:56 PM
to tesser...@googlegroups.com
 
Hi,
 
You did great work with tesseract! Thanks!

I succeeded in linking it to the Qt project using one of the tesseract projects posted on GitHub (tesseract using MinGW). Is it complete (does that project also include training)?

Now I am trying to improve the performance and accuracy, for example when recognizing text in complex patterns such as CAPTCHAs.

Does anybody have experience with using tesseract on CAPTCHAs or on road signs, for example?

I would like tesseract to be buildable as a single library (.dll or .lib). Is anybody working on such a feature?
 
Thank you.
 

 

Pavel Mazniker

Apr 27, 2012, 2:37:45 PM
to tesser...@googlegroups.com
> 1. tesseract-ocr 3.01 does not (officially) support creating a DLL. As far as I remember, only static linking was successful. Use 3.02 (in svn).

Which libs exactly should I build in order to compile and link with the MinGW build system? I previously built with the Visual Studio compiler and got the libs listed above, but failed to link them using the MinGW build system.
Thanks.

zdenko podobny

Apr 27, 2012, 4:06:22 PM
to tesser...@googlegroups.com
Check 3.02 in svn.

--
Zdenko

zdenko podobny

Apr 28, 2012, 2:26:27 AM
to tesser...@googlegroups.com
On Fri, Apr 27, 2012 at 8:37 PM, Pavel Mazniker <pmaz...@gmail.com> wrote:
>> 1. tesseract-ocr 3.01 does not (officially) support creating a DLL. As far as I remember, only static linking was successful. Use 3.02 (in svn).
>
> What libs exactly should I create for linking and compiling on MinGW build system ?

I do not understand what you mean. Just compile tesseract-3.02 in a mingw+msys environment and it will create everything you need.

> I built previously using visual studio compiler and got above libs but failed to link using mingw build system.

As far as I understand, you did not build version 3.02 (which creates just one lib) but 3.01. And using libs created by different compilers is tricky and not recommended.

--
Zdenko

Pavel Mazniker

Apr 29, 2012, 2:59:13 AM
to tesser...@googlegroups.com
 
Hi,
 
I am trying to build 3.02 using MinGW + MSYS.

I checked out svn r724. Is that right?

Then, in the MSYS terminal, I ran this in the root directory of the checkout:

./autogen.sh

and got:

"
Running aclocal
./autogen.sh : line50 : aclocal : command not found

Something went wrong, bailing out!

"

Why does this happen?
 
Thanks.

 

Pavel Mazniker

Apr 29, 2012, 3:51:59 AM
to tesser...@googlegroups.com
Hi,
 
> Just compile tesseract-3.02 in mingw+msys environment and it will create everything you need.

I am trying to build it.

I checked out r724. Is that the right revision for version 3.02?

When I run ./configure in the r724 root folder from the MinGW+MSYS terminal, I get the following:

checking for pixCreate in -llept... no
configure : error : leptonica library missing

How do I resolve the leptonica dependency on Windows?
 
Thanks

Pavel Mazniker

Apr 29, 2012, 4:55:02 AM
to tesser...@googlegroups.com

Hi,

 

I checked out the full r724 from the repository.

When I run configure in the MinGW+MSYS terminal on Windows, I get:

 "checking for pixCreate in -llept... no
  configure: leptonica library missing"

even though I copied the leptonica .dll and .lib files to the Windows/System directory.

Can anybody help me resolve this problem so that I can configure and build the library on Windows?
 
Thanks.

zdenko podobny

Apr 29, 2012, 11:24:22 AM
to tesser...@googlegroups.com
On Sun, Apr 29, 2012 at 8:59 AM, Pavel Mazniker <pmaz...@gmail.com> wrote:
 
> Hi,
>
> I am trying to build 3.02 using MinGW + MSYS.
>
> I checked out svn r724. Is that right?

Yes.

> Then, in the MSYS terminal, I ran this in the root directory of the checkout:
>
> ./autogen.sh
>
> and got:
>
> "
> Running aclocal
> ./autogen.sh : line50 : aclocal : command not found
>
> Something went wrong, bailing out!
>
> "
>
> Why does this happen?

Because it cannot find aclocal? It is part of automake, so automake is probably missing from your MSYS installation.

> Thanks.

 

--
Zdenko

zdenko podobny

Apr 29, 2012, 11:26:31 AM
to tesser...@googlegroups.com
The same way as tesseract: compile it from source. As far as I know, there is no (public) leptonica build made with MinGW.


--
Zdenko
Message has been deleted

Robert Komar

Apr 18, 2013, 1:08:58 PM
to tesser...@googlegroups.com
On Thu, 18 Apr 2013, TedJ wrote:

> Could someone please post a direct link to this mysterious
> r724 repository? I've searched and searched the Github
> site and I can't find it.
> Thanks,
> Ted

That's a revision in the main repository. When you
check the source out using svn, add "-r 724" to the
arguments and you'll get that revision rather than the
latest.

Rob Komar

TP

Apr 20, 2013, 8:36:17 AM
to tesseract-ocr
On Fri, Apr 19, 2013 at 1:44 PM, TedJ <ted...@gmail.com> wrote:
> Thanks for your reply.  I was hoping that I'd be able to find a regular browser page with the download ZIP file option.  Could you post all the required login info, please?  Ugh, I never have good luck using svn clients.  I just tried to download 'Setup-Subversion-1.7.9.msi' for Windows32.  It promptly reported that it couldn't be loaded.  Can you recommend an svn client that actually works on Windows or as a plug-in for Eclipse or as a simple binary that works on the Windows command line?
>
> I've got the "undefined reference to `tesseract::TessBaseAPI::TessBaseAPI()'" blues too.  I'm trying to build a tesseract 3.02.02 c++ project in Eclipse and at the Win command line and that's as far as I get.  BTW, tessbaseapi.h is nowhere to be found in any of my 3.02.02 install (or 3.01 or tesseract-android-tools for that matter).  But baseapi.h is present.  My include and library dirs are from "win32-lib-include-dirs" from a ZIP download.  Compiles, but linking fails with that error.

Have you looked at my "Using the latest Tesseract-OCR sources" page [1] that explains how to use TortoiseSVN to get the latest sources?

"Programming with libtesseract" [2] discusses how to use libtesseract with VS2008 but some of the information applies to any compiler. baseapi.h and leptonica's allheaders.h are the headers you need to include.

I wrote this over a year ago but I believe most of it still applies.

You might want to look at [3], which contains a very short tesseract sample app (essentially the same as tesseract.exe).

---

[1] http://tesseract-ocr.googlecode.com/svn/trunk/vs2008/doc/setup.html#using-the-latest-tesseractocr-sources

[2] http://tesseract-ocr.googlecode.com/svn/trunk/vs2008/doc/programming.html

[3] http://code.google.com/p/tesseract-ocr/downloads/detail?name=tesseract-ocr-API-Example-vs2008.zip

TP

Apr 21, 2013, 9:10:15 AM
to tesseract-ocr
On Sat, Apr 20, 2013 at 2:06 PM, TedJ <ted...@gmail.com> wrote:
>> Have you looked at my "Using the latest Tesseract-OCR sources" page [1] that explains how to use TortoiseSVN to get the latest sources?
>
> I tried installing Tortoise too.  Couldn't install it either.

Why couldn't you install TortoiseSVN? I found it a pretty straightforward process (i.e. trivial).

> I have only downloaded and unzipped the latest 3.02 ZIP contributions from Google's Tesseract-OCR download site into individual directories (preserving their directory structure of course).  Isn't that sufficient?  So svn is the base directory of the repository at Github that you were referring to.

Well, apparently there are some important new changes that haven't been included in *any* zip file. Thus zdenko's advice to get it from the svn repository.

Github? I'm pretty sure everything relevant is on googlecode.com.

>> "Programming with libtesseract" [2] discusses how to use libtesseract with VS2008 but some of the information applies to any compiler. baseapi.h and leptonica's allheaders.h are the headers you need to include.
>
> They're already included.  Guess that's why it compiles (but won't link).  So tessbaseapi.h was from an earlier version?

Yes.

> I don't have Visual Studio and I don't want to buy it.  Thus, all the vcproj files are evidently useless to me.

Microsoft as usual is trying to hide how to download the earlier free visual studio versions. However, stackoverflow does have a page [1] that discusses how to still get it. [2] is the link they give for VS2008 Express SP1.

[1] http://stackoverflow.com/questions/4482159/where-can-i-download-visual-studio-express-2008-not-2010

[2] http://download.microsoft.com/download/E/8/E/E8EEB394-7F42-4963-A2D8-29559B738298/VS2008ExpressWithSP1ENUX1504728.iso

Since you seem to be having so much trouble, I would suggest first getting things to work with Visual Studio 2008 Express. Then once you know how things are *supposed* to work (including looking at the various compiler and linker settings), you can try other build systems. I suppose it would be nice to have some detailed docs on how to build with MinGW but I don't have enough experience to write that.

 
> I've been trying to use MinGW GCC as my current toolchain and CDT Internal Builder as my current builder within Eclipse. In a batch file, it should be either:
>
> set ti=<installdir>\tesseract-3.02.02-win32-lib-include-dirs\include\tesseract
> set li=<installdir>\leptonica-1.68-win32-lib-include-dirs\include\leptonica
> g++ -c -I%li% -I%ti% src\MinAreaRects.cpp
>
> or...
>
> set ti=<installdir>\tesseract-ocr-3.02.02-src\Tesseract-OCR\include\tesseract
> set li=<installdir>\leptonica-1.68-win32-lib-include-dirs\include\leptonica
> g++ -c -I%li% -I%ti%\ccmain -I%ti%\ccutil -I%ti%\api -I%ti%\ccstruct src\MinAreaRects.cpp
>
> i.e. - I assume that when compiling with the 'tesseract-3.02.02-win32-lib-include-dirs' then only one path entry is needed: '\include\tesseract'.
> And when compiling with 'tesseract-ocr-3.02.02-src', then the relevant subdirectories (like \Tesseract-OCR\api) DO need to be added to the compiler's include path.
>
> Since I don't have VS2008, I can't attempt to use the Microsoft Visual C++ toolchain (having no 'cl.exe').
>
>> You might want to look at [3], which contains a very short tesseract sample app (essentially the same as tesseract.exe).
>
> I assume that references to using 'libtesseract' (at least in terms of ver 3.02) really means linking 'libtesseract302.dll' or 'libtesseract302d.dll' from the downloaded lib directory.

Yes. Alternatively, with Windows Vista & above you can create symbolic links and name them anything you want using the mklink command [3]. For Windows XP you can make hardlinks (which almost act like symbolic links but not quite) using the fsutil command.

 
> Is the d version for debugging?

Yes.

> So, the library entry for the linker should be 'tesseract302' or 'tesseract302d' (dropping the lib prefix and the .dll suffix).  Just wanna be sure I've dotted all my i's.  In my code snippet below, the variable declaration works (indicating a good compile).  But the instantiation just below it never does (indicating a bad link).
>
> TessBaseAPI *tessa;
> tessa = new TessBaseAPI();
>
> Correct me if I'm wrong about any of this.  Thanks again.

Someone else will have to answer the MinGW specific questions. Sorry.
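
The naming rule described above (drop the lib prefix and the .dll suffix to get the name passed to -l) can be written down as a small helper. This is only a sketch of the convention as stated in the post; linker_name is a hypothetical helper, not part of any toolchain, and it says nothing about whether a library built with one compiler will actually link with another.

```cpp
#include <string>

// Hypothetical helper: derives the name that would follow -l on a GCC/MinGW
// link line from a library file name, by dropping the extension and a
// leading "lib" prefix.
std::string linker_name(const std::string& file_name) {
  std::string name = file_name;
  const std::string::size_type dot = name.rfind('.');
  if (dot != std::string::npos) {
    name.erase(dot);  // "libtesseract302.dll" -> "libtesseract302"
  }
  if (name.compare(0, 3, "lib") == 0) {
    name.erase(0, 3);  // "libtesseract302" -> "tesseract302"
  }
  return name;
}
```

With the file names from the thread, linker_name("libtesseract302.dll") gives "tesseract302" and linker_name("libtesseract302d.dll") gives "tesseract302d", i.e. -ltesseract302 for the release build and -ltesseract302d for the debug build.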


TP

Apr 22, 2013, 9:42:44 AM
to tesseract-ocr

On Sun, Apr 21, 2013 at 1:15 PM, TedJ <ted...@gmail.com> wrote:
> The following error has occurred during XML parsing:
>
> File: I:\Android\Tesseract\tesseract-3.02.02\tesseract-ocr-3.02-API-Example-vs2008\APIExample\baseapitester\baseapitester.vcproj
> Line: 27
> Column: 4
> Error Message:
> Property sheet file '..\..\include\tesseract_versionnumbers.vsprops' was not found or failed to load.
> The file
> 'I:\Android\Tesseract\tesseract-3.02.02\tesseract-ocr-3.02-API-Example-vs2008\APIExample\baseapitester\baseapitester.vcproj' has failed to load.
>
> tesseract_versionnumbers.vsprops doesn't exist in the APIExample directory or its subdir.  However, those exist in my:
> tesseract-3.02.02-win32-lib-include-dirs\include and tesseract-3.02.02-win32-lib-include-dirs-vs2008\include directories.  I copied the latter to the same directory as your sln file.  I ran VC again.  Same error.  I copied it to the baseapitester subdir instead.  Same error.  The Line referenced in the error message refers to the line in baseapitester.vcproj that reads:
> } else if (image == NULL) {
> which seems totally irrelevant to the error.


I don't have a very clear idea of how you structured your Build directory (this is *crucial* to success when using a VS solution that can be relocated since it has to do everything using *relative* file paths). However, one tip I can give you is that all the paths are relative to the location of the relevant .vcproj file (not the .sln file). So in this case, your include and lib directories *have* to be in the 'I:\Android\Tesseract\tesseract-3.02.02\tesseract-ocr-3.02-API-Example-vs2008\' folder.
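
To see where a relative path from the .vcproj ends up, it can be resolved by hand. The following is a sketch using C++17 std::filesystem; resolve_from_project is a hypothetical helper, and forward slashes are used instead of backslashes so that it behaves the same on every platform.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper: resolves a path written inside a project file
// relative to the directory that contains that project file.
std::string resolve_from_project(const std::string& project_file,
                                 const std::string& relative_path) {
  namespace fs = std::filesystem;
  const fs::path base = fs::path(project_file).parent_path();
  // lexically_normal() collapses the ".." components without touching the disk.
  return (base / relative_path).lexically_normal().generic_string();
}
```

For the project file "tesseract-ocr-3.02-API-Example-vs2008/APIExample/baseapitester/baseapitester.vcproj" and the relative path "../../include/tesseract_versionnumbers.vsprops" from the error message, this yields "tesseract-ocr-3.02-API-Example-vs2008/include/tesseract_versionnumbers.vsprops", which is why the include directory has to sit directly inside that folder.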

 
> Actually, I chose MinGW toolchain in Eclipse because OpenCV242\build\x86 had only MinGW and vc9 and vc10 directories to choose from for includes and libs.  Not having vc at the time, I thought it would offer the best chance of success.  So, the OpenCV part of my program (which I didn't mention earlier) DOES compile and link.
>
> I downloaded TortoiseSVN-1.7.11.23600-x64-svn-1.7.8.msi.  When I run it, the following dialog pops up:
>
> Windows Installer
> This installation package is not supported by this processor type.  Contact your product vendor.


In a previous post you mentioned that you are using a 32-bit version of Windows? In that case, you need the 32-bit version of TortoiseSVN *not* the 64 bit version which you are trying to use. And that error message sounds like what you would get if you tried to run a 64-bit exe on 32-bit Windows.

 
> I just have ALL the luck, don't I?  Still, your suggestion about recent code variations is worth exploring and I'll find some other client to download it with. Eclipse has a facility (or available plug-in) for linking to repositories.  Just gotta figure out the nitty gritty.  Do I need to register for a username and password at Google as part of repository access or will 'anonymous' do?  I just want to build a simple c++ exe, so I fail to see how the latest and greatest is going to help.  And point taken about Github/Google.  Thank you so much for the time you've spent in trying to help.  -Ted

Just follow my instructions at [1] and it should be trivial.

(Pretty sure anonymous works fine as long as you aren't trying to commit *back* to the googlecode repository. Also since you already have a gmail account, that can be used to login to googlecode.)

Most of the people who answer questions here (especially the most helpful like zdenko), are using the latest SVN version. And it makes it easier for them (reduces the number of possibilities to consider) if you are also using the latest.


zdenko podobny

Apr 22, 2013, 4:29:46 PM
to tesser...@googlegroups.com
On Sat, Apr 20, 2013 at 11:06 PM, TedJ <ted...@gmail.com> wrote:
>Have you looked at my "Using the latest Tesseract-OCR sources" page [1] that explains how to use TortoiseSVN to get the latest sources?

I tried installing Tortoise too.  Couldn't install it either.  I have only downloaded and unzipped the latest 3.02 ZIP contributions from Google's Tesseract-OCR download site into individual directories (preserving their directory structure of course).  Isn't that sufficient?  So svn is the base directory of the repository at GitHub that you were referring to?

>"Programming with libtesseract" [2] discusses how to use libtesseract with VS2008 but some of the information applies to any compiler. baseapi.h and leptonica's allheaders.h are the headers you need to include.

They're already included.  Guess that's why it compiles (but won't link).  So tessbaseapi.h was from an earlier version?  I don't have Visual Studio and I don't want to buy it.  Thus, all the vcproj files are evidently useless to me.  I've been trying to use MinGW GCC as my current toolchain and CDT Internal Builder as my current builder within Eclipse. In a batch file, it should be either:

If you plan to use MinGW, have a look at my test [1] with MinGW + MSYS (VC++ 2008 is the preferred and tested compiler for tesseract-ocr on Windows). I tried to make it a step-by-step tutorial. You will still need to find a way to integrate it into Eclipse, but I expect you are familiar with your toolchain.

 
set ti=<installdir>\tesseract-3.02.02-win32-lib-include-dirs\include\tesseract
set li=<installdir>\leptonica-1.68-win32-lib-include-dirs\include\leptonica
g++ -c -I%li% -I%ti% src\MinAreaRects.cpp

or...

set ti=<installdir>\tesseract-ocr-3.02.02-src\Tesseract-OCR\include\tesseract
set li=<installdir>\leptonica-1.68-win32-lib-include-dirs\include\leptonica
g++ -c -I%li% -I%ti%\ccmain -I%ti%\ccutil -I%ti%\api -I%ti%\ccstruct src\MinAreaRects.cpp

i.e. - I assume that when compiling with the 'tesseract-3.02.02-win32-lib-include-dirs' then only one path entry is needed: '\include\tesseract'.
And when compiling with 'tesseract-ocr-3.02.02-src', the relevant subdirectories (like \Tesseract-OCR\api) DO need to be added to the compiler's include path.

Since I don't have VS2008, I can't attempt to use the Microsoft Visual C++ toolchain (having no 'cl.exe').

>You might want to look at [3], which contains a very short tesseract sample app (essentially the same as tesseract.exe).

I assume that references to using 'libtesseract' (at least in terms of ver 3.02) really means linking 'libtesseract302.dll' or 'libtesseract302d.dll' from the downloaded lib directory.  Is the d version for debugging?  So, the library entry for the linker should be 'tesseract302' or 'tesseract302d' (dropping the lib prefix and the .dll suffix).  Just wanna be sure I've dotted all my i's.  In my code snippet below, the variable declaration works (indicating a good compile).  But the instantiation just below it never does (indicating a bad link).

TessBaseAPI *tessa;
tessa = new TessBaseAPI();

Correct me if I'm wrong about any of this.  Thanks again.
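
The -l naming rule described above can be sketched in code. This is only an illustration of the GNU ld convention (-lNAME makes the linker look in the -L directories for a file such as libNAME.dll or libNAME.a); linkerNameFor is a hypothetical helper, not part of Tesseract or MinGW. One further caveat, stated as an assumption about the downloaded package: if those DLLs were built with Visual C++, g++ may still report undefined references to C++ classes such as TessBaseAPI even with the right -l name, because the two compilers mangle C++ names differently.

```cpp
#include <string>

// Hypothetical helper: derive the name that goes after -l from a library
// file name, e.g. "libtesseract302.dll" -> "tesseract302".
std::string linkerNameFor(const std::string& fileName) {
    std::string name = fileName;

    // Drop the final extension (".dll", ".a", ...), if there is one.
    const std::string::size_type dot = name.rfind('.');
    if (dot != std::string::npos) {
        name.erase(dot);
    }

    // Drop the conventional "lib" prefix, if present.
    if (name.compare(0, 3, "lib") == 0) {
        name.erase(0, 3);
    }
    return name;
}
```

With that name, the link step would look something like g++ MinAreaRects.o -L<libdir> -ltesseract302, where <libdir> stands for wherever the library files were unzipped.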


TP

unread,
Apr 29, 2013, 3:29:20 AM4/29/13
to tesseract-ocr

On Sun, Apr 28, 2013 at 2:16 PM, TedJ <ted...@gmail.com> wrote:
But if anyone knows of another angle/translation/scale image correction approach (or code), I'd love to hear about it.  I.e. Image stabilization.

I would just use leptonica's pixRead() to read in an image, deskew with pixFindSkewAndDeskew [1] which can give you the deskew angle, and then pass the deskewed image to tesseract by using TessBaseAPI::SetImage(const Pix* pix). I would also binarize the image first so Tesseract won't have to do *any* image preprocessing and you know exactly what you are sending to it.

Examples of pixFindSkewAndDeskew use are in prog\textlinemask.c [2]. Also see leptonica's docs on "Measuring the Skew of Document Images" [3], "Image Rotation" [4], and possibly "Image Scaling" [5] and "Grayscale Mapping and Binarization" [6].


[1] http://tpgit.github.io/Leptonica/skew_8c_source.html#l00180

[2] http://tpgit.github.io/Leptonica/textlinemask_8c_source.html

[3] http://tpgit.github.io/UnOfficialLeptDocs/leptonica/skew-measurement.html

[4] http://tpgit.github.io/UnOfficialLeptDocs/leptonica/rotation.html

[5] http://tpgit.github.io/UnOfficialLeptDocs/leptonica/scaling.html

[6] http://tpgit.github.io/UnOfficialLeptDocs/leptonica/binarization.html
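
The steps described above (read, binarize, deskew, hand the result to Tesseract) could look roughly like the sketch below. It is a minimal sketch rather than a tested recipe: the header names follow the ones used elsewhere in this thread, the fixed threshold of 130, the redsearch value of 2 and the "eng" language are assumptions chosen for illustration, and it must be compiled and linked against your own Leptonica and Tesseract installation.

```cpp
#include <cstdio>
#include <allheaders.h>  // Leptonica (include path is an assumption)
#include <baseapi.h>     // Tesseract (include path is an assumption)

int main(int argc, char* argv[]) {
    if (argc < 2) {
        std::fprintf(stderr, "usage: %s <image>\n", argv[0]);
        return 1;
    }

    // Read the image with Leptonica.
    PIX* src = pixRead(argv[1]);
    if (!src) {
        std::fprintf(stderr, "Could not read %s\n", argv[1]);
        return 1;
    }

    // Binarize first so Tesseract receives exactly what we prepared.
    // 130 is an assumed fixed threshold; real images may need another value.
    PIX* bin = pixConvertTo1(src, 130);
    pixDestroy(&src);
    if (!bin) return 1;

    // Deskew; angle (degrees) and confidence are reported back.
    // redsearch = 2 is an assumed binary-search reduction factor.
    l_float32 angle = 0.0f, conf = 0.0f;
    PIX* deskewed = pixFindSkewAndDeskew(bin, 2, &angle, &conf);
    pixDestroy(&bin);
    if (!deskewed) return 1;
    std::printf("skew angle: %.2f, confidence: %.2f\n", angle, conf);

    // Pass the prepared image to Tesseract.
    tesseract::TessBaseAPI api;
    if (api.Init(NULL, "eng") != 0) {
        std::fprintf(stderr, "Could not initialize tesseract.\n");
        pixDestroy(&deskewed);
        return 1;
    }
    api.SetImage(deskewed);
    char* text = api.GetUTF8Text();
    if (text) {
        std::printf("%s", text);
        delete[] text;  // GetUTF8Text() returns memory allocated with new[]
    }
    api.End();
    pixDestroy(&deskewed);
    return 0;
}
```

The reported angle is what you would use for your own image-correction step; the confidence value gives some indication of whether the angle can be trusted.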

ajmal azeez

unread,
Jul 1, 2013, 1:33:28 PM7/1/13
to tesser...@googlegroups.com
Where can I find tessbaseapi.h?

zdenko podobny

unread,
Jul 2, 2013, 2:54:43 AM7/2/13
to tesser...@googlegroups.com
On Mon, Jul 1, 2013 at 7:33 PM, ajmal azeez <ajmalaz...@gmail.com> wrote:
Where can I find tessbaseapi.h?

I have never heard of it. Why do you think you need it?

--
Zdenko
 

ajmal azeez

unread,
Jul 2, 2013, 4:03:06 AM7/2/13
to tesser...@googlegroups.com
When I try to compile the code given below, I get the error message "test_simple.cpp:4:25: fatal error: tessbaseapi.h: No such file or directory
compilation terminated."

#include <baseapi.h>
#include <allheaders.h>
#include <sys/time.h>
#include <tessbaseapi.h>
#include <cstdio>
#include <cstdlib>

int main() {
    tesseract::TessBaseAPI *myOCR = new tesseract::TessBaseAPI();

    printf("Leptonica version: %s\n", getLeptonicaVersion());

    if (myOCR->Init(NULL, "eng")) {
        fprintf(stderr, "Could not initialize tesseract.\n");
        exit(1);
    }

    Pix *pix = pixRead("phototest.tif");
    myOCR->SetImage(pix);

    char *outText = myOCR->GetUTF8Text();
    printf("OCR output:\n\n");
    printf("%s", outText);  // print the text as data, not as a format string

    myOCR->Clear();
    myOCR->End();
    delete [] outText;
    delete myOCR;
    pixDestroy(&pix);
    return 0;
}

zdenko podobny

unread,
Jul 2, 2013, 12:53:22 PM7/2/13
to tesser...@googlegroups.com
And what will happen if you remove that header file?

Zdenko

ajmal azeez

unread,
Jul 3, 2013, 1:22:17 AM7/3/13
to tesser...@googlegroups.com
I get the following error:
ajmal@ajmal-Inspiron-5523:~/Desktop/project$ g++ sample.cpp phototest.tif
/usr/bin/ld:phototest.tif: file format not recognized; treating as linker script
/usr/bin/ld:phototest.tif:1: syntax error

collect2: ld returned 1 exit status

zdenko podobny

unread,
Jul 3, 2013, 2:34:11 AM7/3/13
to tesser...@googlegroups.com
On Wed, Jul 3, 2013 at 7:22 AM, ajmal azeez <ajmalaz...@gmail.com> wrote:
I get the following error:
ajmal@ajmal-Inspiron-5523:~/Desktop/project$ g++ sample.cpp phototest.tif
/usr/bin/ld:phototest.tif: file format not recognized; treating as linker script
/usr/bin/ld:phototest.tif:1: syntax error

collect2: ld returned 1 exit status

Do you have any clue what you are doing? ;-)
I think you should take some time to study...

Zdenko

ajmal azeez

unread,
Jul 4, 2013, 2:09:33 PM7/4/13
to tesser...@googlegroups.com
Well, you are right. I am a beginner with Tesseract. So could you please tell me where I should start? :D




zdenko podobny

unread,
Jul 5, 2013, 2:40:58 AM7/5/13
to tesser...@googlegroups.com
Google for "learn c++", "how to compile c++", etc.

Zdenko

ajmal azeez

unread,
Jul 5, 2013, 7:54:31 AM7/5/13
to tesser...@googlegroups.com
I know C++ well. I want to learn Tesseract.
I would like to know whether there is any code like Tesseract to get images from a picture.

Nick White

unread,
Jul 5, 2013, 9:54:03 AM7/5/13
to tesser...@googlegroups.com
If you know c++ well why did you ask the compiler to compile a TIFF
image?

You can find code samples on the wiki, in this mailing list, and by
searching the web. But reading baseapi.cpp and baseapi.h should be
plenty of information for you.

Nick

ajmal azeez

unread,
Jul 5, 2013, 10:50:17 AM7/5/13
to tesser...@googlegroups.com
I am using Linux now and am new to it, so I didn't know how to give input using the terminal. I am familiar only with Turbo C.

Thanks. I'll go through baseapi.
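
On the question of giving input from the terminal: the image is not passed to g++. It is passed to the compiled program when it runs, and the program reads it from argv. The following is a minimal sketch using only the standard library; imagePathFromArgs is a hypothetical helper, and the build line in the comment assumes the libraries are installed under the usual Linux names (-ltesseract -llept).

```cpp
#include <string>

// Assumed build and run commands (library names may differ on your system):
//   g++ sample.cpp -o sample -ltesseract -llept
//   ./sample phototest.tif
//
// Hypothetical helper: returns the image path given on the command line,
// or an empty string when none was supplied.
std::string imagePathFromArgs(int argc, const char* const argv[]) {
    // argv[0] is the program name; argv[1] is the first user argument.
    if (argc < 2 || argv[1] == nullptr) {
        return std::string();
    }
    return std::string(argv[1]);
}
```

Inside main(int argc, char* argv[]), the returned path would then go to pixRead(path.c_str()) in place of the hard-coded "phototest.tif".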

Mahesh Sherkar

unread,
May 14, 2014, 6:15:11 AM5/14/14
to tesser...@googlegroups.com, ghs...@gmail.com
I am facing the same problem. Please let me know how to solve it.