Using Tesseract with Embarcadero RAD Studio 10 C++Builder


Matthias Schneider

Dec 9, 2015, 10:22:51 AM
to tesseract-ocr
Hi all,

I'm currently trying to get Tesseract working with Embarcadero RAD Studio 10 C++Builder.
Using IMPLIB, I was able to create an import library and call the Tesseract C-API functions.
My problem is that I need to get all recognized words and their coordinates. TessBaseAPIGetUTF8Text returns a string with the complete text, while TessBaseAPIGetWords returns an array of Pix.
Unfortunately, the Pix::text member is always empty and only the coordinates are set. Am I doing something wrong, or is there another way to get all words and their coordinates using only the C-API?

thanks,
Matthias

Tom Morris

Dec 10, 2015, 12:25:30 PM
to tesseract-ocr
I don't see what the IDE has to do with anything.

Matthias Schneider

Dec 10, 2015, 12:56:45 PM
to tesseract-ocr
@Tom: no, it's not related to the IDE, but to the fact that I can only use the C-API.

I'm a beginner with Tesseract and it's not easy to get a full understanding of the API. Additionally, most of the samples I found use the 'baseapi'.

Just in case anyone is interested: I got it working using the following code:
 
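   TessResultIterator* it = TessBaseAPIGetIterator(hTesseract);
   if (it != 0) {
       // The page iterator is a view of the result iterator; do not delete it separately.
       TessPageIterator* pit = TessResultIteratorGetPageIterator(it);
       do {
           char* text = TessResultIteratorGetUTF8Text(it, RIL_WORD);
           int left, top, right, bottom;
           TessPageIteratorBoundingBox(pit, RIL_WORD, &left, &top, &right, &bottom);
           // ... use text and the coordinates here ...
           TessDeleteText(text);  // the caller owns the returned string
       } while (TessResultIteratorNext(it, RIL_WORD));
       TessResultIteratorDelete(it);
   }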

Now I'm able to get all the words and their coordinates, though I'm not sure if this is the best way to do it.

thanks,
Matthias