Using TessPDFRenderer in tesseract 3.05 in C++

Roger Jefferson

Jul 21, 2017, 2:57:25 AM
to tesseract-ocr
I want to use tesseract 3.05 to generate searchable PDF programmatically in C++. Here is my code:

#include <cstdio>
#include <cstdlib>
#include <leptonica/allheaders.h>
#include <tesseract/baseapi.h>
#include <tesseract/renderer.h>

int main(int argc, const char * argv[])
{
    tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
    // Initialize tesseract-ocr with English, without specifying tessdata path
    if (api->Init(NULL, "eng")) {
        fprintf(stderr, "Could not initialize tesseract.\n");
        exit(1);
    }
    Pix *image = pixRead("/Users/user1/pictures/page1.png");

    // Two-argument constructor: output base, then tessdata directory
    tesseract::TessResultRenderer* renderer = new tesseract::TessPDFRenderer("/Users/user1/Documents/", "/usr/local/share/tessdata");
    // The assertion below is raised inside this call
    api->ProcessPage(image, 0, "/Users/user1/Documents/page1_pdf", NULL, 0, renderer);
    api->End();

    pixDestroy(&image);
    delete renderer;

    return 0;
}


The problem is that every time I get to api->ProcessPage(), I get an assertion error:

size_used_ > 0:Error:Assert failed:in file ../ccutil/genericvector.h, line 696

Can anyone help? What's wrong? Is there a better way to generate PDF output?

Thanks in advance

ShreeDevi Kumar

Jul 21, 2017, 4:25:14 AM
to tesser...@googlegroups.com
Are you able to create PDFs using the command line?

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscribe@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/5e66b13b-5dce-4920-bbc8-dc16e201ef62%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Roger Jefferson

Jul 21, 2017, 7:41:34 AM
to tesseract-ocr
The command line works OK, but I need to integrate this, so I need to do it in C++.

ShreeDevi Kumar

Jul 21, 2017, 8:38:59 AM
to tesser...@googlegroups.com
Take a look at tesseractmain.cpp:

api->GetBoolVariable("tessedit_create_pdf", &b);
if (b) {
    bool textonly;
    api->GetBoolVariable("textonly_pdf", &textonly);
    renderers->push_back(new tesseract::TessPDFRenderer(
        outputbase, api->GetDatapath(), textonly));
}

Roger Jefferson

Jul 25, 2017, 4:25:36 AM
to tesseract-ocr
Actually, things just work if I use the other constructor.
This is the constructor that works:

TessPDFRenderer(const char* outputbase, const char* datadir, bool textonly)

The other one (the two-argument form used in my original code) failed.
Thanks for all the replies.