Memory leak in read_squished_dawg


Vojtěch Frič

Apr 7, 2017, 5:49:28 AM
to tesseract-ocr
I have compiled the basic example from the wiki:
https://github.com/tesseract-ocr/tesseract/wiki/APIExample

#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>

int main()
{
    char *outText;

    tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
    // Initialize tesseract-ocr with English, without specifying tessdata path
    if (api->Init(NULL, "eng")) {
        fprintf(stderr, "Could not initialize tesseract.\n");
        exit(1);
    }

    // Open input image with leptonica library
    Pix *image = pixRead("/usr/src/tesseract/testing/phototest.tif");
    api->SetImage(image);
    // Get OCR result
    outText = api->GetUTF8Text();
    printf("OCR output:\n%s", outText);

    // Destroy used object and release memory
    api->End();
    delete [] outText;
    pixDestroy(&image);
    delete api; // <-- added by me, as it is missing in the example

    return 0;
}




When I run valgrind on it, it reports a serious memory leak:

==18441== 18,635,728 bytes in 1 blocks are still reachable in loss record 29 of 29
==18441==    at 0x4C2CB3F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==18441==    by 0x5445978: tesseract::SquishedDawg::read_squished_dawg(_IO_FILE*, tesseract::DawgType, STRING const&, PermuterType, int) (in /usr/lib/libtesseract.so.3.0.4)
==18441==    by 0x5446C75: tesseract::DawgLoader::Load() (in /usr/lib/libtesseract.so.3.0.4)
==18441==    by 0x5446FD6: tesseract::DawgCache::GetSquishedDawg(STRING const&, char const*, tesseract::TessdataType, int) (in /usr/lib/libtesseract.so.3.0.4)
==18441==    by 0x544D7B5: tesseract::Dict::Load(tesseract::DawgCache*) (in /usr/lib/libtesseract.so.3.0.4)
==18441==    by 0x541199D: tesseract::Wordrec::program_editup(char const*, bool, bool) (in /usr/lib/libtesseract.so.3.0.4)
==18441==    by 0x5350D68: tesseract::Tesseract::init_tesseract_internal(char const*, char const*, char const*, tesseract::OcrEngineMode, char**, int, GenericVector<STRING> const*, GenericVector<STRING> const*, bool) (in /usr/lib/libtesseract.so.3.0.4)
==18441==    by 0x535184C: tesseract::Tesseract::init_tesseract(char const*, char const*, char const*, tesseract::OcrEngineMode, char**, int, GenericVector<STRING> const*, GenericVector<STRING> const*, bool) (in /usr/lib/libtesseract.so.3.0.4)
==18441==    by 0x5302247: tesseract::TessBaseAPI::Init(char const*, char const*, tesseract::OcrEngineMode, char**, int, GenericVector<STRING> const*, GenericVector<STRING> const*, bool) (in /usr/lib/libtesseract.so.3.0.4)
==18441==    by 0x108F26: tesseract::TessBaseAPI::Init(char const*, char const*) (baseapi.h:240)
==18441==    by 0x108DC4: main (main.cpp:10)



There are several more leaks related to this function, and some others related to leptonica, but nothing of this magnitude. I added the delete api; line, which is missing from the API example, but that changes nothing.

Is there really such a major leak in the lib, or am I using it incorrectly?
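
One thing worth noting about the trace above: valgrind classifies the block as "still reachable", and the allocation goes through tesseract::DawgCache::GetSquishedDawg. That combination is what one would expect if the dictionary is held by a process-wide cache that outlives the TessBaseAPI object, although I have not confirmed this against the 3.04 sources. The sketch below is only an illustration of that pattern: DictCache, GlobalCache, and Engine are hypothetical stand-ins, not Tesseract code.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical process-wide cache (illustration only, not Tesseract's DawgCache).
class DictCache {
public:
    // Returns the buffer cached for `lang`, allocating it on first use.
    const std::vector<char>* Get(const std::string& lang, std::size_t bytes) {
        auto it = entries_.find(lang);
        if (it == entries_.end()) {
            it = entries_.emplace(lang, std::vector<char>(bytes)).first;
        }
        return &it->second;  // std::map keeps element addresses stable
    }
    // Releases every cached buffer.
    void Clear() { entries_.clear(); }
    std::size_t Size() const { return entries_.size(); }

private:
    std::map<std::string, std::vector<char>> entries_;
};

// The cache object is heap-allocated and intentionally never deleted.
// A static pointer still refers to it at exit, which is the situation
// valgrind labels "still reachable" rather than "definitely lost".
DictCache& GlobalCache() {
    static DictCache* cache = new DictCache();
    return *cache;
}

// Hypothetical engine that borrows its dictionary from the global cache.
class Engine {
public:
    void Init(const std::string& lang) {
        dict_ = GlobalCache().Get(lang, 1 << 20);  // 1 MiB placeholder buffer
    }
    // Drops the engine's reference; the cache keeps the buffer alive.
    void End() { dict_ = nullptr; }

private:
    const std::vector<char>* dict_ = nullptr;
};

// Runs a full engine lifetime and reports how many cache entries remain.
std::size_t EntriesAfterEngineLifetime() {
    Engine* engine = new Engine();
    engine->Init("eng");
    engine->End();
    delete engine;
    return GlobalCache().Size();
}
```

In this sketch, EntriesAfterEngineLifetime() leaves one entry behind even though the engine was ended and deleted, and only GlobalCache().Clear() releases it. If the installed baseapi.h declares the static tesseract::TessBaseAPI::ClearPersistentCache(), calling it after delete api; may be worth trying to see whether the report goes away; whether it covers this particular allocation in 3.04 is an assumption.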

OS: Ubuntu 16.10 (x64)
Tesseract: 3.0.4 from Ubuntu repositories
GCC: gcc version 6.2.0 20161005 (Ubuntu 6.2.0-5ubuntu12)


Best,
vf

Fish Money

Jul 27, 2023, 2:26:35 PM
to tesseract-ocr
Hey there, I just ran into the same issue.
Have you managed to fix the valgrind warning somehow?

On Friday, April 7, 2017 at 02:49:28 UTC-7, fri...@gmail.com wrote: