C# libraries of Tesseract are faster than native C++

Iain Downs

Nov 16, 2024, 3:30:40 AM
to tesseract-ocr
I mainly develop under Windows and have used the Emgu CV port and one of the Tesseract C# ports (Charles Weld's). Both lack some features, so I've built C++ libraries to better support page iterators (and with the particular hope that bold and italic detection in v5 will someday work).

I'm finding that the C++ implementations are 3 or 4 times slower than the equivalent code written in C#. The same is true for a pure C++ application, whether I link Tesseract as a static library or as a DLL.

I have imported Tesseract and Leptonica using vcpkg.

My code in all test cases resolves to the below or the c# equivalent:-

tesseract::TessBaseAPI* tessApi = new tesseract::TessBaseAPI();
int retval = tessApi->Init(NULL, "eng");  // returns 0 on success
tessApi->SetPageSegMode(tesseract::PSM_SINGLE_BLOCK);
Pix* image = pixRead(argv[1]);
tessApi->SetImage(image);
tessApi->Recognize(0);
char* text = tessApi->GetUTF8Text();
delete[] text;       // the caller owns the string returned by GetUTF8Text()
pixDestroy(&image);  // release the Leptonica image
tessApi->End();
delete tessApi;

I have no idea where to look here and would appreciate any help.

Iain

Zdenko Podobny

Nov 16, 2024, 4:42:38 AM
to tesser...@googlegroups.com
Are you sure you are comparing apples to apples? For example, is your C++ code built as debug while the C# code uses release optimisations?

Reasoning: Charles Weld (.NET wrapper[1]) did not port Tesseract to C#; his library wraps (uses) the Tesseract C API from C#. The C API in turn wraps the Tesseract C++ API. Each layer of wrapping should add a small (maybe not even measurable) performance overhead, so a wrapper definitely cannot be faster than the original library.


Zdenko
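
One way to make the comparison suggested above like-for-like is to time the same call in each build with the same clock. The following is only a minimal sketch: `time_ms` is a hypothetical helper, not part of Tesseract or Leptonica, and it assumes a C++14 or later compiler.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper (not part of Tesseract): runs `work` once and returns
// the elapsed wall-clock time in milliseconds, measured with a monotonic clock.
template <typename Work>
double time_ms(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Work>(work)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

It could be wrapped around the recognition step from the original post, for instance `double ms = time_ms([&] { tessApi->Recognize(0); });`, and the figure compared between a Debug and a Release build of the same program.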

Iain Downs

Nov 16, 2024, 6:30:09 AM
to tesseract-ocr
Zdenko, thank you. That seems to be the explanation. My C++ test has gone down from 5,700 ms to 900 ms. I had assumed that the NuGet package would follow the main project's Debug or Release configuration. It seems I was wrong about that.

Iain
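
For anyone checking the same thing, the sketch below reports how the calling code itself was compiled. `build_configuration` is a hypothetical helper. It relies on MSVC defining `_DEBUG` when a debug runtime is selected and on the convention that release builds define `NDEBUG`. It says nothing about how the linked Tesseract or Leptonica binaries were built, which has to be checked separately in the package or linker settings.

```cpp
#include <string>

// Hypothetical helper: reports how *this* translation unit was compiled.
// _DEBUG is defined by MSVC when a debug runtime (/MDd or /MTd) is selected;
// NDEBUG is the standard macro that disables assert() and is conventionally
// defined in release builds.
inline std::string build_configuration() {
#if defined(_DEBUG)
    return "debug (_DEBUG defined)";
#elif defined(NDEBUG)
    return "release (NDEBUG defined)";
#else
    return "unknown (neither _DEBUG nor NDEBUG defined)";
#endif
}
```

Printing this string next to the timing result makes it harder to compare a debug C++ build against a release C# build by accident.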