Calling from API the same way as from command line

Yesbird

Dec 23, 2021, 3:28:18 PM

to tesseract-ocr

Hi, guys!

I am doing video subtitle recognition for one of my C++ projects and cannot figure out why, for the same image, tesseract gives good results when I run it from the command line but fails when called from the API. I see a couple of differing parameters when running

tesseract --print-parameters

and don't know how to find out which of them affect the results.

Could anyone help me, please?

-- From command line ----------
tesseract ./subtitles/sub_ron_1.png stdout -l ron --dpi 600
-----------------------------------------
Turul virtual făcut de Kira şi Matt
a fost foarte amuzant.

-----------------------------------------

-- From API -------------------------

char *text;
std::string lang = "rum";
ocr->Init(NULL, lang.c_str());
ocr->SetImage(avframe->data[0], avframe->width, avframe->height, 4, avframe->linesize[0]);
text = ocr->GetUTF8Text();

-----------------------------------------
€ II a] e E ăn si 2 W a p:] VA

Turul'virtual făcut de Kira şi Matt
nat fn arte - SE,
a fost foarte amuzant.
-----------------------------------------

-- Version info ---------------------------
tesseract 4.1.1
leptonica-1.79.0
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 2.0.3) : libpng 1.6.37 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.1
Found SSE
Found libarchive 3.4.0 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.8 liblz4/1.9.2 libzstd/1.4.4
---------------------------------------------

Image:

Yesbird

Dec 23, 2021, 3:37:27 PM

Sorry, one correction: the language initialization in the API call is the same as on the command line:
std::string lang = "ron";

Yesbird

Dec 23, 2021, 7:09:09 PM

Problem solved by an additional preprocessing step: blending the frame with a white background.

This form of SetImage():

SetImage(avframe->data[0], avframe->width, avframe->height, 4, avframe->linesize[0]);

does not remove the alpha channel, so I need to do it myself.
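For anyone hitting the same problem, here is a minimal sketch of that blending step. It is not the exact code from my project: the helper name blend_on_white is made up for illustration, and it assumes the frame is stored as 4 bytes per pixel with non-premultiplied alpha in the last byte (for example an RGBA-style layout). If your pixel format puts alpha first or uses premultiplied colors, the indexing and formula need to be adjusted.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of Tesseract or FFmpeg).
// Composites a 4-byte-per-pixel buffer over opaque white and returns a
// tightly packed 3-byte-per-pixel buffer (row length = width * 3).
// Assumption: bytes 0..2 of each pixel are color, byte 3 is
// non-premultiplied alpha. src_stride is the source row length in bytes
// and may include padding.
std::vector<std::uint8_t> blend_on_white(const std::uint8_t* src,
                                         int width, int height,
                                         int src_stride) {
    std::vector<std::uint8_t> dst(static_cast<std::size_t>(width) * height * 3);
    for (int y = 0; y < height; ++y) {
        const std::uint8_t* in = src + static_cast<std::size_t>(y) * src_stride;
        std::uint8_t* out = dst.data() + static_cast<std::size_t>(y) * width * 3;
        for (int x = 0; x < width; ++x, in += 4, out += 3) {
            const unsigned a = in[3];
            for (int c = 0; c < 3; ++c) {
                // out = color * a + white * (1 - a), rounded, with a in 0..255
                out[c] = static_cast<std::uint8_t>(
                    (in[c] * a + 255u * (255u - a) + 127u) / 255u);
            }
        }
    }
    return dst;
}
```

The result can then be passed to the same SetImage() overload with 3 bytes per pixel, along the lines of ocr->SetImage(rgb.data(), width, height, 3, width * 3), where rgb is the returned vector. The vector has to stay alive until recognition has finished.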
