Hi everyone,
I am trying to use Tesseract for single-character recognition, and the results are awful.
For example, "h" is recognized as "n", "4" as "/i", and "O" as "()".



Single-character mode does not seem to take effect, because many characters are recognized as two characters
instead of one. My images are simple bilevel (black-and-white) TIFF images of
Latin characters. They are rendered from a bitmap font, not scanned, so they are perfectly clean and
need no preprocessing.
Only about half of the characters are recognized correctly, which seems very low
for such a simple task.
I am using Tesseract library version 4.0.0-beta.3.
This is how I call Tesseract:
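#include <cstdio>
#include <cstdlib>
#include <leptonica/allheaders.h>
#include <tesseract/baseapi.h>

using tesseract::TessBaseAPI;

int CharRecognizer::recognizeTIFFData(char* data, int datalength) {
    TessBaseAPI* api = new TessBaseAPI();
    // Initialize tesseract-ocr with German ("deu"), without specifying the tessdata path
    if (api->Init(NULL, "deu")) {
        fprintf(stderr, "Could not initialize tesseract.\n");
        delete api;
        exit(1);
    }
    api->SetPageSegMode(tesseract::PSM_SINGLE_CHAR);
    // pixReadMem expects const l_uint8* data and a size_t length
    Pix* image = pixReadMem(reinterpret_cast<const l_uint8*>(data), datalength);
    if (image == NULL) {
        fprintf(stderr, "Could not decode the TIFF data.\n");
        api->End();
        delete api;
        return -1;
    }
    api->SetImage(image);
    // Get OCR result (UTF-8, allocated with new[])
    char* outText = api->GetUTF8Text();
    printf("\nOCR output:\n%s", outText != NULL ? outText : "");
    // Keep the first byte before freeing the text (only meaningful for ASCII characters)
    int utf8 = (outText != NULL) ? outText[0] : 0;
    // Destroy used objects and release memory
    api->End();
    delete api;
    delete[] outText;
    pixDestroy(&image);
    return utf8;
}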
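A side note on the return value: int utf8 = outText[0]; keeps only the first byte of the UTF-8 string. That works for ASCII letters, but not for German umlauts such as "ä" (two bytes, 0xC3 0xA4), and it cannot tell whether Tesseract returned one character or two (like "/i"). Below is a minimal sketch of two helpers, firstCodePoint and countCodePoints (hypothetical names, not part of the Tesseract API), assuming the input is the text returned by GetUTF8Text, which typically ends with a newline:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: decodes the first UTF-8 code point in s.
// Returns -1 if s is empty or does not start with a well-formed sequence.
// Only lead/continuation bit patterns are checked (overlong forms are not rejected).
long firstCodePoint(const std::string& s) {
    if (s.empty()) return -1;
    const unsigned char b0 = static_cast<unsigned char>(s[0]);
    if (b0 < 0x80) return b0;  // 1-byte ASCII, e.g. 'h'
    std::size_t len;
    long cp;
    if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
    else return -1;  // stray continuation byte or invalid lead byte
    if (s.size() < len) return -1;
    for (std::size_t i = 1; i < len; ++i) {
        const unsigned char b = static_cast<unsigned char>(s[i]);
        if ((b & 0xC0) != 0x80) return -1;  // expected a continuation byte
        cp = (cp << 6) | (b & 0x3F);
    }
    return cp;
}

// Hypothetical helper: counts code points in s, skipping ASCII whitespace
// such as the trailing newline. Every byte that is not a continuation byte
// (10xxxxxx) starts a new code point.
std::size_t countCodePoints(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char b : s) {
        if ((b & 0xC0) == 0x80) continue;  // continuation byte
        if (b == ' ' || b == '\t' || b == '\n' || b == '\r') continue;
        ++n;
    }
    return n;
}
```

For example, after checking outText for NULL, countCodePoints(outText) != 1 would flag results like "/i" or "()", and firstCodePoint(outText) could replace outText[0] as the return value.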
I am new to Tesseract, so I am probably missing something. Do I have to train
the library first? Or should I set a different OcrEngineMode? I expected no
problems recognizing a simple bitmap font and am quite at a loss now.
Thank you very much in advance,
Yuliana