Why does Tesseract give junk output for Japanese?

mahendrag gajera

Jul 12, 2018, 9:15:51 AM
to tesseract-ocr
Hello all

I am trying to OCR Japanese images with the code below, but it gives junk characters.
My Tesseract version is 4.0.

Please let me know what is missing here.

#include <cstdio>
#include <cstdlib>
#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>

void Test(char* imagePath)
{
    char *outText;

    tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
    // Initialize tesseract-ocr with Japanese ("jpn"), reading traineddata from D:\tessdata
    if (api->Init("D:\\tessdata", "jpn", tesseract::OcrEngineMode::OEM_TESSERACT_ONLY))
    {
        fprintf(stderr, "Could not initialize tesseract.\n");
        exit(1);
    }

    // Open input image with leptonica library
    Pix *image = pixRead(imagePath);
    api->SetImage(image);
    // Get OCR result (the returned buffer is UTF-8 encoded)
    outText = api->GetUTF8Text();
    printf("OCR output:\n%s", outText);

    // Destroy used object and release memory
    api->End();
    delete api;
    delete[] outText;
    pixDestroy(&image);
}

Using train data from here


Test data image


Thanks,

Shree Devi Kumar

Jul 12, 2018, 9:24:03 AM
to tesser...@googlegroups.com
Try the traineddata files from tessdata_best and tessdata_fast.

mahendrag gajera

Jul 16, 2018, 5:08:42 AM
to tesseract-ocr
It is resolved; it was just a decoding issue. Thanks.
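
The thread does not say exactly what the decoding problem was. One common cause, assumed here, is that GetUTF8Text() returns UTF-8 bytes while the Windows console interprets printf output in a legacy code page, so correct Japanese text shows up as junk. A minimal sketch for checking this: decode the bytes to code points to confirm they are valid UTF-8, and write them unchanged to a file that can be opened in a UTF-8 aware editor. WriteUtf8File and DecodeUtf8 are hypothetical helper names, not part of the Tesseract API, and the decoder is deliberately simple (it does not reject overlong encodings or surrogates).

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: write UTF-8 bytes to a file unchanged.
// Binary mode avoids any newline or code-page translation.
bool WriteUtf8File(const std::string& path, const std::string& utf8)
{
    std::ofstream out(path, std::ios::binary);
    if (!out) return false;
    out.write(utf8.data(), static_cast<std::streamsize>(utf8.size()));
    return static_cast<bool>(out);
}

// Hypothetical helper: decode a UTF-8 string into Unicode code points.
// Throws std::runtime_error on a bad lead byte, a bad continuation byte,
// or a truncated sequence.
std::u32string DecodeUtf8(const std::string& s)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes that follow
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (i + extra >= s.size())
            throw std::runtime_error("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

In the Test function from the question, calling WriteUtf8File("ocr.txt", outText) before delete[] outText would save the result for inspection. If the file shows correct Japanese, the OCR itself is fine and only the console display is wrong; on Windows, calling SetConsoleOutputCP(CP_UTF8) before printing is one possible way to address that, provided the console font has Japanese glyphs.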