Hello Adrian,
I can try. I'm using C# on .NET, btw, with Tesseract 4.1.1, which is the latest version available for it. My settings are very specific to my purpose, so they may be of no benefit to you.
I tried several different settings and most of them didn't work. So, I experimented.
Here is what I tried. The lines commented out with "//" are settings that I tried and that didn't work for me. I'm trying to extract text from messages in a specific font from an image. The messages have several lines of alphanumeric text.
Here is the code section where I did most of my experimenting:
var page = _engine.Process(img, PageSegMode.Auto);
//var page = _engine.Process(img, PageSegMode.AutoOnly); // Performs okay, but still no A or B
//var page = _engine.Process(img, PageSegMode.AutoOsd); // Performs okay, but still no A or B
//var page = _engine.Process(img, PageSegMode.RawLine); // terrible performance
//var page = _engine.Process(img, PageSegMode.SingleColumn);
//var page = _engine.Process(img, PageSegMode.SparseText);
//var page = _engine.Process(img, PageSegMode.SingleBlockVertText); // terrible performance
_engine.SetVariable("tessedit_char_whitelist", " abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ");
//_engine.SetVariable("tessedit_char_whitelist", "AB");
string result = page.GetText();
You can see from above that I settled on the following settings:
var page = _engine.Process(img, PageSegMode.Auto);
_engine.SetVariable("tessedit_char_whitelist", " abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ");
string result = page.GetText();
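In case it helps, here is a rough, self-contained sketch of how those three lines might sit in a small console program. It assumes the Tesseract NuGet wrapper for .NET, and the tessdata folder and image file name are hypothetical placeholders, not my real paths. I've also put the whitelist call before Process in this sketch, on the assumption that setting variables before processing is the safer order. The mean confidence printout is an extra I added for comparing settings.

```csharp
using System;
using Tesseract;

class OcrSketch
{
    static void Main(string[] args)
    {
        // Hypothetical paths: point these at your own tessdata folder and screenshot.
        string tessdataPath = "./tessdata";
        string imagePath = args.Length > 0 ? args[0] : "screenshot.png";

        using (var engine = new TesseractEngine(tessdataPath, "eng", EngineMode.Default))
        {
            // Restrict recognition to letters and the space character.
            // Assumption: the variable is set before Process so it is in place
            // when recognition runs.
            engine.SetVariable("tessedit_char_whitelist",
                " abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ");

            using (var img = Pix.LoadFromFile(imagePath))
            using (var page = engine.Process(img, PageSegMode.Auto))
            {
                string result = page.GetText();
                Console.WriteLine(result);

                // Not part of the settings above; handy when comparing page segmentation modes.
                Console.WriteLine("Mean confidence: " + page.GetMeanConfidence());
            }
        }
    }
}
```

Swapping PageSegMode.Auto for one of the other modes and watching the confidence value is roughly how the comparison could be repeated on your own images.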
The images I use are screenshots. This might not help, but I hope it does!
Regards,
...John