Since you're talking screenshots:
- tesseract is designed and trained to process books and published papers, i.e. black printed text on a white background. If your UI is set to "dark mode", i.e. bright text on a dark background, you can help tesseract a lot by preprocessing your image, e.g. inverting the colors, so the input is much closer to black text on a white BG. Under the hood, tesseract only tries both ways (regular + inverted word image snippet) for any word/particle that scored below 0.7 confidence on the first try: by making sure your input image is as clean as possible and black text on a white/bright background, you save yourself and tesseract up to half of those OCR attempts (see the preprocessing sketch after this list).
- cleanliness is godliness in OCR ;-) : remove any noise from your input image, including window borders and other graphical elements that are not text. This saves tesseract time in its image-to-line/word segmenter and consequently produces fewer and cleaner bboxes (bounding boxes) of image snippets to feed into the neural net that transforms image pixels into text. Fewer pixels to munge means more speed going through a 'page' (= input image).
Tesseract has an internal image preprocessing stage which detects long lines (window borders and the like) and a few other kinds of graphic content, but that is a very generic machine: you can surely do better with a bespoke solution as part of your own image preprocessing stage of the entire screen-to-searchable-text process (the sketch after this list shows one way to do this with OpenCV).
- where text scraping is possible, it will always win: across the board it is cheaper in CPU than running an image-based neural net and has FAR fewer quality issues, given the inherent statistics of both procedures. OCR is, and always should be, a last resort.
- in the old days, with lower-res displays, yes, computer text was 'crisp' - in a very specific technical way that is not conducive to good generic OCR, which is usually trained and oriented towards printed books. Modern displays give you some human-visual improvements, but do realize those new 'crisp'-looking characters carry some edge noise, thanks to modern anti-aliasing (ClearType and other algorithms used by the various OSes and display drivers) and ubiquitous subpixel positioning. Hence, an 'A' here does not have to match an 'A' there, pixel for pixel, even within the same window+screenshot, any more.
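To make the first two points concrete, here's a minimal preprocessing sketch, assuming OpenCV (cv2) and pytesseract are available; the file name, the mean-brightness "dark mode" check and the 40-pixel line kernels are placeholder choices of mine, not anything tesseract itself prescribes:

```python
# Minimal sketch: invert dark-mode screenshots, binarize, strip long border
# lines, then hand the cleaned image to tesseract. Thresholds are placeholders.
import cv2
import numpy as np
import pytesseract

img = cv2.imread("screenshot.png", cv2.IMREAD_GRAYSCALE)

# Dark-mode UI: bright text on a dark background. Invert so tesseract sees
# something close to black-on-white print and (mostly) skips its second,
# inverted OCR pass for low-confidence words.
if np.mean(img) < 128:          # crude "is this dark mode?" heuristic
    img = cv2.bitwise_not(img)

# Binarize to clean up the anti-aliasing fringe around the glyphs.
_, img = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Knock out long horizontal/vertical lines (window borders, separators) so the
# line/word segmenter doesn't waste time on them.
for kernel_shape in [(40, 1), (1, 40)]:
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, kernel_shape)
    lines = cv2.morphologyEx(255 - img, cv2.MORPH_OPEN, kernel)
    img[lines > 0] = 255        # paint the detected lines white (background)

print(pytesseract.image_to_string(img))
```

Tune the kernel lengths to the border thickness of your windows; the rest is pretty much the standard clean-up recipe.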
That being said, it might be worth checking other, more direct pattern recognition approaches when what you're decoding is rendered text consoles. Maybe look around at OpenCV, for instance; a rough sketch follows. I don't know for sure: I haven't dealt with your particular kind of input myself.
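For completeness, here's a rough sketch of what such a direct approach could look like with OpenCV's template matching, assuming a fixed-cell console font and per-glyph template images cropped beforehand; the 8x16 cell size, the file names and the character set are all made-up placeholders:

```python
# Rough sketch: decode a fixed-font console screenshot by matching each
# character cell against pre-cropped glyph templates (placeholder paths).
import cv2

CELL_W, CELL_H = 8, 16          # assumed character cell of the console font

# Hypothetical glyph templates, one cell-sized image per character.
templates = {ch: cv2.imread(f"glyphs/{ord(ch)}.png", cv2.IMREAD_GRAYSCALE)
             for ch in "ABC0123"}

def read_console(img):
    """Decode a console screenshot cell by cell via template matching."""
    rows = []
    for y in range(0, img.shape[0] - CELL_H + 1, CELL_H):
        row = []
        for x in range(0, img.shape[1] - CELL_W + 1, CELL_W):
            cell = img[y:y + CELL_H, x:x + CELL_W]
            # Pick the template with the lowest sum-of-squared-differences.
            best = min(templates,
                       key=lambda ch: cv2.matchTemplate(
                           cell, templates[ch], cv2.TM_SQDIFF).min())
            row.append(best)
        rows.append("".join(row))
    return "\n".join(rows)

screen = cv2.imread("console.png", cv2.IMREAD_GRAYSCALE)
print(read_console(screen))
```

With a truly fixed bitmap font this kind of exact matching sidesteps OCR entirely; with anti-aliased / subpixel-rendered text (see above) it won't work as-is.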
Cheers,
Ger