Hi all,
We need to OCR the window title bars from a set of screen recordings of a programming IDE (Eclipse). An example of a title is:
Java - commons-collections4/src/main/java/org/apache/commons/collections4/list/LazyList.java - Eclipse
The videos have already been recorded, so we are stuck with the quality of the frames as is (I have attached an example title bar image).
When I run tesseract on it with stock settings, the output is instead:
> tesseract title_lazylist.png stdout
lava , (ammon5chIIemansA/src/msun/Java/arg/apame/wmmans/calIemansA/nst/Lazyust Java , Eclipse
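To compare settings objectively rather than by eye, I could score each result against the known title. This is only a rough sketch using Python's standard difflib; the function name `similarity` is my own, and a plain character-level ratio is an assumption about what a useful metric is here, not an established OCR accuracy measure.

```python
import difflib

# The true title and the stock-settings tesseract output quoted above.
EXPECTED = ("Java - commons-collections4/src/main/java/org/apache/commons/"
            "collections4/list/LazyList.java - Eclipse")
OCR_OUTPUT = ("lava , (ammon5chIIemansA/src/msun/Java/arg/apame/wmmans/"
              "calIemansA/nst/Lazyust Java , Eclipse")


def similarity(expected: str, actual: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means the two strings are identical."""
    return difflib.SequenceMatcher(None, expected, actual).ratio()


if __name__ == "__main__":
    print(f"similarity: {similarity(EXPECTED, OCR_OUTPUT):.2f}")
```

A higher score after changing a filter or a config setting would suggest that the change helps, at least for this one frame.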
I expected recognition to be poor with default settings, but I'm unclear on how to proceed in this particular case: whether to apply a filter to the image as a pre-processing step, to use custom config settings (such as "load_system_dawg 0"), or some combination of both.
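For the config side, this is roughly how I imagine scripting the experiments. It is a sketch under assumptions, not something I have verified to improve results: the helper names `build_tesseract_cmd` and `run_tesseract` are hypothetical, page segmentation mode 7 (treat the image as a single text line) is my guess at a sensible mode for a title bar, and I am assuming the 3.x spelling `-psm` (newer releases spell it `--psm`).

```python
import shlex
import subprocess


def build_tesseract_cmd(image_path, psm=7, config_vars=None):
    """Build a tesseract command line that prints the OCR result to stdout.

    psm=7 asks tesseract to treat the image as a single text line.
    config_vars maps config variable names to values, passed via "-c name=value".
    """
    cmd = ["tesseract", image_path, "stdout", "-psm", str(psm)]
    for name, value in (config_vars or {}).items():
        cmd += ["-c", f"{name}={value}"]
    return cmd


def run_tesseract(cmd):
    """Run the command and return the recognized text (requires tesseract on PATH)."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    # Assumption: turning off the system dictionary might help, since file
    # paths are mostly not dictionary words.
    cmd = build_tesseract_cmd("title_lazylist.png",
                              config_vars={"load_system_dawg": 0})
    print(shlex.join(cmd))
```

The script only prints the command; calling `run_tesseract(cmd)` would execute it. Any image filter (for example upscaling or thresholding the frame) would be applied to the file before this step.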
I'm not an expert in OCR so any suggestions are appreciated.
The version of tesseract is:
tesseract 3.05.00dev
leptonica-1.73
libgif 4.1.6(?) : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.6.20 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.3 : libopenjp2 2.1.0
Thanks,
Titus