I captured a screenshot of a VirtualBox guest boot crash, and Tesseract did a poor job of OCRing the text, so I wanted to try the older (legacy) engine, which the help says can be selected with "--oem 0". However, this doesn't work:
D:\temp\virtualbox-project>"c:\Program Files\Tesseract-OCR\tesseract.exe" vb-crash.png output --oem 0
Error: Tesseract (legacy) engine requested, but components are not present in c:\Program Files\Tesseract-OCR/tessdata/eng.traineddata!!
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.
But I installed Tesseract 5.4.0 using the prebuilt binary:
https://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-w64-setup-5.4.0.20240606.exe
So that file IS present at the location claimed:
c:\Program Files\Tesseract-OCR\tessdata>dir
Volume in drive C is DESKOS
Volume Serial Number is EA89-635E
Directory of c:\Program Files\Tesseract-OCR\tessdata
06/07/2024 10:59 AM <DIR> .
06/07/2024 10:59 AM <DIR> ..
06/07/2024 04:50 AM <DIR> configs
06/06/2024 09:18 AM 4,113,088 eng.traineddata
01/16/2019 03:53 PM 33 eng.user-patterns
01/16/2019 03:53 PM 27 eng.user-words
06/06/2024 09:19 AM 128,076 jaxb-api-2.3.1.jar
06/06/2024 09:18 AM 10,562,727 osd.traineddata
06/06/2024 09:41 AM 572 pdf.ttf
06/06/2024 09:19 AM 125,187 piccolo2d-core-3.0.1.jar
06/06/2024 09:19 AM 149,558 piccolo2d-extras-3.0.1.jar
06/07/2024 04:50 AM <DIR> script
06/06/2024 09:19 AM 26,376 ScrollView.jar
06/07/2024 04:50 AM <DIR> tessconfigs
9 File(s) 15,105,644 bytes
5 Dir(s) 1,600,415,711,232 bytes free
So it looks like either paths aren't being handled properly on Windows (note the forward slashes mixed into the path in the error message), or the old engine expects a different format than the eng.traineddata installed with 5.4.0.
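
One way to test the second possibility would be to check whether the installed eng.traineddata actually contains the legacy classifier components. The sketch below is only a rough diagnostic. It assumes the combine_tessdata tool is installed alongside tesseract.exe and on the PATH, that "combine_tessdata -d <file>" prints one "index:name:size..." line per component, and that the legacy engine needs the inttemp, pffmtable and normproto components. The helper names are hypothetical and none of this is confirmed by the output above.

```python
import re
import subprocess

# Assumption: these are the components the legacy (--oem 0) engine needs
# to find inside a .traineddata container.
LEGACY_COMPONENTS = {"inttemp", "pffmtable", "normproto"}

# Assumed shape of one listing line: "3:inttemp:size976552, file offset=..."
_COMPONENT_LINE = re.compile(r"^\s*\d+:([A-Za-z0-9-]+):")


def component_names(listing):
    """Return the set of component names found in a combine_tessdata listing."""
    names = set()
    for line in listing.splitlines():
        match = _COMPONENT_LINE.match(line)
        if match:
            names.add(match.group(1))
    return names


def has_legacy_components(listing):
    """True if every assumed legacy component appears in the listing."""
    return LEGACY_COMPONENTS <= component_names(listing)


def list_traineddata(path, tool="combine_tessdata"):
    """Run 'combine_tessdata -d <path>' and return whatever it printed.

    stdout and stderr are joined because the tool may write its listing
    to either stream.
    """
    result = subprocess.run(
        [tool, "-d", path], capture_output=True, text=True, check=False
    )
    return result.stdout + result.stderr
```

Calling has_legacy_components(list_traineddata(r"c:\Program Files\Tesseract-OCR\tessdata\eng.traineddata")) and getting False would point to an LSTM-only model file rather than a path-handling problem. A True result would put the suspicion back on the path.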