Hello,
I created an issue (see below) on GitHub. I'm not sure whether it is a bug or something for the discussion forum...
### Environment
* **Tesseract Version**: tesseract 4.0.0-beta.1
leptonica-1.75.3
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
Found AVX2
Found AVX
Found SSE
* **Platform**: Linux getzinmw-XPS-15-9550 4.15.0-72-generic #81-Ubuntu SMP Tue Nov 26 12:20:02 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
### Current Behavior:
I am having issues with the hOCR output from Tesseract compared to the default .txt output. In the attached image, for example, the hOCR output is missing most of the numbers on the left side of the page, although they do appear in the .txt output file.
Commands tried:
tesseract input.png output -l eng --psm 6
tesseract input.png output -l eng --psm 6 hocr
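
To make the difference easier to pin down, something like the following could list the tokens that show up in the .txt output but not in the hOCR output. This is only a rough sketch using the Python standard library. It assumes that words in the hOCR file are wrapped in elements with class `ocrx_word`, and that the two commands above wrote `output.txt` and `output.hocr`; the class and function names (`HocrWordParser`, `hocr_words`, `missing_from_hocr`) are made up for illustration.

```python
import sys
from collections import Counter
from html.parser import HTMLParser


class HocrWordParser(HTMLParser):
    """Collect the text of every element whose class includes 'ocrx_word'."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._depth = 0  # nesting depth inside the current word element
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if self._depth:
            # Nested markup inside a word, e.g. <strong> or <em>.
            self._depth += 1
        elif "ocrx_word" in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self._buf = []

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1
            if self._depth == 0:
                text = "".join(self._buf).strip()
                if text:
                    self.words.append(text)

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)


def hocr_words(hocr_text):
    """Return the recognized words from an hOCR document, in document order."""
    parser = HocrWordParser()
    parser.feed(hocr_text)
    parser.close()
    return parser.words


def missing_from_hocr(txt_text, hocr_text):
    """Return tokens that occur more often in the text output than in the hOCR."""
    txt_counts = Counter(txt_text.split())
    hocr_counts = Counter(
        token for word in hocr_words(hocr_text) for token in word.split()
    )
    return sorted((txt_counts - hocr_counts).elements())


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical invocation: python compare.py output.txt output.hocr
    with open(sys.argv[1], encoding="utf-8") as f:
        txt = f.read()
    with open(sys.argv[2], encoding="utf-8") as f:
        hocr = f.read()
    for token in missing_from_hocr(txt, hocr):
        print(token)
```

The comparison is done on whitespace-separated tokens and ignores their position on the page, so it only shows which words are missing, not where they were on the image.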
### Expected Behavior:
I would expect text recognition to be consistent between the two modes, with the output format being the only difference.
### Suggested Fix:
Ensure that the recognized text is consistent across the various output formats.