The attached PDF was OCRed by ocrmypdf using tesseract 4.00.00alpha.
Linux 4.13.0-32-generic #35~16.04.1-Ubuntu SMP x86_64 x86_64 x86_64 GNU/Linux
In some PDF viewers (Evince, Chrome, Opera) everything is fine, but in others (Firefox, Alfresco Share, pdfjs) the spaces between words are lost.
So the text "Test PDF from LibreOffice" comes out as one long word, "TestPDFfromLibreOffice", after copy/paste.
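To make the symptom easy to check across viewers, here is a minimal sketch in Python. It assumes the text copied from a viewer is already available as a string. The function name `lost_word_spaces` is hypothetical and is not part of tesseract, ocrmypdf, or pdfjs.

```python
def lost_word_spaces(expected: str, extracted: str) -> bool:
    """Return True if `extracted` matches `expected` with word spaces dropped.

    Hypothetical helper for this report; not part of any OCR or PDF tool.
    """
    expected_words = expected.split()
    extracted_words = extracted.split()
    # The letters must match once all whitespace is ignored...
    same_letters = "".join(expected_words) == "".join(extracted_words)
    # ...while the extracted text has fewer word breaks than expected.
    return same_letters and len(extracted_words) < len(expected_words)


if __name__ == "__main__":
    expected = "Test PDF from LibreOffice"
    print(lost_word_spaces(expected, "TestPDFfromLibreOffice"))     # True
    print(lost_word_spaces(expected, "Test PDF from LibreOffice"))  # False
```

Running this against the text copied from each viewer would show which viewers drop the spaces and which keep them.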
If I use some other commercial OCR engines on the source PDF, the OCRed PDF has normal spaces in all PDF viewers (including pdfjs).
So this is a two-sided problem: the tesseract devs say it is a pdfjs problem, and the pdfjs devs say it is a tesseract problem.
Is it possible to solve this "spaces" problem with some options for tesseract (ocrmypdf) that force space recognition (as in other OCR engines)?
Or could you help identify the root cause, so that I can give the pdfjs devs more information?