This command:
$ tesseract.exe 18.jpg test
Gives me "test.txt", which has all the text from 18.jpg, as expected.
This command:
$ tesseract.exe 18.jpg test pdf
Gives me "test.pdf". When I open it in SumatraPDF, all of the text can be highlighted, but searching within the PDF finds only fragments of sentences, so most of the sentences in test.txt appear to be missing. When I open the same file in Adobe Reader, the find function locates all of the text.
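In case it helps to narrow this down, here is a minimal sketch of how the two outputs could be compared line by line. It assumes the PDF's text layer has already been dumped to a plain-text file by some separate extraction tool (tesseract does not do this for me), and the function name missing_lines is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print every non-empty line of the OCR text file ($1)
# that does not occur verbatim anywhere in a plain-text dump of the PDF ($2).
# Assumption: $2 was produced by a separate PDF text extraction tool.
missing_lines() {
    cr=$(printf '\r')
    while IFS= read -r line || [ -n "$line" ]; do
        # Drop a trailing carriage return in case the file has Windows line endings.
        line=${line%"$cr"}
        # Skip blank lines, they carry no text to search for.
        [ -z "$line" ] && continue
        # -F: treat the line as a fixed string, -q: only report via exit status.
        grep -qF -- "$line" "$2" || printf '%s\n' "$line"
    done < "$1"
}
```

Calling missing_lines with test.txt and the dumped PDF text as arguments would list the sentences that the extracted PDF text does not contain as one unbroken string, which is roughly what a viewer's search depends on.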
My environment:
$ tesseract.exe -v
tesseract 3.04.00
leptonica-1.71
libjpeg 8d : libpng 1.5.18 : libtiff 4.0.3 : zlib 1.2.8
PDF viewers:
SumatraPDF v2.5.2
Adobe Reader 11.0.07
Can someone help me understand why this might be happening?
Thanks,
Chris