# first, conversion to TIFF with ghostscript
ghostscript -o 2000_gs.tif -sDEVICE=tiffgray -r720x720 -g6120x7920 -sCompression=lzw 2000.pdf
# then, rotation with imagemagick (-background is a setting, so it goes before -rotate)
convert 2000_gs.tif -background white -rotate 89.4 -alpha Off 2000_rotated.tif
# then, OCR with tesseract, using suggested parameters
tesseract 2000_rotated.tif 2000_readable_gs_custom -c load_system_dawg=0 -c load_freq_dawg=0 -c textord_tablefind_recognize_tables=1 -c textord_tabfind_find_tables=1 pdf

Hi Timo,
I tried the line removal example [1] included with leptonica; I have had luck with it before when running tesseract on images with horizontal lines. I didn't manipulate the PDF beyond converting it to a grayscale image and rotating it; my ghostscript won't handle your parameters for some reason. This is the unadorned image without the horizontal lines [2], and these are the results [3]. Not 100%, but I think more than 30%, and maybe an approach to consider.
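
For anyone unfamiliar with the idea, here is a toy sketch of what "removing horizontal lines" means. It is not the leptonica code from [1], which is more elaborate; it is a simplified stand-in that assumes an already binarized, deskewed page held as rows of 0/1 pixels. The function name and the `min_run` threshold are made up for illustration. Any horizontal run of ink at least `min_run` pixels long is treated as a ruling line and erased, while shorter runs (text strokes) are kept.

```python
def remove_horizontal_lines(rows, min_run=5):
    """Return a copy of a binary image with long horizontal ink runs erased.

    rows is a list of equal-length lists of 0/1, where 1 means ink.
    Runs of 1s that are at least min_run pixels long are set to 0.
    """
    cleaned = [list(row) for row in rows]  # work on a copy, leave the input alone
    for y, row in enumerate(rows):
        x = 0
        while x < len(row):
            if row[x] == 1:
                start = x
                # advance to the end of this run of ink
                while x < len(row) and row[x] == 1:
                    x += 1
                if x - start >= min_run:
                    # long enough to count as a ruling line: erase it
                    for i in range(start, x):
                        cleaned[y][i] = 0
            else:
                x += 1
    return cleaned


page = [
    [0, 1, 0, 1, 1, 0, 0, 1],  # short runs: text strokes
    [1, 1, 1, 1, 1, 1, 1, 1],  # full-width run: a ruling line
    [0, 1, 1, 0, 0, 1, 0, 0],  # short runs: text strokes
]
cleaned = remove_horizontal_lines(page, min_run=5)
```

One limitation of this naive version: where a character stroke sits on the same pixel row as a line, that part of the stroke is erased along with the line, and a page that is still slightly skewed will break long lines into shorter runs that slip under the threshold.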
art
---
1. http://www.leptonica.com/line-removal.html
2. https://drive.google.com/file/d/0B-PK1n92dlzwalM1bTRtb0FiMVU/view?usp=sharing
3. https://drive.google.com/file/d/0B-PK1n92dlzweDl5aWFPd0pDQnc/view?usp=sharing