Hi,
I am trying to extract tabular data from an image. To do this, I am converting the image to hOCR.
However, the hOCR output does not keep the table layout: it lists all the data for one column first and then the data for the next column. I need the data ordered row by row (and column by column within each row) so that the extracted result forms a proper table.
I have tried both -psm 5 and -psm 6, but the hOCR output is identical in both cases.
I am using Tesseract 3.05.
Setting preserve_interword_space to 1 does not help either.
Any help would be appreciated.
For example, the image contains the following:
Column 1 Column 2
X 1
Y 2
Z 3
The hOCR output gives:
X
Y
Z
1
2
3
I would like the output to be:
X 1
Y 2
Z 3
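One possible workaround, sketched here under assumptions rather than as a confirmed fix, is to ignore the reading order written into the hOCR and rebuild the rows from each word's bbox coordinates. The sketch uses Python's standard html.parser to collect every ocrx_word span with its bounding box. It then groups words whose vertical centers lie within a tolerance of each other into one row and sorts each row left to right. The helper names (HocrWordParser, group_into_rows, hocr_to_rows), the sample hOCR fragment, and the tolerance of half the median word height are hypothetical choices for illustration. The sketch also assumes the table is not skewed and that each word span has a standard "bbox x0 y0 x1 y1" title.

```python
from html.parser import HTMLParser


class HocrWordParser(HTMLParser):
    """Collect (text, x0, y0, x1, y1) for every ocrx_word span in an hOCR document."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None  # bbox of the ocrx_word span currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            self._bbox = None
            self._text = []
            # The title attribute looks like "bbox 10 50 30 70; x_wconf 90".
            for prop in (attrs.get("title") or "").split(";"):
                parts = prop.split()
                if parts and parts[0] == "bbox":
                    self._bbox = tuple(int(v) for v in parts[1:5])

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Assumes word spans contain no nested <span>, so the next </span> closes the word.
        if tag == "span" and self._bbox is not None:
            text = "".join(self._text).strip()
            if text:
                self.words.append((text, *self._bbox))
            self._bbox = None


def group_into_rows(words):
    """Group words into rows by vertical center, then order each row left to right."""
    if not words:
        return []
    # Hypothetical tolerance: half the median word height.
    heights = sorted(y1 - y0 for _, _, y0, _, y1 in words)
    tolerance = heights[len(heights) // 2] / 2
    by_center = sorted(words, key=lambda w: (w[2] + w[4]) / 2)
    rows = []
    current = [by_center[0]]
    row_center = (by_center[0][2] + by_center[0][4]) / 2
    for word in by_center[1:]:
        center = (word[2] + word[4]) / 2
        if abs(center - row_center) <= tolerance:
            current.append(word)
        else:
            rows.append(current)
            current = [word]
            row_center = center
    rows.append(current)
    return [" ".join(w[0] for w in sorted(row, key=lambda w: w[1])) for row in rows]


def hocr_to_rows(hocr_text):
    parser = HocrWordParser()
    parser.feed(hocr_text)
    parser.close()
    return group_into_rows(parser.words)


# Made-up hOCR fragment in the column-by-column order described above.
SAMPLE_HOCR = """
<div class='ocr_carea'>
 <span class='ocrx_word' title='bbox 10 50 30 70; x_wconf 90'>X</span>
 <span class='ocrx_word' title='bbox 10 90 30 110; x_wconf 90'>Y</span>
 <span class='ocrx_word' title='bbox 10 130 30 150; x_wconf 90'>Z</span>
</div>
<div class='ocr_carea'>
 <span class='ocrx_word' title='bbox 200 52 215 70; x_wconf 90'>1</span>
 <span class='ocrx_word' title='bbox 200 91 215 110; x_wconf 90'>2</span>
 <span class='ocrx_word' title='bbox 200 129 215 150; x_wconf 90'>3</span>
</div>
"""

for row in hocr_to_rows(SAMPLE_HOCR):
    print(row)  # prints "X 1", "Y 2", "Z 3" on separate lines
```

With real output, the contents of the .hocr file Tesseract writes would be passed to hocr_to_rows instead of the sample string. Skewed scans or rows that sit very close together would probably need a different tolerance or a deskew step first.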
I would be grateful for any help or ideas.
Thanks