Hi,
First of all, thanks for this very useful piece of software!
Here's an issue I'm seeing on both 3.03 and git HEAD. On the attached image, page segmentation (-psm 3, which is also the default) finds several valid columns but also one invalid one. Going through the output:
10
15
20
25
30
35
EP 2 377 850 A1
This is a good detection of the narrow column on the left and of the top line.
1-(2-(dimethylamino)-4-(trifluoromethyl)benzyl)-3-(2,3-dihydro-2—oxo-1H-benzo[d]imidazo|—4-yl)ure
1-(4-(trif|uoromethyl)-2—(pyrrolidin-1-y|)benzy|)-3-(2,3-dihydro-2—oxo—1H-benzo[d]imidazol-4-yl)urea
[...]
Also good.
1-( -(trif|uoromethyl)-2—(pyrrolidin-1-y|)benzy|)-3-(2,3-dihydro—2—oxobenzo[d]oxazo|—4-y|)urea
1-( -(trif|uoromethyl)-2—(piperidin-1-y|)benzyl)-3-(2,3-dihydro-2—oxobenzo[d]oxazol-4-yl)urea
[...]
Here one character (the 4) is missing from each line.
[...]
The 4s seem to have been detected as a separate column, which is not desired. It seems to me that a column should not be detected here, both because the 4s are close to the neighbouring characters (there is no column separation) and because this column largely overlaps with the main (widest) one.
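To make the overlap argument concrete, here is a rough Python sketch of the kind of check I have in mind. This is not Tesseract's code: the function name is hypothetical and the x-extents are made up for illustration, not measured from the attached image.

```python
def horizontal_overlap_fraction(candidate, main):
    """Return the fraction of the candidate's width that lies inside main.

    Both arguments are (left, right) x-extents in pixels.
    """
    left = max(candidate[0], main[0])
    right = min(candidate[1], main[1])
    width = candidate[1] - candidate[0]
    if width <= 0:
        return 0.0
    # A negative (right - left) means the two extents do not overlap at all.
    return max(0, right - left) / width


# Made-up x-extents, for illustration only.
main_column = (120, 1500)   # the wide text column
fours_column = (150, 170)   # the spurious column holding only the 4s
margin_column = (40, 80)    # the narrow line-number column on the left

print(horizontal_overlap_fraction(fours_column, main_column))   # 1.0
print(horizontal_overlap_fraction(margin_column, main_column))  # 0.0
```

With numbers like these, the column of 4s lies entirely inside the main column, while the line-number column does not overlap it at all, so a threshold on this fraction would keep the second and reject the first.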
Would someone familiar with the code be able to check why this is happening? If pointed in the right direction, I could have a try as well :)
Cheers,
Daniel