improve accuracy for these documents?


Caleb Pitman

Aug 24, 2016, 12:52:09 PM
to tesseract-ocr
Hello,

I am trying to get better results out of the attached example documents.

Versions: Tesseract 3.03, leptonica 1.7, libgif 4.1.6(?), libjpeg 8d, libpng 1.2.50, libtiff 4.0.3, zlib 1.2.8, webp 0.4.0


Most of my other scans (not shown below) fared much better with the following dictionary (dawg) settings disabled:
load_system_dawg F
load_freq_dawg F
load_punc_dawg F
load_number_dawg F
load_unambig_dawg F
load_bigram_dawg F
load_fixed_length_dawgs F
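
For reference, a minimal sketch of how these settings can be collected into a config file. The file name nodawg.cfg is hypothetical; the variable names are exactly the ones listed above.

```shell
# Write the dictionary switches to a config file.
# The file name nodawg.cfg is an assumption made for this example.
cat > nodawg.cfg <<'EOF'
load_system_dawg F
load_freq_dawg F
load_punc_dawg F
load_number_dawg F
load_unambig_dawg F
load_bigram_dawg F
load_fixed_length_dawgs F
EOF

# Show what was written.
cat nodawg.cfg
```

The file is then named as the trailing config argument, e.g. "tesseract a.tif a nodawg.cfg". This assumes the Tesseract build resolves a config file given by path; if it does not, copying the file into tessdata/configs should have the same effect.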



I'm only after the XY data and/or Lat/Long, plus any relevant text such as NAD27/NAD83/State Plane Zone.

Results per image ("dawg F" = the dawg settings above disabled, "dawg T" = enabled):
a.tif (400 dpi)
dawg F: Coordinates NAD 27 TC ZONE X - 1289509 Y- 812593 LAT.: N 31.88025 LONé.: W EZLBZZZS
dawg T: Coordinates NAD 27 TC ZONE X 8 1289509 Y- 812593 LAT; N 31.88025 LONéJ W M62225
Fred's textcleaner:
Coordinates NAD 27 TC ZONE X - 1289509 Y- 812593 LAT.: N 31.88025 LONG.: W 102.62225
textcleaner -g -e stretch -f 25 -o 20 -t 30 -u -s 1 -T -p 20  a.tif a2.tif 
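
Once a line comes out this clean, the fields I need can be pulled out with a few regular expressions. A rough sketch, assuming the labels X, Y, LAT and LONG each appear once per line as they do here; the helper name extract_field is hypothetical, and the patterns are not meant to cope with the garbled results further down.

```shell
# OCR text copied from the a.tif textcleaner result above.
line='Coordinates NAD 27 TC ZONE X - 1289509 Y- 812593 LAT.: N 31.88025 LONG.: W 102.62225'

# extract_field LABEL: print the first number that follows LABEL,
# skipping any non-digit separators such as " - ", ".: N " or ".: W ".
# (hypothetical helper for this example)
extract_field() {
  printf '%s\n' "$line" | sed -nE "s/.*$1[^0-9]*([0-9]+(\.[0-9]+)?).*/\1/p"
}

x=$(extract_field 'X')
y=$(extract_field 'Y')
lat=$(extract_field 'LAT')
lon=$(extract_field 'LONG')

# The datum is matched literally as NAD 27 or NAD 83, with or without the space.
datum=$(printf '%s\n' "$line" | grep -oE 'NAD ?(27|83)' | head -n 1)

printf 'X=%s Y=%s LAT=%s LONG=%s DATUM=%s\n' "$x" "$y" "$lat" "$lon" "$datum"
```

The same patterns would need loosening (for example to accept "X=" with a trailing foot mark, or degree/minute/second notation) before they could be applied to c.tif and d.tif.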



c.tif (300 dpi)
dawg F: 
LAT!TUDE: N 3!'5!â22' LONG!TUDE: W 102'42â18" C.A.G.
STATE PLANE COORDINATES: ' DI;IA&;;{:âE'SIOSAEI.E%?:S&'O;%VEIESIN
NORTH: 10646661 EAST: l5602l0 .
dawg T: 
LATITUDE: N 3I'5Iâ22' LONGITUDE: W 102°42âI8' C.A.G.
STATE PLANE COORDINATES: ' DIaAï¬EEEOSAéEI§ES§O;%VEIESIN
NORTH: 1064666! EAST: I5602IO 
Fred's textcleaner:
LATITUDE: N 3!'5!'22â LONG!TUDE: W 102â42'!8" C.A.G.
STATE PLANE coommres: ' D{STANCE TO NEAREST rowu m Co.:
Tws WELL wcm(m SHOWM ON ms NORTH: f0646661 EAST: 1560210 51 MfLES SE OF NOTREES, TX
convert c.tif -flatten -colorspace gray -median 2 -resize 90% -threshold 18% c3.tif
LATITUDE: N 3!'5{'22' LONG!TUDE: W 102'42'18" C.A.G.
STATE PLANE COORDWATES: msnwcg TO NEAREST TOWN w CO.:
NORTH: 10646661 EAST: !5602!0 51 MILES SE OF NOTREES, TX




d.tif (300 dpi)
dawg F:
SURVEY WELL DATA ~ NAD 27. TX(C)
\ \\  X=1408281.18': Y= 748062.84'
\ 4015 40|4 \ 4}ï¬ T===§3==§§ La+.= 31â
dawg T:
\ \ SURVEY WELL DATA ~ NAD 27. TX(C)
\ \\  X=1408281.18': Y= 748062.84'
\ 4015 4014 \ 4}? T===§3==§§ LOT.= 31°42â34.52â
Fred's textcleaner:
;'    \ \ §SQJER\/E§§( WELL DATA-âNAD 27. TX(C)
\ \\ L@@K = 1408281.18â: Y= 748062.84â
\ 40|5 40|4 \ 41? jr==:§==§§ La+.= 31°42'34.52â
convert d.tif -flatten -colorspace gray -median 2 -resize 90% -threshold 18% d3.tif
:ggnggy/;g;\/ WELL DATA # NAD 27. TX(C)
.âJ %: x: W408281.18': Y: 748062.84â
,i:g H r;f>:3@* La+.: 3Tâ42â34.52":Long.: 102â14â10.19â



Any ideas on how to improve accuracy?


a.tif
c.tif
d3.jpg