Help - Simple Example


Tristan Hodge

Sep 11, 2016, 3:19:23 PM
to tesseract-ocr
Hello All,

I'm trying to do a very simple OCR of a static set of images. The issue I'm having is that it's not getting the top row. Here is the image in question:

I'm using tesseract-ocr with the options -c preserve_interword_spaces=1 and sometimes -psm 6.
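For anyone reproducing this, here is a minimal Python sketch of how that command line could be assembled. The file names and the helper name build_tesseract_args are made up for illustration, and the flag spelling assumes Tesseract 3.x (newer releases spell it --psm).

```python
def build_tesseract_args(image_path, output_base, psm=None, legacy_psm_flag=True):
    """Return the argv list for a tesseract run that keeps inter-word spacing.

    image_path and output_base are hypothetical names chosen by the caller.
    """
    args = ["tesseract", image_path, output_base]
    if psm is not None:
        # Tesseract 3.x spells the flag -psm; 4.x and later use --psm.
        args += ["-psm" if legacy_psm_flag else "--psm", str(psm)]
    # Ask tesseract to keep the gaps between words so the columns stay apart.
    args += ["-c", "preserve_interword_spaces=1"]
    return args


if __name__ == "__main__":
    print(" ".join(build_tesseract_args("scoreboard.png", "scoreboard", psm=6)))
```

The returned list can be handed to subprocess.run(args, check=True) on a machine where tesseract is installed.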

The results are (hope the formatting works)

Ina LEVEL	GAMERTAG	PLAYER NAME	CLASS	 CR	MIN	6	 A	OFF	DEF	TEA
LW	14	MCDONALD |19|	Brodie McDonald	 TDNG	 480	58	2	3	100%	 52%	 64%
RW	2	izChalupaBatman	Chalupa Batman	SNP	485	58	1	1	 77%	 63%	 63%
LD	5	1	xHelleury	Niklas Lidstrom	PMD	502	53	0	1	 66%	28%	72%
RD	5	1	Bmexx	Jamal Bieber	TWD	500	59	1	1	80%	41%	70%

IIIH LEVEL	GAMERTAG	PLAYER NAME	CLASS	 CR	MIN	 SV	SV%	SAV	POS	TEA
G	4	Netflixx n Phil	Philip Payne	 HYB	450	60	8	0.530	17%	74%	55%

IIIH LEVEL	GAMERTAG	PLAYER NAME	CLASS	 CR	MIN	G	 A	OFF	DEF	TEA
C	 11	LetsGoFlyersSB	 Ryan Clay	PWF	582	60	5	2	100%	 84%	100%
LW	2	 SnipechGriff	Snipez McGriff	TDNG	513	60	1	2	 89%	 67%	 73%
RW	9	 Johnny Coombs	 Johnny Coombs	PWF	555	60	0	2	 79%	 53%	 95%
LD	9	CRUSHED LOLIPOP	 Nathan Hall	EFD	602	48	1	1	 74%	 46%	 82%
RD	6	 franklow99	FRANK LOW	TWD	446	57	O	2	 78%	 81%	 51%

IIIH LEVEL	GAMERTAG	PLAYER NAME	CLASS	 CR	MIN	 SV	SV%	SAV	POS	TEA
G	 12	cup4b|ues	lm CheeseBurger	 HYB	604	60	12	0.710	 42%	 67%	 65%

The issue is that the first column header is highlighted in yellow, and its value should be "POS". To complicate things, the yellow column could be any of the columns, depending on what the user has highlighted.
I've tried a lot of pre-processing of the image using imagemagick, like converting to grey scale and even playing with the threshold, but none of it seems to work.

I'm sure it should just be a setting or two in tesseract.

Can anyone help?

Brais Gabín Moreira

Sep 11, 2016, 8:23:44 PM
to tesseract-ocr
You can try something like this: http://www.imagemagick.org/Usage/color_mods/#level to make the light colors completely white and the dark colors completely black. I use something similar with my images and it works great (I can't use imagemagick).
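To make the idea concrete, here is a rough pure-Python sketch of that kind of level stretch on 8-bit grayscale values: everything at or below a black point becomes 0, everything at or above a white point becomes 255, and values in between are scaled linearly. The function names and the sample black/white points are only illustrative, and this is not ImageMagick's own code.

```python
def level(value, black_point, white_point, max_value=255):
    """Linearly stretch one grayscale value.

    black_point maps to 0, white_point maps to max_value,
    and anything outside that range is clamped.
    """
    if white_point == black_point:
        raise ValueError("black_point and white_point must differ")
    scaled = (value - black_point) / (white_point - black_point)
    scaled = min(1.0, max(0.0, scaled))  # clamp to [0, 1]
    return round(scaled * max_value)


def level_image(pixels, black_point, white_point):
    """Apply level() to every pixel of a 2D list of grayscale values."""
    return [[level(v, black_point, white_point) for v in row] for row in pixels]


if __name__ == "__main__":
    row = [[10, 100, 128, 200, 250]]
    # Illustrative points: roughly 25% and 75% of the 0..255 range.
    print(level_image(row, black_point=64, white_point=191))
```

Moving the two points closer together pushes more pixels to pure black or pure white, which is the effect described above.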

Tristan Hodge

Sep 12, 2016, 10:05:27 AM
to tesseract-ocr

Thanks very much for the reply

This is a good tip. I use -level 65% and I get a nice clean image. However, I still can't get the value of that dark blue column. These are my OCR results:

Ian LEVEL	GAMERTAG	PLAYER NAME	CLASS	 CR	MIN	6	A	OFF	DEF	TEA
LW	14	MCDONALD |19|	 Brodie McDonald	TDNG	480	58	2	3	100%	 52%	 64%
Rw	2	izChalupaBatman	Chalupa Batman	SNP	485	58	1	1	 77%	 63%	 63%
LD	5	xHelleury	Niklas Lidstrom	PMD	502	53	0	1	 66%	 28%	 72%
RD	5	Bmexx	 Jamal Bieber	 TWD	500	59	1	1	 80%	 41%	 70%

Ian LEVEL	GAMERTAG	PLAYER NAME	CLASS	 CR	MIN	 SV	SV%	SAV	POS	TEA
G	 4	Netflixx n Phil	Philip Payne	 HYB	450	60	8	0.530	17%	74%	55%

It! LEVEL	GAMERTAG	PLAYER NAME	CLASS	 CR	MIN	6	A	OFF	DEF	TEA
C	11	LetsGoFlyersSS	Ryan Clay	 PWF	582	60	5	2	100%	 84%	100%
LW	2	 SnipechGriff	Snipez McGriff	TDNG	513	6D	1	2	 89%	 67%	 73%
RW	9	 Johnny Coombs	 Johnny Coombs	PWF	555	60	0	2	 79%	 53%	 95%
LD	9	CRUSHED LOLIPOP	 Nathan Hall	EFD	602	48	1	1	 74%	 46%	 82%
RD	6	 franklow99	FRANK LOW	TWD	446	57	0	2	 78%	 81%	 51%

m
G	12	cup4blues	lm CheeseBurger	HYB	604	60	12	0.710	 42%	 67%	 65%
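One direction that might be worth trying, though nothing in this thread confirms it: if the highlighted header cell ends up as light text on a dark background after leveling, it could be inverted so that every cell is dark text on light before OCR. The sketch below works on a plain 2D list of grayscale values. The name normalize_polarity, the column_bounds argument and the 128 threshold are all assumptions for illustration, and finding the real column boundaries is left out.

```python
def normalize_polarity(pixels, column_bounds, threshold=128, max_value=255):
    """Invert any column band that is mostly dark.

    pixels: 2D list of grayscale values (rows of the header strip).
    column_bounds: list of (start, end) x ranges, end exclusive (assumed known).
    A band counts as dark-background when more than half its pixels are
    below threshold; text normally covers less area than background.
    """
    result = [row[:] for row in pixels]  # leave the input untouched
    for start, end in column_bounds:
        band = [row[x] for row in pixels for x in range(start, end)]
        if not band:
            continue
        dark = sum(1 for v in band if v < threshold)
        if dark * 2 > len(band):
            # Mostly dark: flip the band so the text becomes dark on light.
            for row in result:
                for x in range(start, end):
                    row[x] = max_value - row[x]
    return result


if __name__ == "__main__":
    # First band is dark text on light, second band is light text on dark.
    header = [[255, 255, 0, 255, 0, 0, 255, 0]]
    print(normalize_polarity(header, [(0, 4), (4, 8)]))
```

Because each band is tested on its own, it would not matter which column the user has highlighted, as long as the boundaries are known.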