Can't get bib#'s from tshirt JPG: should be simple.

Tim Nettleton

Nov 30, 2022, 1:02:00 AM
to tesseract-ocr
I've tried many combinations of --psm and -l but can't extract the numbers from my JPGs.

I have about 100 pictures of people running, each wearing a bib number.
I've had slightly more success after converting the images to TIFF, but not much.

What parameters should I use to simply extract the numbers from pictures like these?

I'm a bit overwhelmed.

Tim


TSP_12484529.JPG
TSP_12486625.JPG
TSP_12484557.JPG

Zdenko Podobny

Nov 30, 2022, 1:17:48 AM
to tesser...@googlegroups.com

On Wed, Nov 30, 2022 at 7:01, Tim Nettleton <t...@truespeedphoto.com> wrote:

Tim Nettleton

Nov 30, 2022, 2:09:47 PM
to tesseract-ocr
I do not understand: is Tesseract not capable of doing this?
Is there no input or guidance that anyone can offer to get me closer to a solution with Tesseract?

Tim

Giuseppe Coniglio

Dec 1, 2022, 2:33:58 AM
to tesser...@googlegroups.com
Hi Tim, below is the code I ran, followed by the text it returned.

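// Tess4J imports (assumed from the API used below)
import java.io.File;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;

public class BibOcr {
    public static void main(String[] args) {
        Tesseract tesseract = new Tesseract();
        try {
            // Path to the tessdata folder that holds the language files
            tesseract.setDatapath("tessdata");
            // Path to the image file
            String text = tesseract.doOCR(new File("c:\\temp\\TSP_12484529.JPG"));
            System.out.print(text);
        } catch (TesseractException e) {
            e.printStackTrace();
        }
    }
}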


Text:

“linge
— jer bed eh i
' ad ~m & .] i — ——
; om +. Yi * . ¥ ™ * :
i q 7 OW ne ~ .~ ia ‘, " i
: 77s 7 US we 1 oe
Ps - a F 7. ' a
3 7 e si * +” " sa = : *
| es _” nr
es , , ;
TH + Fo } a
: ‘ = ee ey W 4 p =e ( \\\ 3 1
Ea 7 * fii Y7 A (s . \ > . '
si 7. } WH ‘.@ oh j
: : t Y p fn we ¢ “a « a Pe,
ca ¥v a RELAY ii ae : P yi te
ae é a Sa "Pu
) > 8 oe —_ - z Z J
. * ! a es rs
S Q Alm» 4 , + : ‘
& fa ts !. i
- | so ix 2108 A As Cees
¥ 2 @ —— j
a : ‘
a ; % . + a
a i a . * & : 4 1” 2
“ * a - ema we q
vg a . i = oe ,
ad - | e < P
, ' ee oe:
s a a a a4 me , “ 3
. ae : wal” +
: | mS wip cus ot
@ Ps él : — oe e a a -,
' mY —- ; —_  -F « La
’ , ; ~ - ~  Y _ ,
e ; a . Y 7 . a 2 ~
» a= ; oa , fas . ’ mi
. ** -< ‘ » = a | ~ e : * d
: J . S -_ , ne os e 4 reat . ¥ ;
- f , a - & , ‘ “- c LL. : y ~. ae <
On gn ne Ae a yA
a v. od on ie . mm > A J a-< 5 7 Oe .
ig ee Es 2 ee
ft SOE ode Zi ee ae DP ee) |
ga . s>* 4 " > ; “4 fF SR, +
zt Jx-e cot % “i | : A Pa c L/
ne ; ao “ \] oo TT. ¥ ae : “ J J
Se , Sw og a 2 ~~
i A ile PPE POP we
: . j \ . - 7 J 4 y 4 pe Jk 4 “ae >
ee awe ee = ot, % ~ thy i 4 é c& ie’ 4 lies > ’ ; J ©
Ree sR rh JEP
=. << a — > y £ wit 2 . = 5 A y _s-
——e t f ~ by ae? ‘ Ft wer
x : <. >» re of Fie
i = | s fe“ mee =. Se Le i, f
— : _ - er | Ps Perk g
OT Au ik Prova) SAE: eee et
» Te eel - = As f a = - = 5 a hee De = - CS7 “g oy
‘ : , = ~ ( ‘ : = gre = at a VG he ws aed
« - «?f-=- \| i = ¥ } ‘ c wry we
- ‘ ) ‘i \ f F ae - Cue thy
(cite i —_ 4 \ eS a Et |
* it a= - -_ o> , _ a) ee > <—&  _ree. > =x = ws Ls he ,
: 2 FS ae ect. - a 5
See 2 > — ee. er LE é 2 ~~ pln 9
Zs be ee ee, : j a eB ; Ze ; ; Z ae 3 - = a : i
SEE - Zs Te Se ee Do 2 CEE, ae
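
Most of the text above is noise picked up from the rest of the photo, but one line does contain a multi-digit run (2108). A minimal sketch of a post-filter, assuming bib numbers are standalone runs of 3 to 5 digits (an assumption, not something stated in this thread), keeps only such runs from whatever doOCR returns. BibNumberFilter and extractCandidates are hypothetical names and are not part of Tesseract or Tess4J. The thread does not confirm that 2108 is the runner's actual bib number.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: filters raw OCR text down to bib-number candidates. */
class BibNumberFilter {
    // Assumption: a bib number is a run of 3 to 5 digits with no digit on either side.
    private static final Pattern BIB = Pattern.compile("(?<!\\d)\\d{3,5}(?!\\d)");

    /** Returns every 3-to-5-digit run found in the OCR text, in order of appearance. */
    static List<String> extractCandidates(String ocrText) {
        List<String> candidates = new ArrayList<>();
        Matcher m = BIB.matcher(ocrText);
        while (m.find()) {
            candidates.add(m.group());
        }
        return candidates;
    }

    public static void main(String[] args) {
        // Two lines copied from the OCR output above
        String sample = ": 77s 7 US we 1 oe\n- | so ix 2108 A As Cees\n";
        System.out.println(extractCandidates(sample)); // prints [2108]
    }
}
```

A filter like this only hides the noise. It cannot recover digits that the OCR pass misread, so it is a complement to better input images, not a replacement for them.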

Tim Nettleton

Dec 2, 2022, 9:29:34 AM
to tesseract-ocr
I found a site that uses tesseract and it does VERY well with nature and numbers.

When I use tesseract I do NOT get the same results that they do.
 
I’ve attached an example image from which I clearly need to get 1433 for this team member.
When I use the site's OCR (https://www.imagetotext.info/), it says “AMERICAN FAMILY INSURANC 1433”, which is great!
 
When I run tesseract, I get trash on the same image:
 
c:\>tesseract.exe 12749691.jpg stdout -l eng --psm 6 --oem 3
we As eee »
Ate ae
é FAS
; Z cae f .
if\ iy
i * ’ .
| TPE
xX * dp
 
What are we doing wrong? 
Do I need to run a program before tesseract to isolate areas?
What are we missing?

Any help would be great!
 
Thanks,
 
Tim Net
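
On the question of running a program before tesseract to isolate areas: one commonly suggested approach, not confirmed by anyone in this thread, is to hand the OCR engine only the bib region, enlarged and reduced to black and white, instead of the whole photo. The sketch below assumes the bib rectangle is already known (finding it automatically is a separate problem), and uses only the Java standard library. BibPreprocessor, cropScaleBinarize, and the scale and threshold parameters are hypothetical names chosen for illustration.

```java
import java.awt.image.BufferedImage;
import java.awt.image.WritableRaster;

/** Hypothetical preprocessing helper; not part of Tesseract or Tess4J. */
class BibPreprocessor {

    /**
     * Crops the rectangle (x, y, w, h) from src, enlarges it by an integer
     * factor using nearest-neighbour sampling, and binarizes it: pixels whose
     * luma is at or above threshold (0-255) become white, the rest black.
     */
    static BufferedImage cropScaleBinarize(BufferedImage src, int x, int y, int w, int h,
                                           int scale, int threshold) {
        BufferedImage out = new BufferedImage(w * scale, h * scale, BufferedImage.TYPE_BYTE_GRAY);
        WritableRaster raster = out.getRaster();
        for (int oy = 0; oy < h * scale; oy++) {
            for (int ox = 0; ox < w * scale; ox++) {
                // Map each output pixel back to its source pixel inside the crop
                int rgb = src.getRGB(x + ox / scale, y + oy / scale);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Integer approximation of the usual luma weights
                int luma = (299 * r + 587 * g + 114 * b) / 1000;
                raster.setSample(ox, oy, 0, luma >= threshold ? 255 : 0);
            }
        }
        return out;
    }
}
```

The resulting image can be written out with javax.imageio.ImageIO and passed to tesseract on the command line. For a single bib it may be worth trying --psm 7 (treat the image as a single text line) and -c tessedit_char_whitelist=0123456789, though whether either helps on these particular photos is untested here. The crop coordinates, scale factor, and threshold would all need tuning per image set.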