Training Won't Accept 'Slashed' Zero

jmar...@myway.com

Feb 10, 2011, 2:48:24 PM
to tesseract-ocr
2011-02-10 14:34:32 EST
The log file below is the result of training with an image containing
"slashed" zeros (a zero with a diagonal line through it, to differentiate
it from upper-case O).

If I edit out the diagonal, there are no errors in tesseract.log, but
recognition of zero and O is unreliable, even with a line in
eng.unicharambigs.

How can I get tesseract to accept the slashed zero? So far I have
converted the image to black text on white background and scaled up to
approx. 300 dpi.

----------------- tesseract.log -------------------------------------
Found fonts: ['IA']
Tesseract Open Source OCR Engine with Leptonica
APPLY_BOXES: boxfile 1/51/0 ((2295,326),(2323,370)): FAILURE! box overlaps no blobs or blobs in multiple rows
APPLY_BOXES: boxfile 3/51/0 ((2289,137),(2317,181)): FAILURE! box overlaps no blobs or blobs in multiple rows
APPLY_BOXES: More than one block??
APPLY_BOXES: FATALITY - 0 labelled samples of "0 [30 ]" - target is 2:
APPLY_BOXES: Boxes read from boxfile: 226
Initially labelled blobs: 224 in 4 rows
Box failures detected: 2
Duped blobs for rebalance: 0
"0" has fewest samples: 0
Total unlabelled words: 0
Final labelled words: 224
Generating training data TRAINING ... Font name = IA
Generated training data for 224 blobs


See tif image at: http://www.flickr.com/photos/59351419@N05/5434403800/
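
For anyone debugging a similar log, here is a minimal Python sketch that pulls the failed box coordinates out of the APPLY_BOXES lines so they can be compared against the box file. It assumes the failure lines have exactly the shape shown in the log above. The function name, the dictionary keys, and the SAMPLE_LOG constant are made up for this illustration and are not part of Tesseract.

```python
import re

# Matches the APPLY_BOXES failure lines as they appear in the log above.
# The token before the coordinates (e.g. "1/51/0") is captured as-is,
# without trying to interpret it.
FAILURE_RE = re.compile(
    r"APPLY_BOXES: boxfile (?P<ref>\S+) "
    r"\(\((?P<x0>\d+),(?P<y0>\d+)\),\((?P<x1>\d+),(?P<y1>\d+)\)\): FAILURE!"
)


def failed_boxes(log_text):
    """Return one dict per failed box with its corners, width and height."""
    # Mail clients wrap long log lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    results = []
    for m in FAILURE_RE.finditer(flat):
        x0, y0, x1, y1 = (int(m.group(k)) for k in ("x0", "y0", "x1", "y1"))
        results.append({
            "ref": m.group("ref"),
            "box": (x0, y0, x1, y1),
            "width": x1 - x0,
            "height": y1 - y0,
        })
    return results


# The two failure lines from the post, still wrapped as in the mail.
SAMPLE_LOG = """\
APPLY_BOXES: boxfile 1/51/0 ((2295,326),(2323,370)): FAILURE! box
overlaps no blobs or blobs in multiple rows
APPLY_BOXES: boxfile 3/51/0 ((2289,137),(2317,181)): FAILURE! box
overlaps no blobs or blobs in multiple rows
"""

if __name__ == "__main__":
    for entry in failed_boxes(SAMPLE_LOG):
        print(entry["ref"], entry["box"], entry["width"], entry["height"])
```

Both failing boxes come out as 28 by 44 pixels. Together with "Box failures detected: 2" and "0 labelled samples" of "0", that suggests (but does not prove) that the two rejected boxes are the two slashed zeros.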

calg...@gmail.com

Sep 13, 2013, 5:06:04 PM
to tesser...@googlegroups.com, jmar...@myway.com
Hello friend, have you got a solution for this issue?

I need to process an IBM 3270 terminal screen, but I have had no success training tesseract on slashed zeros.

Thanks in advance

Carlos

zdenko podobny

Sep 14, 2013, 8:14:25 AM
to tesser...@googlegroups.com
Try setting edges_use_new_outline_complexity to True (see box.train2 in the attached test case).

Zdenko



--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to tesser...@googlegroups.com.
To unsubscribe from this group, send email to tesseract-oc...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en.


test_case.tar.gz
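
The contents of box.train2 are not shown in this thread, so the following is only a sketch of how such a config could be produced. It assumes that a Tesseract config file is plain text with one `name value` pair per line, and that box.train2 is an existing training config with the suggested parameter added. The helper name with_param and the placeholder line in BASE_CONFIG are hypothetical. Only edges_use_new_outline_complexity comes from the reply above.

```python
def with_param(config_text, name, value):
    """Return config_text with `name value` set exactly once.

    An existing line for `name` is replaced; otherwise the pair is appended.
    """
    lines = []
    replaced = False
    for line in config_text.splitlines():
        parts = line.split()
        if parts and parts[0] == name:
            lines.append(f"{name} {value}")
            replaced = True
        else:
            lines.append(line)
    if not replaced:
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical stand-in for the text of an existing training config;
# the parameter name on this line is a placeholder, not a real setting.
BASE_CONFIG = "example_existing_param F\n"

if __name__ == "__main__":
    # "T" is used here as the boolean true value (an assumption about
    # how the attached config spells it).
    print(with_param(BASE_CONFIG, "edges_use_new_outline_complexity", "T"), end="")
```

The resulting text would be saved under a new config name and passed to the training run in place of the usual training config.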

Carlos Garcia

Sep 15, 2013, 9:56:50 PM
to tesser...@googlegroups.com
Ok, thanks for your response.

I'll check this and then get back to you.

Carlos




