Re: Using tesseract for read ZONES


zdenko podobny

Jan 21, 2013, 5:18:23 PM
to tesser...@googlegroups.com
Tesseract tries to open uzn files only for certain[1] page segmentation modes (0[2] and 4 to 10). Put the other way: it does not use them for the modes that request automatic page segmentation.

In the attachment you can find an example of an image + uzn file from isri-ocr-evaluation-tools[3]. You can test it with these commands:
  1. tesseract 8531_001.3B.tif 8531_001.3B_no_uzn
  2. tesseract 8531_001.3B.tif 8531_001.3B_uzn -psm 4
The first will not use the uzn file; the second one will.


Zdenko


On Thu, Jan 17, 2013 at 2:45 PM, Jorge Barraza <jorge....@gmail.com> wrote:
Hi, I'm using tesseract 3.02 and it works perfectly (for recognition).

I have a prototype laid out like a matrix, and I want to recognize the mark "X" in the white spaces of the matrix.
I read in this forum that I can use "UZN" files to recognize ZONES of the images, but I don't know how to make these files.

Can you help me, please?

Thanks.


8531_001.3B.uzn
8531_001.3B.tif

Krijesh PV

May 29, 2014, 6:09:21 AM
to tesser...@googlegroups.com
Hi, I am working with tesseract-ocr. Importing and reading images works fine. But, as you said, the command
  1. tesseract 8531_001.3B.tif 8531_001.3B_uzn -psm 4
is generating an 8531_001.3B_uzn.txt file. I am not able to get a uzn file. I need to generate a template for my image contents.
Can anyone please help me as soon as possible?

Nick White

May 29, 2014, 9:07:52 AM
to tesser...@googlegroups.com
Hi Krijesh,

On Thu, May 29, 2014 at 03:09:21AM -0700, Krijesh PV wrote:
> But as you said the command
>
> 1. tesseract 8531_001.3B.tif 8531_001.3B_uzn -psm 4 is generating
> a 8531_001.3B_uzn.txt file. I am not able to getting uzn file. I need to
> generate a template for my image contents.

Zdenko didn't actually say that command would generate a uzn file;
you didn't read his message carefully enough.

> On Tuesday, January 22, 2013 3:48:23 AM UTC+5:30, zdenop wrote:
> 1. tesseract 8531_001.3B.tif 8531_001.3B_no_uzn
> 2. tesseract 8531_001.3B.tif 8531_001.3B_uzn -psm 4
> First will not use uzn file, second one will use it.

So, to restate, you have to create a uzn file yourself, then if you
name it appropriately and use the appropriate -psm value, Tesseract
will use it.

Nick

Krijesh PV

May 29, 2014, 10:38:17 AM
to tesser...@googlegroups.com
hi,

I am a complete novice on this topic. Can you please explain the complete process: how can I create these uzn files, and are there any tools for that? What does this psm switch mean, and how do I identify its values?

I would highly appreciate it if you could take the time to answer my question.

krijesh  :)

Nick White

May 29, 2014, 11:18:51 AM
to tesser...@googlegroups.com
Hi Krijesh,

On Thu, May 29, 2014 at 07:38:17AM -0700, Krijesh PV wrote:
> i am completely a novice on this topics, can please explain on complete
> process, how can i create this uzn files are there any tools for that,

There aren't any tools to create uzn files, that I know of. You can
see how they are formatted from the isri example files mentioned
earlier in the thread.
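
A minimal sketch of writing a uzn file by hand, in Python. Two things here are assumptions rather than anything stated in this thread, so check them against the attached ISRI example before relying on them: that each line holds left, top, width and height in pixels (origin at the top-left of the image) followed by a one-word label, and that the file must sit next to the image with the same base name and a .uzn extension, like the 8531_001.3B.tif / 8531_001.3B.uzn pair above. The function names are hypothetical.

```python
from pathlib import Path


def uzn_path_for(image_path):
    """Return the image path with its last extension replaced by .uzn.

    Assumption: this is the file name Tesseract looks for.
    """
    return Path(image_path).with_suffix(".uzn")


def write_uzn(image_path, zones):
    """Write one zone per line as: left top width height label.

    `zones` is an iterable of (left, top, width, height, label) tuples.
    Assumption: values are pixels measured from the top-left corner,
    and the label is a single word without spaces.
    """
    lines = [f"{left} {top} {width} {height} {label}"
             for left, top, width, height, label in zones]
    path = uzn_path_for(image_path)
    path.write_text("\n".join(lines) + "\n")
    return path
```

For example, write_uzn("8531_001.3B.tif", [(10, 20, 300, 40, "Text")]) would create 8531_001.3B.uzn with a single zone, which the -psm 4 command from earlier in the thread should then pick up.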

> wat does this psn switch mean, how to identify this values.

Read the tesseract manual (as well as earlier in this thread) for
details about the psm switch, and how it relates to uzn.

Note that you can also use the API to select regions to OCR, if
you're in a programming mood.
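
A rough sketch of the API route, in Python. It assumes the third-party tesserocr binding (not mentioned in this thread), which exposes the TessBaseAPI calls SetImageFile, SetRectangle(left, top, width, height) and GetUTF8Text. The helper names are hypothetical, and the api object is passed in so that the region logic can be tried without Tesseract installed.

```python
def ocr_region(api, image_path, left, top, width, height):
    """OCR a single rectangle of an image and return its text.

    `api` must provide SetImageFile, SetRectangle and GetUTF8Text,
    as TessBaseAPI does. Coordinates are pixels from the top-left.
    """
    api.SetImageFile(image_path)
    # Restrict recognition to the given rectangle only.
    api.SetRectangle(left, top, width, height)
    return api.GetUTF8Text().strip()


def ocr_region_with_tesserocr(image_path, left, top, width, height):
    """Same as ocr_region, using tesserocr (assumed to be installed)."""
    from tesserocr import PyTessBaseAPI  # imported lazily on purpose

    with PyTessBaseAPI() as api:
        return ocr_region(api, image_path, left, top, width, height)
```

Calling ocr_region_with_tesserocr("page.tif", 10, 20, 300, 40) would then return only the text found inside that rectangle, with no uzn file involved.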

Nick

Krijesh PV

May 30, 2014, 8:02:48 AM
to tesser...@googlegroups.com
Thanks, Nick, for your help. I am now able to create uzn files and extract the data I need. But when I pass the hocr parameter, the whole document is converted and I get all of the text content in XHTML. How can I restrict the hocr output to just the zones in the uzn file?

Krijesh PV

Jun 3, 2014, 5:06:09 AM
to tesser...@googlegroups.com
Hi Nick, 
I generated the uzn file and added a bbox with the values 486 110 635 142, which contains only one word of the whole image.

Next, I used the command tesseract 101436364.tiff 10143636.3B_uzn -psm4 hocr, and it generated the XHTML of the whole image.

Then I used the command tesseract 101436364.tiff 10143636.3B_uzn -psm4, and this time it generated a text file for the 486 110 635 142 area that I provided in the uzn file.

Now I need the XHTML of the 486 110 635 142 area.

How can I generate it? I hope you understand my question.

This is very urgent for me. Please help; I appreciate it.
 
Regards
Krijesh PV.