Re: Counting pixels and dpi


Sven Pedersen

Nov 12, 2012, 2:20:34 PM
to tesser...@googlegroups.com
Measure the height of a lower-case 'x' in your image using an image program, such as GIMP or the standard image viewer on your platform (such as Windows Paint or Mac Preview).

If the lower-case 'x' in your text is less than about 20 pixels tall, you need to enlarge the image or rescan your documents at a higher resolution.
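
If you would rather count the pixels in code than by eye, here is a minimal Python sketch. It assumes you have already cropped a single 'x' out of the page and loaded it as rows of grayscale values (0 = black, 255 = white). The names `ink_height` and `scale_factor` and the threshold of 128 are made up for illustration; the 20-pixel target is the figure from the FAQ quoted below.

```python
TARGET_X_HEIGHT_PX = 20  # figure taken from the FAQ quoted in this thread


def ink_height(rows, threshold=128):
    """Return the height in pixels of the dark region in a grayscale crop.

    rows: list of rows, each a list of 0-255 values (0 = black).
    threshold: values below this count as ink (an assumed cut-off).
    """
    dark_rows = [i for i, row in enumerate(rows)
                 if any(value < threshold for value in row)]
    if not dark_rows:
        return 0
    # Distance from the first row containing ink to the last one, inclusive.
    return dark_rows[-1] - dark_rows[0] + 1


def scale_factor(x_height_px, target=TARGET_X_HEIGHT_PX):
    """Return how much to enlarge the image so the x-height reaches target."""
    if x_height_px <= 0:
        raise ValueError("x-height must be positive")
    # Never suggest shrinking: text that is already big enough stays as is.
    return max(1.0, target / x_height_px)


# Toy crop: 2 blank rows, 8 rows with ink, 2 blank rows.
blank = [255] * 6
inked = [255, 0, 0, 0, 0, 255]
crop = [blank] * 2 + [inked] * 8 + [blank] * 2

height = ink_height(crop)
print(height, scale_factor(height))
```

For the toy crop the measured height is 8 pixels, so the image would need to be enlarged 2.5 times to reach the 20-pixel target.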
--Sven


On Mon, Nov 12, 2012 at 10:40 AM, chikev <kevin1m...@gmail.com> wrote:
I'd be grateful if someone could help me here.

Here is my request to Zdenko and the reply.

Could you perhaps help me understand, and then change the page, the meaning of:
"A quick check is to count the pixels of the x-height of your characters. (X-height is the height of the lower case x.)"
I have no idea what this means or how to do it.

Well then, it would be better if you found something other than tesseract. Honestly. You will be lost and disappointed with tesseract, because tesseract requires some knowledge (e.g. of image processing). It can be compared to university: if you got there, it is expected that you finished your studies in high school. Nobody there will bother to explain the basics to you... IMO there cannot be a clearer definition of x-height and what to do with it. BTW it is in the FAQ, and you complain about wrong information in the Compilation wiki ;-)

Here is what the FAQ says:

There is a minimum text size for reasonable accuracy. You have to consider resolution as well as point size. Accuracy drops off below 10pt x 300dpi, rapidly below 8pt x 300dpi. A quick check is to count the pixels of the x-height of your characters. (X-height is the height of the lower case x.) At 10pt x 300dpi x-heights are typically about 20 pixels, although this can vary dramatically from font to font. Below an x-height of 10 pixels, you have very little chance of accurate results, and below about 8 pixels, most of the text will be "noise removed".
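
The arithmetic behind those FAQ numbers can be sketched roughly as follows. One point is 1/72 inch, so the nominal font size in pixels is the point size times the dpi divided by 72. The x-height is then some fraction of that; the 0.5 ratio used here is only an assumption (the FAQ itself notes that it varies a lot between fonts), and the function names are invented for this example.

```python
POINTS_PER_INCH = 72          # definition of the typographic point
ASSUMED_X_HEIGHT_RATIO = 0.5  # assumption: x-height is about half the font size


def x_height_px(point_size, dpi, ratio=ASSUMED_X_HEIGHT_RATIO):
    """Estimate the x-height in pixels for text of a given size and scan dpi."""
    font_size_px = point_size * dpi / POINTS_PER_INCH
    return font_size_px * ratio


def dpi_for_target(point_size, target_px=20, ratio=ASSUMED_X_HEIGHT_RATIO):
    """Estimate the scan resolution needed to reach a target x-height."""
    return target_px * POINTS_PER_INCH / (point_size * ratio)


for pt in (10, 8):
    print(pt, "pt at 300 dpi:", round(x_height_px(pt, 300), 1), "px")
    print(pt, "pt needs about", round(dpi_for_target(pt)), "dpi for 20 px")
```

Under that assumed ratio, 10pt text at 300dpi comes out at roughly 20.8 pixels, which matches the FAQ's "about 20 pixels", while 8pt text would need a scan at around 360dpi to reach the same x-height.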

So if someone could help me, I'm sure I wouldn't be the only one to benefit.




--
``All that is gold does not glitter,
  not all those who wander are lost;
the old that is strong does not wither,
  deep roots are not reached by the frost.
From the ashes a fire shall be woken,
  a light from the shadows shall spring;
renewed shall be blade that was broken,
  the crownless again shall be king.”

Kevin McCready

Nov 12, 2012, 3:46:16 PM
to tesser...@googlegroups.com
cheers
that was easy!!
many thanks
I wonder if Z will now change the FAQ to tell ppl to use an image program to do the measuring?
Cheers

kevin1m...@gmail.com
32 Hawera Rd
Kohimarama 1071
Auckland, New Zealand
+64 (0)9 528 1174 home
+64 (0)226 710 335 cell
http://kmccready.wordpress.com