Tesseract Training Problem (under Mac)

OCR Newbie

Aug 28, 2010, 2:45:41 AM
to tesseract-ocr
Hi All,

Currently I am trying to use Tesseract (2.04) to recognize my own data
on Mac OS X Snow Leopard.
I found http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract
and I am trying to follow this tutorial.
My questions are:
1. I already have my train.tif ready, but I am not sure where I should
place the image file (under the 'tessdata' folder, or can it be anywhere?).
2. To run tesseract on my training image, the tutorial says to run
'tesseract train.tif train batch.nochop makebox'. I guess I should
use the terminal, but when I type this command into it, it keeps saying
'tesseract command not found'. I tried running configure in the terminal
first and then typing 'make', but it still does not work.
Can anyone give me specific instructions on how to run this?

I am sorry, I am totally new to Mac OS.

Any help is appreciated!!

Jimmy O'Regan

Aug 29, 2010, 9:30:55 PM
to tesser...@googlegroups.com
On 28 August 2010 07:45, OCR Newbie <4eve...@gmail.com> wrote:
> Hi All,
>
> Currently I am trying to use Tesseract (2.04) to recognize my own data
> on Mac OS X Snow Leopard.
> I found http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract
> and I am trying to follow this tutorial.
> My questions are:
> 1. I already have my train.tif ready, but I am not sure where I should
> place the image file (under the 'tessdata' folder, or can it be anywhere?).

If you're running 'tesseract train.tif ...', it just needs to be in
the current directory.

> 2. To run tesseract on my training image, the tutorial says to run
> 'tesseract train.tif train batch.nochop makebox'. I guess I should
> use the terminal, but when I type this command into it, it keeps saying
> 'tesseract command not found'. I tried running configure in the terminal
> first and then typing 'make', but it still does not work.

You also need to use 'make install', or provide a path to the
executable - Unix-like systems (unlike DOS, etc.) do not include the
current directory in the executable search path. (You can, of course,
change that but it's A Bad Idea.)
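
A quick way to see whether the shell can find the program at all is to ask
it directly. This is only a minimal sketch: 'command -v' is a standard shell
builtin, and the variable name 'status' is made up for the example.

```shell
# Ask the shell where (or whether) it finds 'tesseract' on the search path.
if command -v tesseract >/dev/null 2>&1; then
    status="found at $(command -v tesseract)"
else
    # This is the situation that produces 'command not found'.
    status="not on PATH"
fi
echo "tesseract: $status"
```

If this reports "not on PATH" after 'make' has finished, the binary has been
built but not installed, which matches the symptom described above.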

If tesseract is in /home/jim and $PWD (use 'echo $PWD') is /home/jim I
could use:
./tesseract ...
('.' means 'this directory')
/home/jim/tesseract
(the full path)
or even
../jim/tesseract
('..' means 'one level up', i.e. the parent directory - in this case, '/home')
or even:
$PWD/tesseract

($PWD is an environment variable, and will always be there... unless
you remove it from another shell, but you probably don't need to worry
about that).
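
The four forms above can be tried out safely with a stand-in script, as in
the rough sketch below. Everything in it is an assumption made for
illustration: the temporary directory, the 'jim' subdirectory, and the fake
'tesseract' script that only echoes its arguments. The real binary would be
wherever 'make' left it.

```shell
# Build a throwaway directory containing a stand-in executable.
base=$(mktemp -d)
mkdir "$base/jim"
cat > "$base/jim/tesseract" <<'EOF'
#!/bin/sh
echo "stand-in tesseract called with: $*"
EOF
chmod +x "$base/jim/tesseract"

cd "$base/jim"

# Four ways of naming the same file:
out_dot=$(./tesseract train.tif)              # '.'  is the current directory
out_full=$("$base/jim/tesseract" train.tif)   # the full path
out_parent=$(../jim/tesseract train.tif)      # '..' is the parent directory
out_pwd=$("$PWD/tesseract" train.tif)         # $PWD holds the current directory

echo "$out_dot"
```

All four variables end up holding the same line of output, because each
command line resolves to the same file.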

I think Mac OS X uses /Users rather than /home; just substitute the
actual values. Using 'make install' will be more convenient, though.
--
<Leftmost> jimregan, that's because deep inside you, you are evil.
<Leftmost> Also not-so-deep inside you.

John Smith

Sep 3, 2010, 11:15:08 PM
to tesser...@googlegroups.com
Hi,

Thank you so much for the reply.
I just have one more step to go. I am using Tesseract 2.04 and I have all the files ready. I am trying to combine them, but there is no combine_tessdata in 2.04. How do I combine them under 2.04?

Thank you so much!!

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to tesser...@googlegroups.com.
To unsubscribe from this group, send email to tesseract-oc...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en.


zdenko podobny

Sep 5, 2010, 6:01:56 AM
to tesser...@googlegroups.com
Hello,

Tesseract 2.04 does not use a "combined" file, so there is no combine_tessdata. Just copy your files to the tessdata directory.
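
As a rough sketch of that copy step, the snippet below uses temporary
directories so it can be run without touching a real installation. The
language code 'xxx', both directory variables, and the three empty dummy
files are assumptions for illustration only; the real file set and the real
tessdata location depend on your training run and on where Tesseract was
installed.

```shell
lang=xxx                    # hypothetical language code
work=$(mktemp -d)           # stand-in for the directory holding the trained files
tessdata=$(mktemp -d)       # stand-in for the real tessdata directory

# Empty dummy files standing in for real training output.
for ext in inttemp normproto unicharset; do
    : > "$work/$lang.$ext"
done

# The actual step: copy every <lang>.* file into tessdata.
cp "$work/$lang".* "$tessdata/"

copied=$(ls "$tessdata" | wc -l | tr -d ' ')
echo "copied $copied files"
```

With the files in place in the real tessdata directory, the new language
would then be selected with the -l option, for example
'tesseract image.tif out -l xxx'.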

At the moment http://code.google.com/p/tesseract-ocr/wiki/TestingTesseract describes training for Tesseract 3.0 (with mistakes ;-) - I have started checking it, so there will be a corrected version soon). If you want the description for Tesseract 2.04, look at http://code.google.com/p/tesseract-ocr/source/browse/wiki/TrainingTesseract.wiki?r=318 in the svn repository. It is in wiki syntax, but it is easy to read.

BR,

Zd.

Zdenko Podobný

Sep 6, 2010, 2:05:17 PM
to tesser...@googlegroups.com

Jimmy O'Regan

Sep 7, 2010, 7:35:18 AM
to tesser...@googlegroups.com
2010/9/6 Zdenko Podobný <zde...@gmail.com>:

Yeah. Unfortunately, I'm not aware of any means of deleting comments.
