Exactly How Does This Work?


Mr.Dave

Jan 7, 2008, 4:06:34 PM
to tesseract-ocr
I just found out what OCR is because I've been looking for a way to
digitize my hard documents. So, how does Tesseract work?

I downloaded the following file to my computer (which is running XP):
tesseract-2[1].01.tar.gz

Okay, what is a "gz" file?

I have no idea how this works... help!

I have a scanner and I can convert to tiff using photoshop but that's
where I'm at. Can someone give me a step by step process of how to
make this work?

Thanks!

Pedro Lopes de Almeida

Jan 7, 2008, 4:36:25 PM
to tesser...@googlegroups.com
I'm at the same stage as you. The only difference is that I'm working on
Linux Ubuntu 7.10. Why isn't tesseract in the Ubuntu repositories?

Yours,
Pedro Lopes de Almeida

Jeffrey Ratcliffe

Jan 8, 2008, 1:08:27 AM
to tesser...@googlegroups.com
On 07/01/2008, Pedro Lopes de Almeida <pedrolop...@gmail.com> wrote:
> I'm at the same stage as you. The only difference is that I'm working on
> Linux Ubuntu 7.10. Why isn't tesseract in the Ubuntu repositories?

It is. tesseract 1.03 is in Gutsy (7.10) and 2.01 is in Hardy (8.04).

sudo apt-get install tesseract-ocr

will do the job.

Regards

Jeff

Pedro Lopes de Almeida

Jan 8, 2008, 11:22:41 AM
to tesser...@googlegroups.com
Thanks!

Ok. I've installed Tesseract 1.03. I use Xsane for scanning. On
PREFERENCES»SETUP»OCR»OCR COMMAND I've typed tesseract 1.03, is it
right?
In INPUTFILE OPTION: -i
In OUTPUTFILE OPTION: -o

In GUI output-fd option -x
Progress keyword (left blank)

But when, after scanning an image, I tell it to save as OCR text or
to perform the OCR task, it saves nothing. Nothing happens, just as if
I hadn't done anything. So it is not working.

Can anyone help me and tell me what is wrong with my configuration?

I attached a screenshot of my OCR setup in Xsane.

Yours,
Pedro Almeida

Screenshot-xsane setup.png

Pedro Lopes de Almeida

Jan 8, 2008, 11:28:09 AM
to tesser...@googlegroups.com
By the way, although I only work on Ubuntu, I've also installed Kooka.
However, it does not work properly: many options are not clickable or
do not respond. Maybe because it is designed for KDE. But if there is a
way to make OCR work in it, I would appreciate it if someone could
explain it to me.

Best regards,
Pedro Almeida



Jeffrey Ratcliffe

Jan 8, 2008, 11:59:07 AM
to tesser...@googlegroups.com
On 08/01/2008, Pedro Lopes de Almeida <pedrolop...@gmail.com> wrote:
> Ok. I've installed Tesseract 1.03. I use Xsane for scanning. On
> PREFERENCES»SETUP»OCR»OCR COMMAND I've typed tesseract 1.03, is it
> right?

I suggest you use gscan2pdf - it integrates scanning and tesseract
nicely - and is also in Gutsy.

Regards

Jeff

Ala

Jan 9, 2008, 5:11:12 AM
to tesseract-ocr
I am using Windows XP. I tried to run the program but I didn't
succeed.
Could you help me please?


Jeffrey Ratcliffe

Jan 9, 2008, 5:30:17 AM
to tesser...@googlegroups.com
On 09/01/2008, Ala <tips...@gmail.com> wrote:
> I am using Windows XP. I tried to run the program but I didn't
> succeed.
> Could you help me please?

I don't have a windows box, but you will have to create your tiffs
with a program like Imaging, and then install tesseract from the
tesseract-2.01.exe.tar.gz file in the downloads section and use it
from the command line.

Julien Benoit

Jan 10, 2008, 3:50:29 AM
to tesser...@googlegroups.com
On Jan 7, 2008 10:06 PM, Mr.Dave <di...@aol.com> wrote:
> I just found out what OCR is because I've been looking for a way to
> digitize my hard documents. So, how does Tesseract work?

> I downloaded the following file to my computer (which is running XP):
> tesseract-2[1].01.tar.gz
>
> Okay, what is a "gz" file?

A gz file is a compressed file similar to a zip. You can decompress
it with 7-Zip, for instance (http://www.7-zip.org/).

Additional instructions (found in a comment on the wiki):
1) download tesseract-2.01.exe.tar.gz and tesseract-2.00.eng.tar.gz
2) extract these files into the same folder (7-zip or whatever
expanding software you prefer)
3) open a command window for this folder, where the tesseract.exe file
is located.
4) prep a tiff image, in my case I took a digital picture of a book,
tweaked it in photoshop and saved as a tiff with no compression. You
could do the same with the Gimp.
5) now I put the tiff image into the same folder and then in the
command window invoke the operation 'tesseract.exe MyImage.tif
MyImageConverted -l eng'
6) the process runs in the background for a few seconds and then a new
text-file appears with the name 'MyImageConverted.txt'.
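
Steps 5 and 6 can also be driven from a script instead of typing the command by hand. This is only a minimal sketch in Python, assuming Python is installed and that the tesseract executable can be found from the current folder or the PATH; the function names build_tesseract_command and run_ocr are made up for illustration.

```python
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="eng", exe="tesseract"):
    """Return the argument list for: tesseract <image> <output_base> -l <lang>."""
    return [exe, str(image_path), str(output_base), "-l", lang]


def run_ocr(image_path, output_base, lang="eng", exe="tesseract"):
    """Run Tesseract and return the path of the text file it is expected to write."""
    command = build_tesseract_command(image_path, output_base, lang, exe)
    # check=True raises CalledProcessError if Tesseract exits with an error code.
    subprocess.run(command, check=True)
    # Tesseract appends ".txt" to the output base name itself (see step 6).
    return Path(str(output_base) + ".txt")
```

Calling run_ocr("MyImage.tif", "MyImageConverted") would run the same command as in step 5 and return the path MyImageConverted.txt.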

> I have a scanner and I can convert to tiff using photoshop but that's
> where I'm at. Can someone give me a step by step process of how to
> make this work?

Only uncompressed TIFFs will work; BMP files also work.
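
Whether a TIFF is uncompressed can be checked before handing it to Tesseract. What follows is a rough sketch in Python using only the standard library: it reads the Compression tag (number 259 in the TIFF 6.0 specification), where a value of 1 means no compression. The function names are hypothetical, and only the first image in the file is inspected.

```python
import struct


def tiff_compression(path):
    """Return the Compression tag (259) of the first image in a TIFF file.

    A value of 1 means "no compression"; the tag defaults to 1 when absent.
    """
    with open(path, "rb") as f:
        data = f.read()
    if data[:2] == b"II":
        endian = "<"  # little-endian byte order
    elif data[:2] == b"MM":
        endian = ">"  # big-endian byte order
    else:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF file")
    (entry_count,) = struct.unpack(endian + "H", data[ifd_offset:ifd_offset + 2])
    for i in range(entry_count):
        # Each directory entry is 12 bytes: tag, type, count, value/offset.
        start = ifd_offset + 2 + 12 * i
        tag, field_type, count = struct.unpack(endian + "HHI", data[start:start + 8])
        if tag == 259:
            # A single SHORT value sits in the first 2 bytes of the value field.
            (value,) = struct.unpack(endian + "H", data[start + 8:start + 10])
            return value
    return 1


def is_uncompressed_tiff(path):
    """True if the first image in the TIFF is stored without compression."""
    return tiff_compression(path) == 1
```

If is_uncompressed_tiff("MyImage.tif") returns False, the image would need to be saved again with compression turned off.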

--
olorin

Mr.Dave

Jan 10, 2008, 3:38:20 PM
to tesseract-ocr
Thank you Julien.

However, I'm still a little lost here. When I extract the
tesseract-2.01.exe.tar.gz and tesseract-2.00.eng.tar.gz files, they
then just become "tar" files. I'm not sure what you mean by 'open a
command window' and I don't see where the tesseract.exe file is....

Thanks,

Dave

Julien Benoit

Jan 14, 2008, 11:42:36 AM
to tesser...@googlegroups.com
Actually, the tar file is an uncompressed archive (again, like a zip). To get your files, just open the tar file with 7-Zip as well (I know it's not practical at all).

Then you will get your files, with the tesseract.exe file.
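
The two layers (gzip compression wrapped around a tar archive) can also be unpacked in one go with a script. This is a minimal sketch in Python, assuming Python is installed; the function name extract_tar_gz is made up for illustration.

```python
import tarfile
from pathlib import Path


def extract_tar_gz(archive_path, dest_dir):
    """Unpack a .tar.gz archive into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # "r:gz" undoes the gzip compression and reads the tar archive in one pass.
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
        archive.extractall(dest)
    return names
```

For example, extract_tar_gz("tesseract-2.01.exe.tar.gz", r"c:\tesseract") would unpack the download into c:\tesseract and return the list of files it contained.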

To open a command window, the simplest way is to click on "Start", then "Execute" (labelled "Run" on English versions of Windows): then type cmd and click OK. You will get a black window.
Now you need to go to the right folder: if your tesseract files are in c:\tesseract, you will need to type:
cd c:\tesseract
and then press Enter.
That was step 3.

If nothing is wrong, you should be able to type:
tesseract.exe image.tif image -l eng
(Tesseract adds the .txt extension to the output name itself, so this writes image.txt.)

--
Julien Benoit

Mr.Dave

Jan 15, 2008, 9:17:07 AM
to tesseract-ocr
Thanks again Julien.

Still having problems, however.

When I open up the "tar" files I still do not get an .exe file. I get
only .box, .tiff, and other assorted files. No .exe file.
Also, there is no "Execute" under "Start" on my Windows XP. What could
I be missing here?

Dave

Softi

Jan 16, 2008, 5:25:42 AM
to tesseract-ocr
Hi Dave

The tar contains the source code for Tesseract, which needs to be
compiled using Visual C++.

If you do not want to compile it yourself, there is a separate
download with the exe files called "Windows executables (vc++6) for
Tesseract 2.01" on the download page.

Or perhaps FreeOCR might be better for you. It has a Windows
interface and includes Tesseract V2.01: http://www.softi.co.uk/freeocr.htm

If you still have problems and would prefer to use the command line
version then let me know and I will zip up a working version for you.

Ralph.

Terri

Jan 16, 2008, 3:39:17 PM
to tesseract-ocr
Hi. I have downloaded files mentioned, put them in a folder, read all
the info here - and am still thoroughly confused. This seems to be for
experts... I can do most things but this is beyond me. Sorry if this
is deemed inappropriate here [and if so just ignore!] - Is there any
easy-to-use text conversion/recognition software around please? My
Dell used to have something which worked fine but since a crash it has
disappeared and the re-installation disc doesn't include it.
Thanks much ! T

Avery Pennarun

Jan 16, 2008, 3:51:57 PM
to tesser...@googlegroups.com
On 16/01/2008, Terri <Ana...@gmail.com> wrote:
> Hi. I have downloaded files mentioned, put them in a folder, read all
> the info here - and am still thoroughly confused. This seems to be for
> experts... I can do most things but this is beyond me. Sorry if this
> is deemed inappropriate here [and if so just ignore!] - Is there any
> easy-to-use text conversion/recognition software around please? My
> Dell used to have something which worked fine but since a crash it has
> disappeared and the re-installation disc doesn't include it.
> Thanks much ! T

Just use the Document Imaging program that comes with Microsoft Office.

Avery

Terri

Jan 17, 2008, 4:32:30 AM
to tesseract-ocr
Thank you very much xxxx
