The wiki (http://code.google.com/p/tesseract-ocr/w/list) has been
extensively updated with new release notes
(http://code.google.com/p/tesseract-ocr/wiki/ReleaseNotes),
documentation on training
(http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract) and
documentation on testing
(http://code.google.com/p/tesseract-ocr/wiki/TestingTesseract).
Be aware that this version has substantial changes and therefore may
have broken the build on one of the systems for which we have no
direct testing. The extern "C" problem should be a thing of the past
however.
Ray.
I have downloaded the latest Tesseract and tried it out. I ran
Tesseract.exe on the Windows platform and it gave me this error: "the system
cannot execute the specified program". Do I need to run the training
first in order to run the tesseract and dlltest executables?
On Jul 19, 6:12 am, theraysm...@gmail.com wrote:
> The code for Tesseract 2.00 is now checked in to subversion and the
> tarballs are on the main site.
> See http://code.google.com/p/tesseract-ocr/downloads/list.
While installing Tesseract from SVN, I found that the following
files are not checked in:
tessdata/tessconfigs/Makefile.in
tessdata/configs/Makefile.in
This was breaking ./configure.
After copying these files from tesseract-2.00.tar.gz, I was able to
build the system.
--lohith
On Jul 19, 3:12 am, theraysm...@gmail.com wrote:
> The code for Tesseract 2.00 is now checked in to subversion and the
> tarballs are on the main site.
> See http://code.google.com/p/tesseract-ocr/downloads/list.
I'm new to Tesseract and a little confused. I have
already used OCR systems for Windows, such as Readiris and Omnipage
Pro. I want a flexible OCR system under Ubuntu 7.04, and I
have found Tesseract, which seems pretty interesting. I have
three general questions:
a) How does Tesseract compare to Omnipage and Readiris? Is it better,
easier to configure, ...?
b) Is there any 'simple' guide for novices?
c) Would you (or anybody) please advise how I can process 50
pages already scanned and stored as JPG? These pages have Spanish and
English text mixed.
Again, thank you very much for your efforts,
kind regards,
Ricardo
On 19 jul, 00:12, theraysm...@gmail.com wrote:
> The code for Tesseract 2.00 is now checked in to subversion and the
> tarballs are on the main site.
> See http://code.google.com/p/tesseract-ocr/downloads/list.
b) Read all the wiki pages at http://code.google.com/p/tesseract-ocr/w/list.
Note that Tesseract currently lacks some important
features, such as page segmentation, and it has no graphical
user interface (GUI).
c) You will have to convert from JPG to TIFF. Mixed languages may be a
problem. You will have to pick the most frequent one, but bias your choice
in favour of Spanish, as it has more accents, and with English you
will lose them.
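As a rough illustration of the advice in (c), the sketch below only builds the command lines for a batch of JPG pages; it does not run anything. It assumes ImageMagick's `convert` is used for the JPG to TIFF step (the thread does not name a converter) and that Spanish is selected with `-l spa`. The function name `plan_ocr_commands` and the page file names are hypothetical.

```python
from pathlib import Path


def plan_ocr_commands(jpg_paths, lang="spa"):
    """Return (convert_cmd, ocr_cmd) pairs for each JPG, without running them.

    Assumptions (not from the thread): ImageMagick's `convert` does the
    JPG -> TIFF step, and the language is chosen with `-l <code>`.
    """
    plans = []
    for jpg in map(Path, jpg_paths):
        tif = jpg.with_suffix(".tif")      # page01.jpg -> page01.tif
        out_base = jpg.with_suffix("")     # tesseract adds .txt to this base name
        convert_cmd = ["convert", str(jpg), str(tif)]
        ocr_cmd = ["tesseract", str(tif), str(out_base), "-l", lang]
        plans.append((convert_cmd, ocr_cmd))
    return plans


if __name__ == "__main__":
    # Hypothetical file names; print the commands for review before running them.
    for convert_cmd, ocr_cmd in plan_ocr_commands(["page01.jpg", "page02.jpg"]):
        print(" ".join(convert_cmd))
        print(" ".join(ocr_cmd))
```

Printing the commands first makes it easy to check them by hand on one page before committing to all 50.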
On Jul 19, 8:54 pm, theraysm...@gmail.com wrote:
> Please see issue 43. http://code.google.com/p/tesseract-ocr/issues/detail?id=43
Hi,
I tried it on the Windows platform and got the same error message, "the system
cannot execute the specified program", displayed in the Command Prompt.
Unless this is fixed, I doubt whether training will be able to run,
which I have not tested yet.
Is a solution forthcoming?
On Jul 19, 6:59 am, "slch2...@gmail.com" <slch2...@gmail.com> wrote:
> Hi,
>
> I have downloaded the latest Tesseract and tried it out. I ran
> Tesseract.exe on the Windows platform and it gave me this error: "the system
> cannot execute the specified program". Do I need to run the training
> first in order to run the tesseract and dlltest executables?
>
> On Jul 19, 6:12 am, theraysm...@gmail.com wrote:
>
> > The code for Tesseract 2.00 is now checked in to subversion and the
> > tarballs are on the main site.
> > See http://code.google.com/p/tesseract-ocr/downloads/list.
> > Note that this version recognizes 6 languages. To be completely
> > language independent, there is *no* language data with the source, so
> > you have to download a separate language file to get it to work at
> > all.
Try the new exe6 tarball that I put up yesterday...
Ray
On 7/21/07, 74yrsold < withbl...@gmail.com > wrote:
> Hi,
> I tried it on the Windows platform and got the same error message, "the system
> cannot execute the specified program", displayed in the Command Prompt.
> Unless this is fixed, I doubt whether training will be able to run,
> which I have not tested yet.
> Is a solution forthcoming?
Sriranga, Keith,
You guys are the testers for the documentation, so I am happy to help
you and update it with the deficiencies that I learn along the way.
You won't get much help for the next 2-3 weeks though, as I will be
traveling and will not be checking my email much. The forum is a good
place to work through this though, as anyone else having trouble can
see the answers.
For now, this may help:
When you have prepared a training.box file matching a training.tif
file, the command line is:
tesseract training.tif junk nobatch box.train
You should not expect any interesting output at all in junk.txt. The
output goes to training.tr and that SHOULD change if you change the
content of training.box.
Training.tr is then the input to mftraining and cntraining. I will add
a diagram to illustrate the data flows soon.
Regards,
Ray.
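The training step Ray describes above can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper name `box_train_plan` is hypothetical, and the `.box` and `.tr` file names are simply derived from the `.tif` name to match the `training.tif` / `training.box` / `training.tr` example in the message. The sketch builds the command and lists the files involved; it does not run Tesseract.

```python
from pathlib import Path


def box_train_plan(tif_path, out_base="junk"):
    """Describe one box-training run for a .tif image (hypothetical helper).

    The command mirrors the one quoted in the thread:
        tesseract training.tif junk nobatch box.train
    """
    tif = Path(tif_path)
    return {
        # The box file is expected to sit next to the image, same base name.
        "box_file": str(tif.with_suffix(".box")),
        # Per the message, the useful output goes here, not to junk.txt.
        "tr_file": str(tif.with_suffix(".tr")),
        "command": ["tesseract", str(tif), out_base, "nobatch", "box.train"],
    }


if __name__ == "__main__":
    plan = box_train_plan("training.tif")
    print("needs:   ", plan["box_file"])
    print("run:     ", " ".join(plan["command"]))
    print("produces:", plan["tr_file"], "(input to mftraining and cntraining)")
```

Checking that the box file exists before running the command is a cheap way to catch a mismatched file name early.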
I tried everything, but I think I missed something. Can someone give a
step-by-step example of how to train Tesseract and which files I need?
kind regards
The newly uploaded application works fine, but I would like you to
upload the project with the new source files too.
Thanks,
Alexandrino.
On Jul 25, 6:11 pm, "Ray Smith" <theraysm...@gmail.com> wrote:
> Sriranga, Keith,
>
> You guys are the testers for the documentation, so I am happy to help
> you and update it with the deficiencies that I learn along the way.
> You won't get much help for the next 2-3 weeks though as I will be
> traveling and will not be checking my email much. The forum is a good
> place to work through this though, as any one else having trouble can
> see the answers.
>
> For now, this may help:
> When you have prepared a training.box file matching a training.tif
> file, the command line is:
> tesseract training.tif junk nobatch box.train
> You should not expect any interesting output at all in junk.txt. The
> output goes to training.tr and that SHOULD change if you change the
> content of training.box.
> Training.tr is then the input to mftraining and cntraining. I will add
> a diagram to illustrate the data flows soon.
>
> Regards,
> Ray.
>
> On 7/23/07, Keith Beaumont <beaumon...@gmail.com> wrote:
>
> > Ray,
> > Prev msg from sriranga(74yrsold):
> > "Is it possible to modify the wiki/training accordingly for benefit of
> > newbies."
>
> > Speaking as a newbie, yes PLEASE. I am completely lost when reading these
> > notes. Sorry!!
>
> > Maybe we could have an Email conversation. I say "how do you ...?" you
> > answer "blah blah"
> > And we keep going till I understand (may take a while!!! I'm a bit
> > simple!!).
> > Then you can publish new instructions.
> > I would be using Windows .exe's ONLY!!
>
> > By the way, should I be using the forum for this request?
>
> > On 7/21/07, withblessi...@gmail.com <withblessi...@gmail.com> wrote:
>
> > > Ray,
> > > sub: New Revision: 73 TrainingTesseract
> > > How to use the tools provided to train Tesseract for a new language
> > > I find it a little difficult to follow the instructions in the absence of
> > > an example, i.e.
> > > an example input and the output expected for each command line.
> > > Is it possible to modify the wiki/training accordingly for the benefit of
> > > newbies?
> > > With Regards,
> > > -sriranga(74yrsold)
>
> > > > On 7/21/07, Ray Smith < theraysm...@gmail.com> wrote:
>
> > > > > Try the new exe6 tarball that I put up yesterday...
> > > > > Ray
On Jul 31, 4:01 pm, withblessi...@gmail.com wrote:
> Alexandrino,
> I could understand which "new application uploaded works fine"
> Will you kindly elaborate/explain in detail, since I am curious to know?
> Have you succeeded in training, and if so, for which language?
> Regards,
> -sriranga(74 yrs old)
>
On 31 Jul., 20:13, "Keith Beaumont" <beaumon...@gmail.com> wrote:
> Where is this tesseractOCR.pdf?
>
> On 7/27/07, withblessi...@gmail.com <withblessi...@gmail.com> wrote:
>
>
>
> > Hi Ray,
> > I have just now downloaded tesseractOCR.pdf, which is impressive and educational,
> > and lucidly explained, especially the baselines. Based on the baselines, I now
> > feel that I can succeed with the Kannada (kan) language (one of the Indian
> > languages), which is complex. No one has developed OCR for
> > Kannada/Telugu, whereas other Indian languages like Hindi, Tamil,
> > Marathi, Gujarati, Bengali etc. are generally available.
> > I appreciate your PDF.
> > With regards,
> > -sriranga(74yrsold)