Update of cygwin package for training


Marco Atzeri

Dec 14, 2015, 5:02:17 AM
to tesser...@googlegroups.com
Hi,
I updated the packages for both architectures (x86 and x86_64) to 3.04.00-3.

The tesseract-training-util package now contains
the scripts taken from the development repository, and they should work correctly.

Mini HOWTO, using the Kannada (kan) language files provided by Sriranga
as an example:

1) Packages to be installed:

tesseract-ocr
tesseract-training-util
tesseract-training-core

In addition, the specific font needed for the language:
lohit-kannada-fonts
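
A minimal sketch of selecting these packages from the command line rather than in the setup GUI. The installer file name (setup-x86_64.exe) and its -q (unattended) and -P (package list) options are assumptions about a typical Cygwin setup and are not taken from this thread; the function only prints the command so that it can be reviewed before it is run.

```shell
#!/bin/sh
# Package names are the ones listed above; the installer name and its
# -q / -P options are assumptions, not something stated in the post.
PACKAGES="tesseract-ocr tesseract-training-util tesseract-training-core lohit-kannada-fonts"

# Print (do not run) a setup command line that selects every package.
setup_command() {
    installer=$1
    shift
    list=$(printf '%s,' "$@")    # join the package names with commas
    printf '%s -q -P %s\n' "$installer" "${list%,}"
}

# $PACKAGES is left unquoted on purpose so it splits into separate arguments.
setup_command setup-x86_64.exe $PACKAGES
```

For the x86 installation the first argument would be the name of the 32-bit installer instead.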


2) Copied the directory "/usr/share/tessdata/training"
to a working area,
in my case "/pub/devel/tesseract/training".
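
The copy in step 2 could be scripted roughly as follows. This is only a sketch: the function name copy_training_dir is made up for illustration, and the default paths are simply the two directories mentioned above.

```shell
#!/bin/sh
# Copy the packaged training directory into a writable working area.
# Both paths are parameters; the defaults are the ones from the post.
copy_training_dir() {
    src=${1:-/usr/share/tessdata/training}
    dest=${2:-/pub/devel/tesseract/training}
    if [ ! -d "$src" ]; then
        echo "source directory not found: $src" >&2
        return 1
    fi
    # "$src"/. copies the contents of src, including hidden files.
    mkdir -p "$dest" &&
    cp -R "$src"/. "$dest"/
}
```

Calling copy_training_dir without arguments uses the paths from the post; pass two arguments to use a different source or working area.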


3) Added the kan subdirectory with the language-specific files:

training/kan/desired_characters
training/kan/kan.config
training/kan/kan.numbers
training/kan/kan.punc
training/kan/kan.training_text
training/kan/kan.training_text.bigram_freqs
training/kan/kan.training_text.train_ngrams
training/kan/kan.training_text.unigram_freqs
training/kan/kan.unicharambigs
training/kan/kan.word.bigrams
training/kan/kan.wordlist
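
Before starting a run it can help to confirm that the files listed above are all in place. The sketch below assumes that another language would follow the same naming pattern as kan, which the post does not state, and the function name missing_langdata is hypothetical. Whether tesstrain.sh strictly needs every one of these files is also not covered here.

```shell
#!/bin/sh
# List the language files from the post that are missing under <dir>/<lang>.
# Prints one missing path per line and returns non-zero if any are missing.
missing_langdata() {
    dir=$1
    lang=$2
    status=0
    for name in desired_characters \
        "$lang.config" "$lang.numbers" "$lang.punc" \
        "$lang.training_text" \
        "$lang.training_text.bigram_freqs" \
        "$lang.training_text.train_ngrams" \
        "$lang.training_text.unigram_freqs" \
        "$lang.unicharambigs" "$lang.word.bigrams" "$lang.wordlist"
    do
        if [ ! -f "$dir/$lang/$name" ]; then
            echo "$dir/$lang/$name"
            status=1
        fi
    done
    return $status
}
```

With the working area from step 2 this would be called as: missing_langdata /pub/devel/tesseract/training kan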

4) Command for training (one command, continued across lines):

tesstrain.sh --lang kan --langdata_dir /pub/devel/tesseract/training \
  --tessdata_dir /usr/share/tessdata/ --fontlist "Lohit Kannada" \
  --training_text /pub/devel/tesseract/training/kan/kan.training_text
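
The same command can be wrapped so that the language and font are parameters. This is a sketch only: the function name run_tesstrain and the DRY_RUN variable are made up for illustration, and the options passed to tesstrain.sh are exactly the ones shown above, nothing more.

```shell
#!/bin/sh
# Run the training command from the post for a given language and font.
# Set DRY_RUN=1 to print the command instead of executing it.
run_tesstrain() {
    lang=$1
    font=$2
    langdata=${3:-/pub/devel/tesseract/training}
    tessdata=${4:-/usr/share/tessdata/}
    # Rebuild the positional parameters as the full command line.
    set -- tesstrain.sh --lang "$lang" \
        --langdata_dir "$langdata" \
        --tessdata_dir "$tessdata" \
        --fontlist "$font" \
        --training_text "$langdata/$lang/$lang.training_text"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$@"    # one argument per line
    else
        "$@"
    fi
}
```

For the example in this post the call would be: run_tesstrain kan "Lohit Kannada"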

As a result, the output file is located at

/tmp/tesstrain/tessdata/kan.traineddata

and the log of the run can be found at

/tmp/tmp<random-name>/kan/tesstrain.log
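
Since the log directory has a random name, a small helper can locate it after a run. The sketch below assumes the layout is exactly as in the two paths above; the function name show_training_result and its optional base-directory argument are hypothetical.

```shell
#!/bin/sh
# Report the trained data file and the most recent log for <lang>.
# The layout follows the paths quoted in the post; the optional second
# argument replaces /tmp so the function can be tried elsewhere.
show_training_result() {
    lang=$1
    base=${2:-/tmp}
    out="$base/tesstrain/tessdata/$lang.traineddata"
    if [ -f "$out" ]; then
        echo "traineddata: $out"
    else
        echo "traineddata not found: $out" >&2
        return 1
    fi
    # Pick the newest matching log; ls -t sorts by modification time.
    log=$(ls -t "$base"/tmp*/"$lang"/tesstrain.log 2>/dev/null | head -n 1)
    if [ -n "$log" ]; then
        echo "log: $log"
    fi
}
```

After the kan run above it would be called as: show_training_result kan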


Hope this helps.

Regards
Marco


Sriranga(83yrsold)

Dec 14, 2015, 8:38:07 AM
to tesser...@googlegroups.com
It would have been nice to build packages for 3.05.00dev (released on 22 July) as well. It works for me on Ubuntu 15.10.
Where can the said packages be downloaded for Cygwin?



--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/566E938C.1060001%40gmail.com.
For more options, visit https://groups.google.com/d/optout.

Marco Atzeri

Dec 14, 2015, 9:11:37 AM
to tesser...@googlegroups.com
On 14/12/2015 14:37, Sriranga(83yrsold) wrote:

> It would have been nice to build packages for 3.05.00dev (released on 22
> July) as well. It works for me on Ubuntu 15.10.
> Where can the said packages be downloaded for Cygwin?

As 3.05.00dev seems to be a development tag and not yet a stable release,
I am not planning to package it.

3.04.00 was released on the 11th of July, so I doubt there is any big difference
anyway.

The improvements to the training scripts from September were imported.

Regards
Marco



Sriranga(83yrsold)

Dec 14, 2015, 9:53:17 AM
to tesser...@googlegroups.com
Marco,
thanks for the update.
Where can your updated package be downloaded from, so that I can update my existing Cygwin installation?
With best wishes, sriranga

Sriranga(83yrsold)

Dec 20, 2015, 7:42:51 AM
to tesser...@googlegroups.com
Marco,
I am still awaiting a response about where to download the updated package.
w/b sriranga

Marco Atzeri

Dec 20, 2015, 9:45:23 AM
to tesser...@googlegroups.com
Use the same procedure you used for the install.

Setup will propose the installation of any updated packages.

Don't forget to select the font package lohit-kannada-fonts.
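
To confirm that the update was applied, the installed version can be read back from a package listing. The sketch below assumes the listing is a whitespace-separated table with the package name in the first column and the version in the second, for example as printed by cygcheck -c; that output format is an assumption and is not shown in this thread. The function name installed_version is hypothetical.

```shell
#!/bin/sh
# Read a package table on standard input and print the version recorded
# for one package. Assumes whitespace-separated columns with the package
# name first and the version second.
installed_version() {
    awk -v pkg="$1" '$1 == pkg { print $2 }'
}
```

A possible use is: cygcheck -c tesseract-ocr | installed_version tesseract-ocr
If the update went through, the version reported should be the 3.04.00-3 release named at the top of this thread.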


On 20/12/2015 13:42, Sriranga(83yrsold) wrote:
> Marco,
> I am still awaiting a response about where to download the updated package.
> w/b sriranga
>
> On Mon, Dec 14, 2015 at 8:22 PM, Sriranga(83yrsold)
> <withblessing....@gmail.com
> <mailto:withblessing....@gmail.com>> wrote:
>

ShreeDevi Kumar

Mar 5, 2016, 3:38:03 AM
to tesser...@googlegroups.com, Marco Atzeri
Hi Marco,

Please update the Cygwin package to the new release. Thanks!

Mikael Egibyan

May 10, 2016, 8:39:41 AM
to tesseract-ocr
Hi Marco,

Can you please link to a tutorial on how to generate/create all the language-specific files?

Thanks!
Mikayel

Marco Atzeri

May 10, 2016, 9:39:37 AM
to tesser...@googlegroups.com
On 10/05/2016 14:39, Mikael Egibyan wrote:
> Hi Marco,
>
> Can you please link to a tutorial on how to generate/create all the
> language-specific files?
>
> Thanks!
> Mikayel

Hi Mikayel,

Your request is not clear to me.

Are you asking about training files?
On Cygwin it works the same as on other systems:
https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract

Or about how to add additional languages to Cygwin?
Any one in particular?

The language-specific packages
just contain the same files as

https://github.com/tesseract-ocr/tessdata

$ tar -tf tesseract-ocr-ita-3.04-1.tar.xz

usr/share/tessdata/ita.cube.bigrams
usr/share/tessdata/ita.cube.fold
usr/share/tessdata/ita.cube.lm
usr/share/tessdata/ita.cube.nn
usr/share/tessdata/ita.cube.params
usr/share/tessdata/ita.cube.size
usr/share/tessdata/ita.cube.word-freq
usr/share/tessdata/ita.tesseract_cube.nn
usr/share/tessdata/ita.traineddata
usr/share/tessdata/ita_old.traineddata

Mikael Egibyan

May 10, 2016, 9:46:34 AM
to tesseract-ocr
Thanks for the reply.
My question is: if I create a new language "lan", how do I generate the files you mention? What files do I need to generate them from?

Thanks,
Mikayel