Training !!!!

Eric.yang

Aug 2, 2010, 2:42:43 AM

to tesseract-ocr

Hi all. I'm currently using tesseract-2.04 to recognize Chinese on
Windows XP.
I read the introduction at http://code.google.com/p/tesseract-ocr/w/list,
but I ran into some problems when I tried training. Here are the steps I
did:

1. tesseract 1.tif 1 batch.nochop makebox -------------- makes a txt file
2. Rename 1.txt to 1.box, then use bbtesseract to adjust it.
3. tesseract 1.tif junk nobatch box.train -------- makes 1.tr and
junk.txt
4. mftraining scan.tr
5. cntraining scan.tr
6. unicharset_extractor scan.box
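
Note that the steps above mix two file stems (1.tif and scan.tr). As a rough sketch only, the same commands can be written out with one consistent stem. The helper name `training_commands` and the stem "scan" are made up for illustration, and the order simply follows the post:

```python
def training_commands(base):
    """Return the command lines from the steps above for one training image.

    `base` is a hypothetical shared file stem, e.g. "scan" for scan.tif.
    The rename of <base>.txt to <base>.box (step 2) is a manual step and
    is not included here.
    """
    return [
        # Step 1: produce a first-guess box file.
        f"tesseract {base}.tif {base} batch.nochop makebox",
        # Step 3: re-run in training mode against the corrected box file.
        f"tesseract {base}.tif junk nobatch box.train",
        # Steps 4 to 6: cluster the features and extract the character set.
        f"mftraining {base}.tr",
        f"cntraining {base}.tr",
        f"unicharset_extractor {base}.box",
    ]


for line in training_commands("scan"):
    print(line)
```

This only builds the command strings; it does not run Tesseract.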

OK, now I have the inttemp / normproto / pffmtable / unicharset files, but how do I
use them?
Did I do something wrong?

Thanks a lot!

Jimmy O'Regan

Aug 2, 2010, 12:13:14 PM

to tesser...@googlegroups.com

On 2 August 2010 07:42, Eric.yang <bernab...@gmail.com> wrote:
> Hi all. I'm currently using tesseract-2.04 to recognize Chinese on
> Windows XP.

Tesseract 2 will be rubbish for Chinese. Tesseract 3 has specific
support for Chinese/Japanese/Korean.

> I read the introduction at http://code.google.com/p/tesseract-ocr/w/list,
> but I ran into some problems when I tried training. Here are the steps I
> did:
>
> 1. tesseract 1.tif 1 batch.nochop makebox -------------- makes a txt file
> 2. Rename 1.txt to 1.box, then use bbtesseract to adjust it.
> 3. tesseract 1.tif junk nobatch box.train -------- makes 1.tr and
> junk.txt
> 4. mftraining scan.tr
> 5. cntraining scan.tr
> 6. unicharset_extractor scan.box
>
> OK, now I have the inttemp / normproto / pffmtable / unicharset files, but how do I
> use them?
> Did I do something wrong?
>

Err... you'd have to read further in the training document, where
that's explained.
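
For what it's worth, that part of the training document comes down to giving the output files a common language prefix and putting them in the tessdata directory. A rough sketch in Python, assuming the `lang.filename` naming that Tesseract 2.x uses. The language code `chi`, the directory names, and the helper `install_trained_files` are placeholders for illustration, not anything from the original post. The training document also lists dictionary and ambiguity files that a complete language needs; those are not covered here.

```python
import shutil
from pathlib import Path

# Files produced by the training steps quoted above.
TRAINED_FILES = ["inttemp", "normproto", "pffmtable", "unicharset"]


def install_trained_files(src_dir, tessdata_dir, lang):
    """Copy each training output to <tessdata_dir>/<lang>.<name>.

    `lang` is a hypothetical language code chosen by the user.
    Returns the list of file names written.
    """
    src_dir, tessdata_dir = Path(src_dir), Path(tessdata_dir)
    tessdata_dir.mkdir(parents=True, exist_ok=True)
    installed = []
    for name in TRAINED_FILES:
        target = tessdata_dir / f"{lang}.{name}"
        shutil.copyfile(src_dir / name, target)
        installed.append(target.name)
    return installed
```

After something like `install_trained_files(".", "tessdata", "chi")`, the new language would be selected with `tesseract 1.tif out -l chi`.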

> Thanks a lot!

--
<Leftmost> jimregan, that's because deep inside you, you are evil.
<Leftmost> Also not-so-deep inside you.

曾婉茹

Jul 21, 2015, 9:06:15 AM

to tesser...@googlegroups.com

Hi Eric,

I am using Tesseract 3.02 to train Chinese (Traditional) data on Windows 7.

I have already trained an English script font. It works and does improve performance. Now I am facing problems with Chinese training.

Detailed steps are below:

1. I entered the command below:

tesseract [lang].[fontname].exp[number].tif [lang].[fontname].exp[number] batch.nochop makebox

2. The .box file was generated.

3. I corrected the .box file, because the generated .box file always has wrong characters.

4. Then I entered the command below:

tesseract [lang].[fontname].exp[number].tif [lang].[fontname].exp[number] nobatch box.train

5. The command window showed "Found 0 good blobs. 7 remaining unlabelled words deleted" (I used 7 Chinese characters for training).
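
When box.train finds 0 good blobs, one thing worth ruling out is a malformed box file after hand editing. Below is a minimal sketch of a sanity check, assuming the Tesseract 3.0x box format of one `<char> <left> <bottom> <right> <top> <page>` entry per line. The function name `check_box_lines` and the specific checks are illustrative guesses at likely problems, not a diagnosis of this particular error.

```python
def check_box_lines(text):
    """Return (line_number, problem) pairs for suspicious box-file lines.

    Assumes six whitespace-separated fields per line:
    character, left, bottom, right, top, page.
    """
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Some Windows editors prepend a byte order mark when saving UTF-8.
        if number == 1 and line.startswith("\ufeff"):
            problems.append((number, "file starts with a byte order mark"))
            line = line.lstrip("\ufeff")
        if not line.strip():
            continue
        fields = line.split()
        if len(fields) != 6:
            problems.append((number, f"expected 6 fields, found {len(fields)}"))
            continue
        try:
            left, bottom, right, top, page = map(int, fields[1:])
        except ValueError:
            problems.append((number, "coordinates are not all integers"))
            continue
        if left >= right or bottom >= top:
            problems.append((number, "box has zero or negative size"))
    return problems
```

It can be run on the text of the box file, read with `open(path, encoding="utf-8").read()`. An empty result only means the file is well formed, not that the boxes line up with the characters in the image.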