Training !!!!

Eric.yang

Aug 2, 2010, 2:42:43 AM
to tesseract-ocr
Hi all. I'm currently using tesseract-2.04 to recognize Chinese on
Windows XP.
I read the introduction at http://code.google.com/p/tesseract-ocr/w/list,
but I ran into some problems during training. Here are the steps I
did:

1. tesseract 1.tif 1 batch.nochop makebox -------- makes a txt file
2. Rename 1.txt to 1.box, then use bbtesseract to adjust the boxes.
3. tesseract 1.tif junk nobatch box.train -------- makes 1.tr and
junk.txt
4. mftraining scan.tr
5. cnTraining scan.tr
6. unicharset_extractor scan.box

OK, now I have inttemp, normproto, pffmtable and unicharset, but how do I
use them?
Did I do something wrong?

Thanks a lot!

Jimmy O'Regan

Aug 2, 2010, 12:13:14 PM
to tesser...@googlegroups.com
On 2 August 2010 07:42, Eric.yang <bernab...@gmail.com> wrote:
> Hi all. I'm currently using tesseract-2.04 to recognize Chinese on
> Windows XP.

Tesseract 2 will be rubbish for Chinese. Tesseract 3 has specific
support for Chinese/Japanese/Korean.

> I read the introduction at http://code.google.com/p/tesseract-ocr/w/list,
> but I ran into some problems during training. Here are the steps I
> did:
>
> 1. tesseract 1.tif 1 batch.nochop makebox -------- makes a txt file
> 2. Rename 1.txt to 1.box, then use bbtesseract to adjust the boxes.
> 3. tesseract 1.tif junk nobatch box.train -------- makes 1.tr and
> junk.txt
> 4. mftraining scan.tr
> 5. cnTraining scan.tr
> 6. unicharset_extractor scan.box
>
> OK, now I have inttemp, normproto, pffmtable and unicharset, but how do I
> use them?
> Did I do something wrong?
>

Err... you'd have to read further in the training document, where
that's explained.
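
For context, the step the training document describes after this point is to give each output file a language-code prefix and place it in the tessdata directory. The following is only a minimal sketch of that renaming step, assuming the Tesseract 2.x convention of files named <lang>.<name>. The function name and the language code used in the usage note are hypothetical, and a complete language also needs the dictionary and ambiguity files covered in the training document, which this sketch does not create.

```python
import shutil
from pathlib import Path

# File names taken from the question above. A complete language needs
# further files (dictionaries, ambiguities) that are not handled here.
TRAINING_OUTPUTS = ("inttemp", "normproto", "pffmtable", "unicharset")


def install_language_files(work_dir, tessdata_dir, lang):
    """Copy each training output to tessdata_dir as '<lang>.<name>'.

    Hypothetical helper for illustration; not part of Tesseract.
    """
    work_dir, tessdata_dir = Path(work_dir), Path(tessdata_dir)
    missing = [n for n in TRAINING_OUTPUTS if not (work_dir / n).is_file()]
    if missing:
        raise FileNotFoundError("missing training outputs: " + ", ".join(missing))
    tessdata_dir.mkdir(parents=True, exist_ok=True)
    installed = []
    for name in TRAINING_OUTPUTS:
        target = tessdata_dir / f"{lang}.{name}"
        shutil.copyfile(work_dir / name, target)
        installed.append(target)
    return installed
```

With a made-up code such as "chi", the copied files would then be selected with the -l option, for example: tesseract 1.tif out -l chi. In Tesseract 3 the prefixed files are additionally merged into a single <lang>.traineddata file with combine_tessdata.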

> Thanks a lot!

--
<Leftmost> jimregan, that's because deep inside you, you are evil.
<Leftmost> Also not-so-deep inside you.

曾婉茹

Jul 21, 2015, 9:06:15 AM
to tesser...@googlegroups.com
Hi Eric,
I am using Tesseract 3.02 to train Chinese (Traditional) data on Windows 7.
I have already trained an English script font. That worked and did improve performance. Now I am facing problems with Chinese training.
The detailed steps are below:

1. I entered the command below:
tesseract [lang].[fontname].exp[number].tif [lang].[fontname].exp[number] batch.nochop makebox

2. The .box file was generated.

3. I corrected the .box file, because the generated file always contains wrong characters.

4. Then I entered the command below:
tesseract [lang].[fontname].exp[number].tif [lang].[fontname].exp[number] nobatch box.train 

5. The command window showed "Found 0 good blobs. 7 remaining unlabelled words deleted" (I am training with 7 Chinese characters).
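
Since the box file was edited by hand in step 3, it may be worth ruling out formatting problems in it before looking elsewhere. The following is a rough sketch of such a check, not a diagnosis of the message above. It assumes the usual box-file layout of one symbol per line followed by left, bottom, right and top coordinates, with an optional page number, saved as UTF-8. The function name is hypothetical, and the checks it performs are assumptions about what could go wrong, not a list taken from Tesseract itself.

```python
import codecs


def check_box_file(data):
    """Return a list of (line_number, problem) for raw box-file bytes.

    Hypothetical helper for illustration; line number 0 means the
    problem concerns the file as a whole.
    """
    problems = []
    if data.startswith(codecs.BOM_UTF8):
        problems.append((1, "file starts with a UTF-8 byte order mark"))
        data = data[len(codecs.BOM_UTF8):]
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # For example, a file saved in a legacy encoding by a text editor.
        return problems + [(0, "file is not valid UTF-8")]
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        fields = line.split()
        # symbol + 4 coordinates, optionally followed by a page number
        if len(fields) not in (5, 6):
            problems.append((number, "expected 5 or 6 fields"))
            continue
        try:
            left, bottom, right, top = (int(v) for v in fields[1:5])
        except ValueError:
            problems.append((number, "coordinates are not integers"))
            continue
        if left >= right or bottom >= top:
            problems.append((number, "box has zero or negative size"))
    return problems
```

It could be run on the edited file with check_box_file(open(path, "rb").read()), where path is the .box file from step 3. An empty list only means the file is well formed, not that the boxes line up with the characters in the image.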

Could you share how you trained your data?
I look forward to your tips and reply.

Good day and many thanks.


Cheers,
Raccoon Tseng 



On Monday, August 2, 2010 at 2:42:43 PM UTC+8, Eric.yang wrote: