Some questions about ocrd-train

Jun 4, 2020, 1:40:45 AM

to tesseract-ocr

Hello, everyone:

I am currently using the https://github.com/tesseract-ocr/tesstrain project to train a model on my own dataset.

I run this command:

make unicharset lists training MODEL_NAME=foo TESSDATA=./tessdata_best GROUND_TRUTH_DIR=./data/foo-ground-truth PSM=7

I got an error saying that the "radical-stroke.txt" file was missing, so I copied /langdata_lstm/radical-stroke.txt into the data folder and ran the command again. It seems okay at the moment.

But I still have some questions:

1. What is "radical-stroke.txt" used for? I can see that each line contains five numbers, like this:

19886 3 23 6 3

19737 13 10 20 6

19736 17 7 0 6

19735 7 3 16 6

19734 6 4 16 6

19733 6 16 9 6

19732 20 16 16 6

19731 6 12 7 6

19618 4 20 3 6

19575 7 0 16 19

19617 16 6 20 6

19616 16 6 3 9

19615 16 6 16 1

19619 7 0 16 19

18870 20 16 21 7

18871 20 22 22 5

18843 20 16 21 7

18847 20 22 22 5

18822 16 7 7 2

18821 16 20 3 10

18819 16 0 5 9

18818 16 24 13 13

18813 16 13 19 24

18810 16 0 7 19

18759 16 24 13 13

What do these numbers mean?

2. Is my command right?

Thanks in advance.
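For anyone who wants to look at such rows programmatically, here is a minimal Python sketch. It assumes only what the excerpt above shows, namely five whitespace-separated integers per line, and it does not try to interpret the columns. The function name parse_rows is hypothetical and is not part of tesstrain or Tesseract.

```python
def parse_rows(text):
    """Collect rows of exactly five integers; skip blank or malformed lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            # Not a five-column row (blank line, header, comment, ...).
            continue
        try:
            rows.append(tuple(int(field) for field in fields))
        except ValueError:
            # At least one field is not a decimal integer; ignore the line.
            continue
    return rows


# Three rows copied from the excerpt quoted in the post.
sample = """19886 3 23 6 3
19737 13 10 20 6
19736 17 7 0 6"""
rows = parse_rows(sample)
```

Each parsed row is a plain tuple, so the columns can be counted or compared without assuming what they stand for.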

Jun 4, 2020, 12:23:35 PM

to tesseract-ocr

radical-stroke.txt is only used for CJK languages, but Tesseract checks for it during the training process regardless, so you need to make it available.
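Since the file has to be present even for non-CJK training, a small pre-flight check before running make can save a failed run. The following is a minimal Python sketch, assuming the file lives in a local langdata_lstm checkout and should end up in the data folder, as in the original post. The function name ensure_radical_stroke and both directory arguments are hypothetical; this is not part of tesstrain.

```python
import shutil
from pathlib import Path


def ensure_radical_stroke(data_dir, langdata_dir, name="radical-stroke.txt"):
    """Return the path of `name` inside data_dir, copying it there if missing."""
    target = Path(data_dir) / name
    if target.is_file():
        # Already available to the training run; nothing to do.
        return target
    source = Path(langdata_dir) / name
    if not source.is_file():
        raise FileNotFoundError(f"{name} not found in {langdata_dir}")
    # Create the data folder if needed, then copy the file into it.
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(source, target)
    return target
```

It could be called as ensure_radical_stroke("data", "langdata_lstm") before invoking make. Both paths are assumptions and should be adjusted to the local layout.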