Some questions about ocrd-train


易鑫

Jun 4, 2020, 1:40:45 AM
to tesseract-ocr
Hello, everyone:
      I am currently using the https://github.com/tesseract-ocr/tesstrain project to train on my own dataset.
I use this command:
make unicharset lists training MODEL_NAME=foo TESSDATA=./tessdata_best GROUND_TRUTH_DIR=./data/foo-ground-truth PSM=7
I ran into an error saying that the "radical-stroke.txt" file was missing, so I copied /langdata_lstm/radical-stroke.txt into the data folder and ran the command again. It seems to work now.
But I still have some questions:
1. What is "radical-stroke.txt" used for? Each line contains five numbers, like this:
19886 3 23 6 3
19737 13 10 20 6
19736 17 7 0 6
19735 7 3 16 6
19734 6 4 16 6
19733 6 16 9 6
19732 20 16 16 6
19731 6 12 7 6
19618 4 20 3 6
19575 7 0 16 19
19617 16 6 20 6
19616 16 6 3 9
19615 16 6 16 1
19619 7 0 16 19
18870 20 16 21 7
18871 20 22 22 5
18843 20 16 21 7
18847 20 22 22 5
18822 16 7 7 2
18821 16 20 3 10
18819 16 0 5 9
18818 16 24 13 13
18813 16 13 19 24
18810 16 0 7 19
18759 16 24 13 13
What do these numbers mean?

2. Is my command right?

Thanks in advance.


Piyush Chandra

Jun 4, 2020, 12:23:35 PM
to tesseract-ocr
radical-stroke.txt is used only for CJK languages, but Tesseract checks for it during the training process, so you need to make it available.

You are doing it correctly. 
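
The workaround described in this thread could be sketched as a small shell helper that copies radical-stroke.txt into the data folder only when it is missing. This is a rough sketch, not part of tesstrain: the function name ensure_radical_stroke is hypothetical, and the source and destination paths are assumptions taken from the original post, so adjust them to your own checkout.

```shell
#!/bin/sh
# Hypothetical helper (not part of tesstrain): make sure radical-stroke.txt
# is present in the data folder before training starts.
#   $1 = path to an existing radical-stroke.txt (assumed: from langdata_lstm)
#   $2 = data folder used by the training run (assumed: ./data)
ensure_radical_stroke() {
    src="$1"
    dest_dir="$2"

    # Nothing to do if the file is already in place.
    if [ -f "$dest_dir/radical-stroke.txt" ]; then
        return 0
    fi

    # Fail clearly if the source file cannot be found.
    if [ ! -f "$src" ]; then
        echo "missing source file: $src" >&2
        return 1
    fi

    mkdir -p "$dest_dir" && cp "$src" "$dest_dir/radical-stroke.txt"
}

# Example use from the tesstrain checkout (left commented out on purpose):
# ensure_radical_stroke ./langdata_lstm/radical-stroke.txt ./data &&
# make unicharset lists training MODEL_NAME=foo TESSDATA=./tessdata_best \
#     GROUND_TRUTH_DIR=./data/foo-ground-truth PSM=7
```

The make invocation in the comment is the same command quoted in the question; the helper only automates the manual copy step.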