How to replace the top LSTM layer?


이경준

Mar 13, 2018, 3:07:18 AM
to tesseract-ocr

Shreeshrii commented on 29 Jun 2017  

I think this happens when the complex characters in your training text are not part of the original Korean Unicharset that the 4.00.00alpha kor.traineddata was trained with.

Do 'replace top layer' training instead of finetune. @abhishekchopde has had good results with it - see #1009

It will take longer than finetuning.



Hi Shree, I have a question. You posted the comment above, but the link in it is not right. Please check it again.

ShreeDevi Kumar

Mar 13, 2018, 3:22:08 AM
to tesser...@googlegroups.com

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/2878cbf6-a064-4fe5-ab5c-cfcd54248e9e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

이경준

Mar 13, 2018, 3:23:27 AM
to tesseract-ocr
There is nothing there about how to replace the top layer ... ㅜㅜ

On Tuesday, March 13, 2018 at 4:22:08 PM UTC+9, shree wrote:

ShreeDevi Kumar

Mar 13, 2018, 3:24:52 AM
to tesser...@googlegroups.com
That info is given in the training wiki page.

이경준

Mar 13, 2018, 3:30:42 AM
to tesseract-ocr

https://github.com/tesseract-ocr/tesseract/issues/549



@harinath141 If you are getting a lot of these errors during finetune, try replace top layer training. You can use the box/tiff pairs generated for finetune. Commands will be similar to the following:

mkdir -p ~/tesstutorial/tellayer_from_tel 

combine_tessdata -e ../tessdata/tel.traineddata \
  ~/tesstutorial/tellayer_from_tel/tel.lstm
  
lstmtraining -U ~/tesstutorial/tel/tel.unicharset \
  --script_dir ../langdata  --debug_interval 0 \
  --continue_from ~/tesstutorial/tellayer_from_tel/tel.lstm \
  --append_index 5 --net_spec '[Lfx256 O1c105]' \
  --model_output ~/tesstutorial/tellayer_from_tel/tellayer \
  --train_listfile ~/tesstutorial/tel/tel.training_files.txt \
  --target_error_rate 0.01


I found the comment you wrote,

but --script_dir doesn't work in lstmtraining.

How do I change this option (flag)? What replaces it?

On Tuesday, March 13, 2018 at 4:24:52 PM UTC+9, shree wrote:

ShreeDevi Kumar

Mar 13, 2018, 4:07:11 AM
to tesser...@googlegroups.com
That command applies to an older version of the source code.

Now you need a starter traineddata.

Please see the wiki page at 
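
For illustration only, here is a rough sketch of how the older command quoted earlier in this thread might look once a starter traineddata is used. It assumes that the --traineddata flag of lstmtraining takes the place of both -U and --script_dir, since the starter traineddata carries the unicharset. The WORK directory and the STARTER path are hypothetical and not taken from this thread; please check the flags against the wiki for your version. The sketch only prints the command so it can be reviewed before running.

```shell
#!/bin/sh
# Dry-run sketch: builds the command and prints it; nothing is executed.

# Assumed working directory (same layout as the older command in this thread).
WORK="${WORK:-$HOME/tesstutorial}"

# Hypothetical location of the starter traineddata; adjust to your setup.
STARTER="$WORK/tel/tel/tel.traineddata"

# Same options as the older command, except that --traineddata is assumed
# to replace "-U <unicharset>" and "--script_dir <langdata>".
set -- lstmtraining \
  --traineddata "$STARTER" \
  --debug_interval 0 \
  --continue_from "$WORK/tellayer_from_tel/tel.lstm" \
  --append_index 5 --net_spec '[Lfx256 O1c105]' \
  --model_output "$WORK/tellayer_from_tel/tellayer" \
  --train_listfile "$WORK/tel/tel.training_files.txt" \
  --target_error_rate 0.01

# Print the command for review.
echo "$*"
```

Once the paths are confirmed, the command can be run by replacing the final echo line with "$@".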

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

이경준

Mar 13, 2018, 4:21:46 AM
to tesseract-ocr
Thank you.

On Tuesday, March 13, 2018 at 5:07:11 PM UTC+9, shree wrote: