How to train Tesseract 4.0 LSTM for receipts

Ahmad Moawad

Jun 4, 2017, 2:36:35 PM
to tesseract-ocr
Hello All,

I want to train Tesseract 4.0 LSTM for receipts, so my questions relate to:
  1. Training from images
  2. Image processing
  3. Adding new words to the dictionary
  • I have read the documentation and I think the best option is fine-tuning, so I need to provide box/tiff pairs before training.
  • I know the command below creates the box files in a directory under /tmp. Should I edit the box files there, or edit them beforehand and pass them to the command? In the latter case, how do I pass them to it?



training/tesstrain.sh \
  --fonts_dir /usr/share/fonts \
  --training_text ../langdata/ara/ara.training_text \
  --langdata_dir ../langdata \
  --tessdata_dir ./tessdata \
  --lang ara \
  --linedata_only \
  --noextract_font_properties \
  --exposures "0" \
  --fontlist "Arial" \
  --output_dir ~/tesstutorial/aratest
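
On the box-file question: as far as I understand, a box file is plain text with one glyph per line in the form "symbol left bottom right top page", so it can be corrected with any text tool. Below is a minimal sketch of such a correction. The helper name fix_box_symbol and the sample file are hypothetical, made up for illustration, and the sketch does not cover how (or whether) tesstrain.sh accepts hand-edited box files.

```shell
# Hypothetical helper: replace the symbol (first field) on one line of a box file.
# Assumes the "symbol left bottom right top page" layout, fields separated by spaces.
fix_box_symbol() {
  box_file=$1
  line_no=$2
  new_symbol=$3
  # Rewrite field 1 on the requested line; print every other line unchanged.
  awk -v n="$line_no" -v s="$new_symbol" 'NR == n { $1 = s } { print }' "$box_file"
}

# Made-up three-glyph box file where the middle glyph was misread as "0".
work=$(mktemp -d)
printf 'T 10 20 30 60 0\n0 32 20 50 60 0\nT 52 20 72 60 0\n' > "$work/sample.box"

# Correct line 2 to the letter "O" and write the result to a new file.
fix_box_symbol "$work/sample.box" 2 O > "$work/sample.fixed.box"
cat "$work/sample.fixed.box"
```

Only the symbol changes; the coordinates on the corrected line are left as they were.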


Any help is appreciated. Thank you.

