Unable to train a new font in Tesseract: "Deserialize failed" error


Adepu Sai Rahul

Nov 23, 2023, 1:33:54 AM
to tesseract-ocr

chinnu@SaiRahul2507:~/tesseract_tutorial/tesstrain$ TESSDATA_PREFIX=../tesseract/tessdata make training MODEL_NAME=Y145 START_MODEL=eng TESSDATA=../tesseract/tessdata MAX_ITERATIONS=200
    You are using make version: 4.3
lstmtraining \
  --debug_interval 0 \
  --traineddata data/Y145/Y145.traineddata \
  --old_traineddata ../tesseract/tessdata/eng.traineddata \
  --continue_from data/eng/Y145.lstm \
  --learning_rate 0.0001 \
  --model_output data/Y145/checkpoints/Y145 \
  --train_listfile data/Y145/list.train \
  --eval_listfile data/Y145/list.eval \
  --max_iterations 200 \
  --target_error_rate 0.01
Loaded file data/eng/Y145.lstm, unpacking...
Warning: LSTMTrainer deserialized an LSTMRecognizer!
Code range changed from 111 to 111!
Num (Extended) outputs,weights in Series:
  1,36,0,1:1, 0
Num (Extended) outputs,weights in Series:
  C3,3:9, 0
  Ft16:16, 160
Total weights = 160
  [C3,3Ft16]:16, 160
  Mp3,3:16, 0
  TxyLfys64:64, 20736
  Lfx96:96, 61824
  RxLrx96:96, 74112
  Lfx512:512, 1247232
  Fc111:111, 56943
Total weights = 1461007
Previous null char=110 mapped to 110
Continuing from data/eng/Y145.lstm
Deserialize failed: data/Y145-ground-truth/eng_0.tif read 0/1229531648 lines

In list.train I put paths to some .tif files.
How can I solve this?

Des Bw

Nov 23, 2023, 2:21:49 AM
to tesser...@googlegroups.com
Make sure that the .tif files are not corrupted and that the box files are not zero-sized.
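A quick way to run both checks is sketched below. It is only a rough sketch: `check_ground_truth` is a hypothetical helper name, not part of tesstrain, and the test is shallow. It flags zero-byte files and .tif files whose first four bytes are not a TIFF header, so it will not catch every kind of corruption.

```shell
# Hypothetical helper (not part of tesstrain): report suspicious ground-truth files.
# Prints one line per problem and returns 1 if anything was found, 0 otherwise.
check_ground_truth() {
  dir=$1
  report=$(
    # Any zero-byte file (box, text or image) is suspicious.
    find "$dir" -type f -size 0 | sed 's/^/EMPTY: /'
    # A TIFF file starts with "II*\0" (little-endian) or "MM\0*" (big-endian).
    find "$dir" -type f \( -name '*.tif' -o -name '*.tiff' \) |
      while IFS= read -r f; do
        magic=$(head -c 4 "$f" | od -An -tx1 | tr -d ' \n')
        case "$magic" in
          49492a00|4d4d002a) ;;                 # looks like a TIFF header
          *) echo "NOT A TIFF: $f" ;;
        esac
      done
  )
  if [ -n "$report" ]; then
    printf '%s\n' "$report"
    return 1
  fi
  return 0
}
```

With the directory from the log above, this would be called as `check_ground_truth data/Y145-ground-truth` from inside the tesstrain folder. No output means neither check found anything.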

Des


Adepu Sai Rahul

Nov 23, 2023, 2:29:39 AM
to tesseract-ocr
The .tif files are not corrupted and the box files are not zero-sized (see attached evid.png).

Des Bw

Nov 23, 2023, 2:42:42 AM
to tesseract-ocr
Your issue is probably related to this one: https://github.com/tesseract-ocr/tesseract/issues/792

Are you in Windows or Ubuntu?

You might try upgrading Tesseract to version 5. I am not well versed in Tesseract, so my knowledge is very limited.

Simon

Nov 23, 2023, 4:22:11 AM
to tesseract-ocr
As far as I have learned, the list.train and list.eval files must contain paths to .lstmf files, not .tif files. Also make sure that, when you use Tesseract on Linux, the line endings in those files are LF and NOT the Windows-standard CRLF.
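Both points can be checked with a few lines of shell. This is only a sketch: `check_listfile` is a hypothetical helper name, not something shipped with tesstrain, and it assumes that relative paths in the list file are resolved from the directory where you run it.

```shell
# Hypothetical helper (not part of tesstrain): sanity-check a training list file.
# Prints one line per problem and returns 1 if anything was found, 0 otherwise.
check_listfile() {
  list=$1
  status=0
  cr=$(printf '\r')
  # A carriage return anywhere means the file has Windows (CRLF) line endings.
  if grep -q "$cr" "$list"; then
    echo "CRLF line endings in $list"
    status=1
  fi
  # Every entry should be an existing, non-empty .lstmf file.
  while IFS= read -r path || [ -n "$path" ]; do
    path=${path%"$cr"}                      # ignore a trailing CR, already reported
    if [ -z "$path" ]; then continue; fi    # skip blank lines
    case "$path" in
      *.lstmf)
        [ -s "$path" ] || { echo "MISSING OR EMPTY: $path"; status=1; } ;;
      *)
        echo "NOT AN LSTMF PATH: $path"; status=1 ;;
    esac
  done < "$list"
  return "$status"
}
```

For the run in the first post this would be `check_listfile data/Y145/list.train` and `check_listfile data/Y145/list.eval`, run from the tesstrain folder. An entry such as data/Y145-ground-truth/eng_0.tif would be reported as "NOT AN LSTMF PATH".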

Zdenko Podobny

Nov 23, 2023, 1:01:31 PM
to tesser...@googlegroups.com
Please provide files for replicating the problem, otherwise....

Zdenko

