Ronny Zimmermann
May 31, 2024, 11:54:18 AM
to tesseract-ocr
I'm trying to improve tesseract's recognition for European license plates.
The corresponding font only has 41 characters.
I ran the following steps as a bash script, but I'm not sure I'm using the tools correctly:
# tesseract-ocr training script
# Generation of image and box files
text2image --fonts_dir /usr/share/fonts/ --font='Euro Plate' --outputbase=output --text=plates_all.txt --ptsize 12 && \
# Creation of the LSTM training files
tesseract output.tif output --psm 6 lstm.train && \
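# Creation of the unicharset file (written to output.unicharset, which the next step reads; the default name would be "unicharset")
unicharset_extractor --output_unicharset output.unicharset output.box && \
# Setting of the unicharset properties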
set_unicharset_properties -U output.unicharset -O output.unicharset -X xheights --script_dir /home/worker/Projekte/Kennzeichenerkennung/langdata_lstm/ && \
# Creation of the training file list
echo "output.lstmf" > train_listfile.txt && \
mkdir -p output && \
# Start training with lstmtraining
lstmtraining --model_output output/fe --train_listfile train_listfile.txt --max_iterations 10 --traineddata /usr/share/tesseract-ocr/5/tessdata/deu.traineddata --old_traineddata /usr/share/tesseract-ocr/5/tessdata/eng.traineddata --net_spec "[1,48,0,1 Ct3,3,16 Mp3,3 Lfys64 Lfx96 Lrx96 Lfx512 O1c41]" && \
# Combining the checkpoints into a .traineddata file
lstmtraining --stop_training --continue_from output/fe_checkpoint --traineddata /usr/share/tesseract-ocr/5/tessdata/deu.traineddata --model_output output/fe.traineddata && \
# Installation of the model under the name "plate"
sudo cp output/fe.traineddata /usr/share/tesseract-ocr/5/tessdata/plate.traineddata
With the resulting model, tesseract recognizes either nothing or incorrect text.
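One thing that could be checked here, as a rough sketch rather than a diagnosis, is how many entries the unicharset actually contains, next to the 41 classes written into `O1c41`. A unicharset file stores its entry count on its first line, and that count can be larger than the number of font characters because special entries are included. The helper names `unicharset_count` and `net_spec_outputs` are hypothetical, and the sketch assumes the unicharset was written to `output.unicharset`.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the entry count stored on the first line
# of a unicharset file.
unicharset_count() {
    head -n 1 "$1" | tr -d '[:space:]'
}

# Hypothetical helper: extract N from the "O1cN" output layer of a
# net_spec string.
net_spec_outputs() {
    printf '%s\n' "$1" | sed -n 's/.*O1c\([0-9][0-9]*\).*/\1/p'
}

NET_SPEC="[1,48,0,1 Ct3,3,16 Mp3,3 Lfys64 Lfx96 Lrx96 Lfx512 O1c41]"

# Assumption: the unicharset from the script above is in output.unicharset.
# The comparison is skipped when the file does not exist.
if [ -f output.unicharset ]; then
    echo "unicharset entries: $(unicharset_count output.unicharset)"
    echo "net_spec outputs:   $(net_spec_outputs "$NET_SPEC")"
fi
```

If the two numbers differ, that is only a hint about where to look next, not proof of what is going wrong.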
I would be very grateful for any tips.
Kind regards
Ronny