Train tesseract with a font for European car license plates

Ronny Zimmermann

May 31, 2024, 11:54:18 AM
to tesseract-ocr
I'm trying to improve tesseract's recognition of European license plates.
The corresponding font has only 41 characters.
I ran the following steps as a bash script, but I'm not sure I'm using the tools correctly:

# tesseract-ocr training script
# Generation of image and box files
text2image --fonts_dir /usr/share/fonts/ --font='Euro Plate' --outputbase=output --text=plates_all.txt --ptsize 12 && \
# Creation of the LSTM training files
tesseract output.tif output --psm 6 lstm.train && \
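# Creation of the unicharset file (written to output.unicharset so the next step can find it)
unicharset_extractor --output_unicharset output.unicharset output.box && \
# Setting the unicharset properties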
set_unicharset_properties -U output.unicharset -O output.unicharset -X xheights --script_dir /home/worker/Projekte/Kennzeichenerkennung/langdata_lstm/ && \
# Creation of the training file list
echo "output.lstmf" > train_listfile.txt && \
mkdir -p output && \
# Start training with lstmtraining
lstmtraining --model_output output/fe --train_listfile train_listfile.txt --max_iterations 10 --traineddata /usr/share/tesseract-ocr/5/tessdata/deu.traineddata --old_traineddata /usr/share/tesseract-ocr/5/tessdata/eng.traineddata --net_spec "[1,48,0,1 Ct3,3,16 Mp3,3 Lfys64 Lfx96 Lrx96 Lfx512 O1c41]" && \
# Combining the checkpoints into a .traineddata file
lstmtraining --stop_training --continue_from output/fe_checkpoint --traineddata /usr/share/tesseract-ocr/5/tessdata/deu.traineddata --model_output output/fe.traineddata && \
# Installing the model into the tessdata directory under the name "plate"
sudo cp output/fe.traineddata /usr/share/tesseract-ocr/5/tessdata/plate.traineddata
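
A quick sanity check that could be run alongside the script is to compare the number of distinct characters in the training text with the 41 used in the O1c41 output layer of --net_spec. This is only a sketch: the helper name count_unique_chars is made up for illustration, it assumes a grep that supports -o (GNU or BSD), and it assumes a UTF-8 locale if the text contains umlauts.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of tesseract): print the number of
# distinct non-whitespace characters found in a text file.
count_unique_chars() {
  # grep -o emits every non-whitespace character on its own line,
  # sort -u removes duplicates, wc -l counts what is left,
  # tr strips the padding that some wc implementations add.
  grep -o '[^[:space:]]' "$1" | sort -u | wc -l | tr -d ' '
}

# Example usage, assuming the file names from the script above:
#   count_unique_chars plates_all.txt
#   head -n 1 output.unicharset   # first line of a unicharset file is its entry count
```

The unicharset normally also contains a few special entries besides the visible characters, so the two numbers need not be identical, but a large gap between them would be worth looking into.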

The resulting model either recognizes nothing or recognizes the plates incorrectly.
I would be very grateful for any tips.

Kind regards
Ronny
pates_all.txt
EuroPlate.ttf