OCR-fine-tuning ~ LSTMTraining

ossama khalyl

Jun 11, 2025, 5:26:33 AM

to tesseract-ocr

I followed the steps for fine-tuning Tesseract for handwriting recognition. I have the character images and the corresponding box files. Then I generated the .lstmf files, followed by the lstm_train.txt and lstm_test.txt files.

However, when I launch training with these list files, it doesn't work. When I put only a single path in each of the train and test list files, it works perfectly and training starts correctly.

I also know that all the .lstmf files were generated properly: I wrote a script that trains on each file one by one, continuing from the last checkpoint each time, and it worked for every .lstmf file.
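
For reference, the per-file loop described above might look roughly like the sketch below. It only builds the lstmtraining command lines and does not run anything. The helper name `build_commands`, the one-line list file naming (`<file>.list`), and the iteration count are hypothetical placeholders, not the values actually used. It also assumes that `--max_iterations` counts total iterations including those already stored in the checkpoint, and that the checkpoint is written to `<model_output>_checkpoint`.

```python
def build_commands(lstmf_files, start_model, traineddata, model_output,
                   iters_per_file=100):
    """Return one lstmtraining argument list per .lstmf file.

    Each run continues from the checkpoint left by the previous run.
    All paths and the iteration count are illustrative assumptions.
    """
    commands = []
    continue_from = start_model
    for i, lstmf in enumerate(lstmf_files, start=1):
        # Hypothetical one-line list file holding just this .lstmf path;
        # this sketch does not create it.
        list_file = f"{lstmf}.list"
        commands.append([
            "lstmtraining",
            "--continue_from", str(continue_from),
            "--traineddata", str(traineddata),
            "--model_output", str(model_output),
            "--train_listfile", list_file,
            # Assumed cumulative, so the limit grows with each file.
            "--max_iterations", str(i * iters_per_file),
        ])
        # Assumed checkpoint name written by the previous run.
        continue_from = f"{model_output}_checkpoint"
    return commands
```

Each returned list could then be passed to `subprocess.run(cmd, check=True)` once the matching one-line list file has been written.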

I'm not sure whether the issue is with how lstm_train.txt is generated, or whether lstmtraining only accepts a single .lstmf file as input.

Here is the code for generating the lstm_train.txt and lstm_test.txt files:

import os
import random

input_dir = "test"
train_file = "lstm_train.txt"
test_file = "lstm_test.txt"

# List all .lstmf files
all_files = [f for f in os.listdir(input_dir) if f.endswith(".lstmf")]
random.shuffle(all_files)  # Random shuffle

# Proportion used for training (80%)
train_split = 0.8
train_count = int(len(all_files) * train_split)

train_files = all_files[:train_count]
test_files = all_files[train_count:]

# Write the train and test list files with relative paths
with open(train_file, "w", encoding="utf-8") as f_train, \
     open(test_file, "w", encoding="utf-8") as f_test:
    
    for f in train_files:
        relative_path = os.path.join(input_dir, f)
        f_train.write(relative_path+"\n")
        
    for f in test_files:
        relative_path = os.path.join(input_dir, f)
        f_test.write(relative_path+"\n")

print(f"[OK] Files '{train_file}' and '{test_file}' created with relative paths.")
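
Since the open question is whether the list files themselves are the problem, a quick sanity check on them may help narrow it down. The sketch below reads a list file in binary mode and reports lines that end in a carriage return or that do not point to an existing file. The helper name `check_list_file` is hypothetical. It assumes that relative paths are resolved from the directory the check (and lstmtraining) is started in, and that a stray carriage return could matter; both are things to verify, not confirmed causes.

```python
import os


def check_list_file(list_path):
    """Return a list of (line_number, problem) tuples for a list file."""
    problems = []
    # Binary mode keeps line endings visible instead of normalizing them.
    with open(list_path, "rb") as fh:
        raw_lines = fh.read().split(b"\n")
    for number, raw in enumerate(raw_lines, start=1):
        if raw == b"":
            continue  # skip blank lines, including the trailing one
        if raw.endswith(b"\r"):
            problems.append((number, "carriage return at end of line"))
            raw = raw[:-1]
        path = raw.decode("utf-8")
        if not os.path.isfile(path):
            problems.append((number, f"file not found: {path}"))
    return problems
```

Calling `check_list_file("lstm_train.txt")` from the directory where training is launched returns an empty list when every line is a clean path to an existing file.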

Here is an excerpt from the lstm_train.txt file:

Capture d'écran 2025-06-11 095440.png
