I'm trying to train Tesseract 5 on a new font for German. I followed a tutorial I found on YouTube (https://www.youtube.com/watch?v=KE4xEzFGSU8), and training worked when I did it for English. After switching to German, however, I get an error that I'm struggling to resolve.
The error concerns the file data/deu/Apex.lstm-unicharset, which appears to be missing. In langdata, I've confirmed that deu.unicharset exists and is correct; all German characters are present as expected. However, when I inspected data/Apex/my.unicharset, I found that it is incomplete: not all characters from the all-gt dataset seem to be included.
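To pin down exactly which characters are missing, a small comparison script can help. The following is only a rough sketch: it assumes the usual unicharset layout (first line is the entry count, and every later line starts with the symbol followed by a space), and the function names `unicharset_symbols` and `missing_characters` are made up for illustration, not part of Tesseract or tesstrain.

```python
def unicharset_symbols(unicharset_text: str) -> set[str]:
    """Collect the symbol at the start of each unicharset entry.

    Assumption: line 1 holds the entry count, and each following
    line begins with the symbol, then a space, then its properties.
    """
    symbols = set()
    for line in unicharset_text.splitlines()[1:]:  # skip the count line
        if line.strip():
            symbols.add(line.split(" ", 1)[0])
    return symbols


def missing_characters(ground_truth_text: str, unicharset_text: str) -> list[str]:
    """Return non-whitespace characters used in the ground truth
    that have no single-character entry in the unicharset."""
    symbols = unicharset_symbols(unicharset_text)
    used = {ch for ch in ground_truth_text if not ch.isspace()}
    return sorted(used - symbols)
```

Both functions take the file contents as strings, so the files would be read first, for example with `pathlib.Path("data/Apex/my.unicharset").read_text(encoding="utf-8")` and the same for the all-gt file. Two caveats: the check compares single code points only, so multi-character unicharset entries are not matched, and an umlaut stored in decomposed form (base letter plus combining mark) will not match a precomposed one.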
I've gone back through the process and checked that I followed every step accurately, but the error persists.