Now for every letter of the alphabet there are at least two different letter styles. The standard deu_frak traineddata can recognize only ONE of them, and I want to train Tesseract to read both. At first I thought I should create "a new language", so I started with Aletheia and then continued in Franken+. But the deu_frak traineddata on GitHub is not bad; I just need to add some glyphs/letters. Otherwise I would have to start a completely new langdata, which would be too much work, since the dictionary is very complicated and needs a lot of manual correction in Aletheia.
I have downloaded the required langdata from GitHub (they are in the folder "frk"), but I don't know what to do with them. How can I add further letters/glyphs so that they are recognized correctly? I was also confused that when I unpacked the original "deu_frak" traineddata with Aletheia's Tesseract, I somehow got completely different files. If needed, I can attach the folder containing those files.
I suspect that working on Windows isn't helping me solve this. I'm actually just a linguist and have worked my way through all of this by myself, but what I really need is to be able to re-train the existing, already good "deu_frak" traineddata.
Maybe someone here could help me out; that would be great!
Thanks a lot!
Regards,
Sebastian