Hey everyone! I'm working on a personal project where I'm training Tesseract to recognize a new font for English text. The font is called Aurebesh and it's from the Star Wars universe. Each letter in Aurebesh corresponds to a letter in English. I've collected close to 100,000 images and their corresponding translations, but I'm not sure how many iterations I should run for a dataset of this size. I tried training with only 100 images, but it didn't work. Can anyone advise me on how many iterations to run, and whether it's even possible to train a new font like this?
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/1b20c2e0-76b2-41a0-bc9f-e1a16b9c67a2n%40googlegroups.com.
Hello,
Thank you for providing the references, but I'm still a bit confused. I trained Tesseract using the same method as described in https://github.com/tesseract-ocr/tesstrain/blob/main/ocrd-testset.zip, with 100,000 sentences and a maximum of 10,000 iterations. However, it still cannot recognize a 6-letter word that I give it as a TIF file using the same font and settings. I have tried fewer iterations (1,000) as well as more (20,000 and 100,000), but the result is the same. Additionally, the BCER (Character Error Rate) doesn't change significantly with more iterations and stays at 3.56%. I'm not sure what I'm doing wrong or what I should try next, so any help would be appreciated.
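In case it helps with diagnosing this, here is a minimal sketch of how a character error rate can be checked by hand on a single test word, independently of the training log. It assumes the usual definition (edit distance between the expected text and the OCR output, divided by the length of the expected text). The function names and the example word are hypothetical, and this is not Tesseract's own evaluation code.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    # prev[j] holds the distance between the reference prefix processed so far
    # and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def char_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by the length of the reference text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical example: expected word vs. what the OCR returned.
    expected = "droids"
    recognized = "dro1ds"
    print(f"CER: {char_error_rate(expected, recognized):.2%}")  # CER: 16.67%
```

Running this on the 6-letter test word (expected text versus whatever Tesseract returns, including an empty string) would show whether the failure is a few wrong characters or no usable output at all, which are likely different problems.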
Thank you.