Hi,
I need to extract handwritten Malayalam text, and I think it should be possible to fine-tune Tesseract 5 for handwritten Malayalam.
There is no single document that explicitly states the data requirements for fine-tuning Tesseract 5 on handwritten Malayalam (at least, I couldn't find one, though there may be some). According to ChatGPT, roughly 4 lakh (400,000) text samples are needed, but I don't know where that figure comes from or how reliable it is. Additionally, based on the documentation, I believe training runs only on a CPU. How much time does training take? I couldn't find answers to these questions in the documentation. Where can we find information on aspects like training time, data requirements, etc.?