amount of data needed for fine-tuning on a particular font?

Ben Crowell

May 11, 2021, 7:51:31 PM
to tesseract-ocr
I'm working on OCRing a book that has intermixed English and Greek. The accuracy is pretty poor so far, so I want to try fine-tuning Tesseract for the Greek font used in this book. It seems to think δ looks like S because it has a curly top, and it mistakes λ for d. I've prepared about a page of text as training data, comprising about 20 lines. Is this too little to be useful? How much sample text would normally be used for this purpose? I'm finding it pretty time-consuming to prepare the data: the one page took me about an hour.