amount of data needed for fine-tuning on a particular font?

Ben Crowell

May 11, 2021, 7:51:31 PM

to tesseract-ocr

I'm working on OCRing a book that has intermixed English and Greek. The accuracy is pretty poor so far, and I want to try fine-tuning Tesseract for the Greek font used in this book. Tesseract seems to read δ as S, presumably because the δ in this font has a curly top, and it mistakes λ for d. I've prepared about a page of text as training data, comprising about 20 lines. Is this too little to be useful? How much sample text would normally be used for this purpose? I'm finding the data pretty time-consuming to prepare: the one page took me about an hour.
