amount of data needed for fine-tuning on a particular font?
Ben Crowell
May 11, 2021, 7:51:31 PM
to tesseract-ocr
I'm working on OCRing a book that has intermixed English and Greek. The accuracy is pretty poor so far, and I want to try fine-tuning Tesseract for the Greek font used in this book. It seems to read δ as S, because the δ in this font has a curly top, and it mistakes λ for d. I've prepared about one page of text as training data, comprising about 20 lines. Is this too little to be useful? How much sample text would normally be used for this purpose? I'm finding it pretty time-consuming to prepare the data: the one page took me about an hour.
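
One rough way to judge whether 20 lines is enough is to count how often the problem characters (δ and λ here) actually occur in the transcribed text. The following is only a minimal sketch in Python: the helper names `count_chars` and `read_ground_truth` are hypothetical, and the assumption that the transcriptions sit in a directory as plain-text `*.gt.txt` files is mine, not something Tesseract requires.

```python
from collections import Counter
from pathlib import Path


def count_chars(lines, watch):
    """Count how often each watched character occurs in the given lines."""
    counts = Counter()
    for line in lines:
        for ch in line:
            if ch in watch:
                counts[ch] += 1
    # Include zero entries so characters missing from the sample are visible.
    return {ch: counts[ch] for ch in watch}


def read_ground_truth(directory):
    """Read every *.gt.txt file in a directory (assumed, hypothetical layout)."""
    lines = []
    for path in sorted(Path(directory).glob("*.gt.txt")):
        lines.extend(path.read_text(encoding="utf-8").splitlines())
    return lines


if __name__ == "__main__":
    # Two made-up sample lines standing in for the real transcription.
    sample = ["δύο λόγοι", "the word λόγος"]
    print(count_chars(sample, "δλ"))
```

If a character that is being misread shows up only a handful of times in the sample, that at least suggests where more lines would be worth the preparation time.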