New Language submission: Hawaiian

Edwin Solares

Mar 21, 2025, 12:47:47 PM
to tesseract-dev
Hi,

I am a faculty member in the Halıcıoğlu Data Science Institute and the Department of Computer Science at the University of California San Diego. Some of my students have used transfer learning to train a new language with Tesseract.

I am in the process of uploading the training script and the training data (images along with the ground truth). We trained our new Hawaiian model on 234 pages in total, O Kamehameha (136 pages) and O Lunalilo (98 pages), for 20k iterations. This resulted in a BCER of 1.275%.
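
For anyone who wants to sanity-check a character error rate on their own output, here is a minimal sketch in Python. It is a generic Levenshtein-based calculation (total edit distance divided by total ground-truth characters) and is not necessarily how Tesseract computes the BCER it reports during training. The function names and the sample line pairs are illustrative assumptions, not taken from our training script or data.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def char_error_rate(pairs):
    """Total edit distance over total ground-truth characters, in percent."""
    errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    total = sum(len(ref) for ref, _ in pairs)
    if total == 0:
        raise ValueError("ground truth is empty")
    return 100.0 * errors / total


# Hypothetical (ground truth, OCR output) line pairs for illustration only.
pairs = [
    ("aloha kakou", "aloha kakou"),
    ("ka moi", "ka mol"),
]
print(f"CER: {char_error_rate(pairs):.3f}%")
```

Summing errors and characters over all lines before dividing weights long lines more than short ones, which is usually what one wants for a page-level figure.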


We are continuing to label more data and are working on other new languages as well, and we will keep posting them on this link.