Italian, Portuguese, Arabic, Japanese, Korean, and Chinese test datasets

Sarasi Lalithsena

Apr 19, 2019, 1:19:17 AM
to tesseract-ocr

Hello everyone, 


I am looking for datasets to test OCR engines on Italian, Portuguese, Arabic, Japanese, Korean, and Chinese. Each dataset needs to include the raw documents for OCR and the corresponding ground-truth text. If you know of any such dataset, please post it here. It might also be helpful to build a catalog of these datasets.


Thank you

Sarasi Lalithsena
