Hello everyone,
I am looking for datasets to test OCR engines on the following languages: Italian, Portuguese, Arabic, Japanese, Korean, and Chinese. The datasets need to include both the raw OCR documents and the ground truth text. If you know of any such datasets, please post them here. It might also be helpful to build a catalog of these datasets.
Thank you
Sarasi Lalithsena