
Assistance Needed with OCR of Arabic Digits


Sara Elshobaky

Jan 22, 2025, 3:54:45 AM
to tesser...@googlegroups.com

Hi,

I’m currently working on OCR for Arabic digits (also known as Hindi numbers) extracted from table cells in old documents. After cropping the table cells, I OCR each cell individually. However, some digits are repeated in the recognized output.

While visualizing the coordinates of each digit, I discovered that extra bounding boxes were generated. Do you have any suggestions for resolving this issue?
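For illustration, here is a minimal sketch of one way extra boxes like these could be flagged and filtered after recognition. It assumes each recognized symbol is available as a `(text, confidence, box)` tuple with `box = (left, top, right, bottom)`; the helper names `iou` and `drop_overlapping` and the 0.5 overlap threshold are hypothetical choices, not part of Tesseract or tesserocr. It is a post-processing workaround, not a fix for the underlying cause.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]   # (left, top, right, bottom) in pixels
Symbol = Tuple[str, float, Box]   # (text, confidence, box)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def drop_overlapping(symbols: List[Symbol], threshold: float = 0.5) -> List[Symbol]:
    """Keep only the highest-confidence symbol among heavily overlapping boxes.

    The threshold is an assumed value and would need tuning on real cells.
    """
    kept: List[Symbol] = []
    # Visit symbols from most to least confident, so the best one wins.
    for sym in sorted(symbols, key=lambda s: s[1], reverse=True):
        if all(iou(sym[2], k[2]) < threshold for k in kept):
            kept.append(sym)
    # Digits are read left to right, so order the survivors by left edge.
    return sorted(kept, key=lambda s: s[2][0])


if __name__ == "__main__":
    # Made-up boxes: the second entry nearly coincides with the first.
    cell = [
        ("٣", 95.0, (10, 5, 30, 40)),
        ("٣", 60.0, (12, 6, 31, 41)),
        ("٧", 90.0, (40, 5, 60, 40)),
    ]
    print("".join(text for text, _, _ in drop_overlapping(cell)))
```

Sorting by confidence first means a duplicated digit is resolved in favour of the box the engine trusted most; boxes that merely touch or overlap slightly stay below the threshold and are kept.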

I’ve attached the visual results, highlighting the inaccuracies with red circles, along with the original cell image for your reference.

I am using a fine-tuned version of the Arabic.traineddata model, tuned on training lines from the same collection of books that I’m OCRing. OCR is run with PSM=6 (a single uniform block of text) and OEM=1 (LSTM engine only).
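For reference, a setup like the one described could look roughly like the sketch below with tesserocr. The function name `symbol_boxes`, the `tessdata_dir` argument, and the assumption that the tuned model is saved as `Arabic.traineddata` inside that directory are illustrative only and not taken from the actual pipeline.

```python
def symbol_boxes(image_path, tessdata_dir, lang="Arabic"):
    """Return (text, confidence, box) for every symbol in one cell image.

    Assumes tessdata_dir contains <lang>.traineddata (hypothetical layout).
    """
    # Imported here so the module can be loaded without tesserocr installed.
    from tesserocr import PyTessBaseAPI, PSM, OEM, RIL, iterate_level

    results = []
    with PyTessBaseAPI(path=tessdata_dir, lang=lang,
                       psm=PSM.SINGLE_BLOCK,    # PSM=6
                       oem=OEM.LSTM_ONLY) as api:  # OEM=1
        api.SetImageFile(image_path)
        api.Recognize()
        iterator = api.GetIterator()
        for sym in iterate_level(iterator, RIL.SYMBOL):
            try:
                text = sym.GetUTF8Text(RIL.SYMBOL)
            except RuntimeError:
                # tesserocr raises when a position yields no text.
                continue
            conf = sym.Confidence(RIL.SYMBOL)
            box = sym.BoundingBox(RIL.SYMBOL)  # (left, top, right, bottom) or None
            if box is not None:
                results.append((text, conf, box))
    return results
```

Printing the returned tuples for a single cell makes it easy to see which symbols share nearly identical coordinates.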

Tesseract 5.5.0
Python 3.13.1
tesserocr 2.7.1
Leptonica 1.82.0

Thank you!

Sara Elshobaky


Attachments: ar_nums2.png, 428_349_160_751.png


