Tesseract OSD retraining and script vs. language text extraction



Omesharma


Oct 31, 2020, 5:09:31 AM

to tesseract-ocr

Hey,

I am using Tesseract OCR to extract text from images, and I would appreciate your suggestions on the following points:

- How can I retrain the osd.traineddata file to add Ethiopic and other scripts? The current osd.traineddata file cannot detect some scripts (e.g. Ethiopic, Gujarati, Gurmukhi), even though script traineddata files for them are available in the script directory.

- Which gives more accurate text extraction: the language traineddata files or the script traineddata files?

- Does using a script model instead of a language model (<lang>.traineddata) make any difference in text extraction accuracy?

Please share your views on the points above based on your experience with Tesseract. It will be very helpful for my final year project.

Contact: sharma...@gmail.com

Thanks and regards,
Omesh Sharma
