Tesseract OSD retraining and script vs. language text extraction


Omesharma

Oct 31, 2020, 5:09:31 AM
to tesseract-ocr
Hey,

I am using Tesseract OCR to extract text from images.

I would appreciate your suggestions on the points below.

- How can I retrain the osd.traineddata file to add Ethiopic and other scripts? The current osd.traineddata cannot detect some scripts (e.g. Ethiopic, Gujarati, Gurmukhi), even though traineddata files for them are available in the script directory.
- Which is more accurate for text extraction: the language traineddata files or the script traineddata files?
- Does using a script traineddata file instead of a language traineddata file make any difference to text extraction accuracy?
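
For the two accuracy questions, one way to compare the models would be to run both on the same image and score each output against a hand-typed transcription. This is only a minimal sketch, assuming Python with the `tesseract` command on the PATH, and assuming that `amh` (a language model) and `script/Ethiopic` (a script model) are both installed under tessdata. The function names, the file names in the usage note, and the choice of character error rate as the metric are my own for illustration; none of them come from Tesseract.

```python
import subprocess


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_error_rate(ocr_text: str, truth: str) -> float:
    """Edit distance divided by the length of the ground-truth text."""
    if not truth:
        raise ValueError("ground-truth text must not be empty")
    return edit_distance(ocr_text, truth) / len(truth)


def tesseract_command(image_path: str, model: str) -> list[str]:
    """Build the CLI call; `model` is e.g. 'amh' or 'script/Ethiopic'."""
    return ["tesseract", image_path, "stdout", "-l", model]


def run_ocr(image_path: str, model: str) -> str:
    """Run Tesseract and return the recognised text (needs Tesseract installed)."""
    result = subprocess.run(tesseract_command(image_path, model),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

With a hypothetical `sample.png` and its transcription in `truth`, comparing `char_error_rate(run_ocr("sample.png", "amh"), truth)` with `char_error_rate(run_ocr("sample.png", "script/Ethiopic"), truth)` would show which model does better on that particular image. A single image would not settle the general question, so the score would need to be averaged over a representative set.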
Please share your views on the points above, based on your experience with Tesseract. It will be very helpful for my final-year project.


Thanks and regards,
Omesh Sharma
