Any hints for Arabic user custom traineddata (e.g. new font)


bmwmine

May 5, 2017, 10:06:04 AM
to tesseract-ocr
Hi everybody

I am building a corpus of training data: more than 9,000 pages of TIFF images (roughly three encyclopedias' worth) in the "Traditional Arabic" font. I will add other fonts later.

I will be using the Tesseract 4.00 alpha LSTM engine, so any hints would be useful.

I used the same configuration as the recent ara.traineddata (12 MB) and set char_spacing to 0.7.

I plan to use these LSTM parameters:


--learning_rate 10e-5
--net_spec '[1,0,0,1 Ct5,5,16 Mp3,3 Lfys64 Lfx128 Lrx128 Lfx256 O1c1]'
--net_mode 192
--perfect_sample_delay 19

as referenced
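
For anyone who wants to sanity-check these values before starting a long run, here is a minimal Python sketch that decodes them. It rests on a few assumptions: the two net_mode bit values (64 for layer-specific learning rates, 128 for Adam) are taken from my reading of the NetworkFlags enum in Tesseract's network.h, and the helper names (decode_net_mode, split_net_spec, build_args) are made up for illustration. It only assembles the four flags listed above; a real lstmtraining call needs further arguments (training file list, model output, and so on) that are deliberately left out.

```python
import shlex

# Parameters copied verbatim from the post.
PARAMS = {
    "learning_rate": "10e-5",
    "net_spec": "[1,0,0,1 Ct5,5,16 Mp3,3 Lfys64 Lfx128 Lrx128 Lfx256 O1c1]",
    "net_mode": "192",
    "perfect_sample_delay": "19",
}

# ASSUMPTION: bit values as I understand them from Tesseract's
# NetworkFlags enum (network.h). Verify against your source tree.
NF_LAYER_SPECIFIC_LR = 64
NF_ADAM = 128


def decode_net_mode(mode):
    """Return the names of the (assumed) flags set in a net_mode value."""
    flags = []
    if mode & NF_LAYER_SPECIFIC_LR:
        flags.append("layer_specific_lr")
    if mode & NF_ADAM:
        flags.append("adam")
    return flags


def split_net_spec(spec):
    """Split a VGSL-style net_spec string into its whitespace-separated parts."""
    return spec.strip("[]").split()


def build_args(params):
    """Assemble a partial lstmtraining argument list from the given flags only."""
    args = ["lstmtraining"]
    for name, value in params.items():
        args += ["--" + name, value]
    return args


if __name__ == "__main__":
    # "10e-5" is 10 x 10^-5, i.e. the same value as 1e-4.
    print("learning rate:", float(PARAMS["learning_rate"]))
    print("net_mode flags:", decode_net_mode(int(PARAMS["net_mode"])))
    print("net_spec parts:", split_net_spec(PARAMS["net_spec"]))
    # Quote each argument so the net_spec survives copy-paste into a shell.
    print(" ".join(shlex.quote(a) for a in build_args(PARAMS)))
```

Note that 10e-5 is the same number as 1e-4, which is easy to misread as 1e-5. Quoting the net_spec matters because it contains spaces and brackets that a shell would otherwise split or interpret.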

Also, it would be good to hear about your experience with the Tesseract OCR engine. If anybody has custom trained data for Arabic, please share it.

