Training tesseract, APPLY_BOXES: ... FAILURE! Couldn't find a matching blob for BENGALI language.

Boring Guy69

Jan 28, 2021, 12:02:06 PM
to tesseract-ocr

Hello, I am new to Tesseract. I am working on the Bengali language with the Kalpurush font.
I get a lot of errors when I generate the TR files. Here is my workflow:
First, I create text files in UTF-8 format containing some Bengali words, set in the Kalpurush font.
Then I create the box files and TIF files with the help of jTessBoxEditor.
Then, when I execute the command [ tesseract ben.kalpurush.exp0.tif ben.kalpurush.exp0 box.train ], it gives me errors like "could not find a matching blob" and "box failed resegmentation". For example, if my file has 600 words, it finds only 300 good blobs.
I have attached a screenshot.
Do I have to change any config for the Bengali language? Can anyone suggest what to do? I can't find any way to resolve this problem.
Screenshot (162).png
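
For reference, a box file is plain text with one line per symbol: the symbol, then left, bottom, right, top and page number, separated by spaces. The sketch below is only a rough way to inspect such a file, not part of Tesseract. It counts the entries and how many of them carry more than one Unicode code point, which is common in Bengali (a consonant plus a vowel sign, for example). The helper names and the sample lines are made up for illustration.

```python
from typing import Iterable, NamedTuple, Tuple


class Box(NamedTuple):
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_line(line: str) -> Box:
    # Split from the right: the last five fields are always integers,
    # everything before them is the symbol text.
    symbol, *nums = line.rstrip("\n").rsplit(" ", 5)
    left, bottom, right, top, page = map(int, nums)
    return Box(symbol, left, bottom, right, top, page)


def summarize(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (total boxes, boxes whose symbol has more than one code point)."""
    boxes = [parse_box_line(l) for l in lines if l.strip()]
    multi = [b for b in boxes if len(b.symbol) > 1]
    return len(boxes), len(multi)


# Hypothetical sample: a single consonant, then a consonant plus a vowel sign.
sample = ["ক 10 20 30 45 0", "কি 35 20 60 48 0"]
total, multi = summarize(sample)
print(f"{total} boxes, {multi} with more than one code point")
```

To check a real file, pass its lines instead of the sample, e.g. `summarize(open("ben.kalpurush.exp0.box", encoding="utf-8"))`.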

Shree Devi Kumar

Jan 28, 2021, 12:51:53 PM
to tesseract-ocr
For Bengali, you need to train the LSTM model. Legacy model training won't work.
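
As a rough illustration of what LSTM training involves, the sketch below only builds and prints the command lines for a fine-tuning run. It does not execute anything. The sequence (`lstm.train`, `combine_tessdata -e`, `lstmtraining`, `lstmtraining --stop_training`) follows the flow described in the Tesseract LSTM training documentation. Every path, the `tessdata_best` directory, the output names and the iteration count are assumptions chosen for illustration. It also assumes the ground truth has already been prepared in the form LSTM training expects, which may differ from box files made for legacy training.

```python
import shlex
from typing import List


def lstm_finetune_commands(
    base: str = "ben.kalpurush.exp0",   # image/box base name from the question
    lang: str = "ben",
    tessdata: str = "tessdata_best",    # assumed location of ben.traineddata
    out_dir: str = "output",            # hypothetical working directory
    max_iterations: int = 400,          # arbitrary illustrative value
) -> List[List[str]]:
    traineddata = f"{tessdata}/{lang}.traineddata"
    return [
        # 1. Produce an .lstmf training file instead of a legacy .tr file.
        ["tesseract", f"{base}.tif", base, "--psm", "6", "lstm.train"],
        # 2. Extract the LSTM model from the existing Bengali traineddata.
        ["combine_tessdata", "-e", traineddata, f"{out_dir}/{lang}.lstm"],
        # 3. Fine-tune from that model; the list file names the .lstmf files.
        ["lstmtraining",
         "--model_output", f"{out_dir}/kalpurush",
         "--continue_from", f"{out_dir}/{lang}.lstm",
         "--traineddata", traineddata,
         "--train_listfile", f"{out_dir}/{lang}.training_files.txt",
         "--max_iterations", str(max_iterations)],
        # 4. Convert the final checkpoint into a usable traineddata file.
        ["lstmtraining", "--stop_training",
         "--continue_from", f"{out_dir}/kalpurush_checkpoint",
         "--traineddata", traineddata,
         "--model_output", f"{out_dir}/ben_kalpurush.traineddata"],
    ]


commands = lstm_finetune_commands()
for cmd in commands:
    print(shlex.join(cmd))
```

Printing the commands rather than running them keeps the sketch safe to try without Tesseract installed; each line can be reviewed and adjusted before being run by hand.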
