Weekly progress


Debayan Banerjee

Apr 18, 2011, 1:19:59 AM

to indi...@googlegroups.com

I have been looking into the Tesseract source code lately. I have been
writing some small programs that call the API to do simple things
like getting bounding boxes for glyphs, getting baselines, etc. I
have also been trying to modify the way Tesseract combines two connected
components when they are not separated horizontally. If we can
change this, and if we can simply 'clip' the point between
the consonant and the descending vowel, Tesseract will do the rest.
<http://1.bp.blogspot.com/-Y7CaiQH_iZ4/TZ4UVtTeJzI/AAAAAAAAH0k/7c6DMj-zlhY/s1600/46.png_.jpg>
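
To make the idea concrete, here is a rough sketch in plain Python. It is
not Tesseract code and does not call the Tesseract API. It only assumes
that each connected component is a bounding box (left, top, right, bottom)
with y growing downward. The function names and the cut row are
hypothetical, chosen for illustration. The first function mimics the kind
of merging described above (boxes whose x-ranges overlap become one box),
and the second 'clips' a merged box at a given row so the consonant and
the descending vowel become separate boxes again.

```python
from typing import List, Tuple

# Hypothetical box layout: (left, top, right, bottom), y grows downward.
Box = Tuple[int, int, int, int]


def merge_overlapping_columns(boxes: List[Box]) -> List[Box]:
    """Merge boxes whose x-ranges overlap, ignoring vertical position."""
    merged: List[Box] = []
    for left, top, right, bottom in sorted(boxes):
        if merged and left <= merged[-1][2]:
            # Not separated horizontally from the previous box: grow it.
            pl, pt, pr, pb = merged[-1]
            merged[-1] = (pl, min(pt, top), max(pr, right), max(pb, bottom))
        else:
            merged.append((left, top, right, bottom))
    return merged


def clip_at_row(box: Box, cut_y: int) -> Tuple[Box, Box]:
    """Split a box into an upper and a lower part at row cut_y."""
    left, top, right, bottom = box
    if not top < cut_y < bottom:
        raise ValueError("cut row must lie strictly inside the box")
    return (left, top, right, cut_y), (left, cut_y, right, bottom)


if __name__ == "__main__":
    consonant = (10, 5, 30, 40)
    descending_vowel = (12, 42, 28, 55)   # sits below the consonant
    neighbour = (35, 5, 50, 40)           # next glyph on the line
    glyphs = merge_overlapping_columns([consonant, descending_vowel, neighbour])
    print(glyphs)
    print(clip_at_row(glyphs[0], 41))
```

How to find the cut row in a real image (for example from the baseline)
is the open question. Here it is simply passed in by hand.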

I had committed to creating a high-level schematic diagram of the OCR
system we are trying to build, but right now I am not sure which
architecture we will follow, because it depends on how our algorithms
work out.
--
Debayan Banerjee
