Any way to get bbox information only with tesseract or some other tool?

Finjon Kiang

May 3, 2014, 12:25:39 PM

to tesser...@googlegroups.com

We are working with a large number of Traditional Chinese documents (scanned as images). Tesseract's results with the default configuration are not good (we are not yet familiar with Tesseract training), so we are building a crowdsourcing website where people can provide the recognized text. For that we need the bbox information that Tesseract produces with the hocr config, but we do not need the OCR result itself, and OCR on Chinese is very slow. Is there any way to get the bbox information directly, without running the OCR process (either with Tesseract or with other tools)?
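To make "bbox without OCR" concrete, here is a rough sketch of the kind of thing we have in mind: finding bounding boxes of connected ink regions in an already binarized page, with no recognition step. This is not Tesseract's layout analysis and not its API. The function name `component_bboxes` and the list-of-rows bitmap format are assumptions made up for illustration, and only the Python standard library is used. Boxes come out in the same `x0 y0 x1 y1` order that the hOCR `bbox` property uses.

```python
from collections import deque


def component_bboxes(bitmap):
    """Return (x0, y0, x1, y1) for each 8-connected group of ink pixels.

    `bitmap` is a hypothetical input format: a list of equal-length rows
    in which truthy cells are ink. Coordinates are inclusive pixel indices.
    """
    height = len(bitmap)
    width = len(bitmap[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not bitmap[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill starting from an unvisited ink pixel.
            x0 = x1 = x
            y0 = y1 = y
            seen[y][x] = True
            queue = deque([(x, y)])
            while queue:
                cx, cy = queue.popleft()
                # Grow the box to cover every pixel reached by the fill.
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                # Visit the 3x3 neighbourhood, clipped to the image bounds.
                for ny in range(max(cy - 1, 0), min(cy + 2, height)):
                    for nx in range(max(cx - 1, 0), min(cx + 2, width)):
                        if bitmap[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes


if __name__ == "__main__":
    page = [
        [1, 1, 0, 0, 0],
        [1, 0, 0, 0, 1],
        [0, 0, 0, 0, 1],
    ]
    print(component_bboxes(page))  # [(0, 0, 1, 1), (4, 1, 4, 2)]
```

Many Chinese characters consist of several disconnected strokes, so boxes like these would still have to be merged into character or line boxes before they are useful for our website. That merging is the part we were hoping an existing tool already does.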

--- Environment: Ubuntu 13.10 ---

tesseract --version

tesseract 3.02.01

leptonica-1.69

libgif 4.1.6 : libjpeg 8d : libpng 1.2.49 : libtiff 4.0.2 : zlib 1.2.8

---

kiang
