tesseract data language model sources


abram stern

Oct 17, 2019, 11:40:33 PM
to tesseract-ocr
Hi tesseract community,

I'm working on a research project about OCR, and I'm wondering where the included data models (e.g. 'fast', 'best') come from. Put another way, what source material is used to train them? I haven't been able to find this documented anywhere, and I'd like to know whether it involves public domain corpora, data obtained through book scanning, or other sources.

Best regards,
Abram

Shree Devi Kumar

Oct 18, 2019, 12:10:25 AM
to tesseract-ocr



--

____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

Shree Devi Kumar

Oct 18, 2019, 12:11:14 AM
to tesseract-ocr

abram stern

Oct 18, 2019, 1:00:59 AM
to tesser...@googlegroups.com
Thanks, this is exactly what I was looking for! -a



--
Abram Stern (aphid)
PhD Candidate, Film and Digital Media
University of California, Santa Cruz
ap...@ucsc.edu // a...@aphid.org ⚛ // (831) 224-0334 (mobile/signal)