russian-old?

Yury Tarasievich

Oct 18, 2017, 1:15:26 PM
to tesser...@googlegroups.com
Hi guys,

I may be wrong, but it seems the Russian tessdata
does not cover the pre-reform orthography or
Church Slavonic glyphs. You know, i with dot,
fita (theta), yat, etc.

Would it be very hard to add a 'rus_old'
variant? Or is it too difficult to roll your own
modified rus.traineddata on the local system?

-Yury

Simon Eigeldinger

Oct 18, 2017, 3:34:44 PM
to tesser...@googlegroups.com
Hi Yury,

Maybe the same thing happened to it as to the German Fraktur data.
It seems not to have been updated for a long time and to have been
removed from the main repos.

Greetings,
Simon
--
Simon Eigeldinger
Follow me on Twitter: http://www.twitter.com/domasofan/
E-Mail: simon.ei...@vol.at
ICQ: 121823966
Jabber: doma...@andrelouis.com

Simon Eigeldinger

Oct 18, 2017, 3:40:10 PM
to tesser...@googlegroups.com
I guess I have to correct myself:
German Fraktur is in the tessdata repo.

ShreeDevi Kumar

Oct 18, 2017, 11:31:05 PM
to tesser...@googlegroups.com
Please add this as an issue in the langdata repository. Thanks.

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscribe@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/2ef10244-42b9-c23a-4874-ee163252a7bd%40vol.at.

For more options, visit https://groups.google.com/d/optout.

ShreeDevi Kumar

Oct 19, 2017, 9:39:51 AM
to tesser...@googlegroups.com

Does that meet what you are looking for?

ShreeDevi

Yury

Oct 19, 2017, 10:21:10 AM
to tesseract-ocr
Shree, thank you, and yes, accented vowels would be fine,
but right now I was talking about the 'іѣѳѵ' set (uppercase U+0406, U+0462, U+0472, U+0474 and lowercase U+0456, U+0463, U+0473, U+0475).
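
For readers who want to check which letters those code points are, here is a minimal sketch using only Python's standard `unicodedata` module. The names `UPPER`, `LOWER` and `describe` are made up for illustration; the code points are the ones quoted above.

```python
import unicodedata

# Code points quoted in the message above (uppercase, then lowercase).
UPPER = [0x0406, 0x0462, 0x0472, 0x0474]
LOWER = [0x0456, 0x0463, 0x0473, 0x0475]

def describe(code_points):
    """Return (U+XXXX, character, Unicode name) for each code point."""
    return [
        (f"U+{cp:04X}", chr(cp), unicodedata.name(chr(cp)))
        for cp in code_points
    ]

# Print one line per letter, e.g. the code, the glyph and its Unicode name.
for row in describe(UPPER + LOWER):
    print(*row)
```

The output should list the dotted i, yat, fita and izhitsa in both cases, which is the set a 'rus_old' model would need to cover in addition to the modern alphabet.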

The 4.0.0.0 version from git definitely refuses to recognise those, and AFAICT there is no mention of the codes in the source files.
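
One rough way to repeat that check on any text file from the language data is to scan its contents for the eight letters. This is only a sketch: `missing_glyphs` and the sample string are hypothetical, and which files are worth scanning (training text, character set lists) is an assumption left to the reader.

```python
# The eight pre-reform letters, built from their code points to avoid
# confusion with look-alike Latin characters.
OLD_GLYPHS = "".join(
    chr(cp)
    for cp in (0x0406, 0x0462, 0x0472, 0x0474, 0x0456, 0x0463, 0x0473, 0x0475)
)

def missing_glyphs(text, wanted=OLD_GLYPHS):
    """Return the characters from `wanted` that never occur in `text`."""
    present = set(text)
    return [ch for ch in wanted if ch not in present]

# Hypothetical stand-in for a file's contents; in practice the text
# would be read from a local file opened with encoding="utf-8".
sample = "в" + chr(0x0463) + "ра"
print(missing_glyphs(sample))
```

An empty list would mean every letter appears at least once in the scanned text; it says nothing about whether a trained model can actually recognise them.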

I'm a complete noob at git. How would I know when the PR you mentioned becomes available as a download?

-Yury

ShreeDevi Kumar

Oct 19, 2017, 11:50:36 AM
to tesser...@googlegroups.com
Well, if that PR is the right one, you could add a reminder for Ray Smith (chief developer) to include it.


ShreeDevi

Yury

Oct 19, 2017, 1:20:16 PM
to tesseract-ocr
Thanks again, Shree, also for that comment on the PR page.

Now, the PR looks to me like circa-2015 language data with modifications.
Wouldn't the OCR quality regress compared with the 4.* data, or has the langdata source remained the same?

I think I'll just have to file the issue. The contents of the PR look too intimidating for me to try to install by myself. :) Still trying to make sense of that 'plusminus' material!