Parse Russian date/time

Oleg Broytmann

Apr 22, 2009, 12:48:35 PM
to ParseDateTime-Dev
Hello. I've created a class with Russian constants. Unfortunately, parsing
doesn't work even for the simplest strings like u'1 day ago' (in Russian):
whatever I do, parsedatetime returns a tuple for the current time. The problem,
I'm afraid, is that parsedatetime currently only works with US-ASCII
strings. There are unicode strings (u'...') in parsedatetime_consts.py, and
it contains two non-English constants classes (es and de), but they have
only ASCII characters; the regexps are plain strings, not unicode, and regexp
matching is done in non-unicode mode.
Any idea whether it is possible to patch parsedatetime to work with non-ASCII
unicode strings? If it's possible at all, how hard would it be?

Oleg.
--
Oleg Broytmann http://phd.webhost.ru/ oleg.br...@gmail.com
Programmers don't die, they just GOSUB without RETURN.
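The ASCII-versus-Unicode matching problem Oleg describes can be reproduced with Python's `re` module. This is a minimal sketch assuming Python 3, where `str` patterns are Unicode-aware by default and the `re.ASCII` flag restores ASCII-only `\w`; in the Python 2 of this thread, the Unicode-aware equivalent would be a `u'...'` pattern compiled with `re.UNICODE`. The pattern is illustrative only, not parsedatetime's own regex.

```python
import re

# Illustrative pattern shape (not parsedatetime's): "<number> <unit word>"
PATTERN = r'(\d+)\s+(\w+)'

text = '1 день назад'  # "1 day ago" in Russian

# Python 3 str patterns are Unicode-aware by default: \w matches Cyrillic.
unicode_match = re.match(PATTERN, text)

# With re.ASCII, \w only matches [a-zA-Z0-9_], so the Cyrillic word
# cannot be matched and the whole match fails.
ascii_match = re.match(PATTERN, text, re.ASCII)

print(unicode_match.groups() if unicode_match else None)  # ('1', 'день')
print(ascii_match)  # None
```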

bear

Apr 22, 2009, 7:39:56 PM
to parsedat...@googlegroups.com
pdt should already work with unicode strings - all of the constants
and for sure all of the regexes are defined as u'...' strings - well,
at least I thought they were; I'll have to check.

If you could give me some of the test strings translated, I'll add them
to the tests and make sure it handles Russian properly.

thanks
--
---
Bear

be...@seesmic.com (work)
be...@code-bear.com (jabber & email)
http://code-bear.com/bearlog (weblog)

PGP Fingerprint = 9996 719F 973D B11B E111 D770 9331 E822 40B3 CD29

Oleg Broytmann

Apr 23, 2009, 7:37:13 AM
to parsedat...@googlegroups.com
On Wed, Apr 22, 2009 at 07:39:56PM -0400, bear wrote:
> pdt should already work with unicode strings - all of the constants
> and for sure all of the regexes are defined as u'...' strings - well,
> at least I thought they were; I'll have to check.
>
> If you could give me some of the test strings translated, I'll add them
> to the tests and make sure it handles Russian properly.

I have sent a tarball by private mail; it contains a class 'pdtLocale_ru'
(I don't have PyICU; should I?), patches and tests. Some of the tests passed,
some failed.
The worst problem I have is that Russian is a synthetic language[1]. In
English the root (stem) does not change; at most there is a plural form
(day => days) and a possessive case (day => day's). But in Russian, words take
prefixes and suffixes, and even the stem changes:
1 day -> 1 den in Russian
2 days -> 2 dnya
3 days -> 3 dnya
4 days -> 4 dnya
5 days -> 5 dney
6 days -> 6 dney
... and so on up to 21 ...
20 days -> 20 dney
21 days -> 21 den
22 days -> 22 dnya
I haven't found a way to express this in the pdtLocale_ru constants; because
of that, I didn't translate some of the tests. Any help?

1. http://en.wikipedia.org/wiki/Fusional_language
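The forms Oleg lists follow the standard Russian plural rule, which depends only on the last one or two digits of the number (the same split as the CLDR "one"/"few"/"many" plural categories for integers). A minimal Python 3 sketch; `plural_category`, `DAY_FORMS` and `days` are hypothetical names, not part of parsedatetime.

```python
def plural_category(n: int) -> str:
    """Russian plural category for a non-negative integer n (CLDR-style rule)."""
    last_digit = n % 10
    last_two = n % 100
    if last_digit == 1 and last_two != 11:
        return 'one'    # 1, 21, 31, 101 ...
    if 2 <= last_digit <= 4 and not 12 <= last_two <= 14:
        return 'few'    # 2-4, 22-24, 32-34 ...
    return 'many'       # 0, 5-20, 25-30, 111-114 ...


# Hypothetical table of the forms of "day" (transliterated: den', dnya, dney).
DAY_FORMS = {'one': 'день', 'few': 'дня', 'many': 'дней'}


def days(n: int) -> str:
    """Render "<n> <correct form of day>"."""
    return f'{n} {DAY_FORMS[plural_category(n)]}'


print(', '.join(days(n) for n in (1, 2, 5, 11, 20, 21, 22)))
# 1 день, 2 дня, 5 дней, 11 дней, 20 дней, 21 день, 22 дня
```

For parsing, the practical consequence is that a locale needs only three forms per unit rather than one per number; which form goes with which number matters when generating text, not when recognising it.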

Oleg Broytmann

Apr 24, 2009, 11:07:50 AM
to bear, ParseDateTime-Dev
On Thu, Apr 23, 2009 at 03:32:47PM +0400, Oleg Broytmann wrote:
> The worst problem I have is that Russian is a synthetic language[1]. In
> English the root (stem) is not changed, at most there is a plural form
> (day => days) and possessive case (day => day's). But in Russian words have
> prefixes, suffixes, even the stem is changed.
> 1 day -> 1 den in Russian
> 2 days -> 2 dnya
> 3 days -> 3 dnya
> 4 days -> 4 dnya
> 5 days -> 5 dney
> 6 days -> 6 dney
> ... and so on up to 21 ...
> 20 days -> 20 dney
> 21 days -> 21 den
> 22 days -> 22 dnya

Even worse is the word "god" (year): not only does it change, but in some
forms it has a completely different stem, so it's impossible to match it
with a prefix/stem/suffix matcher:

1 year -> 1 god
2 years -> 2 goda
3 years -> 3 goda
4 years -> 4 goda
5 years -> 5 let (oops!)
6 years -> 6 let
... and so on up to 21 ...
20 years -> 20 let
21 years -> 21 god

I'd be quite happy if parsedatetime parsed any form ('1 god', '1 goda',
'1 let') and just gave me back '+/- 1 year'.
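Oleg's request could be met with a plain lookup table instead of stem matching: list every inflected form, including the suppletive 'лет' (let), and map each one to a canonical unit. A minimal Python 3 sketch; `UNIT_FORMS`, `RELATIVE_RE` and `parse_relative` are hypothetical, and treating a number without 'назад' ("ago") as a positive offset is an assumption.

```python
import re
from typing import Optional, Tuple

# Hypothetical locale table: every inflected form maps to one canonical unit.
# The suppletive plural 'лет' (let) is just another key, so no stem logic.
UNIT_FORMS = {
    'год': 'years', 'года': 'years', 'лет': 'years',   # god, goda, let
    'день': 'days', 'дня': 'days', 'дней': 'days',     # den', dnya, dney
}

# "<number> <any listed form> [назад]", where 'назад' means "ago".
_units = '|'.join(re.escape(form) for form in UNIT_FORMS)
RELATIVE_RE = re.compile(rf'(\d+)\s+({_units})(?:\s+(назад))?', re.IGNORECASE)


def parse_relative(text: str) -> Optional[Tuple[int, str]]:
    """Return (signed count, unit), e.g. '5 лет назад' -> (-5, 'years')."""
    m = RELATIVE_RE.fullmatch(text.strip())
    if m is None:
        return None
    count = int(m.group(1))
    unit = UNIT_FORMS[m.group(2).lower()]
    # Assumption: with 'назад' the offset is in the past, otherwise the future.
    return (-count if m.group(3) else count, unit)


print(parse_relative('5 лет назад'))  # (-5, 'years')
print(parse_relative('1 лет назад'))  # (-1, 'years'): any form is accepted
```

Accepting grammatically "wrong" pairings such as '1 лет' is deliberate here: it is exactly the tolerance Oleg asks for, and it keeps the parser independent of the plural rule.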

bear

Apr 24, 2009, 11:37:43 AM
to parsedat...@googlegroups.com
Wow, that is going to be a challenge.

I may have to make the routine that does the various checks work with
custom functions so that we can replace simple regex parsing with
something more complicated.

Thanks for the examples, and please poke me if I haven't responded in
a couple of days.
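One way to read bear's idea is a chain of matcher functions, each of which either returns a result or declines, so that a locale can plug in a lookup-based matcher where a single regex cannot cope. A minimal Python 3 sketch; `RelativeParser`, `register`, `english_regex` and `russian_lookup` are hypothetical and not parsedatetime's API.

```python
import re
from typing import Callable, List, Optional, Tuple

Result = Tuple[int, str]                    # (signed count, unit)
Matcher = Callable[[str], Optional[Result]]


class RelativeParser:
    """Tries each registered matcher in turn; the first non-None result wins."""

    def __init__(self) -> None:
        self._matchers: List[Matcher] = []

    def register(self, matcher: Matcher) -> Matcher:
        self._matchers.append(matcher)
        return matcher  # returning it allows use as a decorator

    def parse(self, text: str) -> Optional[Result]:
        for matcher in self._matchers:
            result = matcher(text)
            if result is not None:
                return result
        return None


parser = RelativeParser()


@parser.register
def english_regex(text: str) -> Optional[Result]:
    # The simple case: one regex covers 'day'/'days' and 'year'/'years'.
    m = re.fullmatch(r'(\d+)\s+(day|year)s?\s+ago', text.strip())
    return (-int(m.group(1)), m.group(2) + 's') if m else None


RU_UNITS = {'год': 'years', 'года': 'years', 'лет': 'years',
            'день': 'days', 'дня': 'days', 'дней': 'days'}


@parser.register
def russian_lookup(text: str) -> Optional[Result]:
    # A locale-specific function: no shared stem, just a word lookup.
    parts = text.split()
    if len(parts) == 3 and parts[0].isdecimal() and parts[2] == 'назад':
        unit = RU_UNITS.get(parts[1].lower())
        if unit is not None:
            return (-int(parts[0]), unit)
    return None
```

Because each matcher is an ordinary function, the English regex and the Russian lookup coexist: `parser.parse('3 days ago')` gives `(-3, 'days')` and `parser.parse('5 лет назад')` gives `(-5, 'years')`.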