Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Converting DD MM YYYY into YYYY-MM-DD?

20 views
Skip to first unread message

Gilles Ganault

unread,
Aug 17, 2009, 6:26:47 PM8/17/09
to
Hello,

I need to convert DD MM YYYY dates into the MySQL-friendly
YYYY-MM-DD, and translate the month name from literal French to its
numeric equivalent (eg. "Janvier" into "01").

Here's an example:

SELECT dateinscription, dateconnexion FROM membres LIMIT 1;
26 Mai 2007|17 Ao�t 2009 - 09h20

I'd like to update the row into "2007-05-26" and "2009-08-17 09:20",
respectively.

What is the best way to do this in Python?

Thank you.

Message has been deleted

Che M

unread,
Aug 17, 2009, 7:58:28 PM8/17/09
to
On Aug 17, 6:26 pm, Gilles Ganault <nos...@nospam.com> wrote:
> Hello,
>
>         I need to convert DD MM YYYY dates into the MySQL-friendly
> YYYY-MM-DD, and translate the month name from literal French to its
> numeric equivalent (eg. "Janvier" into "01").
>
> Here's an example:
>
> SELECT dateinscription, dateconnexion FROM membres LIMIT 1;
> 26 Mai 2007|17 Août 2009 - 09h20

>
> I'd like to update the row into "2007-05-26" and "2009-08-17 09:20",
> respectively.
>
> What is the best way to do this in Python?
>
> Thank you.

Likely this is not the best way, but I would do, for
the first one (and the same idea for the second):

def convert(date):
frenchdict = {'Mai':'May'} #etc...
day = mystring[:2]
month = frenchdict[ mystring[3:6] ]
year = mystring[7:11]
newdate = year+'-'+month+'-'+day
print 'newdate is ', newdate

Jonathan Gardner

unread,
Aug 17, 2009, 8:05:28 PM8/17/09
to
On Aug 17, 3:26 pm, Gilles Ganault <nos...@nospam.com> wrote:
>         I need to convert DD MM YYYY dates into the MySQL-friendly
> YYYY-MM-DD, and translate the month name from literal French to its
> numeric equivalent (eg. "Janvier" into "01").
>
> Here's an example:
>
> SELECT dateinscription, dateconnexion FROM membres LIMIT 1;
> 26 Mai 2007|17 Août 2009 - 09h20

>
> I'd like to update the row into "2007-05-26" and "2009-08-17 09:20",
> respectively.
>
> What is the best way to do this in Python?
>

Unfortunately, there isn't any string to date parsers in the built-
ins. Not to worry, though, since writing your own is easy, especially
if you use regular expressions from the re module. I suggest using an
RE such as:

r"(?P<date>\d+)\s+(?P<month>\w+)\s+(?P<year>\d+)"

If you want to translate month names to month numbers, then you need
some sort of dict to do so. Unfortunately, there isn't a terrific
standard for this, so your best bet is to put it in some file
somewhere, or even hard-code it in your code. (Month names won't
change over the lifetime of your program, so it's reasonable to put
them in your code somewhere.)

month_names_to_numbers = {
'jan':1, ... }

Once you have the year, month, and date, formatting it is trivial with
the built-in formatter.

"%04d-%02d%02d %02d:%02d" % (year, month, date, hour, minute)

The variety of date formats out there have prevented a universal,
clean solution to this problem. Until we all start sticking to the
same conventions, we will always have to write code to translate dates
from one format to another.

Rami Chowdhury

unread,
Aug 17, 2009, 8:18:33 PM8/17/09
to Che M, pytho...@python.org
Correct me if I'm wrong, but doesn't
http://docs.python.org/library/datetime.html#datetime.datetime.strptime do
this?

>>> import locale
>>> locale.setlocale(locale.LC_ALL, 'FR')
'French_France.1252'
>>> date_str = '05 Mai 2009 - 18h25'
>>> fmt = '%d %B %Y - %Hh%M'
>>> date_obj = datetime.strptime(date_str, fmt)
>>> date_obj
datetime.datetime(2009, 5, 5, 18, 25)
>>> date_obj.strftime('%Y-%m-%d %H:%M')
'2009-05-05 18:25'

If you're using a recent enough version of Python (2.5 and up) I'd imagine
that's the best way to do it?

--
Rami Chowdhury
"Never attribute to malice that which can be attributed to stupidity" --
Hanlon's Razor
408-597-7068 (US) / 07875-841-046 (UK) / 0189-245544 (BD)

Ben Finney

unread,
Aug 17, 2009, 8:20:05 PM8/17/09
to
Gilles Ganault <nos...@nospam.com> writes:

> I need to convert DD MM YYYY dates into the MySQL-friendly
> YYYY-MM-DD

This is not specific to MySQL. It is the common international standard
date representation format defined by ISO 8601.

> and translate the month name from literal French to its numeric
> equivalent (eg. "Janvier" into "01").

The simplest way to do this would be by a mapping from month-name to
month-number.

An obvious, and wrong, approach to this would be to hard-code the twelve
month names into your program data.

Instead, you should generate the map based on the standard library (in
this case, the underlying C standard library) locale database
<URL:http://docs.python.org/library/locale.html?highlight=locale%20date#locale.nl_langinfo>:

>>> import locale
>>> locale.setlocale(locale.LC_TIME, "en_AU.UTF-8")
>>> months = dict(
... (locale.nl_langinfo(getattr(locale, key)), i)
... for (key, i) in (
... ('MON_%(i)d' % vars(), i)
... for i in range(1, 12+1)))

>>> import pprint
>>> pprint.pprint(months)
{'April': 4,
'August': 8,
'December': 12,
'February': 2,
'January': 1,
'July': 7,
'June': 6,
'March': 3,
'May': 5,
'November': 11,
'October': 10,
'September': 9}

Of course, if you can avoid having to generate this mapping at all in
your program, that's best; see below.

> Here's an example:
>
> SELECT dateinscription, dateconnexion FROM membres LIMIT 1;

> 26 Mai 2007|17 Août 2009 - 09h20


>
> I'd like to update the row into "2007-05-26" and "2009-08-17 09:20",
> respectively.

Storing a timestamp as a text attribute in a database seems perverse and
begging for trouble. Doesn't the database have a timestamp data type? Or
perhaps that's what you're trying to achieve?

> What is the best way to do this in Python?

The ‘datetime.strptime’ function will create a Python ‘datetime’ object
from a string, parsed according to a format
<URL:http://docs.python.org/library/datetime.html?highlight=parse%20date%20time#datetime.datetime.strptime>.

I don't know whether that function allows for month names in the current
locale (as set by ‘locale.setlocale(locale.LC_TIME, …)’). If it does,
that's the right way, since it doesn't involve explciitly generating the
mapping as shown above.

Use your preferred Python-to-database library to feed that ‘datetime’
object directly to the database and store it in an attribute of the
native database timestamp type.

Then, format the timestamp value at the point of outputting that value,
instead of storing the text representation in the database.

--
\ “To succeed in the world it is not enough to be stupid, you |
`\ must also be well-mannered.” —Voltaire |
_o__) |
Ben Finney

Ben Finney

unread,
Aug 17, 2009, 10:06:37 PM8/17/09
to
Jonathan Gardner <jgar...@jonathangardner.net> writes:

> Unfortunately, there isn't any string to date parsers in the built-
> ins.

Fortunately, Python 2.5 or later has the ‘datetime.strptime’ function.

--
\ “You could augment an earwig to the point where it understood |
`\ nuclear physics, but it would still be a very stupid thing to |
_o__) do!” —The Doctor, _The Two Doctors_ |
Ben Finney

Jonathan Gardner

unread,
Aug 17, 2009, 10:31:36 PM8/17/09
to
On Aug 17, 5:20 pm, Ben Finney <ben+pyt...@benfinney.id.au> wrote:
>
> Instead, you should generate the map based on the standard library (in
> this case, the underlying C standard library) locale database
> <URL:http://docs.python.org/library/locale.html?highlight=locale%20date#lo...>:
>

Does Windows support POSIX locales?

Jonathan Gardner

unread,
Aug 17, 2009, 10:36:24 PM8/17/09
to
On Aug 17, 7:06 pm, Ben Finney <ben+pyt...@benfinney.id.au> wrote:

> Jonathan Gardner <jgard...@jonathangardner.net> writes:
> > Unfortunately, there isn't any string to date parsers in the built-
> > ins.
>
> Fortunately, Python 2.5 or later has the ‘datetime.strptime’ function.
>

Hate to weasel out of this one, but the language that strptime
provides is pretty limited. I don't find it useful except in the
trivial cases. Same goes for strftime. Also, both of these are very
Western European centric. Yes, Asian languages are supported but not
naturally.

Jonathan Gardner

unread,
Aug 17, 2009, 10:38:34 PM8/17/09
to
On Aug 17, 5:18 pm, "Rami Chowdhury" <rami.chowdh...@gmail.com> wrote:
>
> >>> import locale
> >>> locale.setlocale(locale.LC_ALL, 'FR')

locale is nice when you only have a single thread.

Webservers aren't single threaded. You can't serve up one page for one
locale and then another in another locale without seeing very, very
weird behavior.

Ben Finney

unread,
Aug 17, 2009, 10:46:24 PM8/17/09
to
Jonathan Gardner <jgar...@jonathangardner.net> writes:

If it does not, it should :-) since it addresses the problem in one
standard place. It would be foolish for Python to re-implement that
functionality when presumably the operating system already knows how to
map between dates and locale-specific text representations.

You'll need to check the operating system documentation for what
alternative it might provide.

--
\ “I got fired from my job the other day. They said my |
`\ personality was weird. … That's okay, I have four more.” |
_o__) —Bug-Eyed Earl, _Red Meat_ |
Ben Finney

Rami Chowdhury

unread,
Aug 17, 2009, 11:22:05 PM8/17/09
to pytho...@python.org, Ben Finney

----
My sample interactive session (locale.setlocale and all) was on a 32-bit Vista
install of Python 2.5, so it works on that...

---
Rami Chowdhury
"A man with a watch knows what time it is. A man with two watches is never
sure". -- Segal's Law


408-597-7068 (US) / 07875-841-046 (UK) / 0189-245544 (BD)

Gilles Ganault

unread,
Aug 18, 2009, 2:59:57 AM8/18/09
to
Thanks everyone for the help. This script is just a one-shot thingie
on my work host, not as a web script or anything professional.

On Mon, 17 Aug 2009 17:05:28 -0700 (PDT), Jonathan Gardner
<jgar...@jonathangardner.net> wrote:
>Unfortunately, there isn't any string to date parsers in the built-
>ins. Not to worry, though, since writing your own is easy, especially
>if you use regular expressions from the re module. I suggest using an
>RE such as:
>
> r"(?P<date>\d+)\s+(?P<month>\w+)\s+(?P<year>\d+)"

I've never seen regexes like this. I'm curious to know what those
mean:

r = Unicode?

(?P<date> = ? means that it shouldn't be greedy, what about P<date>?

Ben Finney

unread,
Aug 18, 2009, 3:10:50 AM8/18/09
to
Gilles Ganault <nos...@nospam.com> writes:

> Thanks everyone for the help. This script is just a one-shot thingie
> on my work host, not as a web script or anything professional.
>
> On Mon, 17 Aug 2009 17:05:28 -0700 (PDT), Jonathan Gardner
> <jgar...@jonathangardner.net> wrote:
> > r"(?P<date>\d+)\s+(?P<month>\w+)\s+(?P<year>\d+)"
>
> I've never seen regexes like this. I'm curious to know what those
> mean:

Luckily, you have access to the documentation to find out.

> r = Unicode?

<URL:http://docs.python.org/reference/lexical_analysis.html#string-literals>

> (?P<date> = ? means that it shouldn't be greedy, what about P<date>?

<URL:http://docs.python.org/library/re#regular-expression-syntax>

--
\ “We reserve the right to serve refuse to anyone.” —restaurant, |
`\ Japan |
_o__) |
Ben Finney

Gilles Ganault

unread,
Aug 18, 2009, 3:49:41 AM8/18/09
to
On Tue, 18 Aug 2009 17:10:50 +1000, Ben Finney
<ben+p...@benfinney.id.au> wrote:
>Luckily, you have access to the documentation to find out.

I never used groups before. Thanks for showing me.

At this point, the script is almost done, but the regex fails if the
month contains accented characters (eg. "Ao�t", but fine if eg.
"Jan").

Adding a line to load the French locale doesn't help :-/

Any idea what I could do to keep the regex happy?

Thank you.

==============
import re
import apsw
import locale

#In case error due to accent in month name, but no soup 4 U
locale.setlocale(locale.LC_ALL, 'FR')

connection=apsw.Connection("test.sqlite")
cursor=connection.cursor()

re_inscription =
re.compile(r"(?P<date>\d+)\s+(?P<month>\w+)\s+(?P<year>\d+)")

sql = 'SELECT id,dateinscription,dateconnexion FROM mytable'
rows=list(cursor.execute(sql))
for row in rows:
dateinscription = row[1]
dateconnexion = row[2]

#Prints OK
print dateinscription

m = re_inscription.search(dateinscription)
if m:
day = m.group("date")
month = m.group("month")
year = m.group("year")
print "%s-%s-%s" % (year,month,day)
else:
print "No go"
==============

Rami Chowdhury

unread,
Aug 18, 2009, 4:11:20 AM8/18/09
to pytho...@python.org, Gilles Ganault
Could you let me know which platform this is on (Windows, *nix)? It may be a
locale encoding issue -- the locale.setlocale() function allows the second
argument to be a tuple of (locale_code, encoding), as below:

locale.setlocale(locale.LC_ALL, ('FR', 'UTF-8'))

Since this is for a one-shot (and presumably threading-agnostic) program, and
a fairly trivially formatted date-string, I would suggest using
datetime.strptime
(http://docs.python.org/library/datetime.html#datetime.datetime.strptime) and
not regular expressions (which IIRC have Issues with non-ASCII characters).


----
Rami Chowdhury
"Ninety percent of everything is crap." -- Sturgeon's Law


408-597-7068 (US) / 07875-841-046 (UK) / 0189-245544 (BD)

On Tuesday 18 August 2009 00:49:41 Gilles Ganault wrote:
> On Tue, 18 Aug 2009 17:10:50 +1000, Ben Finney
>
> <ben+p...@benfinney.id.au> wrote:
> >Luckily, you have access to the documentation to find out.
>
> I never used groups before. Thanks for showing me.
>
> At this point, the script is almost done, but the regex fails if the

> month contains accented characters (eg. "Août", but fine if eg.

Gilles Ganault

unread,
Aug 18, 2009, 4:19:53 AM8/18/09
to
On Tue, 18 Aug 2009 01:11:20 -0700, Rami Chowdhury
<rami.ch...@gmail.com> wrote:
>Could you let me know which platform this is on (Windows, *nix)? It may be a
>locale encoding issue -- the locale.setlocale() function allows the second
>argument to be a tuple of (locale_code, encoding), as below:
>
>locale.setlocale(locale.LC_ALL, ('FR', 'UTF-8'))

It's on XP, and I'm using ActivePython 2.5.1.1.
http://www.activestate.com/activepython/

Python doesn't like the above:

#locale.Error: unsupported locale setting


locale.setlocale(locale.LC_ALL, ('FR', 'UTF-8'))

Maybe it was introduced in more recent versions of Python?

>Since this is for a one-shot (and presumably threading-agnostic) program, and
>a fairly trivially formatted date-string, I would suggest using
>datetime.strptime
>(http://docs.python.org/library/datetime.html#datetime.datetime.strptime) and
>not regular expressions (which IIRC have Issues with non-ASCII characters).

If the regex library can only handle basic latin characters, I'll wait
until a script I'm running is done, and I'll upgrade to the 2.6.2.2 to
see how it goes.

Thank you.

Rami Chowdhury

unread,
Aug 18, 2009, 4:42:26 AM8/18/09
to pytho...@python.org, Gilles Ganault
> Python doesn't like the above:
>
> #locale.Error: unsupported locale setting
> locale.setlocale(locale.LC_ALL, ('FR', 'UTF-8'))
>
> Maybe it was introduced in more recent versions of Python?
Hmm, that's odd. According to the docs
(http://docs.python.org/library/locale.html#locale.setlocale) it's been that
way since 2.0, but I've just checked this on my Windows (Vista) machine and
you're right, it returns an error.

This worked for me on 32-bit Vista:

locale.setlocale(locale.LC_ALL, 'FR')

It uses Windows-1252 for the encoding, but that seems to deal with the
circonflexe in 'Août' just fine, so it should work for this purpose.


----
Rami Chowdhury
"Never attributed to malice that which can be attributed to stupidity." --
Hanlon's Razor


408-597-7068 (US) / 07875-841-046 (UK) / 0189-245544 (BD)

Gilles Ganault

unread,
Aug 18, 2009, 4:52:41 AM8/18/09
to
I find it odd that the regex library can't handle European characters
:-/

Gilles Ganault

unread,
Aug 18, 2009, 5:07:21 AM8/18/09
to
On Tue, 18 Aug 2009 10:52:41 +0200, Gilles Ganault <nos...@nospam.com>
wrote:

>I find it odd that the regex library can't handle European characters
>:-/

Ha, found it! :-)

http://www.regular-expressions.info/python.html

=========
# -*- coding: latin-1 -*-

import locale
import re

locale.setlocale(locale.LC_ALL, 'FR')

re_inscription =
re.compile(r"(?P<date>\d+)\s+(?P<month>\w+)\s+(?P<year>\d+)",re.LOCALE)

dateinscription = "11 Ao�t 2008"



m = re_inscription.search(dateinscription)
if m:
day = m.group("date")
month = m.group("month")
year = m.group("year")
print "%s-%s-%s" % (year,month,day)
else:

print "Yuck"
=========

Thanks everyone!

Ben Finney

unread,
Aug 18, 2009, 6:03:47 AM8/18/09
to
Gilles Ganault <nos...@nospam.com> writes:

> dateinscription = "11 Août 2008"

For any text string that's not ASCII, you should specify it as Unicode.
(Actually, you should specify text as Unicode anyway.) For a literal
text string:

dateinscription = u"11 Août 2008"

If you're using exclusively Python 3, you will get Unicode text literals
by default; but I assume you're using Python 2 based on existing
discussion.

The principles of handling text in Python: Get it to internal Unicode
objects as soon as possible, handle it as Unicode for as long as
possible, and only encode it to some byte stream for output as late as
possible.

--
\ “When a well-packaged web of lies has been sold to the masses |
`\ over generations, the truth will seem utterly preposterous and |
_o__) its speaker a raving lunatic.” —Dresden James |
Ben Finney

Gilles Ganault

unread,
Aug 18, 2009, 6:21:07 AM8/18/09
to
On Tue, 18 Aug 2009 20:03:47 +1000, Ben Finney
<ben+p...@benfinney.id.au> wrote:
>The principles of handling text in Python: Get it to internal Unicode
>objects as soon as possible, handle it as Unicode for as long as
>possible, and only encode it to some byte stream for output as late as
>possible.

Thanks much for the tip. I'll keep that in mind when I have strings
with accents.

Ben Finney

unread,
Aug 18, 2009, 6:37:47 AM8/18/09
to
Gilles Ganault <nos...@nospam.com> writes:

Again, note that these recommendations hold for *any* text in Python,
with or without accents; once you accept that text is best handled in
Unicode, there's little sense in making an exception for the limited
subset that happens to be representable in ASCII.

--
\ “The Bermuda Triangle got tired of warm weather. It moved to |
`\ Alaska. Now Santa Claus is missing.” —Steven Wright |
_o__) |
Ben Finney

Stefan Behnel

unread,
Aug 18, 2009, 7:55:06 AM8/18/09
to
Ben Finney wrote:
>>> The principles of handling text in Python: Get it to internal Unicode
>>> objects as soon as possible, handle it as Unicode for as long as
>>> possible, and only encode it to some byte stream for output as late as
>>> possible.
> Again, note that these recommendations hold for *any* text in Python,
> with or without accents; once you accept that text is best handled in
> Unicode, there's little sense in making an exception for the limited
> subset that happens to be representable in ASCII.

If the QOTW wasn't meant for fun, I'd vote for this. This is very good advice.

Stefan

Mark Tolonen

unread,
Aug 20, 2009, 12:52:36 AM8/20/09
to pytho...@python.org

"Gilles Ganault" <nos...@nospam.com> wrote in message
news:rtqk859vm3rkdfor0...@4ax.com...

>I find it odd that the regex library can't handle European characters

It can. Read the documentation about the re.LOCALE flag.

-Mark


0 new messages