I need to convert DD MM YYYY dates into the MySQL-friendly
YYYY-MM-DD, and translate the month name from literal French to its
numeric equivalent (eg. "Janvier" into "01").
Here's an example:
SELECT dateinscription, dateconnexion FROM membres LIMIT 1;
26 Mai 2007|17 Ao�t 2009 - 09h20
I'd like to update the row into "2007-05-26" and "2009-08-17 09:20",
respectively.
What is the best way to do this in Python?
Thank you.
Likely this is not the best way, but I would do, for
the first one (and the same idea for the second):
def convert(date):
frenchdict = {'Mai':'May'} #etc...
day = mystring[:2]
month = frenchdict[ mystring[3:6] ]
year = mystring[7:11]
newdate = year+'-'+month+'-'+day
print 'newdate is ', newdate
Unfortunately, there isn't any string to date parsers in the built-
ins. Not to worry, though, since writing your own is easy, especially
if you use regular expressions from the re module. I suggest using an
RE such as:
r"(?P<date>\d+)\s+(?P<month>\w+)\s+(?P<year>\d+)"
If you want to translate month names to month numbers, then you need
some sort of dict to do so. Unfortunately, there isn't a terrific
standard for this, so your best bet is to put it in some file
somewhere, or even hard-code it in your code. (Month names won't
change over the lifetime of your program, so it's reasonable to put
them in your code somewhere.)
month_names_to_numbers = {
'jan':1, ... }
Once you have the year, month, and date, formatting it is trivial with
the built-in formatter.
"%04d-%02d%02d %02d:%02d" % (year, month, date, hour, minute)
The variety of date formats out there have prevented a universal,
clean solution to this problem. Until we all start sticking to the
same conventions, we will always have to write code to translate dates
from one format to another.
>>> import locale
>>> locale.setlocale(locale.LC_ALL, 'FR')
'French_France.1252'
>>> date_str = '05 Mai 2009 - 18h25'
>>> fmt = '%d %B %Y - %Hh%M'
>>> date_obj = datetime.strptime(date_str, fmt)
>>> date_obj
datetime.datetime(2009, 5, 5, 18, 25)
>>> date_obj.strftime('%Y-%m-%d %H:%M')
'2009-05-05 18:25'
If you're using a recent enough version of Python (2.5 and up) I'd imagine
that's the best way to do it?
--
Rami Chowdhury
"Never attribute to malice that which can be attributed to stupidity" --
Hanlon's Razor
408-597-7068 (US) / 07875-841-046 (UK) / 0189-245544 (BD)
> I need to convert DD MM YYYY dates into the MySQL-friendly
> YYYY-MM-DD
This is not specific to MySQL. It is the common international standard
date representation format defined by ISO 8601.
> and translate the month name from literal French to its numeric
> equivalent (eg. "Janvier" into "01").
The simplest way to do this would be by a mapping from month-name to
month-number.
An obvious, and wrong, approach to this would be to hard-code the twelve
month names into your program data.
Instead, you should generate the map based on the standard library (in
this case, the underlying C standard library) locale database
<URL:http://docs.python.org/library/locale.html?highlight=locale%20date#locale.nl_langinfo>:
>>> import locale
>>> locale.setlocale(locale.LC_TIME, "en_AU.UTF-8")
>>> months = dict(
... (locale.nl_langinfo(getattr(locale, key)), i)
... for (key, i) in (
... ('MON_%(i)d' % vars(), i)
... for i in range(1, 12+1)))
>>> import pprint
>>> pprint.pprint(months)
{'April': 4,
'August': 8,
'December': 12,
'February': 2,
'January': 1,
'July': 7,
'June': 6,
'March': 3,
'May': 5,
'November': 11,
'October': 10,
'September': 9}
Of course, if you can avoid having to generate this mapping at all in
your program, that's best; see below.
> Here's an example:
>
> SELECT dateinscription, dateconnexion FROM membres LIMIT 1;
> 26 Mai 2007|17 Août 2009 - 09h20
>
> I'd like to update the row into "2007-05-26" and "2009-08-17 09:20",
> respectively.
Storing a timestamp as a text attribute in a database seems perverse and
begging for trouble. Doesn't the database have a timestamp data type? Or
perhaps that's what you're trying to achieve?
> What is the best way to do this in Python?
The ‘datetime.strptime’ function will create a Python ‘datetime’ object
from a string, parsed according to a format
<URL:http://docs.python.org/library/datetime.html?highlight=parse%20date%20time#datetime.datetime.strptime>.
I don't know whether that function allows for month names in the current
locale (as set by ‘locale.setlocale(locale.LC_TIME, …)’). If it does,
that's the right way, since it doesn't involve explciitly generating the
mapping as shown above.
Use your preferred Python-to-database library to feed that ‘datetime’
object directly to the database and store it in an attribute of the
native database timestamp type.
Then, format the timestamp value at the point of outputting that value,
instead of storing the text representation in the database.
--
\ “To succeed in the world it is not enough to be stupid, you |
`\ must also be well-mannered.” —Voltaire |
_o__) |
Ben Finney
> Unfortunately, there isn't any string to date parsers in the built-
> ins.
Fortunately, Python 2.5 or later has the ‘datetime.strptime’ function.
--
\ “You could augment an earwig to the point where it understood |
`\ nuclear physics, but it would still be a very stupid thing to |
_o__) do!” —The Doctor, _The Two Doctors_ |
Ben Finney
Does Windows support POSIX locales?
Hate to weasel out of this one, but the language that strptime
provides is pretty limited. I don't find it useful except in the
trivial cases. Same goes for strftime. Also, both of these are very
Western European centric. Yes, Asian languages are supported but not
naturally.
locale is nice when you only have a single thread.
Webservers aren't single threaded. You can't serve up one page for one
locale and then another in another locale without seeing very, very
weird behavior.
If it does not, it should :-) since it addresses the problem in one
standard place. It would be foolish for Python to re-implement that
functionality when presumably the operating system already knows how to
map between dates and locale-specific text representations.
You'll need to check the operating system documentation for what
alternative it might provide.
--
\ “I got fired from my job the other day. They said my |
`\ personality was weird. … That's okay, I have four more.” |
_o__) —Bug-Eyed Earl, _Red Meat_ |
Ben Finney
----
My sample interactive session (locale.setlocale and all) was on a 32-bit Vista
install of Python 2.5, so it works on that...
---
Rami Chowdhury
"A man with a watch knows what time it is. A man with two watches is never
sure". -- Segal's Law
408-597-7068 (US) / 07875-841-046 (UK) / 0189-245544 (BD)
On Mon, 17 Aug 2009 17:05:28 -0700 (PDT), Jonathan Gardner
<jgar...@jonathangardner.net> wrote:
>Unfortunately, there isn't any string to date parsers in the built-
>ins. Not to worry, though, since writing your own is easy, especially
>if you use regular expressions from the re module. I suggest using an
>RE such as:
>
> r"(?P<date>\d+)\s+(?P<month>\w+)\s+(?P<year>\d+)"
I've never seen regexes like this. I'm curious to know what those
mean:
r = Unicode?
(?P<date> = ? means that it shouldn't be greedy, what about P<date>?
> Thanks everyone for the help. This script is just a one-shot thingie
> on my work host, not as a web script or anything professional.
>
> On Mon, 17 Aug 2009 17:05:28 -0700 (PDT), Jonathan Gardner
> <jgar...@jonathangardner.net> wrote:
> > r"(?P<date>\d+)\s+(?P<month>\w+)\s+(?P<year>\d+)"
>
> I've never seen regexes like this. I'm curious to know what those
> mean:
Luckily, you have access to the documentation to find out.
> r = Unicode?
<URL:http://docs.python.org/reference/lexical_analysis.html#string-literals>
> (?P<date> = ? means that it shouldn't be greedy, what about P<date>?
<URL:http://docs.python.org/library/re#regular-expression-syntax>
--
\ “We reserve the right to serve refuse to anyone.” —restaurant, |
`\ Japan |
_o__) |
Ben Finney
I never used groups before. Thanks for showing me.
At this point, the script is almost done, but the regex fails if the
month contains accented characters (eg. "Ao�t", but fine if eg.
"Jan").
Adding a line to load the French locale doesn't help :-/
Any idea what I could do to keep the regex happy?
Thank you.
==============
import re
import apsw
import locale
#In case error due to accent in month name, but no soup 4 U
locale.setlocale(locale.LC_ALL, 'FR')
connection=apsw.Connection("test.sqlite")
cursor=connection.cursor()
re_inscription =
re.compile(r"(?P<date>\d+)\s+(?P<month>\w+)\s+(?P<year>\d+)")
sql = 'SELECT id,dateinscription,dateconnexion FROM mytable'
rows=list(cursor.execute(sql))
for row in rows:
dateinscription = row[1]
dateconnexion = row[2]
#Prints OK
print dateinscription
m = re_inscription.search(dateinscription)
if m:
day = m.group("date")
month = m.group("month")
year = m.group("year")
print "%s-%s-%s" % (year,month,day)
else:
print "No go"
==============
locale.setlocale(locale.LC_ALL, ('FR', 'UTF-8'))
Since this is for a one-shot (and presumably threading-agnostic) program, and
a fairly trivially formatted date-string, I would suggest using
datetime.strptime
(http://docs.python.org/library/datetime.html#datetime.datetime.strptime) and
not regular expressions (which IIRC have Issues with non-ASCII characters).
----
Rami Chowdhury
"Ninety percent of everything is crap." -- Sturgeon's Law
408-597-7068 (US) / 07875-841-046 (UK) / 0189-245544 (BD)
On Tuesday 18 August 2009 00:49:41 Gilles Ganault wrote:
> On Tue, 18 Aug 2009 17:10:50 +1000, Ben Finney
>
> <ben+p...@benfinney.id.au> wrote:
> >Luckily, you have access to the documentation to find out.
>
> I never used groups before. Thanks for showing me.
>
> At this point, the script is almost done, but the regex fails if the
> month contains accented characters (eg. "Août", but fine if eg.
It's on XP, and I'm using ActivePython 2.5.1.1.
http://www.activestate.com/activepython/
Python doesn't like the above:
#locale.Error: unsupported locale setting
locale.setlocale(locale.LC_ALL, ('FR', 'UTF-8'))
Maybe it was introduced in more recent versions of Python?
>Since this is for a one-shot (and presumably threading-agnostic) program, and
>a fairly trivially formatted date-string, I would suggest using
>datetime.strptime
>(http://docs.python.org/library/datetime.html#datetime.datetime.strptime) and
>not regular expressions (which IIRC have Issues with non-ASCII characters).
If the regex library can only handle basic latin characters, I'll wait
until a script I'm running is done, and I'll upgrade to the 2.6.2.2 to
see how it goes.
Thank you.
This worked for me on 32-bit Vista:
locale.setlocale(locale.LC_ALL, 'FR')
It uses Windows-1252 for the encoding, but that seems to deal with the
circonflexe in 'Août' just fine, so it should work for this purpose.
----
Rami Chowdhury
"Never attributed to malice that which can be attributed to stupidity." --
Hanlon's Razor
408-597-7068 (US) / 07875-841-046 (UK) / 0189-245544 (BD)
>I find it odd that the regex library can't handle European characters
>:-/
Ha, found it! :-)
http://www.regular-expressions.info/python.html
=========
# -*- coding: latin-1 -*-
import locale
import re
locale.setlocale(locale.LC_ALL, 'FR')
re_inscription =
re.compile(r"(?P<date>\d+)\s+(?P<month>\w+)\s+(?P<year>\d+)",re.LOCALE)
dateinscription = "11 Ao�t 2008"
m = re_inscription.search(dateinscription)
if m:
day = m.group("date")
month = m.group("month")
year = m.group("year")
print "%s-%s-%s" % (year,month,day)
else:
print "Yuck"
=========
Thanks everyone!
> dateinscription = "11 Août 2008"
For any text string that's not ASCII, you should specify it as Unicode.
(Actually, you should specify text as Unicode anyway.) For a literal
text string:
dateinscription = u"11 Août 2008"
If you're using exclusively Python 3, you will get Unicode text literals
by default; but I assume you're using Python 2 based on existing
discussion.
The principles of handling text in Python: Get it to internal Unicode
objects as soon as possible, handle it as Unicode for as long as
possible, and only encode it to some byte stream for output as late as
possible.
--
\ “When a well-packaged web of lies has been sold to the masses |
`\ over generations, the truth will seem utterly preposterous and |
_o__) its speaker a raving lunatic.” —Dresden James |
Ben Finney
Thanks much for the tip. I'll keep that in mind when I have strings
with accents.
Again, note that these recommendations hold for *any* text in Python,
with or without accents; once you accept that text is best handled in
Unicode, there's little sense in making an exception for the limited
subset that happens to be representable in ASCII.
--
\ “The Bermuda Triangle got tired of warm weather. It moved to |
`\ Alaska. Now Santa Claus is missing.” —Steven Wright |
_o__) |
Ben Finney
If the QOTW wasn't meant for fun, I'd vote for this. This is very good advice.
Stefan
It can. Read the documentation about the re.LOCALE flag.
-Mark