django.utils.dateparse


Giuseppe De Marco

Feb 3, 2019, 11:27:40 AM
to django-d...@googlegroups.com
Hi all, this is my first time on this mailing list.

I'd like to propose a refactor of the django.utils.dateparse functions.
Currently a function in it, parse_date for example, extracts the date from the string with a static regexp...


The first time I used it, I thought those functions parsed dates, times and datetimes according to settings.py definitions like DATETIME_INPUT_FORMATS, but that was not the case... Then I read the code.

Wouldn't it be better to use the settings.py definitions to manage these formats and the behaviour of django.utils.dateparse?



Aymeric Augustin

Feb 3, 2019, 12:44:05 PM
to django-d...@googlegroups.com
Hello Guiseppe,

django.utils.dateparse provides helpers needed by Django to implement datetime, date and time fields on SQLite. (SQLite doesn't have a native date time type.) Their job is to parse ISO 8601 fast. That's it.

A utility module should do exactly what Django needs and nothing more. django.utils.dateparse is documented so you can use it if it does what you want. If it doesn't, use something else ;-)

Django forms try various formats when they receive user input because there's uncertainty about the format. If you're parsing user input, then you should use a form and you'll get the behavior you want.

To clarify my point about performance, here's the function you're proposing, minus support for USE_L10N:

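import datetime
from django.conf import settings

def parse_datetime_alternative(value):
    for format in settings.DATETIME_INPUT_FORMATS:
        try:
            # strptime takes the string first, then the format
            return datetime.datetime.strptime(value, format)
        except (ValueError, TypeError):
            continue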

It's 10 times slower than the current implementation of parse_datetime:

$ python -m timeit -s "from django.conf import settings; settings.configured or settings.configure(); from django.utils.dateparse import parse_datetime_alternative as parse_datetime" "parse_datetime('2019-02-03T17:27:58.645194')"
5000 loops, best of 5: 54.2 usec per loop

$ python -m timeit -s "from django.utils.dateparse import parse_datetime" "parse_datetime('2019-02-03T17:27:58.645194')"
50000 loops, best of 5: 5.48 usec per loop

I implemented parse_datetime with a regex because that's almost twice as fast as a single call to datetime.datetime.strptime:

$ python -m timeit -s "import datetime" "datetime.datetime.strptime('2019-02-03T17:27:58.645194', '%Y-%m-%dT%H:%M:%S.%f')"
20000 loops, best of 5: 9.87 usec per loop

Best regards,

-- 
Aymeric.
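
To make the regex-versus-strptime comparison in the message above concrete, here is a minimal standard-library sketch. The names `ISO_DATETIME_RE` and `parse_iso_datetime` are hypothetical and exist only for illustration; the pattern is a simplified one, not the regex Django ships, and it ignores time zone offsets. Absolute timings will differ from machine to machine.

```python
import datetime
import re
import timeit

# Simplified ISO 8601 pattern (illustrative only): date, 'T' or space, time,
# optional fractional seconds of 1 to 6 digits. No time zone support.
ISO_DATETIME_RE = re.compile(
    r"(?P<year>\d{4})-(?P<month>\d{1,2})-(?P<day>\d{1,2})"
    r"[T ](?P<hour>\d{1,2}):(?P<minute>\d{1,2}):(?P<second>\d{1,2})"
    r"(?:\.(?P<microsecond>\d{1,6}))?$"
)


def parse_iso_datetime(value):
    """Return a naive datetime, or None if the string does not match."""
    match = ISO_DATETIME_RE.match(value)
    if match is None:
        return None
    parts = match.groupdict()
    # Right-pad the fraction so that '.5' means 500000 microseconds.
    parts["microsecond"] = (parts["microsecond"] or "").ljust(6, "0")
    return datetime.datetime(**{name: int(text) for name, text in parts.items()})


if __name__ == "__main__":
    sample = "2019-02-03T17:27:58.645194"
    fmt = "%Y-%m-%dT%H:%M:%S.%f"
    print(parse_iso_datetime(sample))
    # Time both approaches; the numbers depend on the machine and Python version.
    n = 20000
    t_regex = timeit.timeit(lambda: parse_iso_datetime(sample), number=n)
    t_strptime = timeit.timeit(
        lambda: datetime.datetime.strptime(sample, fmt), number=n
    )
    print("regex:    {:.2f} usec per call".format(t_regex / n * 1e6))
    print("strptime: {:.2f} usec per call".format(t_strptime / n * 1e6))
```

A single precompiled pattern does one match and one constructor call, whereas a loop over several formats pays for every failed attempt before the one that succeeds.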



--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/CABms%2BYpGtCQ1yzVfGtRKpDv60-4oH_SyGpPQYR6j1cR%2BL5LFRA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Giuseppe De Marco

Feb 3, 2019, 6:10:32 PM
to django-d...@googlegroups.com
Hi Aymeric,

Thank you for the answer and for the tests as well.
I understand and agree with your view across the board, and I see the specific purpose of parse_date.
I'd like to introduce a generalized way to parse date and datetime strings based on the Django project configuration in settings.py.
It could be tied to DATE_INPUT_FORMATS or DATE_FORMATS, or whatever choice works better.

This is the code example. It works standalone, and I hope it's easy to read...
````
# datetime_euristic_parser.py
import re
import datetime

DATE_FORMATS = ['%Y-%m-%d',
                '%d/%m/%Y',
                '%d/%m/%y']
DATETIME_FORMATS = ['%Y-%m-%d %H:%M:%S',
                    '%d/%m/%Y %H:%M:%S',
                    '%d/%m/%y %H:%M:%S',
                    '%Y%m%d%H%M%SZ',
                    '%Y%m%d%H%M%S.%fZ']
# to be extended with all the matching patterns.
DATETIME_ELEMENTS_REGEXP = {'%Y': r'(?P<year>\d{4})',
                            '%y': r'(?P<year>\d{2})',
                            '%m': r'(?P<month>\d{1,2})',
                            '%d': r'(?P<day>\d{1,2})',
                            '%H': r'(?P<hour>\d{1,2})',
                            '%M': r'(?P<minute>\d{1,2})',
                            '%S': r'(?P<second>\d{1,2})',
                            '%f': r'(?P<microsecond>\d{6})'} # ...
                    
def datetime_regexp_builder(formats):
    """
    formats = DATE_FORMATS or DATETIME_FORMATS
    """
    regexp_dict = {}
    for df in formats:
        df_regexp = df
        for k,v in DATETIME_ELEMENTS_REGEXP.items():
            df_regexp = df_regexp.replace(k,v)
        regexp_dict[df] = df_regexp+'$'
    return regexp_dict

DATE_FORMATS_REGEXP = datetime_regexp_builder(DATE_FORMATS)
DATETIME_FORMATS_REGEXP = datetime_regexp_builder(DATETIME_FORMATS)

def dformat_insp(date_str, format_regexp_dict, debug=False):
    """
    Takes a date string and returns a matching date regexp.
    """
    insp_formats = []
    for f,p in format_regexp_dict.items():
        if debug: print(date_str, f, p)
        match = re.match(p, date_str)
        if match:
            res = (f, p, {k:int(v) for k,v in match.groupdict().items()})
            insp_formats.append(res)
    return insp_formats

def dateformat_insp(date_str):
    return dformat_insp(date_str, DATE_FORMATS_REGEXP)

def datetimeformat_insp(date_str):
    return dformat_insp(date_str, DATETIME_FORMATS_REGEXP)

def datetime_euristic_parser(value):
    """
    value can be a datestring or a datetimestring
    returns all the parsed date or datetime object
    """
    l = []
    res = dateformat_insp(value) or \
          datetimeformat_insp(value)
    for i in res:
        l.append(datetime.datetime(**i[-1]))
    return l

# example
if __name__ == '__main__':
    tests = ['04/12/2018',
             '04/12/2018 3:2:1',
             '2018-03-4 09:7:4',
             '2018-03-04T09:7:4.645194',
             '20180304121940.948000Z']
   
    for i in tests:
        res = dateformat_insp(i) or datetimeformat_insp(i)
        if res:
            print('Parsing succesfull on "{}": {}'.format(i, res))
            #print(datetime_euristic_parser(i))
        else:
            print('Parsing failed on "{}"'.format(i))
        print()
````

I also ran these benchmarks:
python -m timeit -s "from django.utils.dateparse import parse_datetime" "parse_datetime('2019-02-03T17:27:58.645194')"
10000 loops, best of 3: 32.7 usec per loop

python -m timeit -s "import datetime" "datetime.datetime.strptime('2019-02-03T17:27:58.645194', '%Y-%m-%dT%H:%M:%S.%f')"
10000 loops, best of 3: 53.5 usec per loop

python -m timeit -s "from datetime_euristic_parser import datetime_euristic_parser; datetime_euristic_parser('2019-02-03T17:27:58.645194')"
10000000 loops, best of 3: 0.0241 usec per loop

In other words, I'd like to have a magic datetime parser in Django, based on the global definitions in settings.py, instead of iterating over try/except blocks or writing some other custom code.
I hope it's fun.
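
One possible hardening of the format-to-regex idea in the snippet earlier in this message, offered as a hedged sketch rather than a proposal for Django: literal characters go through `re.escape` so that a `.` in a format only matches a dot, patterns are compiled once, and two-digit years are expanded with the same pivot that `strptime` documents for `%y`. The names `DIRECTIVES`, `compile_format` and `parse_with_formats` are hypothetical, and only a handful of directives are covered.

```python
import datetime
import re

# Directive -> named group. Only a handful of directives are covered here.
DIRECTIVES = {
    "%Y": r"(?P<year>\d{4})",
    "%y": r"(?P<year2>\d{2})",
    "%m": r"(?P<month>\d{1,2})",
    "%d": r"(?P<day>\d{1,2})",
    "%H": r"(?P<hour>\d{1,2})",
    "%M": r"(?P<minute>\d{1,2})",
    "%S": r"(?P<second>\d{1,2})",
    "%f": r"(?P<microsecond>\d{6})",
}
TOKEN_RE = re.compile(r"(%[A-Za-z])")


def compile_format(fmt):
    """Translate a strptime-style format into a compiled, anchored regex."""
    pieces = []
    for token in TOKEN_RE.split(fmt):
        if token in DIRECTIVES:
            pieces.append(DIRECTIVES[token])
        else:
            # Literal text: escape it so '.' or '+' are not regex operators.
            pieces.append(re.escape(token))
    return re.compile("".join(pieces) + r"\Z")


def parse_with_formats(value, compiled_formats):
    """Return a datetime for the first matching format, else None."""
    for pattern in compiled_formats:
        match = pattern.match(value)
        if match is None:
            continue
        fields = {k: int(v) for k, v in match.groupdict().items()}
        if "year2" in fields:
            short = fields.pop("year2")
            # Same pivot as strptime's %y: 69-99 -> 19xx, 00-68 -> 20xx.
            fields["year"] = short + (1900 if short >= 69 else 2000)
        try:
            return datetime.datetime(**fields)
        except ValueError:
            continue  # e.g. month 13: matches the pattern, not a real date
    return None


FORMATS = ["%Y-%m-%dT%H:%M:%S.%f", "%Y-%m-%d %H:%M:%S", "%d/%m/%Y", "%d/%m/%y"]
COMPILED = [compile_format(f) for f in FORMATS]

print(parse_with_formats("2018-03-04T09:7:4.645194", COMPILED))
print(parse_with_formats("04/12/18", COMPILED))
```

Unlike the original snippet, this returns only the first match, so the order of `FORMATS` decides how an ambiguous string is read.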




--
____________________
Dott. Giuseppe De Marco
CENTRO ICT DI ATENEO
University of Calabria
87036 Rende (CS) - Italy
Phone: +39 0984 496945
e-mail: giuseppe...@unical.it

Giuseppe De Marco

Feb 3, 2019, 6:37:16 PM
to django-d...@googlegroups.com
Regarding the previous example,
it's better to read it here (my fault: I got the format '%Y-%m-%dT%H:%M:%S.%f' wrong):

Also, it should come with a tzinfo regexp and other functions as well, like parse_date time_duration... it's only an example to share our experiences.

Tom Forbes

Feb 3, 2019, 7:30:11 PM
to django-d...@googlegroups.com

I’m pretty sure 0.0241 usec per loop is either a typo or a mistake during benchmarking. I’ve got no comment on what you’re proposing, but correct and valid benchmarks are important, so I would double-check that.


Aymeric Augustin

Feb 4, 2019, 2:17:42 AM
to django-d...@googlegroups.com
Hello Guiseppe,

In which circumstances:

- would this be useful?
- would a Form not be a better choice?

Best regards,

-- 
Aymeric.



Giuseppe De Marco

Feb 4, 2019, 4:22:06 AM
to django-d...@googlegroups.com
Hello everyone, first of all I am grateful for your time and your attention.

@Tom Forbes
The first time I ran it I thought the same thing! Please use https://github.com/peppelinux/Django-snippets/blob/master/datetime_heuristic_parser.py and not the previously pasted one. I'm quite sure that all the tests pass, judging by their output. As we can see, I deal with a tuple that contains ('format', 'compiled_regexp', 'values dictionary'); this is obviously just for test purposes.

Parsing succesfull on "04/12/2018":
[('%d/%m/%Y', '(?P<day>\\d{1,2})/(?P<month>\\d{1,2})/(?P<year>\\d{4})$', {'year': 2018, 'month': 12, 'day': 4})]

Parsing succesfull on "04/12/2018 3:2:1":
[('%d/%m/%Y %H:%M:%S', '(?P<day>\\d{1,2})/(?P<month>\\d{1,2})/(?P<year>\\d{4}) (?P<hour>\\d{1,2}):(?P<minute>\\d{1,2}):(?P<second>\\d{1,2})$', {'second': 1, 'year': 2018, 'minute': 2, 'month': 12, 'hour': 3, 'day': 4})]

Parsing succesfull on "2018-03-4 09:7:4":
[('%Y-%m-%d %H:%M:%S', '(?P<year>\\d{4})-(?P<month>\\d{1,2})-(?P<day>\\d{1,2}) (?P<hour>\\d{1,2}):(?P<minute>\\d{1,2}):(?P<second>\\d{1,2})$', {'second': 4, 'year': 2018, 'minute': 7, 'month': 3, 'hour': 9, 'day': 4})]

Parsing succesfull on "2018-03-04T09:7:4.645194":
[('%Y-%m-%dT%H:%M:%S.%f', '(?P<year>\\d{4})-(?P<month>\\d{1,2})-(?P<day>\\d{1,2})T(?P<hour>\\d{1,2}):(?P<minute>\\d{1,2}):(?P<second>\\d{1,2}).(?P<microsecond>\\d{6})$', {'second': 4, 'year': 2018, 'minute': 7, 'month': 3, 'microsecond': 645194, 'hour': 9, 'day': 4})]

Parsing succesfull on "20180304121940.948000Z":
[('%Y%m%d%H%M%S.%fZ', '(?P<year>\\d{4})(?P<month>\\d{1,2})(?P<day>\\d{1,2})(?P<hour>\\d{1,2})(?P<minute>\\d{1,2})(?P<second>\\d{1,2}).(?P<microsecond>\\d{6})Z$', {'second': 40, 'year': 2018, 'minute': 19, 'month': 3, 'microsecond': 948000, 'hour': 12, 'day': 4})]

Yesterday I coded it on a tablet, this morning from my laptop I got this performance:
python -m timeit -s "from datetime_heuristic_parser import datetime_heuristic_parser; datetime_heuristic_parser('2019-02-03T17:27:58.645194')"
100000000 loops, best of 3: 0.00891 usec per loop

I also added a simple raise Exception for the case where it should return an error; thank you for the suggestion.

@Augustin
Regarding your questions:

- would this be useful?
I think yes, for the following reasons:
1. We have a genuine regexp compiler based on DATE_FORMATS and DATETIME_FORMATS.
2. We don't have to write datetime regexps anymore: this code compiles a regexp from a format, independently of its delimiter character (-, / or whatever).
3. We get a generalized function that returns datetime objects, with no try/except and no datetime.strptime. It's faster than other implementations!
4. It's settings.py focused: all we have to worry about is a correct settings.py configuration. In other words, we just have to collect all the possible date/datetime formats that could be used in the project, whether they are used in forms or in model fields.
5. We no longer need to hardcode datetime regexp patterns in our code: the regexp compiler works on top of date format strings!

- would a Form not be a better choice?
Sure. I'm trying to generalize a method that could be a one-stop solution for all the date and datetime approaches. It could be used for forms, with DATETIME_INPUT_FORMATS and DATE_INPUT_FORMATS. These could generate form-specific regexp compilations if this approach is implemented.

The main goal is to provide a tool that works well in all conditions, and is fun too!




Tom Forbes

Feb 4, 2019, 5:07:20 AM
to django-d...@googlegroups.com

For me, I get:

In [4]: %timeit  datetime_heuristic_parser('2019-02-03T17:27:58.645194')
18.9 µs ± 431 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

And for Django:

In [3]: %timeit parse_datetime('2019-02-03T17:27:58.645194')
6.97 µs ± 408 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

I assume there is something wrong with the way you benchmarked the code. Python is not that slow, but 0.0241 per loop is way, way too fast.

Giuseppe De Marco

Feb 4, 2019, 9:04:58 AM
to django-d...@googlegroups.com
I also added tzinfo as it comes from parse_date; I just copied some code and made get_fixed_timezone a FixedTimeZone classmethod.
Regarding our doubts about the benchmarks, you'll always find them in comments at the top of the file. I hope to make them as straightforward as possible to avoid wasting time.

This is what I get at the moment:
python3 -m timeit -s "import datetime" "datetime.datetime.strptime('2019-02-03T17:27:58.645194', '%Y-%m-%dT%H:%M:%S.%f')"
100000 loops, best of 3: 11.2 usec per loop

python3 -m timeit -s "from django.utils.dateparse import parse_datetime" "parse_datetime('2019-02-03T17:27:58.645194')"
100000 loops, best of 3: 6.04 usec per loop

python3 -m timeit -s   "import sys, os; sys.path.append(os.getcwd()); from datetime_heuristic_parser import datetime_heuristic_parser; print(datetime_heuristic_parser('04/12/2018 09:7:4Z'))"
[datetime.datetime(2018, 12, 4, 9, 7, 4, tzinfo=datetime.timezone.utc)]
[datetime.datetime(2018, 12, 4, 9, 7, 4, tzinfo=datetime.timezone.utc)]
[datetime.datetime(2018, 12, 4, 9, 7, 4, tzinfo=datetime.timezone.utc)]
[datetime.datetime(2018, 12, 4, 9, 7, 4, tzinfo=datetime.timezone.utc)]
[datetime.datetime(2018, 12, 4, 9, 7, 4, tzinfo=datetime.timezone.utc)]
[datetime.datetime(2018, 12, 4, 9, 7, 4, tzinfo=datetime.timezone.utc)]
[datetime.datetime(2018, 12, 4, 9, 7, 4, tzinfo=datetime.timezone.utc)]
[datetime.datetime(2018, 12, 4, 9, 7, 4, tzinfo=datetime.timezone.utc)]
[datetime.datetime(2018, 12, 4, 9, 7, 4, tzinfo=datetime.timezone.utc)]
[datetime.datetime(2018, 12, 4, 9, 7, 4, tzinfo=datetime.timezone.utc)]
[datetime.datetime(2018, 12, 4, 9, 7, 4, tzinfo=datetime.timezone.utc)]
100000000 loops, best of 3: 0.00878 usec per loop

...as long as I keep running it, I still see a good result: a valid datetime is returned, with no exception.
I'll continue to use this code, so I'll keep an eye on it. I also hope to share it with you.
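
For readers wondering what the time zone step mentioned in this message might look like, here is a minimal standard-library sketch. `TZ_SUFFIX_RE` and `parse_tz_suffix` are hypothetical names, not part of Django or of the linked snippet. It only handles a trailing `Z` or a numeric offset such as `+01:00` or `-0530`, and it assumes a numeric offset only ever follows a time that contains a colon.

```python
import datetime
import re

# Trailing 'Z', or a sign followed by HH and optional MM (with or without ':').
TZ_SUFFIX_RE = re.compile(r"(?P<tz>Z|[+-]\d{2}(?::?\d{2})?)$")


def parse_tz_suffix(value):
    """Split a trailing UTC offset off `value`.

    Returns (rest_of_string, tzinfo); tzinfo is None when no offset is found.
    """
    match = TZ_SUFFIX_RE.search(value)
    if match is None:
        return value, None
    tz = match.group("tz")
    rest = value[: match.start()]
    if tz == "Z":
        return rest, datetime.timezone.utc
    if ":" not in rest:
        # '2018-12-04' ends in '-04', which is a day, not an offset.
        return value, None
    sign = -1 if tz[0] == "-" else 1
    digits = tz[1:].replace(":", "")
    minutes = int(digits[:2]) * 60 + int(digits[2:] or 0)
    return rest, datetime.timezone(datetime.timedelta(minutes=sign * minutes))


rest, tzinfo = parse_tz_suffix("2018-12-04 09:07:04+01:00")
naive = datetime.datetime.strptime(rest, "%Y-%m-%d %H:%M:%S")
print(naive.replace(tzinfo=tzinfo))
```

Splitting the offset off first keeps the date and time patterns unchanged, at the cost of the colon heuristic noted in the comment.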



Andreas Pelme

Feb 4, 2019, 9:18:16 AM
to django-d...@googlegroups.com
On 4 Feb 2019, at 15:04, Giuseppe De Marco <giuseppe...@unical.it> wrote:

python3 -m timeit -s   "import sys, os; sys.path.append(os.getcwd()); from datetime_heuristic_parser import datetime_heuristic_parser; print(datetime_heuristic_parser('04/12/2018 09:7:4Z'))"

That command is not correct. timeit -s takes two arguments: setup code and benchmark code. This command just executes the setup code and does not run any code at all for the actual benchmark.

The correct command would be something like this (I did not run this command myself but you get the idea):
python3 -m timeit -s   "import sys, os; sys.path.append(os.getcwd()); from datetime_heuristic_parser import datetime_heuristic_parser" "datetime_heuristic_parser('04/12/2018 09:7:4Z')"

Cheers,
Andreas
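
The distinction described in this message can be reproduced with the `timeit` module's Python API, where `setup` runs before the timed loop and `stmt` is what actually gets timed. When only setup code is supplied, the timed statement defaults to `pass`, which explains per-loop figures of a few nanoseconds. A minimal sketch, using `strptime` as a stand-in workload (the constant names are illustrative):

```python
import timeit

SETUP = "import datetime"
STMT = "datetime.datetime.strptime('04/12/2018 09:07:04', '%d/%m/%Y %H:%M:%S')"
LOOPS = 20000

# Mistake: the work is placed in setup, so only the default 'pass' is timed.
setup_only = timeit.timeit(stmt="pass", setup=SETUP + "; " + STMT, number=LOOPS)

# Correct: setup imports once, the statement under test runs LOOPS times.
real = timeit.timeit(stmt=STMT, setup=SETUP, number=LOOPS)

print("setup only: {:.4f} usec per loop".format(setup_only / LOOPS * 1e6))
print("real:       {:.4f} usec per loop".format(real / LOOPS * 1e6))
```

The first figure measures an empty loop body, so it says nothing about the parser.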

Giuseppe De Marco

Feb 4, 2019, 9:33:51 AM
to django-d...@googlegroups.com
Thank you Andreas, finally I can see a real benchmark on my laptop:

python3 -m timeit -s "from django.utils.dateparse import parse_datetime" "print(parse_datetime('2018-04-01 09:07:04'))"
100000 loops, best of 3: 11.1 usec per loop

python3 -m timeit -s "import datetime" "print(datetime.datetime.strptime('2019-02-03T17:27:58.645194', '%Y-%m-%dT%H:%M:%S.%f'))"
100000 loops, best of 3: 18 usec per loop

python3 -m timeit -s   "import sys, os; sys.path.append(os.getcwd()); from datetime_heuristic_parser import datetime_heuristic_parser" "print(datetime_heuristic_parser('04/12/2018 09:7:4'))"
10000 loops, best of 3: 25.4 usec per loop






Aymeric Augustin

Feb 4, 2019, 12:05:51 PM
to django-d...@googlegroups.com
Hello Guiseppe,

At this point I think we can agree on why we disagree :-)

First, I believe that the function responsible for converting datetimes stored in ISO 8601 format in SQLite databases should parse ISO 8601 and not do anything else. I'm -1 on changing it to accept localized datetimes. (A third-party package could provide a model field supporting this.)

So we're discussing the addition of new functionality.

Second, I'm skeptical of functions accepting a variety of more or less well specified inputs. I don't think such APIs are conducive to good coding practices. Either you're getting data from an automated system, in which case the format is known. Or you're getting data from humans, in which case Django provides one solution: forms. Indeed, when interacting with humans, the hard part isn't validating the data, it's providing good error messages.

I know that not everyone shares this preference for strict APIs. Some languages try hard to make sense of poorly specified inputs, for example:

$ node -e 'console.log("1" + 1);'
11
$ php -r 'echo "1" + 1;'
2
$ python -c 'print("1" + 1)'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
TypeError: must be str, not int

I don't think I need to explain my opinion after showing this example :-)

It's a reasonable analogy for the issue we're discussing here. "1" and 1 are the same input provided in a slightly different format that doesn't make a difference for humans but does make one for computers. So are 2019-02-04 and 04/02/19. (Or is it 02/04/19?)

Anyway, if you think this is generally useful, you can easily package it into a third-party module. Widespread adoption would be a strong argument for integrating it into a future version of Django.

Best regards,

Aymeric.
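
The ambiguity in the last example of this message is easy to demonstrate with the standard library: the same string yields two different dates depending on which format is assumed. A small sketch:

```python
import datetime

value = "04/02/19"

# Day-first reading.
day_first = datetime.datetime.strptime(value, "%d/%m/%y").date()
# Month-first reading.
month_first = datetime.datetime.strptime(value, "%m/%d/%y").date()

print(day_first)    # 2019-02-04
print(month_first)  # 2019-04-02

# ISO 8601 has a single reading, so no format guess is needed.
print(datetime.date.fromisoformat("2019-02-04"))  # 2019-02-04
```

Both calls succeed without error, so a parser that tries formats in turn silently picks whichever comes first in its list.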



Giuseppe De Marco

Feb 4, 2019, 3:04:28 PM
to django-d...@googlegroups.com
Hi Augustin, my name is Giuseppe, i before u :)


> At this point I think we can agree on why we disagree :-)

Great!

> First, I believe that the function responsible for converting datetimes stored in ISO 8601 format in SQLite databases should parse ISO 8601 and not do anything else. I'm -1 on changing it to accept localized datetimes. (A third-party package could provide a model field supporting this.)

+1. When you know the format, all this code is useless.

Should parse_date be renamed to parse_date_iso8601?

> Second, I'm skeptical of functions accepting a variety of more or less well specified inputs. I don't think such APIs are conducive to good coding practices. Either you're getting data from an automated system, in which case the format is known. Or you're getting data from humans, in which case Django provides one solution: forms.

I was thinking of a utility to parse the datetime formats configured in settings.py. Using this utility would not be mandatory.

> I don't think I need to explain my opinion after showing this example :-)

I hope that this idea will not be killed by that example :)

> So are 2019-02-04 and 04/02/19. (Or is it 02/04/19?)

It will depend on settings.py, that's the goal.

> Anyway, if you think this is generally useful, you can easily package it into a third-party module.

Consider it done. I was thinking of a wider feature set in django.utils.dateparse.




