#36747: parse_duration() fails to parse valid ISO-8601 durations including years,
months, and weeks due to incorrect regex
--------------------------------+-----------------------------------------
Reporter: florianvazelle | Type: Bug
Status: new | Component: Uncategorized
Version: 5.2 | Severity: Normal
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------+-----------------------------------------
== **Description**
`django.utils.dateparse.parse_duration()` claims to support ISO-8601
duration strings, but the current implementation only handles the
`PnDTnHnMnS` subset. Valid ISO-8601 components such as years (`Y`), months
(`M`), and weeks (`W`) are not supported.
The internal regex used for ISO-8601 durations:
{{{
iso8601_duration_re = _lazy_re_compile(
    r"^(?P<sign>[-+]?)"
    r"P"
    r"(?:(?P<days>\d+([.,]\d+)?)D)?"
    r"(?:T"
    r"(?:(?P<hours>\d+([.,]\d+)?)H)?"
    r"(?:(?P<minutes>\d+([.,]\d+)?)M)?"
    r"(?:(?P<seconds>\d+([.,]\d+)?)S)?"
    r")?"
    r"$"
)
}}}
This means Django currently rejects valid duration strings such as:
{{{
P1Y2M
P3W
P1Y
P2M10DT2H
}}}
Despite the documentation suggesting ISO-8601 support, these forms cannot
be parsed.
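The rejection can be checked without a Django installation. The following is a minimal sketch that copies the pattern quoted above into plain `re.compile`, used here as a stand-in for Django's `_lazy_re_compile` on the assumption that matching behaviour is the same. The names `CURRENT_ISO8601_RE` and `is_accepted` are hypothetical and exist only for this illustration.

```python
import re

# Copy of the current pattern, compiled with plain re instead of
# Django's _lazy_re_compile (assumption: matching behaviour is identical).
CURRENT_ISO8601_RE = re.compile(
    r"^(?P<sign>[-+]?)"
    r"P"
    r"(?:(?P<days>\d+([.,]\d+)?)D)?"
    r"(?:T"
    r"(?:(?P<hours>\d+([.,]\d+)?)H)?"
    r"(?:(?P<minutes>\d+([.,]\d+)?)M)?"
    r"(?:(?P<seconds>\d+([.,]\d+)?)S)?"
    r")?"
    r"$"
)


def is_accepted(value):
    """Return True if the current pattern matches the whole string."""
    return CURRENT_ISO8601_RE.match(value) is not None


# The first four strings use Y, M (months), or W and are rejected;
# the last two stay within the PnDTnHnMnS subset and are accepted.
for value in ["P1Y2M", "P3W", "P1Y", "P2M10DT2H", "P4DT2H", "PT30M"]:
    print(value, is_accepted(value))
```

Only the strings that stay within the `PnDTnHnMnS` subset match; anything with a `Y`, `W`, or a date-part `M` fails at the first unsupported designator.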
== **Steps to Reproduce**
{{{
from django.utils.dateparse import parse_duration
parse_duration("P1Y") # returns None
parse_duration("P2M") # returns None
parse_duration("P3W") # returns None
parse_duration("P1Y2M3DT4H") # returns None
}}}
== **Expected Behavior**
`parse_duration()` should parse all valid ISO-8601 durations that can be
represented as `timedelta`, including:
* `PnW`
* `PnYnMnD`
* ...
Django should either parse each calendar unit or clearly document which
ISO-8601 forms it does not support.
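Which units "can be represented as `timedelta`" matters here. As a small illustration using only the standard library: weeks map directly onto `datetime.timedelta`, while years and months have no corresponding argument, so any support for them needs an explicit conversion rule. The helper `timedelta_accepts` is hypothetical and only probes which keyword arguments are valid.

```python
import datetime

# Weeks map directly onto timedelta, which accepts a weeks argument.
three_weeks = datetime.timedelta(weeks=3)
print(three_weeks == datetime.timedelta(days=21))


def timedelta_accepts(**kwargs):
    """Hypothetical helper: report whether timedelta accepts these kwargs."""
    try:
        datetime.timedelta(**kwargs)
    except TypeError:
        return False
    return True


# timedelta has no notion of calendar years or months.
print(timedelta_accepts(weeks=1))
print(timedelta_accepts(years=1))
print(timedelta_accepts(months=1))
```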
== **Proposed Fix**
1. Replace the current `iso8601_duration_re` with one that uses distinct
group names for each ISO-8601 calendar unit:
{{{
iso8601_duration_re = _lazy_re_compile(
    r"^(?P<sign>[-+]?)"
    r"P"
    r"(?:(?P<years>\d+([.,]\d+)?)Y)?"
    r"(?:(?P<months>\d+([.,]\d+)?)M)?"
    r"(?:(?P<weeks>\d+([.,]\d+)?)W)?"
    r"(?:(?P<days>\d+([.,]\d+)?)D)?"
    r"(?:T"
    r"(?:(?P<hours>\d+([.,]\d+)?)H)?"
    r"(?:(?P<minutes>\d+([.,]\d+)?)M)?"
    r"(?:(?P<seconds>\d+([.,]\d+)?)S)?"
    r")?"
    r"$"
)
}}}
2. Extend `parse_duration()` to convert these new fields to a `timedelta`.
Because `timedelta` accepts no `years` or `months` arguments, they have to
be converted to days first:
{{{
def parse_duration(value):
    match = (
        standard_duration_re.match(value)
        or iso8601_duration_re.match(value)
        or postgres_interval_re.match(value)
    )
    if match:
        kw = match.groupdict()
        sign = -1 if kw.pop("sign", "+") == "-" else 1
        if kw.get("microseconds"):
            kw["microseconds"] = kw["microseconds"].ljust(6, "0")
        kw = {k: float(v.replace(",", ".")) for k, v in kw.items() if v is not None}
        # New: pop the calendar units so they are not passed to timedelta(**kw).
        years = kw.pop("years", 0.0)
        months = kw.pop("months", 0.0)
        weeks = kw.pop("weeks", 0.0)
        # The 365-day year and 30-day month are assumed approximations;
        # ISO-8601 does not fix the length of a calendar year or month.
        days = datetime.timedelta(
            days=(kw.pop("days", 0.0) or 0.0) + weeks * 7 + months * 30 + years * 365
        )
        if match.re == iso8601_duration_re:
            days *= sign
        return days + sign * datetime.timedelta(**kw)
}}}
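To make the proposal easier to try outside Django, here is a self-contained sketch that pairs the proposed pattern with a day-based conversion. It is not the Django implementation: `re.compile` stands in for `_lazy_re_compile`, the function name `parse_iso8601_duration` is hypothetical, and the constants `DAYS_PER_YEAR = 365` and `DAYS_PER_MONTH = 30` are assumptions chosen purely for illustration, since ISO-8601 does not fix the length of a calendar year or month.

```python
import datetime
import re

# Proposed pattern, compiled with plain re (stand-in for _lazy_re_compile).
PROPOSED_ISO8601_RE = re.compile(
    r"^(?P<sign>[-+]?)"
    r"P"
    r"(?:(?P<years>\d+([.,]\d+)?)Y)?"
    r"(?:(?P<months>\d+([.,]\d+)?)M)?"
    r"(?:(?P<weeks>\d+([.,]\d+)?)W)?"
    r"(?:(?P<days>\d+([.,]\d+)?)D)?"
    r"(?:T"
    r"(?:(?P<hours>\d+([.,]\d+)?)H)?"
    r"(?:(?P<minutes>\d+([.,]\d+)?)M)?"
    r"(?:(?P<seconds>\d+([.,]\d+)?)S)?"
    r")?"
    r"$"
)

# Assumed conversion factors (illustrative only, not defined by ISO-8601).
DAYS_PER_YEAR = 365
DAYS_PER_MONTH = 30


def parse_iso8601_duration(value):
    """Hypothetical standalone parser; returns a timedelta or None."""
    match = PROPOSED_ISO8601_RE.match(value)
    if match is None:
        return None
    kw = match.groupdict()
    sign = -1 if kw.pop("sign") == "-" else 1
    # Drop unmatched groups and normalise a decimal comma to a dot.
    kw = {k: float(v.replace(",", ".")) for k, v in kw.items() if v is not None}
    # Fold every calendar unit into days; hours/minutes/seconds stay in kw.
    days = (
        kw.pop("years", 0.0) * DAYS_PER_YEAR
        + kw.pop("months", 0.0) * DAYS_PER_MONTH
        + kw.pop("weeks", 0.0) * 7
        + kw.pop("days", 0.0)
    )
    return sign * datetime.timedelta(days=days, **kw)


for text in ["P3W", "P1Y", "P2M", "PT2M", "P1Y2M3DT4H", "-P1W"]:
    print(text, parse_iso8601_duration(text))
```

Note that `P2M` and `PT2M` are told apart only by the `T` separator: the first `M` group sits before `T` and is captured as `months`, the second sits after it and is captured as `minutes`. Whether fixed 365-day years and 30-day months are acceptable is a design decision the ticket leaves open.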
I can provide a full patch (tests + implementation) if desired.
--
Ticket URL: <https://code.djangoproject.com/ticket/36747>