[http-state] Status update

Adam Barth

Aug 9, 2009, 8:23:33 PM
to http-state
Here's a brief status update on what I've been up to.

1) Improved the testing harness to be able to run all the tests on Chrome
from the command line. I'm using their TestShell executable as the
integration point, which has the same cookie logic as the full
browser.

2) Wrote a number of tests probing the parsing of cookies and values.

3) Replaced the bogus Set-Cookie grammar in the draft with something
more accurate. So far, we're covering only the cookie-name and the
cookie-value. I'm planning on doing the cookie-attributes (i.e.,
their names and opaque values) next.

Currently, I've expressed the syntax as a parsing algorithm because
that's easier to write. Eventually, I'd like to turn this into the
more traditional grammar style. It might be slightly too early to do
that yet.

I'm already finding a number of cases where the major implementations
differ. I've tried to pick the most reasonable behavior based on
testing and historical precedent. I've documented all the cases I
know of in tests.

Useful Areas for Contribution:

1) Expand the test harness to be able to run more implementations
automatically. This is extremely valuable because it ensures we
understand the compatibility impact of our decisions.
a) Adding Safari should be fairly straightforward using their
DumpRenderTree executable in a similar way to how we're currently
using Chromium's TestShell.
b) Adding Internet Explorer is probably a matter of writing a simple
WinInet application that functions like CURL.
c) Are there other implementations we should be testing? Is there a
command line utility that uses libsoup?

2) Cookie date examples. I imagine specifying the cookie-date syntax
and semantics will be challenging. If you or anyone you know has a
corpus of cookie dates we can use, please let me know.

Thanks,
Adam
_______________________________________________
http-state mailing list
http-...@ietf.org
https://www.ietf.org/mailman/listinfo/http-state

Julian Reschke

Aug 10, 2009, 3:26:25 AM
to Adam Barth, http-state
Adam Barth wrote:
> ...

> I'm already finding a number of cases where the major implementations
> differ. I've tried to pick the most reasonable behavior based on
> testing and historical precedent. I've documented all the cases I
> know of in tests.
> ...

Checking... what's the purpose of "picking" a specific behavior when
right now implementations differ? Doesn't it make more sense to just
document the areas that do not have inter-operability, and go on?

BR, Julian

Adam Barth

Aug 10, 2009, 10:28:32 AM
to Julian Reschke, http-state
On Mon, Aug 10, 2009 at 12:26 AM, Julian Reschke<julian....@gmx.de> wrote:
> Adam Barth wrote:
>>
>> ...
>> I'm already finding a number of cases where the major implementations
>> differ.  I've tried to pick the most reasonable behavior based on
>> testing and historical precedent.  I've documented all the cases I
>> know of in tests.
>> ...
>
> Checking... what's the purpose of "picking" a specific behavior when right
> now implementations differ?

Mostly so we can make progress. We should view these picks as
tentative, just like adding the text from RFC 2109 was tentative.
Hopefully the new text is more accurate than the text it's replacing,
but we should continue to iterate.

> Doesn't it make more sense to just document the
> areas that do not have inter-operability, and go on?

I've documented everything I know so far in tests. I expect we'll end
up discussing each case that differs between implementations on the
mailing list. That's why it's important to integrate more
implementations into the testing harness. Once we do that, we can read
out all the interoperability issues with a few commands and have a
broader view when we make decisions.

Adam

Daniel Stenberg

Aug 10, 2009, 4:35:29 PM
to Adam Barth, http-state
On Sun, 9 Aug 2009, Adam Barth wrote:

> c) Are there other implementations we should be testing?

Wget is a very popular command line tool with a cookie parser.

> 2) Cookie date examples. I imagine specifying the cookie-date syntax and
> semantics will be challenging. If you or anyone you know has a corpus of
> cookie dates we can use, please let me know.

Dan Winship's note of 2009-08-05* doesn't give specific numbers, but it lists
five formats that are actually in use:

Wdy, DD-Mon-YYYY HH:MM:SS GMT
Wdy, DD Mon YYYY HH:MM:SS GMT
Wdy, DD-Mon-YY HH:MM:SS GMT
Weekday, DD-Mon-YY HH:MM:SS GMT
Wdy Mon DD HH:MM:SS YYYY GMT

Maybe someone with access to a proxy could extract more logs on this.

--

/ daniel.haxx.se

Dan Winship

Aug 11, 2009, 6:39:00 PM
to Adam Barth, http-state
On 08/09/2009 08:23 PM, Adam Barth wrote:
> 1) Expand the test harness to be able to run more implementations
> automatically. This is extremely valuable because it ensures we
> understand the compatibility impact of our decisions.

> c) Are there other implementations we should be testing? Is there a
> command line utility that uses libsoup?

I could write one easily enough (and in fact, once we have a good set of
tests, I'll probably add them to the libsoup regression tests). But
there is no "compatibility impact" to understand wrt libsoup's behavior;
wherever it doesn't behave like the majority of other browsers, it's
just a bug and I'm going to change it. So I don't think there's really
any reason for the spec's test suite to be testing it.

> 2) Cookie date examples. I imagine specifying the cookie-date syntax
> and semantics will be challenging. If you or anyone you know has a
> corpus of cookie dates we can use, please let me know.

OK, here is a breakdown of the 2973 expires tokens in the cookies I'd
collected a few years ago (from a not at all representative sample of web
sites). Counts represent the number of Set-Cookie headers, not the number
of unique cookies or sites or whatever.

  1987  "Mon, 10-Dec-2007 17:02:24 GMT"
        Revised Netscape spec format

   533  "Wed, 09 Dec 2009 16:27:23 GMT"
        rfc1123-date

   239  "Thursday, 01-Jan-1970 00:00:00 GMT"
        4-digit-year version of Netscape spec example (see below).
        Seems to only come from sites using PHP, but it's not PHP
        itself; maybe some framework?

    89  "Mon Dec 10 16:32:30 2007 GMT"
        The not-quite-asctime format used by Amazon. (Still not fixed!)

    62  "Wednesday, 01-Jan-10 00:00:00 GMT"
        The syntax used by the example text in the Netscape spec,
        although the actual grammar uses abbreviated weekday names

    31  "Mon, 10-Dec-07 20:35:03 GMT"
        Original Netscape spec

    12  "Wed, 1 Jan 2020 00:00:00 GMT"
        If this had "01 Jan" it would be an rfc1123-date. This *is* a
        legitimate rfc822 date, though not an rfc2822 date because "GMT"
        is deprecated in favor of "+0000" there.

     8  "Saturday, 8-Dec-2012 21:24:09 GMT"
        Would match the "weird php" syntax above if it was "08-Dec"

     3  "Thu, 31 Dec 23:55:55 2037 GMT"
        God only knows what they were thinking. This came from a
        hit-tracker site, and it's possible that it's just totally
        broken and no one parses it "correctly"

     2  "Sun, 9 Dec 2012 13:42:05 GMT"
        Another kind of rfc822 / nearly-rfc1123 date, using superfluous
        whitespace.

     2  "Wed Dec 12 2007 08:44:07 GMT-0500 (EST)"
        Another kind of "let's throw components together at random". The
        site that this cookie came from has apparently been fixed since
        then. (It uses the Netscape spec format now.)

     2  "Mon, 01-Jan-2011 00: 00:00 GMT"
        Note whitespace inside the time component. Also, the cookie came
        with a domain= attribute that didn't match the domain it was
        being sent from (at all). Nice job all around.

     1  "Sun, 1-Jan-1995 00:00:00 GMT"
     1  "Wednesday, 01-Jan-10 0:0:00 GMT"
     1  "Thu, 10 Dec 2009 13:57:2 GMT"
        Because fixed-width fields are for sissies.


So, you can match 96.6% of those with a parser that basically does
rfc1123-date, except:

- it accepts long or short day names
- it allows the day-of-month to be "1*2DIGIT" or "SPC DIGIT"
- it allows " " or "-" around month name
- it accepts 2 or 4 digit years (which is extra tricky for cookies
since there are legitimate reasons for sending both distant-past
and distant-future dates... we need to test how clients interpret
these).

You can get another 3% by accepting asctime-date, and being lenient if
they have " GMT" at the end. Given that HTTP clients are required to
accept asctime-dates in the Date header anyway, this isn't that harsh.
And also, it's Amazon, so you have no choice.

So that gets you to 99.6% of the cookies I saw, and that's probably good
enough for the grammar, though we could note that there are
even-more-broken dates out there.
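
As a rough illustration of the asctime-date leniency mentioned above, here is a minimal Python sketch that accepts the RFC 2616 asctime-date layout with an optional trailing " GMT". The name parse_asctime_like is made up for this example, the weekday is matched but not checked against the date, and none of this is taken from an actual browser.

```python
import calendar
import re

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

# wkday SP month SP (2DIGIT | SP DIGIT) SP time SP 4DIGIT, plus an optional " GMT".
ASCTIME_LIKE = re.compile(
    r"(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun) "
    r"(?P<month>" + "|".join(MONTHS) + r") "
    r"(?P<day>\d{2}| \d) "
    r"(?P<hour>\d{2}):(?P<minute>\d{2}):(?P<second>\d{2}) "
    r"(?P<year>\d{4})"
    r"(?: GMT)?"
)


def parse_asctime_like(text):
    """Return seconds since the Unix epoch (UTC), or None if the layout differs."""
    match = ASCTIME_LIKE.fullmatch(text)
    if match is None:
        return None
    return calendar.timegm((
        int(match["year"]),
        MONTHS.index(match["month"]) + 1,
        int(match["day"]),  # int() ignores the padding space in " 6"
        int(match["hour"]),
        int(match["minute"]),
        int(match["second"]),
        0, 0, 0,
    ))


print(parse_asctime_like("Thu Apr 18 22:50:12 2007 GMT"))   # 1176936612
print(parse_asctime_like("Mon, 10-Dec-2007 17:02:24 GMT"))  # None
```

The first input is the Amazon-style string from Chrome's test table, and the value printed for it matches the one listed there.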

-- Dan

Adam Barth

Aug 11, 2009, 7:13:54 PM
to Dan Winship, http-state
On Tue, Aug 11, 2009 at 3:39 PM, Dan Winship<dan.w...@gmail.com> wrote:
> On 08/09/2009 08:23 PM, Adam Barth wrote:
>> 1) Expand the test harness to be able to run more implementations
>> automatically.  This is extremely valuable because it ensures we
>> understand the compatibility impact of our decisions.
>
>>   c) Are there other implementations we should be testing?  Is there a
>> command line utility that uses libsoup?
>
> I could write one easily enough (and in fact, once we have a good set of
> tests, I'll probably add them to the libsoup regression tests). But
> there is no "compatibility impact" to understand wrt libsoup's behavior;
> wherever it doesn't behave like the majority of other browsers, it's
> just a bug and I'm going to change it. So I don't think there's really
> any reason for the spec's test suite to be testing it.

Ok. If you'd like to implement it, I'll add it to the harness.
Currently we have support for:

* Firefox
* Safari
* Chrome
* CURL

I'd like to add IE to that list, but that will probably have to wait
until I get back to California and my Windows machine.

This is great data. Would you like to take a crack at writing a date grammar?

For reference here are the tests from Chrome's cookie date parser.

const CookieDateParsingCase tests[] = {
{ "Sat, 15-Apr-17 21:01:22 GMT", true, 1492290082 },
{ "Thu, 19-Apr-2007 16:00:00 GMT", true, 1176998400 },
{ "Wed, 25 Apr 2007 21:02:13 GMT", true, 1177534933 },
{ "Thu, 19/Apr\\2007 16:00:00 GMT", true, 1176998400 },
{ "Fri, 1 Jan 2010 01:01:50 GMT", true, 1262307710 },
{ "Wednesday, 1-Jan-2003 00:00:00 GMT", true, 1041379200 },
{ ", 1-Jan-2003 00:00:00 GMT", true, 1041379200 },
{ " 1-Jan-2003 00:00:00 GMT", true, 1041379200 },
{ "1-Jan-2003 00:00:00 GMT", true, 1041379200 },
{ "Wed,18-Apr-07 22:50:12 GMT", true, 1176936612 },
{ "WillyWonka , 18-Apr-07 22:50:12 GMT", true, 1176936612 },
{ "WillyWonka , 18-Apr-07 22:50:12", true, 1176936612 },
{ "WillyWonka , 18-apr-07 22:50:12", true, 1176936612 },
{ "Mon, 18-Apr-1977 22:50:13 GMT", true, 230251813 },
{ "Mon, 18-Apr-77 22:50:13 GMT", true, 230251813 },
// If the cookie came in with the expiration quoted (which in terms of
// the RFC you shouldn't do), we will get string quoted. Bug 1261605.
{ "\"Sat, 15-Apr-17\\\"21:01:22\\\"GMT\"", true, 1492290082 },
// Test with full month names and partial names.
{ "Partyday, 18- April-07 22:50:12", true, 1176936612 },
{ "Partyday, 18 - Apri-07 22:50:12", true, 1176936612 },
{ "Wednes, 1-Januar-2003 00:00:00 GMT", true, 1041379200 },
// Test that we always take GMT even with other time zones or bogus
// values. The RFC says everything should be GMT, and in the worst case
// we are 24 hours off because of zone issues.
{ "Sat, 15-Apr-17 21:01:22", true, 1492290082 },
{ "Sat, 15-Apr-17 21:01:22 GMT-2", true, 1492290082 },
{ "Sat, 15-Apr-17 21:01:22 GMT BLAH", true, 1492290082 },
{ "Sat, 15-Apr-17 21:01:22 GMT-0400", true, 1492290082 },
{ "Sat, 15-Apr-17 21:01:22 GMT-0400 (EDT)",true, 1492290082 },
{ "Sat, 15-Apr-17 21:01:22 DST", true, 1492290082 },
{ "Sat, 15-Apr-17 21:01:22 -0400", true, 1492290082 },
{ "Sat, 15-Apr-17 21:01:22 (hello there)", true, 1492290082 },
// Test that if we encounter multiple : fields, that we take the first
// that correctly parses.
{ "Sat, 15-Apr-17 21:01:22 11:22:33", true, 1492290082 },
{ "Sat, 15-Apr-17 ::00 21:01:22", true, 1492290082 },
{ "Sat, 15-Apr-17 boink:z 21:01:22", true, 1492290082 },
// We take the first, which in this case is invalid.
{ "Sat, 15-Apr-17 91:22:33 21:01:22", false, 0 },
// amazon.com formats their cookie expiration like this.
{ "Thu Apr 18 22:50:12 2007 GMT", true, 1176936612 },
// Test that hh:mm:ss can occur anywhere.
{ "22:50:12 Thu Apr 18 2007 GMT", true, 1176936612 },
{ "Thu 22:50:12 Apr 18 2007 GMT", true, 1176936612 },
{ "Thu Apr 22:50:12 18 2007 GMT", true, 1176936612 },
{ "Thu Apr 18 22:50:12 2007 GMT", true, 1176936612 },
{ "Thu Apr 18 2007 22:50:12 GMT", true, 1176936612 },
{ "Thu Apr 18 2007 GMT 22:50:12", true, 1176936612 },
// Test that the day and year can be anywhere if they are unambiguous.
{ "Sat, 15-Apr-17 21:01:22 GMT", true, 1492290082 },
{ "15-Sat, Apr-17 21:01:22 GMT", true, 1492290082 },
{ "15-Sat, Apr 21:01:22 GMT 17", true, 1492290082 },
{ "15-Sat, Apr 21:01:22 GMT 2017", true, 1492290082 },
{ "15 Apr 21:01:22 2017", true, 1492290082 },
{ "15 17 Apr 21:01:22", true, 1492290082 },
{ "Apr 15 17 21:01:22", true, 1492290082 },
{ "Apr 15 21:01:22 17", true, 1492290082 },
{ "2017 April 15 21:01:22", true, 1492290082 },
{ "15 April 2017 21:01:22", true, 1492290082 },
// Some invalid dates
{ "98 April 17 21:01:22", false, 0 },
{ "Thu, 012-Aug-2008 20:49:07 GMT", false, 0 },
{ "Thu, 12-Aug-31841 20:49:07 GMT", false, 0 },
{ "Thu, 12-Aug-9999999999 20:49:07 GMT", false, 0 },
{ "Thu, 999999999999-Aug-2007 20:49:07 GMT", false, 0 },
{ "Thu, 12-Aug-2007 20:61:99999999999 GMT", false, 0 },
{ "IAintNoDateFool", false, 0 },
};

Adam Barth

Aug 12, 2009, 1:17:48 AM
to Bil Corry, http-state
On Tue, Aug 11, 2009 at 9:49 PM, Bil Corry<b...@corry.biz> wrote:

> Adam Barth wrote on 8/11/2009 6:13 PM:
>> This is great data.  Would you like to take a crack at writing a date grammar?
>
> I think we first need to understand which UAs can successfully parse the various date formats.

In general, I think we should plan to iterate on each part of the
draft several times instead of trying to get it perfect in one pass.
Dan has presented some pretty detailed data on what kinds of date
formats he's seeing on the web. Writing up a grammar for that would
be a big improvement over the big blank space in the current draft.
:)

> Once we have that, we can specify an "official" date format[1] that should be used, and alternative formats that should be parsable if encountered[2].

Yep. We should probably recommend that server implementors use
whatever date format that is the most widely implemented, sane format.
We also should explain how user agent implementors should cope with
enough crazy formats to achieve some desired level of compatibility
with existing practice.

> And as Dan pointed out[3], we should test how two-digit years are interpreted -- for my own date parser, anything >= 40 is considered 20th century and anything <= 39 is considered 21st century.  It'd be good to know how the UAs handle it and provide a recommendation in the spec.

Definitely. If you investigate this, let us know what you find.

>>     // Some invalid dates
>>     { "98 April 17 21:01:22",                    false, 0 },
>
> Why is that considered invalid?  Because it's unclear if it's 1998 or 2098?

Not sure. I haven't looked into date parsing in detail yet.

Adam

Daniel Stenberg

Aug 12, 2009, 3:31:43 AM
to http-state
On Tue, 11 Aug 2009, Adam Barth wrote:

> This is great data. Would you like to take a crack at writing a date grammar?

Basically they all have two to five types of data:

A) Date (day, month and year)
B) Time (hour, minute, second)
C) Time zone (named or relative UTC)
D) Day (name of the day, mostly pointless)
E) Additional junk (entirely pointless)

[All these with all sorts of separators]

I'm sure all date parsers we use work basically like this. It needs to parse
the string and identify the individual components it specifies. If it has
gotten enough details (A and B are required), it doesn't matter if more junk
(E) is found or added to the date string.

The individual ordering of the components is mostly uninteresting to a parser,
except if you really want to make the parser check for a strict syntax.

Of course, if the same A, B or C type appears more than once the outcome will
be undefined, as then the parser cannot reliably pick which one is correct.

Given the look of that list of dates for the Chrome tests, it seems similar in
spirit to the curl parser.

Combined, it makes it a pain to write a formal syntax spec from.
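
The component-based approach described above could be sketched roughly as follows. This is only an illustration in Python, not the curl or Chrome algorithm: the name parse_loose_date, the choice to take the first candidate for each component, the two-digit-year pivot of 69, and the decision to ignore time zones (C) entirely are all assumptions made for the example.

```python
import calendar
import re

MONTHS = ("jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec")
CLOCK = re.compile(r"(\d{1,2}):(\d{1,2}):(\d{1,2})")


def parse_loose_date(text):
    """Return seconds since the Unix epoch (UTC), or None if A or B is missing."""
    day = month = year = clock = None
    # Every run of characters other than letters, digits and ':' is a separator.
    for token in re.split(r"[^0-9A-Za-z:]+", text):
        if clock is None and CLOCK.fullmatch(token):
            clock = tuple(int(part) for part in token.split(":"))
        elif token.isdigit() and len(token) <= 2 and day is None:
            day = int(token)
        elif token.isdigit() and len(token) in (2, 4) and year is None:
            year = int(token)
            if len(token) == 2:
                # Hypothetical pivot: 69-99 -> 19xx, 00-68 -> 20xx.
                year += 1900 if year >= 69 else 2000
        elif month is None and token[:3].lower() in MONTHS:
            month = MONTHS.index(token[:3].lower()) + 1
        # Anything else (weekday D, time zone C, junk E) is ignored.
    if None in (day, month, year, clock):
        return None
    hour, minute, second = clock
    if not (1 <= day <= 31 and hour <= 23 and minute <= 59 and second <= 59):
        return None
    return calendar.timegm((year, month, day, hour, minute, second, 0, 0, 0))


print(parse_loose_date("Sat, 15-Apr-17 21:01:22 GMT"))   # 1492290082
print(parse_loose_date("Thu Apr 18 22:50:12 2007 GMT"))  # 1176936612
print(parse_loose_date("IAintNoDateFool"))               # None
```

Because the loop never looks at the order of the tokens, reordered inputs such as "15 Apr 21:01:22 2017" give the same result as the conventional layout, which is exactly what makes a strict grammar awkward to write.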

--

/ daniel.haxx.se

Ian Hickson

Aug 12, 2009, 4:28:38 AM
to Daniel Stenberg, http-state
On Wed, 12 Aug 2009, Daniel Stenberg wrote:
>
> Combined, it makes it a pain to write a formal syntax spec from.

With HTML5 I've found that rather than defining formal syntaxes for this
kind of thing, it's easier just to define imperative parsing steps that
lead to the right behaviour.

For example:

http://www.whatwg.org/specs/web-apps/current-work/#rules-for-parsing-floating-point-number-values

--
Ian Hickson U+1047E )\._.,--....,'``. fL
http://ln.hixie.ch/ U+263A /, _.. \ _\ ;`._ ,.
Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'

Julian Reschke

Aug 12, 2009, 5:17:11 AM
to Ian Hickson, Daniel Stenberg, http-state
Ian Hickson wrote:
> On Wed, 12 Aug 2009, Daniel Stenberg wrote:
>> Combined, it makes it a pain to write a formal syntax spec from.
>
> With HTML5 I've found that rather than defining formal syntaxes for this
> kind of thing, it's easier just to define imperative parsing steps that
> lead to the right behaviour.
>
> For example:
>
> http://www.whatwg.org/specs/web-apps/current-work/#rules-for-parsing-floating-point-number-values

They might be useful for implementers of parsers, but they are almost
unreadable for producers.

BR, Julian

Bil Corry

Aug 12, 2009, 12:49:58 AM
to Adam Barth, http-state
Adam Barth wrote on 8/11/2009 6:13 PM:
> This is great data. Would you like to take a crack at writing a date grammar?

I think we first need to understand which UAs can successfully parse the various date formats. Once we have that, we can specify an "official" date format[1] that should be used, and alternative formats that should be parsable if encountered[2]. And as Dan pointed out[3], we should test how two-digit years are interpreted -- for my own date parser, anything >= 40 is considered 20th century and anything <= 39 is considered 21st century. It'd be good to know how the UAs handle it and provide a recommendation in the spec.


> // Some invalid dates
> { "98 April 17 21:01:22", false, 0 },

Why is that considered invalid? Because it's unclear if it's 1998 or 2098?


- Bil

[1] I'm leaning toward the revised Netscape spec format of "Mon, 10-Dec-2007 17:02:24 GMT"
[2] The list of alternative formats would be made up of those that are parsable by the majority (all?) of UAs.
[3] http://groups.google.com/group/http-state/browse_thread/thread/ba2b98c340eed3b8#msg_1e91e84ebc1df4f1

Ian Hickson

Aug 12, 2009, 6:56:09 AM
to Julian Reschke, Daniel Stenberg, http-state
On Wed, 12 Aug 2009, Julian Reschke wrote:
> Ian Hickson wrote:
> > On Wed, 12 Aug 2009, Daniel Stenberg wrote:
> > > Combined, it makes it a pain to write a formal syntax spec from.
> >
> > With HTML5 I've found that rather than defining formal syntaxes for this
> > kind of thing, it's easier just to define imperative parsing steps that lead
> > to the right behaviour.
> >
> > For example:
> >
> > http://www.whatwg.org/specs/web-apps/current-work/#rules-for-parsing-floating-point-number-values
>
> They might be useful for implementers of parsers, but they are almost
> unreadable for producers.

They're not intended for producers (in fact they're hidden in the HTML5
spec when you select the "author" option) so that's not really that
surprising. For producers, you want the much simpler description of what
is valid, which often has little bearing on the parsing rules.

--
Ian Hickson U+1047E )\._.,--....,'``. fL
http://ln.hixie.ch/ U+263A /, _.. \ _\ ;`._ ,.
Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'

Daniel Stenberg

Aug 12, 2009, 7:14:03 AM
to Adam Barth, http-state
On Tue, 11 Aug 2009, Adam Barth wrote:

I ran that nice list of date formats through the libcurl date parser, and
there were quite a few that didn't parse or didn't match your output:

> { "WillyWonka , 18-Apr-07 22:50:12 GMT", true, 1176936612 },
> { "WillyWonka , 18-Apr-07 22:50:12", true, 1176936612 },
> { "WillyWonka , 18-apr-07 22:50:12", true, 1176936612 },

> { "Partyday, 18- April-07 22:50:12", true, 1176936612 },
> { "Partyday, 18 - Apri-07 22:50:12", true, 1176936612 },
> { "Wednes, 1-Januar-2003 00:00:00 GMT", true, 1041379200 },

> { "Sat, 15-Apr-17 21:01:22 DST", true, 1492290082 },

> { "Sat, 15-Apr-17 21:01:22 (hello there)", true, 1492290082 },

> { "Sat, 15-Apr-17 21:01:22 11:22:33", true, 1492290082 },
> { "Sat, 15-Apr-17 ::00 21:01:22", true, 1492290082 },
> { "Sat, 15-Apr-17 boink:z 21:01:22", true, 1492290082 },

> { "2017 April 15 21:01:22", true, 1492290082 },
> { "15 April 2017 21:01:22", true, 1492290082 },

All the above formats are invalid to libcurl. The last two are about the only
ones I was a bit surprised it doesn't handle and I might take a deeper look at
that later on.


> { "Thu, 012-Aug-2008 20:49:07 GMT", false, 0 },

libcurl parses this to be 1218574147.

> { "Thu, 12-Aug-31841 20:49:07 GMT", false, 0 },
> { "Thu, 12-Aug-9999999999 20:49:07 GMT", false, 0 },

In cases where the format is fine but the specified date is out of range (like
these examples), libcurl returns MAX_INT with the assumption that it should be
enough to not expire them for quite some time.

All the rest were parsed and got the same results as the table Adam posted.
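
To show how differences like these could be read out mechanically once several implementations are in the harness, here is a small Python sketch. The table holds three of the results reported in this thread (None stands for "rejected as invalid"); the RESULTS layout and the disagreements helper are hypothetical and are not part of the actual test harness.

```python
# Results reported in this thread; None stands for "rejected as invalid".
RESULTS = {
    "Chrome": {
        "Sat, 15-Apr-17 21:01:22 GMT": 1492290082,
        "2017 April 15 21:01:22": 1492290082,
        "Thu, 012-Aug-2008 20:49:07 GMT": None,
    },
    "libcurl": {
        "Sat, 15-Apr-17 21:01:22 GMT": 1492290082,
        "2017 April 15 21:01:22": None,
        "Thu, 012-Aug-2008 20:49:07 GMT": 1218574147,
    },
}


def disagreements(results):
    """Map each input on which the implementations differ to its results."""
    cases = next(iter(results.values()))
    report = {}
    for case in cases:
        outcomes = {name: table[case] for name, table in results.items()}
        if len(set(outcomes.values())) > 1:
            report[case] = outcomes
    return report


for case, outcomes in disagreements(RESULTS).items():
    print(case, outcomes)
```

With more implementations added to the table, the same loop would list every input that needs a decision on the list.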

--

/ daniel.haxx.se

Dan Winship

Aug 12, 2009, 9:41:47 AM
to Adam Barth, http-state
On 08/12/2009 01:17 AM, Adam Barth wrote:
>> Once we have that, we can specify an "official" date format[1] that should be used, and alternative formats that should be parsable if encountered[2].
>
> Yep. We should probably recommend that server implementors use
> whatever date format that is the most widely implemented, sane format.

The most-commonly-used format right now is the revised Netscape spec
format, but unless we can find a cookie implementation that doesn't
parse rfc1123-date (the second-most-common format), I think we should
recommend that, because that's the standard date format in HTTP.

>> And as Dan pointed out[3], we should test how two-digit years are interpreted -- for my own date parser, anything >= 40 is considered 20th century and anything <= 39 is considered 21st century.

Yeah, I was going to suggest 69 as the dividing line, for basically the
same reason; time_t==0 (1970) has to be in the past, and time_t==2^31-1
(2038) has to be in the future. It may be that browsers are inconsistent
about years from 39 to 68...
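
To make the two proposals concrete, here is a minimal Python sketch of a pivot-based expansion. The function name and the pivot parameter are made up for illustration, and reading "69 as the dividing line" to mean that 69 is the first 19xx year is an assumption (it could equally be read as 70).

```python
def expand_two_digit_year(yy, pivot):
    """Map 0..99 to a full year: pivot..99 -> 19xx, 0..pivot-1 -> 20xx."""
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year")
    return 1900 + yy if yy >= pivot else 2000 + yy


# Years on which a pivot of 40 and a pivot of 69 disagree.
CONTESTED = [yy for yy in range(100)
             if expand_two_digit_year(yy, 40) != expand_two_digit_year(yy, 69)]

print(expand_two_digit_year(70, 69), expand_two_digit_year(38, 69))  # 1970 2038
print(CONTESTED[0], CONTESTED[-1])                                   # 40 68
```

Under this reading both rules put 70 in the past and 38 in the future, and they only differ for 40 through 68, which is the range where client behaviour would need testing.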

-- Dan

Adam Barth

Aug 12, 2009, 11:09:36 AM
to Dan Winship, http-state
On Wed, Aug 12, 2009 at 6:41 AM, Dan Winship<dan.w...@gmail.com> wrote:
> On 08/12/2009 01:17 AM, Adam Barth wrote:
>>> Once we have that, we can specify an "official" date format[1] that should be used, and alternative formats that should be parsable if encountered[2].
>>
>> Yep.  We should probably recommend that server implementors use
>> whatever date format that is the most widely implemented, sane format.
>
> The most-commonly-used format right now is the revised Netscape spec
> format, but unless we can find a cookie implementation that doesn't
> parse rfc1123-date (the second-most-common format), I think we should
> recommend that, because that's the standard date format in HTTP.

If that's workable w.r.t. existing implementations, that would be great.

Adam

Daniel Stenberg

Aug 12, 2009, 11:00:04 AM
to Dan Winship, http-state
On Wed, 12 Aug 2009, Dan Winship wrote:

> Maybe just:
>
> cookie-date = rfc1123-like-date | mystery-date
> rfc1123-like-date = weekday "," SP rfc1123-like-dmy SP time SP "GMT"
> weekday = "Monday" | "Mon" | "Tuesday" | "Tue" | ...
> rfc1123-like-dmy = day dmy-div month dmy-div year
> dmy-div = SP | "-"
> day = 2DIGIT | *1SP DIGIT
> month = "Jan" | "Feb" | ...
> year = 2DIGIT | 4DIGIT
> time = 2DIGIT ":" 2DIGIT ":" 2DIGIT
>
> mystery-date = *CHAR ; see below
>
> and then we explain some of the possibilities of mystery-date parsing,
> showing a few examples, but noting that the rfc1123-like-date grammar
> covers 99% of cookies

I think that sounds perfectly fine.

Julian Reschke

Aug 12, 2009, 12:01:00 PM
to Daniel Stenberg, http-state
Daniel Stenberg wrote:
> On Wed, 12 Aug 2009, Dan Winship wrote:
>
>> Maybe just:
>>
>> cookie-date = rfc1123-like-date | mystery-date
>> rfc1123-like-date = weekday "," SP rfc1123-like-dmy SP time SP "GMT"
>> weekday = "Monday" | "Mon" | "Tuesday" | "Tue" | ...
>> rfc1123-like-dmy = day dmy-div month dmy-div year
>> dmy-div = SP | "-"
>> day = 2DIGIT | *1SP DIGIT
>> month = "Jan" | "Feb" | ...
>> year = 2DIGIT | 4DIGIT
>> time = 2DIGIT ":" 2DIGIT ":" 2DIGIT
>>
>> mystery-date = *CHAR ; see below
>>
>> and then we explain some of the possibilities of mystery-date parsing,
>> showing a few examples, but noting that the rfc1123-like-date grammar
>> covers 99% of cookies
>
> I think that sounds perfectly fine.

I think that's a good start. Reminder: please use RFC5234-style ABNF, so
"/" instead of "|" etc.

With respect to mystery-date: maybe that could be defined as something like:

( year / month / dayofmonth / weekday / time / tz / WSP / separators)*

and then have ultra-liberal definitions for each of these components,
and also prose that disallows forms where the same component repeats
multiple times?

BR, Julian

Dan Winship

Aug 12, 2009, 9:25:54 AM
to Daniel Stenberg, http-state
On 08/12/2009 03:31 AM, Daniel Stenberg wrote:
> Given the look of that list of dates for the Chrome tests, it seems
> similar in spirit to the curl parser.

Yeah, RFC 2616 says "Recipients of date values are encouraged to be
robust in accepting date values that may have been sent by non-HTTP
applications" so probably all browsers just have a single all-purpose
very-relaxed date parser.

> Combined, it makes it a pain to write a formal syntax spec from.

Maybe just:

cookie-date = rfc1123-like-date | mystery-date
rfc1123-like-date = weekday "," SP rfc1123-like-dmy SP time SP "GMT"
weekday = "Monday" | "Mon" | "Tuesday" | "Tue" | ...
rfc1123-like-dmy = day dmy-div month dmy-div year
dmy-div = SP | "-"
day = 2DIGIT | *1SP DIGIT
month = "Jan" | "Feb" | ...
year = 2DIGIT | 4DIGIT
time = 2DIGIT ":" 2DIGIT ":" 2DIGIT

mystery-date = *CHAR ; see below

and then we explain some of the possibilities of mystery-date parsing,
showing a few examples, but noting that the rfc1123-like-date grammar
covers 99% of cookies, and there's a long tail of crap after that. (And
the major browsers probably aren't completely consistent about what
parts of that tail they accept.)
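
For anyone who wants to try the grammar against real data, the rfc1123-like-date rule can be transcribed more or less directly into a regular expression. The Python sketch below is one such transcription; the constant names and the classify_cookie_date helper are made up for the example, and spelling out the full weekday and month lists where the grammar says "..." is an assumption.

```python
import re

WEEKDAY = (r"(?:Mon(?:day)?|Tue(?:sday)?|Wed(?:nesday)?|Thu(?:rsday)?"
           r"|Fri(?:day)?|Sat(?:urday)?|Sun(?:day)?)")
MONTH = r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
DAY = r"(?:\d{2}| ?\d)"      # day = 2DIGIT | *1SP DIGIT
DMY_DIV = r"[ -]"            # dmy-div = SP | "-"
YEAR = r"(?:\d{4}|\d{2})"    # year = 2DIGIT | 4DIGIT
TIME = r"\d{2}:\d{2}:\d{2}"  # time = 2DIGIT ":" 2DIGIT ":" 2DIGIT

# rfc1123-like-date = weekday "," SP rfc1123-like-dmy SP time SP "GMT"
RFC1123_LIKE_DATE = re.compile(
    WEEKDAY + ", " + DAY + DMY_DIV + MONTH + DMY_DIV + YEAR + " " + TIME + " GMT"
)


def classify_cookie_date(text):
    """Name the cookie-date alternative that the text falls under."""
    if RFC1123_LIKE_DATE.fullmatch(text):
        return "rfc1123-like-date"
    return "mystery-date"  # *CHAR accepts everything else


print(classify_cookie_date("Mon, 10-Dec-2007 17:02:24 GMT"))  # rfc1123-like-date
print(classify_cookie_date("Mon Dec 10 16:32:30 2007 GMT"))   # mystery-date
```

Running every string in a corpus through such a classifier is one way to check how much of it the first alternative really covers.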

-- Dan

Bil Corry

Aug 12, 2009, 11:12:12 AM
to Daniel Stenberg, http-state
Daniel Stenberg wrote on 8/12/2009 2:31 AM:
> Combined, it makes it a pain to write a formal syntax spec from.

Perhaps we should specify the rfc1123-date (Dan Winship's suggestion) as the official date format and reference a separate date parsing spec that provides coverage for the other common formats (and more).

- Bil

Adam Barth

Aug 14, 2009, 12:21:39 AM
to Dan Winship, Daniel Stenberg, http-state
On Wed, Aug 12, 2009 at 6:25 AM, Dan Winship<dan.w...@gmail.com> wrote:
> Maybe just:
>
>    cookie-date       = rfc1123-like-date | mystery-date
>    rfc1123-like-date = weekday "," SP rfc1123-like-dmy SP time SP "GMT"
>    weekday           = "Monday" | "Mon" | "Tuesday" | "Tue" | ...
>    rfc1123-like-dmy  = day dmy-div month dmy-div year
>    dmy-div           = SP | "-"
>    day               = 2DIGIT | *1SP DIGIT
>    month             = "Jan" | "Feb" | ...
>    year              = 2DIGIT | 4DIGIT
>    time              = 2DIGIT ":" 2DIGIT ":" 2DIGIT
>
>    mystery-date      = *CHAR ; see below

I've added this grammar to the draft (with the / characters suggested
by Julian). I haven't tackled the mystery-date format yet.

Adam

Bil Corry

Aug 17, 2009, 12:58:43 PM
to Adam Barth, http-state
Adam Barth wrote on 8/11/2009 6:13 PM:
> For reference here are the tests from Chrome's cookie date parser.

FWIW, here are some date formats that Mozilla handles:

---8<---
 * Many formats are handled, including:
 *
 *   14 Apr 89 03:20:12
 *   14 Apr 89 03:20 GMT
 *   Fri, 17 Mar 89 4:01:33
 *   Fri, 17 Mar 89 4:01 GMT
 *   Mon Jan 16 16:12 PDT 1989
 *   Mon Jan 16 16:12 +0130 1989
 *   6 May 1992 16:41-JST (Wednesday)
 *   22-AUG-1993 10:59:12.82
 *   22-AUG-1993 10:59pm
 *   22-AUG-1993 12:59am
 *   22-AUG-1993 12:59 PM
 *   Friday, August 04, 1995 3:54 PM
 *   06/21/95 04:24:34 PM
 *   20/06/95 21:07
 *   95-06-08 19:32:48 EDT
 *
 * If the input string doesn't contain a description of the timezone,
 * we consult the `default_to_gmt' to decide whether the string should
 * be interpreted relative to the local time zone (PR_FALSE) or GMT (PR_TRUE).
 * The correct value for this argument depends on what standard specified
 * the time string which you are parsing.

(from: http://mxr.mozilla.org/mozilla1.8.0/source/nsprpub/pr/src/misc/prtime.c#961)
--->8---


- Bil

Adam Barth

Aug 17, 2009, 1:18:28 PM
to Bil Corry, http-state
Can you summarize how this compares to the grammar in the current draft?

Adam
