(Moving this discussion to some relevant mailing lists. For those who
don't know what this is about, we're trying to define a way of
marshalling DateTime data types from database servers over a network
connection, and then manipulating them from C/C++.)
I spent some time thinking about the problem, and here's what I've got.
When dealing with non-time_t timestamps, there are three separate
confusing issues we have to handle at once:
A) Converting to/from the correct date (leap years, priestly calendar
redefinitions, etc).
B) Converting to/from the right time of day (timezones).
C) Avoiding weird time overlaps (leap seconds, timezone changeovers).
As for the precise encoding methods, there are these questions (at least):
i) Start time. .Net uses January 1, AD 0001, 00:00:00. (You will
recall that, annoyingly, there was no year 0000.) Delphi uses days
since December 30, 1899, probably due to insane historical reasons
(http://www.joelonsoftware.com/items/2006/06/16.html). 64-bit time_t
will use 1970, of course, as before.
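To make the epoch differences concrete, here is a minimal C sketch of converting between the three counters. The constants are derived from the epochs listed above (719,162 days separate 0001-01-01 from 1970-01-01, and 25,569 days separate 1899-12-30 from 1970-01-01). The function names are hypothetical, and the Delphi conversion is only meant for dates after 1899-12-30.

```c
#include <assert.h>
#include <stdint.h>

/* .NET DateTime.Ticks: 100 ns units since 0001-01-01 00:00:00. */
#define TICKS_PER_SECOND  INT64_C(10000000)
/* Ticks at 1970-01-01: 719162 days * 86400 s * 10^7. */
#define DOTNET_UNIX_EPOCH INT64_C(621355968000000000)
/* Delphi TDateTime: days (as a double) since 1899-12-30. */
#define DELPHI_UNIX_EPOCH 25569.0

/* Floor division, so pre-1970 values round toward -infinity. */
static int64_t floor_div(int64_t a, int64_t b)
{
    assert(b > 0);
    int64_t q = a / b;
    return (a % b < 0) ? q - 1 : q;
}

static int64_t unix_to_dotnet(int64_t unix_seconds)
{
    return unix_seconds * TICKS_PER_SECOND + DOTNET_UNIX_EPOCH;
}

/* Drops any sub-second part of the tick count. */
static int64_t dotnet_to_unix(int64_t ticks)
{
    return floor_div(ticks - DOTNET_UNIX_EPOCH, TICKS_PER_SECOND);
}

/* Only valid after 1899-12-30: Delphi treats the fractional part of
 * negative TDateTime values differently. */
static double unix_to_delphi(int64_t unix_seconds)
{
    return DELPHI_UNIX_EPOCH + (double)unix_seconds / 86400.0;
}
```

In other words, once everything is a count from some epoch, switching formats is one multiply and one add.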
ii) Precision. .Net uses 0.1-microsecond units, which may or may not have
been nicknamed "billiseconds." Delphi uses days, but its datetime
type is a double, which is rather sneaky. time_t will always be
seconds, which is kind of pointless because time types that expire
after the universe does are Not an Efficient Use of Space.
iii) Local time vs. GMT/UTC. Both Delphi and .Net make the horrible
mistake of leaving this "unspecified", which means you can have two
different time_t-like things that mean the same time in different
timezones. Unix always uses UTC, of course.
iv) Leap seconds. UTC has the bad habit of occasionally inserting an
extra second (a leap second), which plain second counters end up
seeing as a repeated second. They never should have done that,
but they did. I gather that everybody's time format is susceptible to
that (except libtai: http://cr.yp.to/libtai.html) and they just suck
it up.
Now to the actual solutions.
I spent some time searching around, and it looks like there's no
magical 64-bit time_t-related goodness already sitting around in glibc
on 32-bit systems and just hiding under some nonstandard symbol names.
Which is too bad, really. But that's not going to help us.
It occurred to me that they must have solved this problem in mono
somehow, so I wrote the program attached below and tried it.
Let's start with the easy (or so I claim) decisions:
I claim that sending time around as a number of ticks, instead of a
broken-down or BCD-encoded YYYYMMDD/HHMMSS number, is the only sane
solution, so we immediately discard anything like the latter. It
happens to be what *most* SQL servers already do, but most SQL servers
are dumb.
(C) and (iv): leap seconds are pretty minor and nobody else knows what
to do with them either. Ignore.
(iii): UTC vs. localtime. Historically, anybody who puts their
timestamps in localtime has been screwed. (See
http://blogs.msdn.com/bclteam/archive/2007/06/18/a-brief-history-of-datetime-anthony-moore.aspx
for how the .net people got screwed fantastically by it, despite Win32
actually getting it *right* years earlier.) So we really should use
UTC, at least in the sense that running a timestamp through gmtime()
gives the original hours and minutes back. (Note that database times
are usually in an "unspecified" timezone, which just means we
shouldn't mess with converting them during transit. As long as we
encode them as if they were UTC, then decode them as if they were UTC,
this will work. But treating them as localtime will cause havoc if
the server's timezone is different from the client's.)
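Treating an "unspecified" database time as UTC amounts to encoding its fields with no timezone lookup at all, then decoding them with gmtime(). The encoder below is effectively a portable version of the nonstandard timegm(), built on Howard Hinnant's well-known days_from_civil algorithm. The function names are hypothetical; this is a sketch, not a proposed API.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Days from 1970-01-01 to y-m-d in the proleptic Gregorian calendar
 * (Howard Hinnant's days_from_civil). m is 1-12, d is 1-31. */
static int64_t days_from_civil(int64_t y, int m, int d)
{
    y -= m <= 2;
    int64_t era = (y >= 0 ? y : y - 399) / 400;
    int64_t yoe = y - era * 400;                                   /* [0, 399] */
    int64_t doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;  /* [0, 365] */
    int64_t doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;           /* [0, 146096] */
    return era * 146097 + doe - 719468;
}

/* Encode wall-clock fields "as if UTC": no timezone files, no TZ. */
static int64_t encode_as_utc(const struct tm *t)
{
    assert(t->tm_mon >= 0 && t->tm_mon < 12);
    return days_from_civil(t->tm_year + 1900, t->tm_mon + 1, t->tm_mday) * 86400
         + t->tm_hour * 3600 + t->tm_min * 60 + t->tm_sec;
}
```

Passing the result to gmtime() returns exactly the hours and minutes that went in. Passing it to localtime() instead would shift them by the client's UTC offset, which is the havoc described above.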
Now, the trickier parts:
(A) and (i): date conversions. These are relatively easy, in that we
can write a single, cross-platform function to do this, and it doesn't
need to read any timezone files. The only question is which epoch we
choose. I think I'd lean toward the Unix epoch (1970), if only
because then we can just call localtime() on 64-bit systems and get
the right answer.
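The date half of the conversion really is pure arithmetic. One well-known way to do it is Howard Hinnant's civil_from_days algorithm, sketched here in C with a hypothetical struct name. It maps a day count since 1970-01-01 to a proleptic Gregorian date for any year a 64-bit count can reach, without reading any timezone files.

```c
#include <assert.h>
#include <stdint.h>

struct civil_date { int64_t year; int month; int day; };

/* Day count since 1970-01-01 -> proleptic Gregorian date
 * (Howard Hinnant's civil_from_days). Leap years, including the
 * 100/400-year rules, fall out of the 400-year "era" arithmetic. */
static struct civil_date civil_from_days(int64_t z)
{
    struct civil_date r;
    z += 719468;                                   /* shift epoch to 0000-03-01 */
    int64_t era = (z >= 0 ? z : z - 146096) / 146097;
    int64_t doe = z - era * 146097;                                      /* [0, 146096] */
    int64_t yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365; /* [0, 399] */
    int64_t doy = doe - (365 * yoe + yoe / 4 - yoe / 100);               /* [0, 365] */
    int64_t mp = (5 * doy + 2) / 153;              /* month, counting from March */
    r.day = (int)(doy - (153 * mp + 2) / 5 + 1);
    r.month = (int)(mp < 10 ? mp + 3 : mp - 9);
    r.year = yoe + era * 400 + (r.month <= 2);
    assert(r.month >= 1 && r.month <= 12);
    return r;
}
```

Because nothing here touches time_t, the same code gives the same answer on 32- and 64-bit systems, including for dates after 2038.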
(ii) Precision: one-second precision is unnecessarily low when you
have 64 whole bits to play with. You end up with a timestamp capable
of representing hundreds of billions of years in the future, which is
kind of pointless. I'd suggest going with .Net-level 0.1-microsecond
precision, so that conversion to/from .net is just adding/subtracting
an offset. (That gives us about 29,000 years on either side of the
epoch, which is a nice range.)
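The range figures in this paragraph are easy to check: divide the largest signed 64-bit value by the number of units per second and by the length of an average Gregorian year (365.2425 days). A small C sketch with a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Average Gregorian year: 365.2425 days * 86400 s. */
#define SECONDS_PER_YEAR 31556952.0

/* Years representable on one side of the epoch by a signed 64-bit
 * counter that ticks units_per_second times per second. */
static double int64_range_years(double units_per_second)
{
    assert(units_per_second > 0);
    return (double)INT64_MAX / units_per_second / SECONDS_PER_YEAR;
}
```

That works out to roughly 292 billion years for seconds, about 292,000 years for microseconds, and about 29,000 years for 100 ns ticks, in each direction from the epoch. (.Net's own DateTime additionally refuses anything past 9999-12-31.)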
Alternatively, since 0.1 microseconds is kind of a silly precision to
talk about, we could just use microseconds instead. The advantage of
that is 'struct timeval' is already in microseconds, so it's easy to
convert to/from a timeval. Of course, 'struct timespec' is in
nanoseconds, but nobody really uses that.
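Converting a microsecond count to a struct timeval is straightforward, with one trap: for pre-1970 values C's / and % truncate toward zero, so the remainder has to be normalized into [0, 999999]. A sketch, assuming the wire format is an int64_t count of microseconds since 1970-01-01 UTC (the function names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>
#include <sys/time.h>   /* struct timeval (POSIX) */
#include <time.h>       /* struct timespec */

static struct timeval us_to_timeval(int64_t us)
{
    struct timeval tv;
    int64_t sec = us / 1000000, rem = us % 1000000;
    if (rem < 0) {          /* e.g. -1 us -> (-1 s, 999999 us) */
        rem += 1000000;
        sec -= 1;
    }
    tv.tv_sec = (time_t)sec;
    tv.tv_usec = rem;
    return tv;
}

static int64_t timeval_to_us(const struct timeval *tv)
{
    return (int64_t)tv->tv_sec * 1000000 + tv->tv_usec;
}

/* timespec has nanoseconds; going to microseconds drops the last
 * three digits (tv_nsec is non-negative, so this floors). */
static int64_t timespec_to_us(const struct timespec *ts)
{
    assert(ts->tv_nsec >= 0 && ts->tv_nsec < 1000000000L);
    return (int64_t)ts->tv_sec * 1000000 + ts->tv_nsec / 1000;
}
```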
(B) Timezones: this is the hard one. The only way to do timezone
conversions is to use the timezone conversion files, which are in a
system-dependent binary format (or on non-Unix systems, not even
necessarily files). Only localtime() and friends know how to access
timezones. I used the C# program attached below to see how mono and
MS.NET handle it. Answer: Microsoft's answers are all correct, but I
think the .Net distro comes with its own timezone files (gack!).
Mono just starts ignoring daylight savings after Y2038, which implies
that it just does some kind of conversion like "if (year > 2038)
year=2038;" before passing the resulting time_t to localtime(). I
think that's acceptable for anything I want to do; I might want to do
calculations involving the year 2038, but probably not ones in which
daylight savings will matter.
The reason I'm crossposting this to the wvstreams mailing list is that I
think we might as well implement whatever we decide here as a
replacement for the wvstreams WvTime class. Then we'd have a nice,
cross-platform time class that we can use in more than one project.
Thoughts?
Avery
P.S. Here's the .net program I used for testing mono's datetime
conversion code (it found some bugs in mono, incidentally, so I
reported them :)):
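using System;

public static class Test
{
    // Print the value converted to local time and to UTC, then the raw
    // tick counts (100 ns units since 0001-01-01) behind each of them.
    // Parse() yields DateTimeKind.Unspecified: ToLocalTime() treats that
    // as UTC, while ToUniversalTime() treats it as local time.
    public static void test(DateTime dt)
    {
        Console.WriteLine("{0} -- {1}",
            dt.ToLocalTime(), dt.ToUniversalTime());
        Console.WriteLine(" {0} -- {1} -- {2}",
            dt.Ticks, dt.ToLocalTime().Ticks, dt.ToUniversalTime().Ticks);
    }

    public static void Main()
    {
        // Dates on either side of the usual northern-hemisphere DST
        // changeover, the same pair after 2038, and a time on the
        // first day .Net can represent.
        test(DateTime.Parse("2007-09-27 5:14:14"));
        test(DateTime.Parse("2007-11-27 5:14:14"));
        test(DateTime.Parse("2039-09-27 5:14:14"));
        test(DateTime.Parse("2039-11-27 5:14:14"));
        test(DateTime.Parse("0001-01-01 10:01:02"));
    }
}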
Fine.
> (C) and (iv): leap seconds are pretty minor and nobody else knows what
> to do with them either. Ignore.
The possible treachery from ignoring leap seconds is if you convert
pure date values between representations that differ in their
ignorance. Depending on the direction you go, and exactly how things
work, you could end up with a specified date, such as 2007-11-29
00:00:00, being turned into 2007-11-28 23:59:59 (or even 23:59:60, if
it's on a leap-second day), which is incorrect if it gets converted
back to a date. Trying to think about whether this would be a problem
converting between Unix Time and UTC makes my head hurt, but I think
I've convinced myself that it's not an issue based on the definitions.
Implementations are a different story, of course; the only difference
between theory and practice is that in theory, there's no difference
between theory and practice. One possible solution is to encode pure
date values as noon.
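A minimal sketch of why the noon trick is robust, assuming a pure date is stored as a day count since 1970-01-01 and decoded by floor division (all names here are hypothetical): a one-second wobble around a midnight-encoded date changes the decoded day, while the same wobble around noon does not.

```c
#include <assert.h>
#include <stdint.h>

#define SECS_PER_DAY INT64_C(86400)

/* Floor division by a positive divisor (correct for pre-1970 values). */
static int64_t floor_div(int64_t a, int64_t b)
{
    assert(b > 0);
    int64_t q = a / b;
    return (a % b < 0) ? q - 1 : q;
}

static int64_t date_to_midnight(int64_t days) { return days * SECS_PER_DAY; }
static int64_t date_to_noon(int64_t days) { return days * SECS_PER_DAY + SECS_PER_DAY / 2; }
static int64_t timestamp_to_date(int64_t secs) { return floor_div(secs, SECS_PER_DAY); }
```

With midnight encoding, an off-by-one-second conversion turns 2007-11-29 into 2007-11-28; with noon encoding, it would take a twelve-hour error to change the date.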
> (iii): UTC vs. localtime. Historically, anybody who puts their
> timestamps in localtime has been screwed. (See
> http://blogs.msdn.com/bclteam/archive/2007/06/18/a-brief-history-of-datetime-anthony-moore.aspx
> for how the .net people got screwed fantastically by it, despite Win32
> actually getting it *right* years earlier.) So we really should use
> UTC, at least in the sense that running a timestamp through gmtime()
> gives the original hours and minutes back.
It's pretty clear you'd have to have rocks in your head to actually
want your times stored in the local timezone.
> Now, the trickier parts:
>
> (A) and (i): date conversions. These are relatively easy, in that we
> can write a single, cross-platform function to do this, and it doesn't
> need to read any timezone files. The only question is which epoch we
> choose. I think I'd lean toward the Unix epoch (1970), if only
> because then we can just call localtime() on 64-bit systems and get
> the right answer.
I'm fond of 1970-01-01 if only for sentimental reasons. Plus 64-bit
localtime works, and it gives marginally more readable values for
contemporary dates. It doesn't take much effort to be able to eyeball
Unix times and know roughly when they are.
> (ii) Precision: one-second precision is unnecessarily low when you
> have 64 whole bits to play with. You end up with a timestamp capable
> of representing hundreds of billions of years in the future, which is
> kind of pointless. I'd suggest going with .Net-level 0.1-microsecond
> precision, so that conversion to/from .net is just adding/subtracting
> an offset. (That gives us about 29,000 years on either side of the
> epoch, which is a nice range.)
> Alternatively, since 0.1 microseconds is kind of a silly precision to
> talk about, we could just use microseconds instead. The advantage of
> that is 'struct timeval' is already in microseconds, so it's easy to
> convert to/from a timeval. Of course, 'struct timespec' is in
> nanoseconds, but nobody really uses that.
It doesn't much matter to me which fraction of a second we pick, as
converting is just multiplying by powers of 10, and, as you note,
there are already at least four standards in reasonably wide use that I'd
want to convert to and from. I suppose the .Net format is the most
likely to ever be widely encountered: I rarely see either struct
timeval or struct timespec, and just about never in situations where
I'd have to do more than one or two conversions in a program.
> (B) Timezones: this is the hard one. The only way to do timezone
> conversions is to use the timezone conversion files, which are in a
> system-dependent binary format (or on non-Unix systems, not even
> necessarily files). Only localtime() and friends know how to access
> timezones. I used the C# program attached below to see how mono and
> MS.NET handle it. Answer: Microsoft's answers are all correct, but I
> think the .Net distro comes with its own timezone files (gack!).
> Mono just starts ignoring daylight savings after Y2038, which implies
> that it just does some kind of conversion like "if (year > 2038)
> year=2038;" before passing the resulting time_t to localtime(). I
> think that's acceptable for anything I want to do; I might want to do
> calculations involving the year 2038, but probably not ones in which
> daylight savings will matter.
>
> Thoughts?
This is the hard one indeed. Everything else is mostly window
dressing, and is just a matter of taste and not really of bugs or
functionality. The real fundamental question is how we'll deal with
converting this into local time. The choice is either to use
localtime() (limiting us to a 32-bit time_t's worth of useful date
ranges), or to parse the timezone files ourselves (exposing us to a
lot of work and bugs). It appears that most people who deeply care
about dates parse the files themselves (even on .Net, which already
ships its own timezone files), if only so that they know what timezone
and leap second rules are in effect, and know that they're consistent
between installations. A possible workaround for the extra work and
bugs is to steal the localtime() implementation from 64-bit glibc.
As for dealing with years > 2038, Mono's hypothesized solution would
be mostly fine, as you say, but only if it also correctly fudged the
leap years. We may not care too much about timezones and leap seconds
way out there, but missing a day isn't so cool. Leap years are
mercifully predictable though, and it wouldn't take all that much
extra fiddling to get it right. If they're real keeners, it's not
much more work on top of that to make the days of the week come
out right, too. Also, I'd feel a little bit uneasy releasing a
library that knowingly produces different results between 32- and
64-bit versions, but I'd also feel bad about providing known-bad
results on 64 bits when it'd be just as easy to call the useful 64-bit
localtime().
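The leap-year and day-of-week fudge can be sketched like this: pick a surrogate year inside the 32-bit time_t range whose calendar is identical to the real year's (same leap-ness, same weekday on January 1). The 2009 to 2036 search window and the helper names are assumptions for illustration. The surrogate only fixes the calendar; it says nothing about which DST rules will actually be in force that far out.

```c
#include <assert.h>
#include <stdbool.h>

static bool is_leap(int y)
{
    return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
}

/* Day of week for y-m-d, 0 = Sunday (Sakamoto's method, y >= 1). */
static int day_of_week(int y, int m, int d)
{
    static const int t[] = {0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4};
    if (m < 3)
        y -= 1;
    return (y + y / 4 - y / 100 + y / 400 + t[m - 1] + d) % 7;
}

/* Any 28 consecutive years between 1901 and 2099 contain all 14
 * possible calendars, so this search always finds a match. */
static int equivalent_year(int y)
{
    assert(y >= 1);
    for (int c = 2009; c <= 2036; c++)
        if (is_leap(c) == is_leap(y) && day_of_week(c, 1, 1) == day_of_week(y, 1, 1))
            return c;
    assert(!"unreachable: 2009..2036 contains every calendar");
    return -1;
}
```

For example, 2039 maps to 2011 and 2040 maps to 2012, so February 29 and the weekday both survive the substitution; the caller then adds the year difference back to tm_year.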
Peter.
As much as I love time_t, it won't let me talk about a mortgage
maturity date that's 31 years from now unless I buy a 64-bit computer.
That's not going to fly.
It also can't represent fractional seconds (although a seconds value
that includes the fraction, such as a double, is a reasonable option).
> > (C) and (iv): leap seconds are pretty minor and nobody else knows what
> > to do with them either. Ignore.
>
> The possible treachery from ignoring leap seconds is if you convert
> pure date values between representations that differ in their
> ignorance.
Correct. The only sane thing to do would have been to have time_t and
friends (and UTC, for that matter) ignore leap seconds (counting only
the raw number of seconds since a particular moment in time) and then
have localtime() etc add/subtract leap seconds as necessary when
converting to broken-down time. Unfortunately, that's not what
happened: time_t itself - and UTC - skips during a leap second. I
believe Windows time works the same way.
As dumb as that is, if our only option for making a "reasonable"
timestamp is to somehow take the leap second information back *out*,
then we're doomed. So we should just do it the way current systems
do.
> Depending on the direction you go, and exactly how things
> work, you could end up with a specified date, such as 2007-11-29
> 00:00:00, being turned into 2007-11-28 23:59:59 (or even 23:59:60, if
> it's on a leap-second day), which is incorrect if it gets converted
> back to a date. Trying to think about whether this would be a problem
> converting between Unix Time and UTC makes my head hurt, but I think
> I've convinced myself that it's not an issue based on the definitions.
See: http://en.wikipedia.org/wiki/Unix_time
They have a nice explanation of leap seconds. Basically, there is no
way to represent time "23:59:60" as a time_t, but Unix systems
(supposedly) deal with it by repeating the same time_t twice.
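The repeated time_t value falls straight out of the POSIX definition of "Seconds Since the Epoch", which has no term for leap seconds. A small C sketch applying that formula (the function name is hypothetical), checked against the real leap second at the end of 2016: 23:59:60 and the following midnight land on the same number.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* POSIX "Seconds Since the Epoch" for a UTC broken-down time.
 * tm_yday must be filled in; the integer divisions match the
 * definition for years from 1970 onward. */
static int64_t posix_seconds(const struct tm *t)
{
    int64_t y = t->tm_year;   /* years since 1900 */
    assert(y >= 70);
    return t->tm_sec + t->tm_min * 60 + t->tm_hour * 3600
         + (int64_t)t->tm_yday * 86400
         + (y - 70) * 31536000 + ((y - 69) / 4) * 86400
         - ((y - 1) / 100) * 86400 + ((y + 299) / 400) * 86400;
}
```

Feeding it 2016-12-31 23:59:60 and 2017-01-01 00:00:00 yields 1483228800 both times, which is exactly the "same time_t twice" behaviour.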
In any case, I'd suggest doing what's currently being done, just for
the sake of our own sanity.
> One possible solution is to encode pure date values as noon.
That's an interesting option, but since nobody else does it that way,
I'd say it's the most dangerous one :)
> It's pretty clear you'd have to have rocks in your head to actually
> want your times stored in the local timezone.
The times I've seen it be useful are when you're specifically
referring to "whatever the local time is", such as "all our
stores open at 9am". But yes, that's a special case.
Also, it would appear that plenty of people do in fact have those
rocks in their heads :)
> I'm fond of 1970-01-01 if only for sentimental reasons. Plus 64-bit
> localtime works, and it gives marginally more readable values for
> contemporary dates. It doesn't take much effort to be able to eyeball
> Unix times and know roughly when they are.
Good points.
It did occur to me after I wrote my original message, though: if we
use .net-style precision but change the epoch, then we've invented yet
another time format, which is probably worse for the world overall
than just stealing an existing one.
> > (B) Timezones: this is the hard one.
>
> This is the hard one indeed. Everything else is mostly window
> dressing, and is just a matter of taste and not really of bugs or
> functionality. The real fundamental question is how we'll deal with
> converting this into local time. The choice is either to use
> localtime() (limiting us to a 32-bit time_t's worth of useful date
> ranges), or to parse the timezone files ourselves (exposing us to a
> lot of work and bugs). It appears that most people who deeply care
> about dates parse the files themselves (even on .Net, which already
> ships its own timezone files), if only so that they know what timezone
> and leap second rules are in effect, and know that they're consistent
> between installations. A possible workaround for the extra work and
> bugs is to steal the localtime() implementation from 64-bit glibc.
Stealing glibc's parser is Just Too Much Work (tm) and is the sort of
thing Microsoft would do because they can afford to hire hordes of
programmers to keep things up to date. It would be generally
non-portable (timezone files are different everywhere, AFAIK, when
they're even files, and I doubt glibc even guarantees that their
format won't change).
> As for dealing with years > 2038, Mono's hypothesized solution would
> be mostly fine, as you say, but only if it also correctly fudged the
> leap years.
What I meant specifically was this: write a completely separate
function for converting the "date" part of the timestamp into a date +
seconds + subseconds. Then take that month and day in 2038 (well,
2037 in practice, since a 32-bit time_t runs out on January 19, 2038),
add the seconds, and pass the result to localtime(), thus obtaining
the time of day. Then add the subseconds back in if you need the
more precise time.
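The first step of this approach, splitting a timestamp into whole days, seconds into the day, and subseconds, is where negative (pre-1970) values bite, because C's / and % truncate toward zero. A minimal sketch, assuming the timestamp is an int64_t count of microseconds since 1970-01-01 UTC (the struct and function names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

struct split_time {
    int64_t days;   /* whole days since 1970-01-01 (may be negative) */
    int32_t secs;   /* seconds into that day, [0, 86399] */
    int32_t usecs;  /* microseconds into that second, [0, 999999] */
};

static struct split_time split_timestamp(int64_t us)
{
    const int64_t us_per_day = INT64_C(86400) * 1000000;
    struct split_time s;
    int64_t day = us / us_per_day, rem = us % us_per_day;
    if (rem < 0) {  /* normalize so rem is always within the day */
        rem += us_per_day;
        day -= 1;
    }
    s.days = day;
    s.secs = (int32_t)(rem / 1000000);
    s.usecs = (int32_t)(rem % 1000000);
    assert(s.secs >= 0 && s.secs < 86400);
    return s;
}

static int64_t join_timestamp(struct split_time s)
{
    return (s.days * 86400 + s.secs) * 1000000 + s.usecs;
}
```

The days field is what goes through the separate date-conversion function; secs is what gets attached to the surrogate date before calling localtime(), and usecs is added back at the end.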
We should be able to steal the date-conversion function from
somewhere. That part is relatively easy and, while slightly error
prone, not timezone-specific or constantly changing. So simply
including our own implementation wouldn't scare me.
Of course, if time_t is 64 bits, just call localtime()/gmtime() directly.
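Deciding when it's safe to hand a value straight to gmtime()/localtime() can be done with a round-trip check rather than a hard-coded sizeof(time_t) test. A minimal sketch with hypothetical helper names; the fallback path (our own date code plus a surrogate year) is left to the caller.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Nonzero if Unix time t survives a round trip through time_t, i.e.
 * the platform's gmtime()/localtime() can be given it directly.
 * Out-of-range conversion to a signed type is implementation-defined,
 * not undefined, so this check is safe to perform. */
static int fits_in_time_t(int64_t t)
{
    time_t tt = (time_t)t;
    return (int64_t)tt == t;
}

/* Returns 1 and fills *out on success, 0 if the caller must fall back. */
static int utc_tm_from_unix(int64_t t, struct tm *out)
{
    assert(out != NULL);
    if (!fits_in_time_t(t))
        return 0;
    time_t tt = (time_t)t;
    struct tm *p = gmtime(&tt);
    if (p == NULL)
        return 0;
    *out = *p;
    return 1;
}
```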
Note that I seem to have lied slightly about mono: it doesn't do DST
*at all* after 2038, which implies that it probably sets the entire
date to 0, then does localtime() on just the time-of-day part.
> Also, I'd feel a little bit uneasy releasing a
> library that knowingly produces different results between 32- and
> 64-bit versions,
Admittedly, but I'd expect that as time goes on, glibc will add a
function to do this more sanely anyhow. When that happens, we can
just use that.
While I care about mortgages that mature in 2039, I'm not so concerned
about whether they mature at precisely 9am or 10am in 2039.
Have fun,
Avery