
producing time strings


Seb

Mar 21, 2012, 2:41:29 PM
Hi,

As part of a larger program, I'm constructing time strings from input
that looks like this:

"YYYY-MM-DD HH:MM:SS"

including the double quotes. I need the output to look the same, but
without the double quotes. I cannot just take a substring of the input
to drop the quotes, because the input is messy and may contain any
character in place of the dashes and colons. So I wrote a function that
constructs proper times and formats the output in a more consistent
manner (test.awk):

---<--------------------cut here---------------start------------------->---
{print fix_time($0)}

# tstr=time string field; the names after the extra spaces are locals
function fix_time(tstr,    Y, M, D, h, m, s, t) {
    Y=substr(tstr, 2, 4)
    M=substr(tstr, 7, 2)
    D=substr(tstr, 10, 2)
    h=substr(tstr, 13, 2)
    m=substr(tstr, 16, 2)
    s=substr(tstr, 19, 2)
    t=mktime(sprintf("%s %s %s %s %s %s", Y, M, D, h, m, s))
    return strftime("%Y-%m-%d %H:%M:%S", t)
}
---<--------------------cut here---------------end--------------------->---

I was happy with this, until I ran into this:

$ echo '"2012-03-11 02:48:00"' | awk -f test.awk
2012-03-11 03:48:00

What caused it to think it's an hour later?! Other stamps show no
problems:

$ echo '"2012-03-11 01:48:00"' | awk -f test.awk
2012-03-11 01:48:00
$ echo '"2012-03-11 00:48:00"' | awk -f test.awk
2012-03-11 00:48:00
$ echo '"2012-03-11 12:00:00"' | awk -f test.awk
2012-03-11 12:00:00

Cheers,

--
Seb

Luuk

Mar 21, 2012, 3:43:05 PM
~/tmp> echo '"2012-03-11 02:48:00"' | awk -f test.awk
2012-03-11 02:48:00
~/tmp> echo '"2012-03-18 02:48:00"' | awk -f test.awk
2012-03-18 02:48:00
~/tmp> echo '"2012-03-25 02:48:00"' | awk -f test.awk
2012-03-25 03:48:00
~/tmp>

it has something to do with DST (daylight saving time)

What version of awk are you using?
(awk --version)
What timezone are you using?

Seb

Mar 21, 2012, 3:52:57 PM
On Wed, 21 Mar 2012 20:43:05 +0100,
Luuk <lu...@invalid.lan> wrote:

[...]

> it has something to do with DST (daylight saving time)


> What version of awk are you using? (awk --version)

$ awk --version
GNU Awk 3.1.8
Copyright (C) 1989, 1991-2010 Free Software Foundation.


> What timezone are you using?

I'm in CDT (Central Daylight Time). The shift only occurs with the
2 o'clock hour; every other time is produced properly, so I don't see
how it could have anything to do with DST. I'm puzzled.

Thanks,

--
Seb

Seb

Mar 21, 2012, 4:08:46 PM
On Wed, 21 Mar 2012 14:52:57 -0500,
Seb <spl...@gmail.com> wrote:

[...]

> I'm in CDT (Central Daylight Time). The shift only occurs with the
> 2 o'clock hour; every other time is produced properly, so I don't
> see how it could have anything to do with DST. I'm puzzled.

$ echo '"2012-03-11 03:00:00"' | awk -f test.awk
2012-03-11 03:00:00
$ echo '"2012-03-11 02:00:00"' | awk -f test.awk
2012-03-11 03:00:00


--
Seb

Seb

Mar 21, 2012, 4:25:50 PM
Hmm... daylight savings for Central time started on 2012-03-11, and the
problem only occurs for that particular date. Every other date is fine:

$ echo '"2012-03-10 02:00:00"' | awk -f test.awk
2012-03-10 02:00:00
$ echo '"2012-03-11 02:00:00"' | awk -f test.awk
2012-03-11 03:00:00
$ echo '"2012-03-12 02:00:00"' | awk -f test.awk
2012-03-12 02:00:00

So it seems to be a bug in this version. I guess I can work around it
like this:

$ echo '"2012-03-11 02:00:00"' | TZ=UTC awk -f test.awk
2012-03-11 02:00:00


--
Seb

Luuk

Mar 21, 2012, 4:28:27 PM

Ed Morton

Mar 21, 2012, 5:26:21 PM
Seb <spl...@gmail.com> wrote:

> On Wed, 21 Mar 2012 15:08:46 -0500,
> Seb <spl...@gmail.com> wrote:
>
> > On Wed, 21 Mar 2012 14:52:57 -0500,
> > Seb <spl...@gmail.com> wrote:
>
> > [...]
>
> >> I'm in CDT (Central Daylight Time). The shift only occurs with the
> >> 2 o'clock hour; every other time is produced properly, so I don't
> >> see how it could have anything to do with DST. I'm puzzled.
>
> > $ echo '"2012-03-11 03:00:00"' | awk -f test.awk
> > 2012-03-11 03:00:00
> > $ echo '"2012-03-11 02:00:00"' | awk -f test.awk
> > 2012-03-11 03:00:00
>
> Hmm... daylight savings for Central time started on 2012-03-11, and the
> problem only occurs for that particular date. Every other date is fine:

It's the date plus the time that's the issue, not the date alone, see below.

>
> $ echo '"2012-03-10 02:00:00"' | awk -f test.awk
> 2012-03-10 02:00:00
> $ echo '"2012-03-11 02:00:00"' | awk -f test.awk
> 2012-03-11 03:00:00
> $ echo '"2012-03-12 02:00:00"' | awk -f test.awk
> 2012-03-12 02:00:00
>
> So it seems to be a bug in this version. I guess I can work around it
> like this:
>
> $ echo '"2012-03-11 02:00:00"' | TZ=UTC awk -f test.awk
> 2012-03-11 02:00:00
>

It's not a bug. The DST time change happens at 2am, so on the day DST starts
there is no "2:00am" (nor any other time up to 2:59:59am): when it's about to
become 2:00am the clocks change to 3:00am. So when you pass 2:00am to mktime()
you're asking it to process what is essentially an invalid date spec. I'm
surprised mktime() didn't just return a "-1" - THAT might be a bug.
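
One way to see the gap described above is to compare the epoch seconds on either side of it. This is only a sketch under some assumptions: `awk` is taken to be GNU awk (mktime() is a gawk extension), `gap_demo` is a made-up name, and the time zone is spelled as the POSIX rule string `CST6CDT,M3.2.0,M11.1.0`, which stands in for US/Central under the 2012 rules so that no tz database is needed.

```shell
# Sketch only: assumes awk is GNU awk; gap_demo is a made-up name.
gap_demo() {
  TZ='CST6CDT,M3.2.0,M11.1.0' awk 'BEGIN {
    before = mktime("2012 03 11 01 59 59")   # last second of CST
    after  = mktime("2012 03 11 03 00 00")   # first second of CDT
    print after - before                     # seconds between the two
  }'
}

gap_demo
```

With gawk the difference comes out as 1 second rather than 3601: the local times from 02:00:00 through 02:59:59 never appeared on a Central-time clock that day.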

Ed.




Posted using www.webuse.net

Anton Treuenfels

Mar 21, 2012, 9:43:22 PM

"Ed Morton" <morto...@gmail.com> wrote in message
news:201203212...@webuse.net...
> It's not a bug. The DST time change happens at 2am so there is no "2:00am"
> on a DST day since when it's about to become 2:00am the clocks change to
> 3:00am. So when you're passing 2:00am to mktime() you're asking mktime()
> to process essentially an invalid date spec. I'm surprised mktime() didn't
> just return a "-1" - THAT might be a bug.

If there's no 2AM on that day, does that mean there's a bug in whatever
program produced the "2012-03-11 02:00:00" timestamp? Or was that just a
random test value?

I'm still rather puzzled about going to all the trouble of extracting the
strings of interest (which already have the values of interest),
re-converting them to a time format and then re-converting them again to
strings. It appears the numeric values always appear as desired, so if the
goal is simply to achieve a uniform output format, why not just create it
directly?

return( sprintf("%s-%s-%s %s:%s:%s", Y, M, D, h, m, s) )
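
Spelled out as a complete program, the direct approach might look like the sketch below. It reuses the substr() offsets from the original test.awk; the wrapper name `fix_time_direct` is made up for illustration. Since it never calls mktime() or strftime(), it should run in any POSIX awk and cannot be affected by the time zone, but it also does no checking of the values at all.

```shell
# Sketch only; fix_time_direct is a made-up wrapper name.
fix_time_direct() {
  awk '
    function fix_time(tstr) {   # tstr = quoted time string field
      # Copy the six numeric fields and supply our own separators.
      return sprintf("%s-%s-%s %s:%s:%s",
                     substr(tstr, 2, 4), substr(tstr, 7, 2), substr(tstr, 10, 2),
                     substr(tstr, 13, 2), substr(tstr, 16, 2), substr(tstr, 19, 2))
    }
    { print fix_time($0) }
  '
}

echo '"2012/03/11 02.48.00"' | fix_time_direct
```

The sample input uses slashes and dots to mimic the messy separators; the output is 2012-03-11 02:48:00.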

Of course if you had done that we'd never have learned so much about
strftime() and Daylight Savings Time :)

- Anton Treuenfels

Seb

Mar 21, 2012, 11:09:20 PM
On Wed, 21 Mar 2012 21:26:21 GMT,
"Ed Morton" <morto...@gmail.com> wrote:

[...]

> It's not a bug. The DST time change happens at 2am so there is no
> "2:00am" on a DST day since when it's about to become 2:00am the
> clocks change to 3:00am. So when you're passing 2:00am to mktime()
> you're asking mktime() to process essentially an invalid date
> spec. I'm surprised mktime() didn't just return a "-1" - THAT might be
> a bug.

I totally agree, if the time stamp doesn't exist (as in this case), then
mktime() should return an error. I just filed a bug report against the
Debian gawk package, and the maintainer suggested this too, consistent
with the behaviour of 'date':

$ TZ=US/Central date -R -d '2012-03-11 02:00:00'
date: invalid date `2012-03-11 02:00:00'


--
Seb

Seb

Mar 22, 2012, 9:44:17 AM
On Wed, 21 Mar 2012 20:43:22 -0500,
"Anton Treuenfels" <teamt...@yahoo.com> wrote:

> "Ed Morton" <morto...@gmail.com> wrote in message
> news:201203212...@webuse.net...
>> It's not a bug. The DST time change happens at 2am so there is no
>> "2:00am" on a DST day since when it's about to become 2:00am the
>> clocks change to 3:00am. So when you're passing 2:00am to mktime()
>> you're asking mktime() to process essentially an invalid date
>> spec. I'm surprised mktime() didn't just return a "-1" - THAT might
>> be a bug.

> If there's no 2AM on that day, does that mean there's a bug in
> whatever program produced the "2012-03-11 02:00:00" timestamp? Or was
> that just a random test value?

That time/date does exist in UTC (no daylight savings), and the program
that produced it supplies its output in UTC, so setting the TZ variable
accordingly is probably a must.


> I'm still rather puzzled about going to all the trouble of extracting
> the strings of interest (which already have the values of interest),
> re-converting them to a time format and then re-converting them again
> to strings. It appears the numeric values always appear as desired, so
> if the goal is simply to achieve a uniform output format, why not just
> create it directly?

> return( sprintf("%s-%s-%s %s:%s:%s", Y, M, D, h, m, s) )

I was very tempted to do that, but I don't fully trust the input, so I
wanted to have some (easy) mechanism to spot any problems.
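
A sketch of how such a check could work, assuming the stamps really are UTC as described above: run the conversion with TZ=UTC (so no local time is ever skipped), then compare the round-tripped string with the one rebuilt directly from the fields. Any difference means the fields did not name a real calendar time. This assumes `awk` is GNU awk, and the wrapper name `fix_times_checked` and the "INVALID" marker are made up for illustration.

```shell
# Sketch only: assumes awk is GNU awk (mktime/strftime are gawk extensions).
# fix_times_checked is a hypothetical wrapper name, not from the thread.
fix_times_checked() {
  TZ=UTC awk '
    function fix_time(tstr,    Y, M, D, h, m, s, want, t) {
      Y = substr(tstr, 2, 4);  M = substr(tstr, 7, 2);  D = substr(tstr, 10, 2)
      h = substr(tstr, 13, 2); m = substr(tstr, 16, 2); s = substr(tstr, 19, 2)
      want = sprintf("%s-%s-%s %s:%s:%s", Y, M, D, h, m, s)
      t = mktime(Y " " M " " D " " h " " m " " s)
      # mktime() normalizes out-of-range fields, so a mismatch after the
      # round trip means the input was not a real calendar time.
      # t < 0 also rejects pre-1970 stamps, a simplification for this sketch.
      if (t < 0 || strftime("%Y-%m-%d %H:%M:%S", t) != want)
        return "INVALID " tstr
      return want
    }
    { print fix_time($0) }
  '
}

echo '"2012-03-11 02:48:00"' | fix_times_checked
echo '"2012/02/30 10:00:00"' | fix_times_checked
```

The first stamp passes through as 2012-03-11 02:48:00. The second names February 30, which mktime() would carry over into March, so it is reported as INVALID instead of being silently shifted.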


> Of course if you had done that we'd never have learned so much about
> strftime() and Daylight Savings Time :)

> - Anton Treuenfels

Thanks,

--
Seb

Ed Morton

Mar 22, 2012, 11:18:00 AM
Seb <spl...@gmail.com> wrote:

> On Wed, 21 Mar 2012 20:43:22 -0500,
> "Anton Treuenfels" <teamt...@yahoo.com> wrote:
>
> > "Ed Morton" <morto...@gmail.com> wrote in message
> > news:201203212...@webuse.net...
> >> It's not a bug. The DST time change happens at 2am so there is no
> >> "2:00am" on a DST day since when it's about to become 2:00am the
> >> clocks change to 3:00am. So when you're passing 2:00am to mktime()
> >> you're asking mktime() to process essentially an invalid date
> >> spec. I'm surprised mktime() didn't just return a "-1" - THAT might
> >> be a bug.
>
> > If there's no 2AM on that day, does that mean there's a bug in
> > whatever program produced the "2012-03-11 02:00:00" timestamp? Or was
> > that just a random test value?
>
> That time/date does exist in UTC (no daylight savings), and the program
> that produced it supplies its output in UTC, so setting the TZ variable
> accordingly is probably a must.

I'd have thought you could just specify the no-DST flag on the mktime format
argument as documented in the manual
(http://www.gnu.org/software/gawk/manual/gawk.html#Time-Functions), but that has
no effect in the version of gawk I'm using (gawk 4.0.0):

$ gawk 'BEGIN{ t=mktime("2012 03 11 02 00 00 0"); print t "=" strftime("%a %b %e %H:%M:%S %Z %Y",t) }'
-1=Wed Dec 31 17:59:59 CST 1969

$ TZ=UTC /opt/exp/bin/gawk 'BEGIN{ t=mktime("2012 03 11 02 00 00"); print t "=" strftime("%a %b %e %H:%M:%S %Z %Y",t) }'
1331431200=Sun Mar 11 02:00:00 UTC 2012

Maybe another bug?

Ed.



Seb

Mar 22, 2012, 12:02:50 PM
On Thu, 22 Mar 2012 15:18:00 GMT,
"Ed Morton" <morto...@gmail.com> wrote:

[...]

> I'd have thought you could just specify the no-DST flag on the mktime
> format argument as documented in the manual
> (http://www.gnu.org/software/gawk/manual/gawk.html#Time-Functions),
> but that has no effect in the version of gawk I'm using (gawk 4.0.0):

> $ gawk 'BEGIN{ t=mktime("2012 03 11 02 00 00 0"); print t "=" strftime("%a %b %e %H:%M:%S %Z %Y",t) }'
> -1=Wed Dec 31 17:59:59 CST 1969

Are you in Central Time as well? If so, it seems that mktime() in your
version does return -1, as the manual says (it doesn't in my gawk
3.1.8), so this might have been fixed already.


> $ TZ=UTC /opt/exp/bin/gawk 'BEGIN{ t=mktime("2012 03 11 02 00 00"); print t "=" strftime("%a %b %e %H:%M:%S %Z %Y",t) }'
> 1331431200=Sun Mar 11 02:00:00 UTC 2012

> Maybe another bug?

I don't think so because the DST flag would have no meaning in UTC (my
gawk: 3.1.8):

$ TZ=UTC gawk 'BEGIN{ t=mktime("2012 03 11 02 00 00 1"); print t "=" strftime("%a %b %e %H:%M:%S %Z %Y",t) }'
1331431200=Sun Mar 11 02:00:00 UTC 2012
$ TZ=UTC gawk 'BEGIN{ t=mktime("2012 03 11 02 00 00 0"); print t "=" strftime("%a %b %e %H:%M:%S %Z %Y",t) }'
1331431200=Sun Mar 11 02:00:00 UTC 2012
$ TZ=UTC gawk 'BEGIN{ t=mktime("2012 03 11 02 00 00 -1"); print t "=" strftime("%a %b %e %H:%M:%S %Z %Y",t) }'
1331431200=Sun Mar 11 02:00:00 UTC 2012


--
Seb

Hermann Peifer

Mar 22, 2012, 12:08:43 PM, to Ed Morton
On 22/03/2012 16:18, Ed Morton wrote:
>
> I'd have thought you could just specify the no-DST flag on the mktime format
> argument as documented in the manual
> (http://www.gnu.org/software/gawk/manual/gawk.html#Time-Functions), but that has
> no effect in the version of gawk I'm using (gawk 4.0.0):
>

I actually had the same thought earlier today and can confirm that
adding a DST flag value of 0 to the mktime format did not have the
effect that you (and I) expected: neither with GNU Awk 4.0.0l, nor with
GNU Awk 3.1.7

Setting the DST flag value to 1 does have an effect, but again it is not
the effect I would have expected.

Hermann

$ TZ=US/Central gawk 'BEGIN{ t=mktime("2012 03 11 02 00 00 -1"); print t "=" strftime("%a %b %e %H:%M:%S %Z %Y",t) }'
1331452800=Sun Mar 11 03:00:00 CDT 2012

$ TZ=US/Central gawk 'BEGIN{ t=mktime("2012 03 11 02 00 00 0"); print t "=" strftime("%a %b %e %H:%M:%S %Z %Y",t) }'
1331452800=Sun Mar 11 03:00:00 CDT 2012

$ TZ=US/Central gawk 'BEGIN{ t=mktime("2012 03 11 02 00 00 1"); print t "=" strftime("%a %b %e %H:%M:%S %Z %Y",t) }'
1331449200=Sun Mar 11 01:00:00 CST 2012

Ed Morton

Mar 22, 2012, 2:11:56 PM
Seb <spl...@gmail.com> wrote:

> On Thu, 22 Mar 2012 15:18:00 GMT,
> "Ed Morton" <morto...@gmail.com> wrote:
>
> [...]
>
> > I'd have thought you could just specify the no-DST flag on the mktime
> > format argument as documented in the manual
> > (http://www.gnu.org/software/gawk/manual/gawk.html#Time-Functions),
> > but that has no effect in the version of gawk I'm using (gawk 4.0.0):
>
> > $ gawk 'BEGIN{ t=mktime("2012 03 11 02 00 00 0"); print t "=" strftime("%a %b %e %H:%M:%S %Z %Y",t) }'
> > -1=Wed Dec 31 17:59:59 CST 1969
>
> Are you in Central Time as well?

Yes

> If so, it seems as if mktime() does
> return -1 as the manual says (it doesn't in my gawk: 3.1.8), in your
> version, so this might have been fixed already.

Yes, I noticed that my result is different from yours so I suspect it has been
fixed.

>
>
> > $ TZ=UTC /opt/exp/bin/gawk 'BEGIN{ t=mktime("2012 03 11 02 00 00"); print t "=" strftime("%a %b %e %H:%M:%S %Z %Y",t) }'
> > 1331431200=Sun Mar 11 02:00:00 UTC 2012
>
> > Maybe another bug?
>
> I don't think so because the DST flag would have no meaning in UTC (my
> gawk: 3.1.8):
>
> $ TZ=UTC gawk 'BEGIN{ t=mktime("2012 03 11 02 00 00 1"); print t "=" strftime("%a %b %e %H:%M:%S %Z %Y",t) }'
> 1331431200=Sun Mar 11 02:00:00 UTC 2012
> $ TZ=UTC gawk 'BEGIN{ t=mktime("2012 03 11 02 00 00 0"); print t "=" strftime("%a %b %e %H:%M:%S %Z %Y",t) }'
> 1331431200=Sun Mar 11 02:00:00 UTC 2012
> $ TZ=UTC gawk 'BEGIN{ t=mktime("2012 03 11 02 00 00 -1"); print t "=" strftime("%a %b %e %H:%M:%S %Z %Y",t) }'
> 1331431200=Sun Mar 11 02:00:00 UTC 2012

but I think it should have meaning in mine, CST. Maybe I'm just misunderstanding
the purpose of that flag.

Ed.


Seb

Mar 22, 2012, 2:47:05 PM
You're setting the time zone to UTC, so you're overriding your local CST
(well, CDT now). IIUC, the daylight savings flag would only be needed
in very rare cases: when the TZ variable is not set on your system (the
daylight savings rules are then implicit), or when one needs to override
the daylight savings rules that come with the time zone.

I'm in CDT:

# mktime() should detect it's DST
$ gawk 'BEGIN{ t=mktime("2012 03 12 02 00 00 -1"); print t "=" strftime("%a %b %e %H:%M:%S %Z %Y",t) }'
1331535600=Mon Mar 12 02:00:00 CDT 2012
# Assume standard time, i.e. ignore the daylight savings
$ gawk 'BEGIN{ t=mktime("2012 03 12 02 00 00 0"); print t "=" strftime("%a %b %e %H:%M:%S %Z %Y",t) }'
1331539200=Mon Mar 12 03:00:00 CDT 2012
# Assume daylight savings, which is correct for this date
$ gawk 'BEGIN{ t=mktime("2012 03 12 02 00 00 1"); print t "=" strftime("%a %b %e %H:%M:%S %Z %Y",t) }'
1331535600=Mon Mar 12 02:00:00 CDT 2012


--
Seb

Geoff Clare

Mar 23, 2012, 9:50:06 AM
Seb wrote:

> On Wed, 21 Mar 2012 21:26:21 GMT,
> "Ed Morton" <morto...@gmail.com> wrote:
>
> [...]
>
>> It's not a bug. The DST time change happens at 2am so there is no
>> "2:00am" on a DST day since when it's about to become 2:00am the
>> clocks change to 3:00am. So when you're passing 2:00am to mktime()
>> you're asking mktime() to process essentially an invalid date
>> spec. I'm surprised mktime() didn't just return a "-1" - THAT might be
>> a bug.
>
> I totally agree, if the time stamp doesn't exist (as in this case), then
> mktime() should return an error.

Presumably gawk just uses the C library mktime() function and doesn't
examine the broken-down time itself. In which case, the observed
behaviour is expected. The C mktime() function is specifically
required to accept out-of-range values in the broken-down time, so
that you can do things like add 1 to tm_mday to find out the time_t
value for "this time tomorrow". It only returns -1 if the value
it would need to return can't be represented in a time_t (and POSIX
requires it to set errno to EOVERFLOW when this happens, although
that's an additional requirement beyond the C standard).
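
The same normalization can be seen from the awk side; the gawk manual notes that the fields passed to mktime() need not be within their usual ranges. A small sketch of "this time tomorrow" computed from March 31, assuming `awk` is GNU awk and run under TZ=UTC to keep DST out of the picture (`next_day_demo` is a made-up name):

```shell
# Sketch only: assumes awk is GNU awk; next_day_demo is a made-up name.
next_day_demo() {
  TZ=UTC awk 'BEGIN {
    # Day 32 of March is out of range; mktime() carries it into April.
    t = mktime("2012 03 32 02 00 00")
    print strftime("%Y-%m-%d %H:%M:%S", t)
  }'
}

next_day_demo
```

With gawk this prints 2012-04-01 02:00:00. A local time that falls in a DST gap is handled in the same spirit: it is adjusted to a time that exists rather than rejected.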

> I just filed a bug report against the
> Debian gawk package, and the maintainer suggested this too, consistent
> with the behaviour of 'date':
>
> $ TZ=US/Central date -R -d '2012-03-11 02:00:00'
> date: invalid date `2012-03-11 02:00:00'

It's interesting that date does that, but I would have thought it's
more desirable for gawk mktime() to match the behaviour of C mktime()
than of date.

--
Geoff Clare <net...@gclare.org.uk>
