On Mon, Jan 30, 2012 at 11:24:33AM -0500, Ricardo Signes wrote:
> * Paul LeoNerd Evans <
leo...@leonerd.org.uk> [2012-01-28T10:41:44]
> I have updated the smoke-me branch on perl5.git
I really don't think that anyone has actually played with this. :-(
I've managed to spot 1 problem from code inspection, and while thinking about
it spotted another. While writing tests to demonstrate it, I've found more.
1) The export code isn't working. It's supposed to be in @EXPORT_OK:
# Symbols that should not be exported by default because they are recently
# added. It would upset too much of CPAN to export these by default
delete $export{$_} and push @EXPORT_OK, $_ for qw(strptime);
yet
$ ./perl -Ilib -e 'use POSIX "strptime"'
"strptime" is not exported by the POSIX module
Can't continue after import errors at lib/POSIX.pm line 30
BEGIN failed--compilation aborted at -e line 1.
2) The defaults are un-Perlish, and leak through. If I parse "1972-09-21"
with "%Y-%m-%d" I expect to get a result on the 21st September 1972.
What actually happens:
$ ./perl -Ilib -MPOSIX -le 'print scalar gmtime POSIX::mktime(POSIX::strptime("1972-09-21", "%Y-%m-%d"))'
Wed Sep 20 21:58:59 1972
I see from the code examples and documentation that I'm expected to supply
default values for seconds, minutes and hours like this:
$ ./perl -Ilib -MPOSIX -le 'print scalar gmtime POSIX::mktime(POSIX::strptime("1972-09-21", "%Y-%m-%d", 0, 0, 0))'
Wed Sep 20 23:00:00 1972
Why? Nearly every other default in Perl is 0. Why is this -1?
And why do these defaults leak out into the return value?
[Yes, I can see that it's an artefact of the implementation. I shouldn't
need to know that. And it's not documented.]
If we ship the current interface, we're going to get bug reports about it.
3) I like the interface of using pos, and I think that passing a reference is
a good way to implement this. However, the code assumes that any reference
is a reference to a scalar:
$ ./perl -Ilib -MPOSIX -wle 'print foreach POSIX::strptime({}, "%Y-%m-%d")'
Use of uninitialized value in subroutine entry at -e line 1.
$ ./perl -Ilib -MPOSIX -wle 'print foreach POSIX::strptime(qr/Pie/, "%Y-%m-%d")'
$
with the side effect that an object with overloaded stringification is
treaded as a reference, not a plain string. A fix for that is probably:
diff --git a/ext/POSIX/POSIX.xs b/ext/POSIX/POSIX.xs
index 65fd931..ed5b041 100644
--- a/ext/POSIX/POSIX.xs
+++ b/ext/POSIX/POSIX.xs
@@ -1873,7 +1873,7 @@ strptime(str, fmt, sec=-1, min=-1, hour=-1, mday=-1, mon=-1, year=-1, wday=-1, y
tm.tm_yday = yday;
tm.tm_isdst = isdst;
- if(SvROK(str)) {
+ if(SvROK(str) && !SvOBJECT(SvRV(str))) {
strref = SvRV(str);
str_base = str_c = SvPV_nolen(strref);
but that doesn't catch the cases of hash references or regex reference.
I suspect that a check of SvTYPE() is also needed. (Hashes are not SvPOK()
but regex objects *are*, so SvPOK(SvRV(str)) isn't going to be good enough
to weed out regexs)
4) The code is not UTF-8 safe. Totally unsafe:
else {
str_c = SvPV_nolen(str);
}
remains = strptime(str_c, SvPV_nolen(fmt), &tm);
a) If the pattern or the input string have Latin 1 characters, but one is
encoded as UTF-8 and one is not, they don't match
b) If one is Unicode, and the other is the corresponding octet sequence,
they *do* match, when they should not.
The usual approach in the core in these cases is *not* to upgrade the
input scalar. Instead, to take a copy, upgrade that, work using it,
and free it at the end.
5) I suspect that the code to write back to pos() is not UTF-8 safe either.
If strptime() is returning a new position in bytes, then in writing the
pos() magic one needs not be sure that it is written back correctly.
But I can't see how to test that until the previous problem is resolved.
6) Please could you spell check your documentation.
I spotted 1 typo that a spellchecker would have caught.
Which immediately makes me wonder what else lurks:
=item strptime
Parse date and time information from a string. Returns a 9-element list of
time and date information.
Synopsis:
(sec, min, hour, mday, mon, year, wday, yday, isdst) =
strptime(str, fmt, [@init])
Optionally, an existing 9-element list of time and date informaiton may be
passed to initialise the structure before parsing. Any fields not parsed by
the format will be left as initialised.
The month (C<mon>), weekday (C<wday>), and yearday (C<yday>) begin at zero.
I.e. January is 0, not 1; Sunday is 0, not 1; January 1st is 0, not 1. The
year (C<year>) is given in years since 1900. I.e., the year 1995 is 95; the
year 2001 is 101. Consult your system's C<strftime()> manpage for details
about these and the other arguments.
If you want your code to be portable, your format (C<fmt>) argument
should use only the conversion specifiers defined by the ANSI C
standard (C89, to play safe). These are C<aAbBcdHIjmMpSUwWxXyYZ%>.
But even then, the results of some of the conversion specifiers are
non-portable. For example, the specifiers C<aAbBcpZ> change according
to the locale settings of the user, and both how to set locales (the
locale names) and what output to expect are non-standard.
The specifier C<c> changes according to the timezone settings of the
user and the timezone computation rules of the operating system.
The C<Z> specifier is notoriously unportable since the names of
timezones are non-standard. Sticking to the numeric specifiers is the
safest route.
The return values are made consistent as though by calling C<mktime()>
before they are returned, if all of the C<mday>, C<mon> and C<year> fields
are valid.
The string for Tuesday, December 12, 1995.
@time = POSIX::strptime( "Tuesday, December 12, 1995",
"%A, %B %d, %Y", 0, 0, 0 );
local $, = ", ";
print @time, "\n";
If the input string is not valid, or not consumed completely by the format,
then an error occurs; indicated by C<strptime()> returning an empty list.
By passing a reference to a string as the value to parse, C<strptime()> will
use the C<pos()> position to start the parse, and to return the position where
it finished. In this situation, it is not an error if the entire input is not
consumed by the format.
$str = "18:05:29 is the time";
@time = POSIX::strptime( \$str, "%H:%M:%S" );
local $, = ", ";
print @time[0..2], "\n";
print pos($str) . "\n";
=cut
Tests appended for most of the above. All should pass, 13 currently fail.
Nicholas Clark
#!./perl -w
BEGIN {
unshift @INC, 'lib';
}
use strict;
require POSIX;
use Test::More;
sub date {
my $sep = shift // '-';
return join $sep, '1972', '09', '21';
}
sub fmt {
my $sep = shift // '-';
return join $sep, '%Y', '%m', '%d';
}
my @want = (0, 0, 0, 21, 9-1, 1972-1900, 4, 264, 1);
is_deeply([POSIX::strptime(date(), fmt(), 0, 0, 0)], \@want, 'sanity check');
is_deeply([POSIX::strptime(date(), fmt())], \@want, 'defaults expected');
{
package BEEDOOP;
use overload '""' => sub { $_[0][0] };
sub new {
my ($class, $val) = @_;
bless [$val], $class;
}
}
{
my $string = BEEDOOP->new(date());
is_deeply([POSIX::strptime("$string", fmt(), 0, 0, 0)], \@want,
'stringified overloaded object');
is_deeply([POSIX::strptime($string, fmt(), 0, 0, 0)], \@want,
'overloaded object');
}
foreach my $ord (121, 255, 375) {
my $chr = chr $ord;
my $chr_U = $chr;
utf8::upgrade($chr_U);
my $octets = $chr;
utf8::encode($octets);
is_deeply([POSIX::strptime(date($chr), fmt($chr), 0, 0, 0)], \@want,
"chr $ord");
is_deeply([POSIX::strptime(date($chr), fmt($chr_U), 0, 0, 0)], \@want,
"chr $ord, format upgraded");
is_deeply([POSIX::strptime(date($chr_U), fmt($chr), 0, 0, 0)], \@want,
"chr $ord, input upgraded");
is_deeply([POSIX::strptime(date($chr_U), fmt($chr_U), 0, 0, 0)], \@want,
"chr $ord, both upgraded");
is_deeply([POSIX::strptime(date($chr), fmt($octets), 0, 0, 0)], [],
"chr $ord, format as octets");
is_deeply([POSIX::strptime(date($octets), fmt($chr), 0, 0, 0)], [],
"chr $ord, input as octets");
is_deeply([POSIX::strptime(date($chr_U), fmt($octets), 0, 0, 0)], [],
"chr $ord, string upgraded, format as octets");
is_deeply([POSIX::strptime(date($octets), fmt($chr_U), 0, 0, 0)], [],
"chr $ord, input as octets, format upgraded");
}
done_testing();