Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

POSIX-strptime latest code

51 views
Skip to first unread message

Paul LeoNerd Evans

unread,
Jan 20, 2012, 9:06:36 AM1/20/12
to Perl 5 Porters
So as to be less dependent on a chain of other people get these smoked,
this work is now available at

https://github.com/leonerd/perl/tree/smoke-me/paul-evans/POSIX-strptime

Feel free to send me FAIL reports from it

--
Paul "LeoNerd" Evans

leo...@leonerd.org.uk
ICQ# 4135350 | Registered Linux# 179460
http://www.leonerd.org.uk/
signature.asc

Craig A. Berry

unread,
Jan 26, 2012, 7:29:12 PM1/26/12
to Paul LeoNerd Evans, Perl 5 Porters
On Fri, Jan 20, 2012 at 8:06 AM, Paul LeoNerd Evans
<leo...@leonerd.org.uk> wrote:
> So as to be less dependent on a chain of other people get these smoked,
> this work is now available at
>
>  https://github.com/leonerd/perl/tree/smoke-me/paul-evans/POSIX-strptime

Is there any difference between that and this:

http://perl5.git.perl.org/perl.git/tree/smoke-me/paul-evans/POSIX-strptime

?

The latter I have built on VMS and the only failure I see looks like:

not ok 17 - strptime() all date fields with passed time
# Failed test 'strptime() all date fields with passed time'
# at [.t]time.t line 78.
# Structures begin differing at:
# $got->[6] = '-1'
# $expected->[6] = '1'

If I'm counting right, this is tm_wday, and after a *very* quick
glance at the XS code, specifically this:

void
strptime(str, fmt, sec=-1, min=-1, hour=-1, mday=-1, mon=-1, year=-1,
wday=-1, yday=-1, isdst=-1)

it seems to me you're getting what you ask for. The test doesn't
specify wday on input, so it gets the default value on output, which
is -1. Why would it expect 1?

Let me know where I'm wrong or what other testing I can do.

Paul LeoNerd Evans

unread,
Jan 27, 2012, 10:11:20 AM1/27/12
to Craig A. Berry, Perl 5 Porters
On Thu, Jan 26, 2012 at 06:29:12PM -0600, Craig A. Berry wrote:
> >  https://github.com/leonerd/perl/tree/smoke-me/paul-evans/POSIX-strptime
>
> Is there any difference between that and this:
>
> http://perl5.git.perl.org/perl.git/tree/smoke-me/paul-evans/POSIX-strptime
>
> ?

It isn't intended so, no. I push to the former, and it gets copied to
the latter by someone else 'cause I don't have core commit access.

> The latter I have built on VMS and the only failure I see looks like:
>
> not ok 17 - strptime() all date fields with passed time
> # Failed test 'strptime() all date fields with passed time'
> # at [.t]time.t line 78.
> # Structures begin differing at:
> # $got->[6] = '-1'
> # $expected->[6] = '1'
>
> If I'm counting right, this is tm_wday, and after a *very* quick
> glance at the XS code, specifically this:

[6] is tm_wday yes.

> void
> strptime(str, fmt, sec=-1, min=-1, hour=-1, mday=-1, mon=-1, year=-1,
> wday=-1, yday=-1, isdst=-1)
>
> it seems to me you're getting what you ask for. The test doesn't
> specify wday on input, so it gets the default value on output, which
> is -1. Why would it expect 1?

The POSIX::strptime() function calls mktime() on the result before it
returns the broken-down fields back to the user. The intended effect
here is that because all of mday/mon/year are defined, mktime can use
that to calculate the wday and yday fields. It does that for me on
Linux, for example. The point of my smoke-me branch was to compare
behaviour on other OSes to see what differences there are. It may be
that VMS's mktime behaviour differs here.

Is it possible you could litter the code with a few fprintfs to stderr,
and we'll see what the fields look like in various places, and also the
result of the mktime() call? Perhaps there's some differences in
behaviour here.
signature.asc

Craig A. Berry

unread,
Jan 27, 2012, 11:33:00 AM1/27/12
to Paul LeoNerd Evans, Perl 5 Porters
On Fri, Jan 27, 2012 at 9:11 AM, Paul LeoNerd Evans
<leo...@leonerd.org.uk> wrote:


> The POSIX::strptime() function calls mktime() on the result before it
> returns the broken-down fields back to the user. The intended effect
> here is that because all of mday/mon/year are defined, mktime can use
> that to calculate the wday and yday fields. It does that for me on
> Linux, for example. The point of my smoke-me branch was to compare
> behaviour on other OSes to see what differences there are. It may be
> that VMS's mktime behaviour differs here.
>
> Is it possible you could litter the code with a few fprintfs to stderr,
> and we'll see what the fields look like in various places, and also the
> result of the mktime() call? Perhaps there's some differences in
> behaviour here.

It was easier to create a reproducer in pure C:

$ type strptime_test.c
#include <stdio.h>
#include <time.h>

int main(void)
{
struct tm *timeptr,result;


result.tm_sec = 1;
result.tm_min = 2;
result.tm_hour = 3;
result.tm_mday = 4;
result.tm_mon = 5;
result.tm_year = 6;
result.tm_wday = -1;
result.tm_yday = -1;
result.tm_isdst = -1;

if(strptime("12:34:56", "%H:%M:%S", &result) == NULL)
perror("strptime failed");
else
{
if(mktime(&result) == (time_t)-1)
perror("mktime failed");

printf(" tm_sec: %d\n",result.tm_sec);
printf(" tm_min: %d\n",result.tm_min);
printf(" tm_hour: %d\n",result.tm_hour);
printf(" tm_mday: %d\n",result.tm_mday);
printf(" tm_mon: %d\n",result.tm_mon);
printf(" tm_year: %d\n",result.tm_year);
printf(" tm_wday: %d\n",result.tm_wday);
printf(" tm_yday: %d\n",result.tm_yday);
printf("tm_isdst: %d\n",result.tm_isdst);
}

return 0;
}
$ cc strptime_test
$ link strptime_test
$ run strptime_test
mktime failed: error 0
tm_sec: 56
tm_min: 34
tm_hour: 12
tm_mday: 4
tm_mon: 5
tm_year: 6
tm_wday: -1
tm_yday: -1
tm_isdst: -1

So mktime fails but does not set errno. Experimentation indicates
that it doesn't like 1906 as a year. If I change the year from 6 to
106 (i.e., 2006) it succeeds. Since mktime returns a time_t value,
and since by default (when __SIGNED_INT_TIME_T has not been defined)
time_t is an unsigned long int, it seems to me mktime() is correctly
telling us that it cannot represent the time given to it in
(positive) seconds since the epoch (i.e., 1906 is before 1970).

I do note you're not checking the return value of mktime(), which
might be a good idea. Other than that, changing the test to use a
valid year may be the only thing needed.

Paul LeoNerd Evans

unread,
Jan 28, 2012, 9:45:06 AM1/28/12
to Craig A. Berry, Perl 5 Porters
On Fri, Jan 27, 2012 at 10:33:00AM -0600, Craig A. Berry wrote:
> It was easier to create a reproducer in pure C:

Ah yes, often easier. Cuts out the middle-man :)

> So mktime fails but does not set errno. Experimentation indicates
> that it doesn't like 1906 as a year. If I change the year from 6 to
> 106 (i.e., 2006) it succeeds. Since mktime returns a time_t value,
> and since by default (when __SIGNED_INT_TIME_T has not been defined)
> time_t is an unsigned long int, it seems to me mktime() is correctly
> telling us that it cannot represent the time given to it in
> (positive) seconds since the epoch (i.e., 1906 is before 1970).

Ahhh.. That does make a lot of sense now, yes. Oops. I should have
thought of pre-1970 dates. Though, slightly annoying that mktime()
doesn't set errno; I'd have coped for EINVAL or maybe ERANGE.

> I do note you're not checking the return value of mktime(), which
> might be a good idea. Other than that, changing the test to use a
> valid year may be the only thing needed.

Ah yes. I should do that indeed. That said, not sure how to handle this
cornercase of mktime() not setting errno, but I guess there's not a lot
I can do about that.

I'll have a hack of the unit tests and try again.

Thanks for the help there, and I'll let you when there's a new version
ready.
signature.asc

Paul LeoNerd Evans

unread,
Jan 28, 2012, 10:41:44 AM1/28/12
to Craig A. Berry, Perl 5 Porters
On Sat, Jan 28, 2012 at 02:45:06PM +0000, Paul LeoNerd Evans wrote:
> Thanks for the help there, and I'll let you when there's a new version
> ready.

Should now be ready on my newly-recreated branch

https://github.com/leonerd/perl/tree/smoke-me/paul-evans/POSIX-strptime
signature.asc

Eric Brine

unread,
Jan 28, 2012, 6:43:41 PM1/28/12
to Paul LeoNerd Evans, Craig A. Berry, Perl 5 Porters
On Sat, Jan 28, 2012 at 9:45 AM, Paul LeoNerd Evans <leo...@leonerd.org.uk> wrote:
> it seems to me mktime() is correctly
> telling us that it cannot represent  the time given to it in
> (positive) seconds since the epoch (i.e., 1906 is before 1970).

Ahhh.. That does make a lot of sense now, yes. Oops. I should have
thought of pre-1970 dates. Though, slightly annoying that mktime()
doesn't set errno; I'd have coped for EINVAL or maybe ERANGE.

On the linux box whose man page I checked, there's no mention of mktime ever setting errno on error. (If it did set errno, a list of possible error numbers would be listed.) Presumably, invalid arguments is the only possible error.

- Eric

Craig A. Berry

unread,
Jan 28, 2012, 6:51:12 PM1/28/12
to Eric Brine, Paul LeoNerd Evans, Perl 5 Porters
POSIX says that it *may* set errno on error, but the only error it
mentions explicitly is EOVERFLOW:

http://pubs.opengroup.org/onlinepubs/007904875/functions/mktime.html

Eric Brine

unread,
Jan 28, 2012, 7:48:53 PM1/28/12
to Craig A. Berry, Paul LeoNerd Evans, Perl 5 Porters
On Sat, Jan 28, 2012 at 6:51 PM, Craig A. Berry <craig....@gmail.com> wrote:
POSIX says that it *may* set errno on error,

If that means "POSIX systems may have a mktime that sets errno.", we can only count on errno after mktime on some systems.

If that means "POSIX systems have a mktime that may set errno." we can never count on errno after mktime.

but the only error it mentions explicitly is EOVERFLOW:

So there's only one error, and that's invalid arguments, and the arguments are invalid because they would cause an overflow.

http://pubs.opengroup.org/onlinepubs/007904875/functions/mktime.html 

The way I read that, mktime returning -1 indicates an EOVERFLOW error occurred, although errno may not actually be set to that error.

If you wanted to use perror, you could do

         if(mktime(&result) == (time_t)-1) {
             errno = EOVERFLOW;
             perror("mktime failed");
         }

- Eric

Nicholas Clark

unread,
Feb 1, 2012, 9:50:55 AM2/1/12
to Ricardo Signes, perl5-...@perl.org
On Mon, Jan 30, 2012 at 11:24:33AM -0500, Ricardo Signes wrote:
> * Paul LeoNerd Evans <leo...@leonerd.org.uk> [2012-01-28T10:41:44]
> > On Sat, Jan 28, 2012 at 02:45:06PM +0000, Paul LeoNerd Evans wrote:
> > > Thanks for the help there, and I'll let you when there's a new version
> > > ready.
> >
> > Should now be ready on my newly-recreated branch
> >
> > https://github.com/leonerd/perl/tree/smoke-me/paul-evans/POSIX-strptime
>
> I have updated the smoke-me branch on perl5.git

I really don't think that anyone has actually played with this. :-(

I've managed to spot 1 problem from code inspection, and while thinking about
it spotted another. While writing tests to demonstrate it, I've found more.

1) The export code isn't working. It's supposed to be in @EXPORT_OK:

# Symbols that should not be exported by default because they are recently
# added. It would upset too much of CPAN to export these by default
delete $export{$_} and push @EXPORT_OK, $_ for qw(strptime);

yet

$ ./perl -Ilib -e 'use POSIX "strptime"'
"strptime" is not exported by the POSIX module
Can't continue after import errors at lib/POSIX.pm line 30
BEGIN failed--compilation aborted at -e line 1.



2) The defaults are un-Perlish, and leak through. If I parse "1972-09-21"
with "%Y-%m-%d" I expect to get a result on the 21st September 1972.
What actually happens:

$ ./perl -Ilib -MPOSIX -le 'print scalar gmtime POSIX::mktime(POSIX::strptime("1972-09-21", "%Y-%m-%d"))'
Wed Sep 20 21:58:59 1972

I see from the code examples and documentation that I'm expected to supply
default values for seconds, minutes and hours like this:

$ ./perl -Ilib -MPOSIX -le 'print scalar gmtime POSIX::mktime(POSIX::strptime("1972-09-21", "%Y-%m-%d", 0, 0, 0))'
Wed Sep 20 23:00:00 1972

Why? Nearly every other default in Perl is 0. Why is this -1?
And why do these defaults leak out into the return value?
[Yes, I can see that it's an artefact of the implementation. I shouldn't
need to know that. And it's not documented.]

If we ship the current interface, we're going to get bug reports about it.



3) I like the interface of using pos, and I think that passing a reference is
a good way to implement this. However, the code assumes that any reference
is a reference to a scalar:

$ ./perl -Ilib -MPOSIX -wle 'print foreach POSIX::strptime({}, "%Y-%m-%d")'
Use of uninitialized value in subroutine entry at -e line 1.
$ ./perl -Ilib -MPOSIX -wle 'print foreach POSIX::strptime(qr/Pie/, "%Y-%m-%d")'
$

with the side effect that an object with overloaded stringification is
treaded as a reference, not a plain string. A fix for that is probably:

diff --git a/ext/POSIX/POSIX.xs b/ext/POSIX/POSIX.xs
index 65fd931..ed5b041 100644
--- a/ext/POSIX/POSIX.xs
+++ b/ext/POSIX/POSIX.xs
@@ -1873,7 +1873,7 @@ strptime(str, fmt, sec=-1, min=-1, hour=-1, mday=-1, mon=-1, year=-1, wday=-1, y
tm.tm_yday = yday;
tm.tm_isdst = isdst;

- if(SvROK(str)) {
+ if(SvROK(str) && !SvOBJECT(SvRV(str))) {
strref = SvRV(str);

str_base = str_c = SvPV_nolen(strref);


but that doesn't catch the cases of hash references or regex reference.
I suspect that a check of SvTYPE() is also needed. (Hashes are not SvPOK()
but regex objects *are*, so SvPOK(SvRV(str)) isn't going to be good enough
to weed out regexs)



4) The code is not UTF-8 safe. Totally unsafe:


else {
str_c = SvPV_nolen(str);
}

remains = strptime(str_c, SvPV_nolen(fmt), &tm);

a) If the pattern or the input string have Latin 1 characters, but one is
encoded as UTF-8 and one is not, they don't match
b) If one is Unicode, and the other is the corresponding octet sequence,
they *do* match, when they should not.

The usual approach in the core in these cases is *not* to upgrade the
input scalar. Instead, to take a copy, upgrade that, work using it,
and free it at the end.

5) I suspect that the code to write back to pos() is not UTF-8 safe either.
If strptime() is returning a new position in bytes, then in writing the
pos() magic one needs not be sure that it is written back correctly.
But I can't see how to test that until the previous problem is resolved.



6) Please could you spell check your documentation.
I spotted 1 typo that a spellchecker would have caught.
Which immediately makes me wonder what else lurks:

=item strptime

Parse date and time information from a string. Returns a 9-element list of
time and date information.

Synopsis:

(sec, min, hour, mday, mon, year, wday, yday, isdst) =
strptime(str, fmt, [@init])

Optionally, an existing 9-element list of time and date informaiton may be
passed to initialise the structure before parsing. Any fields not parsed by
the format will be left as initialised.

The month (C<mon>), weekday (C<wday>), and yearday (C<yday>) begin at zero.
I.e. January is 0, not 1; Sunday is 0, not 1; January 1st is 0, not 1. The
year (C<year>) is given in years since 1900. I.e., the year 1995 is 95; the
year 2001 is 101. Consult your system's C<strftime()> manpage for details
about these and the other arguments.

If you want your code to be portable, your format (C<fmt>) argument
should use only the conversion specifiers defined by the ANSI C
standard (C89, to play safe). These are C<aAbBcdHIjmMpSUwWxXyYZ%>.
But even then, the results of some of the conversion specifiers are
non-portable. For example, the specifiers C<aAbBcpZ> change according
to the locale settings of the user, and both how to set locales (the
locale names) and what output to expect are non-standard.
The specifier C<c> changes according to the timezone settings of the
user and the timezone computation rules of the operating system.
The C<Z> specifier is notoriously unportable since the names of
timezones are non-standard. Sticking to the numeric specifiers is the
safest route.

The return values are made consistent as though by calling C<mktime()>
before they are returned, if all of the C<mday>, C<mon> and C<year> fields
are valid.

The string for Tuesday, December 12, 1995.

@time = POSIX::strptime( "Tuesday, December 12, 1995",
"%A, %B %d, %Y", 0, 0, 0 );

local $, = ", ";
print @time, "\n";

If the input string is not valid, or not consumed completely by the format,
then an error occurs; indicated by C<strptime()> returning an empty list.

By passing a reference to a string as the value to parse, C<strptime()> will
use the C<pos()> position to start the parse, and to return the position where
it finished. In this situation, it is not an error if the entire input is not
consumed by the format.

$str = "18:05:29 is the time";
@time = POSIX::strptime( \$str, "%H:%M:%S" );
local $, = ", ";
print @time[0..2], "\n";
print pos($str) . "\n";

=cut


Tests appended for most of the above. All should pass, 13 currently fail.

Nicholas Clark

#!./perl -w
BEGIN {
unshift @INC, 'lib';
}

use strict;
require POSIX;
use Test::More;

sub date {
my $sep = shift // '-';
return join $sep, '1972', '09', '21';
}

sub fmt {
my $sep = shift // '-';
return join $sep, '%Y', '%m', '%d';
}

my @want = (0, 0, 0, 21, 9-1, 1972-1900, 4, 264, 1);

is_deeply([POSIX::strptime(date(), fmt(), 0, 0, 0)], \@want, 'sanity check');
is_deeply([POSIX::strptime(date(), fmt())], \@want, 'defaults expected');

{
package BEEDOOP;
use overload '""' => sub { $_[0][0] };
sub new {
my ($class, $val) = @_;
bless [$val], $class;
}
}

{
my $string = BEEDOOP->new(date());
is_deeply([POSIX::strptime("$string", fmt(), 0, 0, 0)], \@want,
'stringified overloaded object');
is_deeply([POSIX::strptime($string, fmt(), 0, 0, 0)], \@want,
'overloaded object');
}

foreach my $ord (121, 255, 375) {
my $chr = chr $ord;
my $chr_U = $chr;
utf8::upgrade($chr_U);
my $octets = $chr;
utf8::encode($octets);

is_deeply([POSIX::strptime(date($chr), fmt($chr), 0, 0, 0)], \@want,
"chr $ord");
is_deeply([POSIX::strptime(date($chr), fmt($chr_U), 0, 0, 0)], \@want,
"chr $ord, format upgraded");
is_deeply([POSIX::strptime(date($chr_U), fmt($chr), 0, 0, 0)], \@want,
"chr $ord, input upgraded");
is_deeply([POSIX::strptime(date($chr_U), fmt($chr_U), 0, 0, 0)], \@want,
"chr $ord, both upgraded");

is_deeply([POSIX::strptime(date($chr), fmt($octets), 0, 0, 0)], [],
"chr $ord, format as octets");
is_deeply([POSIX::strptime(date($octets), fmt($chr), 0, 0, 0)], [],
"chr $ord, input as octets");
is_deeply([POSIX::strptime(date($chr_U), fmt($octets), 0, 0, 0)], [],
"chr $ord, string upgraded, format as octets");
is_deeply([POSIX::strptime(date($octets), fmt($chr_U), 0, 0, 0)], [],
"chr $ord, input as octets, format upgraded");
}

done_testing();

Paul LeoNerd Evans

unread,
Feb 1, 2012, 3:37:17 PM2/1/12
to Nicholas Clark, Perl 5 Porters, Ricardo Signes
Thanks for these comments. I'm taking a look at most of them now, but
while I'm working on those I do have one thought:

On Wed, Feb 01, 2012 at 02:50:55PM +0000, Nicholas Clark wrote:
> a) If the pattern or the input string have Latin 1 characters, but one is
> encoded as UTF-8 and one is not, they don't match
> b) If one is Unicode, and the other is the corresponding octet sequence,
> they *do* match, when they should not.
>
> The usual approach in the core in these cases is *not* to upgrade the
> input scalar. Instead, to take a copy, upgrade that, work using it,
> and free it at the end.

If the string is in UTF-8 and the format is not, then it's simple enough
to upgrade (a local mortal copy of) the string.

If the string is not in UTF-8 but the format is, then would it suffice
simply to attempt to downgrade the format back down again? If this
downgrade fails, then surely the format must have contained some
character that the string cannot possibly have, so it will never match?

This would mean that, in all cases, the UTF-8ness of string and format
now match, and thus we can call strptime() on the result.

> 5) I suspect that the code to write back to pos() is not UTF-8 safe either.
> If strptime() is returning a new position in bytes, then in writing the
> pos() magic one needs not be sure that it is written back correctly.
> But I can't see how to test that until the previous problem is resolved.

... having the further upshot that because we never regraded 'str' up or
down, its pos() magic continues to correctly refer to the byte offsets
we're working with on the C layer, either in (raw) chars or UTF-8 bytes,
as appropriate.


This is all based on the assumption that downgrading the format is the
correct thing to do. Is this a valid assumption, or do I have to go the
long way around of upgrading either to UTF-8 if only one is, and having
to mangle pos() offsets between char and byte counts and back again?
signature.asc

Paul LeoNerd Evans

unread,
Feb 1, 2012, 3:38:41 PM2/1/12
to Nicholas Clark, Perl 5 Porters, Ricardo Signes
On Wed, Feb 01, 2012 at 08:37:17PM +0000, Paul LeoNerd Evans wrote:
> If the string is in UTF-8 and the format is not, then it's simple enough
> to upgrade (a local mortal copy of) the string.
GAH: I meant "upgrade a copy of the format" here. Sorry :)
signature.asc

Nicholas Clark

unread,
Feb 1, 2012, 4:27:26 PM2/1/12
to Paul LeoNerd Evans, Ricardo Signes, Perl 5 Porters
On Wed, Feb 01, 2012 at 08:37:17PM +0000, Paul LeoNerd Evans wrote:
> Thanks for these comments. I'm taking a look at most of them now, but
> while I'm working on those I do have one thought:
>
> On Wed, Feb 01, 2012 at 02:50:55PM +0000, Nicholas Clark wrote:
> > a) If the pattern or the input string have Latin 1 characters, but one is
> > encoded as UTF-8 and one is not, they don't match
> > b) If one is Unicode, and the other is the corresponding octet sequence,
> > they *do* match, when they should not.
> >
> > The usual approach in the core in these cases is *not* to upgrade the
> > input scalar. Instead, to take a copy, upgrade that, work using it,
> > and free it at the end.

> If the string is not in UTF-8 but the format is, then would it suffice
> simply to attempt to downgrade the format back down again? If this
> downgrade fails, then surely the format must have contained some
> character that the string cannot possibly have, so it will never match?

Well, I think so. But, I'm not confident there isn't a hole in here, if
one of the format specifiers (which are in ASCII) can match against
characters encoded as UTF-8.

So I read the man page. On more than one system. I'm assuming that this
is reasonably accurate: http://www.unix.com/man-page/POSIX/3/strptime/

and the bad news starts with the first format specifier:

%a The day of the week, using the locale's weekday names; either
the abbreviated or full name may be specified.

although at least FreeBSD has a reassuring bugs section:

BUGS
There is no conversion specification for the phase of the moon.

The strftime() function does not correctly handle multibyte characters in
the format argument.

http://www.unix.com/man-page/FreeBSD/3/strftime/


which suggest that doing this correctly with locales is a can of worms.

Not totally sure what to do. I can't find the original citation, but the
Internet is awash with a quote from Jarkko:

I can let you in to the ultimate locale secret: Avoid.

yet we seem to be trying to navigate a path of "is your locale UTF-8?
OK, we'll try hard to feed the C library functions UTF-8 sequences"

At which point it seems that we know what to do for "C" or "POSIX", and
"we think it's UTF-8", but it's then not clear what to do for "we don't
know about your locale, and what character set it thinks it's in"

This stuff is a portability mess.



But, in the process of reading the POSIX man page, I spotted something else

On Wed, Feb 01, 2012 at 02:50:55PM +0000, Nicholas Clark wrote:

> 2) The defaults are un-Perlish, and leak through. If I parse "1972-09-21"
> with "%Y-%m-%d" I expect to get a result on the 21st September 1972.
> What actually happens:
>
> $ ./perl -Ilib -MPOSIX -le 'print scalar gmtime POSIX::mktime(POSIX::strptime("1972-09-21", "%Y-%m-%d"))'
> Wed Sep 20 21:58:59 1972
>
> I see from the code examples and documentation that I'm expected to supply
> default values for seconds, minutes and hours like this:
>
> $ ./perl -Ilib -MPOSIX -le 'print scalar gmtime POSIX::mktime(POSIX::strptime("1972-09-21", "%Y-%m-%d", 0, 0, 0))'
> Wed Sep 20 23:00:00 1972
>
> Why? Nearly every other default in Perl is 0. Why is this -1?
> And why do these defaults leak out into the return value?
> [Yes, I can see that it's an artefact of the implementation. I shouldn't
> need to know that. And it's not documented.]
>
> If we ship the current interface, we're going to get bug reports about it.


If I read it correctly, it's not actually possible to guarantee that the
arguments passed in to initialise the struct will be honoured:

It is unspecified whether multiple calls to strptime() using the same
tm structure will update the current contents of the structure or over-
write all contents of the structure. Conforming applications should
make a single call to strptime() with a format and all data needed to
completely specify the date and time being converted.

( from http://www.unix.com/man-page/POSIX/3/strptime/ )

Meh. :-(

Although it seems that Linux and FreeBSD (and therefore OS X) do leave
arguments untouched. (Online versions of NetBSD and OpenBSD man pages
make no mention of this)

I guess that the least worst approach is to document that the code sets them
up, but it's OS dependent on what happens next.

Nicholas Clark

Paul LeoNerd Evans

unread,
Feb 1, 2012, 5:21:06 PM2/1/12
to Nicholas Clark, Perl 5 Porters, Ricardo Signes
I think it's just got worse.

I was wondering how the existing POSIX::strftime() function gets around
this mess. Turns out it doesn't:

char *buf = my_strftime(SvPV_nolen(fmt), sec, min, hour, mday, mon, year, wday, yday, isdst);
if (buf) {
SV *const sv = sv_newmortal();
sv_usepvn_flags(sv, buf, strlen(buf), SV_HAS_TRAILING_NUL);
if (SvUTF8(fmt)) {
SvUTF8_on(sv);
}
ST(0) = sv;
}

So, the UTF-8 flag is set on the result dependent purely on whether it
was set in the $fmt. There are four possibilities here:

$fmt locale

legacy legacy => all fine

legacy UTF-8 => resulting string contains "characters" representing
individual UTF-8 bytes when not marked UTF-8

UTF-8 legacy => resulting string is marked as containing UTF-8 data
despite the fact it may not contain valid encoding
bytes

UTF-8 UTF-8 => all fine

Of these four, I'm most worried about the third. Does SvUTF8_on()
actually sanity-check the bytes, or is it presumed the caller knows what
they are doing?

In any case, it occurs to me that maybe we have to somehow trust the
user here, to supply a $fmt string whose UTF-8-ness is compatible with
the locale. In the case of strptime(), we'll up- or down-grade the $str
to match the $fmt, and hope that all will go well.

> If I read it correctly, it's not actually possible to guarantee that the
> arguments passed in to initialise the struct will be honoured:
>
> It is unspecified whether multiple calls to strptime() using the same
> tm structure will update the current contents of the structure or over-
> write all contents of the structure. Conforming applications should
> make a single call to strptime() with a format and all data needed to
> completely specify the date and time being converted.
>
> ( from http://www.unix.com/man-page/POSIX/3/strptime/ )
>
> Meh. :-(

POSIX indeed says that. It is most annoying.

> Although it seems that Linux and FreeBSD (and therefore OS X) do leave
> arguments untouched. (Online versions of NetBSD and OpenBSD man pages
> make no mention of this)

Furthermore, Solaris can be asked not to zero out the fields provided
you

#define _STRPTIME_DONTZERO
#include <time.h>

> I guess that the least worst approach is to document that the code sets them
> up, but it's OS dependent on what happens next.

This feels most pragmatic.

I am however, beginning to wonder whether it's a mistake to have
POSIX::strptime() call mktime(3) at the end if all the mday/mon/year
fields are valid. I did this because I thought POSIX::strftime() does it
initially, but actually it doesn't. So maybe we'd avoid problems down
the line by not doing this here, at least?
signature.asc

Leon Timmermans

unread,
Feb 1, 2012, 5:29:11 PM2/1/12
to Paul LeoNerd Evans, Nicholas Clark, Perl 5 Porters, Ricardo Signes
On Wed, Feb 1, 2012 at 11:21 PM, Paul LeoNerd Evans
<leo...@leonerd.org.uk> wrote:
> Of these four, I'm most worried about the third. Does SvUTF8_on()
> actually sanity-check the bytes, or is it presumed the caller knows what
> they are doing?

The latter :-(

There's sv_utf8_decode and sv_utf8_upgrade if you want sanity-checking.

Leon

Paul LeoNerd Evans

unread,
Feb 2, 2012, 6:47:22 AM2/2/12
to Nicholas Clark, Perl 5 Porters, Ricardo Signes
On Wed, Feb 01, 2012 at 02:50:55PM +0000, Nicholas Clark wrote:
> 1) The export code isn't working. It's supposed to be in @EXPORT_OK:
>
> # Symbols that should not be exported by default because they are recently
> # added. It would upset too much of CPAN to export these by default
> delete $export{$_} and push @EXPORT_OK, $_ for qw(strptime);
>
> yet
>
> $ ./perl -Ilib -e 'use POSIX "strptime"'
> "strptime" is not exported by the POSIX module
> Can't continue after import errors at lib/POSIX.pm line 30
> BEGIN failed--compilation aborted at -e line 1.

Should now be fixed:

https://github.com/leonerd/perl/commit/bbd97525073436f4c38070045f7b7c01d1cdc0ab

> 2) The defaults are un-Perlish, and leak through. If I parse "1972-09-21"
> with "%Y-%m-%d" I expect to get a result on the 21st September 1972.
> What actually happens:
>
> $ ./perl -Ilib -MPOSIX -le 'print scalar gmtime POSIX::mktime(POSIX::strptime("1972-09-21", "%Y-%m-%d"))'
> Wed Sep 20 21:58:59 1972
>
> I see from the code examples and documentation that I'm expected to supply
> default values for seconds, minutes and hours like this:
>
> $ ./perl -Ilib -MPOSIX -le 'print scalar gmtime POSIX::mktime(POSIX::strptime("1972-09-21", "%Y-%m-%d", 0, 0, 0))'
> Wed Sep 20 23:00:00 1972
>
> Why? Nearly every other default in Perl is 0. Why is this -1?
> And why do these defaults leak out into the return value?
> [Yes, I can see that it's an artefact of the implementation. I shouldn't
> need to know that. And it's not documented.]
>
> If we ship the current interface, we're going to get bug reports about it.

Fixed now to yield undef specifically for unparsed fields, not -1. Also
fixed to ensure mktime() isn't upset by the -1 values:

https://github.com/leonerd/perl/commit/841a8f606dc56dbeb5b0b7a9e93fd23073d80d99

> 3) I like the interface of using pos, and I think that passing a reference is
> a good way to implement this. However, the code assumes that any reference
> is a reference to a scalar:
>
> $ ./perl -Ilib -MPOSIX -wle 'print foreach POSIX::strptime({}, "%Y-%m-%d")'
> Use of uninitialized value in subroutine entry at -e line 1.
> $ ./perl -Ilib -MPOSIX -wle 'print foreach POSIX::strptime(qr/Pie/, "%Y-%m-%d")'
> $
>
> with the side effect that an object with overloaded stringification is
> treaded as a reference, not a plain string. A fix for that is probably:
...
> but that doesn't catch the cases of hash references or regex reference.
> I suspect that a check of SvTYPE() is also needed. (Hashes are not SvPOK()
> but regex objects *are*, so SvPOK(SvRV(str)) isn't going to be good enough
> to weed out regexs)

Now handled:

https://github.com/leonerd/perl/commit/35f0c2c8fa21219f915baa4a36be811a4d6b128e

> 4) The code is not UTF-8 safe. Totally unsafe:
>
>
> else {
> str_c = SvPV_nolen(str);
> }
>
> remains = strptime(str_c, SvPV_nolen(fmt), &tm);
>
> a) If the pattern or the input string have Latin 1 characters, but one is
> encoded as UTF-8 and one is not, they don't match
> b) If one is Unicode, and the other is the corresponding octet sequence,
> they *do* match, when they should not.
>
> The usual approach in the core in these cases is *not* to upgrade the
> input scalar. Instead, to take a copy, upgrade that, work using it,
> and free it at the end.
>
> 5) I suspect that the code to write back to pos() is not UTF-8 safe either.
> If strptime() is returning a new position in bytes, then in writing the
> pos() magic one needs not be sure that it is written back correctly.
> But I can't see how to test that until the previous problem is resolved.

I have a partial fix for these, which I'm still working on. Currently it
will upgrade either fmt or str to UTF-8 if the other one is, with logic
to cope with the pos() magic using utf8_hop() and utf8_distance() to
convert between byte and character counts.

Still unsure of the overall aim though because of the additional problem
of handling the UTF-8ness of the locale. See other fork of this thread.


> 6) Please could you spell check your documentation.
> I spotted 1 typo that a spellchecker would have caught.
> Which immediately makes me wonder what else lurks:

Yes, I'll do that at the end. The point of my smoke-me branch was to
check OS compatibility across a variety of operating systems, not to
suggest that the code was ready to merge into blead.


> Tests appended for most of the above. All should pass, 13 currently fail.

I've managed to make all but 4 of them pass. The ones that still don't
are the ones involving 121 as an octet. Your test tries to assert the
failure of strptime() to parse it, whereas my implementation does parse
correctly. Is this not correct? 121 < 128, after all..

I've also attempted to add unit tests for the other bits above, back
into the main t/time.t unit test.
signature.asc

Paul LeoNerd Evans

unread,
Feb 2, 2012, 5:58:48 PM2/2/12
to Nicholas Clark, Perl 5 Porters, Ricardo Signes
On Wed, Feb 01, 2012 at 02:50:55PM +0000, Nicholas Clark wrote:
> 4) The code is not UTF-8 safe. Totally unsafe:
>
>
> else {
> str_c = SvPV_nolen(str);
> }
>
> remains = strptime(str_c, SvPV_nolen(fmt), &tm);
>
> a) If the pattern or the input string have Latin 1 characters, but one is
> encoded as UTF-8 and one is not, they don't match
> b) If one is Unicode, and the other is the corresponding octet sequence,
> they *do* match, when they should not.
>
> The usual approach in the core in these cases is *not* to upgrade the
> input scalar. Instead, to take a copy, upgrade that, work using it,
> and free it at the end.

Taking the idea from POSIX::strftime() that we just hope that the
UTF-8ness of the fmt string is compatible with the locale, I have now
made it regrade $str up or down to match.

See

https://github.com/leonerd/perl/commit/579e314590c9ceda464b516e4f26086f600785e3
https://github.com/leonerd/perl/commit/58d6481ddd067a51821da2acd2186161a1f759d5

> 5) I suspect that the code to write back to pos() is not UTF-8 safe either.
> If strptime() is returning a new position in bytes, then in writing the
> pos() magic one needs not be sure that it is written back correctly.
> But I can't see how to test that until the previous problem is resolved.

That should be fixed now with careful application of
utf8_hop/utf8_distance. Some unit tests also added to confirm they work
correctly.

> 6) Please could you spell check your documentation.
> I spotted 1 typo that a spellchecker would have caught.
> Which immediately makes me wonder what else lurks:

I'll get on to that next; though most of what I have there is just
copy-pasted from the strftime() docs, so it may be I'll have to fix
spelling there too.
signature.asc

Nicholas Clark

unread,
Feb 6, 2012, 11:49:22 AM2/6/12
to Paul LeoNerd Evans, Perl 5 Porters, Ricardo Signes
On Thu, Feb 02, 2012 at 11:47:22AM +0000, Paul LeoNerd Evans wrote:
> On Wed, Feb 01, 2012 at 02:50:55PM +0000, Nicholas Clark wrote:
> > 1) The export code isn't working. It's supposed to be in @EXPORT_OK:
> >
> > # Symbols that should not be exported by default because they are recently
> > # added. It would upset too much of CPAN to export these by default
> > delete $export{$_} and push @EXPORT_OK, $_ for qw(strptime);
> >
> > yet
> >
> > $ ./perl -Ilib -e 'use POSIX "strptime"'
> > "strptime" is not exported by the POSIX module
> > Can't continue after import errors at lib/POSIX.pm line 30
> > BEGIN failed--compilation aborted at -e line 1.
>
> Should now be fixed:

Rah!

> https://github.com/leonerd/perl/commit/bbd97525073436f4c38070045f7b7c01d1cdc0ab
>
> > 2) The defaults are un-Perlish, and leak through. If I parse "1972-09-21"

> > with "%Y-%m-%d" I expect to get a result on the 21st September 1972.

> > Why? Nearly every other default in Perl is 0. Why is this -1?
> > And why do these defaults leak out into the return value?
> > [Yes, I can see that it's an artefact of the implementation. I shouldn't
> > need to know that. And it's not documented.]
> >
> > If we ship the current interface, we're going to get bug reports about it.
>
> Fixed now to yield undef specifically for unparsed fields, not -1. Also
> fixed to ensure mktime() isn't upset by the -1 values:
>
> https://github.com/leonerd/perl/commit/841a8f606dc56dbeb5b0b7a9e93fd23073d80d99

Cool.

It's still bugging me that the implementation is using -1 as a proxy for
undef, and that

a) someone might decide that that's a documented part of the interface
(should the interface instead abort on negative values? Or is that going
to hinder using someone's OS strptime which gives them meaning?
Or just annoy people?)
b) If an integer has to be used, -1 is very close to 0, which is valid

But I don't have an answer to that. I don't even know if it needs an
answer.


> Now handled:
>
> https://github.com/leonerd/perl/commit/35f0c2c8fa21219f915baa4a36be811a4d6b128e

Cool.

In the code, I see

PPCODE:
{
const char *str_c;
const U8 *orig_bytes;
SV *strref = NULL;
MAGIC *posmg = NULL;
int str_offset = 0;
struct tm tm;
char *remains;


str_offset should not be int. I'm aware that nearly every core use of I32 is
likely a latent bug, such as:

struct magic {
MAGIC* mg_moremagic;
MGVTBL* mg_virtual; /* pointer to magic functions */
U16 mg_private;
char mg_type;
U8 mg_flags;
I32 mg_len;
SV* mg_obj;
char* mg_ptr;
};

but I don't think that int is the right bugfix. I suspect that it should be
STRLEN, but I'm not sure of that. On the other hand, if it's I32, at least
it would stand out as consistent, needs fixing.

I found the use of strref here to be confusing:

else if(SvUTF8(str) && !SvUTF8(fmt)) {
str = sv_mortalcopy(str);
/* If downgrade fails then str must have contained characters
* that could not possibly be matched by fmt */
if(!sv_utf8_downgrade(str, 1))
XSRETURN(0);

str_c = SvPV_nolen(str);

if(str_offset) {
orig_bytes = SvPV_nolen(strref);
str_offset = utf8_distance(orig_bytes + str_offset, orig_bytes);
}
}


I had to dig twice to work out

a) firstly, that it's not a NULL pointer dereference.
b) secondly, that it's not a bug

If I follow the code on the screen above (as in, it's not visible), if
str_offset is non-zero, then

a) strref is a pointer to an SV
b) is actually a copy of the value str had before it was passed to
sv_mortalcopy(str)

I think another local variable would make that a lot clearer.


More generally, looking at the code, this is bugging me:

/* Solaris needs this in order not to zero out all the untouched fields in strptime() */
#define _STRPTIME_DONTZERO

Given:

It is unspecified whether multiple calls to strptime() using the same
tm structure will update the current contents of the structure or over-
write all contents of the structure. Conforming applications should
make a single call to strptime() with a format and all data needed to
completely specify the date and time being converted.

(I assume that http://www.unix.com/man-page/POSIX/3/strptime/ is correct)


So, Solaris defaults to *one* POSIX conformant behaviour, but can support
a *second* POSIX conformant behaviour which "we" would prefer. So we enable
the flags to force that.

I don't think that this in any way affects the rest of the code.


But, extrapolate. We can't be promising the "update means we allow defaults"
interface. What happens on some other *nix OS where it turns out that it's
like Solaris, defaulting to one, but having a way to enable the second.

But we *don't* find out about this until after 5.16.0 ships.

What do we do? Do we change behaviour in 5.18.0?
Or do we say that we keep the current behaviour on all platforms.
Which means that "Solaris is special".


Neither makes me happy. It sort of comes down to the entire dichotomy of
(attempting to) provide POSIX interfaces in the core. Are we shooting for

1: A cross-platform consistent implementation of the interface, that one
can rely on

or

2: A minimal wrapper to the OS implementation, which exposes both the OS
specific extensions and the OS specific bugs to the end user code.


Different people have legitimate reasons for wanting either. And the core
*can't* provide both. Whereas CPAN can.

In this case, Win32 doesn't provide strptime. So the question is not
hypothetical.


> > Tests appended for most of the above. All should pass, 13 currently fail.
>
> I've managed to make all but 4 of them pass. The ones that still don't
> are the ones involving 121 as an octet. Your test tries to assert the
> failure of strptime() to parse it, whereas my implementation does parse
> correctly. Is this not correct? 121 < 128, after all..

Yes, my mistake, sorry. Fixed version attached, everything happy.

> I've also attempted to add unit tests for the other bits above, back
> into the main t/time.t unit test.


To be blunt, your tests aren't good enough. If I make this change:

diff --git a/ext/POSIX/POSIX.xs b/ext/POSIX/POSIX.xs
index b01c1bd..cf03738 100644
--- a/ext/POSIX/POSIX.xs
+++ b/ext/POSIX/POSIX.xs
@@ -1906,7 +1906,7 @@ strptime(str, fmt, sec=-1, min=-1, hour=-1, mday=-1, mon=-1, year=-1, wday=-1, y
str_c = SvPV_nolen(str);

if(str_offset) {
- str_offset = utf8_hop(str_c, str_offset) - (U8*)str_c;
+ // str_offset = utf8_hop(str_c, str_offset) - (U8*)str_c;
}
}
else if(SvUTF8(str) && !SvUTF8(fmt)) {

or this change

diff --git a/ext/POSIX/POSIX.xs b/ext/POSIX/POSIX.xs
index b01c1bd..d3dffe2 100644
--- a/ext/POSIX/POSIX.xs
+++ b/ext/POSIX/POSIX.xs
@@ -1914,7 +1914,7 @@ strptime(str, fmt, sec=-1, min=-1, hour=-1, mday=-1, mon=-1, year=-1, wday=-1, y
/* If downgrade fails then str must have contained characters
* that could not possibly be matched by fmt */
if(!sv_utf8_downgrade(str, 1))
- XSRETURN(0);
+ /* XSRETURN(0); */ abort();

str_c = SvPV_nolen(str);


all the tests pass. When clearly something *should* fail.

Given that strptime is fast, I really feel that a looping data driven "test
all permutations" approach, such as my attached sample set, is the way to
go. In a new test file. The code is complex, and "coverage" alone is not
going to find all the problems, or even make me comfortable that there
likely aren't any.

*My* tests fail:

$ ./perl -Ilib strptime.pl
ok 1 - sanity check
ok 2 - defaults supplied
ok 3 - stringified overloaded object
ok 4 - overloaded object
ok 5 - chr 121
ok 6 - chr 121, format upgraded
ok 7 - chr 121, input upgraded
ok 8 - chr 121, both upgraded
ok 9 - chr 121, format as octets
ok 10 - chr 121, input as octets
ok 11 - chr 121, string upgraded, format as octets
ok 12 - chr 121, input as octets, format upgraded
ok 13 - chr 255
ok 14 - chr 255, format upgraded
ok 15 - chr 255, input upgraded
ok 16 - chr 255, both upgraded
ok 17 - chr 255, format as octets
ok 18 - chr 255, input as octets
ok 19 - chr 255, string upgraded, format as octets
ok 20 - chr 255, input as octets, format upgraded
ok 21 - chr 375
ok 22 - chr 375, format upgraded
ok 23 - chr 375, input upgraded
ok 24 - chr 375, both upgraded
Aborted

I couldn't write tests for the pos interaction with UTF-8 because I knew
that pos was bust.

I'd really like it if someone *else*, who is keen on this feature going in
(I'm at best neutral, so this is not me), voted with their "feet" and
*also* wrote tests for it, with the intent of torturing it. Right now the
interest level on this seems to be Paul "LeoNerd" Evans wants it in, and
everyone else thinks that someone else will do "it". Whatever "it" is.

Nicholas Clark

Paul LeoNerd Evans

unread,
Feb 8, 2012, 8:32:22 AM2/8/12
to Nicholas Clark, Perl 5 Porters, Ricardo Signes
On Mon, Feb 06, 2012 at 04:49:22PM +0000, Nicholas Clark wrote:
> > Fixed now to yield undef specifically for unparsed fields, not -1. Also
> > fixed to ensure mktime() isn't upset by the -1 values:
> >
> > https://github.com/leonerd/perl/commit/841a8f606dc56dbeb5b0b7a9e93fd23073d80d99
>
> Cool.
>
> It's still bugging me that the implementation is using -1 as a proxy for
> undef, and that
>
> a) someone might decide that that's a documented part of the interface
> (should the interface instead abort on negative values? Or is that going
> to hinder using someone's OS strptime which gives them meaning?
> Or just annoy people?)
> b) If an integer has to be used, -1 is very close to 0, which is valid
>
> But I don't have an answer to that. I don't even know if it needs an
> answer.

Well, -1 is specifically not a valid value for any of the 6 primary
fields (sec to year), so it feels OK to set that. Nobody ought to be
specifically passing in -1 on these values.

Maybe the code could be updated to take SVs instead of ints and complain
about negatives, so that the code is free to put in -1 on a real undef
which can't get in there "by accident".

(Annoyingly the typemap doesn't easily help us there)
Ah OK, I'll update to use STRLEN.

> I found the use of strref here to be confusing:
>
> else if(SvUTF8(str) && !SvUTF8(fmt)) {
> str = sv_mortalcopy(str);
> /* If downgrade fails then str must have contained characters
> * that could not possibly be matched by fmt */
> if(!sv_utf8_downgrade(str, 1))
> XSRETURN(0);
>
> str_c = SvPV_nolen(str);
>
> if(str_offset) {
> orig_bytes = SvPV_nolen(strref);
> str_offset = utf8_distance(orig_bytes + str_offset, orig_bytes);
> }
> }
>
>
> I had to dig twice to work out
>
> a) firstly, that it's not a NULL pointer dereference.
> b) secondly, that it's not a bug
>
> If I follow the code on the screen above (as in, it's not visible), if
> str_offset is non-zero, then
>
> a) strref is a pointer to an SV
> b) is actually a copy of the value str had before it was passed to
> sv_mortalcopy(str)
>
> I think another local variable would make that a lot clearer.

You're right; I'll call it perhaps orig_str or something equally clear.

> More generally, looking at the code, this is bugging me:
>
> /* Solaris needs this in order not to zero out all the untouched fields in strptime() */
> #define _STRPTIME_DONTZERO
>
> Given:
>
> It is unspecified whether multiple calls to strptime() using the same
> tm structure will update the current contents of the structure or over-
> write all contents of the structure. Conforming applications should
> make a single call to strptime() with a format and all data needed to
> completely specify the date and time being converted.
>
> (I assume that http://www.unix.com/man-page/POSIX/3/strptime/ is correct)

It is indeed unspecified by POSIX, much to my annoyance.

In practice across any OS I've tried with the exception of Solaris, they
all leave other fields untouched. Solaris also can be asked, by the
above, to leave them untouched.

> So, Solaris defaults to *one* POSIX conformant behaviour, but can support
> a *second* POSIX conformant behaviour which "we" would prefer. So we enable
> the flags to force that.
>
> I don't think that this in any way affects the rest of the code.
>
>
> But, extrapolate. We can't be promising the "update means we allow defaults"
> interface. What happens on some other *nix OS where it turns out that it's
> like Solaris, defaulting to one, but having a way to enable the second.
>
> But we *don't* find out about this until after 5.16.0 ships.
>
> What do we do? Do we change behaviour in 5.18.0?
> Or do we say that we keep the current behaviour on all platforms.
> Which means that "Solaris is special".
>
>
> Neither makes me happy. It sort of comes down to the entire dichotomy of
> (attempting to) provide POSIX interfaces in the core. Are we shooting for
>
> 1: A cross-platform consistent implementation of the interface, that one
> can rely on
>
> or
>
> 2: A minimal wrapper to the OS implementation, which exposes both the OS
> specific extensions and the OS specific bugs to the end user code.

I think we should take the general design approach that POSIX.{xs,pm}
are simply applying a small Perl<->C wrapping around whatever POSIX
functions libc provides, and that ultimately the behaviour does really
come from your OS, and not Perl. We can document that we just pass
things through to libc and back, and what values it gives are up to it.
This then lets people decide to make Linux- or BSD- or Solaris-specific
code, and that's their decision. Specifically here we can just copy
POSIX's wording on the subject of untouched fields, and observe that
some OSes may leave these fields untouched (and therefore return undef
in the perl wrapping), but this is not guaranteed by POSIX, so users
should check their platform's documentation. I feel this is reasonable
for a "just wrapping your OS" module like POSIX.

((
As a brief aside, I at least am already quitefamilar and happy with the
idea that POSIX.{xs,pm} does this in other areas; functions that may or
may not be implemented, errno values, etc...
))

> Different people have legitimate reasons for wanting either. And the core
> *can't* provide both. Whereas CPAN can.
>
> In this case, Win32 doesn't provide strptime. So the question is not
> hypothetical.

Yes; and in Win32's case CPAN already provides several emulations of
strptime(). It's indeed quite easy to write a pureperl strptime() based
only on an enumeration of day and month names as obtained from
strftime().

Ifwhen we get this strptime() function into core's POSIX, I may go and
chat with the author of the POSIX::strptime module on CPAN, and see if
we can arrange that his module just wrap core's if it can, or provides a
PP emulation on strftime if not. This feels almost analogous to my
Socket::GetAddrInfo::Emul, in fact.

> To be blunt, your tests aren't good enough. If I make this change:
<snip>
> all the tests pass. When clearly something *should* fail.

Ah oops, I thought I had covered that.
Ah OK, would it be best if I included your strptime.t file directly in
the distribution then? It would save some more round-tripping for more
test coverage. :)

> I couldn't write tests for the pos interaction with UTF-8 because I knew
> that pos was bust.
>
> I'd really like it if someone *else*, who is keen on this feature going in
> (I'm at best neutral, so this is not me), voted with their "feet" and
> *also* wrote tests for it, with the intent of torturing it. Right now the
> interest level on this seems to be Paul "LeoNerd" Evans wants it in, and
> everyone else thinks that someone else will do "it". Whatever "it" is.

Me too, to be honest. I'm a little upset there hasn't been more peer
review of this. That said I am most grateful of your interest, and
providing a test script. (In fact I plan to write a blog post about that
very subject :) ).

But yes, if anyone else has anything to say I would love to hear it ...
signature.asc

Paul LeoNerd Evans

unread,
Feb 8, 2012, 11:40:48 AM2/8/12
to Nicholas Clark, Perl 5 Porters, Ricardo Signes
On Mon, Feb 06, 2012 at 04:49:22PM +0000, Nicholas Clark wrote:
> str_offset should not be int. I'm aware that nearly every core use of I32 is
...
> but I don't think that int is the right bugfix. I suspect that it should be
> STRLEN, but I'm not sure of that. On the other hand, if it's I32, at least
> it would stand out as consistent, needs fixing.

Done

https://github.com/leonerd/perl/commit/17928b7f4b4fa0a53d05d9cfc0c2ec0ecb2bc8d2

> I found the use of strref here to be confusing:
...
> I think another local variable would make that a lot clearer.

Logic adjusted and variables renamed, now also with comments:

https://github.com/leonerd/perl/commit/fc8c28b32548a9cab8f8bf13df86ca35b1fc8f04

int returning_pos = 0; /* true if caller wants us to set pos() marker on str */
SV *orig_str = NULL; /* caller's original SV* if we have had to regrade it */
const U8 *orig_bytes; /* SvPV of orig_str */
signature.asc

Ævar Arnfjörð Bjarmason

unread,
Feb 11, 2012, 5:56:16 PM2/11/12
to Nicholas Clark, Paul LeoNerd Evans, Perl 5 Porters, Ricardo Signes
On Mon, Feb 6, 2012 at 17:49, Nicholas Clark <ni...@ccl4.org> wrote:

> I'd really like it if someone *else*, who is keen on this feature going in
> (I'm at best neutral, so this is not me), voted with their "feet" and
> *also* wrote tests for it, with the intent of torturing it. Right now the
> interest level on this seems to be Paul "LeoNerd" Evans wants it in, and
> everyone else thinks that someone else will do "it". Whatever "it" is.

I looked over it, thought it was sane and pushed it to blead.

We can push additional tests for it later or fix up any bugs with it
(or revert the push if needed), but since it's a new feature that'll
only burn people who explicitly try to use it I thought it was fine in
its current state.

Paul LeoNerd Evans

unread,
Feb 11, 2012, 7:06:45 PM2/11/12
to Ævar Arnfjörð, Perl 5 Porters, Nicholas Clark, Ricardo Signes
On Sat, Feb 11, 2012 at 11:56:16PM +0100, Ævar Arnfjörð Bjarmason wrote:
> I looked over it, thought it was sane and pushed it to blead.
>
> We can push additional tests for it later or fix up any bugs with it
> (or revert the push if needed), but since it's a new feature that'll
> only burn people who explicitly try to use it I thought it was fine in
> its current state.

One thing it does still need is a Configure test to detect its absence
on MSWin32. Should be quite simple, I can likely hack something up
tomorrow.
signature.asc

Nicholas Clark

unread,
Feb 12, 2012, 5:08:32 PM2/12/12
to Ævar Arnfjörð, Paul LeoNerd Evans, Perl 5 Porters, Ricardo Signes
On Sat, Feb 11, 2012 at 11:56:16PM +0100, Ęvar Arnfjörš Bjarmason wrote:
> On Mon, Feb 6, 2012 at 17:49, Nicholas Clark <ni...@ccl4.org> wrote:
>
> > I'd really like it if someone *else*, who is keen on this feature going in
> > (I'm at best neutral, so this is not me), voted with their "feet" and
> > *also* wrote tests for it, with the intent of torturing it. Right now the
> > interest level on this seems to be Paul "LeoNerd" Evans wants it in, and
> > everyone else thinks that someone else will do "it". Whatever "it" is.

Right, and in my opinion "one person wants it in" and everyone else is
apathetic is *not* a good enough reason to put something in. Because it
means that no-one else has really given its interface design a torture yet.

The "it" *I* wanted was for someone else to write tests for it.
And the stick was "it shouldn't go in *until* someone else has done this.
And the key part being that if the advocate of a feature can't find anyone
else to write tests for it, then it's looking evident that this isn't a
widely desired feature.

> I looked over it, thought it was sane and pushed it to blead.
>
> We can push additional tests for it later or fix up any bugs with it
> (or revert the push if needed), but since it's a new feature that'll
> only burn people who explicitly try to use it I thought it was fine in
> its current state.

Sure, but then we burn them worse if we decide that the API isn't right,
or we want to do some incompatible change. This isn't a bit of code
that has stewed on CPAN for a while to see what people make of it.

In addition to the above, I feel that this was rather hasty there were
unaddressed questions that I had asked.

For starters, do we keep the special casing for Solaris, or should we stick
to a policy of exactly exposing the default implementation of each platform?

That's potentially in incompatible change for Solaris users.


Finally, I think that the *pumpking* should be the one making the call on
new features. Not any other committer. And this a new feature. And it's
now in 8 days before the "user visible change freeze". Which isn't very
long to make user visible changes to it.

Nicholas Clark

Ævar Arnfjörð Bjarmason

unread,
Feb 13, 2012, 6:51:59 AM2/13/12
to Nicholas Clark, Paul LeoNerd Evans, Perl 5 Porters, Ricardo Signes
I have my own reservations about it, but I thought at this point it
would be much better off in blead since we'll get more testing and
interest in it (e.g. the strptime probe).

There's no reason we can't eject it back out again within 8 days
though, I'll do that unless there's agreement from the pumpking about
keeping it.

Anyway, the issues with it, I think are:

* The general issue of having an OS-dependent date parsing function
in the core. This is an area thoroughly solved by
DateTime::Format::* and just by having it we'll lead a lot of
people be exposed by OS-specific bugs instead of using a stable
CPAN module.

* Those who want it now can just install POSIX::strptime.

* Unlike strptime() itself you can't get a char* to the position in
the string where we stopped being able to parse it, arguably the
interface should be:

my $pos = strptime($str, $fmt, \my @tm); # or \my %tm

This would be really tricky though since $str may be a UTF8 string
and the C library isn't going to handle that, what do we do if it
returns a position in the middle of a UTF8 character?

* That Solaris issue, and in general not being able to rely on it
working in the same manner across OS's, that's inherent in the
entire POSIX module though.

* It could use more tests, although we have more tests for it already
than e.g. glibc has for its strptime function.

More generally though I worry that we're being too conservative about
including something like this.

If it's this hard to contribute a new function to POSIX compared to
the effort it took to get the functions that are already in there into
the core (we've had strftime since 5.0, and probably and probably
didn't review it this much) POSIX.pm is likely to stagnate and we'll
get a bunch of CPAN modules implementing POSIX functions instead.

Maybe that's fine, but I think we should think about what we want to
do with the POSIX module, do we want all of POSIX in there? Is a
function not being there a bug? In that case this patch is arguably a
bugfix.

Do we only want the functions we have already? In that case maybe we
should try to get some CPAN distribution going that has complimentary
functions.

Mark Overmeer

unread,
Feb 13, 2012, 7:24:29 AM2/13/12
to Ævar Arnfjörð, Perl5 Porters
* Ęvar Arnfjörš Bjarmason (ava...@gmail.com) [120213 11:52]:
> More generally though I worry that we're being too conservative about
> including something like this.
>
> If it's this hard to contribute a new function to POSIX compared to
> the effort it took to get the functions that are already in there into
> the core (we've had strftime since 5.0, and probably and probably
> didn't review it this much) POSIX.pm is likely to stagnate and we'll
> get a bunch of CPAN modules implementing POSIX functions instead.

That's the purpose of the new POSIX::1003 module, which offers access
to the same POSIX.xs but reorganized and on CPAN. This will also
provide the added functionality to older versions of Perl.

> Do we only want the functions we have already? In that case maybe we
> should try to get some CPAN distribution going that has complimentary
> functions.

One of the main issues with having many small extensions on CPAN, is
that each of them need a XS component. In many commercial environments
that means that someone first has to produce an rpm for that package on
a system which has a compiler by hand: where most distributions do not
contain the whole of CPAN (yet). Larger modules usually are available.

I would like to grow POSIX::1003. On the moment, smaller functions
got absorbed, like mknod. I offered Paul to absorb his strptime as
well.
--
Regards,

MarkOv

------------------------------------------------------------------------
Mark Overmeer MSc MARKOV Solutions
Ma...@Overmeer.net solu...@overmeer.net
http://Mark.Overmeer.net http://solutions.overmeer.net

Leon Timmermans

unread,
Feb 13, 2012, 8:38:30 AM2/13/12
to Ævar Arnfjörð Bjarmason, Perl 5 Porters, Ricardo Signes
On Mon, Feb 13, 2012 at 12:51 PM, Ævar Arnfjörð Bjarmason
<ava...@gmail.com> wrote:
> More generally though I worry that we're being too conservative about
> including something like this.
>
> If it's this hard to contribute a new function to POSIX compared to
> the effort it took to get the functions that are already in there into
> the core (we've had strftime since 5.0, and probably and probably
> didn't review it this much) POSIX.pm is likely to stagnate and we'll
> get a bunch of CPAN modules implementing POSIX functions instead.
>
> Maybe that's fine, but I think we should think about what we want to
> do with the POSIX module, do we want all of POSIX in there? Is a
> function not being there a bug? In that case this patch is arguably a
> bugfix.
>
> Do we only want the functions we have already? In that case maybe we
> should try to get some CPAN distribution going that has complimentary
> functions.

I think the only useful function (not counting constants and stubs)
added to POSIX since 1996 is lchown; I think calling our policy
conservative would be an understatement. I think that was mostly
because it was always considered a big module, even though most of it
was useless stubs.

While I don't think it's a good idea to add all of POSIX in one
module, but some useful functionality would be nice (my personal pet
pieve would be the lack of an mmap).

Leon

Paul LeoNerd Evans

unread,
Feb 13, 2012, 1:49:47 PM2/13/12
to Ævar Arnfjörð, Perl 5 Porters, Nicholas Clark, Ricardo Signes
On Mon, Feb 13, 2012 at 12:51:59PM +0100, Ævar Arnfjörð Bjarmason wrote:
> * The general issue of having an OS-dependent date parsing function
> in the core. This is an area thoroughly solved by
> DateTime::Format::* and just by having it we'll lead a lot of
> people be exposed by OS-specific bugs instead of using a stable
> CPAN module.

It's no more OS-specific than the rest of the OS-specific stuff in the
obviously OS-specific "POSIX" module. Most notably strftime, that
everyone seems quite happy to remain there.

> * Those who want it now can just install POSIX::strptime.

Whose tests break on non-Linux.

> * Unlike strptime() itself you can't get a char* to the position in
> the string where we stopped being able to parse it, arguably the
> interface should be:
>
> my $pos = strptime($str, $fmt, \my @tm); # or \my %tm
>
> This would be really tricky though since $str may be a UTF8 string
> and the C library isn't going to handle that, what do we do if it
> returns a position in the middle of a UTF8 character?

The perl API for a POSIX::strptime() function was discussed at length on
p5p@ primarily /because/ I wanted to find a way to have it adjust the
position for partial matches, and also to allow pre-filled struct tm
buffer inbound. I did propose a function of exactly that form but it was
decided as not "perly enough"; that it was better to have it take a
default list and return the parsed list.

> * That Solaris issue, and in general not being able to rely on it
> working in the same manner across OS's, that's inherent in the
> entire POSIX module though.
>
> * It could use more tests, although we have more tests for it already
> than e.g. glibc has for its strptime function.

More tests especially of the UTF-8 upgrade handling logic would be most
welcome. I admit I'm simply not inventive enough in the cornercases of
Perl's Unicode handling to be able to think up some torturous examples,
but I'd happily accept additions anyone else could make.
signature.asc

Paul LeoNerd Evans

unread,
Feb 13, 2012, 1:52:05 PM2/13/12
to Ævar Arnfjörð, Perl5 Porters
On Mon, Feb 13, 2012 at 01:24:29PM +0100, Mark Overmeer wrote:
> One of the main issues with having many small extensions on CPAN, is
> that each of them need a XS component. In many commercial environments
> that means that someone first has to produce an rpm for that package on
> a system which has a compiler by hand: where most distributions do not
> contain the whole of CPAN (yet). Larger modules usually are available.

Plus I simply feel a little nervous about mmap()ing lots of tiny .so
files into a single Perl process, as compared pulling in the single
POSIX.so and grabbing functions where required.

> I would like to grow POSIX::1003. On the moment, smaller functions
> got absorbed, like mknod. I offered Paul to absorb his strptime as
> well.

Feel free to steal tests/docs/code/etc... from my core strptime work
then.
signature.asc

Paul LeoNerd Evans

unread,
Feb 13, 2012, 1:56:52 PM2/13/12
to Leon Timmermans, Perl 5 Porters, Ævar Arnfjörð, Ricardo Signes
On Mon, Feb 13, 2012 at 02:38:30PM +0100, Leon Timmermans wrote:
> I think the only useful function (not counting constants and stubs)
> added to POSIX since 1996 is lchown;

Any possibility to get lutimes() in there as well? :)

http://search.cpan.org/~pevans/File-lchown-0.02/lib/File/lchown.pm

Then I can deprecate this CPAN module since core would have it.

((Also I don't /think/ core had POSIX::lchown at the time I wrote that,
I may have to update some docs now...))

> I think calling our policy
> conservative would be an understatement. I think that was mostly
> because it was always considered a big module, even though most of it
> was useless stubs.

Is it policy, or just nobody bothered to step up? E.g. I found little
resistance to, and general acceptance of my adding IPV6-related stuff,
getaddrinfo, etc.. into Socket. Previously, nobody had really said
"don't do this", it's just nobody with the right intersection of domains
of knowledge and opportunity, had before tried to do it.

> While I don't think it's a good idea to add all of POSIX in one
> module, but some useful functionality would be nice (my personal pet
> pieve would be the lack of an mmap).

mmap() sounds a far harder challenge though; as it has real longlasting
effects on the state of the process, as compared to strptime() which is
simply a values in -> values out conversion function of the same order
of difficulty as the already-extant strftime().
signature.asc

Steffen Mueller

unread,
Feb 13, 2012, 4:52:23 PM2/13/12
to Paul LeoNerd Evans, Leon Timmermans, Perl 5 Porters, Ævar Arnfjörð Bjarmason, Ricardo Signes
On 02/13/2012 07:56 PM, Paul LeoNerd Evans wrote:
> Is it policy, or just nobody bothered to step up? E.g. I found little
> resistance to, and general acceptance of my adding IPV6-related stuff,
> getaddrinfo, etc.. into Socket. Previously, nobody had really said
> "don't do this", it's just nobody with the right intersection of domains
> of knowledge and opportunity, had before tried to do it.

I don't care to comment on the rest of this thread, but IPv6 is
*different* from many things. It's the (not so distant?) future and it
may well be a prerequisite to a slim, CPAN-bootstrappable core in so
many years. For that reason, idealistic as it may be, I consider IPv6
essential going forward.

--Steffen

Ævar Arnfjörð Bjarmason

unread,
Feb 14, 2012, 6:27:20 AM2/14/12
to Paul LeoNerd Evans, Perl 5 Porters, Nicholas Clark, Ricardo Signes
On Mon, Feb 13, 2012 at 19:49, Paul LeoNerd Evans
<leo...@leonerd.org.uk> wrote:
> On Mon, Feb 13, 2012 at 12:51:59PM +0100, Ævar Arnfjörð Bjarmason wrote:
>>  * The general issue of having an OS-dependent date parsing function
>>    in the core. This is an area thoroughly solved by
>>    DateTime::Format::* and just by having it we'll lead a lot of
>>    people be exposed by OS-specific bugs instead of using a stable
>>    CPAN module.
>
> It's no more OS-specific than the rest of the OS-specific stuff in the
> obviously OS-specific "POSIX" module. Most notably strftime, that
> everyone seems quite happy to remain there.

I meant if the only date parsing facility we provide in the core is in
POSIX we inevitably set users up to run into OS-specific bugs, since
it's easier to use the core than install DateTime.

>>  * Those who want it now can just install POSIX::strptime.
>
> Whose tests break on non-Linux.

Did you check with its author whether you could contribute to it? That
might be easier at this point given that some people don't like it
being in core.

>>  * Unlike strptime() itself you can't get a char* to the position in
>>    the string where we stopped being able to parse it, arguably the
>>    interface should be:
>>
>>        my $pos = strptime($str, $fmt, \my @tm); # or \my %tm
>>
>>    This would be really tricky though since $str may be a UTF8 string
>>    and the C library isn't going to handle that, what do we do if it
>>    returns a position in the middle of a UTF8 character?
>
> The perl API for a POSIX::strptime() function was discussed at length on
> p5p@ primarily /because/ I wanted to find a way to have it adjust the
> position for partial matches, and also to allow pre-filled struct tm
> buffer inbound. I did propose a function of exactly that form but it was
> decided as not "perly enough"; that it was better to have it take a
> default list and return the parsed list.

k. I missed that.

>>  * That Solaris issue, and in general not being able to rely on it
>>    working in the same manner across OS's, that's inherent in the
>>    entire POSIX module though.
>>
>>  * It could use more tests, although we have more tests for it already
>>    than e.g. glibc has for its strptime function.
>
> More tests especially of the UTF-8 upgrade handling logic would be most
> welcome. I admit I'm simply not inventive enough in the cornercases of
> Perl's Unicode handling to be able to think up some torturous examples,
> but I'd happily accept additions anyone else could make.

Understandable. I think the tests are OK-ish now.

Nicholas Clark

unread,
Feb 15, 2012, 8:33:57 AM2/15/12
to 600 people, any one of whom could write tests, Ævar Arnfjörð, Paul LeoNerd Evans, Ricardo Signes, Leon Timmermans
On Mon, Feb 13, 2012 at 06:49:47PM +0000, Paul LeoNerd Evans wrote:
> On Mon, Feb 13, 2012 at 12:51:59PM +0100, Ævar Arnfjörð Bjarmason wrote:
> > * The general issue of having an OS-dependent date parsing function
> > in the core. This is an area thoroughly solved by
> > DateTime::Format::* and just by having it we'll lead a lot of
> > people be exposed by OS-specific bugs instead of using a stable
> > CPAN module.
>
> It's no more OS-specific than the rest of the OS-specific stuff in the
> obviously OS-specific "POSIX" module. Most notably strftime, that
> everyone seems quite happy to remain there.

It's been there since perl-5.000

I wouldn't say that I'm *happy* that it's there, just that taking it out
or changing it radically seems worse than leaving it as-is, known bugs and
all.

> > * Those who want it now can just install POSIX::strptime.
>
> Whose tests break on non-Linux.

And if anyone thinks that putting something in the core is a magic solution
to this, it's not.

What happens is that the core has different portability goals from CPAN
modules, and *is* tested on a range of platforms that most CPAN authors
don't (or, more legitimately can't) care for.

The work *does* get done. Usually by someone else. The core is also a lot
more conservative on API changes than most CPAN modules. For a variety of
reasons.



Part of what's frustrating me here is that only *after* the (tentative)
merge to blead did the Win32 build get fixed.

Eh?

We at least *have* a machine building Win32 for smoke-me branches. Wasn't
that the *point* of the smoke-me branch?

This isn't exactly Paul's fault. It's a more general issue with our
development. Very few people are actually willing to help with anyone
else's work.

Other things (such as C89 violations) only also got fixed after the
(tentative) merge to blead.

This is annoying. It's not sustainable for blead to be the bust. As blead
is *supposed* to be the basis point for branching to new work, it's not
scalable for it to have broken windows. (tests that are "known to fail").

Proper TDD doesn't work like that. If you accept that tests can fail
(ie not TODO or SKIP) you end up shipping PHP 5.3.7

[which has a CVE, and a test failing because of the bug that caused the CVE]

However, that's a side issue to the unaddressed questions below:

> > * Unlike strptime() itself you can't get a char* to the position in
> > the string where we stopped being able to parse it, arguably the
> > interface should be:
> >
> > my $pos = strptime($str, $fmt, \my @tm); # or \my %tm
> >
> > This would be really tricky though since $str may be a UTF8 string
> > and the C library isn't going to handle that, what do we do if it
> > returns a position in the middle of a UTF8 character?
>
> The perl API for a POSIX::strptime() function was discussed at length on
> p5p@ primarily /because/ I wanted to find a way to have it adjust the
> position for partial matches, and also to allow pre-filled struct tm
> buffer inbound. I did propose a function of exactly that form but it was
> decided as not "perly enough"; that it was better to have it take a
> default list and return the parsed list.
>
> > * That Solaris issue, and in general not being able to rely on it
> > working in the same manner across OS's, that's inherent in the
> > entire POSIX module though.

Sure, but that doesn't mean that we should repeat old mistakes.

Moreover, the Solaris issue was very specific, and related to the API.

We've special cased Solaris. Do we

a: remove the special case?
b: accept that every time we discover a new OS where there is a special case
available, we should add it?
c: be inconsistent.

No-one on this list is prepared to answer any of those.

> > * It could use more tests, although we have more tests for it already
> > than e.g. glibc has for its strptime function.
>
> More tests especially of the UTF-8 upgrade handling logic would be most
> welcome. I admit I'm simply not inventive enough in the cornercases of
> Perl's Unicode handling to be able to think up some torturous examples,
> but I'd happily accept additions anyone else could make.


That's a request for help. If someone else *wants* this feature in,
please help.


On Mon, Feb 13, 2012 at 06:56:52PM +0000, Paul LeoNerd Evans wrote:
> On Mon, Feb 13, 2012 at 02:38:30PM +0100, Leon Timmermans wrote:

> > I think calling our policy
> > conservative would be an understatement. I think that was mostly
> > because it was always considered a big module, even though most of it
> > was useless stubs.
>
> Is it policy, or just nobody bothered to step up? E.g. I found little
> resistance to, and general acceptance of my adding IPV6-related stuff,
> getaddrinfo, etc.. into Socket. Previously, nobody had really said
> "don't do this", it's just nobody with the right intersection of domains
> of knowledge and opportunity, had before tried to do it.

Probably not. It's because adding things, as you have found, turns out
to be a complete rats nest of portability.

There's also this problem that there is only *one* perl core.
Which means that it can ship only one implementation of POSIX::strptime.

Whereas CPAN has less restriction.

Does one want

a: A thin interface to the OS strptime (bugs and all), which simply doesn't
work on platforms that don't provide strptime (eg Win32, a biggie)
*and* can't guarantee the desired API (partial parsing) as POSIX doesn't
b: Expose the OS, but ship a bundled C version to fall back to
i) always
ii) only if probing the OS version discovers some shortcoming
c: A bundled C version. Which does mean "same behaviour everywhere?"
d: A Perl version. No XS
Which means
i) same behaviour everywhere
ii) more maintainable


The POSIX module historically has provided (a). Given that the functionality
here is an OS convenience function, and *quite possible* to implement in
pure Perl, I'm not actually even sure that options (a) or (b) are the right
choice here.

Whereas IPv6 isn't something you can just implement in pure Perl.
And is more cross-platform than just (notionally) POSIX OSes.

On Tue, Feb 14, 2012 at 12:27:20PM +0100, Ævar Arnfjörð Bjarmason wrote:

> Understandable. I think the tests are OK-ish now.

I don't fully agree. In the process of writing tests, I found that the API
wasn't ideal in places. The API was changed, in slightly incompatible ways.

I'd like *someone else* to write tests, as a way of validating the API
design, and do it *now*, so that if necessary the API can change.

Nicholas Clark

Nicholas Clark

unread,
Feb 15, 2012, 8:37:55 AM2/15/12
to Ævar Arnfjörð, Paul LeoNerd Evans, Perl 5 Porters, Ricardo Signes
On Mon, Feb 13, 2012 at 12:51:59PM +0100, Ęvar Arnfjörš Bjarmason wrote:

> I have my own reservations about it, but I thought at this point it
> would be much better off in blead since we'll get more testing and
> interest in it (e.g. the strptime probe).

That's highlight a bug in our process that needs fixing.

It's not sustainable long term to have the communal main branch be in a
mess.


[I'm repeating stuff from the other reply]

For example, only *after* the (tentative) merge to blead did the Win32 build
get fixed.

Eh?

We at least *have* a machine building Win32 for smoke-me branches. Wasn't
that the *point* of the smoke-me branch?

This isn't exactly Paul's fault. It's a more general issue with our
development. Very few people are actually willing to help with anyone
else's work.

Other things (such as C89 violations) only also got fixed after the
(tentative) merge to blead.

This is annoying. It's not sustainable for blead to be the bust. As blead
is *supposed* to be the basis point for branching to new work, it's not
scalable for it to have broken windows. (tests that are "known to fail").

Proper TDD doesn't work like that. If you accept that tests can fail
(ie not TODO or SKIP) you end up shipping PHP 5.3.7

[which has a CVE, and a test failing because of the bug that caused the CVE]

Nicholas Clark

H.Merijn Brand

unread,
Feb 15, 2012, 8:54:23 AM2/15/12
to perl5-...@perl.org
On Wed, 15 Feb 2012 13:37:55 +0000, Nicholas Clark <ni...@ccl4.org>
wrote:

> On Mon, Feb 13, 2012 at 12:51:59PM +0100, Ævar Arnfjörð Bjarmason wrote:
>
> > I have my own reservations about it, but I thought at this point it
> > would be much better off in blead since we'll get more testing and
> > interest in it (e.g. the strptime probe).
>
> That's highlight a bug in our process that needs fixing.
>
> It's not sustainable long term to have the communal main branch be in a
> mess.
>
>
> [I'm repeating stuff from the other reply]
>
> For example, only *after* the (tentative) merge to blead did the Win32 build
> get fixed.
>
> Eh?
>
> We at least *have* a machine building Win32 for smoke-me branches. Wasn't
> that the *point* of the smoke-me branch?

Yes, but I already am short on free CPU cycles to smoke the main branch
and I don't see that change.

Compared to pre-smoke era, things already have improved a lot, and
breakage that happens only on non-linux (non-mac) OS's is now actually
detected and most often acted upon.

Unless the community can get their hands on processing power and
resources on the deviating OS/arch ports, I don't think there is an
easy way out

> This isn't exactly Paul's fault. It's a more general issue with our
> development. Very few people are actually willing to help with anyone
> else's work.
>
> Other things (such as C89 violations) only also got fixed after the
> (tentative) merge to blead.

and probably with a (huge) delay as the box with that restricted
compiler already needs 24 hours to do a MINIMAL smoke run

smoketime 20 hours 27 minutes (average 10 hours 13 minutes)
O O F ? - -
^-------- Killed due to smoke time limit

FWIW I have no objection in extending the max smoke time to have it
actually *finish* a complete smoke

> This is annoying. It's not sustainable for blead to be the bust. As blead
> is *supposed* to be the basis point for branching to new work, it's not
> scalable for it to have broken windows. (tests that are "known to fail").
>
> Proper TDD doesn't work like that. If you accept that tests can fail
> (ie not TODO or SKIP) you end up shipping PHP 5.3.7
>
> [which has a CVE, and a test failing because of the bug that caused the CVE]
>
> Nicholas Clark


--
H.Merijn Brand http://tux.nl Perl Monger http://amsterdam.pm.org/
using perl5.00307 .. 5.14 porting perl5 on HP-UX, AIX, and openSUSE
http://mirrors.develooper.com/hpux/ http://www.test-smoke.org/
http://qa.perl.org http://www.goldmark.org/jeff/stupid-disclaimers/

Leon Timmermans

unread,
Feb 15, 2012, 9:07:36 AM2/15/12
to H.Merijn Brand, perl5-...@perl.org
On Wed, Feb 15, 2012 at 2:54 PM, H.Merijn Brand <h.m....@xs4all.nl> wrote:
> and probably with a (huge) delay as the box with that restricted
> compiler already needs 24 hours to do a MINIMAL smoke run

I think that someone smoking with CC='gcc -std=c89 -ansi' or something
similar would already catch most C89 issues. That seems to work for me
at least.

Leon

Ævar Arnfjörð Bjarmason

unread,
Feb 15, 2012, 1:54:06 PM2/15/12
to Nicholas Clark, 600 people, any one of whom could write tests, Paul LeoNerd Evans, Ricardo Signes, Leon Timmermans
I'll eject strptime out of blead. Let's deal with it in the 5.17
cycle.

Paul LeoNerd Evans

unread,
Feb 15, 2012, 3:08:17 PM2/15/12
to Ævar Arnfjörð, Perl 5 Porters, Nicholas Clark, Ricardo Signes, Leon Timmermans
On Wed, Feb 15, 2012 at 07:54:06PM +0100, Ævar Arnfjörð Bjarmason wrote:
> I'll eject strptime out of blead. Let's deal with it in the 5.17
> cycle.

A little disappointing, but I can accept it has to be done at this late
stage, so close to 5.16.

I can only hope for more involvement in the next round for 5.17 then...
signature.asc

(Dagfinn Ilmari Mannsåker)

unread,
Feb 14, 2012, 4:53:56 AM2/14/12
to Steffen Mueller, Paul LeoNerd Evans, Leon Timmermans, Perl 5 Porters, Ævar Arnfjörð, Ricardo Signes
Speaking of which, is there any chance of 5.16 being bootstrappable in
an IPv6-only envionment? The IO::Socket::IP inclusion was reverted, and
HTTP::Tiny would need to be ported to it as well.

--
ilmari
"A disappointingly low fraction of the human race is,
at any given time, on fire." - Stig Sandbeck Mathisen
0 new messages