[perl #41889] [PATCH] hoist cut and paste duplication into a function in src/library.c, also possible win32 bug found ?

0 views
Skip to first unread message

Mike Mattie

unread,
Mar 18, 2007, 10:30:11 AM3/18/07
to bugs-bi...@rt.perl.org
# New Ticket Created by Mike Mattie
# Please include the string: [perl #41889]
# in the subject line of all future correspondence about this issue.
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=41889 >


Hello,

While mucking around in src/library.c I noticed some cut & paste duplication. It looked like a fairly simple hoist
so I have attached the changes I made.

I do not have a win32 platform available for testing so I haven't been able to compile test it.

while ( (cnv = strchr(path->strstart, '/')) )
*cnv = '\\';

this looks totally broken. unless path->strstart is a stateful iterator this will only
convert the first instance of a '/' character.

I think it should look like this:

cnv = path->strstart;
while ( (cnv = strchr(cnv, '/')) ) {
*cnv = '\\';
++cnv;
}

I am not sure though because I can't imagine how parrot would pass any part of the win32 test suite with
this kind of a bug.

Note that the version I considered broken is in the patch. If I am wrong about the bug then the
hoisted code doesn't have to be reverted to the old form.

Cheers,
Mike Mattie (coder...@gmail.com)


remove-win32-dupcode.diff
signature.asc

Jonathan Worthington

unread,
Mar 18, 2007, 11:35:14 AM3/18/07
to perl6-i...@perl.org
Mike Mattie (via RT) wrote:
> While mucking around in src/library.c I noticed some cut & paste duplication. It looked like a fairly simple hoist so I have attached the changes I made.
>
> I do not have a win32 platform available for testing so I haven't been able to compile test it.
>
> while ( (cnv = strchr(path->strstart, '/')) )
> *cnv = '\\';
>
> this looks totally broken. unless path->strstart is a stateful iterator this will only convert the first instance of a '/' character.
>
Nope, it works fine. The first time you call strchr it finds the first
/, but by the second time you run it that has been transformed to a \ so
it finds the "second" / (now the first one), and so forth.

I think in the long run we'll want to provide a more generic way to
transform paths to the way the current platform likes them, but in the
meantime I'm very much up for reducing copy and paste code; I'll apply
this tonight if no other Win32ers beat me to it. :-)

Thanks,

Jonathan

Nicholas Clark

unread,
Mar 18, 2007, 3:30:35 PM3/18/07
to Jonathan Worthington, perl6-i...@perl.org
On Sun, Mar 18, 2007 at 03:35:14PM +0000, Jonathan Worthington wrote:
> Mike Mattie (via RT) wrote:
> >While mucking around in src/library.c I noticed some cut & paste
> >duplication. It looked like a fairly simple hoist so I have attached the
> >changes I made.
> >
> >I do not have a win32 platform available for testing so I haven't been
> >able to compile test it.
> >
> > while ( (cnv = strchr(path->strstart, '/')) )
> > *cnv = '\\';
> >
> >this looks totally broken. unless path->strstart is a stateful iterator
> >this will only convert the first instance of a '/' character.
> >
> Nope, it works fine. The first time you call strchr it finds the first
> /, but by the second time you run it that has been transformed to a \ so
> it finds the "second" / (now the first one), and so forth.

But painfully inefficiently. (Probably this doesn't matter here)

I suspect the loop wants to become

cnv = path->strstart;
while (*cnv) {
if (*cnv == '/')
*cnv = '\\';
}
++cnv;
}

which should at least be O(n) rather than O(n²) for a string of length n.

> I think in the long run we'll want to provide a more generic way to
> transform paths to the way the current platform likes them, but in the
> meantime I'm very much up for reducing copy and paste code; I'll apply
> this tonight if no other Win32ers beat me to it. :-)

I don't disagree with that longer term plan.

On that subject, I remember 10 years ago so so, network drives on MS DOS named
ux: and uy:
Are 2 letter drive names still valid? Do they confuse most code?

Nicholas Clark

Mike Mattie

unread,
Mar 18, 2007, 6:59:34 PM3/18/07
to perl6-i...@perl.org
On Sun, 18 Mar 2007 19:30:35 +0000
Nicholas Clark <ni...@ccl4.org> wrote:

> On Sun, Mar 18, 2007 at 03:35:14PM +0000, Jonathan Worthington wrote:
> > Mike Mattie (via RT) wrote:
> > >While mucking around in src/library.c I noticed some cut & paste
> > >duplication. It looked like a fairly simple hoist so I have
> > >attached the changes I made.
> > >

[snip]


> > >
> > > while ( (cnv = strchr(path->strstart, '/')) )
> > > *cnv = '\\';
> > >

[snip]


> But painfully inefficiently. (Probably this doesn't matter here)
>
> I suspect the loop wants to become
>
> cnv = path->strstart;
> while (*cnv) {
> if (*cnv == '/')
> *cnv = '\\';
> }
> ++cnv;
> }
>
> which should at least be O(n) rather than O(n²) for a string of
> length n.

you are correct. I do like the strchr() call over a manual iteration
though. GCC will aggressively optimize some of the standard library
string functions, and I would guess that many other compilers do as well.

It at least leaves the door open to picking up the benefits of
future hardware/compiler changes. It's also more friendly to static-analysis.

> > I think in the long run we'll want to provide a more generic way to
> > transform paths to the way the current platform likes them, but in
> > the meantime I'm very much up for reducing copy and paste code;
> > I'll apply this tonight if no other Win32ers beat me to it. :-)
>
> I don't disagree with that longer term plan.

I like it. typically this kind of stuff ends up in config.h with
auto-tools projects, but the file becomes a big heap. Something like
platform_io.h might be a nice route. gather all of the IO related
platform quirks into one spot, topically organized.

signature.asc

Jonathan Worthington

unread,
Mar 18, 2007, 9:10:34 PM3/18/07
to perl6-i...@perl.org
Mike Mattie (via RT) wrote:
> While mucking around in src/library.c I noticed some cut & paste duplication. It looked like a fairly simple hoist so I have attached the changes I made.
>
Needed a small fix (note: C89, declarations must come first in a block).
Did that, and it's now applied in r17626.

Thanks!

Jonathan

Joshua Juran

unread,
Mar 18, 2007, 11:33:25 PM3/18/07
to Mike Mattie, perl6-i...@perl.org

I can't imagine someone else hasn't already come up with

cnv = path->strstart;
while ( (cnv = strchr( cnv, '/' )) )
{
*cnv = '\\';
}

but I didn't see it posted, so here it is just in case.

Josh


Nicholas Clark

unread,
Mar 19, 2007, 7:17:53 AM3/19/07
to Joshua Juran, Mike Mattie, perl6-i...@perl.org
On Sun, Mar 18, 2007 at 08:33:25PM -0700, Joshua Juran wrote:

> I can't imagine someone else hasn't already come up with
>
> cnv = path->strstart;
> while ( (cnv = strchr( cnv, '/' )) )
> {
> *cnv = '\\';
> }
>
> but I didn't see it posted, so here it is just in case.

Bikesheds, I know, but this will be faster:

cnv = path->strstart;
while ( (cnv = strchr( cnv, '/' )) )
{

*cnv++ = '\\';
}

because there's no way that the character at cnv can match '/' now :-)

Whether its faster than the loop without strchr probably depends on whether
the compiler inlines strchr(), whether the optimised version is efficient
on non-word aligned strings, and whether the optimised version doesn't have
a higher start cost. Either is better* than the current, and I don't have
a Win32 platform to test on, so can't test a change.

Nicholas Clark

* Yes, even in a non-speed critical area, because having poor algorithms around
will tempt people to base other code on them.

Mike Mattie

unread,
Mar 19, 2007, 8:01:29 AM3/19/07
to Nicholas Clark, parrot ML
On Mon, 19 Mar 2007 11:17:53 +0000
Nicholas Clark <ni...@ccl4.org> wrote:

> On Sun, Mar 18, 2007 at 08:33:25PM -0700, Joshua Juran wrote:
>

[snip]


>
> cnv = path->strstart;
> while ( (cnv = strchr( cnv, '/' )) )
> {
> *cnv++ = '\\';
> }
>

[snip]



> Whether its faster than the loop without strchr probably depends on
> whether the compiler inlines strchr(), whether the optimised version
> is efficient on non-word aligned strings, and whether the optimised
> version doesn't have a higher start cost. Either is better* than the
> current, and I don't have a Win32 platform to test on, so can't test
> a change.

all that optimization hair is another reason to punt it to strchr().
I do like your use of postfix increment here, elegant.

> Nicholas Clark
>
> * Yes, even in a non-speed critical area, because having poor
> algorithms around will tempt people to base other code on them.

that basing argument is dead on. when I was bread-crumb dumping the
path searches it was generating ~4-5 paths per hit. With a large
environment variable set, vendor paths, extension guessing etc, this
could go alot higher.

Also one of the largest problems I have had with java is the hideous startup
time. probably why you see alot of heavy apps, and don't see as many quickie
shebangs with java.

It all adds up. Perl scores extremely high on start speed. parrot will need
that crown as well to comfortably replace perl5.

I can think of one other reason in a similar spirit, maybe even the
most important one. This being free-sofware and not some $work$
project, complete with a pointy-hair holding a schedule like a gun, we
have the opportunity to take joy in polishing code until it shines.

as far as strchr() vs. pointer it's peanuts in comparison to the
algorithmic improvement of the function.

my preference is O(n) !

I have this on my TODO list now, but if someone beats me too it I will be
very happy. I am focused on [patch] 5 now, rt # 41908 which will probably
keep me quite busy.

$work is going to dominate my time as well until the end of the month at
least.

Cheers,
Mike Mattie (coder...@gmail.com)

signature.asc

Joshua Juran

unread,
Mar 19, 2007, 1:59:33 PM3/19/07
to Nicholas Clark, Mike Mattie, perl6-i...@perl.org
On Mar 19, 2007, at 4:17 AM, Nicholas Clark wrote:

> On Sun, Mar 18, 2007 at 08:33:25PM -0700, Joshua Juran wrote:
>
>> I can't imagine someone else hasn't already come up with
>>
>> cnv = path->strstart;
>> while ( (cnv = strchr( cnv, '/' )) )
>> {
>> *cnv = '\\';
>> }
>>
>> but I didn't see it posted, so here it is just in case.
>
> Bikesheds, I know, but this will be faster:
>
> cnv = path->strstart;
> while ( (cnv = strchr( cnv, '/' )) )
> {
> *cnv++ = '\\';
> }
>
> because there's no way that the character at cnv can match '/' now :-)

I agree that that condition holds, but I'm not convinced your
optimization yields a speed increase. It saves a byte comparison per
match, (which I missed before) but it doesn't prevent any calls to
strchr(). And it makes the code a tiny bit larger, and a tiny bit
more complex. I'd probably go with yours over mine for
'correctness'[1], though, unless profiling indicated otherwise.

[1] Regardless of speed, it 'looks' wasteful to duplicate that byte
comparison. It's usually important that the reader of the code have
confidence in it.

Anyway, your performance claim is a load of null if the pathname
conversion is followed by a filesystem operation such as stat(). :-)

> Whether its faster than the loop without strchr probably depends on
> whether
> the compiler inlines strchr(), whether the optimised version is
> efficient
> on non-word aligned strings, and whether the optimised version
> doesn't have
> a higher start cost. Either is better* than the current, and I
> don't have
> a Win32 platform to test on, so can't test a change.
>
> Nicholas Clark
>
> * Yes, even in a non-speed critical area, because having poor
> algorithms around
> will tempt people to base other code on them.

Josh


Reply all
Reply to author
Forward
0 new messages