automated sub-string search/replace algorithm...

Chris M. Thomasson

unread,

Feb 28, 2010, 1:08:57 AM2/28/10

to

Here is the code:

http://clc.pastebin.com/RM3hyqVE

I am testing two algorithms here. One created by me, and the other created
by Edward Nilges. Here is the post where I gained the source of Edward's
algorithm:

http://groups.google.com/group/comp.lang.c/browse_frm/thread/ad9fea19f2f7dd61

This is basically the same as my original automated sub-string search:

http://groups.google.com/group/comp.lang.c/browse_frm/thread/38aee559668ee60c

except it generates a random exchange string and build an expected result
string. It then runs the source string through the given replacement
algorithm and checks to see if everything is Kosher.

So far, both Edwards algorithm and mine are passing 10,000,000 iterations.
That's quite a bit of randomly generated data. I cannot seem to find any
bugs in the algorithms as-is.

Anyway, I created this because I got sick and tired of manually generating
test cases!

;^)

Chris M. Thomasson

unread,

Feb 28, 2010, 1:12:21 AM2/28/10

to

"Chris M. Thomasson" <n...@spam.invalid> wrote in message
news:KLnin.23287$Dv7....@newsfe17.iad...

> Here is the code:
>
> http://clc.pastebin.com/RM3hyqVE

I posted the wrong fuc%ing link! Here is correction:

http://clc.pastebin.com/YrDmuSpM

Sorry about that!!!

;^(...

[...]

Chris M. Thomasson

unread,

Feb 28, 2010, 1:39:26 AM2/28/10

to

"Chris M. Thomasson" <n...@spam.invalid> wrote in message
news:KLnin.23287$Dv7....@newsfe17.iad...

> Here is the code:
>
> http://clc.pastebin.com/RM3hyqVE
>
>
> I am testing two algorithms here. One created by me, and the other created
> by Edward Nilges. Here is the post where I gained the source of Edward's
> algorithm:
>
> http://groups.google.com/group/comp.lang.c/browse_frm/thread/ad9fea19f2f7dd61

I feel that I should point out the fact that I very slightly altered
Edward's code. I added `const' qualifiers to some variables, and reformatted
the C++ style comments so I could cleanly compile it. The test compiles
without warning on Comeau.

[...]

Ben Bacarisse

unread,

Feb 28, 2010, 9:57:32 AM2/28/10

to

"Chris M. Thomasson" <n...@spam.invalid> writes:
<snip>

> This is basically the same as my original automated sub-string search:
>
> http://groups.google.com/group/comp.lang.c/browse_frm/thread/38aee559668ee60c
>
>
> except it generates a random exchange string and build an expected
> result string. It then runs the source string through the given
> replacement algorithm and checks to see if everything is Kosher.
>
>
> So far, both Edwards algorithm and mine are passing 10,000,000
> iterations. That's quite a bit of randomly generated data. I cannot
> seem to find any bugs in the algorithms as-is.

This is a useful example: 10 million tests and you have not yet found
the bug! It is in a case that I'd test early rather than later and it
survived from the first to the 12th of the 13 posted versions of the
code. Have you tried your automatic tests on even earlier versions?
It would be interesting see what it can pick up.

> Anyway, I created this because I got sick and tired of manually
> generating test cases!

Hand-picked tests have a lot going for them.

--
Ben.

Chris M. Thomasson

unread,

Feb 28, 2010, 6:41:03 PM2/28/10

to

"Ben Bacarisse" <ben.u...@bsb.me.uk> wrote in message
news:0.234757ddeeb7c0b03ba8.2010...@bsb.me.uk...

> "Chris M. Thomasson" <n...@spam.invalid> writes:
> <snip>
>> This is basically the same as my original automated sub-string search:
>>
>> http://groups.google.com/group/comp.lang.c/browse_frm/thread/38aee559668ee60c
>>
>>
>> except it generates a random exchange string and build an expected
>> result string. It then runs the source string through the given
>> replacement algorithm and checks to see if everything is Kosher.
>>
>>
>> So far, both Edwards algorithm and mine are passing 10,000,000
>> iterations. That's quite a bit of randomly generated data. I cannot
>> seem to find any bugs in the algorithms as-is.
>
> This is a useful example: 10 million tests and you have not yet found
> the bug!

;^)

> It is in a case that I'd test early rather than later and it
> survived from the first to the 12th of the 13 posted versions of the
> code. Have you tried your automatic tests on even earlier versions?

I have only tested my own algorithms, and Edwards. When I get some more
time, I will hunt down and add one of the other algorithms posted here.
Well, I did add your `fast replace' algorithm; here is the test:

http://clc.pastebin.com/TzsdaUNW

Look for the `ben_replace()' function. I had to slightly tweak your code in
order to get it compiling cleanly on a non-c99 compiler. BTW, it passes the
test.

:^)

> It would be interesting see what it can pick up.

I have been trying to "fool it" into giving a false positive by deliberately
introducing slight errors here and there. The test finds these errors every
damn time. I have not been able to fool it yet. I change a 1 to a 0, or
something little, and BAM! The test fails and spits out the offending datum.

>> Anyway, I created this because I got sick and tired of manually
>> generating test cases!
>
> Hand-picked tests have a lot going for them.

They sure do. But, I thought it might be a good idea to throw a shi% load of
random data at it. Who knows, the test might present data that ruins the
algorithm. Data that I did not think about testing. So, perhaps a
combination of hand-picked tests and the automated test is the best option.
Humm...

Chris M. Thomasson

unread,

Mar 1, 2010, 12:22:27 AM3/1/10

to

"Chris M. Thomasson" <n...@spam.invalid> wrote in message

news:2aDin.14590$ND2....@newsfe05.iad...

> "Ben Bacarisse" <ben.u...@bsb.me.uk> wrote in message
> news:0.234757ddeeb7c0b03ba8.2010...@bsb.me.uk...
>> "Chris M. Thomasson" <n...@spam.invalid> writes:
>> <snip>
>>> This is basically the same as my original automated sub-string search:
>>>
>>> http://groups.google.com/group/comp.lang.c/browse_frm/thread/38aee559668ee60c
>>>
>>>
>>> except it generates a random exchange string and build an expected
>>> result string. It then runs the source string through the given
>>> replacement algorithm and checks to see if everything is Kosher.
>>>
>>>
>>> So far, both Edwards algorithm and mine are passing 10,000,000
>>> iterations. That's quite a bit of randomly generated data. I cannot
>>> seem to find any bugs in the algorithms as-is.
>>
>> This is a useful example: 10 million tests and you have not yet found
>> the bug!
>
> ;^)
>
>
>
>
>> It is in a case that I'd test early rather than later and it
>> survived from the first to the 12th of the 13 posted versions of the
>> code. Have you tried your automatic tests on even earlier versions?
>
> I have only tested my own algorithms, and Edwards. When I get some more
> time, I will hunt down and add one of the other algorithms posted here.
> Well, I did add your `fast replace' algorithm; here is the test:
>
>
> http://clc.pastebin.com/TzsdaUNW

FWIW, I should probably point out that the following macro defined on line
272:

#define MATCH_MAX 4

is deliberately set to a low value in order to test certain aspects of my
algorithm. The value represents the total number of matches that can be
acquired before I resort to dynamic memory allocation (e.g., `realloc()').
So, if you want the algorithm to "perform better", well, set this value to a
much higher number, say 4096 or something.

[...]

Nick Keighley

unread,

Mar 1, 2010, 3:16:38 AM3/1/10

to

On 28 Feb, 14:57, Ben Bacarisse <ben.use...@bsb.me.uk> wrote:
> "Chris M. Thomasson" <n...@spam.invalid> writes:

> > This is basically the same as my original automated sub-string search:
>

> >http://groups.google.com/group/comp.lang.c/browse_frm/thread/38aee559...

>
> > except it generates a random exchange string and build an expected
> > result string. It then runs the source string through the given
> > replacement algorithm and checks to see if everything is Kosher.
>
> > So far, both Edwards algorithm and mine are passing 10,000,000
> > iterations. That's quite a bit of randomly generated data. I cannot
> > seem to find any bugs in the algorithms as-is.
>
> This is a useful example: 10 million tests and you have not yet found
> the bug!

is it? I'd like some numbers. How big is the space of all possible
tests? What fraction of that space is 10 million tests? How are the
"random" test distributed? Do we have any reason to suppose this
distribution matches the critical parts of the algoritm? How do you
generate the result strings? Doesn't your test result generator look
very similar to your replace function?

<snip>

> > Anyway, I created this because I got sick and tired of manually
> > generating test cases!
>
> Hand-picked tests have a lot going for them.

oh yes.

[consider source strings of up to 26 letters and a replacement string
of 25 letters. I make it about 1e15 different tests (26^6 * 26^5 =
26^11). Now you can do better than this but you have to be careful not
to add some systematic bias]

--
Testing can show the presence of bugs, but not their absence.
-- Dijkstra

Ben Bacarisse

unread,

Mar 1, 2010, 8:07:07 AM3/1/10

to

Nick Keighley <nick_keigh...@hotmail.com> writes:

> On 28 Feb, 14:57, Ben Bacarisse <ben.use...@bsb.me.uk> wrote:
>> "Chris M. Thomasson" <n...@spam.invalid> writes:
>
>> > This is basically the same as my original automated sub-string search:
>>
>> >http://groups.google.com/group/comp.lang.c/browse_frm/thread/38aee559...
>>
>> > except it generates a random exchange string and build an expected
>> > result string. It then runs the source string through the given
>> > replacement algorithm and checks to see if everything is Kosher.
>>
>> > So far, both Edwards algorithm and mine are passing 10,000,000
>> > iterations. That's quite a bit of randomly generated data. I cannot
>> > seem to find any bugs in the algorithms as-is.
>>
>> This is a useful example: 10 million tests and you have not yet found
>> the bug!
>
> is it? I'd like some numbers. How big is the space of all possible
> tests? What fraction of that space is 10 million tests? How are the
> "random" test distributed? Do we have any reason to suppose this
> distribution matches the critical parts of the algoritm? How do you
> generate the result strings? Doesn't your test result generator look
> very similar to your replace function?

Why are you asking me? I can answer these questions if you like, but
what is the point of them?

It doesn't matter how many tests are done. I suspect, that since the
bug invokes UB, Chris's test program can't find it (at least it can't
be assured of finding it). My point (such as it was) is that the
number of random tests passed bears little relation to the confidence
one should have in a piece of software.

> <snip>
>
>> > Anyway, I created this because I got sick and tired of manually
>> > generating test cases!
>>
>> Hand-picked tests have a lot going for them.
>
> oh yes.
>
> [consider source strings of up to 26 letters and a replacement string
> of 25 letters. I make it about 1e15 different tests (26^6 * 26^5 =
> 26^11). Now you can do better than this but you have to be careful not
> to add some systematic bias]

I am not sure of the point of this counting. I'm not even sure what
you are counting. I think you agree that carefully chosen tests are
the way to go. This choosing might be done by machine (at least the
last time I looked there was some very promising work in that area)
but they usually have to be picked by humans.

--
Ben.

Nick Keighley

unread,

Mar 1, 2010, 8:25:32 AM3/1/10

to

On 1 Mar, 13:07, Ben Bacarisse <ben.use...@bsb.me.uk> wrote:

> Nick Keighley <nick_keighley_nos...@hotmail.com> writes:
> > On 28 Feb, 14:57, Ben Bacarisse <ben.use...@bsb.me.uk> wrote:
> >> "Chris M. Thomasson" <n...@spam.invalid> writes:

> >> > This is basically the same as my original automated sub-string search:
>
> >> >http://groups.google.com/group/comp.lang.c/browse_frm/thread/38aee559...
>
> >> > except it generates a random exchange string and build an expected
> >> > result string. It then runs the source string through the given
> >> > replacement algorithm and checks to see if everything is Kosher.
>
> >> > So far, both Edwards algorithm and mine are passing 10,000,000
> >> > iterations. That's quite a bit of randomly generated data. I cannot
> >> > seem to find any bugs in the algorithms as-is.
>
> >> This is a useful example: 10 million tests and you have not yet found
> >> the bug!

there is a bug? I think I misunderstood you. I thought you were saying
passing 10M tests significantly increased our confidence in the
program. You were actually saying it didn't!

> > is it? I'd like some numbers. How big is the space of all possible
> > tests? What fraction of that space is 10 million tests? How are the
> > "random" test distributed? Do we have any reason to suppose this
> > distribution matches the critical parts of the algoritm? How do you
> > generate the result strings? Doesn't your test result generator look
> > very similar to your replace function?
>
> Why are you asking me? I can answer these questions if you like, but
> what is the point of them?

they are rhetorical. I suspect the tested space is enormously smaller
than the entire test space. Even if we limit our string sizes to very
finite values.

> It doesn't matter how many tests are done.

well some problems might submit brute force search

> I suspect, that since the
> bug invokes UB, Chris's test program can't find it (at least it can't
> be assured of finding it). My point (such as it was) is that the
> number of random tests passed bears little relation to the confidence
> one should have in a piece of software.

I think we agree

> >> > Anyway, I created this because I got sick and tired of manually
> >> > generating test cases!
>
> >> Hand-picked tests have a lot going for them.
>
> > oh yes.

the next bit was pure gibberish I have changed it

replace:-

> > [consider source strings of up to 26 letters and a replacement string

> > [...]

with:-

[consider source strings of up to 6 letters and a replacement string
of 5 letters. I make that about 1e15 different tests (26^6 * 26^5 =

26^11). Now you can do better than this but you have to be careful not
to add some systematic bias]

I think the task of analysing a test program that runs in non-
geological time to show it is a reasonable test is comparable to the
task of proving the correctness of the original program.

> I am not sure of the point of this counting. I'm not even sure what
> you are counting.

I'm trying, incompetantly, to show just how nasty combinatorial
explosion can be.

> I think you agree that carefully chosen tests are
> the way to go. This choosing might be done by machine

interesting

Ben Bacarisse

unread,

Mar 1, 2010, 8:38:12 AM3/1/10

to

Nick Keighley <nick_keigh...@hotmail.com> writes:

> On 1 Mar, 13:07, Ben Bacarisse <ben.use...@bsb.me.uk> wrote:
>> Nick Keighley <nick_keighley_nos...@hotmail.com> writes:
>> > On 28 Feb, 14:57, Ben Bacarisse <ben.use...@bsb.me.uk> wrote:
>> >> "Chris M. Thomasson" <n...@spam.invalid> writes:
>
>> >> > This is basically the same as my original automated sub-string search:
>>
>> >> >http://groups.google.com/group/comp.lang.c/browse_frm/thread/38aee559...
>>
>> >> > except it generates a random exchange string and build an expected
>> >> > result string. It then runs the source string through the given
>> >> > replacement algorithm and checks to see if everything is Kosher.
>>
>> >> > So far, both Edwards algorithm and mine are passing 10,000,000
>> >> > iterations. That's quite a bit of randomly generated data. I cannot
>> >> > seem to find any bugs in the algorithms as-is.
>>
>> >> This is a useful example: 10 million tests and you have not yet found
>> >> the bug!
>
> there is a bug? I think I misunderstood you. I thought you were saying
> passing 10M tests significantly increased our confidence in the
> program. You were actually saying it didn't!

Yup. Chris accidentally took a version of the code that had a known
bug in it so, to me, the test became interesting: Can these random tests
find the bug we know to be there? It seems not (and I am not really
surprised).

>> > is it? I'd like some numbers. How big is the space of all possible
>> > tests? What fraction of that space is 10 million tests? How are the
>> > "random" test distributed? Do we have any reason to suppose this
>> > distribution matches the critical parts of the algoritm? How do you
>> > generate the result strings? Doesn't your test result generator look
>> > very similar to your replace function?
>>
>> Why are you asking me? I can answer these questions if you like, but
>> what is the point of them?
>
> they are rhetorical. I suspect the tested space is enormously smaller
> than the entire test space. Even if we limit our string sizes to very
> finite values.
>
>> It doesn't matter how many tests are done.
>
> well some problems might submit brute force search

Sure. I meant in this case.

>> I suspect, that since the
>> bug invokes UB, Chris's test program can't find it (at least it can't
>> be assured of finding it). My point (such as it was) is that the
>> number of random tests passed bears little relation to the confidence
>> one should have in a piece of software.
>
> I think we agree

I think we do!

<snip>
--
Ben.

Malcolm McLean

unread,

Mar 1, 2010, 8:51:28 AM3/1/10

to

On Mar 1, 3:07 pm, Ben Bacarisse <ben.use...@bsb.me.uk> wrote:
> My point (such as it was) is that the
> number of random tests passed bears little relation to the confidence
> one should have in a piece of software.
>

As an example, comnsider a naive implementation of qsort(). If you
feed it random permutations performace will always be O(NlogN). It's
only when you feed it sorted data that it will overflow the stack.
However sorted data is quite likely to be fed to the function when
using for real.

ImpalerCore

unread,

Mar 1, 2010, 10:33:47 AM3/1/10

to

On Mar 1, 8:51 am, Malcolm McLean <malcolm.mcle...@btinternet.com>
wrote:

I agree in general, that random sets of data do not test for crashes
as well as boundary testing, and passing sorted data is one such
boundary test to a recursive sorting algorithm.

On a side note, if you know that qsort has issues with getting fed
almost sorted data, can't you just do a O(n) random permutation step
on the data to mitigate those problems?

for each i in array
create random index k between 0 and array size
swap array[i] with array[k]

I have a recursive (possibly naive) implementation of quicksort that
I've been using, but I haven't exercised it on data sets large enough
to generate stack overflows (yet). However, I do run into the use
case where quicksort is passed almost sorted data. I'm just wondering
if this is a viable (if hackish) method to encourage better average
performance for almost sorted data sets.

Eric Sosman

unread,

Mar 1, 2010, 10:53:35 AM3/1/10

to

On 3/1/2010 10:33 AM, ImpalerCore wrote:
> [...]

> On a side note, if you know that qsort has issues with getting fed
> almost sorted data, can't you just do a O(n) random permutation step
> on the data to mitigate those problems?

That could reduce the likelihood of pathological behavior
in Quicksort (if qsort() uses Quicksort), but would not eliminate
it. It might, in fact, scramble an "easy" sequence into a "hard"
one. Also, there are cheaper ways to obtain a similar effect
than to generate N random numbers (you can do it with just one
random number per partitioning pass).

> for each i in array
> create random index k between 0 and array size
> swap array[i] with array[k]

Note that this scrambling algorithm is "random," but biased:
Some rearrangements are more likely than others.

> I have a recursive (possibly naive) implementation of quicksort that
> I've been using, but I haven't exercised it on data sets large enough
> to generate stack overflows (yet). However, I do run into the use
> case where quicksort is passed almost sorted data. I'm just wondering
> if this is a viable (if hackish) method to encourage better average
> performance for almost sorted data sets.

Read Bentley and McIlroy's "Engineering a Sort Function."
(Note that their final version of qsort() sometimes fails to
satisfy 7.20.5p2. I don't see the reason for 7.20.5p2, and the
Rationale is unenlightening, but it's there nonetheless.)

--
Eric Sosman
eso...@ieee-dot-org.invalid

Malcolm McLean

unread,

Mar 1, 2010, 11:24:31 AM3/1/10

to

to write shuffle (untested)

void shuffle(void *data, size_t N, size_t width)
{
size_t i;
size_t target;

for(i=0;i<N-1;i++)
{
target = (size_t) ((rand()/ (RAND_MAX + 1.0)) * (N-i) + i);
memswap( (unsigned char *)data + (i*width), (unsigned char *)data
+(target *width), width);
}
}

Note deliberate bug in this code. Anyone spot it?
Another snag is that memswap isn't a standard library function. In
fact it's hard to provide an efficient, portable implementation.

Quicksort needs a pivot. The most convenient pivot is the first item
in the array. The problem is that if you choose it, and the data is
sorted or almost sorted, the tree always has one brach with a single
cell. Thus stack overflow is likely.
The generally accepted way to get round the problem is by choosing
three random candidates for pivot, then taking the middle one.

Chris M. Thomasson

unread,

Mar 1, 2010, 11:30:29 AM3/1/10

to

"Ben Bacarisse" <ben.u...@bsb.me.uk> wrote in message

news:0.cfe2184fc46570e7992a.2010...@bsb.me.uk...

> Nick Keighley <nick_keigh...@hotmail.com> writes:
>
>> On 1 Mar, 13:07, Ben Bacarisse <ben.use...@bsb.me.uk> wrote:
>>> Nick Keighley <nick_keighley_nos...@hotmail.com> writes:
>>> > On 28 Feb, 14:57, Ben Bacarisse <ben.use...@bsb.me.uk> wrote:
>>> >> "Chris M. Thomasson" <n...@spam.invalid> writes:
>>
>>> >> > This is basically the same as my original automated sub-string
>>> >> > search:
>>>
>>> >> >http://groups.google.com/group/comp.lang.c/browse_frm/thread/38aee559...
>>>
>>> >> > except it generates a random exchange string and build an expected
>>> >> > result string. It then runs the source string through the given
>>> >> > replacement algorithm and checks to see if everything is Kosher.
>>>
>>> >> > So far, both Edwards algorithm and mine are passing 10,000,000
>>> >> > iterations. That's quite a bit of randomly generated data. I cannot
>>> >> > seem to find any bugs in the algorithms as-is.
>>>
>>> >> This is a useful example: 10 million tests and you have not yet found
>>> >> the bug!
>>
>> there is a bug? I think I misunderstood you. I thought you were saying
>> passing 10M tests significantly increased our confidence in the
>> program. You were actually saying it didn't!
>
> Yup. Chris accidentally took a version of the code that had a known
> bug in it so, to me, the test became interesting: Can these random tests
> find the bug we know to be there? It seems not (and I am not really
> surprised).

Are you referring to Edward's code? I don't know where the most recent
version is. Could you be so kind as to point me toward the bug please? Humm,
is it this one:

http://groups.google.com/group/comp.lang.c/msg/e3677e5f0cb30c43

Here is the version I am working with:

http://groups.google.com/group/comp.lang.c/msg/2c2eb92cd6dbe071

And if I manually feed it:

replace("a", "x", "b");

I am getting a result string of "a".

FWIW, here is a bugged version of Edward's code:

http://groups.google.com/group/comp.lang.c/msg/3b35d28b8e85b2e2

and the test finds an error by seg-faulting...

[...]

ImpalerCore

unread,

Mar 1, 2010, 11:50:05 AM3/1/10

to

On Mar 1, 10:53 am, Eric Sosman <esos...@ieee-dot-org.invalid> wrote:
> On 3/1/2010 10:33 AM, ImpalerCore wrote:
>
> > [...]
> > On a side note, if you know that qsort has issues with getting fed
> > almost sorted data, can't you just do a O(n) random permutation step
> > on the data to mitigate those problems?
>
> That could reduce the likelihood of pathological behavior
> in Quicksort (if qsort() uses Quicksort), but would not eliminate
> it. It might, in fact, scramble an "easy" sequence into a "hard"
> one. Also, there are cheaper ways to obtain a similar effect
> than to generate N random numbers (you can do it with just one
> random number per partitioning pass).
>
> > for each i in array
> > create random index k between 0 and array size
> > swap array[i] with array[k]
>
> Note that this scrambling algorithm is "random," but biased:
> Some rearrangements are more likely than others.

Can you explain what is biased? The bias from the rng as a quality of
implementation, or the method of iterating over the array?

> > I have a recursive (possibly naive) implementation of quicksort that
> > I've been using, but I haven't exercised it on data sets large enough
> > to generate stack overflows (yet). However, I do run into the use
> > case where quicksort is passed almost sorted data. I'm just wondering
> > if this is a viable (if hackish) method to encourage better average
> > performance for almost sorted data sets.
>
> Read Bentley and McIlroy's "Engineering a Sort Function."
> (Note that their final version of qsort() sometimes fails to
> satisfy 7.20.5p2. I don't see the reason for 7.20.5p2, and the
> Rationale is unenlightening, but it's there nonetheless.)

Thanks for the link. I'm pretty rusty on algorithm stuff now compared
to when I learned it in college.

> --
> Eric Sosman
> esos...@ieee-dot-org.invalid

Chris M. Thomasson

unread,

Mar 1, 2010, 12:08:42 PM3/1/10

to

"Ben Bacarisse" <ben.u...@bsb.me.uk> wrote in message

news:0.c901a16c35a7aa6362ba.2010...@bsb.me.uk...

> Nick Keighley <nick_keigh...@hotmail.com> writes:
>
>> On 28 Feb, 14:57, Ben Bacarisse <ben.use...@bsb.me.uk> wrote:
>>> "Chris M. Thomasson" <n...@spam.invalid> writes:
>>
>>> > This is basically the same as my original automated sub-string search:
>>>
>>> >http://groups.google.com/group/comp.lang.c/browse_frm/thread/38aee559...
>>>
>>> > except it generates a random exchange string and build an expected
>>> > result string. It then runs the source string through the given
>>> > replacement algorithm and checks to see if everything is Kosher.
>>>
>>> > So far, both Edwards algorithm and mine are passing 10,000,000
>>> > iterations. That's quite a bit of randomly generated data. I cannot
>>> > seem to find any bugs in the algorithms as-is.
>>>
>>> This is a useful example: 10 million tests and you have not yet found
>>> the bug!
>>
>> is it? I'd like some numbers. How big is the space of all possible
>> tests? What fraction of that space is 10 million tests? How are the
>> "random" test distributed? Do we have any reason to suppose this
>> distribution matches the critical parts of the algoritm? How do you
>> generate the result strings? Doesn't your test result generator look
>> very similar to your replace function?
>
> Why are you asking me? I can answer these questions if you like, but
> what is the point of them?
>
> It doesn't matter how many tests are done. I suspect, that since the
> bug invokes UB, Chris's test program can't find it (at least it can't
> be assured of finding it).

It will only be able to find the bug if the resulting undefined behavior
ends up spitting out a string that is different than the expected string.
Unfortunately, I find Edward's code a bit cryptic and am having trouble
manually finding the bug which would create a broken result. Can you give me
a hint? lol.

;^)

> My point (such as it was) is that the
> number of random tests passed bears little relation to the confidence
> one should have in a piece of software.

Agreed. However, it did "help me" find a couple of bugs in a sub-string
search algorithm. Apparently, I was not able to manually construct a string
that broke it. Luckily, the random testing did create a couple of them. I
hope I did not imply that passing 10,000,000 tests means that the algorithm
is 100% bug free. If I did, well, I apologize for that total non-sense!

Malcolm McLean

unread,

Mar 1, 2010, 12:23:41 PM3/1/10

to

On Mar 1, 6:50 pm, ImpalerCore <jadil...@gmail.com> wrote:
> On Mar 1, 10:53 am, Eric Sosman <esos...@ieee-dot-org.invalid> wrote:
>
> > On 3/1/2010 10:33 AM, ImpalerCore wrote:
>
> > > for each i in array
> > > create random index k between 0 and array size
> > > swap array[i] with array[k]
>
> > Note that this scrambling algorithm is "random," but biased:
> > Some rearrangements are more likely than others.
>
> Can you explain what is biased? The bias from the rng as a quality of
> implementation, or the method of iterating over the array?
>

The first element should have a 1/N chance of remaining in place.
Initially it does, you swap it with a random element, which may
include itself.
However you then subject it to further potential swaps. However
there's no way of it getting back to the first position if it is
swapped by one of these swaps. (I think, as I work it out in my head,
if there is a way it's vanishingly improbable). So its chance of
remaining in first place is less than 1/N. Quite substantially less,
in fact, the chance the the first element will be chosen in a random
swap is quite high. p = 1 - ((N-1)/N)^N, which works out at about 0.63
if N is 100.

Ben Bacarisse

unread,

Mar 1, 2010, 1:07:10 PM3/1/10

to

No. The bug is the last one reported: that an attempt to match in an
empty string ends up by having the code use an indeterminate pointer.

<snip>
--
Ben.

ImpalerCore

unread,

Mar 1, 2010, 1:11:40 PM3/1/10

to

On Mar 1, 11:24 am, Malcolm McLean <malcolm.mcle...@btinternet.com>

I didn't notice the bug from just observation of the code.

rand() - number between 0 and RAND_MAX
RAND_MAX + 1.0 - convert to floating point
x = rand() / (RAND_MAX + 1.0) - should be floating point range between
0 <= x < 1

if i == 0
target = (x) * N, range 0 <= target < N

if i == N-2
target = (x) * 2 + N-2, range N-2 <= target < N

It seems ok, so I'm obviously missing the error.

However, running some test samples seemed to show that the target for
i = 0 never changed with different seeds. I'm not familiar enough
with rand to know why that is. Or maybe I missed the bug.

> Quicksort needs a pivot. The most convenient pivot is the first item
> in the array. The problem is that if you choose it, and the data is
> sorted or almost sorted, the tree always has one brach with a single
> cell. Thus stack overflow is likely.
> The generally accepted way to get round the problem is by choosing
> three random candidates for pivot, then taking the middle one.

I used the "randomizer" when I needed to merge two sorted arrays of
varying sizes into a single sorted array, and the pivot I chose in my
quicksort is the first element in the partition. I'll look into
modifying the algorithm to use a random pivot.

Thanks,
John D.

Keith Thompson

unread,

Mar 1, 2010, 1:13:01 PM3/1/10

to

Eric Sosman <eso...@ieee-dot-org.invalid> writes:
[...]

> Read Bentley and McIlroy's "Engineering a Sort Function."
> (Note that their final version of qsort() sometimes fails to
> satisfy 7.20.5p2. I don't see the reason for 7.20.5p2, and the
> Rationale is unenlightening, but it's there nonetheless.)

C99 7.20.5p2 says:

The implementation shall ensure that the second argument of
the comparison function (when called from bsearch), or both
arguments (when called from qsort), are pointers to elements
of the array. The first argument when called from bsearch
shall equal key.

That seems like a reasonable and necessary requirement. What am I
missing?

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Eric Sosman

unread,

Mar 1, 2010, 1:15:49 PM3/1/10

to

On 3/1/2010 11:50 AM, ImpalerCore wrote:
> On Mar 1, 10:53 am, Eric Sosman<esos...@ieee-dot-org.invalid> wrote:
>> On 3/1/2010 10:33 AM, ImpalerCore wrote:
>>> [...]

>>> for each i in array
>>> create random index k between 0 and array size
>>> swap array[i] with array[k]
>>
>> Note that this scrambling algorithm is "random," but biased:
>> Some rearrangements are more likely than others.
>
> Can you explain what is biased? The bias from the rng as a quality of
> implementation, or the method of iterating over the array?

(Drifting away from C here; maybe comp.programming would
be a better forum. Follow-ups set.)

The latter. There could be additional problems with the
generator, but that's another matter: The shuffle as given is
biased even with a perfectly uniform generator.

Consider the case of an array holding the three elements
A,B,C. You'll call the generator three times, getting one of
0,1,2 each time. Three outcomes for the first call, three for
the second, three for the third, 3**3 = 27 possible shuffles
in all, and if the generator is good they'll be equiprobable.

Each of the 27 shuffles leads to some permutation of A,B,C,
and there are 3! = 6 such permutations. But 27 is not divisible
by 6, so it is not possible to have every permutation produced
by the same number of shuffles. The best you might hope for would
be three permutations produced by four shuffles each and the other
three by five shuffles, so if the shuffles are equiprobable
three permutations will have probability 4/27 ~= 14.8% and the
others will have probability 5/27 ~= 18.5%.

More generally, the method is biased whenever N**N is not
divisible by N! -- that is, whenever N > 2. Fortunately, a very
simple modification to your algorithm rescues the situation; you
may find it instructive to figure it out for yourself. (Hint:
The only change required is in the second line.)

--
Eric Sosman
eso...@ieee-dot-org.invalid

Eric Sosman

unread,

Mar 1, 2010, 2:25:10 PM3/1/10

to

On 3/1/2010 1:13 PM, Keith Thompson wrote:
> Eric Sosman<eso...@ieee-dot-org.invalid> writes:
> [...]
>> Read Bentley and McIlroy's "Engineering a Sort Function."
>> (Note that their final version of qsort() sometimes fails to
>> satisfy 7.20.5p2. I don't see the reason for 7.20.5p2, and the
>> Rationale is unenlightening, but it's there nonetheless.)
>
> C99 7.20.5p2 says:
>
> The implementation shall ensure that the second argument of
> the comparison function (when called from bsearch), or both
> arguments (when called from qsort), are pointers to elements
> of the array. The first argument when called from bsearch
> shall equal key.
>
> That seems like a reasonable and necessary requirement. What am I
> missing?

For "convenient" element sizes, Bentley & McIlroy will
copy the pivot element to an outside-the-array temporary and
call the comparator with a pointer to the temporary and a
pointer to an in-the-array element. IIRC, this makes some of
their array rearrangements simpler.

As far as I can see, the only thing a comparator can do with
the 7.20.5p2 guarantee that it cannot do with B&Mc is calculate
the array indices of the two arguments, or use <,<=,>=,> on the
pointers, or equivalent. But I can't imagine a comparator that
could do so usefully, given that the array may get rearranged both
before and after each comparator call. So I don't understand why
7.20.5p2 is a requirement. For bsearch(), yes, just possibly, but
for qsort()? I don't get it.

--
Eric Sosman
eso...@ieee-dot-org.invalid

Eric Sosman

unread,

Mar 1, 2010, 2:33:08 PM3/1/10

to

On 3/1/2010 1:11 PM, ImpalerCore wrote:
> On Mar 1, 11:24 am, Malcolm McLean<malcolm.mcle...@btinternet.com>
> wrote:
>> On Mar 1, 5:33 pm, ImpalerCore<jadil...@gmail.com> wrote:
>>
>>> On Mar 1, 8:51 am, Malcolm McLean<malcolm.mcle...@btinternet.com>
>>> wrote:
>>
>>> On a side note, if you know that qsort has issues with getting fed
>>> almost sorted data, can't you just do a O(n) random permutation step
>>> on the data to mitigate those problems?
>>
>>> for each i in array
>>> create random index k between 0 and array size
>>> swap array[i] with array[k]
>>
>>> I have a recursive (possibly naive) implementation of quicksort that
>>> I've been using, but I haven't exercised it on data sets large enough
>>> to generate stack overflows (yet). However, I do run into the use
>>> case where quicksort is passed almost sorted data. I'm just wondering
>>> if this is a viable (if hackish) method to encourage better average
>>> performance for almost sorted data sets.
>>
>> to write shuffle (untested)
>>
>> void shuffle(void *data, size_t N, size_t width)
>> {
>> size_t i;
>> size_t target;
>>
>> for(i=0;i<N-1;i++)

>> [...]

>> Note deliberate bug in this code. Anyone spot it?

>> [...]

> I didn't notice the bug from just observation of the code.

Consider the case N==0. McLean is size_t-phobic, and
never passes up an opportunity to prove how bad size_t is by
finding ways to misuse it.

(There might be other bugs, too, but I stopped reading
after the first blatant gaffe.)

--
Eric Sosman
eso...@ieee-dot-org.invalid

Chris M. Thomasson

unread,

Mar 1, 2010, 3:10:22 PM3/1/10

to

"Ben Bacarisse" <ben.u...@bsb.me.uk> wrote in message

news:0.c51fba21fc3b7f543f6d.2010...@bsb.me.uk...

Ahhh! Well, the piece of shi% test does not create empty source strings, so
no wonder why it failed to catch it. Stupid me!!!!!

DAMN IT!!! GRRR!

Okay... I will fix the random string generation so that it can create zero
length strings and post it here when I get some time.

BTW, thanks a lot Ben! I appreciate all of your input.

Ben Bacarisse

unread,

Mar 1, 2010, 3:10:56 PM3/1/10

to

ImpalerCore <jadi...@gmail.com> writes:

You can't be sure of this. If RAND_MAX is very large the calculation
can overcome the precision of the floating point system and you can
get x == 1.0. All you need is a 64 bit int and a conventional double.

> if i == 0
> target = (x) * N, range 0 <= target < N
>
> if i == N-2
> target = (x) * 2 + N-2, range N-2 <= target < N
>
> It seems ok, so I'm obviously missing the error.

The other two bugs (if they can be called that) are (a) the bias
introduced by the random number calculation (small, but it might
matter) and (b) the fact that you can't shuffle no data (N == 0).

It's possible that none of these is the one planted, in which case
I've missed it too.

<snip>
--
Ben.

Chris M. Thomasson

unread,

Mar 1, 2010, 3:30:41 PM3/1/10

to

"Chris M. Thomasson" <n...@spam.invalid> wrote in message

news:OaVin.2425$NH1...@newsfe14.iad...

> "Ben Bacarisse" <ben.u...@bsb.me.uk> wrote in message
> news:0.c51fba21fc3b7f543f6d.2010...@bsb.me.uk...

[...]

>>> Are you referring to Edward's code? I don't know where the most recent
>>> version is. Could you be so kind as to point me toward the bug please?
>>> Humm, is it this one:
>>>
>>> http://groups.google.com/group/comp.lang.c/msg/e3677e5f0cb30c43
>>
>> No. The bug is the last one reported: that an attempt to match in an
>> empty string ends up by having the code use an indeterminate pointer.
>
> Ahhh! Well, the piece of shi% test does not create empty source strings,
> so no wonder why it failed to catch it. Stupid me!!!!!
>
> DAMN IT!!! GRRR!
>
>
> Okay... I will fix the random string generation so that it can create zero
> length strings and post it here when I get some time.
>
>
> BTW, thanks a lot Ben! I appreciate all of your input.

Here is the updated test:

http://clc.pastebin.com/tqdPpa8w

Now, this makes Edwards code seg-fault. Also, it reports an error in your
algorithm on the following input:

ben_replace("", "", "xxx");

It spits out a zero-length string. This fails because my testing code
expects it to result in a string equal to "xxx". Did you design it that way
on purpose? My reasoning is that if you have an empty source and empty
comparand, then you got a match.

Any thoughts?

Seebs

unread,

Mar 1, 2010, 3:31:28 PM3/1/10

to

On 2010-03-01, Chris M. Thomasson <n...@spam.invalid> wrote:
> Any thoughts?

Mine returns a null pointer for an empty needle, because there's no obvious
reason it should be "xxx" rather than "xxxxxx", because immediately after
the first empty string, there's a second. And come to think of it, a third
right after that. And a fourth...

So IMHO, an empty needle should be rejected as invalid. I don't think it
makes sense to special case an empty source string, because it isn't any
emptier than the space between a and b in "ab".

(And I don't think we want to go to "but the empty string is identified by
the NUL at the end", because that's still a special case -- in other cases,
we don't look for or match the NUL.)

-s
--
Copyright 2010, all wrongs reversed. Peter Seebach / usenet...@seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!

Chris M. Thomasson

unread,

Mar 1, 2010, 3:51:23 PM3/1/10

to

"Seebs" <usenet...@seebs.net> wrote in message
news:slrnhoo9ch.k8r...@guild.seebs.net...

> On 2010-03-01, Chris M. Thomasson <n...@spam.invalid> wrote:
>> Any thoughts?
>
> Mine returns a null pointer for an empty needle, because there's no
> obvious
> reason it should be "xxx" rather than "xxxxxx", because immediately after
> the first empty string, there's a second. And come to think of it, a
> third
> right after that. And a fourth...
>
> So IMHO, an empty needle should be rejected as invalid. I don't think it
> makes sense to special case an empty source string, because it isn't any
> emptier than the space between a and b in "ab".

I really need to think about this. One reason why I did it is because in
Microsoft Word, if I do a search for any character (e.g., the "^?" code) and
replace it with "xxx" on a completely empty document, the result is "xxx".
Then if I repeat the action the result is "xxxxxxxxxxxx". The first action
is identical to how my replace works... However, the second one differs
because my replace will treat a non-empty source and a empty comparand as a
non-match. This is inconsistent behavior.

Do you think it would be pointless/stupid to try and emulate the behavior of
MS Word?

Interesting... Humm...

Chris M. Thomasson

unread,

Mar 1, 2010, 3:54:19 PM3/1/10

to

"Chris M. Thomasson" <n...@spam.invalid> wrote in message
news:DNVin.14660$ND2....@newsfe05.iad...

> "Seebs" <usenet...@seebs.net> wrote in message
> news:slrnhoo9ch.k8r...@guild.seebs.net...
>> On 2010-03-01, Chris M. Thomasson <n...@spam.invalid> wrote:
>>> Any thoughts?
>>
>> Mine returns a null pointer for an empty needle, because there's no
>> obvious
>> reason it should be "xxx" rather than "xxxxxx", because immediately after
>> the first empty string, there's a second. And come to think of it, a
>> third
>> right after that. And a fourth...
>>
>> So IMHO, an empty needle should be rejected as invalid. I don't think it
>> makes sense to special case an empty source string, because it isn't any
>> emptier than the space between a and b in "ab".
>
> I really need to think about this. One reason why I did it is because in
> Microsoft Word, if I do a search for any character (e.g., the "^?" code)
> and replace it with "xxx" on a completely empty document, the result is
> "xxx". Then if I repeat the action the result is "xxxxxxxxxxxx". The first
> action is identical to how my replace works... However, the second one
> differs because my replace will treat a non-empty source and a empty
> comparand as a non-match. This is inconsistent behavior.
>
> Do you think it would be pointless/stupid to try and emulate the behavior
> of MS Word?
>
> Interesting... Humm...

Ahh fuc% it! I will not special case it. I will alter my replace algorithm
and my test in order to change it's expectations on empty comparands.

Nick

unread,

Mar 1, 2010, 3:57:12 PM3/1/10

to

"Chris M. Thomasson" <n...@spam.invalid> writes:

My view is that an empty match string should return the target string
unaltered, whatever it is. Or should be an error.

You can treat both empty as a special case I suppose, but my argument is
that replace("abc","","xx") should clearly return "abc" (or be an
error). So by extension, although replace("","","xx") could return
"xx", it ought to be "". Or an error - I'm coming rapidly to that
conclusion.
--
Online waterways route planner | http://canalplan.eu
Plan trips, see photos, check facilities | http://canalplan.org.uk

Keith Thompson

unread,

Mar 1, 2010, 3:59:38 PM3/1/10

to

Seebs <usenet...@seebs.net> writes:
> On 2010-03-01, Chris M. Thomasson <n...@spam.invalid> wrote:
>> Any thoughts?
>
> Mine returns a null pointer for an empty needle, because there's no obvious
> reason it should be "xxx" rather than "xxxxxx", because immediately after
> the first empty string, there's a second. And come to think of it, a third
> right after that. And a fourth...
>
> So IMHO, an empty needle should be rejected as invalid. I don't think it
> makes sense to special case an empty source string, because it isn't any
> emptier than the space between a and b in "ab".
>
> (And I don't think we want to go to "but the empty string is identified by
> the NUL at the end", because that's still a special case -- in other cases,
> we don't look for or match the NUL.)

It might make sense to look at existing practice.

In sed and vim, an empty needle isn't even supported; "s/foo/bar/"
replaces "foo" by "bar", but "s//bar/" replaces the previous search
pattern by "bar". This doesn't apply to the current problem, so
there's no help there.

In Perl, an empty needle is found once at each possible position.
So, for example, this:

$s = "hello";
$s =~ s//FOO/g;

sets $s to "FOOhFOOeFOOlFOOlFOOoFOO", which I suppose makes as much
sense as anything.

Seebs

unread,

Mar 1, 2010, 4:04:52 PM3/1/10

to

On 2010-03-01, Chris M. Thomasson <n...@spam.invalid> wrote:
> Do you think it would be pointless/stupid to try and emulate the behavior of
> MS Word?

So far as I know, yes.

Ben Bacarisse

unread,

Mar 1, 2010, 5:13:24 PM3/1/10

to

Nick <3-no...@temporary-address.org.uk> writes:
<snip>

> My view is that an empty match string should return the target string
> unaltered, whatever it is. Or should be an error.

I don't like the idea of signalling an error when the error is
trivially testable by the caller. In a language with exceptions, I'd
raise one but, in C, there is little to be gained by returning NULL
(say) for anything but a failure that can't be tested for in advance
(such as running out of memory).

The caller can always wrap the function to do what they want in any of
the cases that one might be tempted to signal as errors (NULL pointers
or empty strings).

--
Ben.

Ben Bacarisse

unread,

Mar 1, 2010, 5:27:56 PM3/1/10

to

Keith Thompson <ks...@mib.org> writes:

> Seebs <usenet...@seebs.net> writes:
<snip>

>> So IMHO, an empty needle should be rejected as invalid. I don't think it
>> makes sense to special case an empty source string, because it isn't any
>> emptier than the space between a and b in "ab".
>>
>> (And I don't think we want to go to "but the empty string is identified by
>> the NUL at the end", because that's still a special case -- in other cases,
>> we don't look for or match the NUL.)
>
> It might make sense to look at existing practice.
>
> In sed and vim, an empty needle isn't even supported; "s/foo/bar/"
> replaces "foo" by "bar", but "s//bar/" replaces the previous search
> pattern by "bar". This doesn't apply to the current problem, so
> there's no help there.
>
> In Perl, an empty needle is found once at each possible position.
> So, for example, this:
>
> $s = "hello";
> $s =~ s//FOO/g;
>
> sets $s to "FOOhFOOeFOOlFOOlFOOoFOO", which I suppose makes as much
> sense as anything.

In one way this makes more sense for an regular expression replace
(such a Perl's) than for a string replace like the one under
discussion. The reason is that an RE match is usually defined as
being the "longest" one. It is therefore reasonable that an empty RE
should match all of the infinite number of consecutive match positions
at each character. Conversely, a string replacer should try to
replace them all, one by one.

--
Ben.

Ike Naar

unread,

Mar 1, 2010, 6:02:21 PM3/1/10

to

In article <slrnhoo9ch.k8r...@guild.seebs.net>,

Seebs <usenet...@seebs.net> wrote:
>So IMHO, an empty needle should be rejected as invalid. I don't think it
>makes sense to special case an empty source string, because it isn't any
>emptier than the space between a and b in "ab".

Perhaps an empty needle need not be rejected if the replacement
is also empty; in that case the result string should be the same
as the source string (replacing an arbitrary number of ""-s by
""-s is a no-op).

Seebs

unread,

Mar 1, 2010, 6:29:34 PM3/1/10

to

True, but I think that feels like even more of a special case to me.

I think I'd rather just say "you have to have something in mind to
search for".

bartc

unread,

Mar 1, 2010, 6:42:14 PM3/1/10

to

"Seebs" <usenet...@seebs.net> wrote in message
news:slrnhoo9ch.k8r...@guild.seebs.net...

> On 2010-03-01, Chris M. Thomasson <n...@spam.invalid> wrote:
>> Any thoughts?
>
> Mine returns a null pointer for an empty needle, because there's no
> obvious
> reason it should be "xxx" rather than "xxxxxx", because immediately after
> the first empty string, there's a second. And come to think of it, a
> third
> right after that. And a fourth...

If the replacement algorithm starts with a quick check like this:

'if inputstring == matchstring then return replacementstring',

Then ("abc","abc","xyz) results in "xyz";

("ab","ab","xyz") results in "xyz";

("a","a","xyz") results in "xyz"; and you then expect:

("","","xyz") to also result in "xyz" (in other words, (S,S,U) always gives
U))

> So IMHO, an empty needle should be rejected as invalid. I don't think it
> makes sense to special case an empty source string, because it isn't any
> emptier than the space between a and b in "ab".

Searching for empty strings is meaningless, but comparing them isn't (imo).

--
Bartc

Ben Bacarisse

unread,

Mar 1, 2010, 7:05:18 PM3/1/10

to

i...@localhost.claranet.nl (Ike Naar) writes:

... but maybe it should take an unbounded amount of time to do it :-)

Smileys aside, good point.

--
Ben.

Nick Keighley

unread,

Mar 2, 2010, 3:05:44 AM3/2/10

to

On 1 Mar, 20:10, "Chris M. Thomasson" <n...@spam.invalid> wrote:
> "Ben Bacarisse" <ben.use...@bsb.me.uk> wrote in message

>
> news:0.c51fba21fc3b7f543f6d.2010...@bsb.me.uk...
>
>
>
>
>
> > "Chris M. Thomasson" <n...@spam.invalid> writes:
>

> >> "Ben Bacarisse" <ben.use...@bsb.me.uk> wrote in message
> >>news:0.cfe2184fc46570e7992a.2010...@bsb.me.uk...

> BTW, thanks a lot Ben! I appreciate all of your input.- Hide quoted text -

this rather confirms the misgivings about random testing...

Tim Rentsch

unread,

Mar 26, 2010, 7:34:02 AM3/26/10

to

Eric Sosman <eso...@ieee-dot-org.invalid> writes:

I suspect this provision is included to avoid any difficulties
regarding alignment. If other memory (besides array elements) may be
used, the alignment conditions for that memory have to be specified,
and that isn't so easy to do. Furthermore the implementation may not
have enough information available to align the "other memory"
properly. For example, suppose we have an implementation where the
maximum alignment is 8, but a program, using platform-specific code,
creates an array whose element alignments are 32 (to coincide with
cache lines or device registers or something). There is no way for
qsort to get suitably aligned "other memory", since there isn't any
way of knowing what that alignment is (and may be even more stringent
than the most restrictive alignment that the implementation itself
uses). The easiest and most obvious way to avoid all these problems
is to use just the original array element objects when calling the
user-supplied comparison function.