trim whitespace, bullet proof version

John Kelly

Aug 21, 2010, 1:38:51 AM

/*

Define author
John Kelly, August 20, 2010

Define copyright
Copyright John Kelly, 2010. All rights reserved.

Define license
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this work except in compliance with the License.
You may obtain a copy of the License at:
http://www.apache.org/licenses/LICENSE-2.0

Define symbols and (words)
exam ......... temporary char *
hast ......... temporary char * to alpha !isspace
keep ......... temporary char * to omega !isspace
ts ........... temporary string
tz ........... temporary ssize_t

*/

# include <ctype.h>
# include <errno.h>
# include <limits.h>
# include <stdio.h>
# include <string.h>

static int
trim (char **ts)
{
    ssize_t tz;
    unsigned char *exam;
    unsigned char *hast;
    unsigned char *keep;

    if (!ts || !*ts) {
        errno = EINVAL;
        return -1;
    }
    tz = 0;
    exam = (unsigned char *) *ts;
    while (++tz < SSIZE_MAX && isspace (*exam)) {
        ++exam;
    }
    if (tz == SSIZE_MAX) {
        errno = EOVERFLOW;
        return -1;
    }
    tz = 0;
    hast = keep = exam;
    while (++tz < SSIZE_MAX && *exam) {
        if (!isspace (*exam)) {
            keep = exam;
        }
        ++exam;
    }
    if (tz == SSIZE_MAX) {
        errno = EOVERFLOW;
        return -1;
    }
    if (*keep) {
        *++keep = '\0';
    }
    if (keep - hast < 0) {
        errno = EOVERFLOW;
        return -1;
    }
    tz = keep - hast;
    if (hast != (unsigned char *) *ts) {
        (void) memmove (*ts, hast, tz + 1);
    }
    return tz;
}

void
testme (char *ts)
{
    int tn;
    tn = trim (&ts);
    printf ("%3d characters in string=[%s]\n", tn, ts);
}

int
main (void)
{
    char s1[] = " abc";
    char s2[] = "def ";
    char s3[] = " ghi ";
    char s4[] = " ";
    char s5[] = "";
    char s6[] = " \n such a \n\t\t\t\t funny thing ";

    printf ("trim whitespace\n");

    testme (NULL);

    testme (&s1[1]);
    printf ("string=[%s]\n", s1);

    testme (s2);
    testme (s3);
    testme (s4);
    testme (s5);
    testme (s6);

    testme (s1);
    testme (s2);
    testme (s3);
    testme (s4);
    testme (s5);
    testme (s6);

    return 0;
}

Shao Miller

Aug 21, 2010, 1:44:59 AM

John Kelly wrote:
> trim whitespace, bullet proof version

Please forgive me, but: Where does 'ssize_t' come from?

John Kelly

Aug 21, 2010, 1:57:21 AM

On Sat, 21 Aug 2010 01:44:59 -0400, Shao Miller <sha0....@gmail.com>
wrote:

>John Kelly wrote:
>> trim whitespace, bullet proof version
>Please forgive me, but: Where does 'ssize_t' come from?

<sys/types.h>

Which is included by one of my listed headers, at least on my test box.

John Kelly

Aug 21, 2010, 2:08:52 AM

On Sat, 21 Aug 2010 05:38:51 +0000, John Kelly <j...@isp2dial.com> wrote:

>trim whitespace, bullet proof version

Get rid of unnecessary test:

--- xc.c Sat Aug 21 07:03:25 2010
+++ xc.c Sat Aug 21 07:06:59 2010
@@ -64,10 +64,6 @@

if (*keep) {
*++keep = '\0';
}

- if (keep - hast < 0 || keep - hast == SSIZE_MAX) {
- errno = EOVERFLOW;
- return -1;
- }

tz = keep - hast;
if (hast != (unsigned char *) *ts) {
(void) memmove (*ts, hast, tz + 1);

Shao Miller

Aug 21, 2010, 2:11:17 AM

John Kelly wrote:
> On Sat, 21 Aug 2010 01:44:59 -0400, Shao Miller <sha0....@gmail.com>
> wrote:
>
>> John Kelly wrote:
>>> trim whitespace, bullet proof version
>> Please forgive me, but: Where does 'ssize_t' come from?
>
> <sys/types.h>
>
> Which is included by one of my listed headers, at least on my test box.

Thanks.

By the way, what about looping for multiple 'memmove's, each one with an
upper bound of 'SIZE_MAX'? That is, loop through everything with a
'SIZE_MAX' upper bound, but as many times as needed?

Should 'trim' return a 'ssize_t'?

Are you using 'unsigned char *' for a particular reason?

That's a neat trick for a wrap-around pointer arithmetic check.

John Kelly

Aug 21, 2010, 2:13:33 AM

On Sat, 21 Aug 2010 06:08:52 +0000, John Kelly <j...@isp2dial.com> wrote:

>Get rid of unnecessary test:
>
>
>--- xc.c Sat Aug 21 07:03:25 2010
>+++ xc.c Sat Aug 21 07:06:59 2010
>@@ -64,10 +64,6 @@
> if (*keep) {
> *++keep = '\0';
> }
>- if (keep - hast < 0 || keep - hast == SSIZE_MAX) {
>- errno = EOVERFLOW;
>- return -1;
>- }
> tz = keep - hast;
> if (hast != (unsigned char *) *ts) {
> (void) memmove (*ts, hast, tz + 1);

That patch is against the wrong version. It's getting late. I think
this is the right one. Just kill the test of keep - hast < 0. That's
impossible, I see.

--- xc.c Sat Aug 21 07:03:25 2010
+++ xc.c Sat Aug 21 07:06:59 2010
@@ -64,10 +64,6 @@
if (*keep) {
*++keep = '\0';
}

- if (keep - hast < 0) {

John Kelly

Aug 21, 2010, 2:27:39 AM

On Sat, 21 Aug 2010 02:11:17 -0400, Shao Miller <sha0....@gmail.com>
wrote:

>By the way, what about looping for multiple 'memmove's, each one with an

>upper bound of 'SIZE_MAX'? That is, loop through everything with a
>'SIZE_MAX' upper bound, but as many times as needed?

Maybe. I just wanted to avoid calling memmove with a negative length,
so I used SSIZE_MAX to limit how far I scan, from hast to keep.

It's open source. If you want to hack on it, go for it. Seems to me
like the brainiacs in this place ought to get organized, and develop a
canonical function library. But they better use my favorite license!

>Should 'trim' return a 'ssize_t'?

ssize_t is a typedef for signed int. You think an explicit cast is
better?

>Are you using 'unsigned char *' for a particular reason?

I wrote that part so long ago, I don't remember why.
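
One likely reason, offered as an aside rather than as the author's stated intent: the <ctype.h> functions such as isspace() are only defined for arguments representable as unsigned char (or equal to EOF), so a plain char that happens to be negative has to be converted first. Walking the string through an unsigned char * does that conversion implicitly. A minimal sketch, with is_space_char as a made-up helper name:

```c
#include <ctype.h>

/* Hypothetical helper (not part of the posted code): classify a plain
   char safely. isspace() needs a value representable as unsigned char
   or EOF, so the possibly negative char is converted first. */
static int
is_space_char (char c)
{
    return isspace ((unsigned char) c) != 0;
}
```

Reading the bytes through unsigned char *, as the posted trim() does, has the same effect without a cast at every call.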

>That's a neat trick for a wrap-around pointer arithmetic check.

What is?

Time for sleep ...

Eric Sosman

Aug 21, 2010, 8:26:23 AM

On 8/21/2010 1:38 AM, John Kelly wrote:
> [...]

> Define author
> John Kelly, August 20, 2010
>
> Define copyright
> Copyright John Kelly, 2010. All rights reserved.
>
> Define license
> Licensed under the Apache License, Version 2.0 (the "License");
> you may not use this work except in compliance with the License.
> You may obtain a copy of the License at:
> http://www.apache.org/licenses/LICENSE-2.0

> [...]

Are you the same "John Kelly" who contributed

> On piratebay you can find many ebooks to download. More than you can
> shake a stick at.

... to the "Where next?" thread? If so, I spit on your copyright,
on your hoity-toity license, and on your double standard.

--
Eric Sosman
eso...@ieee-dot-org.invalid

Ben Bacarisse

Aug 21, 2010, 8:41:39 AM

John Kelly <j...@isp2dial.com> writes:

> trim whitespace, bullet proof version

Since you can't dodge all the bullets, it would be more helpful to say what
ones you are proofing against. I.e. what is the contract between the
caller and called program? What are you trying to guard against?

> /*
<snip>

> Define symbols and (words)
> exam ......... temporary char *
> hast ......... temporary char * to alpha !isspace
> keep ......... temporary char * to omega !isspace
> ts ........... temporary string
> tz ........... temporary ssize_t

I.e. rather than describe the internal temporary objects, what do you
assume about the environment and about the pointer that is passed to
generate what guarantees from the function.

> */
>
> # include <ctype.h>
> # include <errno.h>
> # include <limits.h>
> # include <stdio.h>
> # include <string.h>
>
> static int
> trim (char **ts)
> {
> ssize_t tz;

I think ptrdiff_t is a better choice for this. If it is not a big
enough type you are sunk anyway (since keep - hast has this type no
matter what you assign the result to) and if it is big enough you make
the code use only C types.

<snip>
--
Ben.

John Kelly

Aug 21, 2010, 11:59:43 AM

On Sat, 21 Aug 2010 13:41:39 +0100, Ben Bacarisse <ben.u...@bsb.me.uk>
wrote:

>Since you can't dodge all the bullets, it would be more helpful to what
>ones you are proofing against. I.e. what is the contract between the
>caller and called program? What are you trying to guard against?

The instances of setting errno and returning -1 are the contract terms.
You could add comments about that, but code and comments always get out
of sync. Documenting the errors would be good, but I think a man page
is a better place to list them. Any volunteers?

>> /*
><snip>
>> Define symbols and (words)
>> exam ......... temporary char *
>> hast ......... temporary char * to alpha !isspace
>> keep ......... temporary char * to omega !isspace
>> ts ........... temporary string
>> tz ........... temporary ssize_t
>
>I.e. rather than describe the internal temporary objects, what do you
>assume about the environment and about the pointer that is passed to
>generate what guarantees from the function.

I prefer few comments. Just enough to hint at what's going on. Code
that's well thought out should have a natural interface needing little
explanation. If there are gotchas in the code or interface, the code
should be refined until they're gone.

The purpose of the function is to trim leading and trailing whitespace
from any string found at the pointer location, whether data or garbage.
That could be spelled out, but the less said, the better.

>> static int
>> trim (char **ts)
>> {
>> ssize_t tz;
>
>I think ptrdiff_t is a better choice for this.

ssize_t tz counts the iterations looking for \0. SSIZE_MAX limits the
iterations looking for \0. To be sure of calling memmove with a positive
number, the chosen limit must be such that (keep - hast) will always be
positive.

>If it is not a big
>enough type you are sunk anyway (since keep - hast has this type no
>matter what you assign the result to) and if it is big enough you make
>the code use only C types.

From what I understood of the ptrdiff discussion, ssize_t and SSIZE_MAX
satisfy the requirement for the memmove length to be positive. Or did
I misunderstand?

John Kelly

Aug 21, 2010, 12:36:35 PM

On Sat, 21 Aug 2010 08:26:23 -0400, Eric Sosman
<eso...@ieee-dot-org.invalid> wrote:

> Are you the same "John Kelly" who contributed
>
> > On piratebay you can find many ebooks to download. More than you can
> > shake a stick at.
>
>... to the "Where next?" thread? If so, I spit on your copyright,
>on your hoity-toity license, and on your double standard.

Yeah those cargo cult newbies are the devil aren't they.

:-D

James Waldby

Aug 21, 2010, 2:02:35 PM

On Sat, 21 Aug 2010 15:59:43 +0000, John Kelly wrote:
> On Sat, 21 Aug 2010 13:41:39 +0100, Ben Bacarisse wrote:
...

>> John Kelly wrote:
>>> Define symbols and (words)
>>> exam ......... temporary char *
>>> hast ......... temporary char * to alpha !isspace

...
>> ... rather than describe the internal temporary objects, [tell] what

>> you assume about the environment and about the pointer that is passed
>> to generate what guarantees from the function.
>
> I prefer few comments. Just enough to hint at what's going on. Code
> that's well thought out should have a natural interface needing little
> explanation. If there are gotchas in the code or interface, the code
> should be refined until they're gone.
>
> The purpose of the function is to trim leading and trailing whitespace
> from any string found at the pointer location, whether data or garbage.
> That could be spelled out, but the less said, the better.

...

One purpose of comments is to let maintainers know what the code is
supposed to be doing. The test cases that you wrote don't invoke
the errors that your code attempts to detect. Your comments ought
perhaps to say what you are trying to detect, in case the code needs
fixing in the future after it reports errors.

It's a good idea to say, at the beginning of a function, what the
intent of that function is. For example, you could add the following
at front:
/* Function static int trim(char **ts) trims leading and trailing
whitespace from garbage or whatever at location *ts. It returns the
trimmed string length if no error was detected, else -1 with errno set
to EOVERFLOW or EINVAL.
*/

Since your new version doesn't modify *ts one could use parameter
list (char *ts) rather than (char **ts) [or could use (char const *ts)
if you add a (char *)ts cast in memmove call].
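
A rough sketch of what the (char *ts) form might look like, with the intent stated up front. The name trim_inplace is invented for illustration, EOVERFLOW is a POSIX errno value rather than standard C, and the sketch assumes the string is short enough that the pointer difference fits in ptrdiff_t:

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <string.h>

/* trim_inplace: hypothetical variant of trim() taking char * directly.
   Removes leading and trailing whitespace from the string at ts, in place.
   Returns the trimmed length, or -1 with errno set to EINVAL (ts is null)
   or EOVERFLOW (trimmed length does not fit in an int). */
static int
trim_inplace (char *ts)
{
    unsigned char *hast;   /* first non-space character */
    unsigned char *keep;   /* one past the last non-space character */
    unsigned char *exam;   /* scanning cursor */
    size_t len;

    if (!ts) {
        errno = EINVAL;
        return -1;
    }
    hast = (unsigned char *) ts;
    while (isspace (*hast)) {
        ++hast;
    }
    keep = hast;
    for (exam = hast; *exam; ++exam) {
        if (!isspace (*exam)) {
            keep = exam + 1;
        }
    }
    len = (size_t) (keep - hast);
    if (len > INT_MAX) {
        errno = EOVERFLOW;
        return -1;
    }
    memmove (ts, hast, len);
    ts[len] = '\0';
    return (int) len;
}
```

A caller would write something like n = trim_inplace (buf); and check for n < 0 before using buf.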

--
jiw

John Kelly

Aug 21, 2010, 2:12:22 PM

On Sat, 21 Aug 2010 15:59:43 +0000, John Kelly <j...@isp2dial.com> wrote:

>From what I understood of the ptrdiff discussion, ssize_t and SSIZE_MAX
>satisfy the requirement for the memmove length to be positive . Or did
>I misunderstand?

Uh-oh. That assumes SSIZE_MAX will always be <= PTRDIFF_MAX / 2.

Normally it will be, but I should check to be sure.

The maximum value (>= 0) that memmove can accept, is what defines the
limit for the number of iterations looking for \0.

One problem is, my test platform (Interix) does not have PTRDIFF_MAX or
SIZE_MAX. Only SSIZE_MAX. One other platform I looked at, linux, also
has SSIZE_MAX, so that's what I went with.

I suppose you could have a macro to test what's available and use the
largest that satisfies <= PTRDIFF_MAX / 2.

Or should I say <= ((PTRDIFF_MAX / 2) - 1)

Or to be really sure how about <= ((PTRDIFF_MAX - 1) / 2)

Right now I can't think hard enough to know which is correct to avoid an
off-by-one error.

John Kelly

Aug 21, 2010, 2:27:25 PM

On Sat, 21 Aug 2010 18:02:35 +0000 (UTC), James Waldby <n...@no.no> wrote:

>One purpose of comments is to let maintainers know what the code is
>supposed to be doing. The test cases that you wrote don't invoke
>the errors that your code attempts to detect.

Testing for overflow was too much work. And it will probably crash
Interix. Any volunteers?

>Your comments ought
>perhaps to say what you are trying to detect, in case the code needs
>fixing in the future after it reports errors.

>It's a good idea to say, at the beginning of a function, what the
>intent of that function is. For example, you could add the following
>at front:
>/* Function static int trim(char **ts) trims leading and trailing
> whitespace from garbage or whatever at location *ts. It returns the
> trimmed string length if no error was detected, else EOVERFLOW or EINVAL.
>*/

OK.

>Since your new version doesn't modify *ts one could use parameter
>list (char *ts) rather than (char **ts) [or could use (char const *ts)
>if you add a (char *)ts cast in memmove call].

I didn't think of that. I'll review it again, for improvements. When I
posted trim(), I didn't know how much work I was getting into. :-/

Keith Thompson

Aug 21, 2010, 2:33:44 PM

John Kelly <j...@isp2dial.com> writes:
[...]

> Testing for overflow was too much work. And it will probably crash
> Interix. Any volunteers?

[...]

Wasn't testing for overflow the main point? Do you still believe
it's possible?

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Seebs

Aug 21, 2010, 2:50:02 PM

On 2010-08-21, Keith Thompson <ks...@mib.org> wrote:
> John Kelly <j...@isp2dial.com> writes:
> [...]
>> Testing for overflow was too much work. And it will probably crash
>> Interix. Any volunteers?
> [...]

> Wasn't testing for overflow the main point? Do you still believe
> it's possible?

Haven't you noticed? He's going to do things that are amazing and
prove the naysayers wrong, which makes him feel important. If he can't
achieve devastating success in a few days, he'll give an excuse for
why it wasn't important, which doesn't make him feel bad because he
explained why it doesn't matter.

This allows him to get the thrill of being a revolutionary without any
actual point at which he has to think about what people say or do any
hard work.

Read up on pathological narcissism. It's not impossible to derive value
from responding to his posts, but you have to do so with a clear
understanding of what he's doing and why or it will be very frustrating.

-s
--
Copyright 2010, all wrongs reversed. Peter Seebach / usenet...@seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!

John Kelly

Aug 21, 2010, 2:51:11 PM

On Sat, 21 Aug 2010 11:33:44 -0700, Keith Thompson <ks...@mib.org>
wrote:

>John Kelly <j...@isp2dial.com> writes:
>[...]
>> Testing for overflow was too much work. And it will probably crash
>> Interix. Any volunteers?
>[...]
>
>Wasn't testing for overflow the main point?

We have different ideas of what overflow means in this context. Have
you tested the code, or are you having fun pontificating?

>Do you still believe it's possible?

What? Overflow, or testing for it?

Keith Thompson

Aug 21, 2010, 3:18:07 PM

John Kelly <j...@isp2dial.com> writes:
> On Sat, 21 Aug 2010 11:33:44 -0700, Keith Thompson <ks...@mib.org>
> wrote:
>>John Kelly <j...@isp2dial.com> writes:
>>[...]
>>> Testing for overflow was too much work. And it will probably crash
>>> Interix. Any volunteers?
>>[...]
>>
>>Wasn't testing for overflow the main point?
>
> We have different ideas of what overflow means in this context.

So what's your idea of what overflow means in this context?

> Have
> you tested the code, or are you having fun pontificating?

I haven't tested the code. I'm not sure exactly what it's supposed to
do. Provide some documentation, and I might try it.

In any case, testing isn't necessarily useful in the presence of
undefined behavior.

>>Do you still believe it's possible?
>
> What? Overflow, or testing for it?

Testing for it.

Your stated goal was to create a "bullet proof" version of your
trim() function. Now you say testing for "overflow" is too much
work.

Just what error conditions do you intend to handle? What error
conditions, if any, can you not handle?

My position is this: It is not possible, given your specification
for trim(), to implement it in a way that avoids undefined behavior
unless you're able to control all possible callers. You can define
the circumstances in which its behavior is undefined, and define
the behavior in cases where it is defined. You cannot entirely
eliminate undefined behavior. Do you believe that you can? If not,
what *exactly* does "bullet proof" mean?
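
One way to make that contract explicit, sketched here as an illustration and not as anything proposed in the thread, is to have the caller state how many bytes may be read. The name trim_bounded and its parameters are assumptions. Note that this only moves the problem: if the caller passes a wrong capacity, the behavior is still undefined.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bounded variant: the caller states how many bytes of buf
   may be read. Returns 0 and stores the trimmed length in *len, or -1 if
   an argument is null or no terminator lies within the first cap bytes. */
static int
trim_bounded (char *buf, size_t cap, size_t *len)
{
    char *end;
    char *first;

    if (!buf || !len) {
        return -1;
    }
    end = memchr (buf, '\0', cap);   /* never reads past buf[cap - 1] */
    if (!end) {
        return -1;
    }
    first = buf;
    while (isspace ((unsigned char) *first)) {
        ++first;                     /* stops at the terminator at the latest */
    }
    while (end > first && isspace ((unsigned char) end[-1])) {
        --end;
    }
    *len = (size_t) (end - first);
    memmove (buf, first, *len);
    buf[*len] = '\0';
    return 0;
}
```

With an array, the call would be trim_bounded (s, sizeof s, &n); the documented precondition is simply that cap does not exceed the size of the object buf points to.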

John Kelly

Aug 21, 2010, 4:01:29 PM

On Sat, 21 Aug 2010 12:18:07 -0700, Keith Thompson <ks...@mib.org>
wrote:

>>>Do you still believe it's possible?

>>
>> What? Overflow, or testing for it?
>
>Testing for it.
>
>Your stated goal was to create a "bullet proof" version of your
>trim() function. Now you say testing for "overflow" is too much
>work.
>
>Just what error conditions do you intend to handle? What error
>conditions, if any, can you not handle?
>
>My position is this: It is not possible, given your specification
>for trim(), to implement it in a way that avoids undefined behavior
>unless you're able to control all possible callers. You can define
>the circumstances in which its behavior is undefined, and define
>the behavior in cases where it is defined. You cannot entirely
>eliminate undefined behavior. Do you believe that you can? If not,
>what *exactly* does "bullet proof" mean?

I appreciate your concern and welcome what help I can get. Right now
I'm focusing on one specific item.

void *memmove(void *dest, const void *src, size_t n);

It expects size_t for the length. It appears universally true that

typedef unsigned int size_t

Thus to me it makes no sense to call memmove with a negative value. Am
I wrong?
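
For reference, size_t is some unsigned integer type; which one is up to the implementation, and it is not necessarily unsigned int. A negative value passed where a size_t is expected is converted, not rejected, and ends up as a very large count. A tiny sketch of that conversion, with as_size as a made-up helper name:

```c
#include <stddef.h>

/* Hypothetical helper: shows what a signed length turns into when it is
   passed where a size_t is expected, as with memmove's third parameter. */
static size_t
as_size (long n)
{
    return (size_t) n;   /* a negative n wraps modulo (SIZE_MAX + 1) */
}
```

So as_size (-1) yields the largest size_t value, which is why a negative length must be caught before the call rather than inside memmove.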

Ian Collins

Aug 21, 2010, 5:03:27 PM

On 08/21/10 05:57 PM, John Kelly wrote:
> On Sat, 21 Aug 2010 01:44:59 -0400, Shao Miller<sha0....@gmail.com>
> wrote:
>
>> John Kelly wrote:
>>> trim whitespace, bullet proof version
>> Please forgive me, but: Where does 'ssize_t' come from?
>
> <sys/types.h>
>
> Which is included by one of my listed headers, at least on my test box.

ssize_t and SSIZE_MAX aren't standard C, they are POSIX extensions.

--
Ian Collins

John Kelly

Aug 21, 2010, 5:12:54 PM

On Sun, 22 Aug 2010 09:03:27 +1200, Ian Collins <ian-...@hotmail.com>
wrote:

OK.

I guess that means Windows would be a problem. What else?

Ben Bacarisse

Aug 21, 2010, 5:20:58 PM

John Kelly <j...@isp2dial.com> writes:

> On Sat, 21 Aug 2010 13:41:39 +0100, Ben Bacarisse <ben.u...@bsb.me.uk>
> wrote:
>
>>Since you can't dodge all the bullets, it would be more helpful to what
>>ones you are proofing against. I.e. what is the contract between the
>>caller and called program? What are you trying to guard against?
>
> The instances of setting errno and returning -1 are the contract
> terms.

Ah, your code is correct by definition! I was asking about what you
thought you were protecting against so I could tell if you had met that
goal.

<snip>

>>> static int
>>> trim (char **ts)
>>> {
>>> ssize_t tz;
>>
>>I think ptrdiff_t is a better choice for this.
>
> ssize_t tz counts the iterations looking for \0. SSIZE_MAX limits the
> iterations looking \0. To be sure of calling memmove with a positive
> number, the chosen limit must be such that (keep - hast) will always be
> positive.

That's beside the point.

>>If it is not a big
>>enough type you are sunk anyway (since keep - hast has this type no
>>matter what you assign the result to) and if it is big enough you make
>>the code use only C types.
>
> From what I understood of the ptrdiff discussion, ssize_t and SSIZE_MAX
> satisfy the requirement for the memmove length to be positive . Or did
> I misunderstand?

It seems so. Your code has:

tz = keep - hast;

Pointer subtraction produces a value of type ptrdiff_t. You can't alter
that fact. If ssize_t has a narrower range than ptrdiff_t you have
introduced a case that can fail by using ssize_t. If it has a larger
range you will be able to generate pointers where the subtraction goes
wrong. There is no advantage at all in using ssize_t, and it has the
disadvantage that your code becomes dependent on a type that is not a
standard C type.
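
To illustrate the point using only standard C types, here is a small sketch that keeps the difference in ptrdiff_t and range-checks it before narrowing. The helper name span_as_int is invented, and both pointers are assumed to point into the same array:

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: distance between two pointers into the same array,
   range-checked before it is narrowed to int.
   Returns 0 on success, -1 if the span is negative or exceeds INT_MAX. */
static int
span_as_int (const char *first, const char *last, int *out)
{
    ptrdiff_t d = last - first;   /* the subtraction itself has type ptrdiff_t */

    if (d < 0 || d > INT_MAX) {
        return -1;
    }
    *out = (int) d;
    return 0;
}
```

The check cannot rescue a subtraction whose true result does not fit in ptrdiff_t; that case is already undefined before the comparison runs.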

--
Ben.

Ian Collins

Aug 21, 2010, 5:21:58 PM

On 08/21/10 05:38 PM, John Kelly wrote:

snip

>
> void
> testme (char *ts)
> {
> int tn;
> tn = trim (&ts);
> printf ("%3d characters in string=[%s]\n", tn, ts);
> }
>

>
> testme (NULL);

Will cause a crash in printf.
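
Passing a null pointer for %s is undefined behavior in standard C; some libraries print "(null)", others crash. A minimal sketch of a guarded print, where print_result is a hypothetical helper and not part of the posted code:

```c
#include <stdio.h>

/* Hypothetical replacement for the printing step in testme():
   never hands a null pointer to %s. Returns printf's result. */
static int
print_result (int tn, const char *ts)
{
    return printf ("%3d characters in string=[%s]\n", tn, ts ? ts : "(null)");
}
```

testme() could then call print_result (tn, ts) after trim() returns, so that the NULL test case reports the -1 result without touching the pointer.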

--
Ian Collins

Geoff

Aug 21, 2010, 5:52:57 PM

On Sat, 21 Aug 2010 21:12:54 +0000, John Kelly <j...@isp2dial.com>
wrote:

>On Sun, 22 Aug 2010 09:03:27 +1200, Ian Collins <ian-...@hotmail.com>

>wrote:
>
>>On 08/21/10 05:57 PM, John Kelly wrote:
>>> On Sat, 21 Aug 2010 01:44:59 -0400, Shao Miller<sha0....@gmail.com>
>>> wrote:
>>>
>>>> John Kelly wrote:
>>>>> trim whitespace, bullet proof version
>>>> Please forgive me, but: Where does 'ssize_t' come from?
>>>
>>> <sys/types.h>
>>>
>>> Which is included by one of my listed headers, at least on my test box.
>>
>>ssize_t and SSIZE_MAX aren't standard C, they are POSIX extensions.
>
>OK.
>
>I guess that means Windows would be a problem. What else?

Windows is not a problem, as I pointed out in your "ptrdiff_t
overflow/underflow (was: trim whitespace)" thread that you apparently
missed.

But you are using POSIX constants in what is supposed to be a standard
C function, apparently because your compiler implementation doesn't
support the standard SIZE_MAX and size_t. So you used <sys/types.h>,
which exposes the fact that you are introducing system dependencies.
You left that header out of your original source listing but admitted
to it in your posts.

Using POSIX SSIZE_MAX limits your manageable string length to the
arbitrary limit of 32767 bytes so your function will automatically
fail for strings greater than this length. A problem you don't
document in your source.

You also declare static int trim(char **ts) but return ssize_t tz upon
success which implies an unsigned return type.

A truly robust, portable implementation of trim would be able to
survive usages or attacks like:

/* Test Kelly's trim function (append to the listing that defines trim) */

#include <stdlib.h>   /* malloc, free */
#define CHUNK 100000

int main (void)
{
    char *cp;
    int status;

    cp = malloc(CHUNK);
    if (cp)
    {
        memset(cp, 'x', CHUNK);
        memset(cp+CHUNK-3, ' ', 1);
        memset(cp+CHUNK-2, ' ', 1);
        memset(cp+CHUNK-1, '\0', 1);
    }
    else
    {
        printf("malloc failed\n");
        return (1);
    }
    /* strlen returns size_t, so convert before printing with %lu */
    printf("%lu\n", (unsigned long) strlen(cp));
    status = trim(&cp);
    if (status < 0)
    {
        puts("Oopsie!");
    }
    printf("%lu\n", (unsigned long) strlen(cp));
    free(cp);
    return 0;
}

James

Aug 21, 2010, 6:48:46 PM

"John Kelly" <j...@isp2dial.com> wrote in message
news:66g076hb98043cem9...@4ax.com...

> On Sun, 22 Aug 2010 09:03:27 +1200, Ian Collins <ian-...@hotmail.com>
> wrote:
>
>>On 08/21/10 05:57 PM, John Kelly wrote:
>>> On Sat, 21 Aug 2010 01:44:59 -0400, Shao Miller<sha0....@gmail.com>
>>> wrote:
>>>
>>>> John Kelly wrote:
>>>>> trim whitespace, bullet proof version
>>>> Please forgive me, but: Where does 'ssize_t' come from?
>>>
>>> <sys/types.h>
>>>
>>> Which is included by one of my listed headers, at least on my test box.
>>
>>ssize_t and SSIZE_MAX aren't standard C, they are POSIX extensions.
>
> OK.
>
> I guess that means Windows would be a problem. What else?

Maybe a hack like:

#if ! defined (SIZE_MAX)
# define SIZE_MAX ((size_t)(-1L))
#endif

could work out okay.

John Kelly

Aug 21, 2010, 7:43:49 PM

On Sat, 21 Aug 2010 14:52:57 -0700, Geoff <ge...@invalid.invalid> wrote:

>>>ssize_t and SSIZE_MAX aren't standard C, they are POSIX extensions.
>>
>>OK.
>>
>>I guess that means Windows would be a problem. What else?
>
>Windows is not a problem, as I pointed out in your "ptrdiff_t
>overflow/underflow (was: trim whitespace)" thread that you apparently
>missed.

It must be a POSIX extension on Windows. Linux seems to have it without
any special defines.

>But you are using POSIX constants in what is supposed to be a standard
>C function, apparently because your compiler implementation doesn't
>support the standard SIZE_MAX and size_t so you used <sys/types.h>
>which automatically exposes that you are introducing system
>dependencies and you left that header out of your original source
>listings but confessed to in your posts.

I didn't know which header it was in, until the question was asked.

>Using POSIX SSIZE_MAX limits your manageable string length to the
>arbitrary limit of 32767 bytes so your function will automatically
>fail for strings greater than this length. A problem you don't
>document in your source.

On Interix it's defined to INT_MAX which is 2147483647.

>You also declare static int trim(char **ts) but return ssize_t tz upon
>success which implies an unsigned return type.

/usr/include/sys/types.h:94:typedef signed int ssize_t;

ssize_t is signed. Presumably that's why it's called (s)size_t.

>A truly robust, portable implementation of trim would be able to
>survive usages or attacks like:
>
>Test Kelly's trim function
>
>#include <malloc.h>
>#define CHUNK 100000
>
>int main (void)
>{
> char *cp;
> int status;
>
> cp = malloc(CHUNK);
> if (cp)
> {
> memset(cp,'x', CHUNK);
> memset(cp+CHUNK-3,' ',1);
> memset(cp+CHUNK-2,' ',1);
> memset(cp+CHUNK-1,'\0',1);
> }
> else
> {
> printf("malloc failed\n");
> return (1);
> }
> printf("%u\n", strlen(cp));
> status = trim(&cp);
> if (status < 0)
> {
> puts("Oopsie!");
> }
> printf("%u\n", strlen(cp));
> free(cp);
> return 0;
>}

I'll see what I can do. So much feedback, so little time. I appreciate
the help.

John Kelly

Aug 21, 2010, 7:59:09 PM

On Sat, 21 Aug 2010 22:20:58 +0100, Ben Bacarisse <ben.u...@bsb.me.uk>
wrote:

>> From what I understood of the ptrdiff discussion, ssize_t and SSIZE_MAX

>> satisfy the requirement for the memmove length to be positive . Or did
>> I misunderstand?
>
>It seems so. Your code has:
>
> tx = keep - hast;
>
>Pointer subtraction produces a value of type prtdiff_t. You can't alter
>that fact.

I don't need to. SSIZE_MAX limits the iterations which increment keep,
so the distance between the two pointers will never overflow the lhs of
the assignment.

>If ssize_t has a narrower range that ptrdiff_t you have
>introduced a case that can fail by using ssize_t.

No, see above.

>If it has a larger range you will be able to generate pointers where
>the subtraction goes wrong.

Good point. I need to use the lesser of the two for my iteration limit.

>There is no advantage at all in using ssize_t, and it has the
>disadvantage that your code becomes dependent on a type that is not a
>standard C type.

That can be handled with macros, but I'll leave that for later.

I have some pending fixes and I'll post v3 soon.
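
As a rough illustration of "the lesser of the two", here is a sketch that picks the scan limit at compile time and applies it in a bounded length scan. It assumes a C99 <stdint.h> that provides PTRDIFF_MAX and SIZE_MAX, which the thread says Interix lacks; SCAN_LIMIT and scan_length are made-up names.

```c
#include <stddef.h>
#include <stdint.h>

/* Use whichever bound is smaller, so that any in-range offset fits in
   both ptrdiff_t and size_t. */
#if PTRDIFF_MAX < SIZE_MAX
# define SCAN_LIMIT ((size_t) PTRDIFF_MAX)
#else
# define SCAN_LIMIT SIZE_MAX
#endif

/* Hypothetical helper: returns 1 and stores the length in *len if a
   terminator is found within the first limit bytes, else returns 0. */
static int
scan_length (const char *s, size_t limit, size_t *len)
{
    size_t n = 0;

    while (n < limit && s[n] != '\0') {
        ++n;
    }
    if (n == limit) {
        return 0;
    }
    *len = n;
    return 1;
}
```

On a platform without <stdint.h>, the #if would need a fallback such as SSIZE_MAX, which is a separate portability decision.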

Geoff

Aug 21, 2010, 8:25:26 PM

On Sat, 21 Aug 2010 23:43:49 +0000, John Kelly <j...@isp2dial.com>
wrote:

>On Sat, 21 Aug 2010 14:52:57 -0700, Geoff <ge...@invalid.invalid> wrote:

>
>>>>ssize_t and SSIZE_MAX aren't standard C, they are POSIX extensions.
>>>
>>>OK.
>>>
>>>I guess that means Windows would be a problem. What else?
>>
>>Windows is not a problem, as I pointed out in your "ptrdiff_t
>>overflow/underflow (was: trim whitespace)" thread that you apparently
>>missed.
>
>It must be a POSIX extension on Windows. Linux seems to have it without
>any special defines.
>
>
>>But you are using POSIX constants in what is supposed to be a standard
>>C function, apparently because your compiler implementation doesn't
>>support the standard SIZE_MAX and size_t so you used <sys/types.h>
>>which automatically exposes that you are introducing system
>>dependencies and you left that header out of your original source
>>listings but confessed to in your posts.
>
>I didn't know which header it was in, until the question was asked.
>
>
>>Using POSIX SSIZE_MAX limits your manageable string length to the
>>arbitrary limit of 32767 bytes so your function will automatically
>>fail for strings greater than this length. A problem you don't
>>document in your source.
>
>On Interix it's defined to INT_MAX which is 2147483647.

On Windows it's _POSIX_SSIZE_MAX which is 32767.

>
>
>>You also declare static int trim(char **ts) but return ssize_t tz upon
>>success which implies an unsigned return type.
>
>/usr/include/sys/types.h:94:typedef signed int ssize_t;
>
>ssize_t is signed. Presumably that's why it's called (s)size_t.

In Windows it's defined in the non-standard <BaseTsd.h> as typedef
LONG_PTR SSIZE_T and LONG_PTR can be 32 or 64 bits depending on the
target.

Since strlen can handle a string 100k bytes long without trouble, as a
user of your library function I'd expect trim to handle whatever
string I threw at it without limitations. In order to get your code to
compile I have to throw this at it:

#define _POSIX_
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <BaseTsd.h>

static
int trim (char **ts)
{

SSIZE_T tz;
unsigned char *exam;

...

and it fails to handle an arbitrarily large, properly terminated
string.

I also don't understand the need to report the error in errno, it
seems you are demanding the return value serve a dual purpose of
returning a reserved error value or the new length of the string. If
you are going to go to all the trouble to pass the address of the
string, why not add a second argument to the function to pass the
address of a variable for the new length of the string and return 0
for no error or the error code directly?

int trim(char **ts, size_t *newlength){...}

This way you can still use your errno values and users can write:

if( trim(&string, &trimlength) == EINVAL)
    puts("invalid string passed in");
else
if( trim(&string, &trimlength) == EOVERFLOW)
    puts("string overflowed or not terminated");

Then you can dispense with all this tz = 0; and
errno = EOVERFLOW;
return -1;

nonsense.
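
A minimal sketch of that interface, under a few assumptions: the name trim_status is invented, it takes char * rather than char ** since the pointer itself is never changed, and it relies on strlen(), so it only reports EINVAL and does no overflow detection of its own.

```c
#include <ctype.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical status-returning variant: 0 on success, EINVAL on a null
   argument. The trimmed length comes back through *newlength. */
static int
trim_status (char *s, size_t *newlength)
{
    size_t head = 0;
    size_t tail;

    if (!s || !newlength) {
        return EINVAL;
    }
    tail = strlen (s);
    while (tail > 0 && isspace ((unsigned char) s[tail - 1])) {
        --tail;                      /* drop trailing whitespace */
    }
    while (head < tail && isspace ((unsigned char) s[head])) {
        ++head;                      /* skip leading whitespace */
    }
    memmove (s, s + head, tail - head);
    s[tail - head] = '\0';
    *newlength = tail - head;
    return 0;
}
```

A caller would store the result once, for example rc = trim_status (buf, &n);, and then compare rc against the error codes instead of calling the function a second time.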

James

Aug 21, 2010, 8:29:58 PM

"John Kelly" <j...@isp2dial.com> wrote in message

news:fhpu66t99dc0jrhn2...@4ax.com...

>
> trim whitespace, bullet proof version

[...]

Why not use a trim function that is much less invasive? Perhaps something
that returns the offset and length of the trimmed string residing within the
target string. Here is some example code:

http://codepad.org/PnIVzrUB
(this is the code and execution output)

<code>

#include <stddef.h>
#include <assert.h>
#include <limits.h>
#include <ctype.h>

#if ! defined (SIZE_MAX)
# define SIZE_MAX ((size_t)-1)
#endif

#define isspacex(c) isspace((unsigned char)(c))

int trim_ex(char const* buf,
size_t* phead,
size_t* plen)
{
size_t head, tail, tail_tmp;

if (! buf || ! buf[0]) return 0;

for (head = 0; isspacex(buf[head]); ++head)
if (! buf[head] || head == SIZE_MAX - 1UL) return 0;

for (tail = head, tail_tmp = head + 1; buf[tail_tmp]; ++tail_tmp)
{
if (tail_tmp == SIZE_MAX - 1UL) return 0;
if (! isspacex(buf[tail_tmp])) tail = tail_tmp;
}

assert(tail >= head && tail_tmp > head && tail <= tail_tmp);

if (phead) *phead = head;
if (plen) *plen = tail - head + 1;

return 1;
}

#include <stdio.h>

int trim_ex_test(char const* buf)
{
size_t head, len, i;

if (! trim_ex(buf, &head, &len)) return 0;

printf("origin: <%s>\n", buf);
printf("trimmed: <");

for (i = 0; i < len; ++i) putchar(buf[head + i]);

puts(">\n_______________________________________________\n");

return 1;
}

int main(void)
{
trim_ex_test(NULL);
trim_ex_test("");
trim_ex_test("Hello");
trim_ex_test("H");
trim_ex_test("H H H");
trim_ex_test(" H ");
trim_ex_test(" Hello ");
trim_ex_test("Hello ");
trim_ex_test(" Hello");
trim_ex_test(" Hello World 1 2 3 ");
trim_ex_test(" Hello World 1 2 3");
trim_ex_test("Hello World 1 2 3 ");
trim_ex_test("Hello World 1 2 3 4");

return 0;
}

</code>

You should be able to pass this a constant string that is longer than
SIZE_MAX without hitting any undefined behavior; the function just returns
failure. The trim_ex() function will stop scanning if the head or tail_tmp
variables stay within range of SIZE_MAX. What type of input, besides a
non-null terminated string, should break this? I hope there are no damn
bugs/typos in there. It looks alright to me at first glance...

James

Aug 21, 2010, 8:34:43 PM

"James" <n...@spam.invalid> wrote in message
news:i4pr1k$k63$1...@speranza.aioe.org...

> You should be able to pass this a constant string that is longer than
> SIZE_MAX without hitting any undefined behavior; the function just returns
> failure. The trim_ex() function will stop scanning if the head or tail_tmp
> variables stay within range of SIZE_MAX.

Ummm, the sentence above should read as:

The trim_ex() function will stop scanning if the head or tail_tmp variables
attempt to violate the range of SIZE_MAX.

John Kelly

Aug 21, 2010, 8:37:12 PM

On Sat, 21 Aug 2010 17:25:26 -0700, Geoff <ge...@invalid.invalid> wrote:

>Since strlen can handle a string 100k bytes long without trouble, as a
>user of your library function I'd expect trim to handle whatever
>string I threw at it without limitations. In order to get your code to
>compile I have to throw this at it:
>
>#define _POSIX_
>#include <ctype.h>
>#include <errno.h>
>#include <limits.h>
>#include <stdio.h>
>#include <string.h>
>#include <BaseTsd.h>
>
>static
>int trim (char **ts)
>{
> SSIZE_T tz;
> unsigned char *exam;
>
>...
>
>and it fails to handle an arbitrarily large, properly terminated
>string.

I'll see what I can do.

>I also don't understand the need to report the error in errno, it
>seems you are demanding the return value serve a dual purpose of
>returning a reserved error value or the new length of the string.

That's a linux library and syscall tradition.

>If
>you are going to go to all the trouble to pass the address of the
>string, why not add a second argument to the function to pass the
>address of a variable for the new length of the string and return 0
>for no error or the error code directly?

I don't know. Is that a Windows tradition?

>int trim(char **ts, size_t *newlength){...}
>
>This way you can still use your errno values and users can write:
>
>if( trim(&string, &trimlength) == EINVAL)
> puts("invalid string passed in");
>else
>if (trim(&string, &trimlength) == EOVERFLOW)
> puts("string overflowed or not terminated");
>
>Then you can dispense with all this tz = 0; and
> errno = EOVERFLOW;
> return -1;
>
>nonsense.

It makes sense on linux.

John Kelly

Aug 21, 2010, 8:43:19 PM

On Sat, 21 Aug 2010 17:29:58 -0700, "James" <n...@spam.invalid> wrote:

>"John Kelly" <j...@isp2dial.com> wrote in message
>news:fhpu66t99dc0jrhn2...@4ax.com...
>>
>> trim whitespace, bullet proof version
>[...]
>
>Why not use a trim function that is much less invasive? Perhaps something
>that returns the offset and length of the trimmed string residing within the
>target string. Here is some example code:

I'll try and take a look at this. Thanks for the feedback. It's hard
to keep up with it all.

James

Aug 21, 2010, 9:34:34 PM

"James" <n...@spam.invalid> wrote in message
news:i4pr1k$k63$1...@speranza.aioe.org...

> "John Kelly" <j...@isp2dial.com> wrote in message
> news:fhpu66t99dc0jrhn2...@4ax.com...
>>
>> trim whitespace, bullet proof version
> [...]
>
> Why not use a trim function that is much less invasive? Perhaps something
> that returns the offset and length of the trimmed string residing within
> the target string. Here is some example code:

[...]

ARGHGHG! I found a bug that crops up when you pass trim_ex() a string made
up entirely of whitespace (e.g., " "). Here is the fixed version:

http://codepad.org/LjNJLQaw

<code>

#include <stddef.h>
#include <assert.h>
#include <limits.h>
#include <ctype.h>

#if ! defined (SIZE_MAX)
# define SIZE_MAX ((size_t)-1)
#endif

#define isspacex(c) isspace((unsigned char)(c))

int trim_ex(char const* buf,
size_t* phead,
size_t* plen)
{
size_t head, tail, tail_tmp;

if (! buf || ! buf[0]) return 0;

for (head = 0; isspacex(buf[head]); ++head)

if (head == SIZE_MAX - 1UL) return 0;

if (buf[head])
{

for (tail = head, tail_tmp = head + 1; buf[tail_tmp]; ++tail_tmp)
{
if (tail_tmp == SIZE_MAX - 1UL) return 0;
if (! isspacex(buf[tail_tmp])) tail = tail_tmp;
}

assert(tail >= head && tail_tmp > head && tail <= tail_tmp);

if (plen) *plen = tail - head + 1;
}

else if (plen)
{
*plen = 0;
}

if (phead) *phead = head;

return 1;
}

#include <stdio.h>

[... trim_ex_test() and main() elided ...]

</code>

Sorry about that damn non-sense!!!!!

;^(...

BTW, can you find any more bugs?

;^)

pete

Aug 21, 2010, 9:37:07 PM

James wrote:
>
> Maybe a hack like:
>
> #if ! defined (SIZE_MAX)
> # define SIZE_MAX ((size_t)(-1L))
> #endif
>
> could work out okay.

Why
((size_t)(-1L))
instead of
((size_t)(-1))
?

What effect do you think that the 'L' has?

--
pete

James

Aug 21, 2010, 9:40:39 PM

"pete" <pfi...@mindspring.com> wrote in message
news:4C707F...@mindspring.com...

Umm, now that I think about it, I actually don't know why I put it in there!

;^o

John Kelly

Aug 21, 2010, 9:47:19 PM

Isn't the idiom usually

((size_t)-1)

without () around the -1?

Geoff

Aug 21, 2010, 10:44:39 PM

On Sun, 22 Aug 2010 00:37:12 +0000, John Kelly <j...@isp2dial.com>
wrote:

>That's a linux library and syscall tradition.
>

Ah, more of that cargo cult stuff from UNIX, then.
Are we writing a Linux library or a C library function?
Is it really bullet proof?

>
>>If
>>you are going to go to all the trouble to pass the address of the
>>string, why not add a second argument to the function to pass the
>>address of a variable for the new length of the string and return 0
>>for no error or the error code directly?
>
>I don't know. Is that a Windows tradition?

Just a better programming practice. Don't mix data with error returns.
Just because something was done in 1970 on a PDP 11, doesn't mean it
should be done in 2010 on every other platform.

Consider getchar(), how many times have novices been bitten by

char c;

c = getchar();
if (c == EOF)
...

Can you spot the error without looking it up? Can you tell me which
platforms it might work on?

errno is not thread safe, what will you do if the user is calling trim
in a multi-threaded scenario?

Geoff

Aug 21, 2010, 10:52:16 PM

Submitted for your amusement.

#include <ctype.h>
#include <errno.h>
#include <limits.h>

#include <string.h>

/***********************************************
Trim whitespace from a zero-terminated string.
Inputs:
the address of the string
the address for return of new string length.
Returns:
0 on success
EINVAL or EOVERFLOW on error.
************************************************/

int trim (char **ts, size_t *length)
{
size_t tz;
unsigned char *exam;
unsigned char *hast;
unsigned char *keep;

/* make sure caller is sane */
if (!ts || !*ts || !length) {
return EINVAL;
}
/* first pass, trim front of string */
tz = 0;
exam = (unsigned char *) *ts;
while (++tz < SIZE_MAX && isspace (*exam)) {
++exam;
}
if (tz == SIZE_MAX) {
return EOVERFLOW;
}
/* second pass, trim back of string */
tz = 0;
hast = keep = exam;
while (++tz < SIZE_MAX && *exam) {
if (!isspace (*exam)) {
keep = exam;
}
++exam;
}
if (tz == SIZE_MAX) {
return EOVERFLOW;
}
if (*keep) {
*++keep = '\0';
}
if (keep - hast < 0) {
return EOVERFLOW;
}

/* move the trimmed string to where caller expects it to be */
tz = keep - hast;
if (hast != (unsigned char *) *ts) {
(void) memmove (*ts, hast, tz + 1);
}

*length = tz; /* new length of string */
return 0;
}

/* Test the trim function */

static char s0[] = " I need trimming on both ends. ";
static char s1[] = "I need trimming on far end. ";
static char s2[] = " I need trimming on near end.";
static char s3[] = " I need more trimming on both ends. ";
static char s4[] = "\n\t\rI need special trimming on both ends.\n\t\r ";
static char s5[] = " \n\t\r I need special trimming on near end.";
static char s6[] = "I need special trimming on far end. \n\t\r ";
static char s7[] = "I need no trimming";
static char s8[] = " ";
static char s9[] = "";
static char *strings[12] = {
s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, NULL,
};

#include <stdio.h>
#include <stdlib.h>

#define CHUNK 100000

int main (void)
{
int i;
char *cp;
size_t length;

for (i = 0; i < 11; i++)
{
printf ("Original string:[%s]\n", strings[i]);
trim (&strings[i], &length);
printf ("Trimmed string:[%s]\n", strings[i]);
puts ("---------------------------------------");
}

/* insanity tests */
printf("trim returned: %i\n", trim(&cp, NULL));
printf("trim returned: %i\n", trim(NULL, &length));

cp = malloc(CHUNK);
if (cp)
{
memset(cp,'x', CHUNK);
memset(cp + CHUNK-3,' ',1);
memset(cp + CHUNK-2,' ',1);

memset(cp + CHUNK-1,'\0',1);

}
else
{
printf("malloc failed\n");
return (1);
}

printf("Original big string length %lu\n", (unsigned long) strlen(cp));
printf("trim returned: %i\n", trim(&cp, &length));
printf("New string length %lu\n", (unsigned long) strlen(cp));
printf("Double check length returned %lu\n", (unsigned long) length);
free(cp);
/* deliberately invalid: cp now dangles, so this call has undefined behavior */
printf("trim returned %i on a freed block!\n", trim(&cp, &length));
printf("The length returned was %lu\n", (unsigned long) length);
puts("\nSUCCESS testing trim function");
return 0;
}

John Kelly

Aug 21, 2010, 11:07:55 PM

On Sat, 21 Aug 2010 19:44:39 -0700, Geoff <ge...@invalid.invalid> wrote:

>>>you are going to go to all the trouble to pass the address of the
>>>string, why not add a second argument to the function to pass the
>>>address of a variable for the new length of the string and return 0
>>>for no error or the error code directly?
>>
>>I don't know. Is that a Windows tradition?
>
>Just a better programming practice. Don't mix data with error returns.

What's "better" depends. I like using this idiom:

if (trim(p) < 0)
printf ("we have a problem\n");

>Just because something was done in 1970 on a PDP 11, doesn't mean it
>should be done in 2010 on every other platform.

Are Windows idioms superior to UNIX idioms?

>Consider getchar(), how many times have novices been bitten by
>
>char c;
>
>c = getchar();
>if (c == EOF)
>...
>
>Can you spot the error without looking it up? Can you tell me which
>platforms it might work on?

No.

>errno is not thread safe, what will you do if the user is calling trim
>in a multi-threaded scenario?

linux fork() is fast enough that many apps don't need threads. Someone
who needs threads can try writing trim_r().

Chris M. Thomasson

Aug 21, 2010, 11:21:49 PM

"John Kelly" <j...@isp2dial.com> wrote in message

news:q341765mghpkvdlgs...@4ax.com...

> On Sat, 21 Aug 2010 19:44:39 -0700, Geoff <ge...@invalid.invalid> wrote:

[...]

>>Consider getchar(), how many times have novices been bitten by
>>
>>char c;
>>
>>c = getchar();
>>if (c == EOF)
>>...
>>
>>Can you spot the error without looking it up? Can you tell me which
>>platforms it might work on?
>
> No.

http://www.opengroup.org/onlinepubs/000095399/functions/getchar.html

Can you spot it now?

John Kelly

Aug 21, 2010, 11:28:10 PM

On Sat, 21 Aug 2010 19:52:16 -0700, Geoff <ge...@invalid.invalid> wrote:

>Submitted for your amusement.

OK. I see. As for the idiom I mentioned:

>What's "better" depends. I like using this idiom:
>
>if (trim(p) < 0)
> printf ("we have a problem\n");

What I meant to write was:

if ((newsize = trim(p)) < 0)
printf ("we have a problem\n");

where I set the variable and test for errors all in one line of code.

But I can see why you might prefer the second argument approach. I will
consider it. Adapting my idiom usage would be simple enough.

Kenny McCormack

Aug 21, 2010, 11:28:40 PM

In article <q341765mghpkvdlgs...@4ax.com>,
John Kelly <j...@isp2dial.com> wrote:
...

>>errno is not thread safe, what will you do if the user is calling trim
>>in a multi-threaded scenario?

The obvious CLC answer to this is: What are threads?

My point, of course, is that once you go off-topic, anything is
possible (in terms of what are acceptable answers).

>linux fork() is fast enough that many apps don't need threads. Someone
>who needs threads can try writing trim_r().

Indeed. The previous poster was playing the usual CLC game - which is
to dream up some obscure scenario where there could, conceivably, be a
problem, then act as if YOU are a fucking idiot for not covering that
case.

--
One of the best lines I've heard lately:

Obama could cure cancer tomorrow, and the Republicans would be
complaining that he had ruined the pharmaceutical business.

(Heard on Stephanie Miller = but the sad thing is that there is an awful lot
of direct truth in it. We've constructed an economy in which eliminating
cancer would be a horrible disaster. There are many other such examples.)

Geoff

Aug 21, 2010, 11:38:47 PM

On Sun, 22 Aug 2010 03:07:55 +0000, John Kelly <j...@isp2dial.com>
wrote:

>On Sat, 21 Aug 2010 19:44:39 -0700, Geoff <ge...@invalid.invalid> wrote:

>
>>>>you are going to go to all the trouble to pass the address of the
>>>>string, why not add a second argument to the function to pass the
>>>>address of a variable for the new length of the string and return 0
>>>>for no error or the error code directly?
>>>
>>>I don't know. Is that a Windows tradition?
>>
>>Just a better programming practice. Don't mix data with error returns.
>
>What's "better" depends. I like using this idiom:
>
>if (trim(p) < 0)
> printf ("we have a problem\n");

Using my way you can write

if (trim(&p, &length) != 0)
printf("we have a problem\n");

>
>
>>Just because something was done in 1970 on a PDP 11, doesn't mean it
>>should be done in 2010 on every other platform.
>
>Are Windows idioms superior to UNIX idioms?

Some are, some aren't. There's always a trade-off.

>
>
>>Consider getchar(), how many times have novices been bitten by
>>
>>char c;
>>
>>c = getchar();
>>if (c == EOF)
>>...
>>
>>Can you spot the error without looking it up? Can you tell me which
>>platforms it might work on?
>
>No.
>

I'm sure someone will spot it soon enough. It's so basic even K&R
talked about making sure to get it right, and it is a major design
flaw of getchar's interface. Of course, these days of Unicode we don't
have to worry about char anymore.
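
For readers who did not spot it: getchar() returns an int so that EOF can be told apart from every valid byte value. Storing the result in a char loses that distinction. Where plain char is signed, a 0xFF byte compares equal to EOF and input stops early; where it is unsigned, the comparison never succeeds and the loop never ends. A minimal sketch of the usual idiom follows. The helper name count_bytes is made up for this illustration, and it uses getc() on a FILE * so it can be exercised without stdin.

```c
#include <stdio.h>

/* Counts the bytes readable from fp. The result of getc() is kept in an
   int so that EOF stays distinct from every byte value, including 0xFF. */
static long count_bytes(FILE *fp)
{
    long n = 0;
    int c;                      /* int, not char */

    while ((c = getc(fp)) != EOF) {
        n++;
    }
    return n;
}
```

Calling count_bytes(stdin) gives the getchar() equivalent.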

>
>>errno is not thread safe, what will you do if the user is calling trim
>>in a multi-threaded scenario?
>
>linux fork() is fast enough that many apps don't need threads. Someone
>who needs threads can try writing trim_r().

Why pollute the system with thread safe and non-thread safe functions
that do identical things when you can design one thread-safe,
reentrant function for all environments?

Surely, you don't want to make the same mistake Microsoft did in their
libraries and have to build thread safe libs and non-thread safe libs
and then have to document them and tell programmers when to link them?

John Kelly

Aug 21, 2010, 11:56:40 PM

On Sun, 22 Aug 2010 03:28:40 +0000 (UTC), gaz...@shell.xmission.com
(Kenny McCormack) wrote:

>In article <q341765mghpkvdlgs...@4ax.com>,
>John Kelly <j...@isp2dial.com> wrote:
>...
>>>errno is not thread safe, what will you do if the user is calling trim
>>>in a multi-threaded scenario?
>
>The obvious CLC answer to this is: What are threads?
>
>My point, of course, is that once you go off-topic, anything is
>possible (in terms of what are acceptable answers).
>
>>linux fork() is fast enough that many apps don't need threads. Someone
>>who needs threads can try writing trim_r().
>
>Indeed. The previous poster was playing the usual CLC game - which is
>to dream up some obscure scenario where there could, conceivably, be a
>problem, then act as if YOU are a fucking idiot for not covering that
>case.

That does happen in c.l.c. But in this case I think Geoff has some good
points. If the function worked threaded or not, that would be ideal.

My goal is to refine the non threaded version, to my satisfaction. If I
can complete that task, then I may consider what's required for adapting
it for threads. But I need to proceed one step at a time.

Keith Thompson

Aug 22, 2010, 12:38:50 AM

John Kelly <j...@isp2dial.com> writes:
> On Sat, 21 Aug 2010 12:18:07 -0700, Keith Thompson <ks...@mib.org>
> wrote:
[...]
>>My position is this: It is not possible, given your specification
>>for trim(), to implement it in a way that avoids undefined behavior
>>unless you're able to control all possible callers. You can define
>>the circumstances in which its behavior is undefined, and define
>>the behavior in cases where it is defined. You cannot entirely
>>eliminate undefined behavior. Do you believe that you can? If not,
>>what *exactly* does "bullet proof" mean?
>
> I appreciate your concern and welcome what help I can get. Right now
> I'm focusing on one specific item.

I see that you haven't answered any of my questions.

> void *memmove(void *dest, const void *src, size_t n);
>
> It expects size_t for the length. It appears universally true that
>
> typedef unsigned int size_t

Certainly not. Do you have a copy of the C standard?
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf is a
post-C99 draft available at no charge.

size_t is a typedef for some unsigned integer type, not necessarily
unsigned int.

> Thus to me it makes no sense to call memmove with a negative value. Am
> I wrong?

You *cannot* call memmove with a negative value for its third
parameter. If you call memmove(this, that, -1), the value -1 is
converted from int to size_t, so the value you're actually passing
is SIZE_MAX. (This is assuming the declaration is visible, which
it certainly should be.)
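
The conversion can be seen in isolation with a small sketch. The function names received_size and minus_one_becomes_max are made up here; received_size simply has a parameter of the same type as memmove's third one.

```c
#include <stddef.h>

/* Same parameter type as the third argument of memmove(). */
static size_t received_size(size_t n)
{
    return n;
}

/* The int argument -1 is converted to size_t at the call, because the
   prototype is visible. Returns 1 if the callee saw the largest size_t. */
static int minus_one_becomes_max(void)
{
    return received_size(-1) == (size_t) -1;
}
```

So a length check has to happen before the call, while the value is still in a signed type.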

And again, you haven't answered any of my questions, and your
question about memmove() has nothing to do with what I wrote.

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

John Kelly

Aug 22, 2010, 1:19:02 AM

On Sat, 21 Aug 2010 21:38:50 -0700, Keith Thompson <ks...@mib.org>
wrote:

>And again, you haven't answered any of my questions,

I started this thread, and it's not about you.

>question about memmove() has nothing to do with what I wrote.

I welcome what help I can get, but so far, you have offered little. If
you really want to help, here is my immediate task:

As Ben pointed out, I need to use the lesser of SIZE_MAX or PTRDIFF_MAX
as my iteration limit.

If SIZE_MAX is not defined, I've learned how to compensate with:

# ifndef SIZE_MAX
# define SIZE_MAX ((size_t)-1)
# endif

However, on my Interix test platform, PTRDIFF_MAX is not defined, and
ptrdiff_t is signed. It's my understanding I can't use the same cast
trick with a signed type.

So what are my options for determining the maximum value representable
in a signed ptrdiff_t variable, when PTRDIFF_MAX is not defined?

Geoff

Aug 22, 2010, 3:00:59 AM

On Sun, 22 Aug 2010 05:19:02 +0000, John Kelly <j...@isp2dial.com>
wrote:

>So what are my options for determining the maximum value representable
>in a signed ptrdiff_t variable, when PTRDIFF_MAX is not defined?

It's implementation defined. See n1256 7.18(4) and 7.18.3(2).

In a standards compliant implementation PTRDIFF_MAX and PTRDIFF_MIN
are defined in <stdint.h>.

On a Win7 64 bit system the values are:
SIZE_MAX is ffffffffffffffff
Size of SIZE_MAX: 8
Size of ptrdiff_t: 8
INTMAX_MIN: 8000000000000000
INTMAX_MAX: 7fffffffffffffff
PTRDIFF_MIN: 8000000000000000
PTRDIFF_MAX: 7fffffffffffffff

In a 32 bit target, same source the values are:
SIZE_MAX is ffffffff
Size of SIZE_MAX: 4
Size of ptrdiff_t: 4
INTMAX_MIN: 8000000000000000
INTMAX_MAX: 7fffffffffffffff
PTRDIFF_MIN: 80000000
PTRDIFF_MAX: 7fffffff

Which I find very interesting.
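
Numbers like these can be reproduced on a C99 implementation with a few lines. This is only a sketch, not the program that produced the output above: print_limits is a made-up name, and the signed limits are printed in decimal rather than hex.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Writes the size-related limits of the current target to out. */
static void print_limits(FILE *out)
{
    fprintf(out, "SIZE_MAX:    %jx\n", (uintmax_t) SIZE_MAX);
    fprintf(out, "PTRDIFF_MIN: %jd\n", (intmax_t) PTRDIFF_MIN);
    fprintf(out, "PTRDIFF_MAX: %jd\n", (intmax_t) PTRDIFF_MAX);
    fprintf(out, "INTMAX_MAX:  %jd\n", (intmax_t) INTMAX_MAX);
    fprintf(out, "sizeof(size_t) = %lu, sizeof(ptrdiff_t) = %lu\n",
            (unsigned long) sizeof(size_t),
            (unsigned long) sizeof(ptrdiff_t));
}
```

Calling print_limits(stdout) once per build target shows which limits follow the pointer width and which do not.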

Ike Naar

Aug 22, 2010, 3:07:58 AM

In article <fpa076ta0vh2262o3...@4ax.com>,

John Kelly <j...@isp2dial.com> wrote:
>It appears universally true that
>
> typedef unsigned int size_t

For some restricted notion of ``universally''.
On the unexceptional machine I'm using at the moment,
``unsigned int'' is a 32-bit type, and ``size_t'' is
an alias for ``unsigned long'', which is a 64-bit type.

Keith Thompson

Aug 22, 2010, 3:09:01 AM

John Kelly <j...@isp2dial.com> writes:
> On Sat, 21 Aug 2010 21:38:50 -0700, Keith Thompson <ks...@mib.org>
> wrote:
>>And again, you haven't answered any of my questions,
>
> I started this thread, and it's not about you.
>
>>question about memmove() has nothing to do with what I wrote.
>
> I welcome what help I can get, but so far, you have offered little. If
> you really want to help, here is my immediate task:

[snip]

Help you do what? I still don't know exactly what your "bullet proof"
version of trim() is supposed to do.

Seebs

Aug 22, 2010, 6:40:23 AM

On 2010-08-22, John Kelly <j...@isp2dial.com> wrote:
> On Sat, 21 Aug 2010 21:38:50 -0700, Keith Thompson <ks...@mib.org>
> wrote:
>>And again, you haven't answered any of my questions,

> I started this thread, and it's not about you.

And this, friends, is why you should always understand pathological
narcissism before trying to interact with the narcissist.

It's simply not part of his world that there would be any reason for
him to care what other people think or want, or what they are trying to
do.

> I welcome what help I can get, but so far, you have offered little. If
> you really want to help, here is my immediate task:

... and here he is, ignoring useful advice, because your sole function
is to satisfy his whims.

You're sitting there babbling on about a "bridge" being "out", and he's
demanding that you tell him how to drive faster. This proves that you
are wrong, because he asked how to go faster before you mentioned the
bridge.

-s
--
Copyright 2010, all wrongs reversed. Peter Seebach / usenet...@seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!

Moi

Aug 22, 2010, 6:46:00 AM

On Sat, 21 Aug 2010 18:50:02 +0000, Seebs wrote:

>
> Read up on pathological narcissism. It's not impossible to derive value
> from responding to his posts, but you have to do so with a clear
> understanding of what he's doing and why or it will be very frustrating.

He is a troll, Seebs. And a very good one.
My guess is it is one of the regulars trolling. (Kaz ?)

AvK

John Kelly

Aug 22, 2010, 7:24:21 AM

On Sun, 22 Aug 2010 00:09:01 -0700, Keith Thompson <ks...@mib.org>
wrote:

>John Kelly <j...@isp2dial.com> writes:

>> I welcome what help I can get, but so far, you have offered little. If
>> you really want to help, here is my immediate task:
>[snip]
>
>Help you do what?

What I just asked.

>I still don't know exactly what your "bullet proof" version of trim()
>is supposed to do.

Pompous pontificating posturing.

John Kelly

Aug 22, 2010, 7:43:58 AM

On Sun, 22 Aug 2010 12:46:00 +0200, Moi <ro...@invalid.address.org>
wrote:

I see Seebs is up to his usual ad hominem tricks and his fan club is
applauding.

I'll ask again, who can help me with this:

--------

However, on my Interix test platform, PTRDIFF_MAX is not defined, and
ptrdiff_t is signed. It's my understanding I can't use the same cast
trick with a signed type.

So what are my options for determining the maximum value representable
in a signed ptrdiff_t variable, when PTRDIFF_MAX is not defined?

--------

Seebs, though I don't read your posts, if you can devise a solution,
email it to me. My posted email address is real, and until you abuse
it, I will accept your email.

John Kelly

Aug 22, 2010, 7:49:45 AM

On Sun, 22 Aug 2010 00:00:59 -0700, Geoff <ge...@invalid.invalid> wrote:

>On Sun, 22 Aug 2010 05:19:02 +0000, John Kelly <j...@isp2dial.com>
>wrote:
>
>>So what are my options for determining the maximum value representable
>>in a signed ptrdiff_t variable, when PTRDIFF_MAX is not defined?
>
>It's implementation defined. See n1256 7.18(4) and 7.18.3(2).
>
>In a standards compliant implementation PTRDIFF_MAX and PTRDIFF_MIN
>are defined in <stdint.h>.

That seems to be C99. Interix seems to be C89.

Kenny McCormack

Aug 22, 2010, 8:41:04 AM

In article <m12276h8jq8oh5va7...@4ax.com>,

John Kelly <j...@isp2dial.com> wrote:
>On Sun, 22 Aug 2010 00:09:01 -0700, Keith Thompson <ks...@mib.org>
>wrote:
>
>>John Kelly <j...@isp2dial.com> writes:
>
>>> I welcome what help I can get, but so far, you have offered little. If
>>> you really want to help, here is my immediate task:
>>[snip]
>>
>>Help you do what?
>
>What I just asked.

Boys, boys, boys!

>>I still don't know exactly what your "bullet proof" version of trim()
>>is supposed to do.
>
>Pompous pontificating posturing.

Indeed. Actually, I think Spinny said it best, when he described our
friend Kiki as a "corporate drone".

Also see:

http://redwing.hutman.net/~mreed/warriorshtm/android.htm

--
(This discussion group is about C, ...)

Wrong. It is only OCCASIONALLY a discussion group
about C; mostly, like most "discussion" groups, it is
off-topic Rorsharch [sic] revelations of the childhood
traumas of the participants...

pete

Aug 22, 2010, 9:04:55 AM

John Kelly wrote:

> So what are my options for determining the maximum value representable
> in a signed ptrdiff_t variable, when PTRDIFF_MAX is not defined?

At least one of the signed types
(for C89: long, int, short, signed char)
has the same size as ptrdiff_t.

If all of the signed types which have the same size as ptrdiff_t,
also have the same range,
then that is the range of ptrdiff_t.

If they don't, then you're stuck.
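
The size comparison described above could be sketched in C89 roughly as follows. The names note_candidate and guess_ptrdiff_max are made up for this illustration. Note that it is a heuristic: equal size does not strictly guarantee equal range, since a type may have padding bits.

```c
#include <limits.h>
#include <stddef.h>

/* Records one candidate maximum; returns 0 if it conflicts with an
   earlier candidate, 1 otherwise. */
static int note_candidate(long value, long *max, int *found)
{
    if (*found && *max != value) {
        return 0;
    }
    *max = value;
    *found = 1;
    return 1;
}

/* Returns 1 and stores a guess in *max when every C89 signed type with
   the same size as ptrdiff_t has the same maximum. Returns 0 when no
   type matches or the candidates disagree ("you're stuck"). */
static int guess_ptrdiff_max(long *max)
{
    int found = 0;

    if (sizeof(ptrdiff_t) == sizeof(signed char)
        && !note_candidate(SCHAR_MAX, max, &found)) return 0;
    if (sizeof(ptrdiff_t) == sizeof(short)
        && !note_candidate(SHRT_MAX, max, &found)) return 0;
    if (sizeof(ptrdiff_t) == sizeof(int)
        && !note_candidate(INT_MAX, max, &found)) return 0;
    if (sizeof(ptrdiff_t) == sizeof(long)
        && !note_candidate(LONG_MAX, max, &found)) return 0;
    return found;
}
```

On a target where ptrdiff_t is wider than long, no C89 type matches and the function reports failure rather than guessing.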

--
pete

Seebs

Aug 22, 2010, 2:21:14 PM

I don't think so, just because he posts elsewhere and has similar
behaviors. He makes grandiose claims, then declares all the systems
which are incompatible with them defective and uninteresting.

Seebs

Aug 22, 2010, 2:21:44 PM

On 2010-08-22, John Kelly <j...@isp2dial.com> wrote:

> On Sun, 22 Aug 2010 00:09:01 -0700, Keith Thompson <ks...@mib.org>
> wrote:
>>I still don't know exactly what your "bullet proof" version of trim()
>>is supposed to do.

> Pompous pontificating posturing.

A surprisingly accurate answer to the question.

Geoff

Aug 22, 2010, 2:40:50 PM

On Sun, 22 Aug 2010 11:49:45 +0000, John Kelly <j...@isp2dial.com>
wrote:

>On Sun, 22 Aug 2010 00:00:59 -0700, Geoff <ge...@invalid.invalid> wrote:
>
>>On Sun, 22 Aug 2010 05:19:02 +0000, John Kelly <j...@isp2dial.com>
>>wrote:
>>
>>>So what are my options for determining the maximum value representable
>>>in a signed ptrdiff_t variable, when PTRDIFF_MAX is not defined?
>>
>>It's implementation defined. See n1256 7.18(4) and 7.18.3(2).
>>
>>In a standards compliant implementation PTRDIFF_MAX and PTRDIFF_MIN
>>are defined in <stdint.h>.
>
>That seems to be C99. Interix seems to be C89.

Indeed. But you seem to be mixing POSIX constants with your C89.

In your case I think I would choose to test that ptrdiff_t can
represent the signed difference between any two pointers in an array
of size from 0 to SIZE_MAX for your implementation and if SIZE_MAX is
not available, UINT_MAX.

John Kelly

Aug 22, 2010, 3:40:01 PM

On Sun, 22 Aug 2010 11:40:50 -0700, Geoff <ge...@invalid.invalid> wrote:

>>That seems to be C99. Interix seems to be C89.

>Indeed. But you seem to be mixing POSIX constants with your C89.

On linux that's as natural as breathing. But I'll try and clean it up.

>In your case I think I would choose to test that ptrdiff_t can
>represent the signed difference between any two pointers in an array
>of size from 0 to SIZE_MAX for your implementation

That seems to be tricky. None of the gurus touched my question.

> and if SIZE_MAX is not available, UINT_MAX.

I learned how to derive SIZE_MAX with a macro, so that's covered. But
the signed ptrdiff_t is some tricky business.

Ian Collins

Aug 22, 2010, 3:57:32 PM

On 08/22/10 11:49 PM, John Kelly wrote:
> On Sun, 22 Aug 2010 00:00:59 -0700, Geoff <ge...@invalid.invalid> wrote:
>>
>> In a standards compliant implementation PTRDIFF_MAX and PTRDIFF_MIN
>> are defined in <stdint.h>.
>
> That seems to be C99. Interix seems to be C89.

It's generally a good idea to start out knowing what language you are
compiling.

--
Ian Collins

John Kelly

Aug 22, 2010, 4:21:34 PM

On Mon, 23 Aug 2010 07:57:32 +1200, Ian Collins <ian-...@hotmail.com>
wrote:

With trim() I'm aiming for least common denominator. C89 should be
upward compatible I presume.

John Kelly

Aug 22, 2010, 6:21:32 PM

On Sat, 21 Aug 2010 18:02:35 +0000 (UTC), James Waldby <n...@no.no> wrote:

>Since your new version doesn't modify *ts one could use parameter
>list (char *ts) rather than (char **ts) [or could use (char const *ts)
>if you add a (char *)ts cast in memmove call].

Done.

However, the pointer is constant, but the string data is not. So this
is what I used:

trim (char *const ts)
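
The distinction between a constant pointer and constant data can be shown with a short sketch. The function upcase_first is made up for this illustration and is not part of trim().

```c
#include <ctype.h>

/* s is a const pointer to non-const chars: the characters may be
   modified, but s itself cannot be reassigned inside the function. */
static void upcase_first(char *const s)
{
    s[0] = (char) toupper((unsigned char) s[0]);   /* allowed */
    /* s = 0; */   /* would be rejected by the compiler */
}
```

A top-level const on a parameter only constrains the function body; callers pass a plain char * either way.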

James

Aug 22, 2010, 6:44:55 PM

"John Kelly" <j...@isp2dial.com> wrote in message

news:ggb176l3a2lfvnb31...@4ax.com...
[...]

> However, on my Interix test platform, PTRDIFF_MAX is not defined, and
> ptrdiff_t is signed. It's my understanding I can't use the same cast
> trick with a signed type.
>
> So what are my options for determining the maximum value representable
> in a signed ptrdiff_t variable, when PTRDIFF_MAX is not defined?

Well, this crap actually compiles on Comeau:

#if ! defined (SIZE_MAX)
#   define SIZE_MAX ((size_t)-1)
#endif

#if ! defined (PTRDIFF_MAX)
#   if (SIZE_MAX == ULONG_MAX)
#       define PTRDIFF_MAX LONG_MAX
#       define PTRDIFF_MIN LONG_MIN
#   elif (SIZE_MAX == UINT_MAX)
#       define PTRDIFF_MAX INT_MAX
#       define PTRDIFF_MIN INT_MIN
#   elif (SIZE_MAX == USHRT_MAX)
#       define PTRDIFF_MAX SHRT_MAX
#       define PTRDIFF_MIN SHRT_MIN
#   elif (SIZE_MAX == UCHAR_MAX)
#       define PTRDIFF_MAX SCHAR_MAX
#       define PTRDIFF_MIN SCHAR_MIN
#   else
#       error PTRDIFF_MAX cannot be defined!
#   endif
#endif

It also compiles on VS 2010 express and EDG C99...

lol.
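
One caveat on the fallback: when SIZE_MAX comes from the ((size_t)-1) definition, the #if comparisons never see a cast. In a #if expression the preprocessor replaces any identifier that is not a macro with 0, so size_t becomes 0 and the test compares -1 against ULONG_MAX in the preprocessor's widest integer type. That is probably why it compiles everywhere, and it would explain the first branch being taken regardless of the real width of size_t. A different approach is to compute the limit at run time. What follows is only a sketch: ptrdiff_max_guess is a hypothetical name, and the code assumes ptrdiff_t has no padding bits, which C89 does not promise.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper, not from the thread.
   Assumption: ptrdiff_t has no padding bits. */
ptrdiff_t ptrdiff_max_guess(void)
{
    /* 2^(bits-2) is representable in a signed type of that width,
       so this shift cannot overflow. */
    ptrdiff_t half = (ptrdiff_t) 1 << (sizeof(ptrdiff_t) * CHAR_BIT - 2);

    /* 2^(bits-2) + (2^(bits-2) - 1) == 2^(bits-1) - 1, again without
       any intermediate value exceeding the maximum. */
    return half + (half - 1);
}
```

Because the value is not a constant expression, it cannot be used in #if or as an array size, only in ordinary run-time comparisons.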

Peter Nilsson

Aug 22, 2010, 11:20:50 PM

John Kelly <j...@isp2dial.com> wrote:
> trim whitespace, bullet proof version
<snip>

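#include <ctype.h>
#include <string.h>

/* Remove leading whitespace by copying the rest of the string down. */
char *ltrim(char *s)
{
    unsigned char *p = (unsigned char *) s;
    unsigned char *q = (unsigned char *) s;
    while (isspace(*q)) q++;
    if (p != q) while ((*p++ = *q++) != 0) ;
    return s;
}

/* Remove trailing whitespace by moving the terminator back. */
char *rtrim(char *s)
{
    unsigned char *p = (unsigned char *) s;
    unsigned char *q = (unsigned char *) strchr(s, 0);
    while (q != p && isspace(q[-1])) q--;
    *q = 0;
    return s;
}

/* Trim the far end first so that ltrim has less to copy. */
char *trim(char *s)
{
    return ltrim(rtrim(s));
}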

--
Peter

Moonman

Aug 23, 2010, 8:31:41 AM

can even be optimised:
one should not apply this to constant or read only strings though :)

char * trim(char *s)
{
    unsigned char *p = (unsigned char*) s;
    unsigned char *q = (unsigned char*) s;

    while(isspace(*q)) q++;
    if(p != q)
        while((*p++ = *q++) !=0);

    q = p-1;
    p = s;

    while(p != q && isspace(q[-1])) q--;
    q[0] = 0;

    return s;
}

greetz

pete

Aug 23, 2010, 11:30:23 PM

Moonman wrote:

> can even be optimised:
> one should not apply this to constant or read only strings though :)
>
> char * trim(char *s)
> {
> unsigned char *p = (unsigned char*) s;
> unsigned char *q = (unsigned char*) s;
>
> while(isspace(*q)) q++;
> if(p != q)

But if (s) points to the first byte of an object,
and if (p) *does* equal (q) in the above line of code

> while((*p++ = *q++) !=0);
>
> q = p-1;

then (p-1) is undefined in the above line of code.

> p = s;
>
> while(p != q && isspace(q[-1])) q--;
> q[0] = 0;
>
> return s;
> }

--
pete

pete

Aug 24, 2010, 12:06:25 AM

pete wrote:
>
> Moonman wrote:
>
> > can even be optimised:
> > one should not apply this to constant or read only strings though :)
> >
> > char * trim(char *s)
> > {
> > unsigned char *p = (unsigned char*) s;
> > unsigned char *q = (unsigned char*) s;
> >
> > while(isspace(*q)) q++;
> > if(p != q)
>
> But if (s) points to the first byte of an object,
> and if (p) *does* equal (q) in the above line of code
>
> > while((*p++ = *q++) !=0);
> >
> > q = p-1;
>
> then (p-1) is undefined in the above line of code.
>
> > p = s;

The above line requires a cast.

> > while(p != q && isspace(q[-1])) q--;
> > q[0] = 0;
> >
> > return s;
> > }

And the whole function just doesn't work right
when there's trimming that needs to be done only on the far end:

Original string:[ I need trimming on both ends. ]
Trimmed string:[I need trimming on both ends.]
---------------------------------------
Original string:[I need trimming on far end. ]
Trimmed string:[I need trimming on far end. ]
---------------------------------------
Original string:[ I need trimming on near end.]
Trimmed string:[I need trimming on near end.]
---------------------------------------
Original string:[ I need more trimming on both ends. ]
Trimmed string:[I need more trimming on both ends.]
---------------------------------------
Original string:[
I need special trimming on both ends.
]
Trimmed string:[I need special trimming on both ends.]
---------------------------------------
Original string:[
I need special trimming on near end.]
Trimmed string:[I need special trimming on near end.]
---------------------------------------
Original string:[I need special trimming on far end.
]
Trimmed string:[I need special trimming on far end.
]
---------------------------------------
Original string:[I need no trimming.]
Trimmed string:[I need no trimming.]
---------------------------------------
Original string:[]
Trimmed string:[]
---------------------------------------
Original string:[ ]
Trimmed string:[]
---------------------------------------
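
For comparison, here is a sketch of a single-function version that avoids both problems: it never forms a pointer before the start of the string, and it finds the far end whether or not anything was skipped at the near end. The name trim_inplace is hypothetical, and this is one possible arrangement rather than a repair of the exact code quoted above.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical name. Trims s in place and returns s.
   s must point to a writable, null-terminated string. */
char *trim_inplace(char *s)
{
    unsigned char *start = (unsigned char *) s;
    unsigned char *end;
    size_t len;

    /* Skip leading whitespace; the terminator is not whitespace,
       so this stops at the end of the string at the latest. */
    while (isspace(*start)) start++;

    /* Locate the terminator from wherever the skip stopped. */
    end = start + strlen((char *) start);

    /* Walk back over trailing whitespace, never going below start. */
    while (end != start && isspace(end[-1])) end--;

    len = (size_t) (end - start);

    /* Shift the kept characters down only if something was skipped. */
    if (start != (unsigned char *) s) {
        memmove(s, start, len);
    }
    s[len] = '\0';
    return s;
}
```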

--
pete

James

Aug 27, 2010, 4:49:32 AM

"John Kelly" <j...@isp2dial.com> wrote in message

news:q341765mghpkvdlgs...@4ax.com...
[...]

>>errno is not thread safe, what will you do if the user is calling trim
>>in a multi-threaded scenario?
>

> linux fork() is fast enough that many apps don't need threads. Someone
> who needs threads can try writing trim_r().

You can write a low-level trim function that is read-only, e.g., one that
returns the offset and length of the trimmed string. Why exactly would you
need a special trim_r() variant for a multi-threaded environment?
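
A rough sketch of what such a read-only function could look like. The struct and function names are hypothetical. Nothing here writes to the input or to errno, so it should be callable from several threads at once, provided nobody modifies the string or changes the locale at the same time.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type: where the trimmed text starts and how long it is. */
struct trim_span {
    size_t offset;
    size_t length;
};

/* Hypothetical name. Reads s without modifying it. */
struct trim_span trim_span_of(const char *s)
{
    const unsigned char *p = (const unsigned char *) s;
    struct trim_span r;
    size_t begin = 0;
    size_t end;

    /* Skip leading whitespace. */
    while (isspace(p[begin])) begin++;

    /* Index of the terminator, measured from the start of s. */
    end = begin + strlen(s + begin);

    /* Back up over trailing whitespace, never past begin. */
    while (end > begin && isspace(p[end - 1])) end--;

    r.offset = begin;
    r.length = end - begin;
    return r;
}
```

A caller that only wants to print the result can use printf ("[%.*s]\n", (int) r.length, s + r.offset) and leave the original buffer alone; a caller that wants the old in-place behaviour can follow up with memmove.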