As far as I know, strtok_r is the re-entrant version of strtok.
strtok_r() is first called with the string to tokenize (say s1) as its first parameter.
The remaining tokens from s1 are obtained by calling strtok_r() with a
null pointer as the first parameter.
My confusion is that this behavior is the same as strtok's. So I assume
strtok_r must also be using a function-level static variable to keep the
information about s1. If that is the case, how can strtok_r be
re-entrant?
Otherwise, how does it keep the information about s1?
Regards,
Siddharth
The reentrant version takes one more argument where it stores its progress:
http://www.bullfreeware.com/download/sources/aix43/libgtop-1.0.9.tar.gz/libgtop-1.0.9/support
// Skip GNU copyright
#include <string.h>
/* Parse S into tokens separated by characters in DELIM.
If S is NULL, the saved pointer in SAVE_PTR is used as
the next starting point. For example:
char s[] = "-abc-=-def";
char *sp;
x = strtok_r(s, "-", &sp); // x = "abc", sp = "=-def"
x = strtok_r(NULL, "-=", &sp); // x = "def", sp = "" (points at the terminator)
x = strtok_r(NULL, "=", &sp); // x = NULL
// s = "-abc\0=-def\0"
*/
char *strtok_r (char *s,
                const char *delim,
                char **save_ptr)
{
  char *token;

  if (s == NULL)
    s = *save_ptr;

  /* Scan leading delimiters. */
  s += strspn (s, delim);
  if (*s == '\0')
    {
      /* Record the position so a later call with S == NULL is safe. */
      *save_ptr = s;
      return NULL;
    }

  /* Find the end of the token. */
  token = s;
  s = strpbrk (token, delim);
  if (s == NULL)
    /* This token finishes the string. */
    *save_ptr = strchr (token, '\0');
  else
    {
      /* Terminate the token and make *SAVE_PTR point past it. */
      *s = '\0';
      *save_ptr = s + 1;
    }
  return token;
}
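To see why the extra parameter makes the function re-entrant, here is a minimal sketch that tokenizes in two nested loops, each with its own save pointer. The names tok_r and count_fields are hypothetical: tok_r is a local copy of the logic shown above, so the example does not assume that the system provides strtok_r.

```c
#include <stddef.h>
#include <string.h>

/* Local stand-in for strtok_r (hypothetical name). All of the state
   lives in *save, which the caller owns. */
char *tok_r(char *s, const char *delim, char **save)
{
    char *token;

    if (s == NULL)
        s = *save;
    s += strspn(s, delim);          /* skip leading delimiters */
    if (*s == '\0') {
        *save = s;
        return NULL;
    }
    token = s;
    s += strcspn(s, delim);         /* find the end of the token */
    if (*s != '\0')
        *s++ = '\0';                /* terminate it and step past */
    *save = s;
    return token;
}

/* Counts the ','-separated fields in every ';'-separated record.
   The outer and inner loops interleave their calls, which works only
   because each loop keeps its own save pointer. */
size_t count_fields(char *text)
{
    char *outer, *inner;
    char *rec, *fld;
    size_t n = 0;

    for (rec = tok_r(text, ";", &outer); rec != NULL;
         rec = tok_r(NULL, ";", &outer)) {
        for (fld = tok_r(rec, ",", &inner); fld != NULL;
             fld = tok_r(NULL, ",", &inner))
            n++;
    }
    return n;
}
```

Written with plain strtok, the inner loop would overwrite the single hidden position, and the outer loop would stop after the first record.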
> As I know strtok_r is re-entrant version of strtok.
This is true on a system compliant with, e.g., POSIX, but it is
not required by C. Followups set.
> [...misunderstanding...]
I think the problem is that you do not realize that strtok_r
takes one more parameter than strtok, and uses that parameter to
save state from one call to the next.
--
char a[]="\n .CJacehknorstu";int putchar(int);int main(void){unsigned long b[]
={0x67dffdff,0x9aa9aa6a,0xa77ffda9,0x7da6aa6a,0xa67f6aaa,0xaa9aa9f6,0x11f6},*p
=b,i=24;for(;p+=!*p;*p/=4)switch(0[p]&3)case 0:{return 0;for(p--;i--;i--)case+
2:{i++;if(i)break;else default:continue;if(0)case 1:putchar(a[i&15]);break;}}}
strtok_r takes an extra parameter, a pointer to a char * where it stores its
current state.
The implementation is quite straightforward:
char *strtok_r(char *str, const char *delim, char **nextp)
{
    char *ret;

    if (str == NULL)
        str = *nextp;
    str += strspn(str, delim);      /* skip leading delimiters */
    if (*str == '\0') {
        *nextp = str;               /* keep the state valid for later calls */
        return NULL;
    }
    ret = str;
    str += strcspn(str, delim);     /* find the end of the token */
    if (*str)
        *str++ = '\0';
    *nextp = str;
    return ret;
}
--
Chqrlie.
There is no such standard C function as strtok_r(). To discuss
such a function you have to give its source, in standard C.
However, I just happen to have a suitable replacement function
lying about, whose source follows:
/* ------- file tknsplit.c ----------*/
#include "tknsplit.h"
/* copy over the next tkn from an input string, after
skipping leading blanks (or other whitespace?). The
tkn is terminated by the first appearance of tknchar,
or by the end of the source string.
The caller must supply sufficient space in tkn to
receive any tkn, Otherwise tkns will be truncated.
Returns: a pointer past the terminating tknchar.
This will happily return an infinity of empty tkns if
called with src pointing to the end of a string. Tokens
will never include a copy of tknchar.
A better name would be "strtkn", except that is reserved
for the system namespace. Change to that at your risk.
released to Public Domain, by C.B. Falconer.
Published 2006-02-20. Attribution appreciated.
Revised 2006-06-13 2007-05-26 (name)
*/
const char *tknsplit(const char *src, /* Source of tkns */
                     char tknchar,    /* tkn delimiting char */
                     char *tkn,       /* receiver of parsed tkn */
                     size_t lgh)      /* length tkn can receive */
                                      /* not including final '\0' */
{
   if (src) {
      while (' ' == *src) src++;

      while (*src && (tknchar != *src)) {
         if (lgh) {
            *tkn++ = *src;
            --lgh;
         }
         src++;
      }
      if (*src && (tknchar == *src)) src++;
   }
   *tkn = '\0';
   return src;
} /* tknsplit */
#ifdef TESTING
#include <stdio.h>
#define ABRsize 6 /* length of acceptable tkn abbreviations */
/* ---------------- */
static void showtkn(int i, char *tok)
{
   putchar(i + '1'); putchar(':');
   puts(tok);
} /* showtkn */
/* ---------------- */
int main(void)
{
   char teststring[] = "This is a test, ,, abbrev, more";
   const char *t, *s = teststring;
   int i;
   char tkn[ABRsize + 1];

   puts(teststring);
   t = s;
   for (i = 0; i < 4; i++) {
      t = tknsplit(t, ',', tkn, ABRsize);
      showtkn(i, tkn);
   }

   puts("\nHow to detect 'no more tkns' while truncating");
   t = s; i = 0;
   while (*t) {
      t = tknsplit(t, ',', tkn, 3);
      showtkn(i, tkn);
      i++;
   }

   puts("\nUsing blanks as tkn delimiters");
   t = s; i = 0;
   while (*t) {
      t = tknsplit(t, ' ', tkn, ABRsize);
      showtkn(i, tkn);
      i++;
   }
   return 0;
} /* main */
#endif
/* ------- end file tknsplit.c ----------*/
/* ------- file tknsplit.h ----------*/
#ifndef H_tknsplit_h
# define H_tknsplit_h
# ifdef __cplusplus
extern "C" {
# endif
#include <stddef.h>
/* copy over the next tkn from an input string, after
skipping leading blanks (or other whitespace?). The
tkn is terminated by the first appearance of tknchar,
or by the end of the source string.
The caller must supply sufficient space in tkn to
receive any tkn, Otherwise tkns will be truncated.
Returns: a pointer past the terminating tknchar.
This will happily return an infinity of empty tkns if
called with src pointing to the end of a string. Tokens
will never include a copy of tknchar.
released to Public Domain, by C.B. Falconer.
Published 2006-02-20. Attribution appreciated.
revised 2007-05-26 (name)
*/
const char *tknsplit(const char *src, /* Source of tkns */
char tknchar, /* tkn delimiting char */
char *tkn, /* receiver of parsed tkn */
size_t lgh); /* length tkn can receive */
/* not including final '\0' */
# ifdef __cplusplus
}
# endif
#endif
/* ------- end file tknsplit.h ----------*/
--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>
--
Posted via a free Usenet account from http://www.teranews.com
Come on, strtok_r is part of POSIX. Do you pretend POSIX is not popular
enough?
Multiple implementations of strtok_r had been posted before your answer.
>
> /* ------- file tknsplit.c ----------*/
> #include "tknsplit.h"
>
> /* copy over the next tkn from an input string, after
> skipping leading blanks (or other whitespace?). The
Why skip blanks? This is not strtok behaviour.
The code and the comment don't agree on what blanks are: by the C99 Standard,
blanks are space and tab.
> tkn is terminated by the first appearance of tknchar,
> or by the end of the source string.
Your function definitely differs a lot from strtok, which takes a collection
of delimiters instead of a single char.
> The caller must supply sufficient space in tkn to
> receive any tkn, Otherwise tkns will be truncated.
>
> Returns: a pointer past the terminating tknchar.
>
> This will happily return an infinity of empty tkns if
> called with src pointing to the end of a string. Tokens
> will never include a copy of tknchar.
Again, this is not the behaviour of strtok: sequences of separators are
considered one.
> A better name would be "strtkn", except that is reserved
> for the system namespace. Change to that at your risk.
>
> released to Public Domain, by C.B. Falconer.
> Published 2006-02-20. Attribution appreciated.
> Revised 2006-06-13 2007-05-26 (name)
> */
>
> const char *tknsplit(const char *src, /* Source of tkns */
> char tknchar, /* tkn delimiting char */
> char *tkn, /* receiver of parsed tkn */
> size_t lgh) /* length tkn can receive */
> /* not including final '\0' */
I have reservations about your API:
- Instead of returning a const char *, you should return the number of chars
skipped. That would prevent const poisoning, where you pass a regular char *
but cannot store the return value into the same variable. It would also allow
trivial testing for the end of the string.
- The lgh parameter should be the size of the destination array
(sizeof(buf)), for consistency with other C library functions such as
snprintf, and to avoid off-by-one errors: if callers passed
sizeof(destbuf) - 1 under that convention, they would not invoke UB, whereas
passing sizeof(destbuf) does invoke UB with your current semantics.
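As a rough sketch of what those two suggestions might look like together (the name tknsplit_n and its exact semantics are assumptions for illustration, not part of the posted code), the function below returns the number of characters consumed from src and treats size as the full size of the destination array, terminator included:

```c
#include <stddef.h>

/* Hypothetical variant of tknsplit. Returns how many characters of src
   were consumed (0 means src was already at its end). size is the
   whole size of tkn, as in snprintf, so sizeof(buf) is the right
   argument. Tokens that do not fit are truncated. */
size_t tknsplit_n(const char *src, char tknchar, char *tkn, size_t size)
{
    size_t pos = 0;   /* index into src */
    size_t len = 0;   /* characters stored in tkn */

    while (src[pos] == ' ')
        pos++;
    while (src[pos] != '\0' && src[pos] != tknchar) {
        if (len + 1 < size)          /* leave room for the '\0' */
            tkn[len++] = src[pos];
        pos++;
    }
    if (src[pos] != '\0' && src[pos] == tknchar)
        pos++;                       /* step past the delimiter */
    if (size > 0)
        tkn[len] = '\0';
    return pos;
}
```

A caller would advance with src += tknsplit_n(src, ',', buf, sizeof buf) and stop when the call returns 0, so a plain char * can be used throughout.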
Posting the source code to a public version of strtok_r would have been more
helpful.
The only advantage your function offers over strtok_r is the fact that it
does not modify the source string.
--
Chqrlie.
[snip]
>> There is no such standard C function as strtok_r(). To discuss
>> such a function you have to give its source, in standard C.
>> However, I just happen to have a suitable replacement function
>> lying about, whose source follows:
>
> Come on, strtok_r is part of POSIX. Do you pretend POSIX is not popular
> enough.
POSIX is very popular. So is cricket. Neither, however is topical here.
If there were no other place where POSIX were already discussed, one
would have been created, given its popularity.
POSIX is discussed on comp.unix.programmer, and the people there are
very knowledgeable about the subject.
Regards,
Martien
--
|
Martien Verbruggen | Failure is not an option. It comes bundled
| with your Microsoft product.
|
POSIX may not be topical here, but mentioning strtok_r as a widely available
_fixed_ version of broken strtok is more helpful to the OP than the useless
display of obtuse chauvinism expressed ad nauseam by some of the group's
regulars.
Why C99 was published without the reentrant alternatives to
strtok and similar functions is a mystery. I guess the national bodies were
too busy arguing about iso646.h. Other POSIX utility functions are missing
for no reason: strdup, for instance. Did the POSIX guys patent those, or is
WG14 allergic to Unix?
--
Chqrlie.
Popularity doesn't enter into it. Presence in the standard library
does. strtok_r doesn't exist there. That makes it off-topic here
in c.l.c. (barring source).
>>
>> /* ------- file tknsplit.c ----------*/
>> #include "tknsplit.h"
>>
>> /* copy over the next tkn from an input string, after
>> skipping leading blanks (or other whitespace?). The
>
> Why skip blanks ? this is not strtok behaviour. The code and the
> comment don't agree on what blanks are: by C99 Standard, blanks are
> space and tab.
This is not strtok. It is tknsplit. This is behaviour that seems
more useful to me. You don't have to use it, but siddhu may wish
to.
>
... snip ...
>
> Posting the source code to a public version strtok_r would have
> been more helpful. The only advantage your function offers over
> strtok_r is the fact that it does not modify the source string.
Which, IMO, is a major improvement. It also detects missing
tokens. It (once more) is NOT strtok. I have no idea what
strtok_r is, except that it invades user namespace.
You can easily write your own version of strdup in a couple lines. I use
the following:
char *strdup(char *s)
{
    char *r=0;
    int i=0;
    do {
        r=(char *) realloc(r,++i * sizeof(char));
    } while(r[i-1]=s[i-1]);
    return r;
}
I did post source code (my own, put in the public domain)
>>>
>>> /* ------- file tknsplit.c ----------*/
>>> #include "tknsplit.h"
>>>
>>> /* copy over the next tkn from an input string, after
>>> skipping leading blanks (or other whitespace?). The
>>
>> Why skip blanks ? this is not strtok behaviour. The code and the
>> comment don't agree on what blanks are: by C99 Standard, blanks are
>> space and tab.
>
> This is not strtok. It is tknsplit. This is behaviour that seems
> more useful to me. You don't have to use it, but siddhu may wish
> to.
You introduced your function like this: "I just happen to have a suitable
replacement function"
One would expect semantics to be a tad closer.
>>
> ... snip ...
>>
>> Posting the source code to a public version strtok_r would have
>> been more helpful. The only advantage your function offers over
>> strtok_r is the fact that it does not modify the source string.
>
> Which, IMO, is a major improvement. It also detects missing
> tokens. It (once more) is NOT strtok. I have no idea what
> strtok_r is, except that it invades user namespace.
You must be joking, Mr Falconer. You have probably never heard of Unix, or even
Linux... Or do you live on some remote planet Microsoft has not settled yet?
If you have no idea what strtok_r is, learn something new today:
http://linux.die.net/man/3/strtok_r or if you like Microsoft's version
better (part of the secure string proposal)
http://msdn2.microsoft.com/en-us/library/ftsafwz3(VS.80).aspx
--
Chqrlie
This proves my point.
Adding useful functions like strdup would prevent newbies and jokers from
re-inventing them in the most cumbersome, inefficient, ugly, error-prone
ways.
Your function should take a const char *.
sizeof(char) is 1 by definition.
Why do you cast the result of realloc?
Your function invokes undefined behaviour when it runs out of memory; it
should return NULL instead.
--
Chqrlie.
<snip>
> I have no idea what strtok_r is, except that it invades user
> namespace.
No, it doesn't.
--
Richard Heathfield <http://www.cpax.org.uk>
Email: -www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
> "CBFalconer" <cbfal...@yahoo.com> wrote in message news:
> 46EB3092...@yahoo.com...
<snip>
>> I have no idea what
>> strtok_r is, except that it invades user namespace.
>
> You must be joking Mr Falconer.
No, he's toeing the group line, such as it is. As far as comp.lang.c is
concerned, there is *no such function* as strtok_r. If this question
were to arise in, say, comp.unix.programmer, Chuck's answer might be
very different.
[...]
> I have no idea what
> strtok_r is, except that it invades user namespace.
If you have no idea what those *_r functions are, it's time for you (as
a Linux user) to read Stevens APUE! :)
str[a-z] is reserved name space, so it isn't part of the user name space.
--
Tor <torust [at] online [dot] no>
I have seen Chuck post a number of times over in Linux forums, so it's
rather surprising if he doesn't know about POSIX.
Methinks he knows, but chooses here to pretend he doesn't! :-)
EXACTLY!
> Why C99 was published without the reentrant alternatives to
> strtok and similar functions is a mystery. I guess the national bodies were
> too busy arguing about iso646.h. Other POSIX utility functions are missing
> for no reason: strdup, for instance. Did the POSIX guys patent those, or is
> WG14 allergic to Unix?
>
C99 did not change ANY of the bugs of the standard library
o non reentrant functions like strtok remained and no alternative
was proposed even if POSIX had developed one.
o Buffer overflows were written into the standard itself.
I had a lengthy discussion in comp.std.c about asctime()
and the fixed buffer of 26 characters it says it needs. It
suffices to put some wrong values into the input structure
and you have a buffer overflow. But no corrective action
was taken. Moreover, the committee told the people reporting
the bug that it was OK to have a buffer overflow there.
o gets() was maintained of course. Only after lengthy discussions,
Mr Gwyn felt forced to propose a "fix" that would have fixed the
input buffer size to at least BUFSIZ. The committee apparently
decided that gets() was deprecated, maybe because of the discussion
in comp.std.c, I do not know. In any case it would have been
better to do it when C99 was published.
o Trigraphs were maintained in the standard.
And I could go on with those examples...
Instead of using realloc in a loop,
I think most programmers would write strdup with
one function call to strlen and one to malloc and one to strcpy.
--
pete
<snip>
> Instead of using realloc in a loop,
> I think most programmers would write strdup with
> one function call to strlen and one to malloc and one to strcpy.
memcpy, surely? Why measure the string twice?
If he would give a different answer on a different group, one of these
statements would be a lie or a joke.
So he is a fundamentalist, ostracist, extremist...
--
Chqrlie.
...or a way of making a point, a la "Ich bin ein Berliner", with which
John F Kennedy bolstered the morale of West Berlin's citizens in June
1963. It was not "true" in the literal sense, but neither was it a lie
or a joke.
> So he is a fundamentalist, ostracist, extremist...
If you feel forced to resort to personal attacks, I can only assume you
have no logical arguments to put forward.
Personal attacks are allowed only for friends of Heathfield & Co.
Or more efficiently calling memcpy instead of strcpy.
char *strdup(const char *str) {
    size_t len;
    char *dest = NULL;

    if (str) {
        len = strlen(str);
        dest = malloc(len + 1);
        if (dest) {
            memcpy(dest, str, len);
            dest[len] = '\0';
        }
    }
    return dest;
}
--
Chqrlie.
Even if it doesn't run out of memory, there's no reason to assume
realloc won't return a pointer to a different area of memory each
time: with the code above, this will lead to 1) memory leaks; 2) the
first part of the string is not copied properly.
The name strdup is also reserved for the implementation.
My suggestion would be:
#include <stdlib.h>
#include <string.h>
char *my_strdup(const char *s)
{
    size_t len;
    char *t;
    if(t=malloc(len=strlen(s)+1))
        memcpy(t, s, len);
    return t;
}
>
> --
> Chqrlie.
Except CBFalconer is no John F. Kennedy ;-)
His blunt rhetoric does not bolster anyone's morale; sarcasm does no good.
>> So he is a fundamentalist, ostracist, extremist...
>
> If you feel forced to resort to personal attacks, I can only assume you
> have no logical arguments to put forward.
You are right, I should not have attributed to malice that which can be
adequately explained by plain ignorance. But I repeat: to stop using strtok,
check for the availability of strtok_r or implement it locally from the
public domain source that has been posted above.
--
Chqrlie.
My suggestion would be:
#include <stdlib.h>
#include <string.h>
If s is NULL, this version only returns NULL if the implementation's
malloc(0) returns NULL too.
Bye, Jojo
Only the last sentence was mine...
No assumption is made about the return value of realloc pointing to
the same area. The above code will indeed cause a memory leak when
running out of memory, but undefined behaviour will have been invoked
already, since NULL is dereferenced at that point. Apart from that, the string
is copied correctly because realloc does preserve the contents of the
block it reallocates, up to the smaller of the old and new sizes.
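To make both points concrete, here is a minimal sketch that keeps the same one-byte-at-a-time growth but checks the result of realloc before using it. The name dup_by_growing is hypothetical, and the approach is kept only to illustrate realloc's behaviour, not as a recommended way to copy a string.

```c
#include <stdlib.h>

/* Hypothetical name. Grows the copy by one byte per iteration, as in
   the version under discussion, but never dereferences a null pointer
   and never leaks the old block. */
char *dup_by_growing(const char *s)
{
    char *r = NULL;
    size_t n = 0;

    do {
        char *tmp = realloc(r, n + 1);  /* bytes 0..n-1 are preserved */
        if (tmp == NULL) {
            free(r);                    /* the old block is still valid */
            return NULL;
        }
        r = tmp;
        r[n] = s[n];
    } while (s[n++] != '\0');
    return r;
}
```

Assigning the result to a temporary first is what avoids the leak: on failure realloc leaves the original block untouched, so it can still be freed.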
> The name strdup is also reserved for the implementation.
That's one more reason it should have been standardized in C99.
> My suggestion would be:
>
> #include <stdlib.h>
> #include <string.h>
>
> char *my_strdup(const char *s)
> {
>     size_t len;
>     char *t;
>     if(t=malloc(len=strlen(s)+1))
>         memcpy(t, s, len);
>     return t;
> }
Your code performs the task, but I find it misleading to call a variable len
when it is not the length of the string. I prefer to use size for
this purpose.
Furthermore, this code would not pass my default warning settings.
Assignment as a test expression is considered sloppy and error-prone.
--
Chqrlie.
Entirely consistent with the standard library string functions - if
you pass them a char * that doesn't point to a string, the behavior is
undefined.
<OT>
And this is one case where "the thing you hope will happen" probably
doesn't - e.g. trying to compute strlen(NULL) on a GNU system produces
a seg fault).
</OT>
>
> Bye, Jojo
Tell that to Kernighan and Ritchie. :)
>
> --
> Chqrlie.
Is there a reason for the typo in your signature?
And it does not make much sense ;-)
If s is NULL, strlen(s) invokes undefined behaviour.
Otherwise, len is always > 0, and the code does not depend on the behaviour
of malloc(0).
--
Chqrlie.
Oh well...
Bye, Jojo
> <OT>
> And this is one case where "the thing you hope will happen" probably
> doesn't - e.g. trying to compute strlen(NULL) on a GNU system produces
> a seg fault).
> </OT>
Damn, here too...
anyway: see above
Bye, Jojo
They might read this thread, I am sure they would care to comment.
Coding conventions are a very effective tool for catching bugs at an early
stage in development. Using all the help the compiler and other
automated tools can give in tracking potential errors disguised as
suspicious uses of certain operators enhances productivity.
There is no gain in writing
size_t len;
char *t;
if(t=malloc(len=strlen(s)+1)) ...
instead of
size_t size = strlen(s) + 1;
char *t = malloc(size);
if (t) ...
The latter is much more readable and less error prone.
Your version did improve on mine by using memcpy to copy the '\0'
instead of writing separate code for that.
--
Chqrlie.
> Is there a reason for the typo in your signature?
chqrlie is my handle. Is there a reason you don't sign your messages?
True but quite easy to fix:
char *my_strdup(const char *str) {
    char *dest = NULL;

    if (str) {
        size_t size = strlen(str) + 1;
        dest = malloc(size);
        if (dest)
            memcpy(dest, str, size);
    }
    return dest;
}
This newsgroup is comp.lang.c. C is defined by the various C
standards, present or past, and includes K&R for times previous to
1989. None of these define, or even mention, strtok_r. Thus,
without standard C code, published in the same message, discussion
of it is off-topic here. The name is still reserved for the
implementor. As I said, it doesn't exist. Unix, Linux, Microsoft
have no influence whatsoever.
--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>
Well, it's not exactly a typo, but ....
I challenge the 'more efficient'. It will be highly dependent on
the compiler, but at the simplest you would be trading the effort
of an extra procedure call against the possible efficiency
improvement. Since most strings are short (in my case, probably
under 10 or 20 chars) this 'improvement' is a chimera. Also
bearing in mind that strdup is a system reserved name, my version
(with a #include <stdlib.h>) is:
char *dupstr(const char *str) {
   char *dest, *temp;

   if (dest = malloc(1 + strlen(str))) {
      temp = dest;
      while (*temp++ = *str++) continue;
   }
   return dest;
}
and I am willing to let it go boom when str is NULL, for early
warning etc. of problems.
That is not a C system. strlen returns a size_t, which is
unsigned, and thus can never return -1.
Bye, Jojo
Gee, I'm curious what implementation documents this behaviour.
--
Chqrlie.
Bye, Jojo
I agree with you, I am just used to memcpy, strlen and strcpy all expanding
inline.
But performance is better measured with a profiler than estimated by mental
projection.
> Also bearing in mind that strdup is a system reserved name,
I'm aware of that, most systems I use provide it.
> my version
> (with a #include <stdlib.h>) is:
>
> char *dupstr(const char *str) {
>    char *dest, *temp;
>
>    if (dest = malloc(1 + strlen(str))) {
>       temp = dest;
>       while (*temp++ = *str++) continue;
>    }
>    return dest;
> }
>
> and I am willing to let it go boom when str is NULL, for early
> warning etc. of problems.
That's a valid choice. Specifying that strdup(NULL) -> NULL would make
sense too.
I think it should be documented either way.
--
Chqrlie.
I prefer
char *strdup(const char *str)
{
    char *dest = NULL;
    if (str) {
        size_t size = strlen(str)+1;
        dest = malloc(size);
        if (dest)
            memcpy(dest, str, size);
    }
    return dest;
}
Or
char *strdup(const char *str)
{
    if (!str)
        return NULL;
    else {
        size_t size = strlen(str)+1;
        char *dest = malloc(size);
        if (dest)
            memcpy(dest, str, size);
        return dest;
    }
}
--
Flash Gordon
--
Joe Wright
"Everything should be made as simple as possible, but not simpler."
--- Albert Einstein ---
> Richard Heathfield wrote:
>> pete said:
>>
>> <snip>
>>
>>> Instead of using realloc in a loop,
>>> I think most programmers would write strdup with
>>> one function call to strlen and one to malloc and one to strcpy.
>>
>> memcpy, surely? Why measure the string twice?
>>
> Huh? strcpy doesn't measure anything.
Sorry, Joe - I was guilty of truncated exegesis. What I meant was this:
that strcpy must keep going until it hits a null terminator, and it
doesn't know in advance where that null terminator will be found, so it
must test every character. So, although it isn't measuring the string
as such, that's only because it doesn't bother to write down how long
the string is. It's still ploughing through the string, character by
character. But we've already done that with our strlen call. By using
memcpy, we can take advantage of the fact that the string has already
been measured - memcpy can use any number of platform-specific tricks
for copying multiple bytes at a time. Therefore, if the length of the
string to be copied is known in advance, it is (likely to be) more
efficient to use memcpy than strcpy.
Well, maybe.
The standard function strtok() is non-reentrant *and* it has some
other -- well, not bugs necessarily, but quirks. For example, the
fact that it merges multiple adjacent delimiters can be inconvenient,
though it might be just what you need. (In practice, you usually want
this behavior if the delimiter is whitespace, but not if it's
something else.)
If you're using strtok() and it already does exactly what you want
except for the lack of reentrancy, then strtok_r() (if it's available
on your system -- and if not, you can compile it yourself) is just the
thing. If your requirements are less specific, then tknsplit() might
turn out to be perfect for you -- or some other non-standard function
might be better.
--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Sure it can:
size_t strlen(const char *s)
{
    if (s == NULL) return -1;
    /* ... */
}
(Of course the value -1 will be converted to size_t.)
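The conversion can be checked with a small sketch. The wrapper name checked_strlen is hypothetical; it only stands in for the imagined library behaviour above, since a program cannot portably redefine strlen itself.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wrapper, not a real library function: returns
   (size_t)-1 for a null pointer, otherwise the ordinary length. */
size_t checked_strlen(const char *s)
{
    if (s == NULL)
        return (size_t)-1;   /* -1 converts to SIZE_MAX */
    return strlen(s);
}
```

Because size_t is unsigned, the caller sees SIZE_MAX rather than a negative number, so any comparison against -1 also goes through that conversion.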
--
keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
How is returning (size_t)-1 better than a seg fault?
If I pass NULL to strlen(), there's a bug in my program. I'd like to
find out about it as early as possible. If strlen() quietly returns
-1, I might not detect the error until much later.
--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Ask the people who were on the committees. :-) (Seriously, I do not
know why the non-reentrant versions were retained without at least
some sort of cleanup.)
In article <46eb974e$0$5080$ba4a...@news.orange.fr>
jacob navia <ja...@jacob.remcomp.fr> wrote:
>C99 did not change ANY of the bugs of the standard library
I am not sure I would call all of these "bugs". ("Misfeatures",
perhaps, especially trigraphs :-) . More seriously, just two
points here:)
>o non reentrant functions like strtok remained and no alternative
> was proposed even if POSIX had developed one.
While strtok_r() is an improvement on strtok(), it leaves one of
strtok()'s fundamental flaws in place. If one is going to "improve"
strtok(), one should at least look at the BSD strsep().
Still, importing the whole set of POSIX "_r" functions would, I
think, have been better than doing nothing.
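For readers who have not met strsep, here is a minimal local sketch of the behaviour BSD documents for it. The name sep_token is hypothetical, so the example does not assume strsep is available. One visible difference from strtok and strtok_r is that adjacent delimiters yield an empty token instead of being merged.

```c
#include <stddef.h>
#include <string.h>

/* Local sketch of strsep-style splitting (hypothetical name).
   *stringp is both the input position and the saved state; it is set
   to NULL once the last token has been returned. */
char *sep_token(char **stringp, const char *delim)
{
    char *start = *stringp;
    char *end;

    if (start == NULL)
        return NULL;
    end = start + strcspn(start, delim);  /* first delimiter or '\0' */
    if (*end != '\0') {
        *end = '\0';
        *stringp = end + 1;
    } else {
        *stringp = NULL;                  /* that was the last token */
    }
    return start;
}
```

Splitting "a,,b" on "," this way produces "a", "" and "b", which lets the caller detect a missing field.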
>o Buffer overflows were written into the standard itself.
This is, at best, an overstatement.
> I had a lengthy discussion in comp.std.c about asctime()
> and the fixed buffer of 26 position it says it needs. It
> suffices to put some wrong values into the input structure
> and you have a buffer overflow.
If you "put some wrong values" in, you have little hope of expecting
*anything* -- what happens in lcc-win32, for instance, if I write:
struct big { int a[1000]; };
struct big main(double oops) {
short x = strlen((char *)0x98766542);
... /* more "wrong values" as inputs as needed */
return *(struct big *)42;
}
? If you want to protect against bad inputs, you need to think
hard about which kinds of "bad inputs" to guard against, and do
some serious cost/benefit analysis.
Moreover, if your objection is that values of .tm_year greater
than 8100 (or less than or equal to some negative number) cause
problems, you can always test for that in your own implementation:
__internal_return_type __internal_worker_function_for_times(...) {
...
if (OUT_OF_RANGE(tm->tm_year)) ... signal error ...
...
}
which might be used as, e.g.:
char *asctime(const struct tm *tm) {
...
if (__internal_worker_function_for_times(...) == ERROR)
__runtime_error_trap_report("invalid parameter to asctime()");
...
}
and thus demonstrate the superior Quality of Implementation of
lcc-win32, with regard to this particular possibility. (Presumably
__runtime_error_trap_report saves the state of the program for use
in the debugger, prints a stack trace, and/or does whatever else
is good for fixing program bugs.)
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.
By returning -1, strlen is not being so quiet. If you check the function's return
value, you can catch the error at least as well as if it had just segfaulted.
It might not segfault for some reason, but you would always be able to check the
return value and tell that something is wrong.
I suppose I was unduly influenced by the strdup example in K&R2.
--
pete
>> o Buffer overflows were written into the standard itself.
>
> This is, at best, an overstatement.
>
A buffer overflow happens when a fixed size memory area is defined
but a program writes PAST the fixed size buffer. This is a buffer
overflow.
Now, the standard specifies a buffer length of 26 for the buffer of
asctime.
In the official C standard of 1999 we find the specifications of the
“asctime” function, page 341:
char *asctime(const struct tm *timeptr)
{
    static const char wday_name[7][3] = {
        "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"
    };
    static const char mon_name[12][3] = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
    };
    static char result[26]; // <<<<<<<------------------------!!
    sprintf(result, "%.3s %.3s%3d %.2d:%.2d:%.2d %d\n",
            wday_name[timeptr->tm_wday],
            mon_name[timeptr->tm_mon],
            timeptr->tm_mday, timeptr->tm_hour,
            timeptr->tm_min, timeptr->tm_sec,
            1900 + timeptr->tm_year);
    return result;
}
Nowhere is it specified that the tm_year value should be less than 8100.
> If you "put some wrong values" in, you have little hope of expecting
> *anything*
Of course. This is exactly the kind of sloppy specifications
attitude where anything goes, and no error analysis is ever done!
Shouldn't a seriously designed function have some way of
indicating an error when some of its inputs are wrong instead of
just making a buffer overflow?
Shouldn't a standard specify either:
o a bigger buffer to accommodate ANY year up to INT_MAX?
o a maximum year where the standard says (at least) that years must be
smaller than 8100 and sets upper and lower bounds for the input
data???
THAT would be a correctly specified function. UB would be clearly
signaled. In the text as it stands in the standard there is NO MENTION
of any limit!!!
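As an illustration of the kind of error reporting being asked for, here is a minimal sketch. The helper name format_tm_checked and its return convention are assumptions, not anything the standard specifies. It range-checks the two array indices and lets snprintf report whether the result fits in the caller's buffer.

```c
#include <limits.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: formats *tm in the asctime layout into buf.
   Returns the number of characters written, or -1 if a field is out
   of range or the text does not fit in size bytes (buf may then hold
   a truncated string). */
int format_tm_checked(const struct tm *tm, char *buf, size_t size)
{
    static const char *const wday_name[7] = {
        "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"
    };
    static const char *const mon_name[12] = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
    };
    int n;

    if (tm->tm_wday < 0 || tm->tm_wday > 6 ||
        tm->tm_mon < 0 || tm->tm_mon > 11)
        return -1;                  /* would index past an array */
    if (tm->tm_year > INT_MAX - 1900)
        return -1;                  /* 1900 + tm_year would overflow */

    n = snprintf(buf, size, "%.3s %.3s%3d %.2d:%.2d:%.2d %d\n",
                 wday_name[tm->tm_wday], mon_name[tm->tm_mon],
                 tm->tm_mday, tm->tm_hour, tm->tm_min, tm->tm_sec,
                 1900 + tm->tm_year);
    if (n < 0 || (size_t)n >= size)
        return -1;                  /* did not fit */
    return n;
}
```

With a 26-byte buffer this accepts any four-digit year and rejects a tm_year of 8100 or more, rather than writing past the end.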
I am not the first one to discover this.
Mr Clive Feather submitted a defect report saying in substance the
same thing as I said. The committee answer was:
<quote>
Thus, asctime() may exhibit undefined behavior if any of the members of
timeptr produce undefined behavior in the sample algorithm (for example,
if the timeptr->tm_wday is outside the range 0 to 6 the function may
index beyond the end of an array).
As always, the range of undefined behavior permitted includes:
Corrupting memory
Aborting the program
Range checking the argument and returning a failure indicator (e.g., a
null pointer)
Returning truncated results within the traditional 26 byte buffer.
There is no consensus to make the suggested change or any change along
this line.
<end quote>
You read correctly. Corrupting memory (i.e. a buffer overflow) is
within the range of undefined behavior acceptable!!!!
I have the right then, to name a buffer overflow for what it is, a
buffer overflow in the C standard with all the committee behind it.
-- what happens in lcc-win32, for instance, if I write:
>
> struct big { int a[1000]; };
> struct big main(double oops) {
> short x = strlen((char *)0x98766542);
> ... /* more "wrong values" as inputs as needed */
> return *(struct big *)42;
> }
>
> ? If you want to protect against bad inputs, you need to think
> hard about which kinds of "bad inputs" to guard against, and do
> some serious cost/benefit analysis.
Yes. Let's do this ok?
The number of bytes needed is very easy to calculate. I explained
how to do this in my tutorial about the C language page 122:
<quote>
1.26.1.1 Getting rid of buffer overflows
How much buffer space would we need to protect asctime from buffer
overflows in the worst case?
This is very easy to calculate. We know that in all cases, %d can't
output more characters than the maximum number of characters an integer
can hold. The largest value is INT_MAX, and taking into account the
possible negative sign we know that:
Number of digits N = 1 + ceil(log10((double)INT_MAX));
For a 32 bit int this is 11, for a 64 bit int this is 20.
In the asctime specification there are 5 %d format specifications,
meaning that we have as the size for the buffer the expression:
26+5*N bytes
In a 32 bit system this is 26+55=81.
This is a worst case oversized buffer, since we have already counted
some of those digits in the original calculation, where we have allowed
for 3+2+2+2+4 = 13 characters for the digits. A tighter calculation can
be done like this:
Number of characters besides specifications (%d or %s) in the string: 6.
Number of %d specs 5
Total = 6+5*11 = 61 + terminating zero 62.
The correct buffer size for a 32 bit system is 62.
<end quote>
COST and BENEFIT ANALYSIS:
--------------------------
The difference between 62 and 26 is 36. For sparing 36 bytes we have a
buffer overflow. Now do your cost/benefit analysis. Since I paid
120 euros for 2GB (2*1024*1024*1024) bytes, each byte costs
0.0000000558793544769287109375 euros. Times 36 gives:
0.00000201165676116943359375 euros.
Is this too expensive for you?
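For what it's worth, the worst case can also be measured instead of derived by hand. A minimal sketch, assuming a C99 snprintf and the format string from the standard's sample asctime; the helper name asctime_worst_case_size is made up for illustration:

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: returns the number of bytes (including the
   terminating null) that the sample asctime format needs when every
   integer field is as wide as it can be. INT_MIN is used because it
   prints with a sign. tm_year is taken as INT_MIN as well, so that
   1900 + tm_year cannot overflow. Returns -1 if snprintf fails. */
int asctime_worst_case_size(void)
{
    int n = snprintf(NULL, 0, "%.3s %.3s%3d %.2d:%.2d:%.2d %d\n",
                     "Sun", "Jan",
                     INT_MIN, INT_MIN, INT_MIN, INT_MIN,
                     1900 + INT_MIN);
    return n < 0 ? -1 : n + 1;
}
```

With a 32 bit int this reports 68: the 61 characters counted in the tighter calculation above, plus the six characters printed by the two %.3s fields, plus the terminating zero.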
>
> Moreover, if your objection is that values of .tm_year greater
> than 8100 (or less than or equal to some negative number) cause
> problems, you can always test for that in your own implementation:
A buffer of 62 bytes will handle ANY POSSIBLE INPUT in a 32 bit
implementation, there isn't even any need for testing!!!
Are we developing software in 2007?
Or we are still living in the PDP-11?
Why this myopic attitude towards error analysis, that has led to
people leaving C as a reasonable language forever?
C == buffer overflow...
Many people think like this already. Do we need to furnish a proof with
a buffer overflow in the text of the C standard?
Yours truly.
jacob
Since returning (size_t)-1 is non-standard behavior (though it's
allowed), I'm not likely to check for it.
Yeah, whatever. I'm a coder at a Fortune 500 company, I think I can just
about write a strdup function that works more than adequately on any
machine I'd ever want to run it on.
Claiming that you work for a Fortune 500 company might impress your
aunt, but the fact remains that your implementation of a string
duplication function left a lot to be desired. You would do well to
learn from your mistakes, rather than try to justify them.
>
>If he would give a different answer on a different group, one of these
>statements would be a lie or a joke.
If my teenage son asks me how they work out the price of credit
default swaps, I give one answer.
If junior quant analyst in a bank asks me the same question, I give an
entirely different answer.
I assure you, neither answer is a lie or a joke.
>So he is a fundamentalist, ostracist, extremist...
Or he's tailoring his answer to the forum of the question.
--
Mark McIntyre
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan
> Mr Clive Feather submitted a defect report saying in substance the
> same thing as I said. The committee answer was:
>
> <quote>
> Thus, asctime() may exhibit undefined behavior if any of the members of
> timeptr produce undefined behavior in the sample algorithm (for example,
> if the timeptr->tm_wday is outside the range 0 to 6 the function may
> index beyond the end of an array).
>
> As always, the range of undefined behavior permitted includes:
> Corrupting memory
> Aborting the program
> Range checking the argument and returning a failure indicator (e.g., a
> null pointer)
> Returning truncated results within the traditional 26 byte buffer.
> There is no consensus to make the suggested change or any change along
> this line.
> <end quote>
>
> You read correctly. Corrupting memory (i.e. a buffer overflow) is
> within the range of undefined behavior acceptable!!!!
>
> I have the right then, to name a buffer overflow for what it is, a
> buffer overflow in the C standard with all the committee behind it.
You have the right to misread anything. You have a responsibility, as
a self-proclaimed expert, to think with something other than your
gonads. The response you quoted clearly encompasses a variety
of nicer behaviors than buffer overflow, but you neglect to take
them in.
The committee *accepts* that buffer overflow can occur in a
conforming implementation. The same is true of:
int a[10];
a[300] = 4;
And the committee is "behind" an implementation that overwrites
storage when this ill-formed program executes. (The committee
is also "behind" an implementation that aborts with a diagnostic
message.)
Get over it.
P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
>On 15 Sep 2007 at 2:31, Charlie Gordon wrote:
>> Your function should take a const char *.
>> sizeof(char) is 1 by definition
>> Why do you cast the result of realloc ?
>> Your function invokes undefined behaviour when running out of memory, it
>> should return NULL instead.
>
>Yeah, whatever. I'm a coder at a Fortune 500 company,
Whoopy doo. *Anyone* will tell you that this is absolutely no
guarantee whatsoever of coding quality or ability.
> I think I can just
>about write a strdup function that works more than adequately on any
>machine I'd ever want to run it on.
Pardon me, but you had a few actual errors pointed out, perhaps you
should consider being less arrogant?
You posted an extraordinarily crappy code example.
--
pete
> You have the right to misread anything.
I did not misread, I quoted exactly what the committee answered.
> You have a responsibility, as
> a self-proclaimed expert, to think with something other than your
> gonads.
You address none of the technical points I raised. You do not explain
why the committee specifies an obviously too small buffer and fails
to provide for upper limits. But you throw "thinking with your
gonads" into the discussion to provoke an emotional atmosphere and
take people away from the technical discussion... since you have
absolutely NO technical arguments to propose.
> The response you quoted clearly encompasses a variety
> of nicer behaviors than buffer overflow, but you neglect to take
> them in.
>
Absolutely not, since I cited them. What I do not accept is that
the committee explicitly says that they would rather have an
IMPLICIT UB instead of adjusting the size of the buffer or
providing an upper limit explicitly.
> The committee *accepts* that buffer overflow can occur in a
> conforming implementation. The same is true of:
>
> int a[10];
> a[300] = 4;
>
Great! Since in C anything goes (see above) let's make things
worse. Let's put that in the standard then!
The committee provides code for very few functions. For unknown
reasons then, they decided to put code into the standard text
that contains a clear buffer overflow problem.
And they persist in their error. Changing that 26 to a number
based on sizeof(INT_MAX) is beyond them, even if there is a clear
proof of how the calculation is done.
Why?
I do not know. In general, Mr Plauger is somebody who
has written software of good quality and his book about the
C library has been a good inspiration for me. For these reasons
his attitude now is even more incomprehensible.
jacob
That is NOT -1. The cast of -1 to a size_t is exactly SIZE_MAX.
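A small check of that claim, as a sketch assuming a C99 <stdint.h> that provides SIZE_MAX; the function name is made up for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 when converting -1 to size_t yields SIZE_MAX. The
   conversion of a negative value to an unsigned type wraps modulo
   SIZE_MAX + 1, so this holds on any conforming implementation. */
int minus_one_is_size_max(void)
{
    size_t converted = (size_t)-1;
    return converted == SIZE_MAX;
}
```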
Well, some 5 years ago, I made a similar comment on your code Richard,
which was using strcpy() at the time. We had a rather "long" argument
about it, and in the end, I tried to make my point by measuring memcpy()
vs strcpy() performance.
IIRC, the result of those tests, was rather humiliating for me, as your
strcpy() performed excellent! :-)
Is there a reason to believe that strcpy() has become more CPU-bound
in recent years? If not, I don't think you will have much success
in measuring an improvement by using memcpy().
Making good measurements on this, is a challenge. We don't want to
measure L1 cache performance only.
--
Tor <torust [at] online [dot] no>
> Richard Heathfield wrote:
<snip>
>> Therefore, if the length of the string to be copied is known in
>> advance, it is (likely to be) more efficient to use memcpy than
>> strcpy.
>
> Well, some 5 years ago, I made a similar comment on your code Richard,
> which was using strcpy() at the time. We had a rather "long" argument
> about it, and in the end, I tried to make my point by measuring
> memcpy() vs strcpy() performance.
>
> IIRC, the result of those tests, was rather humiliating for me, as
> your strcpy() performed excellent! :-)
Whoops! :-) But really, I don't remember that at all. Sorry. I do,
however, recall that I used to use strcpy in those circumstances, and
now I use memcpy. Have I measured the difference? No, not really. I
care about performance enough not to want to throw it away willy-nilly,
but other than that I'm not really fussed. I try to focus more on
readability, correctness, and makessenseness. I guess the memcpy
argument just made sense to me (eventually!).
<snip>
> Making good measurements on this, is a challenge. We don't want to
> measure L1 cache performance only.
I don't think it's that hard, actually. Write a program that can either
do a hundred million memcpying dupstrs or a hundred million strcpying
dupstrs, the choice being easily selectable by the user, and copies
data built from a predetermined PRNG (a hundred million strings of
varying lengths and contents), and records the results. Reboot machine.
Run program with Option A. Reboot machine. Run program with Option B.
Compare results.
Caches become irrelevant under these circumstances, I think, since any
cache benefit that one option gets will be cancelled by the fact that
the other option gets it too.
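The experiment described above could be sketched roughly as follows. This is only an illustration, not anyone's actual benchmark: the names dup_memcpy, dup_strcpy and time_dups are made up, rand() stands in for the "predetermined PRNG", and the timing includes the cost of generating each string.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Duplicate s with one strlen, one malloc and a memcpy.
   Returns NULL if the allocation fails. */
char *dup_memcpy(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Same as above, but the copy step uses strcpy. */
char *dup_strcpy(const char *s)
{
    char *p = malloc(strlen(s) + 1);
    if (p != NULL)
        strcpy(p, s);
    return p;
}

/* Fill buf with a pseudo-random lowercase string of length 0..cap-1. */
static void make_string(char *buf, size_t cap)
{
    static const char letters[] = "abcdefghijklmnopqrstuvwxyz";
    size_t len = (size_t)rand() % cap;
    size_t i;
    for (i = 0; i < len; i++)
        buf[i] = letters[rand() % 26];
    buf[len] = '\0';
}

/* Time `count` duplications done with `dup`. Seeding with the same
   value gives both variants the same sequence of strings. Returns CPU
   seconds, or -1.0 if an allocation fails. An optimizer may remove
   work whose result is unused, so check the generated code before
   trusting the numbers. */
double time_dups(char *(*dup)(const char *), unsigned seed, long count)
{
    char buf[256];
    clock_t start;
    long i;

    srand(seed);
    start = clock();
    for (i = 0; i < count; i++) {
        char *copy;
        make_string(buf, sizeof buf);
        copy = dup(buf);
        if (copy == NULL)
            return -1.0;
        free(copy);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

A run would call time_dups(dup_memcpy, 42u, 100000000L) in one process and time_dups(dup_strcpy, 42u, 100000000L) in another, then compare the two figures.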
You haven't demonstrated it so far. Frankly, when I read your
implementation upthread I assumed it was a joke. You call realloc()
once for each character; why not compute the length and call malloc()
just once?
Yes, calling asctime() with certain arguments can result in a buffer
overflow.
Calling strcpy() with certain arguments can result in a buffer
overflow. Likewise for sprintf(), sscanf(), memcpy(), memmove(),
strcat(), etc. In all these cases, the arguments passed are under the
program's control; the problem can reliably be avoided by checking the
arguments before invoking the function.
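For asctime() in particular, such a check might look like the sketch below. The wrapper name checked_asctime is hypothetical, and the accepted ranges (the usual calendar ranges for struct tm plus a four-digit year) are my assumption about what is conservative enough to fit the 26-byte buffer:

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical wrapper: returns NULL instead of calling asctime() when
   any field is outside the range assumed to be safe for the sample
   algorithm's 26-byte buffer. The year check keeps 1900 + tm_year
   within 1000..9999, i.e. exactly four digits. */
char *checked_asctime(const struct tm *t)
{
    if (t == NULL
        || t->tm_wday < 0    || t->tm_wday > 6
        || t->tm_mon  < 0    || t->tm_mon  > 11
        || t->tm_mday < 1    || t->tm_mday > 31
        || t->tm_hour < 0    || t->tm_hour > 23
        || t->tm_min  < 0    || t->tm_min  > 59
        || t->tm_sec  < 0    || t->tm_sec  > 60
        || t->tm_year < -900 || t->tm_year > 8099)
        return NULL;
    return asctime(t);
}
```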
I happen to agree that asctime() should be defined to use a larger
buffer, one big enough so that the buffer won't overflow for any
possible arguments. But the problem is so easy to avoid that it's
hardly a fatal flaw in the language -- and it can't overflow if you
give it an argument corresponding to the current time (at least not
for the next 8000 years or so). It's certainly not nearly as
dangerous as gets().
I generally wouldn't use asctime() anyway. The format it uses isn't
my favorite (I prefer YYYY-MM-DD for dates), and the trailing '\n' is
more trouble than it's worth. In real code, I'd use strftime()
instead, which is more flexible and doesn't have asctime()'s problems.
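A minimal sketch of the strftime() alternative, producing a YYYY-MM-DD date with no trailing newline. The helper name format_iso is made up; strftime() itself is standard C and never writes more than the size it is given:

```c
#include <stddef.h>
#include <time.h>

/* Format t as "YYYY-MM-DD HH:MM:SS" into buf, which holds `size` bytes.
   Returns the number of characters written, not counting the
   terminating null, or 0 if the result does not fit (in which case the
   contents of buf should not be used). */
size_t format_iso(char *buf, size_t size, const struct tm *t)
{
    return strftime(buf, size, "%Y-%m-%d %H:%M:%S", t);
}
```

A 20-byte buffer is enough for four-digit years; a caller that gets 0 back can retry with a larger one.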
Frankly, I too thought the repeated calls to realloc was some sort of joke
from a forum regular trying to come up with the most inefficient yet correct
implementation and was surprised to find the small klotzy details I pointed
out.
If you are actually proud of the code you posted, and consider that a good
example of what you are paid for by a large corporation, shame on you ! You
have some serious progress to make to reach 'decent' status. So far you
qualify for 'best of the worst'. I guess being the best is what prompts
your arrogance, but rest assured everyone here can come up with an even
worse proposal, one you would not even understand.
No matter how efficient and powerful the hardware guys make their products,
there will be software bums to destroy these gains, and managers to come up
with lame excuses and marketers to ship lousy crap. Sturgeon was so right!
--
Chqrlie.
The reason people become coders at Fortune 500 companies is not because
they are any good at coding, but it's because they are good at quickly
churning out lots of code that passes acceptance tests. The more you
care about actual code quality, the less quantity you can churn out,
and the less productive you seem to managers. Managers, of course,
are the kind of person who gets other people to do their work for
them, so they wouldn't know code quality if it bit them in their
faces. So that nicely sums up how much weight your remark has.
Of course, you might be joking (put more precisely: Trolling).
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
> Sam wrote:
> ) Yeah, whatever. I'm a coder at a Fortune 500 company, I think I can just
> ) about write a strdup function that works more than adequately on any
> ) machine I'd ever want to run it on.
>
> The reason people become coders at Fortune 500 companies is not because
> they are any good at coding, but it's because they are good at quickly
> churning out lots of code that passes acceptance tests. The more you
> care about actual code quality, the less quantity you can churn out,
> and the less productive you seem to managers. Managers, of course,
> are the kind of person who gets other people to do their work for
> them, so they wouldn't know code quality if it bit them in their
> faces. So that nicely sums up how much weight your remark has.
> Of course, you might be joking (put more precisely: Trolling).
It is a great shame that computer programming has become such a
commoditised task, with individual excellence being suppressed
by artificial deadlines and ludicrous budgets, so that nobody
who actually cares about the quality of the source code they
produce is able to spend the necessary time on it to get it
right, if they wish to compete in the market-place against
those who are perfectly content to churn out any old junk
as long as it can pass a badly-designed UAT. Our society
gets the bugs it deserves, by failing to insist on only
the highest quality code. This may explain the current
trend towards bozo-friendly languages that eschew all
pretence of high performance in favour of protecting
the programmer against his own silly mistakes. This
is why we are saddled with Gates's Law ("the speed
of software halves every eighteen months"). If we
insisted on programmers knowing their subject as
we insist on brain surgeons knowing theirs, the
whole of our society would have better, faster
software; software on which it could rely. To
buck the market, though, is becoming far too
expensive, and so it is unlikely that we'll
ever get high quality software, unless the
market manages to find a way to encourage
quality across the industry, rather than
punishing those companies that have the
courage and integrity to turn out high
quality programs, albeit at a greater
initial cost and therefore at prices
that seem unattractive to the naive
software purchaser. If we are able
to discover such a way, the whole
of society will be better off as
a result. It will not be simple
to accomplish such a change in
the current market-place, but
if we do not do so, then the
software we continue to use
day by day will remain, as
now, broken by mis-design
and an embarrassing wart
on an advanced society.
Point of order: That isn't just a preference, it is adhering to
ISO date format specification.
size_t strlen(const char *s) {
size_t r = 0;
if (s) while (*s++) ++r;
return r;
}
--
Joe Wright
"Everything should be made as simple as possible, but not simpler."
--- Albert Einstein ---
And neither is a NOP if being passed a NULL...
Bye, Jojo
>I don't want to check strlen for error. SIZE_MAX may well be valid.
>Passing in a NULL should be a NOP in my view.
>size_t strlen(const char *s) {
> size_t r = 0;
> if (s) while (*s++) ++r;
> return r;
>}
Then how will you distinguish between the string containing just
the terminating nul, and the null pointer?? strlen() is often
used to determine array indices; you don't want to be indexing
the NULL pointer (for one thing, the result of the indexing
might get you to a readable or writable memory location -- and yes,
there are real systems on which virtual addresses near 0 are
accessible.)
--
Okay, buzzwords only. Two syllables, tops. -- Laurie Anderson
My point is that, exactly. The *specification* is flawed in the sense
that it doesn't specify a maximum range of the input, but prescribes a
maximum length for the buffer in the example code given as illustration!
No error returns are ever specified. The proposed correction by Mr
Feather said to fill the overflowing fields with the character '*'...
Not even that was allowed.
jacob
No worries, this might even have been 6-7 years ago. :-)
> I do,
> however, recall that I used to use strcpy in those circumstances, and
> now I use memcpy. Have I measured the difference? No, not really. I
> care about performance enough not to want to throw it away willy-nilly,
> but other than that I'm not really fussed.
My argument back then, was similar to yours now. However, to my big
surprise, we had to put this into the micro-optimization category,
after measurements on quite a number of different compilers and platforms.
> I try to focus more on readability, correctness, and makessenseness.
Me too.
> I guess the memcpy argument just made sense to me (eventually!).
:-)
>> Making good measurements on this, is a challenge. We don't want to
>> measure L1 cache performance only.
>
> I don't think it's that hard, actually. Write a program that can either
> do a hundred million memcpying dupstrs or a hundred million strcpying
> dupstrs, the choice being easily selectable by the user, and copies
> data built from a predetermined PRNG (a hundred million strings of
> varying lengths and contents), and records the results. Reboot machine.
> Run program with Option A. Reboot machine. Run program with Option B.
> Compare results.
>
> Caches become irrelevant under these circumstances, I think, since any
> cache benefit that one option gets will be cancelled by the fact that
> the other option gets it too.
Making good measurements is usually a challenge; I have rarely seen
measurement code without some serious flaws or defects. I can't remember
the quality of the benchmark we used years ago, but I would expect it to
be a better starting point, than writing a new one from scratch.
I don't think it should be. If strlen(s) was SIZE_MAX, then the total
size of s (including the terminating NUL) would be SIZE_MAX+1, which
isn't representable in a size_t. So that should not be possible.
> Passing in a NULL should be a NOP in my view.
I think it's a bug which should result in a segfault.
hp
--
_ | Peter J. Holzer | I know I'd be respectful of a pirate
|_|_) | Sysadmin WSR | with an emu on his shoulder.
| | | h...@hjp.at |
__/ | http://www.hjp.at/ | -- Sam in "Freefall"
>In general, Mr Plauger is somebody who
>has written software of good quality and his book about the
>C library has been a good inspiration for me. For these reasons
>his attitude now is even more incomprehensible.
Perhaps it has something to do with your attitude of extreme pomposity
and your continual rudeness to various posters here which brings out
the worst in others?
>CBFalconer wrote:
>> Point of order: That isn't just a preference, it is adhering to
>> ISO date format specification.
>
>My point is that, exactly.
Er, no - the point you go on to make below has *nothing* to do with
CBF's posting.
(of some proposed solution to asctime overflowing)
>Not even that was allowed.
From which you apparently draw the conclusion that the ISO committee
were a shower of jobsworths, idiots and fools. Does that really seem
likely to you?
Of course it's allowed. Anything is allowed for undefined behavior.
How can it be a NOP? strlen() has to return some value.
In the example you've shown, strlen(NULL) isn't a NOP; it lies to you
and pretends that you passed it a valid empty string.
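If one really wanted a length function that tolerates a null pointer without lying about it, the status has to travel separately from the length. A sketch only; checked_strlen is a hypothetical name, not a standard function:

```c
#include <stddef.h>

/* Hypothetical alternative to strlen(): reports a null pointer through
   the return value instead of folding it into the length. Returns 1
   and stores the length in *len on success; returns 0, leaving *len
   untouched, when s or len is NULL. */
int checked_strlen(const char *s, size_t *len)
{
    size_t n = 0;

    if (s == NULL || len == NULL)
        return 0;
    while (s[n] != '\0')
        n++;
    *len = n;
    return 1;
}
```

With this shape an empty string gives (1, length 0) while a null pointer gives 0, so the two cases can no longer be confused.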
I don't think I'll bother doing it, but I would make something that
reads a file into a linked list, and then duplicates that linked
list. Time the duplication, and keep the result. Change
something, such as the dupstr routine, or the malloc package, etc.
and repeat. Ensure that the malloced memory is always in a proper
condition for the timed run. The idea is to eliminate the disk
i/o, since the data is in memory in the first place. Each read
sequence should have similar, if not identical, timing penalties.
By changing the input file you can change the data manipulated, and
detect data sensitivity of the algorithms.
[...]
I don't bother diving into this either, perhaps compiler X has come up
with a great memcpy() optimization trick since last time, perhaps not.
The first thing to do would be to check the generated asm and read the CPU
optimization guides; without understanding what's going on, a benchmark
would be rather meaningless to me. For example, pulling in some
floating-point ops would make the x86 optimization trick via MMX
registers a disaster.
Pssst... one of the oldest tricks in the book to crush the Java/C++
folks, is to avoid malloc'ed memory. So, strdup() heavy code, should
rather be optimized by minimizing the usage of strdup(). :)
Bye, Jojo
Are you sure you weren't arguing about:
strcpy(dst, src);
vs.
memcpy(dst, src, strlen(src) + 1);
...as I find it hard to believe that Richard was arguing that strcpy
would be faster (possibly on certain platforms it's very close, but
not faster).
--
James Antill -- ja...@and.org
C String APIs use too much memory? ustr: length, ref count, size and
read-only/fixed. Ave. 44% overhead over strdup(), for 0-20B strings
http://www.and.org/ustr/
> On Sun, 16 Sep 2007 04:02:23 +0200, Tor Rustad wrote:
>
>> Richard Heathfield wrote:
<snip>
>>> Therefore, if the length of the
>>> string to be copied is known in advance, it is (likely to be) more
>>> efficient to use memcpy than strcpy.
>>
>> Well, some 5 years ago, I made a similar comment on your code Richard,
>> which was using strcpy() at the time. We had a rather "long" argument
>> about it, and in the end, I tried to make my point by measuring memcpy()
>> vs strcpy() performance.
>>
>> IIRC, the result of those tests, was rather humiliating for me, as your
>> strcpy() performed excellent! :-)
>
> Are you sure you weren't arguing about:
>
> strcpy(dst, src);
> vs.
> memcpy(dst, src, strlen(src) + 1);
>
> ...as I find it hard to believe that Richard was arguing that strcpy
> would be faster (possibly on certain platforms it's very close, but
> not faster).
Kind of you, James, and I must admit I find it hard to imagine arguing that
way as well. Unfortunately, I remember that I had some pretty strange
misconceptions about C a decade or so ago, so it's not utterly impossible.
But no, I don't remember this particular debate. Your mod seems plausible,
however. (I can't think of any reason why I'd want to call memcpy in that
way, however.)
[...]
>> Well, some 5 years ago, I made a similar comment on your code Richard,
>> which was using strcpy() at the time. We had a rather "long" argument
>> about it, and in the end, I tried to make my point by measuring memcpy()
>> vs strcpy() performance.
>>
>> IIRC, the result of those tests, was rather humiliating for me, as your
>> strcpy() performed excellent! :-)
>
> Are you sure you weren't arguing about:
>
> strcpy(dst, src);
> vs.
> memcpy(dst, src, strlen(src) + 1);
Yes, 100% sure.
> ...as I find it hard to believe that Richard was arguing that strcpy
> would be faster (possibly on certain platforms it's very close, but
> not faster).
No no, Richard didn't say that strcpy was faster. IIRC, he just said
something along the lines, that my claims would be compiler- and system
dependent. *grr*
The burden of proof, was very much on my side, to show that memcpy
performed better. *duh*
For short strings, it isn't that strange if strcpy outperforms memcpy,
since an optimized memcpy implementation, typically will have some
overhead (e.g. alignment code).
you could not elaborate
further on the subject
at hand. Continuation
of these well-formed
thoughts on current
trends in software
development would
have brought the
discussion to a
more effective
conclusion. I
am therefore
adding some
lines such
as these.
Yet more
words I
append
so it
will
end
in
a
.
--
Tim Hagan
Not if it is true in one forum and false in another.
It's trivial to write one:
#include <string.h>
#include <limits.h>
#include <stdlib.h>
#include <ctype.h>
/* The default delimiters are chosen as some ordinary white space
characters: */
static const char default_delimiters[] =
{' ', '\n', '\t', '\r', '\f', 0};
/*
 * The tokenize() function is similar to a reentrant version of strtok().
 * It parses tokens from 's', where tokens are substrings separated by
 * characters from 'delimiter_list'.
 * To get the first token from 's', tokenize() is called with 's' as its
 * first parameter.
 * Remaining tokens from 's' are obtained by calling tokenize() with NULL
 * for the first parameter.
 * The set of delimiters, identified by 'delimiter_list', can change from
 * call to call.
 * If the list of delimiters is NULL or empty, then the standard list
 * 'default_delimiters' (see above) is used.
 * tokenize() modifies the memory pointed to by 's', because it writes
 * null characters into the buffer.
 */
char *tokenize(char *s, const char *delimiter_list, char **placeholder)
{
if (delimiter_list == NULL)
delimiter_list = default_delimiters;
if (delimiter_list[0] == 0)
delimiter_list = default_delimiters;
if (s == NULL)
s = *placeholder;
if (s == NULL)
return NULL;
/*
 * The strspn() function computes the length of the initial segment of
 * the first string that consists entirely of characters contained in
 * the second string.
 */
s += strspn(s, delimiter_list);
if (!s[0]) {
*placeholder = s;
return NULL;
} else {
char *token;
token = s;
/*
 * The strpbrk() function finds the first occurrence in the first string
 * of any character contained in the second string.
 */
s = strpbrk(token, delimiter_list);
if (s == NULL)
*placeholder = token + strlen(token);
else {
*s++ = 0;
*placeholder = s;
}
return token;
}
}
#ifdef UNIT_TEST
char ts0[] =
"This is a test. This is only a test. If it were an actual emergency, you would"
" be dead.";
char ts1[] =
"This is a also a test. This is only a test. If it were an actual emergency, you"
" would be dead. 12345";
char ts2[] =
"The quick brown fox jumped over the lazy dog's back 1234567890 times.";
char ts3[] =
" \t\r\n\fThe quick brown fox jumped over the lazy dog's back 1234567890 times.";
char ts4[] =
"This is a test. This is only a test. If it were an actual emergency, you would"
" be dead.";
char ts5[] =
"This is a also a test. This is only a test. If it were an actual emergency, you"
" would be dead. 12345";
char ts6[] =
"The quick brown fox jumped over the lazy dog's back 1234567890 times.";
char ts7[] =
" \t\r\n\fThe quick brown fox jumped over the lazy dog's back 1234567890 times.";
#include <stdio.h>
char whitespace[UCHAR_MAX + 1];
/*
This test will create token separators as any whitespace or any
punctuation marks:
*/
void init_whitespace(void)
{
    int i;
    int index = 0;
    /* Start at 1: a zero byte would terminate the delimiter list early. */
    for (i = 1; i <= UCHAR_MAX; i++) {
        if (isspace(i) || ispunct(i)) {
            whitespace[index++] = (char) i;
        }
    }
    whitespace[index] = '\0';
}
/*
TNX Gerd.
*/
void spin_test(char *ts, char *white)
{
char *p = NULL;
char *token;
token = tokenize(ts, white, &p);
while (token) {
puts(token);
token = tokenize(NULL, white, &p);
}
}
int main(void)
{
init_whitespace();
puts("Whitespace is whitespace+punctuation");
spin_test(ts0, whitespace);
spin_test(ts1, whitespace);
spin_test(ts2, whitespace);
spin_test(ts3, whitespace);
puts("Whitespace is simple whitespace");
spin_test(ts4, NULL);
spin_test(ts5, NULL);
spin_test(ts6, NULL);
spin_test(ts7, NULL);
return 0;
}
#endif
True, but his rendition of "Somewhere Over The Rainbow" was
tremendous.
If you bother to stop and think about it, I think you will see that
all four routines do the same thing. The only difference is the
amount of code executed, and there the original is the best. It
may not be the most obvious. Think about the value of ++r when the
while loop is never executed.
Which is all I was claiming in my reply two days ago, except I
included the cost of actually doing the memcpy call.
... snip code ...
Why such a monster? I posted tknsplit (not a strtok clone) in 21
lines last Friday in this thread, and the remainder of the post was
testing code. tknsplit won't overwrite anything, will detect
omitted params, and won't modify the parameter line.
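The tknsplit code itself is not reproduced here, so the following is not it. It is only a rough sketch of the general idea of a tokenizer that leaves the input line untouched by copying each token into a caller-supplied buffer; the name copy_token and its interface are invented for illustration:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical non-destructive tokenizer (not the tknsplit mentioned
   above). Skips leading delimiters in src, copies at most size - 1
   characters of the next token into dst, and returns a pointer to the
   first character after that token, suitable for the next call.
   Returns NULL when an argument is missing or size is 0. dst is left
   as an empty string when no token remains. src is never written to. */
const char *copy_token(const char *src, const char *delims,
                       char *dst, size_t size)
{
    size_t len, n;

    if (src == NULL || delims == NULL || dst == NULL || size == 0)
        return NULL;
    src += strspn(src, delims);      /* skip leading delimiters */
    len = strcspn(src, delims);      /* length of the token itself */
    n = len < size ? len : size - 1; /* truncate to fit dst */
    memcpy(dst, src, n);
    dst[n] = '\0';
    return src + len;
}
```

Because the progress pointer is returned rather than stored in a static variable, two lines can be tokenized in an interleaved fashion, which is the same property that separates strtok_r from strtok.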
Mr Gordon seems to be unaware of the fact that this newsgroup deals
_strictly_ with standard C, as defined in the various ISO C
standards, and K&R for pre-standardization versions. Material not
included in those standards is off-topic, unless full std C code to
implement them is included. For system dependent material, go to a
newsgroup that deals with that system.
I was surprised to find it was. I decided to write a small test, and
for short strings, strdup written with strcpy is faster than strdup
written with memcpy (they both use strlen first to find out how much
to copy). The cut-off point on my laptop is at 164 bytes without any
optimisation. With -O2 the cutoff goes down to 16 bytes, but for
small strings, strcpy seems faster. The traditional YMMV seems to
fall short of the mark here -- it almost certainly will.
Yucky code available, if anyone else wants to try without writing
their own.
--
Ben.
Nope. The third and fourth correctly implement the specification
given for the second (which simply re-implements the behaviour of
the first).
--
Ben.
Mine is about 21 lines also. The rest of the code is a unit test
driver.
It overwrites the original, but that avoids memory allocation.
Well, I was only considering the standard strlen specification, and
returning zero for a NULL input (i.e. treating NULL as a zero
length string). I think. I totally missed the urge to return an
error indication for NULL input (which I don't consider a good
idea, since using it requires non-std accepting code).