problem with strtok()

Michael

Aug 11, 2006, 11:48:53 PM
Hi,

I have a problem I don't understand when using strtok(). It seems that if I
make a call to strtok(), then make a call to another function that also
makes use of strtok(), the original call is somehow confused or upset.

I have the following code, which I am using to tokenise some input which is
in the form x:y:1.2:

int tokenize_input(Sale *sale, char *string){

    char *temp;
    int temp_int;
    int result = TRUE;

    if((temp = strtok(string, ":")) == NULL){
        result = FALSE;
    } else {
        sale -> sale_id = atoi(temp);
    }

    if((temp = strtok(NULL, ":")) == NULL){
        result = FALSE;
    } else {
        if(get_date(temp) > -1){   /* when I added this line, my problem started */
            strncpy(sale -> date, temp, DATE_LENGTH);
        } else {
            /* These were added at the same time */
            result = FALSE;
            /**/
        }
        /**/
    }

    if((temp = strtok(NULL, ".")) == NULL){   /* this now returns NULL */
        result = FALSE;
    } else {
        temp_int = atoi(temp)*100;
    }

    if((temp = strtok(NULL, ":")) == NULL){
        result = FALSE;
    } else {
        temp_int = temp_int + atoi(temp);
        sale -> price = temp_int;
    }

    return result;
}

get_date() is also using strtok(). It all worked fine until I added the
marked lines in order to do some validation of input, at which point the
later strtok() began returning NULL.

Can anyone explain why this would occur and how I can get around it?

Thanks for your help

Michael


Keith Thompson

Aug 12, 2006, 12:07:47 AM
"Michael" <micha...@yahoo.com> writes:
> I have a problem I don't understand when using strtok(). It seems that if I
> make a call to strtok(), then make a call to another function that also
> makes use of strtok(), the original call is somehow confused or upset.

Yup. strtok() is not reentrant. It uses internal static data that
makes it impossible to use more than once concurrently.

[...]

> Can anyone explain why this would occur and how I can get around it?

Either serialize your calls to strtok(), so each use finishes before
you start another one, or use something other than strtok().

Some systems provide a strtok_r() function. This is non-standard, and
any code that uses it will be portable only to systems that provide
it, but it might suit your purposes anyway. (strtok_r() is likely to
be present on any non-ancient Unix-like system.)

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

pete

Aug 12, 2006, 12:51:55 AM
Keith Thompson wrote:
>
> "Michael" <micha...@yahoo.com> writes:
> > I have a problem I don't understand when using strtok().
> > It seems that if I
> > make a call to strtok(),
> > then make a call to another function that also
> > makes use of strtok(),
> > the original call is somehow confused or upset.
>
> Yup. strtok() is not reentrant. It uses internal static data that
> makes it impossible to use more than once concurrently.
>
> [...]
>
> > Can anyone explain why this would occur and how I can get around it?
>
> Either serialize your calls to strtok(), so each use finishes before
> you start another one, or use something other than strtok().
>
> Some systems provide a strtok_r() function. This is non-standard, and
> any code that uses it will be portable only to systems that provide
> it, but it might suit your purposes anyway. (strtok_r() is likely to
> be present on any non-ancient Unix-like system.)

/* BEGIN new.c */

#include <stdio.h>
#include <string.h>

#define STRING "\n\n\n\tThere's\n a\r beat in \r\tmy head.\n\n\n"
#define WHITE "\n\r\t"

char *str_tok_r(char *s1, const char *s2, char **s3);
char *str_sep(char **s1, const char *s2);
/*
** K&R2 Exercise 2-4
** alternate squeeze functions
*/
char *str_squeeze(char *s1, const char *s2);
char *str_squeeze_r(char *s1, const char *s2);
char *str_squeeze_s(char *s1, const char *s2);

int main(void)
{
    char s1[sizeof STRING];

    puts(strcpy(s1, STRING));
    puts(str_squeeze(s1, WHITE));

    puts(strcpy(s1, STRING));
    puts(str_squeeze_r(s1, WHITE));

    puts(strcpy(s1, STRING));
    puts(str_squeeze_s(s1, WHITE));

    return 0;
}

char *str_tok_r(char *s1, const char *s2, char **s3)
{
    if (s1 != NULL) {
        *s3 = s1;
    }
    s1 = *s3 + strspn(*s3, s2);
    if (*s1 == '\0') {
        return NULL;
    }
    *s3 = s1 + strcspn(s1, s2);
    if (**s3 != '\0') {
        *(*s3)++ = '\0';
    }
    return s1;
}

char *str_sep(char **s1, const char *s2)
{
    char *const p1 = *s1;

    if (p1 != NULL) {
        *s1 = strpbrk(p1, s2);
        if (*s1 != NULL) {
            *(*s1)++ = '\0';
        }
    }
    return p1;
}

char *str_squeeze(char *s1, const char *s2)
{
    char *const p1 = s1;
    const char *const p2 = s2;

    s2 = strtok(p1, p2);
    while (s2 != NULL) {
        do {
            *s1++ = *s2++;
        } while (*s2 != '\0');
        s2 = strtok(NULL, p2);
    }
    *s1 = '\0';
    return p1;
}

char *str_squeeze_r(char *s1, const char *s2)
{
    char *const p1 = s1;
    const char *const p2 = s2;
    char *p3;

    s2 = str_tok_r(p1, p2, &p3);
    while (s2 != NULL) {
        do {
            *s1++ = *s2++;
        } while (*s2 != '\0');
        s2 = str_tok_r(NULL, p2, &p3);
    }
    *s1 = '\0';
    return p1;
}

char *str_squeeze_s(char *s1, const char *s2)
{
    char *const p1 = s1;
    const char *const p2 = s2;
    char *p3 = s1;

    do {
        s2 = str_sep(&p3, p2);
        while (*s2 != '\0') {
            *s1++ = *s2++;
        }
    } while (p3 != NULL);
    *s1 = '\0';
    return p1;
}

/* END new.c */

--
pete

Stan Milam

Aug 13, 2006, 1:28:27 PM

The strtok() function uses a static char * to maintain the address of
the string it is parsing. If a new initializing call to strtok() is
made, you will lose the address of the first string. Over the years I've
written several replacement functions for strtok() (which I believe
should be deprecated). My favorite is something I wrote a few years ago
in another language and recently ported to C. Here it is, so enjoy.

/**********************************************************************/
/* File Name: gettoken.c. */
/* Author: Stan Milam. */
/* Date Written: 15-Jan-2000. */
/* Description: */
/* Extract and remove a token from a string. Handles empty */
/* tokens. */
/* (c) Copyright 2006 by Stan Milam. */
/* All rights reserved. */
/* */
/**********************************************************************/

#include <errno.h>
#include <string.h>

#define strzcpy(d,s,l) (strncpy((d), (s), (l))[(l)] = '\0', (d))

/**********************************************************************/
/* Name: */
/* gettoken(). */
/* */
/* Synopsis: */
/* #include "strtools.h" */
/* char *gettoken( char *dest, char *source, const char *delimiters ); */
/* */
/* Description: */
/* The gettoken() function will extract tokens separated by a */
/* specified set of delimiters from a string and store the token */
/* value in the dest argument. Furthermore, the token is removed */
/* from the source string along with the delimiter. Empty token */
/* fields cause the destination value to be an empty string. */
/* */
/* Arguments: */
/* char *dest - Address of a buffer where the token will be */
/* stored. */
/* char *source - The address of the string containing one or */
/* more tokens. */
/* char *delimiters - The address of a string of characters used */
/* as token delimiters. */
/* */
/* Return Value: */
/* The gettoken() function will return the address of the */
/* destination argument upon successful completion, and will */
/* return NULL when there are no tokens left to extract or when */
/* any of the arguments is a NULL pointer. Should one of the */
/* arguments be a NULL pointer, the global errno variable will */
/* be set to EINVAL. */
/* */
/**********************************************************************/

char *
gettoken( char *dest, char *source, const char *delimiters )
{
    char *rv = NULL;

    if ( dest == NULL || source == NULL || delimiters == NULL )
        errno = EINVAL;
    else {
        *dest = '\0';
        if ( *source ) {
            char *ptr = strpbrk( source, delimiters );

            /**********************************************************/
            /* At this point we know we have something, perhaps an */
            /* empty token. Default the return value to the */
            /* destination address. If the result of strpbrk() is not */
            /* NULL and not the same as the source, copy the token */
            /* into the destination string. */
            /**********************************************************/

            rv = dest;
            if ( ptr != NULL ) {
                char *tmp = ptr++;
                if ( source != tmp )
                    rv = strzcpy( dest, source, (size_t)(tmp-source) );
            }

            /**************************************************************/
            /* If there are no delimiters the source is the token. */
            /**************************************************************/

            else {
                rv = strcpy( dest, source );
                ptr = (char *) source + strlen( source );
            }

            /**********************************************************/
            /* Copy the source string down past the token we just */
            /* found. */
            /**********************************************************/

            memmove( (char *)source, ptr, strlen( ptr ) + 1 );
        }
    }
    return rv;
}


#ifdef TEST

#include <stdio.h>
#include <assert.h>

int
main( void )
{
    char dest[100];
    char delim[]="|;!";
    char a[] = "|B.B. Shagnasty|!Shagnasty, William B.|Billy Bob Shagnasty|;!";

    errno = 0;
    assert( gettoken( NULL, a, delim ) == NULL);
    assert( errno == EINVAL ); errno = 0;

    assert( gettoken( dest, NULL, delim ) == NULL);
    assert( errno == EINVAL ); errno = 0;

    assert( gettoken( dest, a, NULL ) == NULL );
    assert( errno == EINVAL ); errno = 0;

    while( gettoken( dest, a, delim ) )
        puts( dest );

    return 0;
}
#endif


--
Regards,
Stan Milam
=============================================================
Charter Member of The Society for Mediocre Guitar Playing on
Expensive Instruments, Ltd.
=============================================================

Ben Pfaff

Aug 13, 2006, 1:43:46 PM
"Michael" <micha...@yahoo.com> writes:

> I have a problem I don't understand when using strtok(). It seems that if I
> make a call to strtok(), then make a call to another function that also
> makes use of strtok(), the original call is somehow confused or upset.

strtok() has at least these problems:

* It merges adjacent delimiters. If you use a comma as your
delimiter, then "a,,b,c" will be divided into three tokens,
not four. This is often the wrong thing to do. In fact, it
is only the right thing to do, in my experience, when the
delimiter set contains white space (for dividing a string
into "words") or it is known in advance that there will be
no adjacent delimiters.

* The identity of the delimiter is lost, because it is
changed to a null terminator.

* It modifies the string that it tokenizes. This is bad
because it forces you to make a copy of the string if
you want to use it later. It also means that you can't
tokenize a string literal with it; this is not
necessarily something you'd want to do all the time but
it is surprising.

* It can only be used once at a time. If a sequence of
strtok() calls is ongoing and another one is started,
the state of the first one is lost. This isn't a
problem for small programs but it is easy to lose track
of such things in hierarchies of nested functions in
large programs. In other words, strtok() breaks
encapsulation.

--
"I'm not here to convince idiots not to be stupid.
They won't listen anyway."
--Dann Corbit
