tokenize a string

Kelvin@!!!

Feb 24, 2005, 1:24:31 AM

hi:
in C, we can use strtok() to tokenize a char*
but i can't find any similar member function of string that can tokenize a
string
so how do i tokenize a string in C++?
do it the C way?

thanks
--
{ Kelvin@!!! }
remove the last .hk to reply
thanks

red floyd

Feb 24, 2005, 2:34:52 AM

Kelvin@!!! wrote:
> hi:
> in C, we can use strtok() to tokenize a char*
> but i can't find any similar member function of string that can tokenize a
> string
> so how do i tokenize a string in C++?
> do it the C way?
>
> thanks

Look up std::istringstream in your favorite reference book.
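
A minimal sketch of what that suggestion might look like in practice. The helper names split_words and split_on are made up for illustration and do not come from any library; only std::istringstream, operator>> and std::getline are standard.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the thread): split `text` on whitespace
// by reading words from a std::istringstream with operator>>.
std::vector<std::string> split_words(const std::string& text)
{
    std::istringstream in(text);
    std::vector<std::string> words;
    std::string word;
    while (in >> word) {            // operator>> skips leading whitespace
        words.push_back(word);
    }
    return words;
}

// Hypothetical helper: split `text` on a single delimiter character
// with std::getline. Empty fields between two delimiters are kept.
std::vector<std::string> split_on(const std::string& text, char delim)
{
    std::istringstream in(text);
    std::vector<std::string> fields;
    std::string field;
    while (std::getline(in, field, delim)) {
        fields.push_back(field);
    }
    return fields;
}
```

The operator>> form behaves roughly like strtok() with whitespace delimiters, while the getline form handles one delimiter character at a time and, unlike strtok(), does not merge consecutive delimiters.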

ulrich

Feb 24, 2005, 2:44:40 AM

On Thu, 24 Feb 2005 06:24:31 GMT, Kelvin@!!!
<chickenwin...@yahoo.com.hk.hk> wrote:

> hi:
> in C, we can use strtok() to tokenize a char*
> but i can't find any similar member function of string that can tokenize a
> string
> so how do i tokenize a string in C++?

you may want to try boost::tokenizer and relatives.

http://www.boost.org/libs/tokenizer/index.html

rossum

Feb 24, 2005, 6:14:57 PM

On Thu, 24 Feb 2005 06:24:31 GMT, "Kelvin@!!!"
<chickenwin...@yahoo.com.hk.hk> wrote:

>hi:
>in C, we can use strtok() to tokenize a char*
>but i can't find any similar member function of string that can tokenize a
>string
>so how do i tokenize a string in C++?
>do it the C way?
>
>thanks

There is a sample chapter from Accelerated C++ on the web at
http://www.awprofessional.com/articles/article.asp?p=25333

The chapter has a function called split() which does what you seem to
want. It takes a string and returns a vector of all the individual
words:

#include <algorithm>  // find_if
#include <cctype>     // isspace
#include <string>
#include <vector>

using std::find_if;
using std::string;
using std::vector;

// true if the argument is whitespace, false otherwise
// (the cast avoids undefined behaviour for negative char values)
bool space(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }

// false if the argument is whitespace, true otherwise
bool not_space(char c) { return !space(c); }

vector<string> split(const string& str) {
    typedef string::const_iterator iter;
    vector<string> ret;
    iter i = str.begin();
    while (i != str.end()) {
        // ignore leading blanks
        i = find_if(i, str.end(), not_space);
        // find end of next word
        iter j = find_if(i, str.end(), space);
        // copy the characters in [i, j)
        if (i != str.end()) ret.push_back(string(i, j));
        i = j;
    }
    return ret;
}

There is a detailed explanation of the function in the text.

rossum

--

The ultimate truth is that there is no Ultimate Truth

david...@warpmail.net

Feb 24, 2005, 9:59:55 PM

rossum wrote:

> // true if the argument is whitespace, false otherwise
> bool space(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }
>
> // false if the argument is whitespace, true otherwise
> bool not_space(char c) { return !space(c); }
>
> vector<string> split(const string& str) {
>     typedef string::const_iterator iter;
>     vector<string> ret;
>     iter i = str.begin();
>     while (i != str.end()) {
>         // ignore leading blanks
>         i = find_if(i, str.end(), not_space);
>         // find end of next word
>         iter j = find_if(i, str.end(), space);
>         // copy the characters in [i, j)
>         if (i != str.end()) ret.push_back(string(i, j));
>         i = j;
>     }
>     return ret;
> }

This would be better if it were templated on an insertion iterator
rather than returning a vector by value. Something along the lines of
(untested):

template <typename InsertIter>
int
tokenize(const std::string& buf,
         const std::string& delims,
         InsertIter it)
{
    int numTokens = 0;

    // start position: first character that is not a delimiter
    std::string::size_type sp = buf.find_first_not_of(delims);

    while (sp != std::string::npos) {
        // end position: next delimiter, or end of string if there is none
        std::string::size_type ep = buf.find_first_of(delims, sp);
        if (ep == std::string::npos) {
            ep = buf.length();
        }
        *it++ = buf.substr(sp, ep - sp);
        ++numTokens;
        // skip the run of delimiters that follows the token
        sp = buf.find_first_not_of(delims, ep);
    }

    return numTokens;
}

called as

std::deque<std::string> tokens;
int numTokens = tokenize(buf, delims, std::back_inserter(tokens));

/david