Any way to take a word as input from stdin?

arnuld

Sep 10, 2008, 6:08:40 AM

I searched the c.l.c archives provided by Google as Google Groups with
"word input" as the key words and did not come up with anything good.

C++ has std::string for taking a word as input from stdin. C takes input
in 2 ways:

1) as a character, getchar()
2) as a whole line, fgets()

As C programmers, are we supposed to write a get_word function every time
we need a word as input from stdin (e.g. a terminal)?

--
www.lispmachine.wordpress.com
my email is @ the above blog.
Google Groups is Blocked. Reason: Excessive Spamming

vipp...@gmail.com

Sep 10, 2008, 6:24:46 AM

On Sep 10, 1:08 pm, arnuld <sunr...@invalid.address> wrote:
> I searched the c.l.c archives provided by Google as Google Groups with
> "word input" as the key words and did not come up with anything good.
>
> C++ has std::string for taking a word as input from stdin. C takes input
> in 2 ways:
>
> 1) as a character, getchar()
> 2) as a whole line, fgets()
>
> as C programmer, are we supposed to create a get_word function everytime
> when we need a words as input from stdin ( e.g. terminal)

char word[64];
scanf("%63s", word);

Alternatively, write a get_line function (or use one written by pete,
richard heathfield, eric sossman, cbfalconer et cetera) and then split
that into words.
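
A slightly fuller sketch of the scanf approach, assuming a "word" is
whatever %s accepts (a run of non-white-space characters) and that
splitting anything longer than 63 characters is acceptable. The name
read_word63 is made up for illustration; it takes a FILE * so it works
on stdin or any other stream.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read one white-space-delimited word of at most
   63 characters from fp into word, which must hold 64 chars.
   Returns 1 on success, 0 on end of input or read error. */
int read_word63(FILE *fp, char word[64])
{
    assert(fp != NULL && word != NULL);
    /* %63s skips leading white space, then stores up to 63
       non-white-space characters plus the terminating '\0'. */
    return fscanf(fp, "%63s", word) == 1;
}
```

Called in a loop, e.g. while (read_word63(stdin, word)) puts(word);
it echoes one word per line. A word longer than 63 characters comes
back in several pieces, which may or may not be what you want.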

Richard Bos

Sep 10, 2008, 11:14:40 AM

arnuld <sun...@invalid.address> wrote:

> C++ has std::string for taking a word as input from stdin. C takes input
> in 2 ways:
>
> 1) as a character, getchar()
> 2) as a whole line, fgets()
>
> as C programmer, are we supposed to create a get_word function everytime
> when we need a words as input from stdin ( e.g. terminal)

There is no generic solution (mainly because there is no consensus on
what a "word" is), so yes.

Richard

Malcolm McLean

Sep 10, 2008, 4:34:26 PM

"arnuld" <sun...@invalid.address> wrote in message

> as C programmer, are we supposed to create a get_word function everytime
> when we need a words as input from stdin ( e.g. terminal)
>

Generally there will be a regular expression parser available. It's not part
of the standard library, unfortunately, so the details may vary.
You can specify exactly what you mean by a 'word', and extract with that.
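
To make the regular-expression route concrete, here is a minimal sketch
assuming a POSIX system, where <regex.h> provides regcomp() and
regexec(). That header is not part of ISO C, and the helper name
first_word and the pattern used with it are illustrative only.

```c
#include <assert.h>
#include <regex.h>   /* POSIX, not ISO C */
#include <stddef.h>

/* Hypothetical helper: find the first match of the extended regular
   expression `pattern` in `text`. On success store the match's start
   offset and length and return 1. Return 0 if there is no match or
   the pattern does not compile. */
int first_word(const char *text, const char *pattern,
               size_t *start, size_t *len)
{
    regex_t re;
    regmatch_t m;
    int found;

    assert(text != NULL && pattern != NULL);
    assert(start != NULL && len != NULL);

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return 0;
    found = (regexec(&re, text, 1, &m, 0) == 0);
    if (found) {
        *start = (size_t)m.rm_so;
        *len = (size_t)(m.rm_eo - m.rm_so);
    }
    regfree(&re);
    return found;
}
```

With the pattern "[A-Za-z']+" a word is a run of letters and
apostrophes, so "don't" counts as one word; change the pattern and the
definition of a word changes with it.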

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Pilcrow

Sep 10, 2008, 5:50:32 PM

On Wed, 10 Sep 2008 15:08:40 +0500, arnuld <sun...@invalid.address>
wrote:

>I searched the c.l.c archives provided by Google as Google Groups with
>"word input" as the key words and did not come up with anything good.
>
>
>C++ has std::string for taking a word as input from stdin. C takes input
>in 2 ways:
>
> 1) as a character, getchar()
> 2) as a whole line, fgets()
>
>
>as C programmer, are we supposed to create a get_word function everytime
>when we need a words as input from stdin ( e.g. terminal)

Try using fgets(), and strtok(). strtok() will allow you to define word
separators to your taste.

Here is sample code:

------------------------------------------------------------------------
#include <stdio.h>
#include <string.h>
#define MAXLINE 500

char *tok;
char line[MAXLINE];

int main(void)
{
    while (fgets(line, MAXLINE, stdin) != NULL) {
        /* first token on each line */
        if ((tok = strtok(line, " \n")) != NULL)
            puts(tok);
        /* subsequent tokens */
        while ((tok = strtok(NULL, " \n")) != NULL)
            puts(tok);
    }
    return 0;
}

Richard Heathfield

Sep 10, 2008, 6:05:46 PM

Pilcrow said:

> On Wed, 10 Sep 2008 15:08:40 +0500, arnuld <sun...@invalid.address>
> wrote:

<snip>

>>as C programmer, are we supposed to create a get_word function everytime
>>when we need a words as input from stdin ( e.g. terminal)
>
> Try using fgets(), and strtok(). strtok() will allow you to define word
> separators to your taste.

This is poor advice for a beginner. Whilst strtok does have its uses, it
also has issues - traps for the unwary programmer. These derive from its
maintenance of significant state between calls, which makes it unsuitable
for use in library functions.
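
One way around the hidden-state problem is to keep the scan position in
a variable the caller owns. A minimal sketch, using only strspn() and
strcspn() from <string.h>; next_token is a hypothetical name, not a
standard function. Unlike strtok() it does not write to the string, so
each token comes back as a pointer plus a length.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: return the next token in the string *cursor,
   where tokens are separated by runs of characters from delims.
   Stores the token's length in *len and advances *cursor past it.
   Returns NULL when no tokens remain. The string is not modified. */
const char *next_token(const char **cursor, const char *delims, size_t *len)
{
    const char *start;

    assert(cursor != NULL && *cursor != NULL);
    assert(delims != NULL && len != NULL);

    start = *cursor + strspn(*cursor, delims);  /* skip leading delimiters */
    if (*start == '\0') {
        *cursor = start;
        return NULL;
    }
    *len = strcspn(start, delims);              /* length of this token */
    *cursor = start + *len;
    return start;
}
```

Typical use: set const char *p = line; and call
next_token(&p, " \n", &n) until it returns NULL, printing each token
with printf("%.*s\n", (int)n, t). Two such scans can be interleaved
because each has its own cursor.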

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999

Gordon Burditt

Sep 10, 2008, 6:01:53 PM

>I searched the c.l.c archives provided by Google as Google Groups with
>"word input" as the key words and did not come up with anything good.
>
>
>C++ has std::string for taking a word as input from stdin. C takes input
>in 2 ways:
>
> 1) as a character, getchar()
> 2) as a whole line, fgets()
>
>
>as C programmer, are we supposed to create a get_word function everytime
>when we need a words as input from stdin ( e.g. terminal)

The first step is to define what a "word" is.

How many words are these:

1. don't
2. antidisestablish-
mentarianism
3. Joe,Bob,Sally, and Henry.
4. Joe, Bob, Sally, and Henry.
5. $1,416,383,583.20
6. ()@#$#^&*#^%%#^@*^$&*$
7. George W. Bush
8. slam-dunk
9. 15th-century vase
10. M.O.N.S.T.E.R., the computer chess-playing machine
11. lord.high.master.of....@yahoo.com

Justify your answers.

Pilcrow

Sep 10, 2008, 7:06:43 PM

On Wed, 10 Sep 2008 22:05:46 +0000, Richard Heathfield
<r...@see.sig.invalid> wrote:

>Pilcrow said:
>
>> On Wed, 10 Sep 2008 15:08:40 +0500, arnuld <sun...@invalid.address>
>> wrote:
>
><snip>
>
>>>as C programmer, are we supposed to create a get_word function everytime
>>>when we need a words as input from stdin ( e.g. terminal)
>>
>> Try using fgets(), and strtok(). strtok() will allow you to define word
>> separators to your taste.
>
>This is poor advice for a beginner. Whilst strtok does have its uses, it
>also has issues - traps for the unwary programmer. These derive from its
>maintenance of significant state between calls, which makes it unsuitable

I understood that, and I am a 'beginner'. It is very adequately covered
in textbooks (see 'C in a Nutshell', ISBN 0-596-00697-7, page 440),
somewhat less so in K&R2. And I gave the questioner an example to help
him. My dissatisfaction with strtok() is that repeated separation
characters are treated as one, making it difficult to present the user
with an intuitively understandable interface. It is not usually a good
idea to equate ignorance and stupidity.
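
If empty fields matter, so that "a,,b" is three fields rather than two,
the splitting can be done with strcspn() alone. This is only a sketch;
next_field is a made-up name, and it leaves the input string
unmodified.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: return the next field of the string *cursor,
   where every single delimiter character ends a field, so "a,,b" has
   an empty middle field. Stores the field's length in *len and
   advances *cursor. After the last field *cursor becomes NULL and the
   following call returns NULL. */
const char *next_field(const char **cursor, const char *delims, size_t *len)
{
    const char *start;

    assert(cursor != NULL && delims != NULL && len != NULL);

    if (*cursor == NULL)
        return NULL;                 /* the last field was already returned */
    start = *cursor;
    *len = strcspn(start, delims);   /* may be 0: an empty field */
    if (start[*len] == '\0')
        *cursor = NULL;              /* no delimiter follows: last field */
    else
        *cursor = start + *len + 1;  /* step over exactly one delimiter */
    return start;
}
```

Scanning "a,,b" with "," as the delimiter yields fields of length 1, 0
and 1, where strtok() would report only "a" and "b".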

CBFalconer

Sep 10, 2008, 8:01:33 PM

arnuld wrote:
>
> I searched the c.l.c archives provided by Google as Google Groups
> with "word input" as the key words and did not come up with
> anything good.
>
> C++ has std::string for taking a word as input from stdin. C takes
> input in 2 ways:
>
> 1) as a character, getchar()
> 2) as a whole line, fgets()
>
> as C programmer, are we supposed to create a get_word function
> everytime when we need a words as input from stdin ( e.g. terminal)

Well, first you have to define a word. Does it terminate on
blanks, on blanks and non-print chars, on blanks and tabs, etc. I
think you will find that the C++ mechanism terminates on blanks and
'\n' (but I could well be wrong). Having defined it, you just
write the code to extract such a beast from a stream (or from a
string). At that point both you and your code reader know exactly
what the function extracts.

Don't forget to preserve the exit char. Something else may need
it.

Note that, having written the function, you are allowed to keep its
source (and its object code) and reuse it as often as you wish,
with minimum effort. If you have taken the elementary precaution
of writing it in standard C, you can use it anywhere.

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.

William Pursell

Sep 11, 2008, 12:43:55 AM

On 10 Sep, 11:08, arnuld <sunr...@invalid.address> wrote:

>
> as C programmer, are we supposed to create a get_word function everytime
> when we need a words as input from stdin ( e.g. terminal)

No. You should either find a function that does what you want
or write it yourself, and once you have done that...don't
ever do it again. Put it in a library and use it.

If you are re-writing the same function repeatedly, then
you aren't a C-programmer. You aren't any kind of programmer.
Re-writing the same functionality can be a useful
exercise for the novice, but it is a silly waste of
time otherwise.

arnuld

Sep 11, 2008, 2:24:00 AM

> On Wed, 10 Sep 2008 17:01:53 -0500, Gordon Burditt wrote:

> The first step is to define what a "word" is.

For *my* program, a word is a collection of letters, numbers or anything
separated by space, tab or newline.

> How many words are these:
>
> 1. don't

1 word

> 2. antidisestablish-mentarianism

1 word

> 3. Joe,Bob,Sally, and Henry.

3 words. Joe,Bob,Sally, makes one word, and makes second, Henry. makes
3rd ( notice that full stop with Henry.)

> 4. Joe, Bob, Sally, and Henry.

5 words

> 5. $1,416,383,583.20

all 1 word. There is no space in between them.

> 6. ()@#$#^&*#^%%#^@*^$&*$

1 word

> 7. George W. Bush

3 words

> 8. slam-dunk

1 word

> 9. 15th-century vase

2 words

> 10. M.O.N.S.T.E.R., the computer chess-playing machine

5 words

> 11. lord.high.master.of....@yahoo.com

1 word, of course

> Justify your answers.

Any collection of letters,symbols or numbers separated by single or
multiple spaces or tab or newline. Therefore

comp.lang.c++ --> 1 word
Std. Lib --> 2 words
Lov@389&om --> 1 word

I think it is pretty much clear now what a word is.

Bartc

Sep 11, 2008, 5:36:49 AM

"arnuld" <sun...@invalid.address> wrote in message

news:pan.2008.09.11....@invalid.address...

>> On Wed, 10 Sep 2008 17:01:53 -0500, Gordon Burditt wrote:
>
>> The first step is to define what a "word" is.
>
> For *my* program, a word is a collection of letters, numbers or anything
> separated by space, tab or newline.
>
>
>> How many words are these:
>>
>> 1. don't
>
> 1 word
>
>
>> 2. antidisestablish-mentarianism
>
> 1 word
>
>
>> 3. Joe,Bob,Sally, and Henry.
>
> 3 words. Joe,Bob,Sally, makes one word, and makes second, Henry. makes
> 3rd ( notice that full stop with Henry.)

You have commas in the middle of words?

Ever heard of comma-delimited files? Comma is way up there with space and
tab.

--
Bartc

arnuld

Sep 11, 2008, 5:49:54 AM

> On Thu, 11 Sep 2008 09:36:49 +0000, Bartc wrote:

> You have commas in the middle of words?
>
> Ever heard of comma-delimited files? Comma is way up there with space and
> tab.

yes, I know and @%$@programmimnng34 is not a word either. If I start to
differentiate these things then it will become very complex to define what
a word is and there could be lots of controversy over what should be (or
could be ?) a word. So I take a simple approach, the white space
(whether a newline or a tab or a single space) separates the words. simple ...

arnuld

Sep 11, 2008, 7:24:48 AM

> On Wed, 10 Sep 2008 20:01:33 -0400, CBFalconer wrote:

> Well, first you have to define a word. Does it terminate on
> blanks, on blanks and non-print chars, on blanks and tabs, etc. I
> think you will find that the C++ mechanism terminates on blanks and
> '\n' (but I could well be wrong).

I have already covered this in my last reply (to BartC).

> Having defined it, you just
> write the code to extract such a beast from a stream (or from a
> string). At that point both you and your code reader know exactly
> what the function extracts.

Now there is a big problem in this. In C++ I don't have to care whether
users enter one word or hundreds. Memory was managed by the standard
library's vector. Here I am thinking of using fgets() to store the input,
which has 2 problems:

1) extracting words from each line.
2) fgets() uses an array to store data and I don't know how large the
input is, so I can't decide on the size of the array.

> Don't forget to preserve the exit char.
> Something else may need it.

you mean null character ?

> Note that, having written the function, you are allowed to keep its
> source (and its object code) and reuse it as often as you wish, with
> minimum effort. If you have taken the elementary precaution of writing
> it in standard C, you can use it anywhere.

That's what I want to do: write in ANSI C :)

Richard Heathfield

Sep 11, 2008, 8:00:01 AM

arnuld said:

<snip>

> Now here I am thinking of using fgets() to store the input,
> which has 2 problems:
>
> 1) extract words from each line.
> 2) fgets() uses array to store data and I don't know how large is
> the input, so I can't decide on the size of the array.

This is a common problem - so common, in fact, that I wrote it up on the
Web. Take a look at http://www.cpax.org.uk/prg/writings/fgetdata.php which
looks at scanf, gets, and fgets, points out the difficulties with each,
and then discusses a possible solution to the problem of arbitrarily long
lines.

On that page, I present code for reading a word at a time, and for reading
a line at a time. In fact, since you supply your own delimiters, reading a
line is really just a special case of reading a word!

I do not pretend that my code is perfect. For example, the return values
could have been better chosen (I must fix that one day).

It is not intended to be a plug-in solution to the problem (although some
people do actually use it that way and, as far as I'm aware, no harm has
come to them as a result). Rather, it is intended to demonstrate one
possible approach to the problem, in the hope that the reader will have an
"aha!" moment and perhaps come up with a solution that fits his own needs
much better than a generic solution is likely to be able to do.

Several other approaches apart from the one I chose to demonstrate are also
discussed (but not demonstrated), the intent being to give a wider view of
various ways to tackle this problem, depending on your priorities.

Finally, the page provides links to a few other people's demonstrations of
how to solve this problem, again with the intent of providing a wider
perspective on different approaches.
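
For readers who want to see the general shape of such a function before
following the link, here is a minimal sketch of a delimiter-driven
reader that grows its buffer with realloc(). It is not the code from
that page; read_token, its starting capacity and its return convention
are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: skip leading delimiters, then read characters
   from fp up to the next delimiter or end of input. Returns a
   malloc'd, nul-terminated string that the caller must free(), or
   NULL at end of input or if memory runs out. The delimiter that ends
   the token is consumed and discarded. */
char *read_token(FILE *fp, const char *delims)
{
    size_t cap = 16, len = 0;
    char *buf, *tmp;
    int c;

    assert(fp != NULL && delims != NULL);

    while ((c = getc(fp)) != EOF && strchr(delims, c) != NULL)
        ;                               /* skip leading delimiters */
    if (c == EOF)
        return NULL;

    buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    do {
        if (len + 1 == cap) {           /* always keep room for '\0' */
            tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)c;
    } while ((c = getc(fp)) != EOF && strchr(delims, c) == NULL);

    buf[len] = '\0';
    return buf;
}
```

With delims set to " \t\n" this reads words of any length; with "\n" it
reads non-empty lines. Returning NULL for both end of input and
allocation failure is a simplification; real code would probably want
to tell them apart.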

pete

Sep 11, 2008, 8:03:45 AM

arnuld wrote:
>> On Thu, 11 Sep 2008 09:36:49 +0000, Bartc wrote:
>
>> You have commas in the middle of words?
>>
>> Ever heard of comma-delimited files? Comma is way up there with space and
>> tab.
>
>
> yes, I know and @%$@programmimnng34 is not a word either. If I start to
> differentiate these things then it will become very complex to define what
> a word is and there could be lots of controversy over what should be (or
> could be ?) a word. So I take a simple approach, the white space
> (whether a newline or a tab or a single space) separates the words. simple ...

Your original question was:

"as C programmer,
are we supposed to create a get_word function everytime
when we need a words as input from stdin"

The answer is
"Yes; every time that you define what you want 'word' to mean."

--
pete

arnuld

Sep 11, 2008, 8:07:24 AM

> On Thu, 11 Sep 2008 12:00:01 +0000, Richard Heathfield wrote:

> This is a common problem - so common, in fact, that I wrote it up on the
> Web. Take a look at http://www.cpax.org.uk/prg/writings/fgetdata.php which
> looks at scanf, gets, and fgets, points out the difficulties with each,
> and then discusses a possible solution to the problem of arbitrarily long
> lines.
>

> ...SNIP....

I have not checked it yet but will do so later. The one question
that keeps popping up in my mind is "Why was C not designed to have
this feature?". That reminds me of an article, "Back to Basics" by Joel
Spolsky, where he said that we have null-terminated strings in C, which
are much slower than Pascal strings, not by choice but by force: C was
developed on the PDP-7, which had an ASCIZ table, which required strings
to be Z terminated (Z means ZERO). Do we have the same kind of thing here
in my problem?

I am just curious and feel a little strange on having this "word problem"
in C.

arnuld

Sep 11, 2008, 8:08:22 AM

> On Thu, 11 Sep 2008 08:03:45 -0400, pete wrote:

> Your original question was:
> "as C programmer,
> are we supposed to create a get_word function everytime
> when we need a words as input from stdin"

> The answer is
> "Yes; every time that you define what you want 'word' to mean."

yes, I think CBFalconer also answered that, and now things are getting much
more fundamental as I am starting to write code

Richard Heathfield

Sep 11, 2008, 8:42:13 AM

arnuld said:

>> On Thu, 11 Sep 2008 12:00:01 +0000, Richard Heathfield wrote:
>
>> This is a common problem - so common, in fact, that I wrote it up on the
>> Web. Take a look at http://www.cpax.org.uk/prg/writings/fgetdata.php
>> which looks at scanf, gets, and fgets, points out the difficulties with
>> each, and then discusses a possible solution to the problem of
>> arbitrarily long lines.
>>
>> ...SNIP....
>
>
> I have not checked it but will be doing it later. The only one question
> that keeps on popping up into my mind is "Why C was not designed to have
> this feature ? ".

I answered that question already (see the above link).

CBFalconer

Sep 11, 2008, 10:51:11 AM

arnuld wrote:
>> Bartc wrote:
>
>> You have commas in the middle of words? Ever heard of
>> comma-delimited files? Comma is way up there with space and tab.
>
> yes, I know and @%$@programmimnng34 is not a word either. If I
> start to differentiate these things then it will become very
> complex to define what a word is and there could be lots of
> controversy over what should be (or could be ?) a word. So I
> take a simple approach, the white space (whether a newline or a
> tab or a single space) separates the words. simple ...

But that is the point. Chars and lines are easily defined. Words
depend on the usage to be applied. Therefore the code to separate
words depends on the usage. You have to write the parsing code to
suit the job. It just isn't black and white.

CBFalconer

Sep 11, 2008, 10:59:08 AM

arnuld wrote:
>> CBFalconer wrote:
>
... snip ...

>
> Now there is a big problem in this. In C++ i don't have to care
> whether users enter one word or 100s. Memory was being managed by
> std. lib. vector. Now here I am thinking of using fgets() to
> store the input, which has 2 problems:
>
> 1) extract words from each line.
> 2) fgets() uses array to store data and I don't know how large
> is the input, so I can't decide on the size of the array.

My suggestion is to use ggets, available in std. C source code at:

<http://cbfalconer.home.att.net/download/ggets.zip>

>
>> Don't forget to preserve the exit char.
>> Something else may need it.
>
> you mean null character ?

No. I mean the char that doesn't belong to the word and signifies
the completion. If you are getting the word from a string put the
char back by backing up the pointer (or index). If coming from a
stream you have ungetc available.
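
For the stream case, a minimal sketch of a word reader that pushes the
terminating character back with ungetc(). It assumes a word is a run of
non-white-space characters as classified by isspace(); get_word and its
fixed-size buffer interface are illustrative choices, not anything
taken from ggets.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: read one white-space-delimited word from fp
   into buf, which holds `size` bytes (size >= 2). Returns the number
   of characters stored, or 0 at end of input. The character that
   stopped the read is pushed back with ungetc() so the caller can
   still see it. If the buffer fills first, that character is the
   next character of the same word. */
size_t get_word(FILE *fp, char *buf, size_t size)
{
    size_t len = 0;
    int c;

    assert(fp != NULL && buf != NULL && size >= 2);

    while ((c = getc(fp)) != EOF && isspace(c))
        ;                               /* skip leading white space */
    while (c != EOF && !isspace(c) && len < size - 1) {
        buf[len++] = (char)c;
        c = getc(fp);
    }
    if (c != EOF)
        ungetc(c, fp);                  /* preserve the exit char */
    buf[len] = '\0';
    return len;
}
```

After get_word() returns, the next getc() on the stream yields the
character that ended the word, so a caller can still tell whether the
word was the last one on its line.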

CBFalconer

Sep 11, 2008, 11:06:11 AM

arnuld wrote:
>
... snip ...

>
> I have not checked it but will be doing it later. The only one
> question that keeps on popping up into my mind is "Why C was not
> designed to have this feature ? ". That reminds of an article
> "Back to Basics" by Joel Spolsky where he said that we have null
> terminated strings in C which are much slower than PASCAL
> strings not by choice but by force, as C was developed on PDP-7,
> which had ASCIZ table, which required strings to be Z terminated
> ( Z means ZERO). Do we have same kind of thing here in my problem?

Speed depends on use. Most string processing just processes until
you hit the end of the string, and there is then no slowdown from
nul termination. In addition most strings are short, and again
there is little effort in finding length. With a little care you
can often avoid finding string lengths in advance.
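
As an illustration of processing until the terminator, here is a sketch
that scans a string once without ever calling strlen(). The function
name count_char is made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: count how often ch occurs in s in one pass. */
size_t count_char(const char *s, char ch)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++)     /* stop at the terminator; no length needed */
        if (*s == ch)
            n++;
    return n;
}
```

A length-prefixed string would need the same single pass here, so the
nul terminator costs nothing extra for this kind of loop.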

Chris Dollin

Sep 11, 2008, 11:23:03 AM

arnuld wrote:

> I have not checked it but will be doing it later. The only one question
> that keeps on popping up into my mind is "Why C was not designed to have
> this feature ? ".

Because C was designed for /implementing/ this feature; as a bare-bones
systems programming language.

> That reminds of an article "Back to Basics" by Joel
> Spolsky where he said that we have null terminated strings in C which
> are much slower than PASCAL strings

I'd be interested in real evidence for this claim. Real, as in, it
happened in these programs and couldn't be eliminated by straightforward
fixes, rather than contrived examples or beginners gotchas.

> not by choice but by force, as C was
> developed on PDP-7, which had ASCIZ table, which required strings to be Z
> terminated ( Z means ZERO).

That seems ... unlikely ... to me. Just because one's assembler has
an ASCIZ directive doesn't mean one has to use it; even if one does,
one can perfectly well also associate a length with a string as well
as a null terminator.
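
Carrying a length alongside the terminator might look like the
following sketch. The struct and function names are hypothetical;
nothing like this is in the standard library.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counted string: the length is stored, and the data is
   still nul-terminated so it can be passed to ordinary C functions. */
struct counted_str {
    size_t len;     /* number of characters, not counting the '\0' */
    char *data;     /* malloc'd, nul-terminated copy */
};

/* Initialise cs with a copy of src. Returns 1 on success, 0 if
   memory could not be allocated. The caller frees cs->data. */
int counted_str_init(struct counted_str *cs, const char *src)
{
    assert(cs != NULL && src != NULL);

    cs->len = strlen(src);              /* measured once, then remembered */
    cs->data = malloc(cs->len + 1);
    if (cs->data == NULL) {
        cs->len = 0;
        return 0;
    }
    memcpy(cs->data, src, cs->len + 1); /* copies the terminator too */
    return 1;
}
```

Code that needs the length reads cs.len; code that wants a plain C
string uses cs.data unchanged.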

> Do we have same kind of thing here in my
> problem ?
>
> I am just curious and feel a little strange on having this "word problem"
> in C.

You've picked a language deliberately sparse in built-in features;
don't be surprised if it doesn't have many.

--
'It changed the future .. and it changed us.' /Babylon 5/

Hewlett-Packard Limited Cain Road, Bracknell, registered no:
registered office: Berks RG12 1HN 690597 England

Pilcrow

Sep 11, 2008, 12:26:33 PM

On Thu, 11 Sep 2008 12:00:01 +0000, Richard Heathfield

<r...@see.sig.invalid> wrote:

Thank you so much! This is much more the sort of thing I was hoping to
find when I started reading this group.

I much appreciate the excellent documentation in the function itself.

Is there at least an index to other similar solutions to general
problems? In comp.lang.perl.misc one often sees people scolded for not
using tested, robust solutions, rather than reinventing the wheel. CPAN
largely fills most people's needs. At the risk of making myself a
complete bore, I ask again: why doesn't the C community follow this
example?

Now, if you just followed the same indenting and bracketing style that
is used in K&R2, I would be *totally* happy. I have a lot of trouble
reading yours. Never mind, I'll just have to write a perl script to
convert from your style to theirs. Shouldn't be too hard.

Thank you again!!

Richard Heathfield

Sep 11, 2008, 12:42:51 PM

Pilcrow said:

<snip>

> Is there at least an index to other similar solutions to general
> problems?

http://www.google.com :-)

<snip>

> Now, if you just followed the same indenting and bracketing style that
> is used in K&R2, I would be *totally* happy.

Yes, but I wouldn't.

> I have a lot of trouble reading yours.

You may well be the first person ever to say that. People have made all
kinds of complaints about my code, but readability is not usually high on
the hit-list.

> Never mind, I'll just have to write a perl script to
> convert from your style to theirs. Shouldn't be too hard.

man indent

Keith Thompson

Sep 11, 2008, 4:24:52 PM

Yes, it certianly is. Did someone do that?

>>for use in library functions.

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Keith Thompson

Sep 11, 2008, 4:36:30 PM

arnuld <sun...@invalid.address> writes:
>> On Wed, 10 Sep 2008 17:01:53 -0500, Gordon Burditt wrote:
>
>> The first step is to define what a "word" is.
>
> For *my* program, a word is a collection of letters, numbers or anything
> separated by space, tab or newline.

As you know, that definition is fine for your program; others might
have different requirements.

Incidentally, the phrase "letters, numbers, or anything" seems
redundant. I think that a more precise rendering of what you meant
would be:

A "word" is a non-empty contiguous sequence of characters other
than space, tab, or newline, preceded or followed either by a
space, tab, or newline or by the start or end of the input.

It would also be good to specify whether the input is a string, a line
of text, or an entire text file.

If I take your definition literally, then in the following
"word"
the word "word" is not a word, because it's not separated by space,
tab, or newline.

It might be more convenient to treat anything for which isspace()
returns true (or for which isspace() returns true in the "C" locale)
as a separator; that includes several whitespace characters that you
didn't mention. But of course if your requirements call for only
space, tab, and newline to be treated as separators, then that trumps
convenience.
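
That isspace()-based definition is short to implement. A minimal sketch
for counting the words in a string; count_words is a hypothetical name
and the input is assumed to be a single nul-terminated string:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: count maximal runs of characters for which
   isspace() is false. */
size_t count_words(const char *s)
{
    size_t n = 0;
    int in_word = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        /* Cast to unsigned char: isspace() needs a value in that
           range (or EOF), and plain char may be negative. */
        if (isspace((unsigned char)*s)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;    /* first character of a new word */
            n++;
        }
    }
    return n;
}
```

By this definition "Joe,Bob,Sally, and Henry." is 3 words, and
"antidisestablish-" followed by a newline and "mentarianism" is 2.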

>> How many words are these:
>>
>> 1. don't
>
> 1 word
>
>
>> 2. antidisestablish-mentarianism
>
> 1 word

In the previous article, "antidisestablish-" and "mentarianism" were
on two lines, so they'd be two words by your definition. (Gordon's
point was that it's reasonable to treat them as a single word, since
that's what the hyphen means in English text, but if they're two words
by your definition then they're two words by your definition.)

[snip]

> Any collection of letters,symbols or numbers separated by single or
> multiple spaces or tab or newline. Therefore
>
> comp.lang.c++ --> 1 word
> Std. Lib --> 2 words
> Lov@389&om --> 1 word
>
>
> I think it is pretty much clear now what a word is.

It's pretty much clear what your definition of a word is. It's still
not at all clear what a word is in general (and it can't be, since the
term is used inconsistently).

Pilcrow

Sep 11, 2008, 10:53:27 PM

On Thu, 11 Sep 2008 13:24:52 -0700, Keith Thompson <ks...@mib.org>
wrote:

>Pilcrow <Pilc...@gmail.com> writes:
>> On Wed, 10 Sep 2008 22:05:46 +0000, Richard Heathfield
>> <r...@see.sig.invalid> wrote:
>>>Pilcrow said:
>>>> On Wed, 10 Sep 2008 15:08:40 +0500, arnuld <sun...@invalid.address>
>>>> wrote:
>>><snip>
>>>
>>>>>as C programmer, are we supposed to create a get_word function everytime
>>>>>when we need a words as input from stdin ( e.g. terminal)
>>>>
>>>> Try using fgets(), and strtok(). strtok() will allow you to define word
>>>> separators to your taste.
>>>
>>>This is poor advice for a beginner. Whilst strtok does have its uses, it
>>>also has issues - traps for the unwary programmer. These derive from its
>>>maintenance of significant state between calls, which makes it unsuitable
>>
>> I understood that, and I am a 'beginner'. It is very adequately covered
>> in textbooks (see 'C in a Nutshell', ISBN 0-596-00697-7, page 440),
>> somewhat less so in K&R2. And I gave the questioner an example to help
>> him. My dissatisfaction with strtok() is that repeated separation
>> characters are treated as one, making it difficult to present the user
>> with an intuitively understandable interface. It is not usually a good
>> idea to equate ignorance and stupidity.
>
>Yes, it certianly is. Did someone do that?
>

How many times does someone here say, in effect, "this is too deep for a
beginner"?

Pilcrow

Sep 11, 2008, 11:50:50 PM

On Thu, 11 Sep 2008 16:42:51 +0000, Richard Heathfield
<r...@see.sig.invalid> wrote:

>Pilcrow said:
>
><snip>
>
>> Is there at least an index to other similar solutions to general
>> problems?
>
>http://www.google.com :-)
>
><snip>
>
>> Now, if you just followed the same indenting and bracketing style that
>> is used in K&R2, I would be *totally* happy.
>
>Yes, but I wouldn't.
>
>> I have a lot of trouble reading yours.
>
>You may well be the first person ever to say that. People have made all
>kinds of complaints about my code, but readability is not usually high on
>the hit-list.
>

I apologize. It was not really a complaint, more an expression of my
frustration.

I am still digesting that code. I was especially taken with the memory
management. It should be provided for all the other situations where
one sees a caution that one should make sure that there is adequate
room for the result. After I have gotten more experience with C, I
think I'll try my hand at it.

Keith Thompson

Sep 12, 2008, 1:35:19 AM

Pilcrow <Pilc...@gmail.com> writes:
> On Thu, 11 Sep 2008 13:24:52 -0700, Keith Thompson <ks...@mib.org>
> wrote:
>>Pilcrow <Pilc...@gmail.com> writes:

[...]

>>> It is not usually a good idea to equate ignorance and stupidity.
>>
>>Yes, it certianly is. Did someone do that?

s/certianly/certainly/

> How many times does someone here say, in effect, "this is too deep for a
> beginner"?

That's not equating ignorance and stupidity; it's equating ignorance
and ignorance. And ignorance isn't necessarily an insult; it's
usually curable, after all.

Sorry, but some things really are too deep for a beginner.

Richard Heathfield

Sep 12, 2008, 2:17:25 AM

Pilcrow said:

> On Thu, 11 Sep 2008 16:42:51 +0000, Richard Heathfield
> <r...@see.sig.invalid> wrote:
>
>>Pilcrow said:
>>
<snip>

>>> I have a lot of trouble reading yours.

>>
>>You may well be the first person ever to say that. People have made all
>>kinds of complaints about my code, but readability is not usually high on
>>the hit-list.
>>
>
> I apologize.

I wish you wouldn't. You have every right to say what you said. I wasn't
being "precious" about it - merely surprised! In fact, I'd be quite
curious to know more about *why* you have a lot of trouble reading my
code. Maybe there's something I can change to make it easier for you to
read without making it more difficult for myself and others.

Richard Bos

Sep 12, 2008, 2:49:03 AM

Pilcrow <Pilc...@gmail.com> wrote:

> Is there at least an index to other similar solutions to general
> problems? In comp.lang.perl.misc one often sees people scolded for not
> using tested, robust solutions, rather than reinventing the wheel. CPAN
> largely fills most people's needs. At the risk of making myself a
> complete bore, I ask again: why doesn't the C community follow this
> example?

Mainly because for many things C is used for, someone else's "almost
good enough" solution is _not_ good enough. In PERL, well... you're
having to deal with PERL already. That your string library (or rather,
someone else's string library) is slightly tentacular doesn't matter
much when you're already up to your knees in Cthulhuspawn.

Richard

arnuld

Sep 12, 2008, 3:12:45 AM

> On Thu, 11 Sep 2008 12:42:13 +0000, Richard Heathfield wrote:

> I answered that question already (see the above link).

If that's C's way of doing things, I have to admit it is very messy :( . I
really can't see why it is better than:

std::vector<std::string> svec;

vipp...@gmail.com

Sep 12, 2008, 4:34:21 AM

On Sep 12, 10:12 am, arnuld <sunr...@invalid.address> wrote:
> > On Thu, 11 Sep 2008 12:42:13 +0000, Richard Heathfield wrote:
> > I answered that question already (see the above link).
>
> If that's C's way of doing things, I have to admit it is very messy :( . I
> really can't find why it is better than:
>
> std::vector<std::string> svec;

Well, for starters, because it does compile.

Richard Heathfield

Sep 12, 2008, 4:52:03 AM

arnuld said:

>> On Thu, 11 Sep 2008 12:42:13 +0000, Richard Heathfield wrote:
>
>> I answered that question already (see the above link).
>
>
> If that's C's way of doing things, I have to admit it is very messy :( .

It seems you have misunderstood.

The question was: why doesn't C have this feature (the ability to read
arbitrarily long lines) already? My answer to that perfectly reasonable
question is quite simply that there are many ways to do this, and no one
of them stands out as being the universally "right" decision.

> I really can't find why it is better than:
>
> std::vector<std::string> svec;

Here's one obvious problem with that: it won't compile. Here's another:
assuming it did, it doesn't appear to be a function, so it's hard to see
how it could read anything at all.

Richard

unread,

Sep 12, 2008, 6:15:03 AM9/12/08

to

Richard Heathfield <r...@see.sig.invalid> writes:

> Pilcrow said:
>
>> On Thu, 11 Sep 2008 16:42:51 +0000, Richard Heathfield
>> <r...@see.sig.invalid> wrote:
>>
>>>Pilcrow said:
>>>
> <snip>
>
>>>> I have a lot of trouble reading yours.
>>>
>>>You may well be the first person ever to say that. People have made all
>>>kinds of complaints about my code, but readability is not usually high on
>>>the hit-list.
>>>
>>
>> I apologize.
>
> I wish you wouldn't. You have every right to say what you said. I wasn't
> being "precious" about it - merely surprised! In fact, I'd be quite
> curious to know more about *why* you have a lot of trouble reading my
> code. Maybe there's something I can change to make it easier for you to
> read without making it more difficult for myself and others.

Speaking for myself and knowingly only picking on style things:

I hate this move to putting braces on their own lines. It's a horrible
waste of vertical space and "3 levels" for one unit is not natural.

e.g.

while(t--){
    doSomething(t);
    doSomething2(t);
}

is much preferable to

while(t--)
{
    doSomething(t);
    doSomething2(t);
}

The closing brace matches to the opening "while". Clean. Economical.

Also you adopt the non "standard" option of putting your values to
compare against on the left. While "clever" it does not read as traditional
"English" and is not adopted widely elsewhere.

e.g.

while( 0 == getValue(t))
    doSomething(t);

is horrible. 0 is not the thing we are interested in manipulating or
looking at. The return from getValue(t) is. It is that we compare
against a benchmark figure - therefore

while(getValue(t)==0)
    doSomething(t);

reads much better and is more traditional.

Yes, we could argue until the cows come home about it and it is purely
style. But I have tried to justify my traditional K&R preferences. At
least you do not seem to have adopted Falconer's horrific habit of
having conditional targets on the same line e.g

if(error(r))printf("I'm a pedantic nutter");

Richard

unread,

Sep 12, 2008, 6:18:27 AM9/12/08

to

vipp...@gmail.com writes:

Minus 3 for being too late on your attempt to get promoted into the
c.l.c "reg" upper echelon. But that attempt combined with your
"indeeds", your "Mr heathfields" and various nauseating attempts at
belittling nOObs should ensure at least a cushion at RHs feet in the
near future.

Hint : it was perfectly clear what arnuld meant. Pretending other
languages do not exist (especially one as rooted in C history as C++) in
this NG is simply pathetic.

arnuld

unread,

Sep 12, 2008, 6:28:48 AM9/12/08

to

> On Fri, 12 Sep 2008 08:52:03 +0000, Richard Heathfield wrote:

> It seems you have misunderstood.
>
> The question was: why doesn't C have this feature (the ability to read
> arbitrarily long lines) already? My answer to that perfectly reasonable
> question is quite simply that there are many ways to do this, and no one
> of them stands out as being the universally "right" decision.

No, I did get it. You have discussed several ways of accomplishing the task,
but none of them fits properly. You have shown the pros and cons of each
very *clearly*, and on that basis you have reasonably shown why we don't
have such a function in the standard library. I think you are technical,
unbiased and right about it.

> Here's one obvious problem with that: it won't compile. Here's another:
> assuming it did, it doesn't appear to be a function, so it's hard to see
> how it could read anything at all.

I know you are playing here ;). It won't compile because it's from C++.

<OT>
If I have to use C then I have to use one of the options you have suggested
or do it in another language, but that's personal. I have posted the code on
comp.lang.c++ with the title "sorting the input":

http://groups.google.com/group/comp.lang.c++/browse_thread/thread/0488e58b666d0eb1#

The only problem I was having with C is that my mind drifted very
badly from *thinking-in-problem* to *thinking-about-language-issues*,
and hence my focus was lost. C++ saved that focus and, IMVVHO (I am not
sure), the C++ version will run as fast as the C version. But I don't think
C++ is better than C, because there are cases where C++ will not fit,
like resource- and memory-constrained systems where there is no library
available. Then, even if you use a C++ compiler, you will have to
learn and use the *C Way*. You can't do anything else.

But since that's personal, why would I even use C++ at all? I would prefer
Common Lisp and avoid working on resource- and memory-constrained
systems. I think they disrupt and kill my ability to
think in terms of the problem. </OT>

vipp...@gmail.com

unread,

Sep 12, 2008, 6:30:18 AM9/12/08

to

On Sep 12, 1:18 pm, Richard<rgr...@gmail.com> wrote:

[replying to me]

> Minus 3 for being too late on your attempt to get promoted into the
> c.l.c "reg" upper echelon. But that atttempt combined with your
> "indeeds", your "Mr heathfields" and various nauseating attempts at
> belittling nOObs should ensure at least a cushion at RHs feet in the
> near future.

I'm tired of this. I don't give a crap what Heathfield thinks of me.
I'm here to learn C and help others do the same, not to socialize.

arnuld

unread,

Sep 12, 2008, 6:32:38 AM9/12/08

to

> On Fri, 12 Sep 2008 12:18:27 +0200, Richard wrote:

>> vipp...@gmail.com writes:

>> On Sep 12, 10:12 am, arnuld <sunr...@invalid.address> wrote:
>>> > On Thu, 11 Sep 2008 12:42:13 +0000, Richard Heathfield wrote:
>>> > I answered that question already (see the above link).
>>>
>>> If thats C's way of doing things. I have to admit, it is very messy :( . I
>>> really can't find why it is better than:
>>>
>>> std::vector<std::string> svec;
>>
>> Well, for starters, because it does compile.

You must be using Google Groups; that's why I don't see your post. Anyway,
Richard is right. It won't compile (this is comp.lang.c).

vipp...@gmail.com

unread,

Sep 12, 2008, 6:36:29 AM9/12/08

to

On Sep 12, 1:32 pm, arnuld <sunr...@invalid.address> wrote:
> > On Fri, 12 Sep 2008 12:18:27 +0200, Richard wrote:
> >> vipps...@gmail.com writes:
> >> On Sep 12, 10:12 am, arnuld <sunr...@invalid.address> wrote:
> >>> > On Thu, 11 Sep 2008 12:42:13 +0000, Richard Heathfield wrote:
> >>> > I answered that question already (see the above link).
>
> >>> If thats C's way of doing things. I have to admit, it is very messy :( . I
> >>> really can't find why it is better than:
>
> >>> std::vector<std::string> svec;
>
> >> Well, for starters, because it does compile.
>
> you must be using Google Groups, thats why I don't see your post. Anyway,
> Richard is right. It won't compile (this is comp.lang.c )

Yes I do. There's better spam filters than just blocking a service...
I'm also right; I meant that the other way is better than the latter
because the former *does* compile. (unlike the latter that does not)

Richard

unread,

Sep 12, 2008, 6:49:28 AM9/12/08

to

vipp...@gmail.com writes:

Hmmmm. Indeed. It's "Mr Heathfield" to you.

Richard Bos

unread,

Sep 12, 2008, 6:57:20 AM9/12/08

to

Pilcrow <Pilc...@gmail.com> wrote:

> On Wed, 10 Sep 2008 22:05:46 +0000, Richard Heathfield
> >Pilcrow said:
> >
> >> Try using fgets(), and strtok(). strtok() will allow you to define word
> >> separators to your taste.
> >
> >This is poor advice for a beginner. Whilst strtok does have its uses, it
> >also has issues - traps for the unwary programmer. These derive from its
> >maintenance of significant state between calls, which makes it unsuitable
>
> I understood that, and I am a 'beginner'. It is very adequately covered
> in textbooks (see 'C in a Nutshell', ISBN 0-596-00697-7, page 440),
> somewhat less so in K&R2. And I gave the questioner an example to help
> him. My dissatisfaction with strtok() is that repeated separation
> characters are treated as one, making it difficult to present the user
> with an intuitively understandable interface. It is not usually a good
> idea to equate ignorance and stupidity.

There is also the catch that strtok() scribbles over its parameter,
meaning that you cannot use it to tokenise either a string literal, or
data you want to keep. This is something that catches out a lot of less
well-informed newbies.
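
A minimal sketch of the usual workaround, assuming a hypothetical helper named count_tokens: tokenise a private copy, so that the caller's string (which may be a string literal) is never written to. It also shows the behaviour mentioned upthread, where adjacent separators collapse into one.

```c
#include <stdlib.h>
#include <string.h>

/* Counts the tokens in src, separated by any character in delims,
   without modifying src. strtok() is run on a heap copy instead.
   Returns -1 if the working copy cannot be allocated. */
int count_tokens(const char *src, const char *delims)
{
    size_t len = strlen(src);
    char *copy = malloc(len + 1);
    char *tok;
    int count = 0;

    if (copy == NULL)
        return -1;
    memcpy(copy, src, len + 1);   /* includes the terminating '\0' */

    /* strtok() overwrites separators in copy with '\0' as it goes */
    for (tok = strtok(copy, delims); tok != NULL; tok = strtok(NULL, delims))
        count++;

    free(copy);
    return count;
}
```

With this, count_tokens("a,b,,c", ",") sees three tokens, not four, because strtok() skips the empty field between the two commas. The hidden state between calls is still there, so the sketch is no safer than strtok() itself if another tokenising loop is interleaved with it.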

Richard

Richard Bos

unread,

Sep 12, 2008, 6:58:41 AM9/12/08

to

arnuld <sun...@invalid.address> wrote:

> If thats C's way of doing things. I have to admit, it is very messy :( . I
> really can't find why it is better than:
>
> std::vector<std::string> svec;

Confucius, he says: "if you want C++, you know where to find it".

Richard

unread,

Sep 12, 2008, 7:02:30 AM9/12/08

to

r...@hoekstra-uitgeverij.nl (Richard Bos) writes:

If I were looking at designing solutions in C for such things then it
would be remiss of me NOT to look to see how C++ has done it in the
meantime. It could lead to a lot of time saving. Sure you can not use
the C++ syntax but things are never there for "no reason". And in that
context mentioning the way C++ does it here is clearly topical and
possibly useful to C library designers.

James Kuyper

unread,

Sep 12, 2008, 7:05:07 AM9/12/08

to

arnuld wrote:
> I searched the c.l.c archives provided by Google as Google Groups with
> "word input" as the key words and did not come up with anything good.
>
>
> C++ has std::string for taking a word as input from stdin.

Could you identify the std::string feature that implements this? I
couldn't find any use of the word "word" anywhere in section 21 of the
C++ standard, which describes std::string.

James Kuyper

unread,

Sep 12, 2008, 7:30:35 AM9/12/08

to

Richard wrote:
> Richard Heathfield <r...@see.sig.invalid> writes:
...

>> I wish you wouldn't. You have every right to say what you said. I wasn't
>> being "precious" about it - merely surprised! In fact, I'd be quite
>> curious to know more about *why* you have a lot of trouble reading my
>> code. Maybe there's something I can change to make it easier for you to
>> read without making it more difficult for myself and others.
>
> Speaking for myself and knowingly only picking on style things:
>
> I hate this move to putting braces on their own lines. Its a horrible
> waste of vertical space and "3 levels" for one unit is not natural.

Vertical space is not in short supply. Personally, I handle that issue
the same way Richard Heathfield does. My reason is that it makes it
easier to identify and move block statements around when there's a set
of lines which is used for the block, and only that block, including the
delimiting curly brackets.

> Also you adopt the non "standard" option of putting your values to
> compare against on the left. While "clever" it does read as traditional
> "English" and is not adopted widely elsewhere.
>
> e.g
>
> while( 0 == getValue(t))
> doSomething(t);

I don't personally use this style, for reasons similar to yours.
However, are you aware of the reason why some people do this? When a
literal is the left operand of a comparison, rather than the right,
there is no danger of your code being silently compiled if you
accidentally type "=" instead of "==". I'll grant you that this doesn't
make any difference when the right operand is also something which could
not be the left operand of an assignment, such as a function call.
However, this kind of rule is much more effective when used
consistently, rather than always asking yourself "is it needed here?". I
tried this style, but found it very hard to break old habits; but I
would not criticize people for adopting it.
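
A small sketch of the case where the order does matter, using a hypothetical function count_zeros. The array is deliberately passed through a non-const pointer, so that the mistyped form with the variable on the left would be a legal assignment.

```c
#include <stddef.h>

/* Counts the elements of a[0..n-1] that are equal to zero.
   The constant is written on the left of the comparison. */
size_t count_zeros(int *a, size_t n)
{
    size_t i;
    size_t count = 0;

    for (i = 0; i < n; i++) {
        /* Mistyping "==" as "=" here would give "0 = a[i]", which
           the compiler must diagnose, because 0 is not an lvalue.
           The same slip in "a[i] == 0" would give "a[i] = 0", which
           is valid C: it would zero the element and the test would
           always be false. */
        if (0 == a[i])
            count++;
    }
    return count;
}
```

Many compilers can be asked to warn about an assignment used as a condition, which covers the same slip without changing the operand order; whether that is enabled depends on the compiler and its options.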

Richard

unread,

Sep 12, 2008, 7:38:58 AM9/12/08

to

James Kuyper <james...@verizon.net> writes:

> Richard wrote:
>> Richard Heathfield <r...@see.sig.invalid> writes:
> ...
>>> I wish you wouldn't. You have every right to say what you said. I
>>> wasn't being "precious" about it - merely surprised! In fact, I'd
>>> be quite curious to know more about *why* you have a lot of trouble
>>> reading my code. Maybe there's something I can change to make it
>>> easier for you to read without making it more difficult for myself
>>> and others.
>>
>> Speaking for myself and knowingly only picking on style things:
>>
>> I hate this move to putting braces on their own lines. Its a horrible
>> waste of vertical space and "3 levels" for one unit is not natural.
>
> Vertical space is not in short supply. Personally, I handle that issue
> the same way Richard Heathfield does. My reason is that it makes it
> easier to identify and move block statements around when there's a set
> of lines which is used for the block, and only that block, including
> the delimiting curly brackets.

Sounds very rare to me, this moving blocks around. And even so it's one
key stroke away to realign etc. Hardly worth adopting an entire new
layout to support. I have emacs set up so that a single sequence collects
the entire scope into the clipboard anyway.

>
>> Also you adopt the non "standard" option of putting your values to
>> compare against on the left. While "clever" it does read as traditional
>> "English" and is not adopted widely elsewhere.
>>
>> e.g
>>
>> while( 0 == getValue(t))
>> doSomething(t);
>
> I don't personally use this style, for reasons similar to
> yours. However, are you aware of the reason why some people do this?

Yes. Well, one reason. And it's not the same as the other one (which I
knew too :-;).

> When a literal is the left operand of a comparison, rather than the
> right, there is no danger of your code being silently compiled if you
> accidentally type "=" instead of "==". I'll grant you that this

Fabricated and blown out of all proportion IMO. What about protecting
against someone typing "a==0" instead of "a=0", or one of a million
other errors? It's a coding error. And 2 seconds testing or debugging
puts that right.

> doesn't make any difference when the right operand is also something
> which could not be the left operand of an assignment, such as a
> function call. However, this kind of rule is much more effective when
> used consistently, rather than always asking yourself "is it needed
> here?". I tried this style, but found it very hard to break old
> habits; but I would not criticize people for adopting it.

I would. The perceived benefits are more than offset by the non standard
"reading" of the code. In my opinion of course. I now expect the usual
sycophants to appear telling us how their productivity increased 30000%
when they adopted this notation ....

It's amusing that the people I know who use this "back to front" trend
are also some of the worst "team players" I have ever encountered and
tend to be jobsworth language lawyers rather than good, practical programmers
interested in contributing to a consistent and maintainable code base.

Richard Heathfield

unread,

Sep 12, 2008, 10:52:59 AM9/12/08

to

vipp...@gmail.com said:

> On Sep 12, 1:18 pm, Richard<rgr...@gmail.com> wrote:
>
> [replying to me]
>
>> Minus 3 for being too late on your attempt to get promoted into the
>> c.l.c "reg" upper echelon. But that atttempt combined with your
>> "indeeds", your "Mr heathfields" and various nauseating attempts at
>> belittling nOObs should ensure at least a cushion at RHs feet in the
>> near future.
>
> I'm tired of this.

Yes, it's tiresome. But he's an attention-seeker, like all trolls. If you
never, ever reply to him, maybe - just *may*be - he'll get bored and drift
away, and the average C understanding of the group will increase by a
small but significant amount.

> I don't give a crap what Heathfield thinks of me.

Neither should you. The best strategy is to give people every reason to
think highly of you whilst, at the same time, not worrying whether or not
they do.

> I'm here to learn C and help others do the same, not to socialize.

Right.

Richard

unread,

Sep 12, 2008, 10:51:57 AM9/12/08

to

Richard Heathfield <r...@see.sig.invalid> writes:

> vipp...@gmail.com said:
>
>> On Sep 12, 1:18 pm, Richard<rgr...@gmail.com> wrote:
>>
>> [replying to me]
>>
>>> Minus 3 for being too late on your attempt to get promoted into the
>>> c.l.c "reg" upper echelon. But that atttempt combined with your
>>> "indeeds", your "Mr heathfields" and various nauseating attempts at
>>> belittling nOObs should ensure at least a cushion at RHs feet in the
>>> near future.
>>
>> I'm tired of this.
>
> Yes, it's tiresome. But he's an attention-seeker, like all trolls. If you
> never, ever reply to him, maybe - just *may*be - he'll get bored and drift
> away, and the average C understanding of the group will increase by a
> small but significant amount.

Not if there is no one to challenge you and your humongous ego and
inflated head.

>
>> I don't give a crap what Heathfield thinks of me.
>
> Neither should you. The best strategy is to give people every reason to
> think highly of you whilst, at the same time, not worrying whether or not
> they do.

Vippstar's contributions to nOObs are invariably poorly worded or wrong
and then followed up with a "yes, I made a mistake there I meant to say
.....".

Only a complete fool would not have noticed his attempts to ingratiate
himself with you and your clique by parroting the "Mr heathfield" form
of address and the cringingly horrible use of "Indeed" to promote an
image of knowledge, seniority and peer acceptance.

>
>> I'm here to learn C and help others do the same, not to socialize.
>
> Right.

It's a shame that decency and politeness can not be mixed with "learning
and helping" in too many cases here. To be social costs nothing.

So basically, stick it where the sun does not shine you obnoxious
arse

Pilcrow

unread,

Sep 12, 2008, 11:00:13 AM9/12/08

to

On Fri, 12 Sep 2008 06:17:25 +0000, Richard Heathfield
<r...@see.sig.invalid> wrote:

>Pilcrow said:
>
>> On Thu, 11 Sep 2008 16:42:51 +0000, Richard Heathfield
>> <r...@see.sig.invalid> wrote:
>>
>>>Pilcrow said:
>>>
><snip>
>
>>>> I have a lot of trouble reading yours.
>>>
>>>You may well be the first person ever to say that. People have made all
>>>kinds of complaints about my code, but readability is not usually high on
>>>the hit-list.
>>>
>>
>> I apologize.
>
>I wish you wouldn't. You have every right to say what you said. I wasn't
>being "precious" about it - merely surprised! In fact, I'd be quite
>curious to know more about *why* you have a lot of trouble reading my
>code. Maybe there's something I can change to make it easier for you to
>read without making it more difficult for myself and others.

It's just a matter of 'accent'. Just as it's easiest for me to
understand someone who speaks with a regional accent similar to mine
(I'm a native of Brooklyn), I understand easiest a coding style similar
to mine. I really shouldn't have brought it up.

CBFalconer

unread,

Sep 12, 2008, 11:10:42 AM9/12/08

to

Richard Bos wrote:
> Pilcrow <Pilc...@gmail.com> wrote:

Try this:

/* ------- file tknsplit.c ----------*/
#include "tknsplit.h"

/* copy over the next tkn from an input string, after
skipping leading blanks (or other whitespace?). The
tkn is terminated by the first appearance of tknchar,
or by the end of the source string.

The caller must supply sufficient space in tkn to
receive any tkn, Otherwise tkns will be truncated.

Returns: a pointer past the terminating tknchar.

This will happily return an infinity of empty tkns if
called with src pointing to the end of a string. Tokens
will never include a copy of tknchar.

A better name would be "strtkn", except that is reserved
for the system namespace. Change to that at your risk.

released to Public Domain, by C.B. Falconer.
Published 2006-02-20. Attribution appreciated.
Revised 2006-06-13 2007-05-26 (name)
*/

const char *tknsplit(const char *src, /* Source of tkns */
                     char tknchar,    /* tkn delimiting char */
                     char *tkn,       /* receiver of parsed tkn */
                     size_t lgh)      /* length tkn can receive */
                                      /* not including final '\0' */
{
   if (src) {
      while (' ' == *src) src++;

      while (*src && (tknchar != *src)) {
         if (lgh) {
            *tkn++ = *src;
            --lgh;
         }
         src++;
      }
      if (*src && (tknchar == *src)) src++;
   }
   *tkn = '\0';
   return src;
} /* tknsplit */

#ifdef TESTING
#include <stdio.h>

#define ABRsize 6 /* length of acceptable tkn abbreviations */

/* ---------------- */

static void showtkn(int i, char *tok)
{
   putchar(i + '1'); putchar(':');
   puts(tok);
} /* showtkn */

/* ---------------- */

int main(void)
{
   char teststring[] = "This is a test, ,, abbrev, more";

   const char *t, *s = teststring;
   int i;
   char tkn[ABRsize + 1];

   puts(teststring);
   t = s;
   for (i = 0; i < 4; i++) {
      t = tknsplit(t, ',', tkn, ABRsize);
      showtkn(i, tkn);
   }

   puts("\nHow to detect 'no more tkns' while truncating");
   t = s; i = 0;
   while (*t) {
      t = tknsplit(t, ',', tkn, 3);
      showtkn(i, tkn);
      i++;
   }

   puts("\nUsing blanks as tkn delimiters");
   t = s; i = 0;
   while (*t) {
      t = tknsplit(t, ' ', tkn, ABRsize);
      showtkn(i, tkn);
      i++;
   }
   return 0;
} /* main */

#endif
/* ------- end file tknsplit.c ----------*/

/* ------- file tknsplit.h ----------*/
#ifndef H_tknsplit_h
# define H_tknsplit_h

# ifdef __cplusplus
extern "C" {
# endif

#include <stddef.h>

/* copy over the next tkn from an input string, after
skipping leading blanks (or other whitespace?). The
tkn is terminated by the first appearance of tknchar,
or by the end of the source string.

The caller must supply sufficient space in tkn to
receive any tkn, Otherwise tkns will be truncated.

Returns: a pointer past the terminating tknchar.

This will happily return an infinity of empty tkns if
called with src pointing to the end of a string. Tokens
will never include a copy of tknchar.

released to Public Domain, by C.B. Falconer.
Published 2006-02-20. Attribution appreciated.
revised 2007-05-26 (name)
*/

const char *tknsplit(const char *src, /* Source of tkns */
                     char tknchar,    /* tkn delimiting char */
                     char *tkn,       /* receiver of parsed tkn */
                     size_t lgh);     /* length tkn can receive */
                                      /* not including final '\0' */

# ifdef __cplusplus
}
# endif
#endif
/* ------- end file tknsplit.h ----------*/

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.

Anand Hariharan

unread,

Sep 13, 2008, 12:54:55 PM9/13/08

to

On Fri, 12 Sep 2008 11:30:35 +0000, James Kuyper <james...@verizon.net>
wrote:

> Richard wrote:
>> Richard Heathfield <r...@see.sig.invalid> writes:
> ...
>>> I wish you wouldn't. You have every right to say what you said. I
>>> wasn't being "precious" about it - merely surprised! In fact, I'd be
>>> quite curious to know more about *why* you have a lot of trouble
>>> reading my code. Maybe there's something I can change to make it
>>> easier for you to read without making it more difficult for myself and
>>> others.
>>
>> Speaking for myself and knowingly only picking on style things:
>>
>> I hate this move to putting braces on their own lines. Its a horrible
>> waste of vertical space and "3 levels" for one unit is not natural.
>
> Vertical space is not in short supply. Personally, I handle that issue
> the same way Richard Heathfield does. My reason is that it makes it
> easier to identify and move block statements around when there's a set
> of lines which is used for the block, and only that block, including the
> delimiting curly brackets.
>

I also put in the curly braces in the same column, just below the
starting column of the line prior to the opening brace. And I have held
this to be a style issue.

While I am not going to change my style anytime soon, a ex-coworker
pointed out one benefit to putting the opening brace at the end of the
line that starts the block:

Most editors have a "match-parens" feature. Should you be at the closing
curly brace, and use this feature to jump to the opening brace, most
editor usually scroll vertically upwards just enough to show the opening
brace. One usually has to scroll upwards a line or two more in order to
see what initiated this block. I admit until I was pointed this out, I
had not figured it was a big deal, but it has started to annoy me after I
realised this.

I am not sure if this 'tool issue' makes this any less of a style issue.

- Anand

arnuld

unread,

Sep 15, 2008, 12:21:15 AM9/15/08

to

> On Thu, 11 Sep 2008 13:36:30 -0700, Keith Thompson wrote:

> As you know, that definition is fine for your program; others might
> have different requirements.
>
> Incidentally, the phrase "letters, numbers, or anything" seems
> redundant. I think that a more precise rendering of what you meant
> would be:
>
> A "word" is a non-empty contiguous sequence of characters other
> than space, tab, or newline, preceded or followed either by a
> space, tab, or newline or by the start or end of the input.

> It would also be good to specify whether the input is a string, a line
> of text, or an entire text file.

> .....SNIP....

> It's pretty much clear what your definition of a word is. It's still
> not at all clear what a word is in general (and it can't be, since the
> term is used inconsistently).

I think std::string in C++ defines exactly what the *definition* of a word is.
Look at my code and see how std::string works, and perhaps we can settle on
some common and standard meaning of "word". I don't like to put C++ code in a C
group, but I don't think I have any other way to define what a word is:

/* A program that will ask user for input and then will print them
* in an alphabetical order
*
* VERSION 1.1
*
*/

#include <iostream>
#include <string>
#include <vector>
#include <algorithm>

void ask_input( std::vector<std::string>& );
void print_vector( const std::vector<std::string>& );

int main()
{

  std::vector<std::string> svec;

  ask_input( svec );
  std::sort( svec.begin(), svec.end() );
  std::cout << "--------------------------------"
            << std::endl;
  print_vector( svec );

  return 0;
}

void ask_input( std::vector<std::string>& strvec )
{
  std::string str;

  while( std::cin >> str )
    {
      strvec.push_back( str );
    }
}

void print_vector( const std::vector<std::string>& strvec )
{
  std::vector<std::string>::const_iterator iter = strvec.begin();

  for( ; iter != strvec.end(); ++iter )
    {
      std::cout << *iter << std::endl;
    }
}

=================== OUTPUT =========================
[arnuld@dune ztest]$ g++ -ansi -pedantic -Wall -Wextra sort-input.cpp
[arnuld@dune ztest]$ ./a.out
and saurabh sumi smit sumit and arnuld
--------------------------------
and
and
arnuld
saurabh
smit
sumi
sumit
[arnuld@dune ztest]$

You see, even if you enter a whole line as input, std::string will
automatically dissect it into separate words.

arnuld

unread,

Sep 15, 2008, 12:23:06 AM9/15/08

to

> On Fri, 12 Sep 2008 11:05:07 +0000, James Kuyper wrote:

> Could you identify the std::string feature that implements this? I
> couldn't find any use of the word "word" anywhere in section 21 of the
> C++ standard, which describes std::string.

I think we have to look at the source code of the std::string library and see
how it is implemented. I am sure it is done the C way, with arrays and
pointers ;)

Ian Collins

unread,

Sep 15, 2008, 12:42:47 AM9/15/08

to

arnuld wrote:
>
> I think std::string in C++ defines what exactly *definition* of word is.
> Look at my code and see how std::string works and perhaps we can settle on
> some common and standard meaning word.

Is "token" what you are looking for?

>
> you see even if you put a line as input, std::string will automatically
> dissect it into separate words.
>

No, it will not.

<OT> The input stream tokenises the input. The C++ standard defines how
formatted input is tokenised. </OT>

--
Ian Collins.

Richard Heathfield

unread,

Sep 15, 2008, 12:50:29 AM9/15/08

to

arnuld said:

>> On Thu, 11 Sep 2008 13:36:30 -0700, Keith Thompson wrote:
>

<snip>

>
>> It's pretty much clear what your definition of a word is. It's still
>> not at all clear what a word is in general (and it can't be, since the
>> term is used inconsistently).
>
>
> I think std::string in C++ defines what exactly *definition* of word is.

It defines what *its* definition is.

> Look at my code and see how std::string works and perhaps we can settle
> on some common and standard meaning word.

Even if we could, it would only be *our* definition, not a universal
definition.

Let me give you an example from ordinary English, where whitespace
delimiters are not sufficient:

"What did he say?", said Albert.
"He just said, 'I'll be there', I think", replied the captain.

Now, consider the whitespace-separated tokens:

"What
did
he
say?",
said
Albert.
"He
just
said,
'I'll
be
there',
I
think",
replied
the
captain.

Seventeen tokens there, but fewer than half of them are English words. The
rest are encumbered with some kind of punctuation. But do we really wish
to treat "said" and "said," differently? No, of course not. They are the
same word. So we need to strip punctuation, right?

Problem: design an algorithm for removing punctuation from arbitrary
English sentences, *without* removing punctuation that actually belongs to
the word (example: "will-o'-the-wisp" must retain its three hyphens and
its apostrophe).

As Knuth would say: [50]
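
A naive first cut, to show why the problem is hard and not to solve it: strip punctuation only from the two ends of a token and leave anything in the middle alone. The name trim_punct is an assumption made for this sketch.

```c
#include <ctype.h>
#include <string.h>

/* Removes leading and trailing punctuation from word, in place.
   Punctuation inside the word (hyphens, apostrophes) is kept. */
void trim_punct(char *word)
{
    size_t start = 0;
    size_t end = strlen(word);

    while (start < end && ispunct((unsigned char)word[start]))
        start++;
    while (end > start && ispunct((unsigned char)word[end - 1]))
        end--;

    /* slide the surviving characters to the front and terminate */
    memmove(word, word + start, end - start);
    word[end - start] = '\0';
}
```

This handles every token in the example above, and "will-o'-the-wisp" survives intact. It still gets other input wrong: a trailing apostrophe that belongs to the word, as in a plural possessive such as "dogs'", is stripped, and a token like "e.g." loses its final full stop. Deciding those cases needs knowledge of the language, not just of the characters.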

arnuld

unread,

Sep 15, 2008, 1:15:50 AM9/15/08

to

> On Mon, 15 Sep 2008 04:50:29 +0000, Richard Heathfield wrote:

> Even if we could, it would only be *our* definition, not a universal
> definition.
>
> Let me give you an example from ordinary English, where whitespace
> delimiters are not sufficient:

> ....SNIP....

> Problem: design an algorithm for removing punctuation from arbitrary
> English sentences, *without* removing punctuation that actually belongs to
> the word (example: "will-o'-the-wisp" must retain its three hyphens and
> its apostrophe).

That means we will also need a function containing all of the words with
intended hyphens and apostrophes, against which we will compare the input words.
That function will be used at run time and will hold millions of
words, so it will be very expensive to run. If the user wants to enter
comp.lang.3c as a word then that's his choice or stupidity. Let him do it
that way; why do we need to think about it?

> As Knuth would say: [50]

I don't know what that means .

Richard Heathfield

unread,

Sep 15, 2008, 1:39:12 AM9/15/08

to

arnuld said:

>> On Mon, 15 Sep 2008 04:50:29 +0000, Richard Heathfield wrote:
>
>> Even if we could, it would only be *our* definition, not a universal
>> definition.
>>
>> Let me give you an example from ordinary English, where whitespace
>> delimiters are not sufficient:
>
>> ....SNIP....
>
>> Problem: design an algorithm for removing punctuation from arbitrary
>> English sentences, *without* removing punctuation that actually belongs
>> to the word (example: "will-o'-the-wisp" must retain its three hyphens
>> and its apostrophe).
>
> That means, we will also have a function containing all of the words with
> intended hyphens and apostrophes to which we will compare the input
> words.

Not necessarily a function, but yes, we would need some kind of dictionary
- and even then, we wouldn't be done, because some French or German or
Spanish or Czech or Polish or Slovakian or Turkish geezer would come along
and say "you call those words? Those aren't words - THESE are words...",
and give you a whole new set of problems.

The lesson here is that there is no single answer that will satisfy
everyone.

>> As Knuth would say: [50]
>
> I don't know what that means .

<sigh> I know.

Keith Thompson

unread,

Sep 15, 2008, 1:48:39 AM9/15/08

to

arnuld <sun...@invalid.address> writes:
>> On Fri, 12 Sep 2008 11:05:07 +0000, James Kuyper wrote:
>> Could you identify the std::string feature that implements this? I
>> couldn't find any use of the word "word" anywhere in section 21 of the
>> C++ standard, which describes std::string.
>
>
> I think we have to look at the source code of std::string library and see
> how it is implemented. I am sure it is done using C way, arrays and
> pointers ;)

<OT>
No, std::string doesn't define what a "word" is. The overloaded ">>"
operator, declared in the <iostream> header, does that.
</OT>

But, to be blunt, so what? That's one possible definition of a
"word". If it works for your purposes, that's great. But there is
nothing unique or definitive about the way this particular C++
operator does things; there are many other possible definitions, most
of them equally valid.

If you want to discuss how to read a "word" as input from stdin, given
a particular definition of "word", that's fine. For any definition
you can state with sufficient clarity, it's possible to implement it
in C. (I expect RH to offer a counterexample shortly.) If you want
to discuss which definition of "word" (or "token", or whatever) is
correct, that's not a C question, nor is it really an answerable
question.

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Richard Heathfield

unread,

Sep 15, 2008, 2:43:37 AM9/15/08

to

Keith Thompson said:

<snip>

> If you want to discuss how to read a "word" as input from stdin, given
> a particular definition of "word", that's fine. For any definition
> you can state with sufficient clarity, it's possible to implement it
> in C. (I expect RH to offer a counterexample shortly.)

I don't know what kind of counterexample you're expecting. I would guess,
however, that any sufficiently clear explanation would probably be
insufficiently accurate for universal or even general use.

> If you want
> to discuss which definition of "word" (or "token", or whatever) is
> correct, that's not a C question, nor is it really an answerable
> question.

Isaac Asimov was once in a Q&A session and said something like: "I can
answer any question you ask me, provided you are prepared to accept 'I don't
know' as an answer."

Richard

unread,

Sep 15, 2008, 2:57:19 AM9/15/08

to

Richard Heathfield <r...@see.sig.invalid> writes:

I wonder how many people do?

But then 90% of SW Engineers never read Knuth or possibly they tried it
and found it impenetrable. Only in c.l.c is it recommended as a "great
way to learn programming". I still smile when I remember that thread.

So, basically, I don't know what that means either.

CBFalconer

unread,

Sep 15, 2008, 3:16:46 AM9/15/08

to

arnuld wrote:
>
... snip ...

>
> I think std::string in C++ defines what exactly *definition* of
> word is. Look at my code and see how std::string works and perhaps
> we can settle on some common and standard meaning word. I don't
> like to put C++ code in a C group and I think I don't have any
> choice to define what a word is:
>
> /* A program that will ask user for input and then will print
> * them in an alphabetical order
> *
> * VERSION 1.1
> */
>
> #include <iostream>
> #include <string>
> #include <vector>
> #include <algorithm>

Please restrict your C++ writings to comp.lang.c++. This is c.l.c
and they are off-topic here. C++ is a different language.

arnuld

unread,

Sep 15, 2008, 3:17:38 AM9/15/08

to

> On Sun, 14 Sep 2008 22:48:39 -0700, Keith Thompson wrote:

> ..SNIP...

> If you want
> to discuss which definition of "word" (or "token", or whatever) is
> correct, that's not a C question, nor is it really an answerable
> question.

Okay, that seems a good reply. Let's make it topical to C again, as I got a
little lost in the confusion. So *my* definition of word will be the
same one you gave earlier:

A "word" is a non-empty contiguous sequence of characters other
than space, tab, or newline, preceded or followed either by a
space, tab, or newline or by the start or end of the input.

Now your earlier question, quoted here:

> It would also be good to specify whether the input is a string, a line
> of text, or an entire text file.

In the current case, it is a "word" from the terminal, the word we just
defined.

So let's code it :)

Richard

unread,

Sep 15, 2008, 3:25:22 AM9/15/08

to

CBFalconer <cbfal...@yahoo.com> writes:

> arnuld wrote:
>>
> ... snip ...
>>
>> I think std::string in C++ defines what exactly *definition* of
>> word is. Look at my code and see how std::string works and perhaps
>> we can settle on some common and standard meaning word. I don't
>> like to put C++ code in a C group and I think I don't have any
>> choice to define what a word is:
>>
>> /* A program that will ask user for input and then will print
>> * them in an alphabetical order
>> *
>> * VERSION 1.1
>> */
>>
>> #include <iostream>
>> #include <string>
>> #include <vector>
>> #include <algorithm>
>
> Please restrict your C++ writings to comp.lang.c++. This is c.l.c
> and they are off-topic here. C++ is a different language.

Please get lost. This was on topic and in a discussion with how best to
approach certain issues. Only a complete idiot like you would try to do
that without considering already researched (partial) solutions.

arnuld

unread,

Sep 15, 2008, 4:12:34 AM9/15/08

to

Now since *my* definition of word is done. Here is the outline of the
program:

/* It will ask the user for input and will print the input
 * in alphabetical order when the user hits EOF (Ctrl-D on Linux)
 *
 * VERSION 1.0
 *
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void get_words( char** );
void sort_words( char** );
void print_words( char** );

int main( int argc, char* argv[] )
{
    char** pda;

    get_words( pda );

    sort_words( pda );

    print_words( pda );

    return EXIT_SUCCESS;
}

SOLUTION: pda is an array of pointers, where pointers are pointing to
different words input by the user ( which are in fact arrays of characters
terminated by null, which means they are string literals of C, which means
it is still inherently confusing to me )

So we have an array of arrays. When we want to sort the input, we will
just sort the pointers to the words, rather than sorting the arrays
themselves. That will be much more efficient and is an idea I learned from
K&R2 :) . We don't sort the string literals, we sort the pointers
pointing to them.

2nd, we don't have any idea how many words a user will enter, so we will
use dynamic memory allocation, which I am going to learn for the first time,
so please stop me if I go the wrong way ;) . 3rd, we do have an idea of the
maximum length of a word. Wikipedia says the longest English word is
189819 characters long, a chemical name for some sort of protein:

http://en.wikipedia.org/wiki/Longest_word_in_English

which I *assume* the user is not going to enter. I will limit the longest
words to what we call "Longest word in Shakespeare's work", which is 27
characters long, hence limiting the array size used to store the
words to 28.

Good idea?

arnuld

unread,

Sep 15, 2008, 4:18:42 AM9/15/08

to

> On Mon, 15 Sep 2008 09:25:22 +0200, Richard wrote:

> Please get lost. This was on topic and in a discussion with how best to
> approach certain issues. Only a complete idiot like you would try to do
> that without considering already researched (partial) solutions.

Though I really appreciate that you supported me, I think you are
disrespecting him by calling him an idiot. It's not what I thought of him
when he replied. Chuck said so because he respects clc like all of us.
Though he could have added something like "it is ok for this time but no
C++ next time. you know better", to his reply. If he did not add that it
does not mean anyone should disrespect him. He is looking for the
well-being of clc , like me and I understood his reasoning.

Only trolls should be disrespected by not replying to their posts ;)

Richard Heathfield

unread,

Sep 15, 2008, 5:28:36 AM9/15/08

to

arnuld said:

>
> Now since *my* definition of word is done.

Yes. Whitespace-delimited.

> Here is the outline of the program:

[...]

> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
>
> void get_words( char** );
> void sort_words( char** );
> void printf_words( char** );

I don't think this is going to cut it.

You want to sort many words (i.e. more than one), so you will need to store
them, assuming you want an in-memory sort. The natural way to store them
(or at least, the natural way for me to store them) is by allocating a
number of pointers to char (which can be reallocated if it proves to be
insufficiently long), each of which points to a word. The pointers to char
will be stored in a dynamic array, the base element of which will be
pointed to by a char **. For get_words to be able to do the allocation and
any necessary reallocations, you must be able to modify this char **
within the get_words function. For this change to "stick" in the caller,
you will need either to pass a pointer to the char **, or return the char
** from the function. Thus, you will need, at a minimum (at this point),
either this:

char **get_words(void);

or this:

void get_words(char ***);

But you need to know how many words you've captured! So you must either
return the count or pass a pointer to an integer object in which to store
the count. So you'll need either this:

char **get_words(size_t *);

or this:

size_t get_words(char ***);

But what if something goes wrong? You'll need to be able to report an
error. The natural way to do this is via a return value, which means we
can't use that value for either the list or the count, and that leads us
to:

int get_words(char ***, size_t *);

Since they don't need to modify the caller's status, sort_words and
print_words can be of type int(char **, size_t).
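
A minimal sketch of that last interface, under some assumptions that are not from the thread: the name get_words_from and its extra FILE * parameter are hypothetical (they only make the sketch usable on streams other than stdin), and the fixed 64-byte buffer is a simplification, so a longer word is split into pieces. The return values 0 and -1 are likewise just one possible convention.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant of int get_words(char ***, size_t *): reads
 * whitespace-delimited words from `in`. On success it stores the list and
 * the count through the two pointers and returns 0. On allocation failure
 * it frees everything, leaves the caller's objects alone and returns -1. */
int get_words_from(FILE *in, char ***words, size_t *count)
{
    char buf[64];              /* assumed limit, only for this sketch */
    char **list = NULL;
    size_t used = 0, cap = 0;

    assert(in != NULL && words != NULL && count != NULL);

    while (fscanf(in, "%63s", buf) == 1) {
        if (used == cap) {     /* pointer array is full: grow it */
            size_t new_cap = cap ? cap * 2 : 8;
            char **tmp = realloc(list, new_cap * sizeof *tmp);
            if (tmp == NULL)
                goto fail;
            list = tmp;
            cap = new_cap;
        }
        list[used] = malloc(strlen(buf) + 1);
        if (list[used] == NULL)
            goto fail;
        strcpy(list[used], buf);
        used++;
    }
    *words = list;             /* this is why a char *** is needed */
    *count = used;
    return 0;

fail:
    while (used > 0)
        free(list[--used]);
    free(list);
    return -1;
}

/* qsort callback: each element is a char *, so we receive char ** here. */
static int compare_words(const void *a, const void *b)
{
    const char *const *pa = a;
    const char *const *pb = b;
    return strcmp(*pa, *pb);
}

int sort_words(char **words, size_t count)
{
    if (count > 1)
        qsort(words, count, sizeof *words, compare_words);
    return 0;
}

int print_words(char **words, size_t count)
{
    size_t i;
    for (i = 0; i < count; i++)
        if (puts(words[i]) == EOF)
            return -1;
    return 0;
}
```

A caller would declare char **list = NULL; size_t n = 0; call get_words_from(stdin, &list, &n), then pass list and n by value to sort_words and print_words, and finally free each word and the list itself.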

<snip>

> SOLUTION: pda is an array of pointers, where pointers are pointing to
> different words input by the user ( which are in fact arrays of
> characters terminated by null, which means they are string literals of C,
> which means it is still inherently confusing to me )

Not string literals - just strings.

<snip>

> 3rd, we do have an idea on
> the maximum length of the word.

The longest one you find. That's why you use dynamic allocation - it means
you can fit the longest word you find without having to worry about
wasting space catering for longer words.

<snip>

> I will limit the longest
> words to what we call "Longest word in Shakespeare's work", which is 27
> characters long, hence limiting the array size to be used to store the
> words to 28.
>
> good idea ?

Up to you, but I wouldn't bother setting a limit (or, if I did, I'd set it
at a million or so, and treat any string longer than that as a reportable
error). With dynamic allocation, you don't /need/ to set a limit; you
simply allocate as you go, and reallocate if necessary.
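
One way the "allocate as you go" idea might look for a single word is sketched here. The function name read_word, the FILE * parameter, the starting capacity of 16 and the doubling policy are all assumptions made for illustration. It also uses isspace(), which treats a few more characters as delimiters (form feed, vertical tab, carriage return) than the space/tab/newline definition agreed on earlier.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads one whitespace-delimited word from `in` into
 * freshly allocated storage. Returns NULL at end of input or if memory runs
 * out; otherwise the caller owns the result and must free() it. */
char *read_word(FILE *in)
{
    size_t len = 0, cap = 0;
    char *word = NULL;
    int ch;

    assert(in != NULL);

    /* Skip any whitespace in front of the word. */
    while ((ch = getc(in)) != EOF && isspace(ch))
        continue;
    if (ch == EOF)
        return NULL;

    do {
        if (len + 1 >= cap) {          /* need room for ch and the '\0' */
            size_t new_cap = cap ? cap * 2 : 16;
            char *tmp = realloc(word, new_cap);
            if (tmp == NULL) {
                free(word);
                return NULL;
            }
            word = tmp;
            cap = new_cap;
        }
        word[len++] = (char)ch;
    } while ((ch = getc(in)) != EOF && !isspace(ch));

    word[len] = '\0';
    return word;
}
```

No length limit is built in, so a caller who wants the "million or so" cap would have to add a check on len inside the loop.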

arnuld

unread,

Sep 15, 2008, 5:43:54 AM9/15/08

to

> On Mon, 15 Sep 2008 09:28:36 +0000, Richard Heathfield wrote:

> I don't think this is going to cut it.

> ..SNIP...

> But what if something goes wrong? You'll need to be able to report an
> error. The natural way to do this is via a return value, which means we
> can't use that value for either the list or the count, and that leads us
> to:
>
> int get_words(char ***, size_t *);

why *** , 3 levels of indirection ? when we pass an array of characters
as an argument to a function, it becomes a pointer, single * . Hence when
we will pass an array of pointers, it will become **.

> Not string literals - just strings.

Aren't string literal, string and string constant 3 names for a single
thing?

> Up to you, but I wouldn't bother setting a limit (or, if I did, I'd set
> it at a million or so, and treat any string longer than that as a
> reportable error). With dynamic allocation, you don't /need/ to set a
> limit; you simply allocate as you go, and reallocate if necessary.

so you want to dynamically allocate both the single word and the array of
words.

Richard Heathfield

unread,

Sep 15, 2008, 5:58:29 AM9/15/08

to

arnuld said:

>> On Mon, 15 Sep 2008 09:28:36 +0000, Richard Heathfield wrote:
>
>> I don't think this is going to cut it.
>
>> ..SNIP...
>
>> But what if something goes wrong? You'll need to be able to report an
>> error. The natural way to do this is via a return value, which means we
>> can't use that value for either the list or the count, and that leads us
>> to:
>>
>> int get_words(char ***, size_t *);
>
> why *** , 3 levels of indirection ? when we pass an array of characters
> as an argument to a function, it becomes a pointer, single * . Hence
> when we will pass an array of pointers, it will become **.

Yes, but you're not passing an array of pointers. You're trying to pass a
pointer to a pointer to char - which is fine, but it means that any
changes made to the pointer value within the function (and there *will* be
changes) will be local to that function. That isn't what you want.

>> Not string literals - just strings.
>
> string literal, string and string constant aren't 3 names for a single
> thing ?

Two of them are two names for a single thing. Although "string literal" is
the formal term for a string literal, people will know what you mean if
you say "string constant". But consider this:

char foo[3];
foo[0] = 'H';
foo[1] = 'i';
foo[2] = '\0';

foo now contains a string, but no string literals are involved.
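
The same distinction as a small sketch; the function names build_hi and literal_hi are made up for illustration. The first fills a caller-supplied array one element at a time, so the result is a string that the caller may modify. The second hands back a pointer to a string literal's storage, which must not be modified.

```c
#include <assert.h>
#include <string.h>

/* Builds the string "Hi" in the caller's array, element by element.
 * The result is a string, and the caller is free to modify it. */
void build_hi(char out[3])
{
    assert(out != NULL);
    out[0] = 'H';
    out[1] = 'i';
    out[2] = '\0';
}

/* Returns a pointer to the storage of the string literal "Hi".
 * Writing through this pointer would be undefined behaviour, hence const. */
const char *literal_hi(void)
{
    return "Hi";
}

/* Both are strings, so the ordinary string functions treat them alike. */
int same_text(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```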

>> Up to you, but I wouldn't bother setting a limit (or, if I did, I'd set
>> it at a million or so, and treat any string longer than that as a
>> reportable error). With dynamic allocation, you don't /need/ to set a
>> limit; you simply allocate as you go, and reallocate if necessary.
>
>
> so you want to dynamically allocate both the single word and the array of
> words.

Yes. I think that's the best approach.

arnuld

unread,

Sep 15, 2008, 7:23:40 AM9/15/08

to

> On Mon, 15 Sep 2008 09:58:29 +0000, Richard Heathfield wrote:

>> arnuld said:
>> why *** , 3 levels of indirection ? when we pass an array of characters
>> as an argument to a function, it becomes a pointer, single * . Hence
>> when we will pass an array of pointers, it will become **.

> Yes, but you're not passing an array of pointers. You're trying to pass a
> pointer to a pointer to char - which is fine, but it means that any
> changes made to the pointer value within the function (and there *will* be
> changes) will be local to that function. That isn't what you want.

I don't see how that can be true. You can never pass an array by value;
arrays are *always* passed by reference. It means when I
pass the name of an array of characters to a function as an argument, then
any changes made to the array will be made to the original array because
when you pass an array to a function as an argument, it will be changed to
a pointer to its first element:

char arrc[3] = { 'a', 'z', '\0'};
char* pc;

pc = arrc;

some_function( arrc );
some_function( pc );

both calls are same, right ?

Now when we will pass an array of pointers to some function, then it will
be converted as pointer to its first element ( which in fact is already a
pointer) hence it will be passed as pointer to pointer to char and with
that we can modify the original elements:

#include <stdio.h>
#include <ctype.h>

enum { ARRSIZE = 2 };

void edit_first_element_arrp( char** ppc );

int main( void )
{
    char* p1;
    char* p2;
    char** p_arrp;

    char* arrp[ARRSIZE] = { 0 };

    p1 = p2 = NULL;

    arrp[0] = p1;
    arrp[1] = p2;

    p_arrp = arrp;

    edit_first_element_arrp( p_arrp );

    /* pointer has moved, so take it to the original position */
    p_arrp = arrp;

    printf("arrp[0] = %c\n", **p_arrp++);
    printf("arrp[1] = %c\n", **p_arrp);

    return 0;
}

void edit_first_element_arrp( char** ppc )
{
    int idx;

    for( idx = 0; idx != ARRSIZE; ++idx )
    {
        if( ! (idx) )
        {
            **ppc++ = 'Z';
        }
    }
}

Hence we can change the values that p1 and p2 point to. But this program
segfaults :(

> Two of them are two names for a single thing. Although "string literal"
> is the formal term for a string literal, people will know what you mean
> if you say "string constant". But consider this:
>
> char foo[3];
> foo[0] = 'H';
> foo[1] = 'i';
> foo[2] = '\0';
>
> foo now contains a string, but no string literals are involved.

So what is a string literal ?

> Yes. I think that's the best approach.

Okay, first I will try to test the dynamic version of a get_single_word
function, which will just make a single word out of some input characters.

James Kuyper

unread,

Sep 15, 2008, 7:23:08 AM9/15/08

to

arnuld wrote:
...

> I think std::string in C++ defines what exactly *definition* of word is.

Where?

> Look at my code and see how std::string works and perhaps we can settle on
> some common and standard meaning word.

The C++ standard does not provide a name to describe what it is that
operator>> extracts into a string; but the most generally used term for
that kind of thing is "token", not "word".

The std::string operator>> overload reads in delimited tokens. By
default, the set of delimiters is the set of characters that are
considered to be spacing characters under the currently imbued locale.
This default can be overridden.

James Kuyper

unread,

Sep 15, 2008, 7:36:37 AM9/15/08

to

CBFalconer wrote:
> arnuld wrote:
> ... snip ...
>> I think std::string in C++ defines what exactly *definition* of
>> word is. Look at my code and see how std::string works and perhaps
>> we can settle on some common and standard meaning word. I don't
>> like to put C++ code in a C group and I think I don't have any
>> choice to define what a word is:
>>
>> /* A program that will ask user for input and then will print
>> * them in an alphabetical order
>> *
>> * VERSION 1.1
>> */
>>
>> #include <iostream>
>> #include <string>
>> #include <vector>
>> #include <algorithm>
>
> Please restrict your C++ writings to comp.lang.c++. This is c.l.c
> and they are off-topic here. C++ is a different language.

The only way he knows how to clearly describe what he wants his code to
do is by providing a C++ example; this has been made abundantly clear by
his failed attempts to clearly describe it in English. However, the code
he wants to write should be in C.

If he were to post this same question to comp.lang.c++, and there were a
C++BFalconer on comp.lang.c++, C++BFalconer would certainly respond by
saying that this C question is off-topic in comp.lang.c++. Should arnuld
then simply remain silent about his question?

James Kuyper

unread,

Sep 15, 2008, 7:39:21 AM9/15/08

to

arnuld wrote:
>> On Fri, 12 Sep 2008 11:05:07 +0000, James Kuyper wrote:
>
>> Could you identify the std::string feature that implements this? I
>> couldn't find any use of the word "word" anywhere in section 21 of the
>> C++ standard, which describes std::string.
>
>
> I think we have to look at the source code of std::string library and see
> how it is implemented. I am sure it is done using C way, arrays and
> pointers ;)

No, that will only tell you what std::string actually does. It will not
tell you what the meaning of the word "word" is. For that, you have to
search the relevant documentation, the C++ standard - and that
documentation never uses the word "word" to describe what std::string does.

Ben Bacarisse

unread,

Sep 15, 2008, 8:15:23 AM9/15/08

to

arnuld <sun...@invalid.address> writes:

>> On Mon, 15 Sep 2008 09:58:29 +0000, Richard Heathfield wrote:
>
>>> arnuld said:
>>> why *** , 3 levels of indirection ? when we pass an array of characters
>>> as an argument to a function, it becomes a pointer, single * . Hence
>>> when we will pass an array of pointers, it will become **.
>
>> Yes, but you're not passing an array of pointers. You're trying to pass a
>> pointer to a pointer to char - which is fine, but it means that any
>> changes made to the pointer value within the function (and there *will* be
>> changes) will be local to that function. That isn't what you want.
>
> I don't get it to be true. You can never pass an array as value,
> arrays are *always* passed by reference. It means when I
> pass the name of an array of characters to a function as an argument, then
> any changes made to the array will be made to the original array because
> when you pass an array to a function as an argument, it will be changed to
> a pointer to its first element:

Richard is talking about the pointer to the whole array. A function
that takes: void get_word(char **words); can change words[32] to point
to some new string just found. It can change words[32][0] to be 'x',
but it can't change words itself. Well, it can, but the effect will
be lost when the function returns.

The most important change you need to make is that you will have to
realloc the space for the char * array. This is, of course, a char
**, but if the function is to change a char ** that is outside and
passed in, that parameter must be a char ***.

> char arrc[3] = { 'a', 'z', '\0'};
> char* pc;
>
> pc = arrc;
>
> some_function( arrc );
> some_function( pc );
>
> both calls are same, right ?

Yes, but some_function can't make pc point to a bigger array if
needed. pc will point to the same place after the call.

> Now when we will pass an array of pointers to some function, then it will
> be converted as pointer to its first element ( which in fact is already a
> pointer) hence it will be passed as pointer to pointer to char and with
> that we can modify the original elements:
>
>
> #include <stdio.h>
> #include <ctype.h>
>
> enum { ARRSIZE = 2 };
>
> void edit_first_element_arrp( char** ppc );
>
>
> int main( void )
> {
> char* p1;
> char* p2;
> char** p_arrp;
>
>
> char* arrp[ARRSIZE] = { 0 };
>
> p1 = p2 = NULL;
>
> arrp[0] = p1;
> arrp[1] = p2;

All these last three lines make no changes. Both elements of arrp are
already NULL.

> p_arrp = arrp;
>
> edit_first_element_arrp( p_arrp );
>
> /* pointer has moved, so take it to the original position */
> p_arrp = arrp;

No. p_arrp can't be changed by the call. This is a key thing about C
and applies to all types:

void f(int x);
...
int x = 42;
f(x);

x is guaranteed to be unchanged here. The same applies if x is a
pointer or a pointer to a pointer or a pointer to a pointer to a
pointer or...

> printf("arrp[0] = %c\n", **p_arrp++);
> printf("arrp[1] = %c\n", **p_arrp);
>
> return 0;
> }
>
>
>
> void edit_first_element_arrp( char** ppc )
> {
> int idx;
>
> for( idx = 0; idx != ARRSIZE; ++idx )
> {
> if( ! (idx) )
> {
> **ppc++ = 'Z';

*ppc is NULL -- you it to be NULL before the call. You can write any
value into **ppc.

> }
> }
> }
>
>
> Hence we can change the values of p1 and p2 pointing to. but this function
> Segfaults :(

See above.

>> Two of them are two names for a single thing. Although "string literal"
>> is the formal term for a string literal, people will know what you mean
>> if you say "string constant". But consider this:
>>
>> char foo[3];
>> foo[0] = 'H';
>> foo[1] = 'i';
>> foo[2] = '\0';
>>
>> foo now contains a string, but no string literals are involved.
>
> So what is a string literal ?

It is a sequence of characters (and escaped characters) between ""s.
I.e. it is there, literally, in your program's text.

--
Ben.

arnuld

unread,

Sep 15, 2008, 8:36:51 AM9/15/08

to

> On Mon, 15 Sep 2008 13:15:23 +0100, Ben Bacarisse wrote:

>> arnuld <sun...@invalid.address> writes:

> Richard is talking about the pointer to the whole array.

Pointer to the whole array? char* is a pointer to char, int** is a
pointer to pointer to int. How do you get a pointer to an array? I mean, what
type is it?

> A function
> that takes: void get_word(char **words); can change words[32] to point
> to some new string just found.

Right. And words++ will take us to the 2nd element of the array.

> It can change words[32][0] to be 'x',
> but it can't change words itself. Well, it can, but the effect will
> be lost when the function returns.

Now here is the problem where my understanding about pointers and arrays
blows away:

get_word( char* words[3] )

so we can change where words[0], [1] and [2] point because array will be
converted to pointer to first element and pointer *always* changes the
original element.

>> both calls are same, right ?

> Yes, but some_function can't make pc point to a bigger array if needed.
> pc will point to the same place after the call.

yes, it means I can understand arrays and pointers :)

>> arrp[0] = p1;
>> arrp[1] = p2;

> All these last three lines make no changes. Both elements of arrp are
> already NULL.

There is a difference. At first the array had NULL elements. Now the array has
pointers which point to NULL.

> No. p_arrp can't be changed by the call. This is a key thing about C and
> applies to all types:
>
> void f(int x);
> ...
> int x = 42;
> f(x);
>
> x is guaranteed to be unchanged here. The same applies if x is a
> pointer or a pointer to a pointer or a pointer to a pointer to a pointer
> or...

That I know, x is a variable in the example and variables are passed as
value. Pointers and arrays are passed as references, hence we can change
the original elements.

>> if( ! (idx) )
>> {
>> **ppc++ = 'Z';

> *ppc is NULL -- you it to be NULL before the call. You can write any
> value into **ppc.

Then why does that value not appear?

> It is a sequence of characters (and escaped characters) between ""s. I.e.
> it is there, literally, in your program's text.

I got it. What we pass to printf() is a string literal.

Andrew Poelstra

unread,

Sep 15, 2008, 9:11:58 AM9/15/08

to

On 2008-09-15, arnuld <sun...@invalid.address> wrote:
>> On Mon, 15 Sep 2008 04:50:29 +0000, Richard Heathfield wrote:
>
>> As Knuth would say: [50]
>
> I don't know what that means .
>

In the series /The Art of Computer Programming/ by Donald
Knuth, which is probably the greatest book on mathematical
computing ever written, problems are given at the end of
each section with a numerical code indicating their difficulty.

A code of [01], for example, you should be able to answer
in your head without pausing. A code of [50] means that,
if you solve the problem, you will have been the first
in the history of mathematics to do so.

The point is that I highly recommend you pick up a copy of
at least the first three volumes of this work, and when
you are able, read through them all.

--
Andrew Poelstra apoe...@wpsoftware.com
To email me, use the above email address with .com set to .net

Ben Bacarisse

unread,

Sep 15, 2008, 9:43:36 AM9/15/08

to

arnuld <sun...@invalid.address> writes:

>> On Mon, 15 Sep 2008 13:15:23 +0100, Ben Bacarisse wrote:
>
>>> arnuld <sun...@invalid.address> writes:
>
>> Richard is talking about the pointer to the whole array.
>
> pointer to the whole array ? char* is a pointer to char, int** is a
> pointer to pointer to int. How you get pointer to array, I mean what type
> it is?

I was being a bit vague. Let's leave actual array pointers out of
this. I mean that Richard was talking about changing the char ** as
seen from the calling function. The thing you are intending to pass,
a char **, is in some sense a pointer to the whole array: from it all
of the array's data is accessible. The trouble is you can't
change this char ** inside the function -- not in a way that has any
effect outside. All you can do is change the various things it points
to.

If a function needs to change an int, you pass an int *. If it needs
to change an int *, you pass an int **. If it needs to change an int **
you must pass an int ***.
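
A small sketch of that rule; all four function names are invented for illustration. Each setter receives the address of the thing it has to change, while no_effect only receives a copy of a pointer value, so its assignment is invisible to the caller.

```c
#include <assert.h>
#include <stddef.h>

/* To change an int, take an int *. */
void set_int(int *p, int value)
{
    assert(p != NULL);
    *p = value;
}

/* To change an int *, take an int **. */
void set_ptr(int **pp, int *target)
{
    assert(pp != NULL);
    *pp = target;
}

/* To change an int **, take an int ***. */
void set_ptr_ptr(int ***ppp, int **target)
{
    assert(ppp != NULL);
    *ppp = target;
}

/* Only the local copy `p` is changed; the caller's pointer is untouched. */
void no_effect(int *p, int *target)
{
    p = target;
    (void)p;    /* silence "set but not used" warnings */
}
```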

>> A function
>> that takes: void get_word(char **words); can change words[32] to point
>> to some new string just found.
>
> Right. And words++ will take us to the 2nd element of the array.

Right. With no visible effect outside. Just as:

void f(int x)
{
    x++; /* changes x but has no effect on anything passed */
}

>> It can change words[32][0] to be 'x',
>> but it can't change words itself. Well, it can, but the effect will
>> be lost when the function returns.
>
> Now here is the problem where my understanding about pointers and arrays
> blows away:
>
> get_word( char* words[3] )

(First, the declaration is confusing because the 3 has no effect.
Pretend you wrote get_word(char **words);).

> so we can change where words[0], [1] and [2] point because array will be
> converted to pointer to first element and pointer *always* changes the
> original element.

Absolutely. Now, having set words[0], words[1] and words[2] what
happens when you need to set words[3]? You can't. You need to
realloc some more space (always assuming that this is how the function
is supposed to work). That means changing words:

    char **new_space = realloc(words, new_size * sizeof *new_space);
    if (new_space) {
        /* set up new space with all the right pointers in it... */
        words = new_space;
    }

Now what? words has more space and you can set words[3], but the
calling function will never see it. The calling function will still
have the old value that was passed (we can't even say what it is
called since it is just a pointer value) and, worse, that pointer now
points to storage invalidated by the realloc call.
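
For contrast, here is a sketch of a version where the caller does see the new storage, because the function is given the address of the caller's char **. The name grow_words, its parameters and the 0 / -1 return convention are assumptions for illustration only.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: grows the caller's array of char * from old_count
 * to new_count slots. Returns 0 on success. On failure it returns -1 and
 * the caller's array is left exactly as it was. */
int grow_words(char ***words, size_t old_count, size_t new_count)
{
    char **new_space;
    size_t i;

    assert(words != NULL);
    assert(new_count > 0 && new_count >= old_count);

    new_space = realloc(*words, new_count * sizeof *new_space);
    if (new_space == NULL)
        return -1;             /* *words is still valid and unchanged */

    for (i = old_count; i < new_count; i++)
        new_space[i] = NULL;   /* new slots start out empty */

    *words = new_space;        /* the caller's pointer is updated here */
    return 0;
}
```

The caller writes grow_words(&list, n, n + 1) and afterwards uses list as usual; any old copy of the pointer value it kept elsewhere must be treated as invalid.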

>>> both calls are same, right ?
>
>> Yes, but some_function can't make pc point to a bigger array if needed.
>> pc will point to the same place after the call.
>
> yes, it means I can understand arrays and pointers :)
>
>
>
>
>>> arrp[0] = p1;
>>> arrp[1] = p2;
>
>> All these last three lines make no changes. Both elements of arrp are
>> already NULL.
>
> There is difference. First array had NULL elements. Now arrays has
> pointers which point to NULL. There is a difference.

No. I don't know how to explain this because I can't see the source
of your confusion. Writing:

char* arrp[ARRSIZE] = { 0 };

p1 = p2 = NULL;

arrp[0] = p1;
arrp[1] = p2;

as you did, is just like writing:

int arr[ARRSIZE] = { 42, 42 };

i1 = i2 = 42;
arr[0] = i1;
arr[1] = i2;

All the elements were 42 to start with and they are 42 after the
assignments. All I did was change the type. Everything is an int
rather than a char *.

>> No. p_arrp can't be changed by the call. This is a key thing about C and
>> applies to all types:
>>
>> void f(int x);
>> ...
>> int x = 42;
>> f(x);
>>
>> x is guaranteed to be unchanged here. The same applies if x is a
>> pointer or a pointer to a pointer or a pointer to a pointer to a pointer
>> or...
>
>
> That I know, x is a variable in the example and variables are passed as
> value. Pointers and arrays are passed as references, hence we can change
> the original elements.

Excellent! It works the same with a pointer -- provided you think
about the value of the pointer itself.

>>> if( ! (idx) )
>>> {
>>> **ppc++ = 'Z';
>
>> *ppc is NULL -- you it to be NULL before the call. You can write any
>> value into **ppc.
>
>
> then why that values does not appear ?

Typo! I meant you *can't* write any value into **ppc! Sorry. There
are two typos, I now see. It should have read: "*ppc is NULL -- you
set it to be NULL before the call. You can't write any value into
**ppc."
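
A sketch of how the function could work once the elements point at real, writable storage. This is only one possible repair, not necessarily what was intended: the loop is dropped because only the first element is touched, and the assert documents the requirement that ppc[0] must not be NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Writes 'Z' over the first character of the string that the first array
 * element points to. The caller must make ppc[0] point at writable storage
 * (an array or malloc'd memory, not NULL and not a string literal). */
void edit_first_element_arrp(char **ppc)
{
    assert(ppc != NULL && ppc[0] != NULL);
    ppc[0][0] = 'Z';
}
```

In main, something like char s1[] = "ab"; char s2[] = "cd"; char *arrp[2] = { s1, s2 }; gives each element somewhere valid to point before the call, and no "reset" of p_arrp is needed afterwards because the call cannot move it.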

--
Ben.

James Kuyper

unread,

Sep 15, 2008, 10:29:11 AM9/15/08

to

arnuld wrote:
...

I won't address most of your questions, because I'm short of time and
the answers are complicated; I'll let Richard or Ben take care of that.
I'll just address one thing where the answer is simple:

>> It is a sequence of characters (and escaped characters) between ""s. I.e.
>> it is there, literally, in your program's text.
>
> I got it. What we pass to printf() is a string literal.

The format string passed to printf is often a string literal; the other
arguments can be string literals, but often aren't. However, it's quite
feasible to call printf() without using any string literals.

The following code is simplified for purposes of exposition by failing to
check for the validity, or even the presence, of command line
arguments in any way. This is NOT recommended.

#include <stdio.h>
int main(int argc, char *argv[])
{
    printf(argv[1], argv[2]);
    return 0;
}

What is passed to printf in that case is two pointers to char. No string
literals are involved in any way.

Keith Thompson

unread,

Sep 15, 2008, 11:57:10 AM9/15/08

to

arnuld <sun...@invalid.address> writes:
>> On Mon, 15 Sep 2008 09:25:22 +0200, Richard wrote:
>> Please get lost. This was on topic and in a discussion with how best to
>> approach certain issues. Only a complete idiot like you would try to do
>> that without considering already researched (partial) solutions.
>
> Though I really appreciate that you supported me, I think you are
> disrespecting him by calling him an idiot. It's not what I thought of him
> when he replied. Chuck said so because he respects clc like all of us.
> Though he could have added something like "it is ok for this time but no
> C++ next time. you know better", to his reply. If he did not add that it
> does not mean anyone should disrespect him. He is looking for the
> well-being of clc , like me and I understood his reasoning.
>
> Only trolls should be disrespected by not replying to their posts ;)

Richard no-last-name has made a hobby out of insulting Chuck Falconer
at every opportunity, even dragging his name into discussions in which
Chuck has not participated.

CBFalconer

Sep 15, 2008, 3:30:58 PM

Keith Thompson wrote:
> arnuld <sun...@invalid.address> writes:
>
... snip ...

>
>> Only trolls should be disrespected by not replying to their posts ;)
>
> Richard no-last-name has made a hobby out of insulting Chuck Falconer
> at every opportunity, even dragging his name into discussions in which
> Chuck has not participated.

It is totally pointless, since I have Richard the un-named PLONKed,
and I never see his silly diatribes.

CBFalconer

Sep 15, 2008, 3:35:44 PM

James Kuyper wrote:
>
... snip ...

>
> The only way he knows how to clearly describe what he wants his
> code to do is by providing a C++ example; this has been made
> abundantly clear by his failed attempts to clearly describe it in
> English. However, the code he wants to write should be in C.
>
> If he were to post this same question to comp.lang.c++, and there
> were a C++BFalconer on comp.lang.c++, C++BFalconer would certainly
> respond by saying that this C question is off-topic in
> comp.lang.c++. Should arnuld then simply remain silent about his
> question?

I disagree. If he wants to use a C++ algorithm as illustration he
should translate that algorithm to C. In fact, a good example
would be a lexer for a C compiler.

CBFalconer

Sep 15, 2008, 3:46:22 PM

arnuld wrote:
>
... snip ...
>

> okay, that seems a good reply. I mean, we make it topical to C
> again as I lost in the confusion a little. so *my* definition of
> word will be the same one yo told earlier:
>
> A "word" is a non-empty contiguous sequence of characters other
> than space, tab, or newline, preceded or followed either by a
> space, tab, or newline or by the start or end of the input.

So any sequence of control chars, such as '\16', '\17' can go in a
word? Just illustrating the difficulties. For example,
identifiers in C start with 'a'..'z', 'A'..'Z', '_', and continue
with the same plus '0'..'9'. This assumes (not valid for C) that
'a'..'z' are contiguous, as are 'A'..'Z'. When the word has been
parsed it has to be checked against a (limited) list of reserved
words.

james...@verizon.net

Sep 15, 2008, 3:48:30 PM

CBFalconer wrote:
> James Kuyper wrote:
> >
> ... snip ...
> >
> > The only way he knows how to clearly describe what he wants his
> > code to do is by providing a C++ example; this has been made
> > abundantly clear by his failed attempts to clearly describe it in
> > English. However, the code he wants to write should be in C.
> >
> > If he were to post this same question to comp.lang.c++, and there
> > were a C++BFalconer on comp.lang.c++, C++BFalconer would certainly
> > respond by saying that this C question is off-topic in
> > comp.lang.c++. Should arnuld then simply remain silent about his
> > question?
>
> I disagree. If he wants to use a C++ algorithm as illustration he
> should translate that algorithm to C. In fact, a good example
> would be a lexer for a C compiler.

His question was basically about how to translate the C++ algorithm to
C. So what you're saying is that he must answer his own question
before he can ask it here? I'm curious, where do you think he should
go to get help with the translation, since you've ruled out coming
here for help with it; and C++BFalconer would presumably rule out
going to clc++ for such a question? And when he finally does ask it,
according to you, his question is required to take the form "How do I
translate this algorithm {algorithm already translated into C}, into
C?". That's patently ridiculous.

Keith Thompson

Sep 15, 2008, 3:54:42 PM

CBFalconer <cbfal...@yahoo.com> writes:
> arnuld wrote:
> ... snip ...
>>
>> okay, that seems a good reply. I mean, we make it topical to C
>> again as I lost in the confusion a little. so *my* definition of
>> word will be the same one yo told earlier:
>>
>> A "word" is a non-empty contiguous sequence of characters other
>> than space, tab, or newline, preceded or followed either by a
>> space, tab, or newline or by the start or end of the input.
>
> So any sequence of control chars, such as '\16', '\17' can go in a
> word? Just illustrating the difficulties. For examples
> identifiers in C start with 'a'..'z', 'A'..'Z', '_', and continue
> with the same plus '0'..'9'. This assumes (not valid for C) that
> 'a'..'z' are contiguous, as are 'A'..'Z'. When the word has been
> parsed it has to be checked against a (limited) list of reserved
> words.

Yes, given the definition above, this string:

"\16 \17"

contains two "words". Are you suggesting that that's a problem?

Obviously a program that's intended to recognize C identifiers would
have to use a different rule. But the OP didn't say anything about C
identifiers, so I'm not sure why you're bringing them up.

Incidentally, on my initial reading of your followup, I thought your
use of the word "contiguous" was meant to be related to the use in
arnuld's definition of "word" (the one I had suggested earlier). In
fact, they're quite different; in the definition of "word" it refers
to the characters being adjacent in the input, not to their numeric
representations. A more careful reading of what you wrote indicates
that you just meant that the notation 'a'..'z' doesn't make sense
unless the representations of those characters are numerically
contiguous. I thought I should point this out in case anyone else is
confused.

james...@verizon.net

Sep 15, 2008, 4:49:01 PM

CBFalconer wrote:
> arnuld wrote:
> >
> ... snip ...
> >
> > okay, that seems a good reply. I mean, we make it topical to C
> > again as I lost in the confusion a little. so *my* definition of
> > word will be the same one yo told earlier:
> >
> > A "word" is a non-empty contiguous sequence of characters other
> > than space, tab, or newline, preceded or followed either by a
> > space, tab, or newline or by the start or end of the input.
>
> So any sequence of control chars, such as '\16', '\17' can go in a
> word? Just illustrating the difficulties. For examples
> identifiers in C start with 'a'..'z', 'A'..'Z', '_', and continue

> with the same plus '0'..'9'. ...

If you're going to bother pointing out that

> ... This assumes (not valid for C) that

> 'a'..'z' are contiguous, as are 'A'..'Z'.

then you shouldn't be assuming that '\16' and '\17' are control
characters; if we're not assuming ASCII, then they just might
represent ' ' and 'a', respectively.

Identifying what arnuld calls "words" is much simpler than identifying
C identifiers; in fact, I can't quite figure out why you bothered
bringing up the definition of C identifiers. All that arnuld's code
needs to do is check for the delimiting " \t\n" characters. In fact,
since he has said that he wants to mimic the behavior of the C++ code
which he provided as an example, he probably left out the form-feed,
carriage return, and vertical tab characters only by accident.
If he adds "\f\r\v" to the delimiter list, then the simplest way to
handle the delimiter check is to just call isspace().

Richard Heathfield

Sep 15, 2008, 5:28:49 PM

CBFalconer said:

> James Kuyper wrote:
>>
> ... snip ...
>>
>> The only way he knows how to clearly describe what he wants his
>> code to do is by providing a C++ example; this has been made
>> abundantly clear by his failed attempts to clearly describe it in
>> English. However, the code he wants to write should be in C.
>>
>> If he were to post this same question to comp.lang.c++, and there
>> were a C++BFalconer on comp.lang.c++, C++BFalconer would certainly
>> respond by saying that this C question is off-topic in
>> comp.lang.c++. Should arnuld then simply remain silent about his
>> question?
>
> I disagree. If he wants to use a C++ algorithm as illustration he
> should translate that algorithm to C.

He agrees. How about helping him do it, by answering his C question?

Antoninus Twink

Sep 15, 2008, 6:12:55 PM

On 15 Sep 2008 at 19:30, CBFalconer wrote:

> Keith Thompson wrote:
>> Richard no-last-name has made a hobby out of insulting Chuck Falconer
>> at every opportunity, even dragging his name into discussions in
>> which Chuck has not participated.
>
> It is totally pointless, since I have Richard the un-named PLONKed,
> and I never see his silly diatribes.

Fortunate, then, that your posts have become so embarrassingly
error-ridden in recent months that another Richard, with a surname we
all know only too well, has started posting similar diatribes that you
surely *do* read.

Kenny McCormack

Sep 15, 2008, 7:40:12 PM

In article <slrngctnf7...@nospam.invalid>,

Of course, now, KT himself has gotten on the bashing CBF bandwagon.

Good on him!

Old Wolf

Sep 15, 2008, 10:37:28 PM

On Sep 15, 4:50 pm, Richard Heathfield <r...@see.sig.invalid> wrote:

> arnuld said:
> > Look at my code and see how std::string works and perhaps we can settle
> > on some common and standard meaning word.
>

> Let me give you an example from ordinary English, where whitespace
> delimiters are not sufficient:
>
> "What did he say?", said Albert.
> "He just said, 'I'll be there', I think", replied the captain.
>
> Now, consider the whitespace-separated tokens:

A bit sidetracked from the original thread, but is
there actually any problem here besides identifying
whether a ' symbol is a quote mark or an apostrophe?

CBFalconer

Sep 15, 2008, 11:43:32 PM

Old Wolf wrote:
> Richard Heathfield <r...@see.sig.invalid> wrote:
>
... snip ...

>
>> Let me give you an example from ordinary English, where
>> whitespace delimiters are not sufficient:
>>
>> "What did he say?", said Albert.
>> "He just said, 'I'll be there', I think", replied the captain.
>>
> > Now, consider the whitespace-separated tokens:
>
> A bit sidetracked from the original thread, but is
> there actually any problem here besides identifying
> whether a ' symbol is a quote mark or an apostrophe?

And I gather you consider that a trivial problem? Please describe
your algorithm.

CBFalconer

Sep 15, 2008, 11:46:56 PM

You certainly make a good point.

CBFalconer

Sep 16, 2008, 12:01:01 AM

Keith Thompson wrote:
> CBFalconer <cbfal...@yahoo.com> writes:
>> arnuld wrote:
>>
>> ... snip ...
>>
>>> okay, that seems a good reply. I mean, we make it topical to C
>>> again as I lost in the confusion a little. so *my* definition of
>>> word will be the same one yo told earlier:
>>>
>>> A "word" is a non-empty contiguous sequence of characters other
>>> than space, tab, or newline, preceded or followed either by a
>>> space, tab, or newline or by the start or end of the input.
>>
>> So any sequence of control chars, such as '\16', '\17' can go in a
>> word? Just illustrating the difficulties. For examples
>> identifiers in C start with 'a'..'z', 'A'..'Z', '_', and continue
>> with the same plus '0'..'9'. This assumes (not valid for C) that
>> 'a'..'z' are contiguous, as are 'A'..'Z'. When the word has been
>> parsed it has to be checked against a (limited) list of reserved
>> words.
>
> Yes, given the definition above, this string:
>
> "\16 \17"
>
> contains two "words". Are you suggesting that that's a problem?

I didn't specify a string. I meant those characters contiguous
(i.e. one strictly following the other) in the input stream. The
detection I specified above can be done with one char look ahead.
The presence (and necessity) of such a look ahead scheme may not be
obvious to the casual reader. In C it revolves around the ungetc()
function.

>
> Obviously a program that's intended to recognize C identifiers would
> have to use a different rule. But the OP didn't say anything about C
> identifiers, so I'm not sure why you're bringing them up.
>
> Incidentally, on my initial reading of your followup, I thought your
> use of the word "contiguous" was meant to be related to the use in
> arnuld's definition of "word" (the one I had suggested earlier). In
> fact, they're quite different; in the definition of "word" it refers
> to the characters being adjacent in the input, not to their numeric
> representations. A more careful reading of what you wrote indicates
> that you just meant that the notation 'a'..'z' doesn't make sense
> unless the representations of those characters are numerically
> contiguous. I thought I should point this out in case anyone else is
> confused.

Right. I should have specified 'the values of the chars are
contiguous'. The point being that ASCII works fine, but EBCDIC
doesn't. The C lexer will be a good example, because what it has
to detect is well defined.

arnuld

Sep 16, 2008, 12:37:24 AM

> On Mon, 15 Sep 2008 14:43:36 +0100, Ben Bacarisse wrote:

> I was being a bit vague. Let's leave actual array pointers out of
this. I mean that Richard was talking about changing the char ** as
seen from the calling function. The thing you are intending to pass,
a char **, is in some sense a pointer to the whole array: from it all
of the array's data is accessible. The trouble is you can't
change this char ** inside the function -- not in a way that has any
effect outside. All you can do is change the various things it points
to.

> ...SNIP....

> If a function needs to change an int, you pass an int *. If it needs
to change an int *, you pass an int **. If it needs to change an int **,
you must pass an int ***.

> ... SNIP....

> Typo! I meant you *can't* write any value into **ppc! Sorry. There
> are two typos, I now see. It should have read: "*ppc is NULL -- you
> set it to be NULL before the call. You can't write any value into
> **ppc."

see my new post titled "pointers passed by copying ?"

Richard Heathfield

Sep 16, 2008, 2:28:30 AM

Old Wolf said:

I think it's about here that I like to pretend I'm from Missouri.

Show me.

Old Wolf

Sep 16, 2008, 5:24:05 PM

On Sep 16, 3:43 pm, CBFalconer <cbfalco...@yahoo.com> wrote:
> Old Wolf wrote:
> > Richard Heathfield <r...@see.sig.invalid> wrote:
>
> >> "He just said, 'I'll be there', I think", replied the captain.
>

> > A bit sidetracked from the original thread, but is
> > there actually any problem here besides identifying
> > whether a ' symbol is a quote mark or an apostrophe?
>
> And I gather you consider that a trivial problem? Please describe
> your algorithm.

Not at all, I was just checking that there
wasn't some other problem besides this one,
that I hadn't seen.

Richard Heathfield

Sep 16, 2008, 6:02:47 PM

Old Wolf said:

Hyphens are another issue: "will-o'-the-wisp" illustrates where both the
hyphen and the apostrophe are part of the word, but there are situ-
ations where the hyphen (and newline) are not part of the word, just as
there are situations where 'apostrophes' are not part of the word.

Then there's the whole issue of "what is an alphabetic character"? If we
simply say A-Za-z, we exclude a vast range of words from languages such as
French, German, Spanish, Polish, and Russian. I'm not saying we shouldn't
do that, but we should be aware that the decision is costly in terms of
internationalisation.

Is 'C++' a word? How about 'G#m'? You might or might not consider that to
be a word, but a musician might. And yet they may have a very different
opinion about 'H#m'.

What about numbers? Is 42 a word? How about 3Com?

Is the copyright symbol a word? What about the trademark and registered
trademark symbols? Can they be part of a word? Consider, for example,
Microsoft<sup>(R)</sup>.

How about full stops (or 'periods' as some people call them)? Consider:
"U.S.A.", "B.B.C.", "etc.", etc.

What about &? Is that a word?

To any one of these questions, you may say, "yes, that's allowable as part
of a word", or you may say, "no, it's not allowable". But your decision
may well differ from someone else's decision.

And having decided, how do you design your algorithm so that it accepts
"fo'c'sle" as one word rather than three? A dictionary? If you're going to
do /that/, the algorithm is indeed trivial (modulo bugs):

1. start with s = "" and an empty word list
2. c = getch
3. if EOF continue from 8.
4. s += c
5. if s in dictionary
      continue from 2.
6. else
      s -= c.
      if s != ""
         add s to word list
      s = c
7. continue from 2.
8. if s != ""
      add s to word list
9. stop

but now you have to list in your dictionary every single character
combination that you consider to be a word. Big dictionary. (For a start,
every word will need at least three entries: "word", "Word", "WORD".)

The dictionary approach is clumsy in the extreme, and the algorithmic
approach gets more and more difficult as you get pickier and pickier about
what does and what does not constitute a word.

Old Wolf

Sep 17, 2008, 12:04:21 AM

On Sep 17, 10:02 am, Richard Heathfield <r...@see.sig.invalid> wrote:
> but now you have to list in your dictionary every single character
> combination that you consider to be a word. Big dictionary. (For a start,
> every word will need at least three entries: "word", "Word", "WORD".)
>
> The dictionary approach is clumsy in the extreme, and the algorithmic
> approach gets more and more difficult as you get pickier and pickier about
> what does and what does not constitute a word.

Surely there is no approach other than using
a sophisticated dictionary. For example:

'Tis the season to be playin'

there is no rule to deduce whether we have
quote marks or apostrophes, besides knowing
that 'Tis is a word.

The dictionary can include rules such as
the fact that if "abcd" is a word, then
so is "Abcd"; it can know that acronyms
can be written with periods, and so on.

Now where it gets harder is if you have to
accept text from people who make spelling
mistakes and typos :)

Richard Heathfield

Sep 17, 2008, 12:37:21 AM

Old Wolf said:

> On Sep 17, 10:02 am, Richard Heathfield <r...@see.sig.invalid> wrote:
>> but now you have to list in your dictionary every single character
>> combination that you consider to be a word. Big dictionary. (For a
>> start, every word will need at least three entries: "word", "Word",
>> "WORD".)
>>
>> The dictionary approach is clumsy in the extreme, and the algorithmic
>> approach gets more and more difficult as you get pickier and pickier
>> about what does and what does not constitute a word.
>
> Surely there is no approach other than using
> a sophisticated dictionary.

Yes, there is. There is the "good enough for Professor Jenkins[1]"
approach, in which we define "word" as a non-empty contiguous sequence of
non-whitespace characters delimited on the left by SOF or whitespace and
on the right by EOF or whitespace.

This is not only good enough for Professor Jenkins[1] but frequently good
enough in the Real World, too.

Not that the Real World has any bearing, but I just thought I'd mention it.

[1] cf Gary Larson (the one with the duck)

arnuld

Sep 17, 2008, 1:07:59 AM

> On Mon, 15 Sep 2008 09:28:36 +0000, Richard Heathfield wrote:

> ...SNIP...

> But what if something goes wrong? You'll need to be able to report an
> error. The natural way to do this is via a return value, which means we
> can't use that value for either the list or the count, and that leads us
> to:

What will we do with that return value? If something goes wrong, I can
simply exit the program, telling the user that he did something stupid and
is responsible for it.

> int get_words(char ***, size_t *);
>
> Since they don't need to modify the caller's status, sort_words and
> print_words can be of type int(char **, size_t).

I think there is a qsort in the standard library, so we can use that, but
I don't know whether it modifies the original array or not.

> Up to you, but I wouldn't bother setting a limit (or, if I did, I'd set it
> at a million or so, and treat any string longer than that as a reportable
> error). With dynamic allocation, you don't /need/ to set a limit; you
> simply allocate as you go, and reallocate if necessary.

Okay, I will write the program in parts. First we will write a simple
program that asks the user for a word and stores that word dynamically,
using calloc, in an array. It will be called get_single_word,
and it will form the basis of the get_words function, which will store all
the words in an array. get_single_word returns an int because I want to use
get_single_word in get_words like this:

while( get_single_word )
{
  /* code for get_words */
}

Here is my code for get_single_word. PROBLEM: it does not print anything
I entered:

/* a program to get a single word from stdin */

#include <stdio.h>
#include <stdlib.h>

enum { AVERAGE_SIZE = 28 };

int get_single_word( char* );

int main( void )
{
  char* pw; /* pw means pointer to word */

  get_single_word( pw );

  printf("word you entered is: %s\n", pw);

  return 0;
}

int get_single_word( char* pc )
{
  int idx;
  int ch;
  char *pc_begin;

  pc = calloc(AVERAGE_SIZE-1, sizeof(char));
  pc_begin = pc;

  if( (! pc) )
    {
      perror("can not allocate memory, sorry babe!");
      return 1;
    }

  for( idx = 0; ( (ch = getchar()) != EOF ); ++idx, ++pc )
    {
      if( AVERAGE_SIZE == idx )
        {
          /* use realloc here which I have no idea how to write */
        }

      *pc = ch;
    }

  *++pc = '\0';
  free(pc_begin);

  return 0;
}

=================== OUTPUT ==================
[arnuld@dune ztest]$ gcc -ansi -pedantic -Wall -Wextra test.c
[arnuld@dune ztest]$ ./a.out
like
word you entered is:
[arnuld@dune ztest]$