Non-constant constant strings

Rick C. Hodgin

Jan 19, 2014, 6:12:49 PM

I have a need for something like this, except that I need to edit list[N]'s data, as in memcpy(list[0], "eno", 3):
char* list[] = { "one", "two", "three", "four" };

I have a work-around like this:
char one[] = "one";
char two[] = "two";
char three[] = "three";
char four[] = "four";
char* list[] = { one, two, three, four };

However, this is clunky. I want to be able to change the items easily, because in the actual application they are source code that I'm coding within the compiler for an automatic processor. For example:

char* readonlySourceCode[] =
{
"if (something[9999])\r\n",
"{\r\n",
" // Do something\r\n",
"} else {\r\n",
" // Do something else\r\n",
"}"
};

My algorithm iterates through the list and looks for "[" with 4 characters between, and then a closing "]" ... when found, it injects at runtime the current reference which begins at 1 and increments up to the maximum value which, at present, is 812. It might change over time.

I want to use the list this way because I will alter the source code from time to time.

Obviously, this is creating a series of constant strings which are set up in read-only memory. My runtime solutions are twofold: First, I can replace the pointers with a copy of each one malloc()'d and then memcpy()'d, which is my current solution. Or, I can do this:

char line1[] = "if (something[9999])\r\n";
char line2[] = "{\r\n";
char line3[] = " // Do something\r\n";
char line4[] = "} else {\r\n";
char line5[] = " // Do something else\r\n";
char line6[] = "}";

char* editableSourceCode[] = { line1, line2, line3, line4, line5, line6 };

My issue is that the source changes periodically, and the actual use case is about 100 lines, which also increases from time to time, and causes the numbering system in source code to be off.

Is there a way to create the lines in the readonlySourceCode definition so that they're not read-only? I'm using Visual C++, and am looking for something like this:

#pragma data_seg(push, ".data", all)

Or this:
char* readwriteSourceCode[] =
{
_rw("if (something[9999])\r\n"),
_rw("{\r\n"),
_rw(" // Do something\r\n"),
_rw("} else {\r\n"),
_rw(" // Do something else\r\n"),
_rw("}")
};

Thank you in advance.

Best regards,
Rick C. Hodgin

Eric Sosman

Jan 19, 2014, 6:30:17 PM

On 1/19/2014 6:12 PM, Rick C. Hodgin wrote:
> I have a need for something like this, except that I need to edit list[N]'s data, as in memcpy(list[0], "eno", 3):
> char* list[] = { "one", "two", "three", "four" };
>
> I have a work-around like this:
> char one[] = "one";
> char two[] = "two";
> char three[] = "three";
> char four[] = "four";
> char* list[] = { one, two, three, four };

That works -- but not if you try strcpy(one, "five"). That is,
you can only replace these strings with replacements that are no
longer than the originals.

> However, this is clunky because I want to be able to change the items because in the actual application it is source code that I'm coding within the compiler for an automatic processor. For example:
>
> char* readonlySourceCode[] =
> {
> "if (something[9999])\r\n",
> "{\r\n",
> " // Do something\r\n",
> "} else {\r\n",
> " // Do something else\r\n",
> "}"
> };
>
> My algorithm iterates through the list and looks for "[" with 4 characters between, and then a closing "]" ... when found, it injects at runtime the current reference which begins at 1 and increments up to the maximum value which, at present, is 812. It might change over time.

I'm not sure what you mean by "inject." You write as if you
mean to plop something else down in place of the 9999, but I don't
see what use that would be: The altered strings would still be inside
your program, not in a place where a compiler could get at them.

If you intend to write the altered strings to a file and then
compile the file, consider how printf() does things: It has no need
to replace the "%d" in a format string with the "1234" that goes to
the output; rather, it leaves the "%d" untouched and sends the "1234"
to the output.

... but, as I say, I'm not clear on what "inject" means here.

> [...]

> Is there a way to create the lines in the readonlySourceCode definition so it's not read-only.

Not in C, although particular compilers may allow it as an
extension. Among other things, a C compiler is permitted to merge
common suffixes, so (for example) the string literals "ant" and
"cant" and "secant" might occupy a total of seven bytes.

> I'm using Visual C++, [...]

If you're writing C++, try comp.lang.c++ instead of here.

--
Eric Sosman
eso...@comcast-dot-net.invalid

Kaz Kylheku

Jan 19, 2014, 8:52:29 PM

On 2014-01-19, Rick C. Hodgin <rick.c...@gmail.com> wrote:
> [ ... ]

This is not how you write compilers or compiler-like transliterators in 2014.

You do it in a high level language where you don't have to mess around with low
level string manipulation and memory management (or avoidance thereof).

You will be done in way less time, and with fewer bugs.

Possibly, the performance of the thing will be better (if it even matters).

Asaf Las

Jan 19, 2014, 10:54:17 PM

On Monday, January 20, 2014 1:12:49 AM UTC+2, Rick C. Hodgin wrote:
> However, this is clunky because I want to be able to change
> the items because in the actual application it is source code
> that I'm coding within the compiler for an automatic processor.

> Rick C. Hodgin

You could write your own bytecode machine and statically allocate an array
big enough for opcodes loaded from text files or anything else.

If speed is not an issue, www.swig.org or similar can glue your program to
interpreted languages.

Or define a generic API and write your logic in C files, so that your
application invokes the C compiler to create dynamically loaded libraries
and loads them on the fly.

Rick C. Hodgin

Jan 20, 2014, 6:06:25 AM

There are lots of solutions and workarounds. I'm looking for a compiler directive that will override the default behavior of allocating constant strings to read-only memory, and instead allocate them to read-write memory.

char foo[] = "Rick"; // Goes to read-write memory
char* list[] = { "Rick" }; // Goes to read-only memory

I want a way for list[0] to go to the same place as foo. I am using Visual C++ compiler, but I am writing in C. I use the C++ compiler because it has some relaxed syntax constraints.

glen herrmannsfeldt

Jan 20, 2014, 7:37:53 AM

Rick C. Hodgin <rick.c...@gmail.com> wrote:

(snip)

> char foo[] = "Rick"; // Goes to read-write memory
> char* list[] = { "Rick" }; // Goes to read-only memory

> I want a way for list[0] to go to the same place as foo.
> I am using Visual C++ compiler, but I am writing in C.
> I use the C++ compiler because it has some relaxed syntax
> constraints.

As I remember it, not having looked recently, the pre-ANSI (K&R)
compilers allowed writable strings. While not the best practice,
it was an allowed and sometimes useful technique.

Some compilers have an option still to do that. Note that this
option also has to be sure to separately allocate strings that are
otherwise equal.

For K&R it would be:

static char *list[]={"Rick", "Rick", "Rick"};

(K&R didn't allow initializing auto arrays.)

Note that in this example all three are the same and, if read only,
there is no need to store separate copies.

Is it really so bad to initialize separate variables, and then
initialize an array with those pointer values?

-- glen

Rick C. Hodgin

Jan 20, 2014, 7:53:33 AM

On Monday, January 20, 2014 7:37:53 AM UTC-5, glen herrmannsfeldt wrote:

> Rick C. Hodgin <rick...n@gmail.com> wrote:
> Is it really so bad to initialize separate variables, and then
> initialize an array with those pointer values?

As I change source code from time to time I would like to be able to edit the strings defined within this block, as it's closer to a real source code layout with minimal overhead to maintain:

char* sourceCode[] =
{
"if (foo[9999]) {\r\n",

" // Do something\r\n",
"} else {\r\n",
" // Do something else\r\n",

"}\r\n"
};

Changes to this:
char* sourceCode[] =
{
"if (foo[9999] == 0) {\r\n",

" // Do something\r\n",

"} else if (foo[9999] == 1) {\r\n",

" // Do something else\r\n",
"} else {\r\n",

" // Do some other things\r\n",
"}"
};

Rather than this:
char line10[] = "if (foo[9999]) {\r\n";
char line20[] = " // Do something\r\n";
char line30[] = "} else {\r\n";
char line40[] = " // Do something else\r\n";
char line50[] = "}\r\n";
char* sourceCode[] = { line10, line20, line30, line40, line50 };

Changed to this:
char line10[] = "if (foo[9999] == 0) {\r\n";
char line20[] = " // Do something\r\n";
char line30[] = "} else if (foo[9999] == 1) {\r\n";
char line40[] = " // Do something else\r\n";
char line43[] = "} else {\r\n";
char line47[] = " // Do some other things\r\n";
char line50[] = "}\r\n";
char* sourceCode[] = { line10, line20, line30, line40, line43, line47, line50 };

Because now I'm back in the days of BASICA and needing the RENUM 100,10 ability to redistribute as my initially defined numbering system gets bigger. Or I begin having unusual naming conventions with extra parts tagged on (line41a or line41_1, and so on) because (1) I must manually give everything names so they are explicitly referenced thereafter, and because (2) the code assigned to each item will change from time to time it (3) introduces the possibility of additional errors due to the mechanics of setting everything up (something the compiler should handle).

My current solution is to do something along these lines during initialization:
char* sourceCode[] =
{
"if (foo[9999] == 0) {\r\n",

" // Do something\r\n",

"} else if (foo[9999] == 1) {\r\n",

" // Do something else\r\n",
"} else {\r\n",

" // Do some other things\r\n",
"}",
NULL
};

int i, len;
char* ptr;
for (i = 0; sourceCode[i] != NULL; i++)
{
len = strlen(sourceCode[i]) + 1;
ptr = (char*)malloc(len);
memcpy(ptr, sourceCode[i], len);
sourceCode[i] = ptr;
}

This works ... but with the compiler switch it wouldn't even be necessary. The compiler would remove the possibility of these errors.

Richard Damon

Jan 20, 2014, 8:52:44 AM

one option might be to change your
char* sourceCode[] = {
...
};

to

char sourceCode[][MAXLEN] = {
...
};

This will "waste" some memory, as all the lines will take the space of a
"full" line, and runs the danger of losing the terminating null on a
line that just exactly overruns the length, but does give you the easy
to edit format.

Rick C. Hodgin

Jan 20, 2014, 9:03:36 AM

That would fix it. I appreciate the suggestion. I'm still holding out hope for a #pragma directive, or constant string wrapper macro, that does what I'm after. :-)

Aleksandar Kuktin

Jan 20, 2014, 9:10:24 AM

On Mon, 20 Jan 2014 04:53:33 -0800, Rick C. Hodgin wrote:

> My current solution to do something along these lines during
> initialization:
> char* sourceCode[] =
> {
> "if (foo[9999] == 0) {\r\n",
> " // Do something\r\n",
> "} else if (foo[9999] == 1) {\r\n",
> " // Do something else\r\n",
> "} else {\r\n",
> " // Do some other things\r\n", "}",
> NULL
> };
>
> int i, len;
> char* ptr;
> for (i = 0; sourceCode[i] != NULL; i++)
> {
> len = strlen(sourceCode[i]) + 1;
> ptr = (char*)malloc(len); memcpy(ptr, sourceCode[i], len);
> sourceCode[i] = ptr;
> }

And what, exactly, is wrong with the basic principle of this approach?

I personally would have done something like this:
char *read_only[] = { "Rick", "Jane", "Marc", 0 };
char **read_write;

char **init_readwrite(char **readonly) {
unsigned int i, count;
char **readwrite;

for (count=0, i=0; readonly[i]; i++) {
count++;
}
readwrite = malloc(count * sizeof(*readwrite));
/* no check */
return memcpy(read_write, read_only, count * sizeof(*readwrite));
}

read_write = init_readwrite(read_only);

...And then you operate on read_write and ignore read_only.

Eric Sosman

Jan 20, 2014, 9:17:32 AM

... and you've been told (by more than one respondent) where
to find such a thing, if it exists: In the documentation of the
compiler you happen to be using, somewhere in the "Extensions to
C" or "Beyond C" or "Things That Aren't Quite C" section.

The C language does not offer what you ask for.

--
Eric Sosman
eso...@comcast-dot-net.invalid

Rick C. Hodgin

Jan 20, 2014, 10:11:31 AM

On Monday, January 20, 2014 9:10:24 AM UTC-5, Aleksandar Kuktin wrote:
> On Mon, 20 Jan 2014 04:53:33 -0800, Rick C. Hodgin wrote:
> > My current solution to do something along these lines during
> > initialization:
> > char* sourceCode[] =
> > {
> > "if (foo[9999] == 0) {\r\n",
> > " // Do something\r\n",
> > "} else if (foo[9999] == 1) {\r\n",
> > " // Do something else\r\n",
> > "} else {\r\n",
> > " // Do some other things\r\n", "}",
> > NULL
> > };
> >
> > int i, len;
> > char* ptr;
> > for (i = 0; sourceCode[i] != NULL; i++)
> > {
> > len = strlen(sourceCode[i]) + 1;
> > ptr = (char*)malloc(len); memcpy(ptr, sourceCode[i], len);
> > sourceCode[i] = ptr;
> > }
>
> And what, exactly, is wrong with the basic principle of this approach?

What "exactly" is wrong with this approach is that I must do something manually in code, something that is (a) unnecessary, (b) rather cumbersome mechanically, and (c) something the compiler would be capable of doing for me were it not for design protocol limitations being artificially imposed upon an otherwise valid data request for a block of read-write memory.

> I personally would have done something like this:
> char *read_only[] = { "Rick", "Jane", "Marc", 0 };
> char **read_write;
>
> char **init_readwrite(char **readonly) {
> unsigned int i, count;
> char **readwrite;
>
> for (count=0, i=0; readonly[i]; i++) {
> count++;
> }
> readwrite = malloc(count * sizeof(*readwrite));
> /* no check */
> return memcpy(read_write, read_only, count * sizeof(*readwrite));
> }
>
> read_write = init_readwrite(read_only);
> ...And then you operate on read_write and ignore read_only.

Now you're dealing with a copy that must be free()'d each time after use. In my example I'm replacing the "[9999]" portion with something akin to printf("[%04u]", my_int_iterator_value++).

I don't have need of making copies of my data. It introduces unnecessary code, complexity, and opportunity for errors. What I do have need of is accessing the data I've encoded, as it's encoded at compile-time, to be altered at run-time.

Rick C. Hodgin

Jan 20, 2014, 10:20:20 AM

On Monday, January 20, 2014 9:17:32 AM UTC-5, Eric Sosman wrote:
> The C language does not offer what you ask for.

Yup.

Rick C. Hodgin

Jan 20, 2014, 10:40:17 AM

On Monday, January 20, 2014 9:10:24 AM UTC-5, Aleksandar Kuktin wrote:

> I personally would have done something like this:
> char *read_only[] = { "Rick", "Jane", "Marc", 0 };
> char **read_write;
>
> char **init_readwrite(char **readonly) {
> unsigned int i, count;
> char **readwrite;
>
> for (count=0, i=0; readonly[i]; i++) {
> count++;
> }
> readwrite = malloc(count * sizeof(*readwrite));
> /* no check */
> return memcpy(read_write, read_only, count * sizeof(*readwrite));
> }
> read_write = init_readwrite(read_only);
>
> ...And then you operate on read_write and ignore read_only.

I have not gone through this deeply or tried it in code, but I'm thinking the theory of this solution would not work in all cases (and that this particular implementation also will not work).

Since each read_only[] pointer is to a constant string, and the compiler creates the entry in read-only memory, it could optimize and allow lines like "red" and "fred" to be mapped to the same four byte area, one pointing to "f" and one pointing to "r" after "f". So making a bulk copy would not copy all things properly in all cases.

I believe to be sure, you must copy each pointer out one-by-one to verify you'll always get an appropriate copy into read-write memory.

Keith Thompson

Jan 20, 2014, 11:14:45 AM

glen herrmannsfeldt <g...@ugcs.caltech.edu> writes:
> Rick C. Hodgin <rick.c...@gmail.com> wrote:
>
> (snip)
>
>> char foo[] = "Rick"; // Goes to read-write memory
>> char* list[] = { "Rick" }; // Goes to read-only memory
>
>> I want a way for list[0] to go to the same place as foo.
>> I am using Visual C++ compiler, but I am writing in C.
>> I use the C++ compiler because it has some relaxed syntax
>> constraints.
>
> As I remember it, not having looked recently, the pre-ANSI (K&R)
> compilers allowed writable strings. While not the best practice,
> it was an allowed and sometimes useful technique.

All versions of the C language, from K&R to ISO C11, have permitted
compilers to make string literals writable. What's changed over
time is that most compilers don't take advantage of that permission.

> Some compilers have an option still to do that. Note that this
> option also has to be sure to separately allocate strings that are
> otherwise equal.

They don't *have* to do that unless they make additional guarantees
beyond what the language specifies.

If I write:

char *a = "foo";
char *b = "foo";
a[0] = 'F';
puts(b);

and the puts call is actually executed, the language permits it to
print either "foo", or "Foo" (or "fnord", or a suffusion of yellow).
The behavior of the assignment to a[0] is undefined, and once you
do that all bets are off.

But if a compiler were to guarantee, as a language extension,
that string literals are meaningfully modifiable, then it would
probably have to guarantee that the strings pointed to by a and
b must be distinct (unless the compiler can prove that they're
never modified). The compiler's documentation would have to spell
out just what additional guarantees it offers. (Such an extension
would not make the compiler non-conforming, since any code that
takes advantage of it has undefined behavior.)

[...]

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Keith Thompson

Jan 20, 2014, 11:32:10 AM

"Rick C. Hodgin" <rick.c...@gmail.com> writes:
[...]

> As I change source code from time to time I would like to be able to
> edit the strings defined within this block, as it's closer to a real
> source code layout with minimal overhead to maintain:
>
> char* sourceCode[] =
> {
> "if (foo[9999]) {\r\n",
> " // Do something\r\n",
> "} else {\r\n",
> " // Do something else\r\n",
> "}\r\n"
> };

[...]

This isn't relevant to your question, but why do you have explicit
"\r\n" line endings? If your program reads and/or writes the
source code in text mode (or if you're on a UNIX-like system),
line endings are marked by a single '\n' character, regardless of
the format used by the operating system.

Keith Thompson

Jan 20, 2014, 11:34:52 AM

"Rick C. Hodgin" <rick.c...@gmail.com> writes:

> I have a need for something like this, except that I need to edit list[N]'s data, as in memcpy(list[0], "eno", 3):
> char* list[] = { "one", "two", "three", "four" };

[...]

It would be helpful if you'd format your articles to have lines
no longer than about 72 columns. Usenet is not the web, and
newsreaders don't necessarily deal with arbitrarily long lines.
(My newsreader does split long lines, but not at word boundaries.)

Rick C. Hodgin

Jan 20, 2014, 12:37:07 PM

On Monday, January 20, 2014 11:14:45 AM UTC-5, Keith Thompson wrote:
> glen herrmannsfeldt <g...@ugcs.caltech.edu> writes:

> > Rick C. Hodgin <rick...n@gmail.com> wrote:
> > (snip)
> >> char foo[] = "Rick"; // Goes to read-write memory
> >> char* list[] = { "Rick" }; // Goes to read-only memory
> >> I want a way for list[0] to go to the same place as foo.
> >> I am using Visual C++ compiler, but I am writing in C.
> >> I use the C++ compiler because it has some relaxed syntax
> >> constraints.
> > As I remember it, not having looked recently, the pre-ANSI (K&R)
> > compilers allowed writable strings. While not the best practice,
> > it was an allowed and sometimes useful technique.
>
> All versions of the C language, from K&R to ISO C11, have permitted
> compilers to make string literals writable. What's changed over
> time is that most compilers don't take advantage of that permission.

Ah! That's a shame. :-)

> > Some compilers have an option still to do that. Note that this
> > option also has to be sure to separately allocate strings that are
> > otherwise equal.
> They don't *have* to do that unless they make additional guarantees
> beyond what the language specifies.
>
> If I write:
> char *a = "foo";
> char *b = "foo";
> a[0] = 'F';
> puts(b);
> and the puts call is actually executed, the language permits it to
> print either "foo", or "Foo" (or "fnord", or a suffusion of yellow).
> The behavior of the assignment to a[0] is undefined, and once you
> do that all bets are off.

I believe the language should operate such that as I've defined a to point to "foo", and b to point to "foo", and these are separate strings, then they should be separate strings in memory, the same as if I'd said char* a="123"; char* b="456".

> But if a compiler were to guarantee, as a language extension,
> that string literals are meaningfully modifiable, then it would
> probably have to guarantee that the strings pointed to by a and
> b must be distinct (unless the compiler can prove that they're
> never modified). The compiler's documentation would have to spell
> out just what additional guarantees it offers. (Such an extension
> would not make the compiler non-conforming, since any code that
> takes advantage of it has undefined behavior.)

I personally believe it's a silly requirement to do such a comparison to save a few bytes of space by default. I'd rather have it always duplicated and then allow the developer to provide a manually inserted command line switch which specifically turns on that kind of checking, and that kind of substituting.

Rick C. Hodgin

Jan 20, 2014, 12:39:19 PM

On Monday, January 20, 2014 11:32:10 AM UTC-5, Keith Thompson wrote:

> "Rick C. Hodgin" <rick...n@gmail.com> writes:
> [...]
> > As I change source code from time to time I would like to be able to
> > edit the strings defined within this block, as it's closer to a real
> > source code layout with minimal overhead to maintain:
> >
> > char* sourceCode[] =
> > {
> > "if (foo[9999]) {\r\n",
> > " // Do something\r\n",
> > "} else {\r\n",
> > " // Do something else\r\n",
> > "}\r\n"
> > };
>

> This isn't relevant to your question, but why do you have explicit
> "\r\n" line endings? If your program reads and/or writes the
> source code in text mode (or if you're on a UNIX-like system),
> line endings are marked by a single '\n' character, regardless of
> the format used by the operating system.

To be consistent with the source file input I'm processing. Without using both \r and \n it gives warnings when opening the files that the line endings are not consistent.

This program I'm writing this for is an augment of code generated by another tool. The other tool generates for a version 1.0 of the tool, and my tool modifies it for a version 2.0 of the tool.

Rick C. Hodgin

Jan 20, 2014, 12:40:48 PM

On Monday, January 20, 2014 11:34:52 AM UTC-5, Keith Thompson wrote:

> "Rick C. Hodgin" <rick...n@gmail.com> writes:
> > I have a need for something like this, except that I need to edit
> > list[N]'s data, as in memcpy(list[0], "eno", 3):
> > char* list[] = { "one", "two", "three", "four" };
> [...]
>
> It would be helpful if you'd format your articles to have lines
> no longer than about 72 columns. Usenet is not the web, and
> newsreaders don't necessarily deal with arbitrarily long lines.
> (My newsreader does split long lines, but not at word boundaries.)

Will do. I'm using Google Groups and it handles wrapping. I never knew
it was an issue for anyone.

glen herrmannsfeldt

Jan 20, 2014, 2:10:40 PM

Rick C. Hodgin <rick.c...@gmail.com> wrote:

(snip, I wrote)

>> Rick C. Hodgin <rick...n@gmail.com> wrote:
>> Is it really so bad to initialize separate variables, and then
>> initialize an array with those pointer values?

> As I change source code from time to time I would like to be able
> to edit the strings defined within this block, as it's closer to
> a real source code layout with minimal overhead to maintain:

> char* sourceCode[] =
> {
> "if (foo[9999]) {\r\n",
> " // Do something\r\n",
> "} else {\r\n",
> " // Do something else\r\n",
> "}\r\n"
> };

(snip)

> Because now I'm back in the days of BASICA and needing the
> RENUM 100,10 ability to redistribute as my initially defined
> numbering system gets bigger. Or I begin having unusual naming
> conventions with extra parts tagged on (line41a or line41_1,
> and so on) because (1) I must manually give everything names
> so they are explicitly referenced thereafter, and because (2)
> the code assigned to each item will change from time to time it
> (3) introduces the possibility of additional errors due to the
> mechanics of setting everything up (something the compiler
> should handle).

My usual solution in that case is to put all the data into a file
of some kind, then, as part of the build process, usually with make,
convert that file into appropriate C, just before compiling it.

> My current solution to do something along these lines
> during initialization:

(snip)

> int i, len;
> char* ptr;
> for (i = 0; sourceCode[i] != NULL; i++)
> {
> len = strlen(sourceCode[i]) + 1;
> ptr = (char*)malloc(len);
> memcpy(ptr, sourceCode[i], len);
> sourceCode[i] = ptr;
> }

> This works ... but with the compiler switch it wouldn't even
> be necessary. The compiler would remove the possibility of
> these errors.

In many cases when one wants initialized char data to modify,
the new length can be different. In that case, this solution doesn't
work, which reminded me of one case that does:

char sourcecode[][80]={

"if (foo[9999]) {\r\n",
" // Do something\r\n",
"} else {\r\n",
" // Do something else\r\n",

"}\r\n",
};

In this case, you get the appropriate number of 80 element
char arrays, initialized to the given values.

Oh, also, one of my favorite C features (Java has it too): you
can have the extra comma on the last line. It's convenient for
program-generated text, though it's most likely in the standard
because it allows for easy preprocessor conditionals.

-- glen

BartC

Jan 20, 2014, 2:21:14 PM

"Rick C. Hodgin" <rick.c...@gmail.com> wrote in message
news:bb2e2189-ec64-43d2...@googlegroups.com...

> char* readonlySourceCode[] =
> {
> "if (something[9999])\r\n",
> "{\r\n",
> " // Do something\r\n",
> "} else {\r\n",
> " // Do something else\r\n",
> "}"
> };

> Is there a way to create the lines in the readonlySourceCode definition so
> it's not read-only. I'm using Visual C++, and am looking for something
> like this:

What's wrong with keeping the lines in a text file? Then you can read them
into an allocated string which will necessarily be writeable. Or you can
just edit the file (or run a script on it to make the changes needed).

--
Bartc

glen herrmannsfeldt

Jan 20, 2014, 2:27:50 PM

Rick C. Hodgin <rick.c...@gmail.com> wrote:

> On Monday, January 20, 2014 9:10:24 AM UTC-5, Aleksandar Kuktin wrote:

(snip)

>> > int i, len;
>> > char* ptr;
>> > for (i = 0; sourceCode[i] != NULL; i++)
>> > {
>> > len = strlen(sourceCode[i]) + 1;
>> > ptr = (char*)malloc(len); memcpy(ptr, sourceCode[i], len);
>> > sourceCode[i] = ptr;
>> > }

>> And what, exactly, is wrong with the basic principle of this approach?

> What "exactly" is wrong with this approach is that I must do
> something manually in code, something that is (a) unnecessary,
> (b) rather cumbersome mechanically, and (c) something the compiler
> would be capable of doing for me were it not for design protocol
> limitations being artificially imposed upon an otherwise valid
> data request for a block of read-write memory.

There are an infinite number of features that could be added
to languages and/or compilers, and that could make it easier
for the needs of some people. The commonly needed ones get in,
the rare ones don't.

K&R C had this feature, but not initialized auto arrays.
With the ability to initialize arrays with string constants,
the need to modify string constants was reduced.

(snip)

> I don't have need of making copies of my data.
> It introduces unnecessary code, complexity, opportunity for
> errors. What I do have need of is accessing the data I've
> encoded, as it's encoded at compile-time, to be altered at run-time.

-- glen

Keith Thompson

Jan 20, 2014, 2:37:06 PM

"Rick C. Hodgin" <rick.c...@gmail.com> writes:

> On Monday, January 20, 2014 11:14:45 AM UTC-5, Keith Thompson wrote:
>> glen herrmannsfeldt <g...@ugcs.caltech.edu> writes:
>> > Rick C. Hodgin <rick...n@gmail.com> wrote:
>> > (snip)
>> >> char foo[] = "Rick"; // Goes to read-write memory
>> >> char* list[] = { "Rick" }; // Goes to read-only memory
>> >> I want a way for list[0] to go to the same place as foo.
>> >> I am using Visual C++ compiler, but I am writing in C.
>> >> I use the C++ compiler because it has some relaxed syntax
>> >> constraints.
>> > As I remember it, not having looked recently, the pre-ANSI (K&R)
>> > compilers allowed writable strings. While not the best practice,
>> > it was an allowed and sometimes useful technique.
>>
>> All versions of the C language, from K&R to ISO C11, have permitted
>> compilers to make string literals writable. What's changed over
>> time is that most compilers don't take advantage of that permission.
>
> Ah! That's a shame. :-)

Not really, at least not for the vast majority of C programmers.

C string literals are intended to be *constant*. The fact that
compilers are permitted to generate code that crashes on an attempt to
modify the array specified by a string literal makes for better error
checking. The fact that such checks are not required is for backwards
compatibility for code written before the "const" keyword was added to the
language; even if not for that, C tends to make such things undefined
behavior rather than requiring run-time diagnostics.

[...]

> I believe the language should operate such that as I've defined a to
> point to "foo", and b to point to "foo", and these are separate
> strings, then they should be separate strings in memory, the same as
> if I'd said char* a="123"; char* b="456".

If *I* write

char *a = "foo";
char *b = "foo";

all I care about is that both a and b point to strings containing
the characters 'f', 'o', and 'o', in that order. (It also means
that I've forgotten the "const" keyword for some reason.) And if
I later write:

printf("%s\n", a);

the compiler is free to generate code that does the equivalent of

puts("foo");

Forbidding the two occurrences of "foo" to occupy the same memory
location would matter only if (a) you want to be able to modify the
contents of the array (which C doesn't permit you to rely on), or
(b) if you care about the result of (a == b).

If you want writable strings, you can get them:

char a_array[] = "foo";
char *a = a_array; /* or &a_array[0] */

It's slightly less convenient for what you're trying to do, but I don't
think that's a common enough case to justify changing the language as
you suggest.

A compatible language change (or compiler-specific extension) that
wouldn't break existing code might be a new kind of string literal, with
a prefix indicating that the array is writable and may not be shared
with other string literals with the same value. Perhaps something like:

char *a = W"foo";
char *b = W"foo";
a[0] = 'F';
printf("%s%s\n", a, b); /* will print "Foofoo" */

If you wanted to take that approach, your options would be:

1. Modify some open-source compiler to implement it as a language
extension (lots of work);
2. Persuade the maintainers of some compiler to provide it (less work
for you, but likely to fail); or
3. Persuade the ISO C committee to add such a feature to the next C
standard (even more likely to fail, and requires waiting at least a
decade before you can use it).

Barring that, you can either use the existing features of the language,
or implement a preprocessing step that translates code using something
like this feature into standard C.

BTW, you might find that compound literals (added to the language by the
1999 standard) are helpful:

This:

#include <stdio.h>

int main(void) {
char *s = (char[]){"hello"};
s[0] = 'H';
puts(s);
}

prints "Hello". But the array whose first element s points to is still
just 6 characters long, and unlike string literals, an object created by
a compound literal inside a function body has automatic storage duration
(it ceases to exist when you leave the enclosing block).
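A compound literal at file scope instead has static storage duration, so it can also fill a table of pointers that stays valid for the whole program. The sketch below assumes a C99 (or later) C compiler; compound literals are not part of standard C++, so it does not help if the file is compiled as C++. The macro name `W` is hypothetical and only loosely echoes the `W"foo"` idea above.

```c
#include <string.h>   /* strcmp(), for inspecting entries */

/* Hypothetical macro: it wraps a string literal in a C99 compound literal,
   i.e. a separate, modifiable char array initialized from that literal. */
#define W(s) ((char[]){s})

/* At file scope each compound literal has static storage duration, so the
   pointers stay valid for the whole program. Non-const compound literals
   are distinct objects, even when two have the same text. */
char *list[] = { W("one"), W("two"), W("three"), W("four"), W("one") };
```

With this table, `list[0][0] = 'O';` is well defined, and `list[4]` still reads "one".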

glen herrmannsfeldt

Jan 20, 2014, 2:38:36 PM1/20/14

Keith Thompson <ks...@mib.org> wrote:

(snip, I wrote)

>> As I remember it, not having looked recently, the pre-ANSI (K&R)
>> compilers allowed writable strings. While not the best practice,
>> it was an allowed and sometimes useful technique.

> All versions of the C language, from K&R to ISO C11, have permitted
> compilers to make string literals writable. What's changed over
> time is that most compilers don't take advantage of that permission.

Well, also the need to do it has been reduced. For one, initialized
auto arrays in ANSI C helped. But an initialized auto array takes
up twice as much memory, one copy for the initialization value and
another when the array is allocated. For most systems, initialized static
arrays only allocate the one copy and initialize it at program
fetch.
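A small sketch of the behavioral side of that difference (the function names are hypothetical): an initialized automatic array is, in effect, set from its initial value each time its declaration is reached, while an initialized static array is set up once and keeps any changes.

```c
/* Automatic array: it gets "abc" again on every call,
   so the edit below never survives to the next call. */
char auto_first(void) {
    char buf[] = "abc";
    char first = buf[0];
    buf[0] = 'X';
    return first;
}

/* Static array: initialized once, before program startup; the edit persists. */
char static_first(void) {
    static char buf[] = "abc";
    char first = buf[0];
    buf[0] = 'X';
    return first;
}
```

Calling `auto_first()` twice returns 'a' both times; calling `static_first()` twice returns 'a' and then 'X'.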

There were lots of tricks used in the small memory days that
went away as memory prices decreased and systems got larger.

A program might have some string data that it needs to print out
once, and never again. It could reuse that memory for something
else later.

>> Some compilers have an option still to do that. Note that this
>> option also has to be sure to separately allocate strings that are
>> otherwise equal.

> They don't *have* to do that unless they make additional guarantees
> beyond what the language specifies.

> If I write:

> char *a = "foo";
> char *b = "foo";
> a[0] = 'F';
> puts(b);

> and the puts call is actually executed, the language permits it to
> print either "foo", or "Foo" (or "fnord", or a suffusion of yellow).
> The behavior of the assignment to a[0] is undefined, and once you
> do that all bets are off.

> But if a compiler were to guarantee, as a language extension,
> that string literals are meaningfully modifiable, then it would
> probably have to guarantee that the strings pointed to by a and
> b must be distinct (unless the compiler can prove that they're
> never modified). The compiler's documentation would have to spell
> out just what additional guarantees it offers. (Such an extension
> would not make the compiler non-conforming, since any code that
> takes advantage of it has undefined behavior.)

Yes. If the extension was called "writable-strings", one would
hope that it supplied separate copies.

-- glen

glen herrmannsfeldt

Jan 20, 2014, 2:43:40 PM1/20/14

Rick C. Hodgin <rick.c...@gmail.com> wrote:

> On Monday, January 20, 2014 11:14:45 AM UTC-5, Keith Thompson wrote:

(snip)

>> All versions of the C language, from K&R to ISO C11, have permitted
>> compilers to make string literals writable. What's changed over
>> time is that most compilers don't take advantage of that permission.

> Ah! That's a shame. :-)

>> > Some compilers have an option still to do that. Note that this
>> > option also has to be sure to separately allocate strings that are
>> > otherwise equal.

(snip)

> I believe the language should operate such that as I've defined
> a to point to "foo", and b to point to "foo", and these are
> separate strings, then they should be separate strings in memory,
> the same as if I'd said char* a="123"; char* b="456".

(snip)

> I personally believe it's a silly requirement to do such a
> comparison to save a few bytes of space by default. I'd
> rather have it always duplicated and then allow the developer
> to provide a manually inserted command line switch which
> specifically turns on that kind of checking, and that kind
> of substituting.

I believe Java requires identical String literals, even in different
classes, to refer to the same (interned) object. That is, in

if("string"=="string") ...

the if condition will be true. As far as I know, C doesn't
require that, but allows for it.

-- glen

glen herrmannsfeldt

Jan 20, 2014, 2:44:34 PM1/20/14

Keith Thompson <ks...@mib.org> wrote:
> "Rick C. Hodgin" <rick.c...@gmail.com> writes:
>> I have a need for something like this, except that I need to
>> edit list[N]'s data, as in memcpy(list[0], "eno", 3):
>> char* list[] = { "one", "two", "three", "four" };
> [...]

> It would be helpful if you'd format your articles to have lines
> no longer than about 72 columns. Usenet is not the web, and
> newsreaders don't necessarily deal with arbitrarily long lines.
> (My newsreader does split long lines, but not at word boundaries.)

And some news hosts enforce this.

-- glen

Keith Thompson

Jan 20, 2014, 3:06:11 PM1/20/14

glen herrmannsfeldt <g...@ugcs.caltech.edu> writes:
> Rick C. Hodgin <rick.c...@gmail.com> wrote:
>> On Monday, January 20, 2014 9:10:24 AM UTC-5, Aleksandar Kuktin wrote:
>
> (snip)
>
>>> > int i, len;
>>> > char* ptr;
>>> > for (i = 0; list[i] != NULL; i++)
>>> > {
>>> > len = strlen(list[i]) + 1;
>>> > ptr = (char*)malloc(len); memcpy(ptr, list[i], len);
>>> > list[i] = ptr;
>>> > }
>
>>> And what, exactly, is wrong with the basic principle of this approach?
>
>> What "exactly" is wrong with this approach is that I must do
>> something manually in code, something that is (a) unnecessary,
>> (b) rather cumbersome mechanically, and (c) something the compiler
>> would be capable of doing for me were it not for design protocol
>> limitations being artificially imposed upon an otherwise valid
>> data request for a block of read-write memory.
>
> There are an infinite number of features that could be added
> to languages and/or compilers, and that could make it easier
> for the needs of some people. The commonly needed ones get in,
> the rare ones don't.
>
> K&R C had this feature, but not initialized auto arrays.
> With the ability to initialize arrays with string constants,
> the need to modify string constants was reduced.

Did K&R1 guarantee that string literals are writable and unique?
(My copy is at home; I'll try to check later.)
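A self-contained version of the quoted copy loop, as a sketch: the function name `make_writable` and the -1 error return are hypothetical. It walks a NULL-terminated list and replaces entry `i` with a malloc()ed copy on each pass. It is written as C; a C++ compiler would need the malloc() cast used in the quoted code.

```c
#include <stdlib.h>
#include <string.h>

/* Replace each pointer in a NULL-terminated list with a pointer to a
   malloc()ed, writable copy of the same text. Returns 0 on success, or -1
   if an allocation fails (entries already copied stay copied). The caller
   owns the copies and should free() them. */
int make_writable(char *list[]) {
    for (size_t i = 0; list[i] != NULL; i++) {
        size_t len = strlen(list[i]) + 1;   /* include the '\0' */
        char *copy = malloc(len);
        if (copy == NULL)
            return -1;
        memcpy(copy, list[i], len);         /* copy entry i, not entry 0 */
        list[i] = copy;
    }
    return 0;
}
```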

Rick C. Hodgin

Jan 20, 2014, 3:08:41 PM1/20/14

On Monday, January 20, 2014 2:10:40 PM UTC-5, glen herrmannsfeldt wrote:
> > Because now I'm back in the days of BASICA and needing the
> > RENUM 100,10 ability to redistribute as my initially defined
> > numbering system gets bigger. Or I begin having unusual naming
> > conventions with extra parts tagged on (line41a or line41_1,
> > and so on) because (1) I must manually give everything names
> > so they are explicitly referenced thereafter, and because (2)
> > the code assigned to each item will change from time to time it
> > (3) introduces the possibility of additional errors due to the
> > mechanics of setting everything up (something the compiler
> > should handle).
>
> My usual solution in that case is to put all the data into a file
> of some kind, then, as part of the build process, usually with make,
> convert that file into appropriate C, just before compiling it.

Yes. There are many ways to do it. I considered that option as well. It's just a lot of work for something the compiler should be able to do.

> Oh, also, one of my favorite C features (Java also has), you
> can have the extra comma on the last line. Convenient for program
> generated text, though most likely in the standard as it allows
> for easy preprocessor conditionals.
> -- glen

See, and I think that such an "allowance" is patently absurd and should not be a part of any language. :-)
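glen's build-step approach can be sketched in C under a few assumptions: a small generator reads a plain text file and writes one writable, automatically numbered array per line, followed by a pointer table. Because the numbering is generated, inserting a line needs no manual renumbering. The table is written with a comma after every entry, which C permits in initializer lists, so the last line needs no special case. The function names, the `name_N` naming scheme and the re-added `\r\n` are hypothetical choices modeled on the original example.

```c
#include <stdio.h>
#include <string.h>

/* Write `line` as a C string literal, escaping the characters that cannot
   appear raw between quotes, and appending "\r\n" as in the original table.
   Other control characters are assumed not to occur in the input. */
static void emit_literal(FILE *out, const char *line) {
    fputc('"', out);
    for (const char *p = line; *p != '\0'; p++) {
        switch (*p) {
        case '"':  fputs("\\\"", out); break;
        case '\\': fputs("\\\\", out); break;
        case '\t': fputs("\\t", out);  break;
        default:   fputc(*p, out);     break;
        }
    }
    fputs("\\r\\n\"", out);
}

/* Read text lines from `in` and write C definitions to `out`: one writable
   array per line, numbered automatically, then a pointer table. Assumes at
   least one input line, each fitting in the 1024-byte buffer. */
void generate_table(FILE *in, FILE *out, const char *name) {
    char buf[1024];
    int n = 0;
    while (fgets(buf, sizeof buf, in) != NULL) {
        buf[strcspn(buf, "\r\n")] = '\0';        /* drop the line ending */
        fprintf(out, "static char %s_%d[] = ", name, ++n);
        emit_literal(out, buf);
        fputs(";\n", out);
    }
    fprintf(out, "char *%s[] = {\n", name);
    for (int i = 1; i <= n; i++)
        fprintf(out, "    %s_%d,\n", name, i);   /* trailing comma is legal */
    fputs("};\n", out);
}
```

A make rule would run this over the text file and compile (or #include) the generated output.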

Rick C. Hodgin

Jan 20, 2014, 3:10:20 PM1/20/14

On Monday, January 20, 2014 2:21:14 PM UTC-5, Bart wrote:
> > Is there a way to create the lines in the readonlySourceCode definition so
> > it's not read-only. I'm using Visual C++, and am looking for something
> > like this:
>
> What's wrong with keeping the lines in a text file? Then you can read them
> into an allocated string which will necessarily be writeable. Or you can
> just edit the file (or run a script on it to make the changes needed).

Nothing logically. It's just that if I use the external file, now I'm maintaining an extra file, I have to write the extra code which reads it in, and that all presents many more opportunities for errors at runtime.

Eric Sosman

Jan 20, 2014, 3:31:34 PM1/20/14

On 1/20/2014 3:06 PM, Keith Thompson wrote:
>[...]

> Did K&R1 guarantee that string literals are writable and unique?
> (My copy is at home; I'll try to check later.)

K&R guarantees uniqueness ("all strings, even when written
identically, are distinct" -- pg. 181), but I can't find any
promise of modifiability. One might guess that the strings
were intended to be modifiable -- why guarantee uniqueness if
not? -- but if there's any explicit language to that effect I've
overlooked it.

But that was long ago: The uniqueness guarantee (and any
accompanying mutability) was rescinded by the original ANSI C
Standard way back in 1989. Quoth the Rationale:

"String literals are not required to be modifiable. This
specification allows implementations to share copies of
strings with identical text, to place string literals in
read-only memory, and to perform certain optimizations.
[...] Those members of the C89 Committee who insisted that
string literals should be modifiable were content to have
this practice designated a common extension [...]"

--
Eric Sosman
eso...@comcast-dot-net.invalid

Rick C. Hodgin

Jan 20, 2014, 3:36:14 PM1/20/14

On Monday, January 20, 2014 2:37:06 PM UTC-5, Keith Thompson wrote:
> >> All versions of the C language, from K&R to ISO C11, have permitted
> >> compilers to make string literals writable. What's changed over
> >> time is that most compilers don't take advantage of that permission.
> > Ah! That's a shame. :-)
> Not really, at least not for the vast majority of C programmers.

How did I guess you were going to say that? :-)

> C string literals are intended to be *constant*.

They are most often intended to be constant, but not always. There are many
cases where developers allocate something with an initial value, but then
alter it at runtime.

char defaultOption = "4";

In this case, the default option is 4 until the user changes it. It's a constant bit of text, but is not constant. :-)

> The fact that
> compilers are permitted to generate code that crashes on an attempt to
> modify the array specified by a string literal makes for better error
> checking.

I would like to be able to specify that with a const prefix, as in this
type of syntax:

char* list[] =
{
"foo1",
const "foo2",
"foo3"
}

In this case, I do not want the second element to be changed, but the
first and third... they can change.

> The fact that such checks are not required is for backwards
> compatibility for code written before the "const" keyword was added to the
> language; even if not for that, C tends to make such things undefined
> behavior rather than requiring run-time diagnostics.

Well ... there's logic there. It makes sense. I think it's time for a
switchover though. We're getting into multi-processor programming, multiple
threads. Where we are in the 2010s and later is not where we were in the 1980s.

I realize C operates this way and it's fine. I think the future standard
should be that everything is in read-write memory except those things
explicitly prefixed with const, or a new _c("text") macro which identifies
that data explicitly as a constant.

> A compatible language change (or compiler-specific extension) that
> wouldn't break existing code might be a new kind of string literal, with
> a prefix indicating that the array is writable and may not be shared
> with other string literals with the same value. Perhaps something like:
> char *a = W"foo";
> char *b = W"foo";
> a[0] = 'F';
> printf("%s%s\n", a, b); /* will print "Foofoo" */
> If you wanted to take that approach, your options would be:
> 1. Modify some open-source compiler to implement it as a language
> extension (lots of work);
> 2. Persuade the maintainers of some compiler to provide it (less work
> for you, but likely to fail); or
> 3. Persuade the ISO C committee to add such a feature to the next C
> standard (even more likely to fail, and requires waiting at least a
> decade before you can use it).

Yup.

> Barring that, you can either use the existing features of the language,
> or implement a preprocessing step that translates code using something
> like this feature into standard C.
>
> BTW, you might find that compound literals (added to the language by the
> 1999 standard) are helpful:
> This:
> #include <stdio.h>
> int main(void) {
> char *s = (char[]){"hello"};
> s[0] = 'H';
> puts(s);
> }
> prints "Hello". But the array whose first element s points to is still
> just 6 characters long, and unlike string literals, an object created by
> a compound literal inside a function body has automatic storage duration
> (it ceases to exist when you leave the enclosing block).

Ian Collins

Jan 20, 2014, 3:40:28 PM1/20/14

Rick C. Hodgin wrote:
> There are lots of solutions and workarounds. I'm looking for a
> compiler directive that will override the default behavior of
> allocating constant strings to read-only memory, and instead allocate
> them to read-write memory.

>
> char foo[] = "Rick"; // Goes to read-write memory
> char* list[] = { "Rick" } // Goes to read-only memory
>
> I want a way for list[0] to go to the same place as foo. I am using
> Visual C++ compiler, but I am writing in C. I use the C++ compiler
> because it has some relaxed syntax constraints.

For what you are doing, you would be better off using C++ strings.
Given you have the tool to hand, you may as well use it.

--
Ian Collins

Rick C. Hodgin

Jan 20, 2014, 3:42:33 PM1/20/14

On Monday, January 20, 2014 3:31:34 PM UTC-5, Eric Sosman wrote:
> K&R guarantees uniqueness ("all strings, even when written
> identically, are distinct" -- pg. 181)

To me, this is the only way that makes sense. If I want to use the same string I can reference it. Or, I could introduce a compiler switch which introduces an option to combine similar strings marked const.

> But that was long ago: The uniqueness guarantee (and any
> accompanying mutability) was rescinded by the original ANSI C
> Standard way back in 1989.

And I bet you could hear the thuds on the floor as many developers screamed "WHAT!" and then passed out.

> Quoth the Rationale:
> "String literals are not required to be modifiable. This
> specification allows implementations to share copies of
> strings with identical text, to place string literals in
> read-only memory, and to perform certain optimizations.

Insanity I say! :-)

> [...] Those members of the C89 Committee who insisted that
> string literals should be modifiable were content to have
> this practice designated a common extension [...]"

The word "common" being used very loosely there. LOL! :-)

Eric Sosman

Jan 20, 2014, 4:29:28 PM1/20/14

On 1/20/2014 3:36 PM, Rick C. Hodgin wrote:
> On Monday, January 20, 2014 2:37:06 PM UTC-5, Keith Thompson wrote:
>>>> All versions of the C language, from K&R to ISO C11, have permitted
>>>> compilers to make string literals writable. What's changed over
>>>> time is that most compilers don't take advantage of that permission.
>>> Ah! That's a shame. :-)
>> Not really, at least not for the vast majority of C programmers.
>
> How did I guess you were going to say that? :-)
>
>> C string literals are intended to be *constant*.
>
> They are most often intended to be constant, but not always. There are many
> cases where developers allocate something with an initial value, but then
> alter it at runtime.
>
> char defaultOption = "4";

Constraint violation, requiring a diagnostic. Presumably
you meant one of

char defaultOption = '4';
or
char *defaultOption = "4";

> In this case, the default option is 4 until the user changes it. It's a constant bit of text, but is not constant. :-)

No, not at all. `defaultOption' (either version) is not a
constant, but a variable. It has an initial value, that's all.
You can change the variable's value with one of

if (do_z)
defaultOption = 'Z';
or
if (pile_it_deeply)
defaultOption = "Gomer Pyle";

There's really no difference between any of these and

int defaultOption = 42;
...
if (behave_differently)
defaultOption = -17;

In none of these cases is there any need to change the value
of a constant, nor any reason to want to do so.

--
Eric Sosman
eso...@comcast-dot-net.invalid

Eric Sosman

Jan 20, 2014, 4:32:04 PM1/20/14

On 1/20/2014 3:42 PM, Rick C. Hodgin wrote:
> On Monday, January 20, 2014 3:31:34 PM UTC-5, Eric Sosman wrote:
>> K&R guarantees uniqueness ("all strings, even when written
>> identically, are distinct" -- pg. 181)
>
> To me, this is the only way that makes sense. If I want to use the same string I can reference it. Or, I could introduce a compiler switch which introduces an option to combine similar strings marked const.
>
>> But that was long ago: The uniqueness guarantee (and any
>> accompanying mutability) was rescinded by the original ANSI C
>> Standard way back in 1989.
>
> And I bet you could hear the thuds on the floor as many developers screamed "WHAT!" and then passed out.

Did you miss the part about "Those members ... were content?"

At that time I was a developer with not quite twenty years'
worth of C experience, and I neither screamed nor thudded. YMMV.

>> Quoth the Rationale:
>> "String literals are not required to be modifiable. This
>> specification allows implementations to share copies of
>> strings with identical text, to place string literals in
>> read-only memory, and to perform certain optimizations.
>
> Insanity I say! :-)
>
>> [...] Those members of the C89 Committee who insisted that
>> string literals should be modifiable were content to have
>> this practice designated a common extension [...]"
>
> The word "common" being used very loosely there. LOL! :-)

See Appendix J.

--
Eric Sosman
eso...@comcast-dot-net.invalid

Keith Thompson

Jan 20, 2014, 4:54:50 PM1/20/14

"Rick C. Hodgin" <rick.c...@gmail.com> writes:

> On Monday, January 20, 2014 2:37:06 PM UTC-5, Keith Thompson wrote:
>> >> All versions of the C language, from K&R to ISO C11, have permitted
>> >> compilers to make string literals writable. What's changed over
>> >> time is that most compilers don't take advantage of that permission.
>> > Ah! That's a shame. :-)
>> Not really, at least not for the vast majority of C programmers.
>
> How did I guess you were going to say that? :-)
>
>> C string literals are intended to be *constant*.
>
> They are most often intended to be constant, but not always. There are many
> cases where developers allocate something with an initial value, but then
> alter it at runtime.
>
> char defaultOption = "4";
>
> In this case, the default option is 4 until the user changes it. It's
> a constant bit of text, but is not constant. :-)

Did you mean "char *defaultOption" or "char defaultOption[]" rather than
"char defaultOptions", or did you mean '4' rather than "4"?

>> The fact that
>> compilers are permitted to generate code that crashes on an attempt to
>> modify the array specified by a string literal makes for better error
>> checking.
>
> I would like to be able to specify that with a const prefix, as in this
> type of syntax:
>
> char* list[] =
> {
> "foo1",
> const "foo2",
> "foo3"
> }
>
> In this case, I do not want the second element to be changed, but the
> first and third... they can change.

If I were to suggest a new language feature to support that, I'd want an
explicit marker for a string that I *do* want to be able to change.

In your proposed C-like language, what would this snippet print?

for (int i = 0; i < 2; i ++) {
char *s = "hello";
if (i == 0) {
s[0] = 'H';
}
puts(s);
}

In C as it's currently defined, the string literal "hello" corresponds
to an anonymous array object with static storage duration; attempting to
modify it has undefined behavior. As I understand it, you want to
remove the second part of that. The above code has one occurrence of a
string literal, but it's being used in the initializer for two distinct
objects. On the second iteration, does s point to a string with
contents "hello" or "Hello"?

Either interpretation is problematic.
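For comparison, a sketch of the same loop in standard C with `s` declared as an array rather than a pointer. The function name `hello_loop` and its `out` parameter (which collects the results instead of printing them) are hypothetical. Because the array is initialized from "hello" each time its declaration is reached, the edit made on the first iteration does not carry over to the second.

```c
#include <string.h>

/* `s` is an array initialized from "hello" on every iteration, so the first
   iteration prints "Hello" and the second "hello". Results go into out[]. */
void hello_loop(char out[2][6]) {
    for (int i = 0; i < 2; i++) {
        char s[] = "hello";
        if (i == 0) {
            s[0] = 'H';
        }
        strcpy(out[i], s);
    }
}
```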

>> The fact that such checks are not required is for backwards
>> compability for code written before the "const" keyword was added to the
>> language; even if not for that, C tends to make such things undefined
>> behavior rather than requiring run-time diagnostics.
>
> Well ... there's logic there. It makes sense. I think it's time for a
> switchover though. We're getting into multi-processor programming, multiple
> threads. Where we are in the 2010s and later is not where we were in the 1980s.

I fail to see how this argues for modifiable string literals.

As a language design issue, I *strongly* disagree with this.
Personally, I like the idea of making everything read-only unless you
explicitly say you want to be able to modify it. (Obviously C isn't
defined this way; equally obviously, this is merely my own opinion.)

BTW, your _c("text") macro would still have to be defined somehow;
a new kind of string literal would probably make more sense.

The bottom line is that standard C cannot, and IMHO should not, cater to
every obscure coding practice. A language can have:
1. mutable string literals;
2. immutable string literals; or
3. both, with distinct syntax.

C has chosen option 2, and it has served us well. I would not strongly
object to option 3, but I'm not convinced that it would be worth the
extra complexity. You're welcome to push for option 1, but don't expect
to succeed.

BartC

Jan 20, 2014, 5:25:30 PM1/20/14

"glen herrmannsfeldt" <g...@ugcs.caltech.edu> wrote in message
news:lbjsbg$r19$1...@speranza.aioe.org...

> Oh, also, one of my favorite C features (Java also has), you
> can have the extra comma on the last line. Convenient for program
> generated text, though most likely in the standard as it allows
> for easy preprocessor conditionals.

A nice feature I came across (I think from the poster known as 'BGB') was a
form of include where the text in the included file was inserted as a string
constant. So if the text in the file was:

one
two
three

then including that file would be equivalent to:

"one\ntwo\nthree\n"

in the source code. (Obviously in C it would need to be allowed inside an
expression.)

--
Bartc

glen herrmannsfeldt

Jan 20, 2014, 5:29:21 PM1/20/14

Keith Thompson <ks...@mib.org> wrote:

(snip, I wrote)

>> There are an infinite number of features that could be added
>> to languages and/or compilers, and that could make it easier
>> for the needs of some people. The commonly needed ones get in,
>> the rare ones don't.

>> K&R C had this feature, but not initialized auto arrays.
>> With the ability to initialize arrays with string constants,
>> the need to modify string constants was reduced.

> Did K&R1 guarantee that string literals are writable and unique?
> (My copy is at home; I'll try to check later.)

I have one somewhere, but I found K&R2, appendix C, summary
of changes: (from K&R1)

"Strings are no longer modifiable, and may be placed in
read-only memory."

Doesn't say about unique, might need an actual K&R1.

Reminds me of a rarely used feature in Fortran, though maybe
gone by now. You can read over H format descriptors.

READ(5,1)
1 FORMAT(20HSOMETHING GOES HERE.)
WRITE(6,1)

The first READ writes over the contents of the H descriptor,
the WRITE then writes the new value.

Seems like a similar reuse of otherwise not needed memory, but
pretty strange now.

Most often not so useful, as there is no way to get carriage
control in place.

-- glen

glen herrmannsfeldt

Jan 20, 2014, 5:53:19 PM1/20/14

Rick C. Hodgin <rick.c...@gmail.com> wrote:

> I have a need for something like this, except that I need to
> edit list[N]'s data, as in memcpy(list[0], "eno", 3):

> char* list[] = { "one", "two", "three", "four" };

> I have a work-around like this:
> char one[] = "one";
> char two[] = "two";
> char three[] = "three";
> char four[] = "four";
> char* list[] = { one, two, three, four };

> However, this is clunky because I want to be able to change the
> items because in the actual application it is source code that
> I'm coding within the compiler for an automatic processor.

I thought I should go back to the beginning of this discussion,
to see what you were actually doing.

Seems to me that when most people do this, or at least when I do,
I substitute the copy just before writing it out. You need a loop
that goes through and writes out the lines, inside that loop copy
each line to a line buffer, modify as appropriate, then write
it out.

That also allows for variable length substitution, though sometimes
constant length is better.
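A sketch of that copy-then-substitute loop applied to the "[9999]" placeholders from the original post, under assumptions not stated there: the reference number is written as four zero-padded digits (so line length is unchanged), lines are shorter than 256 characters, and the names `substitute` and `emit_source` are hypothetical. The source table is never modified, so it can stay in ordinary read-only string literals.

```c
#include <stdio.h>
#include <string.h>

/* Copy `line` into `buf`, replacing the four characters of every "[xxxx]"
   placeholder with `ref` as four zero-padded digits. `ref` is assumed to be
   in 0..9999; overly long lines are truncated to fit `buf`. */
static void substitute(char *buf, size_t size, const char *line, int ref) {
    size_t len = strlen(line);
    if (len >= size)
        len = size - 1;
    memcpy(buf, line, len);
    buf[len] = '\0';
    for (char *p = buf; (p = strchr(p, '[')) != NULL; p++) {
        if (strlen(p) >= 6 && p[5] == ']') {     /* '[' + 4 chars + ']' */
            char digits[5];
            snprintf(digits, sizeof digits, "%04d", ref % 10000);
            memcpy(p + 1, digits, 4);
        }
    }
}

/* Write every line of the (read-only) table with its placeholders filled. */
void emit_source(FILE *out, const char *const lines[], size_t n, int ref) {
    char buf[256];
    for (size_t i = 0; i < n; i++) {
        substitute(buf, sizeof buf, lines[i], ref);
        fputs(buf, out);
    }
}
```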

It is common for error messages where the appropriate context,
such as line number, is substituted, or an error code.

It is common for macro processors, such as Mortran 2, or TeX,
where the macro arguments are substituted in the expansion.
Both Mortran and TeX use # to indicate a substitution, such
as #1 for the first argument, #2 for the second.

One that I have done takes advantage of an interesting C feature.
Put %d where you want the number to go, and printf it:

for(i=0;i<sizeof(x)/sizeof(*x);i++) {
fprintf(outfile,x[i], n, n, n, n, n, n, n, n, n, n);
}

Now up to 10 %d's in the line will be replaced by the value of n.
(I like to put in extra to avoid the problem of not having enough.)

Very little work to write.
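The printf variant can be checked by formatting into a buffer with `snprintf`. This is a sketch under assumptions: the template lines use `%04d` where the number goes, any literal `%` in the text would have to be written as `%%`, and the table and function names are hypothetical. Passing more arguments than a line uses is harmless, since the C standard says arguments left over when the format is exhausted are evaluated but otherwise ignored. Some compilers may warn about a non-literal format string.

```c
#include <stdio.h>

/* Hypothetical template table: "%04d" marks where the reference goes. */
static const char *const templ[] = {
    "if (something[%04d])\r\n",
    "{\r\n",
    "x[%04d] = y[%04d];\r\n",
};

/* Format line i into buf, passing n several times so that up to four
   placeholders in one line all receive the same value. Returns the
   snprintf() result (the length of the formatted line). */
int format_line(char *buf, size_t size, size_t i, int n) {
    return snprintf(buf, size, templ[i], n, n, n, n);
}
```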

-- glen

Keith Thompson

Jan 20, 2014, 6:00:34 PM1/20/14

#include directives are already allowed inside expressions, though they
still don't have a way to convert the input file content to a string
literal.

For example:

$ cat hello.inc
"Hello, world"
$ cat hello.c
#include <stdio.h>
int main(void) {
puts(
#include "hello.inc"
);
}
$ gcc hello.c -o hello && ./hello
Hello, world
$

(Note that I've named the file with a ".inc" rather than ".h" suffix to
make it clear that it's not an ordinary header file.)

Seebs

Jan 20, 2014, 5:37:41 PM1/20/14

On 2014-01-20, Rick C. Hodgin <rick.c...@gmail.com> wrote:
> On Monday, January 20, 2014 3:31:34 PM UTC-5, Eric Sosman wrote:
>> But that was long ago: The uniqueness guarantee (and any
>> accompanying mutability) was rescinded by the original ANSI C
>> Standard way back in 1989.

> And I bet you could hear the thuds on the floor as many developers
> screamed "WHAT!" and then passed out.

I don't think so. Back in the late 80s, when I was just starting to
learn C, I was aware that if you had two string literals, and one was
the same characters as the tail end of the other, the compiler might
use the same storage for both. It's really easy to obtain a modifiable
string if I want one, so I don't expect literals to be modifiable, or
indeed, even to occur in code or storage anywhere if they don't really
have to.

-s
--
Copyright 2013, all wrongs reversed. Peter Seebach / usenet...@seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
Autism Speaks does not speak for me. http://autisticadvocacy.org/
I am not speaking for my employer, although they do rent some of my opinions.

Seebs

Jan 20, 2014, 6:14:48 PM1/20/14

On 2014-01-20, Rick C. Hodgin <rick.c...@gmail.com> wrote:
> On Monday, January 20, 2014 2:10:40 PM UTC-5, glen herrmannsfeldt wrote:
>> Oh, also, one of my favorite C features (Java also has), you
>> can have the extra comma on the last line. Convenient for program
>> generated text, though most likely in the standard as it allows
>> for easy preprocessor conditionals.

> See, and I think that such an "allowance" is patently absurd and should not be a part of any language. :-)

Why? It's really convenient as a way to eliminate a gratuitous special
case.

BartC

Jan 21, 2014, 4:39:05 AM1/21/14

"Keith Thompson" <ks...@mib.org> wrote in message
news:lneh42z...@nuthaus.mib.org...

> "BartC" <b...@freeuk.com> writes:

>> So if the text in the file was:
>>
>> one
>> two
>> three
>>
>> then including that file would be equivalent to:
>>
>> "one\ntwo\nthree\n"
>>
>> in the source code. (Obviously in C it would need to be allowed inside an
>> expression.)
>
> #include directives are already allowed inside expressions, though they
> still don't have a way to convert the input file content to a string
> literal.

Sure, I forgot you can have a term of an expression on its own line and so
make it possible to insert a # directive.

> For example:
>
> $ cat hello.inc
> "Hello, world"
> $ cat hello.c
> #include <stdio.h>
> int main(void) {
> puts(
> #include "hello.inc"
> );
> }
> $ gcc hello.c -o hello && ./hello
> Hello, world
> $

It seems a fine distinction, but not requiring the text in the included file
to be quoted (ie. written out as one or a series of valid C string constants
complete with escape codes) means text from any source can be used, which
can be created/edited by anyone with any text editor. That was the advantage
that struck me.

Such an include wouldn't work recursively (the included text is not itself
processed), which means a C source file can include a string constant
containing the entire C source file itself!

(I've just spent ten minutes implementing such a feature, to show it's
workable. However it's possible, even with C, to read a file into a string
using a single function call (in C, you have to write the function first). So
it's mainly useful where large, statically allocated string constants are
needed. As in the OP's requirement perhaps.)
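Such a read-a-file function can be sketched in C; it also covers the earlier suggestion of keeping the lines in a text file. The names `read_stream` and `split_lines` are hypothetical, and splitting in place at each newline (dropping the `\n`) is one possible design among several. The buffer is ordinary malloc()ed memory, so every line can be edited directly.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read the whole stream into a malloc()ed, '\0'-terminated buffer, which is
   ordinary writable memory. Returns NULL on failure; the caller frees it. */
char *read_stream(FILE *fp) {
    size_t cap = 4096, len = 0, got;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;
    while ((got = fread(buf + len, 1, cap - len - 1, fp)) > 0) {
        len += got;
        if (cap - len == 1) {                 /* full: grow the buffer */
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) { free(buf); return NULL; }
            buf = bigger;
            cap *= 2;
        }
    }
    if (ferror(fp)) { free(buf); return NULL; }
    buf[len] = '\0';
    return buf;
}

/* Split the buffer in place at each '\n', storing at most `max` line
   pointers in lines[]. Returns the number of lines stored. */
size_t split_lines(char *text, char *lines[], size_t max) {
    size_t n = 0;
    while (*text != '\0' && n < max) {
        lines[n++] = text;
        char *nl = strchr(text, '\n');
        if (nl == NULL)
            break;
        *nl = '\0';
        text = nl + 1;
    }
    return n;
}
```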

--
Bartc

Rick C. Hodgin

Jan 21, 2014, 6:51:41 AM1/21/14

On Monday, January 20, 2014 4:29:28 PM UTC-5, Eric Sosman wrote:
> >> C string literals are intended to be *constant*.
> > They are most often intended to be constant, but not always. There are
> > many cases where developers allocate something with an initial value, but
> > then alter it at runtime.
> > char defaultOption[] = "4";

> > In this case, the default option is 4 until the user changes it. It's a
> > constant bit of text, but is not constant. :-)
>
> No, not at all. `defaultOption' (either version) is not a
> constant, but a variable. It has an initial value, that's all.
> You can change the variable's value with one of

That was my point. In my char* list[] = { ... } case, I want initial values, but not constants. I type them in the same way, but they have different meanings.

> In none of these cases is there any need to change the value
> of a constant, nor any reason to want to do so.

I don't want to change the value of a constant. By definition one should not be able to do that. :-)

What I want is a way to change the value of my variable: the one defined and stored as a literal string in source code, which I want to be a variable, but which the compiler interprets as a constant. There is no apparent override to direct it where I would like it to go, even though those members in attendance were content to allow such an ability through common extensions (nearly all of which were later dropped, it seems).

Rick C. Hodgin

Jan 21, 2014, 6:57:13 AM1/21/14

On Monday, January 20, 2014 4:32:04 PM UTC-5, Eric Sosman wrote:
> On 1/20/2014 3:42 PM, Rick C. Hodgin wrote:
> >> But that was long ago: The uniqueness guarantee (and any
> >> accompanying mutability) was rescinded by the original ANSI C
> >> Standard way back in 1989.
>
> > And I bet you could hear the thuds on the floor as many developers
> > screamed "WHAT!" and then passed out.
> Did you miss the part about "Those members ... were content?"

Not at all. I presumed "those members" involved in the ANSI authorizing were a small number of power-seeking representatives of the entire C developer base of us "little people," while the much larger group (developers coding in C in general) contained many developers, and it was many of them who screamed "WHAT!" and then passed out.

> At that time I was a developer with not quite twenty years'
> worth of C experience, and I neither screamed nor thudded. YMMV.

MMDV.

> >> Quoth the Rationale:
> >> "String literals are not required to be modifiable. This
> >> specification allows implementations to share copies of
> >> strings with identical text, to place string literals in

> >> [...] Those members of the C89 Committee who insisted that
> >> string literals should be modifiable were content to have
> >> this practice designated a common extension [...]"
> > The word "common" being used very loosely there. LOL! :-)
> See Appendix J.

Appendix J of what?

Rick C. Hodgin

Jan 21, 2014, 7:47:24 AM1/21/14

> > char defaultOption = "4";

> Did you mean "char *defaultOption" or "char defaultOption[]" rather than
> "char defaultOptions", or did you mean '4' rather than "4"?

I meant char defaultOption[] = "4";

> > I would like to be able to specify that with a const prefix, as in this
> > type of syntax:
> >
> > char* list[] =
> > {
> > "foo1",
> > const "foo2",
> > "foo3"
> > }
> >
> > In this case, I do not want the second element to be changed, but the
> > first and third... they can change.
>
> If I were to suggest a new language feature to support that, I'd want an
> explicit marker for a string that I *do* want to be able to change.

I realize that. My view, of course, differs. :-)

> In your proposed C-like language, what would this snippet print?
> for (int i = 0; i < 2; i ++) {
> char *s = "hello";
> if (i == 0) {
> s[0] = 'H';
> }
> puts(s);
> }

Interesting.

FWIW, I don't believe in defining variables in this way in C. I believe an
initialization block should exist so it is being done explicitly, both for
documentation purposes and for clarity in reading the source code (it's very
easy to miss a nested declaration when a group of variables of a similar type
is created).

In my proposed language, it would print "Hello" both times because the
char* s definition would've been pulled out of the loop and defined as a
function variable. If the user wanted independent copies each iteration
he would have to create them and manage them himself, which I think
documents the condition of the code more explicitly, making it far
easier for someone to see what's going on rather than inferring from
language peculiarities which may not be consistent across compiler
versions (they probably are in this case, but I've seen other similar
conditions which vary between Visual C++ and GCC).

I would rewrite your function to look like this (because I believe it
should behave this way):

void foo(void)
{
int i;
char* s; // Note I use the D language syntax of keeping the pointer
// symbol near the type, rather than the variable.

// Initialization
s = "hello";

// Code
for (i = 0; i < 2; i ++)
{
// I introduce predicates, which logically operate more like this:

if (i == 0) s[0] = 'H';

// Displays "Hello" both times
puts(s);
}
}

To mimic your functionality, I would code this way:

const char gcHello[] = "hello";

void foo(void)
{
int i;
char s[7];

// Initialization (a compiler switch would make this automatic)
memset(s, 0, sizeof(s)); // Here it's done manually

// Code
for (i = 0; i < 2; i ++)
{
// Iterative re/initialization
memcpy(s, gcHello, -2); // My memcpy would support a p3 of -1 to
// automatically perform strlen() on p2,
// and -2 would be strlen()+1.

// First pass conversion only

if (i == 0) s[0] = 'H';

// Displays "Hello" then "hello"

puts(s);
}
}
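In standard C, the well-defined way to get "Hello" and then "hello" is to give each pass its own writable copy. This is a sketch of the loop above in today's C. `sizeof gcHello` already counts the terminating '\0', so it stands in for the proposed -2 ("strlen()+1") argument without any memcpy() extension. Declaring `char s[] = "hello";` inside the loop body would have the same effect. The `out` parameter is an addition here so each pass's result can be checked.

```c
#include <stdio.h>
#include <string.h>

const char gcHello[] = "hello";

void foo(char out[2][sizeof gcHello])
{
    char s[sizeof gcHello];

    for (int i = 0; i < 2; i++)
    {
        /* Re-initialize from the constant on every pass. */
        memcpy(s, gcHello, sizeof gcHello);

        if (i == 0) s[0] = 'H';      /* first pass only */

        puts(s);                     /* "Hello", then "hello" */
        memcpy(out[i], s, sizeof s); /* keep a copy for the caller */
    }
}
```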

> In C as it's currently defined, the string literal "hello" corresponds
> to an anonymous array object with static storage duration; attempting to
> modify it has undefined behavior. As I understand it, you want to
> remove the second part of that. The above code has one occurrence of a
> string literal, but it's being used in the initializer for two distinct
> objects. On the second iteration, does s point to a string with
> contents "hello" or "Hello"?
>
> Either interpretation is problematic.

Exactly. So, you don't code that way. :-)

You make everything a function-level variable and it's done. You make
all code items read/write unless they are explicitly prefixed with a
const or have some macro wrapper like _rw("foo") or _fo("foo") to
explicitly name them.

> >> The fact that such checks are not required is for backwards
> >> compability for code written before the "const" keyword was added to the
> >> language; even if not for that, C tends to make such things undefined
> >> behavior rather than requiring run-time diagnostics.
>
> > Well ... there's logic there. It makes sense. I think it's time for a
> > switchover though. We're getting into multi-processor programming, multiple
> > threads. Where we are in 2010s and later is not where we were in 1980s.
>
> I fail to see how this argues for modifiable string literals.

Not just string literals, but a separation of the "before" and the "after."
Programming today is, by default, targeted at multiple CPUs. There are
functions which run top-down, but on the whole we are creating
multi-threaded congruent code execution engines running on commensurate
hardware. The time for a new language syntax is at hand.

I propose new extensions to C in general:

in (thread_name) {
// Do some code in this thread
} and in (other_thread_name) {
// Do some code in this thread
}

And a new tjoin keyword to join threads before continuing:
tjoin this, thread_name, other_thread_name

flow name {
flowto name;

always before {
}

always after {
}

function name {
}

subflow name {
}

error name {
}

} // end flow

And I propose the new concept of a cask, a "pill" that is inserted anywhere
in code to do whatever I like, along with explicit cask definitions that do
certain things based upon called functions which can convey branch logic
upon returning.

Casks look like this (|sample|) and they would operate like this:

Traditional code:
if (some_test(1, 2, 3)) {
// Do something
}

In this case:
push 3
push 2
push 1
call some_test
compare result to 0
if not equal, enter the block

The ability to insert a cask does not alter program logic:
if ( (|cask1|) some_test( (|cask2|) 1, 2, (|cask3|) 3) ) {
// Do something
}

In this case:
push 3
call cask3
push 2
push 1
call cask2
call some_test
call cask1
compare result from some_test to 0
if not equal, enter the block

The casks are called functions which can be inserted at any point without
otherwise affecting program logic (hence their new shape). I wrote it with
spaces above to make it more clear, but it could be coded like this:
if ((|cask1|)some_test((|cask2|)1, 2, (|cask3|)3))
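Something close to a cask can be approximated in today's C with the comma operator. It evaluates its left operand for side effects and yields its right operand unchanged. This is a hypothetical sketch: `CASK`, `trace`, and `run_with_casks` are illustrative names. It differs from the listing above in two ways. The hook runs before the wrapped expression rather than after it. Also, C leaves the order of argument evaluation unspecified, so the two hooks inside the argument list have no guaranteed order.

```c
/* CASK(hook, expr): call hook() for its side effects, then yield the
   value of expr unchanged. */
#define CASK(hook, expr) ((hook)(), (expr))

static int cask_calls = 0;          /* counts hook invocations */

static void trace(void) { cask_calls++; }

static int some_test(int a, int b, int c) { return a + b == c; }

/* Same logic as: if (some_test(1, 2, 3)) ..., with hooks inserted. */
int run_with_casks(void)
{
    if (CASK(trace, some_test(CASK(trace, 1), 2, CASK(trace, 3))))
        return 1;
    return 0;
}
```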

And I have other ideas. You can read about them on this page. It relates
specifically to extensions to Visual FoxPro, but I intend to carry them into
my RDC (Rapid Development Compiler), which is C-like but relaxes many of C's
stringent errors, reporting them only as warnings (pointer-to-pointer
conversions, for example, are treated as perfectly valid), along with many
other changes.

http://www.visual-freepro.org/wiki/index.php/VXB++

Each of these add-ons should be a first-class citizen known to the language itself. Together they should support modern CPU architectures with new data types and volatile extensions that operate around explicitly semaphored access at the language level. They should also support optimization constraints and allowances as the developer dictates, even of the kind that can override "safety" on variable use. That means a particular case could violate atomicity and the compiler would know it, but the compiler is a tool and should allow what the developer dictates, because the developer is a person and has real authority.

My opinion. My goals. :-)

> > I realize C operates this way and it's fine. I think the future standard
> > should be that everything is in read-write memory except those things
> > explicitly prefixed with const, or a new _c("text") macro which identifies
> > that data explicitly as a constant.
>
> As a language design issue, I *strongly* disagree with this.
> Personally, I like the idea of making everything read-only unless you
> explicitly say you want to be able to modify it. (Obviously C isn't
> defined this way; equally obviously, this is merely my own opinion.)

I look at computers as being exactly that: computers. They compute. Their
purpose is to take input, process it, and store output. That storage portion
automatically means write abilities, and the input combination means
read-write as it is more common to operate through a chain of processing
where a recently computed and written value is then quickly used thereafter
as input to a follow-on computation.

Everything should be read-write unless explicitly stated as read-only. My
opinion, and my position in any languages I author. :-)

> BTW, your _c("text") macro would still have to be defined somehow;
> a new kind of string literal would probably make more sense.

I think string literals should all use double quotes, but with a different
double-quote character from ASCII-34, one for the left and one for the right,
so they can be nested and mate up like parentheses. I also believe variable
names should be allowed to contain spaces, but with a different space
character from ASCII-32. In my implementation of RDC, Visual FreePro, my
virtual machine, I will introduce ASCII characters which explicitly serve these
purposes at the developer level, allowing double-quoted characters to be used
as raw input without first escaping them. I will also allow other explicit
ASCII characters between the new double-quote characters in their raw
single-symbol form without escaping.

We're past the days of limited abilities. We have GUIs now that can draw any
icon, any character ... it's time to grow up. :-)

There are a lot of things which were done badly in the early days of
programming, including many rigid constraints that make source code difficult
to read and understand. The concept of a header, for example, is no longer
required when 16GB of memory costs $700 or less and will only get cheaper
over time.

It's time to rethink what's been thought, and undo the damage that's been
done, to look to the current and future needs of multi-core, parallel-thread,
object-oriented design, yet with all of it having an explicit base in the raw
compute needs of the machine. C is ideal for that. C++ takes the idea of
object orientation too far.

In C++, a member function is called through a pointer as foo->function().
The better syntax is foo.function() (for both pointers to variables and
declared variables), and such is a mild extension of the existing struct:

struct SFoo
{
void function(void);

int member_variable;
};

SFoo foo1;
SFoo* foo2 = malloc(SFoo); // Compiler is smart enough to know
foo1.function();
foo2.function(); // Same syntax

No default constructor. No default destructor. Each component must be
explicitly coded and executed in code if needed ... thereby documenting it
without obfuscation, and allowing a straight-forward merging of structures
through multiple inheritance. My opinion.
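In today's C, the nearest approximation of a struct with a member function is a function-pointer member, with the object passed explicitly. This is a hypothetical sketch (`foo_function`, `sfoo_init`, and `sfoo_new` are illustrative names). Nothing runs automatically, which fits the "no default constructor" idea above. It remains only an approximation: C still requires `.` for an object and `->` for a pointer.

```c
#include <stdlib.h>

typedef struct SFoo SFoo;
struct SFoo
{
    void (*function)(SFoo *self);   /* "method" slot */
    int member_variable;
};

static void foo_function(SFoo *self)
{
    self->member_variable++;
}

/* Explicit initialization: the caller wires up the "method" itself. */
void sfoo_init(SFoo *f)
{
    f->function = foo_function;
    f->member_variable = 0;
}

/* Heap-allocated variant; returns NULL if malloc() fails. */
SFoo *sfoo_new(void)
{
    SFoo *f = malloc(sizeof *f);
    if (f != NULL)
        sfoo_init(f);
    return f;
}
```

Usage: `foo1.function(&foo1);` for a declared variable, `foo2->function(foo2);` for a pointer.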

> The bottom line is that standard C cannot, and IMHO should not, cater to
> every obscure coding practice. A language can have:
> 1. mutable string literals;
> 2. immutable string literals; or
> 3. both, with distinct syntax.
>
> C has chosen option 2, and it has served us well. I would not strongly
> object to option 3, but I'm not convinced that it would be worth the
> extra complexity. You're welcome to push for option 1, but don't expect
> to succeed.

I don't expect to get anywhere trying to change anything in C. :-) It's why
I'm moving to my own language. I hit the GCC group a while back asking for
the self() extension, which allows recursion to the current function without
it being explicitly named or even explicitly populated with parameters. They
said it was not a good idea. I asked for something else (can't remember what)
and they said the same. So ... it was fuel for me to get started on my own.

:-)

Rick C. Hodgin

Jan 21, 2014, 7:50:45 AM1/21/14

On Monday, January 20, 2014 5:53:19 PM UTC-5, glen herrmannsfeldt wrote:
> Seems to me that when most people do this, or at least when I do,
> I substitute the copy just before writing it out. You need a loop
> that goes through and writes out the lines, inside that loop copy
> each line to a line buffer, modify as appropriate, then write
> it out.
>

> [snip]

>
> Very little work to write.

Still requires that I write some code, maintain some code, and have a function that leaves allocated memory blocks hanging around (inviting leaks unless I code to handle them properly every time). It's more than "very little work" when the alternative would let me use, directly, the very definition you'd be copying in your example.

The compiler option removes all issues.
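glen's suggestion can be sketched without any malloc(). The lines stay read-only, and each one is copied into a stack buffer, patched, and written out, so there is nothing to leak. The following assumptions are not from the thread:

- the placeholder is '[' plus four characters plus ']';
- the reference is written zero-padded to four digits, so the line length doesn't change;
- 0 <= ref <= 9999;
- lines fit in 255 bytes.

`substitute_ref` and `write_source` are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

/* Copy `line` into `out` and overwrite the four characters of the first
   "[xxxx]" placeholder with `ref` as four zero-padded digits.
   snprintf truncates lines longer than outsz - 1. */
void substitute_ref(const char *line, int ref, char *out, size_t outsz)
{
    snprintf(out, outsz, "%s", line);

    for (char *open = strchr(out, '['); open != NULL; open = strchr(open + 1, '['))
    {
        if (strlen(open) >= 6 && open[5] == ']')
        {
            char digits[5];
            snprintf(digits, sizeof digits, "%04d", ref);
            memcpy(open + 1, digits, 4);   /* brackets stay, no new '\0' */
            return;
        }
    }
}

/* Patch each line on a stack buffer and write it; the source lines are
   never modified and nothing is allocated. */
void write_source(FILE *fp, const char *const lines[], size_t n, int ref)
{
    char buf[256];

    for (size_t i = 0; i < n; i++)
    {
        substitute_ref(lines[i], ref, buf, sizeof buf);
        fputs(buf, fp);
    }
}
```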

Rick C. Hodgin

Jan 21, 2014, 7:54:04 AM1/21/14

On Monday, January 20, 2014 5:37:41 PM UTC-5, Seebs wrote:

> On 2014-01-20, Rick C. Hodgin <rick...n@gmail.com> wrote:
> > On Monday, January 20, 2014 3:31:34 PM UTC-5, Eric Sosman wrote:
> >> But that was long ago: The uniqueness guarantee (and any
> >> accompanying mutability) was rescinded by the original ANSI C
> >> Standard way back in 1989.
> > And I bet you could hear the thuds on the floor as many developers
> > screamed "WHAT!" and then passed out.
>
> I don't think so. Back in the late 80s, when I was just starting to
> learn C, I was aware that if you had two string literals, and one was
> the same characters as the tail end of the other, the compiler might
> use the same storage for both. It's really easy to obtain a modifiable
> string if I want one, so I don't expect literals to be modifiable, or
> indeed, even to occur in code or storage anywhere if they don't really
> have to.

When most people encounter something they are in "learning how it works"
mode. They read and study and come to understand the system. However,
some people look at things differently desiring to understand the philosophy
of why something works. This gives them a different perspective than the
user, as they are more akin to a type of author, desiring to peer into the
inner workings and perform mental optimizations upon the design.

In my experience there are about 20% authors and 80% users in the developer
community. That number falls somewhat to 5% to 10% authors and 90% to 95%
users in certain types of developer houses (more long-term maintenance of
established applications, rather than new shops which are writing new apps).

My personal experience. YMMV.

Rick C. Hodgin

Jan 21, 2014, 7:58:31 AM1/21/14

On Monday, January 20, 2014 6:14:48 PM UTC-5, Seebs wrote:

> On 2014-01-20, Rick C. Hodgin <rick...n@gmail.com> wrote:
> > On Monday, January 20, 2014 2:10:40 PM UTC-5, glen herrmannsfeldt wrote:
> >> Oh, also, one of my favorite C features (Java also has), you
> >> can have the extra comma on the last line. Convenient for program
> >> generated text, though most likely in the standard as it allows
> >> for easy preprocessor conditionals.
> > See, and I think that such an "allowance" is patently absurd and should
> > not be a part of any language. :-)
>
> Why? It's really convenient as a way to eliminate a gratuitous special
> case.

Why do we have "void function(void)" when "function()" would work sufficiently at that level in a source file? It's explicitly so we convey that no mistake was made, such as accidentally leaving out some parameters. We declare void to indicate "I purposefully intended to leave this empty; there is no return value, there are no parameters," and so on.

It's the same here. "Oh, another comma ... was the developer finished? Was there supposed to be more? What is missing? What was left out? Please ... I need answers. I'm left hanging in a state of confusion that is disrupting my soul. Whatever do I do?"

Nobody needs that kind of stress in their life. :-)

Eric Sosman

Jan 21, 2014, 8:58:01 AM1/21/14

On 1/21/2014 6:57 AM, Rick C. Hodgin wrote:
> On Monday, January 20, 2014 4:32:04 PM UTC-5, Eric Sosman wrote:

>>[...]

>>>> Quoth the Rationale:
>>>> "String literals are not required to be modifiable. This
>>>> specification allows implementations to share copies of
>>>> strings with identical text, to place string literals in
>>>> [...] Those members of the C89 Committee who insisted that
>>>> string literals should be modifiable were content to have
>>>> this practice designated a common extension [...]"
>>> The word "common" being used very loosely there. LOL! :-)
>> See Appendix J.
>
> Appendix J of what?

Sorry; since you've been suggesting changes to the C language,
I supposed without justification that you were familiar with its
defining document. The appendix mentioned is in "ISO/IEC 9899:2011
Programming Language C," and is entitled "Portability Issues." The
relevant part is "J.5 Common Extensions," and the clause describing
the particular matter that upsets you is "J.5.5 Writable string
literals."

--
Eric Sosman
eso...@comcast-dot-net.invalid

James Kuyper

Jan 21, 2014, 9:48:17 AM1/21/14

On 01/21/2014 06:57 AM, Rick C. Hodgin wrote:
> On Monday, January 20, 2014 4:32:04 PM UTC-5, Eric Sosman wrote:
>> On 1/20/2014 3:42 PM, Rick C. Hodgin wrote:
>>>> But that was long ago: The uniqueness guarantee (and any
>>>> accompanying mutability) was rescinded by the original ANSI C
>>>> Standard way back in 1989.
>>
>>> And I bet you could hear the thuds on the floor as many developers
>>> screamed "WHAT!" and then passed out.

I doubt it - having lived through that period, I don't remember anyone
objecting to that decision. The biggest objections I've seen have been
in the other direction: string literals should have the type "const
char[n]", as they do in C++. As a result, they would automatically
convert to "const char*" in most contexts. If that were the case, most
code that accidentally tries to write to them would be a constraint
violation, requiring a diagnostic, which would make it a bit easier to
write correct code.

>> Did you miss the part about "Those members ... were content?"
>
> Not at all. I presumed "those members" involved in the ANSI authorizing were a small number, power-seeking representatives of the entire C language developer base of us "little people," while the much larger group (developers who were coding in C in general) contained many developers, and it was many of them who screamed "WHAT!" and then passed out.

"Those members" refers to the people on the committee "who insisted that
string literals should be modifiable". You paint an amazingly nasty
picture of those who agreed with your point of view on this issue.

The ANSI committee includes both people who implement C and people who
use C. I'm not sure exactly how many of each were present, but I
believe that the implementors were actually less numerous; they
certainly were not in absolute control of the proceedings.
--
James Kuyper

James Kuyper

Jan 21, 2014, 10:01:11 AM1/21/14

On 01/20/2014 09:10 AM, Aleksandar Kuktin wrote:
> On Mon, 20 Jan 2014 04:53:33 -0800, Rick C. Hodgin wrote:
>
>> My current solution to do something along these lines during
>> initialization:
>> char* sourceCode[] =
>> {
>> "if (foo[9999] == 0) {\r\n",
>> " // Do something\r\n",
>> "} else if (foo[9999] == 1) {\r\n",
>> " // Do something else\r\n",
>> "} else {\r\n",
>> " // Do some other things\r\n", "}",
>> null
>> };

>>
>> int i, len;
>> char* ptr;
>> for (i = 0; list[i] != null; i++)
>> {
>> len = strlen(list[i]) + 1;
>> ptr = (char*)malloc(len); memcpy(ptr, list[0], len);
>> list[0] = ptr;
>> }
>
> And what, exactly, is wrong with the basic principle of this approach?
>

> I personally would have done something like this:
> char *read_only[] = { "Rick", "Jane", "Marc", 0 };
> char **read_write;
>
> char **init_readwrite(char **readonly) {
> unsigned int i, count;
> char **readwrite;
>
> for (count=0, i=0; readonly[i]; i++) {
> count++;
> }
> readwrite = malloc(count * sizeof(*readwrite));
> /* no check */
> return memcpy(read_write, read_only, count * sizeof(*readwrite));
> }
>
> read_write = init_readwrite(read_only);
>
> ...And then you operate on read_write and ignore read_only.

read_only is an array of pointers; the things that the pointers point at
are not modifiable. It should therefore, for safety, have been declared
as "const char*[]".

You allocate enough space to copy over all of the pointers to
read_write. Then you do copy them over. The new pointers in read_write
still point at the same locations as the ones in read_only; those
locations still cannot be safely written to, so nothing has been gained
by the copy. It is therefore incorrectly named.

That's why you need to create a deep copy, as in Rick's code. It copies
the strings themselves to memory that is guaranteed writable.
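This is a sketch of the deep copy described here, written as a function over a NULL-terminated pointer array. The hypothetical `make_writable` indexes with `list[i]` throughout and uses the standard `NULL`. The copies are ordinary heap memory, so they are writable; free() them if the program needs to release them.

```c
#include <stdlib.h>
#include <string.h>

/* Replace each pointer in a NULL-terminated array with a pointer to a
   malloc()'d copy of its string. Returns 0 on success, -1 if an
   allocation fails (entries copied before the failure stay copied). */
int make_writable(char *list[])
{
    for (size_t i = 0; list[i] != NULL; i++)
    {
        size_t len = strlen(list[i]) + 1;   /* include the '\0' */
        char *copy = malloc(len);
        if (copy == NULL)
            return -1;
        memcpy(copy, list[i], len);
        list[i] = copy;                     /* entry i now owns its copy */
    }
    return 0;
}
```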
--
James Kuyper

James Kuyper

Jan 21, 2014, 11:09:03 AM1/21/14

On 01/20/2014 10:40 AM, Rick C. Hodgin wrote:

> On Monday, January 20, 2014 9:10:24 AM UTC-5, Aleksandar Kuktin wrote:
>> I personally would have done something like this:
>> char *read_only[] = { "Rick", "Jane", "Marc", 0 };
>> char **read_write;
>>
>> char **init_readwrite(char **readonly) {
>> unsigned int i, count;
>> char **readwrite;
>>
>> for (count=0, i=0; readonly[i]; i++) {
>> count++;
>> }
>> readwrite = malloc(count * sizeof(*readwrite));
>> /* no check */
>> return memcpy(read_write, read_only, count * sizeof(*readwrite));
>> }
>> read_write = init_readwrite(read_only);
>>
>> ...And then you operate on read_write and ignore read_only.
>

> I have not gone through this deeply or tried it in code, but I'm
> thinking the theory of this solution would not work in all cases (and
> that this particular implementation also will not work).
>
> Since each read_only[] pointer is to a constant string, and the
> compiler creates the entry in read-only memory, it could optimize and
> allow lines like "red" and "fred" to be mapped to the same four byte
> area, one pointing to "f" and one pointing to "r" after "f". ...

That is quite true, but precisely because it is a statement about
pointers, it's not actually relevant.

> ... So making a
> bulk copy would not copy all things properly in all cases.

This code just copies the pointers themselves, not the objects that they
point to, so the fact that the objects could be overlapping is not a
problem. The fact that it only copies the pointers IS a problem.

It would be possible to do this with a bulk copy only if there were some
way to ensure that all of the strings were stored in adjacent pieces of
memory:

char *read_only =
"if (something[9999])\r\n\0"
"{\r\n\0"
" // Do something\r\n\0"
"} else {\r\n\0"
" // Do something else\r\n\0"
"}";

Read that line carefully. Instead of having six separate string
literals, it has only a single string literal, containing six separate
non-overlapping strings, the first five terminated by explicit null
characters. This is done by the "magic" of string literal concatenation:
two consecutive string literals separated only by white space are
automatically merged into a single string.

You could do a bulk copy of those string literals, but the tricky part
would be figuring out how much space is needed. sizeof(read_only) gives
the size of the pointer. strlen(read_only) gives the length of the first
string, so neither is suitable. You'd have to iterate over all six
strings to find their total length, and you'd have to count them
manually; there's no way to use C constructs to determine that length
for you.
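Two details behind this are easy to check in a small sketch (`packed` and `count_packed` are hypothetical names). First, adjacent literals are concatenated only after escape sequences are decoded. Second, `sizeof` applied to an array, rather than a pointer, covers every embedded '\0' plus the final terminator.

```c
#include <stddef.h>

/* "x\0" "12\0" "abc" holds 'x','\0','1','2','\0','a','b','c','\0'.
   Written as a single literal, "x\012..." would instead begin with the
   octal escape \012 (a newline), silently changing the data. */
static const char packed[] = "x\0" "12\0" "abc";

/* Count the strings in a packed buffer: each one ends in '\0'. */
size_t count_packed(const char *p, size_t size)
{
    size_t n = 0;
    for (size_t i = 0; i < size; i++)
        if (p[i] == '\0')
            n++;
    return n;
}
```

Usage: `count_packed(packed, sizeof packed)` returns 3, and `sizeof packed` is 9.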

Much simpler would be the following:

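#include <stdlib.h>

char read_write[] =
"if (something[9999])\r\n\0"
"{\r\n\0"
" // Do something\r\n\0"
"} else {\r\n\0"
" // Do something else\r\n\0"
"}";

// Builds a malloc()'d array of pointers to the strings in read_write,
// storing their number in *count. Returns NULL if malloc() fails.
char **split_read_write(int *count)
{
    // Count the strings: each one, including the last, ends in '\0'
    // (the compiler appends a terminator after the final "}").
    int strings = 0;
    for (char *ptr = read_write; ptr < read_write + sizeof read_write; ptr++)
        if (*ptr == '\0')
            strings++;
    *count = strings;

    // Set up an array of pointers to the strings.
    char **rw_ptrs = malloc(strings * sizeof *rw_ptrs);
    if (rw_ptrs)
    {
        char *ptr = read_write;
        for (int str = 0; str < strings; str++)
        {
            rw_ptrs[str] = ptr;
            while (*ptr++); // move past end of string
        }
    }
    return rw_ptrs;
}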
--
James Kuyper

James Kuyper

Jan 21, 2014, 11:17:33 AM1/21/14

On 01/20/2014 12:39 PM, Rick C. Hodgin wrote:
> On Monday, January 20, 2014 11:32:10 AM UTC-5, Keith Thompson wrote:
>> "Rick C. Hodgin" <rick...n@gmail.com> writes:
...
>>> char* sourceCode[] =
>>> {
>>> "if (foo[9999]) {\r\n",
>>> " // Do something\r\n",
>>> "} else {\r\n",
>>> " // Do something else\r\n",
>>> "}\r\n"
>>> };
>>
>> This isn't relevant to your question, but why do you have explicit
>> "\r\n" line endings? If your program reads and/or writes the
>> source code in text mode (or if you're on a UNIX-like system),
>> line endings are marked by a single '\n' character, regardless of
>> the format used by the operating system.
>
> To be consistent with the source file input I'm processing. Without using both \r and \n it gives warnings when opening the files that the line endings are not consistent.

Do you open the input file and the output file in binary mode or text
mode? If you opened both in text mode, that shouldn't be happening,
unless you're reading an input file that has line endings that are
inconsistent with the conventions for your platform (for instance,
reading a file following DOS/Windows conventions on a Unix-like system).
--
James Kuyper

Rick C. Hodgin

Jan 21, 2014, 11:28:03 AM1/21/14

Quoted as Mr. Miyagi speaking, "Binary. Always open binary." :-)

It is code for a Windows system, and it uses two-character line endings.

Rick C. Hodgin

Jan 21, 2014, 11:30:34 AM1/21/14

> >> Did you miss the part about "Those members ... were content?"
> > Not at all. I presumed "those members" involved in the ANSI
> > authorizing were a small number, power-seeking representatives of
> > the entire C language developer base of us "little people," while
> > the much larger group (developers who were coding in C in general)
> > contained many developers, and it was many of them who screamed
> > "WHAT!" and then passed out.
>
> "Those members" refers to the people on the committee "who insisted that
> string literals should be modifiable".

Agreed. They represent a small number of the C developer base, however, who probably later read that it was added only as a "common extension."

> You paint an amazingly nasty picture of those who agreed with your
> point of view on this issue.

Well I apologize. I was only going for humor, not a literal conveyance of
what might have actually happened. :-)

> The ANSI committee includes both people who implement C and people who
> use C. I'm not sure exactly how many of each were present at, but I
> believe that the implementors were actually less numerous, they
> certainly were not in absolute control of the proceedings.

Makes sense.

James Kuyper

Jan 21, 2014, 11:31:07 AM1/21/14

On 01/21/2014 07:58 AM, Rick C. Hodgin wrote:
...

> Why do we have "void function(void)" when "function()" would work
> sufficiently at that level in a source file? It's explicitly so we
> convey that no mistake was given as by accidentally leaving out some
> parameters. We declare void to indicate "I purposefully intended to
> leave this empty, there are no return variables, there are no
> parameters," and so on.

Backwards compatibility. In C, as originally designed, there were no
function prototypes. When the language was first standardized,
prototypes were added. If the committee had followed your suggestion,
virtually all existing code would have had to be rewritten in order to
compile correctly. That was, at the time of standardization, already
billions of lines of code world-wide. It's easy to say "don't worry
about having to re-write all the old code" - unless you're the one who
has to re-write it.

Instead, the committee decided to allow the old syntax to continue
meaning what it used to mean: the function takes an unknown but fixed
number of arguments (as opposed to variadic functions, which take an
unknown and variable number of arguments). As a result, a different
syntax was needed to specify that the function doesn't take any arguments.

You seem to think the committee was dominated by implementors. If that
were true, this decision is hard to explain - it requires all
implementors to handle two different syntaxes for function declarations
- it would have been much simpler to mandate only one. This is a
concession to the needs of those who write programs in C, at the expense
of those who write C implementations.

> It's the same here. "Oh, another comma ... was the developer
> finished? Was there supposed to be more? What is missing? What was
> left out? Please ... I need answers. I'm left hanging in a state of
> confusion that is disrupting my soul. Whatever do I do?"

I don't use that feature, and I don't like it. However, this feature
simplifies the creation of machine-generated C code, and people who
write such generators are apparently sufficiently numerous that the
committee felt a need to accommodate their desires.
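The following small sketch shows the preprocessor-conditional case glen mentioned earlier in the thread. `ENABLE_EXTRAS` and `features` are hypothetical names. Because every entry ends in a comma, the conditional last entry can be included or excluded without editing the line before it. A generator benefits for the same reason: it can emit every line in the same shape.

```c
#include <stddef.h>

static const char *const features[] = {
    "base",
    "logging",
#ifdef ENABLE_EXTRAS            /* hypothetical configuration macro */
    "extras",
#endif
};

size_t feature_count(void)
{
    return sizeof features / sizeof features[0];
}
```

Without the trailing comma, "logging" would need a comma only when "extras" follows it, which is exactly the special case the rule removes.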
--
James Kuyper

Rick C. Hodgin

Jan 21, 2014, 11:36:43 AM1/21/14

On Tuesday, January 21, 2014 11:31:07 AM UTC-5, James Kuyper wrote:
> On 01/21/2014 07:58 AM, Rick C. Hodgin wrote:
> > Why do we have "void function(void)" when "function()" would work
> > sufficiently at that level in a source file? It's explicitly so we
> > convey that no mistake was given as by accidentally leaving out some
> > parameters. We declare void to indicate "I purposefully intended to
> > leave this empty, there are no return variables, there are no
> > parameters," and so on.
>
> Backwards compatibility. In C, as originally designed, there were no
> function prototypes. When the language was first standardized,
> prototypes were added. If the committee had followed your suggestion,
> virtually all existing code would have had to be rewritten in order to
> compile correctly. That was, at the time of standardization, already
> billions of lines of code world-wide. It's easy to say "don't worry
> about having to re-write all the old code" - unless you're the one who
> has to re-write it.
>
> Instead, the committee decided to allow the old syntax to continue
> meaning what it used to mean: the function takes an unknown but fixed
> number of arguments (as opposed to variadic functions, which take an
> unknown and variable number of arguments). As a result, a different
> syntax was needed to specify that the function doesn't take any arguments.

I'm not sure I would've been keen on that idea. I would rather have kept it as deprecated functionality, slated for removal a few releases later. Old code could've been compiled with a particular compiler version maintained for backward compatibility, without holding the language back as it moves forward. My opinion. :-)

> You seem to think the committee was dominated by implementors. If that
> were true, this decision is hard to explain - it requires all
> implementors to handle two different syntaxes for function declarations
> - it would have been much simpler to mandate only one. This is a
> concession to the needs of those who write programs in C, at the expense
> of those who write C implementations.

I don't think that. I honestly was going for humor. Nothing more. However, I would argue that the people involved in the decision were the ones who wanted to have their voice heard because they felt a particular way about it. I say that because it's more or less that way in everything.

> > It's the same here. "Oh, another comma ... was the developer
> > finished? Was there supposed to be more? What is missing? What was
> > left out? Please ... I need answers. I'm left hanging in a state of
> > confusion that is disrupting my soul. Whatever do I do?"
>
> I don't use that feature, and I don't like it. However, this feature
> simplifies the creation of machine-generated C code, and people who
> write such generators are apparently sufficiently numerous that the
> committee felt a need to accommodate their desires.

Makes sense. I still would've opted for the deprecated allowance and phased it out over time.
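To make the code-generator argument concrete, here is a minimal sketch of a C generator that emits identical text for every element of an initializer list. It relies on the trailing comma C accepts after the last element instead of special-casing that element. The names `append_text` and `emit_table` and the output format are hypothetical, and the sketch assumes the item strings contain no quotes or backslashes, so it does no escaping.

```c
#include <stddef.h>
#include <string.h>

/* Appends text to out if it still fits (out stays NUL-terminated);
   *used tracks the total length the complete output needs. */
static void append_text(char *out, size_t outsz, size_t *used, const char *text)
{
    size_t len = strlen(text);
    if (*used + len < outsz)
        memcpy(out + *used, text, len + 1);   /* copies the NUL too */
    *used += len;
}

/* Hypothetical generator: writes a C declaration for a string table.
   Every element is emitted the same way, comma included, so there is no
   "is this the last element?" test. Returns the length the full output
   needs; the output is complete only if that value is < outsz. */
size_t emit_table(char *out, size_t outsz, const char *const items[], size_t count)
{
    size_t used = 0;
    if (outsz > 0)
        out[0] = '\0';
    append_text(out, outsz, &used, "const char *names[] = {\n");
    for (size_t i = 0; i < count; i++) {
        append_text(out, outsz, &used, "    \"");
        append_text(out, outsz, &used, items[i]);
        append_text(out, outsz, &used, "\",\n");   /* same for every item */
    }
    append_text(out, outsz, &used, "};\n");
    return used;
}
```

For the items "one" and "two", the generated declaration is `const char *names[] = { "one", "two", };` (spread over several lines). The final comma does not add an element.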

James Kuyper

Jan 21, 2014, 12:03:10 PM

On 01/21/2014 07:47 AM, Rick C. Hodgin wrote:
>>> char defaultOption = "4";
>> Did you mean "char *defaultOption" or "char defaultOption[]" rather than
>> "char defaultOptions", or did you mean '4' rather than "4"?
>
> I meant char defaultOption[] = "4";

Then you've already got your wish; defaultOption contains a writable
string. It's perfectly legal to overwrite it with:
strcpy(defaultOption, "5");

If you had chosen
char defaultOption = '4';
then you could have modified it with
defaultOption = '5';

If you had chosen
const char *defaultOption = "4";
then you could modify it with
defaultOption = "5";

...

> FWIW, I don't believe in defining variables in this way in C. I believe an
> initialization block should exist so it is being done explicitly, both for
> documentation purposes, and clarity in reading the source code (it's very
> easy to miss a nested declaration when a group of variables is created of
> a similar type.

I find that both documentation and clarity are best served by defining
each variable with the smallest scope that is consistent with the way it
will be used (except that I will not create a separate compound
statement for the sole purpose of more tightly constraining the scope -
that would require a separate compound statement for each variable; that
way lies madness).
Among other benefits, that approach minimizes the distance I have to
search for the definition of the variable (since such search normally
starts at a point where the variable is being used).

> I propose new extensions to C in general:
>
> in (thread_name) {
> // Do some code in this thread
> } and in (other_thread_name) {
> // Do some code in this thread
> }

I'm curious - are you familiar with the threading support that was added
to C2011? It's not at all similar to your way of handling threads, but
it does have the advantage of being based upon existing common practice.
As a result, it can be implemented as a thin wrapper over many existing
threading systems, such as those provided by POSIX or Windows. It
requires a somewhat thicker wrapper on other, more exotic threading
systems, but it should be widely implementable.

I doubt that your proposal could be modified in a way that's
sufficiently compatible with C2011's threading support to be a fully
conforming extension - but I haven't studied either one sufficiently
thoroughly to be sure of that.
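For comparison, here is a rough sketch of how the proposed `in (a) { ... } and in (b) { ... }` followed by `tjoin` might be expressed with the C2011 `<threads.h>` interface (`thrd_create`, `thrd_join`). The `struct job` type and the `branch` and `run_two_branches` functions are hypothetical stand-ins for "some code in this thread". `<threads.h>` is optional in C2011 (an implementation may define `__STDC_NO_THREADS__`), so the sketch assumes a library that provides it.

```c
#include <stddef.h>
#include <threads.h>

/* Hypothetical unit of work for one "in (thread)" block. */
struct job { int id; int result; };

/* Stand-in for "// Do some code in this thread". */
static int branch(void *arg)
{
    struct job *j = arg;
    j->result = j->id * 10;
    return 0;
}

/* Starts both branches, then waits for both (the "tjoin" step).
   Returns 0 on success, -1 if a thread could not be created. */
int run_two_branches(struct job *a, struct job *b)
{
    thrd_t ta, tb;

    if (thrd_create(&ta, branch, a) != thrd_success)
        return -1;
    if (thrd_create(&tb, branch, b) != thrd_success) {
        thrd_join(ta, NULL);     /* don't leave the first thread behind */
        return -1;
    }
    thrd_join(ta, NULL);
    thrd_join(tb, NULL);
    return 0;
}
```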
--
James Kuyper

James Kuyper

Jan 21, 2014, 12:14:16 PM

On 01/21/2014 11:28 AM, Rick C. Hodgin wrote:
> On Tuesday, January 21, 2014 11:17:33 AM UTC-5, James Kuyper wrote:

...

>> Do you open the input file and the output file in binary mode or text
>> mode? If you opened both in text mode, that shouldn't be happening,
>> unless you're reading an input file that has line endings that are
>> inconsistent with the conventions for your platform (for instance,
>> reading a file following Dos/Windows conventions on a Unix-like system).
>
> Quoted as Mr. Miyagi speaking, "Binary. Always open binary." :-)

That's a bad idea, if you're reading and writing text files - that's
what text mode is for. It makes your code less portable. Even for code
intended exclusively for a platform that uses a specific method of
handling line endings, if that method is anything other than '\n' (as
is, in fact, the case on your system), it just makes more work for yourself.
--
James Kuyper

Aleksandar Kuktin

Jan 21, 2014, 12:15:09 PM

On Tue, 21 Jan 2014 10:01:11 -0500, James Kuyper wrote:

> On 01/20/2014 09:10 AM, Aleksandar Kuktin wrote:
>>

>> [snip]

>>
>> char *read_only[] = { "Rick", "Jane", "Marc", 0 };
>> char **read_write;
>>
>> char **init_readwrite(char **readonly) {
>> unsigned int i, count;
>> char **readwrite;
>>
>> for (count=0, i=0; readonly[i]; i++) {
>> count++;
>> }
>> readwrite = malloc(count * sizeof(*readwrite));
>> /* no check */
>> return memcpy(read_write, read_only, count * sizeof(*readwrite));
>> }
>>
>> read_write = init_readwrite(read_only);
>>
>> ...And then you operate on read_write and ignore read_only.
>
> read_only is an array of pointers; the things that the pointers point at
> are not modifiable. It should therefore, for safety, have been declared
> as "const char*[]".
>
> You allocate enough space to copy over all of the pointers to
> read_write. Then you do copy them over. The new pointers in read_write
> still point at the same locations as the ones in read_only; those
> locations still cannot be safely written to, so nothing has been gained
> by the copy. It is therefore incorrectly named.

Correct. I also made a slight error because I didn't copy the terminating
zero.
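Here is a minimal sketch of a corrected `init_readwrite`. It assumes the same NULL-terminated input as above, declared `const` as suggested. It duplicates each string (a deep copy), copies the terminating null pointer, and checks every `malloc()`. The `free_readwrite` helper is a hypothetical addition for cleanup.

```c
#include <stdlib.h>
#include <string.h>

/* Deep copy of a NULL-terminated array of strings. The pointer array
   (including its terminating NULL) and every string are duplicated into
   malloc()'d memory, so the copy is safe to modify.
   Returns NULL on allocation failure. */
char **init_readwrite(const char *const readonly[])
{
    size_t count = 0;
    while (readonly[count] != NULL)
        count++;

    char **rw = malloc((count + 1) * sizeof *rw);   /* +1 for the NULL */
    if (rw == NULL)
        return NULL;

    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(readonly[i]) + 1;       /* include the '\0' */
        rw[i] = malloc(len);
        if (rw[i] == NULL) {
            while (i > 0)                           /* undo partial work */
                free(rw[--i]);
            free(rw);
            return NULL;
        }
        memcpy(rw[i], readonly[i], len);
    }
    rw[count] = NULL;
    return rw;
}

/* Hypothetical helper: releases everything init_readwrite() allocated. */
void free_readwrite(char **rw)
{
    if (rw == NULL)
        return;
    for (size_t i = 0; rw[i] != NULL; i++)
        free(rw[i]);
    free(rw);
}
```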

My trigger-happiness got the better of me again. My understanding was that
the OP had wanted a list of lines and that he wanted to exchange the
lines according to some rule.

Only later did I realize he actually wants what amounts to run-time macro
expansion. That, obviously, requires a different approach...

> That's why you need to create a deep copy, as in Rick's code. It copies
> the strings themselves to memory that is guaranteed writable.

...this being one of the better-suited ones. Another possibility would be
to malloc() a big flat buffer, copy the lines into it (possibly doing
macro expansion while copying) and manipulate it as if it were an
mmap()ed file. A change in macro expansion rules can be effected by
re-copying-and-expanding the lines.
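The flat-buffer alternative might be sketched as follows. The sketch assumes the placeholder rule from the original question (a `[`, exactly four characters, then `]`) and a reference number between 0 and 9999. Under those assumptions the expanded text can never be longer than the source, so one buffer the size of the source is enough. `expand_flat` and `is_placeholder` are hypothetical names. Calling the function again with a new number redoes the copy-and-expand step.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* True if p points at '[', four non-NUL characters, then ']'.
   The && chain stops at the first NUL, so it never reads past the string. */
static int is_placeholder(const char *p)
{
    return p[0] == '[' && p[1] && p[2] && p[3] && p[4] && p[5] == ']';
}

/* Hypothetical helper: copies count lines into one malloc()'d buffer,
   replacing each placeholder with "[n]", and stores a pointer to the start
   of each copied line in lines_out. Requires 0 <= n <= 9999 so that the
   replacement is never longer than the 6-character placeholder.
   Returns the buffer (one free() releases every line) or NULL on error. */
char *expand_flat(const char *const src[], size_t count, int n, char *lines_out[])
{
    if (n < 0 || n > 9999)
        return NULL;

    size_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += strlen(src[i]) + 1;          /* +1 for each line's NUL */

    char *buf = malloc(total > 0 ? total : 1);
    if (buf == NULL)
        return NULL;

    char *out = buf;
    for (size_t i = 0; i < count; i++) {
        const char *p = src[i];
        lines_out[i] = out;                   /* line i starts here */
        while (*p != '\0') {
            if (is_placeholder(p)) {
                out += sprintf(out, "[%d]", n);   /* at most "[9999]" */
                p += 6;
            } else {
                *out++ = *p++;
            }
        }
        *out++ = '\0';
    }
    return buf;
}
```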

Rick C. Hodgin

Jan 21, 2014, 12:18:36 PM

On Tuesday, January 21, 2014 12:03:10 PM UTC-5, James Kuyper wrote:
> On 01/21/2014 07:47 AM, Rick C. Hodgin wrote:
> >>> char defaultOption = "4";
> >> Did you mean "char *defaultOption" or "char defaultOption[]" rather than
> >> "char defaultOptions", or did you mean '4' rather than "4"?
> > I meant char defaultOption[] = "4";
>
> Then you've already got your wish; defaultOption contains a writable
> string.

I was using this as an example of creating a writable string. My issue is that the string contents are encoded with identical syntax in another place, like this:

char* list[] = { "one", "two", "three", NULL };

In defaultOption I encode a literal that is not a constant. In list I encode three literals that are constants. In both of them I use double-quote, text, double-quote, but the compiler makes the items in the list[] array of pointers all read-only/constant by convention. That is my issue.

My example was only to demonstrate that read-write strings are encoded the same way as constant strings.
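Standard C already has a way to get the single-list syntax with writable strings, provided the compiler supports C99 compound literals. Whether a given Visual C++ version accepts this in C mode is worth checking. The technique is to wrap each literal in `(char[]){ ... }`. Each compound literal is an unnamed, modifiable `char` array initialized from the string. This is the same mechanism as `char defaultOption[] = "4";`, where the literal initializes an array object instead of being pointed at. With this form, `memcpy(list[0], "eno", 3)` is well defined, whereas writing through a plain `"one"` pointer is undefined behavior.

```c
#include <string.h>

/* Each (char[]){"..."} is a C99 compound literal: an unnamed, modifiable
   char array initialized from the string literal. At file scope it has
   static storage duration, so its address is a valid static initializer. */
char *list[] = {
    (char[]){"one"},
    (char[]){"two"},
    (char[]){"three"},
    NULL
};

/* For contrast: these pointers aim at the string literals themselves;
   writing through them is undefined behavior, hence the const. */
const char *readonly_list[] = { "one", "two", "three", NULL };
```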

> ...
> > FWIW, I don't believe in defining variables in this way in C. I believe an
> > initialization block should exist so it is being done explicitly, both for
> > documentation purposes, and clarity in reading the source code (it's very
> > easy to miss a nested declaration when a group of variables is created of
> > a similar type.
>
> I find that both documentation and clarity is best served by defining
> each variable with the smallest scope that is consistent with the way it
> will be used (except that I will not create a separate compound
> statement for the sole purpose of more tightly constraining the scope -
> that would require a separate compound statement for each variable; that
> way lies madness).

I find that both documentation and clarity are best served by defining
all variables at a common location, and then using a GUI which has smart
windowing or hover abilities to indicate where each one came from.

> Among other benefits, that approach minimizes the distance I have to
> search for the definition of the variable (since such search normally
> starts at a point where the variable is being used).

The GUI minimizes the distance involved by creating a constant lookup
window that shows all code definitions when they're needed. In addition,
add-on tools like Whole Tomato's Visual Assist X (for use in Visual
Studio) allows Ctrl+Alt+F to find all references, showing other source
code line uses, etc.

We're beyond the days of text-based editors. :-)

> > I propose new extensions to C in general:
> > in (thread_name) {
> > // Do some code in this thread
> > } and in (other_thread_name) {
> > // Do some code in this thread
> > }
>
> I'm curious - are you familiar with the threading support that was added
> to C2011?

No. I'm sure I'm not familiar with the prior standards either. I know how to program in C and I just do so ... without knowing the standards. :-)

> It's not at all similar to your way of handling threads, but
> it does have the advantage of being based upon existing common practice.
> As a result, it can be implemented as a thin wrapper over many existing
> threading systems, such as those provided by POSIX or Windows. It
> requires a somewhat thicker wrapper on other, more exotic threading
> systems, but it should be widely implementable.

Interesting. My design logic comes from looking ahead. We've moved from
single-core systems to multi-core, and soon we will have many-core. These
will be CPUs with large core counts but without much processing power per
core, able to do a lot of work in parallel. As such, within a single function
there will be the need for a lot of thread-level parallelism, such as
being able to schedule both branches of an IF ahead of knowing the results,
provided all dependencies are satisfied, so that by the time the results are
known the branch has already been taken. This kind of micro-threading will
be made possible by having many cores (64+) that can work in parallel easily.
The language and OS must work in harmony to provide these facilities so that
threads can be spawned, executed, and terminated within a minimal number of
clock cycles.

I remember in the old days the 8087 FPU used to monitor all 8086 CPU
instructions and ignore them, just as the 8086 monitored all 8087 instructions
and ignored them. Perhaps something similar is required, but with a peek-ahead
mechanism, such as a new JMPWAIT instruction which causes a particular
thread to jump ahead to a location and wait for the "controlling thread" to
catch up, and then it kicks off, so the cache is already filled, memory reads
have been made, etc.

In any event, that's the logic behind my threading model.

James Kuyper

Jan 21, 2014, 12:19:32 PM

On 01/21/2014 11:36 AM, Rick C. Hodgin wrote:
> On Tuesday, January 21, 2014 11:31:07 AM UTC-5, James Kuyper wrote:
>> On 01/21/2014 07:58 AM, Rick C. Hodgin wrote:

...

>>> It's the same here. "Oh, another comma ... was the developer
>>> finished? Was there supposed to be more? What is missing? What was
>>> left out? Please ... I need answers. I'm left hanging in a state of
>>> confusion that is disrupting my soul. Whatever do I do?"
>>
>> I don't use that feature, and I don't like it. However, this feature
>> simplifies the creation of machine-generated C code, and people who
>> write such generators are apparently sufficiently numerous that the
>> committee felt a need to accommodate their desires.
>
> Makes sense. I still would've opted for the deprecated allowance and phased it out over time.

The combination of those two sentences doesn't work. If it ever made
sense to accommodate their needs, then it still makes sense - machine
generated code is at least as popular as it has ever been, possibly more
so. Deprecating that allowance would only make sense if the allowance
itself doesn't make sense.
--
James Kuyper

Rick C. Hodgin

Jan 21, 2014, 12:22:12 PM

> >> Do you open the input file and the output file in binary mode or text
> >> mode?

> > Quoted as Mr. Miyagi speaking, "Binary. Always open binary." :-)
>
> That's a bad idea, if you're reading and writing text files - that's
> what text mode is for.

I don't know how you do it, but most of my text processing is on source code
files. I use the terminating end-of-line characters to break out lines
during the read as I create structures.

Binary. Always binary.

> It makes your code less portable. Even for code intended exclusively for
> a platform that uses a specific method of handling line endings, if that
> method is anything other than '\n' (as is, in fact, the case on your
> system), it just makes more work for yourself.

Windows uses two-character line endings. It's pretty universal.

The work I have in this method is more on token parsing, identifying groups
of related characters, and so on. I process through every file I load byte
by byte anyway ... it's no more work to have a token which identifies line-
ending characters. It's actually just a setting in my token lookup logic,
which is a series of related structures which are parsed by a small engine
which goes through the source file identifying everything it can into known
groups, later parsed out into known tokens, later parsed out into known logic,
whereby errors are reported.

It works quite well. :-) Binary. Always binary. :-)

Rick C. Hodgin

Jan 21, 2014, 12:27:01 PM

On Tuesday, January 21, 2014 12:19:32 PM UTC-5, James Kuyper wrote:
> > Makes sense. I still would've opted for the deprecated allowance
> > and phased it out over time.
>
> The combination of those two sentences doesn't work. If it ever made
> sense to accommodate their needs, then it still makes sense

Your explanation as to why it was set up the way it was makes sense. I still
would've opted for it not to continue indefinitely, but to be allowed for some
time; and, as I say, a prior version of a compiler could be used to parse those
old generated files into a common object format that could be linked together
for the foreseeable future, just by maintaining compatibility with that obj
file format.

> - machine
> generated code is at least as popular as it has ever been, possibly more
> so. Deprecating that allowance would only make sense if the allowance
> itself doesn't make sense.

I doubt people today are using the same generated source code files they were
back then. And I would still argue that it's nothing short of a catering hack
to include the ability to allow buggy code generator logic to pass through.

I'm sorry, but it's absolutely lame. It was a lame decision, in my opinion,
and whereas I probably would've opted to allow it for a time as a newly
branded "deprecated feature," I would not have allowed it going forward.
We are better than that as (1) human beings and (2) developers, and
(3) all of our products should be better than that as well. We don't cater
to bad designs for bad reasons over the long term. If we need a Grandfather
Clause to get us by for a time, that's one thing ... but we don't keep it
going when it was a catering hack from the start.

All of this is my opinion. YMMV.

Keith Thompson

Jan 21, 2014, 12:41:07 PM

"Rick C. Hodgin" <rick.c...@gmail.com> writes:
> On Monday, January 20, 2014 4:32:04 PM UTC-5, Eric Sosman wrote:
>> On 1/20/2014 3:42 PM, Rick C. Hodgin wrote:
>> >> But that was long ago: The uniqueness guarantee (and any
>> >> accompanying mutability) was rescinded by the original ANSI C
>> >> Standard way back in 1989.
>>
>> > And I bet you could hear the thuds on the floor as many developers
>> > screamed "WHAT!" and then passed out.
>> Did you miss the part about "Those members ... were content?"
>
> Not at all. I presumed "those members" involved in the ANSI
> authorizing were a small number, power-seeking representatives of the
> entire C language developer base of us "little people," while the much
> larger group (developers who were coding in C in general) contained
> many developers, and it was many of them who screamed "WHAT!" and then
> passed out.

(Extremely long line reformatted. Can you find a way to *consistently*
format your articles with shorter lines?)

I'll assume that

screamed "WHAT!" and then passed out

is not meant to be taken literally. But if a substantial number of C
programmers were greatly upset by the decision to make string literals
effectively read-only, I presume you can provide evidence. Please do
so.

And I believe your characterization of the members of the ANSI C
committee is inaccurate.

[...]

>> See Appendix J.
>
> Appendix J of what?

Of the ISO C standard, a recent draft of which can be downloaded at no
charge from http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

James Kuyper

Jan 21, 2014, 12:48:01 PM

On 01/21/2014 12:22 PM, Rick C. Hodgin wrote:
...

> I don't know how you do it, but most of my text processing is on source code
> files. I use the terminating end-of-line characters to break out lines
> during the read as I create structures.

Yes, and that's (very marginally) easier to do if you only have to look
for '\n' rather than '\r\n' - which would be the case if you used text
mode rather than binary mode. Your input files could still have
'\r\n', and your output files would still have '\r\n': text mode takes
care of those things for you automatically. However, internal to your
program you can match those sequences just by searching for '\n'.
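Here is a minimal sketch of the text-mode approach. The file is opened with `"r"` rather than `"rb"`, so for files that follow the platform's own convention the library delivers each line ending as a single `'\n'`. The program then only ever looks for `'\n'`. `read_line_n` is a hypothetical helper, and it assumes each line fits in the caller's buffer.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies line n (0-based) of the text file at path
   into out, without its '\n'. Returns 0 on success, -1 if the file cannot
   be opened or has fewer than n + 1 lines. Assumes every line fits in out;
   a longer line would be split across several fgets() calls. */
int read_line_n(const char *path, int n, char *out, size_t outsz)
{
    FILE *fp = fopen(path, "r");     /* "r", not "rb": text mode */
    int idx = 0, rc = -1;

    if (fp == NULL)
        return -1;
    while (fgets(out, (int)outsz, fp) != NULL) {
        if (idx++ == n) {
            out[strcspn(out, "\n")] = '\0';   /* the only terminator to strip */
            rc = 0;
            break;
        }
    }
    fclose(fp);
    return rc;
}
```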

...

> Windows uses two-character line endings. It's pretty universal.

It's not even close to universal - it's specific to DOS/Windows and few
other places.

> The work I have in this method is more on token parsing, identifying groups
> of related characters, and so on. I process through every file I load byte
> by byte anyway ... it's no more work to have a token which identifies line-
> ending characters. ...

Actually, it is more work than that, as this thread has already shown.
As a side effect of that decision, you've had to type extra '\r'
characters in your string literals. As far as I can see, the sole effect
of your decision is that you have to occasionally type "\r\n" where you
otherwise would have been able to type "\n". You haven't identified a
single compensating advantage, not even one tiny enough to make up for
that admittedly very minor disadvantage.

> ... It's actually just a setting in my token lookup logic,
> which is a series of related structures which are parsed by a small engine
> which goes through the source file identifying everything it can into known
> groups, later parsed out into known tokens, later parsed out into known logic,
> whereby errors are reported.

And it wouldn't make your program any more complicated (slightly less
so, in fact) to re-write it to work with text mode. If you did so, it
would continue to work, without modification to the line ending token,
even if someone someday decides to try porting it to a platform using a
different convention for line endings.

> It works quite well. :-) Binary. Always binary. :-)

That's a recipe for locking your code to a single platform. If it
simplified your code in any way, that might make sense for code that you
are certain will never be ported anywhere else (though such certainty is
often delusional). However, it actually makes your code (very slightly)
more complicated.
--
James Kuyper

Keith Thompson

Jan 21, 2014, 12:55:45 PM

"Rick C. Hodgin" <rick.c...@gmail.com> writes:

That's fine; if you don't want to write code like that, you don't have
to. But I didn't ask how you'd re-write it; I asked how *that code*
should behave.

You're proposing (I think) a change to the language. That change would
affect compiler writers as well as developers. A compiler writer needs
to do *something* with the specific code I wrote above.

The language definition can either:

1. Define clearly how the code behaves;

2. State that the behavior is unspecified, undefined, or
implementation-defined; or

3. Introduce a new rule making the above code a constraint violation or
syntax error.

Which of those do you advocate? (Any suggestion that declarations in
nested blocks should be banned is a non-starter; that's been a feature
of C and its predecessors as far back as I can find documentation, and a
lot of existing code depends on it.)

[...]

>> In C as it's currently defined, the string literal "hello" corresponds
>> to an anonymous array object with static storage duration; attempting to
>> modify it has undefined behavior. As I understand it, you want to
>> remove the second part of that. The above code has one occurrence of a
>> string literal, but it's being used in the initializer for two distinct
>> objects. On the second iteration, does s point to a string with
>> contents "hello" or "Hello"?
>>
>> Either interpretation is problematic.
>
> Exactly. So, you don't code that way. :-)

I don't code that way because string literals are read-only.

> You make everything a function-level variable and it's done. You make
> all code items read/write unless they are explicitly prefixed with a
> const or have some macro wrapper like _rw("foo") or _fo("foo") to
> explicitly name them.

Macros do not add functionality. They have to expand to *something*.

>> >> The fact that such checks are not required is for backwards
> >> compatibility for code written before the "const" keyword was added to the
>> >> language; even if not for that, C tends to make such things undefined
>> >> behavior rather than requiring run-time diagnostics.
>>
>> > Well ... there's logic there. It makes sense. I think it's time for a
>> > switchover though. We're getting into multi-processor programming, multiple
>> > threads. Where we are in 2010s and later is not where we were in 1980s.
>>
>> I fail to see how this argues for modifiable string literals.
>
> Not just string literals, but a separation of the "before" and the "after."
> Programming today is, by default, targeted at multiple CPUs. There are
> functions which run top-down, but on the whole we are creating
> multi-threaded congruent code execution engines running on commensurate
> hardware. The time for a new language syntax is at hand.
>
> I propose new extensions to C in general:
>
> in (thread_name) {
> // Do some code in this thread
> } and in (other_thread_name) {
> // Do some code in this thread
> }
>
> And a new tjoin keyword to join threads before continuing:
> tjoin this, thread_name, other_thread_name

Are you aware that the 2011 ISO C standard includes a specification of
threading? I suggest you study it before proposing changes.

[SNIP]

> And I have other ideas. You can read about them on this page. This page
> specifically relates to extensions to Visual FoxPro, but my intention is
> my RDC (Rapid Development Compiler) which is C-like, but relaxes a lot of
> stringent errors in C reporting them only as warnings, such as pointer-to-
> pointer conversions, allowing for them to be perfectly valid, and many
> other changes as well.
>
> http://www.visual-freepro.org/wiki/index.php/VXB++

So your problem with C is that it's too stringent. Hmmm.

[...]

> I don't expect to get anywhere trying to change anything in C. :-) It's why
> I'm moving to my own language. I hit the GCC group a while back asking for
> the self() extension, which allows recursion to the current function without
> it being explicitly named or even explicitly populated with parameters. They
> said it was not a good idea. I asked for something else (can't remember what)
> and they said the same. So ... it was fuel for me to get started on my own.

Good luck with that. But if you're giving up on C and inventing your
own new language, comp.lang.c is not the best place to discuss it.

Rick C. Hodgin

Jan 21, 2014, 12:56:18 PM

On Tuesday, January 21, 2014 12:48:01 PM UTC-5, James Kuyper wrote:
> On 01/21/2014 12:22 PM, Rick C. Hodgin wrote:
> > I don't know how you do it, but most of my text processing is on source code
> > files. I use the terminating end-of-line characters to break out lines
> > during the read as I create structures.
>
> Yes, and that's (very marginally) easier to do if you only have to look
> for '\n' rather than '\r\n' - which would be the case if you used text
> mode rather that binary mode. Your input files file could still have
> '\r\n', and your output files would still have '\r\n': text mode takes
> care of those things for you automatically. However, internal to your
> program you can match those sequences just by searching for '\n'.

My algorithm actually looks for either ASCII-10 or ASCII-13, in any order, and
then looks at the next character. If it's the corresponding char, as in "\r\n"
or "\n\r", then it accepts the pair as a single line ending. If it isn't, then it
treats the first character as a single-character line ending and continues
processing. This allows combinations like "\r\r" or "\n\n" to be recognized as
two line endings.
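The rule described above can be sketched like this. It is an illustration, not the poster's actual code. A `'\r'` or `'\n'` starts a line ending. If the next character is the other member of the pair, both are consumed as one ending; otherwise the ending is a single character. `eol_length` and `count_lines` are hypothetical names.

```c
#include <stddef.h>

/* Returns the length (1 or 2) of the line ending starting at s[i], or 0
   if s[i] is not '\r' or '\n'. Two characters form one ending only when
   they differ ("\r\n" or "\n\r"); "\r\r" and "\n\n" are two endings. */
static size_t eol_length(const char *s, size_t i, size_t len)
{
    if (s[i] != '\r' && s[i] != '\n')
        return 0;
    if (i + 1 < len && (s[i + 1] == '\r' || s[i + 1] == '\n') && s[i + 1] != s[i])
        return 2;
    return 1;
}

/* Counts lines in a buffer read in binary mode; a final line without a
   terminator still counts as a line. */
size_t count_lines(const char *s, size_t len)
{
    size_t lines = 0, i = 0;
    int pending = 0;               /* inside a line not yet terminated */

    while (i < len) {
        size_t n = eol_length(s, i, len);
        if (n > 0) {
            lines++;
            i += n;
            pending = 0;
        } else {
            pending = 1;
            i++;
        }
    }
    return lines + (size_t)pending;
}
```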

> ...
> > Windows uses two-character line endings. It's pretty universal.
> It's not even close to universal - it's specific to DOS/Windows and few
> other places.

It's pretty universal ... "in Windows."

> > The work I have in this method is more on token parsing, identifying groups
> > of related characters, and so on. I process through every file I load byte
> > by byte anyway ... it's no more work to have a token which identifies line-
> > ending characters. ...
>
> Actually, it is more work than that, as this thread has already shown.
> As a side effect of that decision, you've had to type extra '\r'
> characters in your string literals. As far as I can see, the sole affect
> of your decision is that you have to occasionally type "\r\n" where you
> otherwise would have been able to type "\n". You haven't identified a
> single compensating advantage, not even one tiny enough to make up for
> that admittedly very minor disadvantage.

I don't have to type in the extra "\r", but I choose to do so because when we
periodically bring up the text files in editors that care about the line-ending
combination, they don't complain about it. It works perfectly fine with or
without the "\r" ... it's just done as a nicety.

> > ... It's actually just a setting in my token lookup logic,
> > which is a series of related structures which are parsed by a small engine
> > which goes through the source file identifying everything it can into known
> > groups, later parsed out into known tokens, later parsed out into known
> > logic, whereby errors are reported.
>
> And it wouldn't make your program any more complicated (slightly less
> so, in fact) to re-write it to work with text mode. If you did so, it
> would continue to work, without modification to the line ending token,
> even if someone someday decides to try porting it to a platform using a
> different convention for line endings.

Perhaps. But it's not important enough for me for it to be an issue. I've
already coded for all combinations of line endings. That part is coded. At
this point it would be more work for me to retrofit it. :-)

> > It works quite well. :-) Binary. Always binary. :-)
> That's a recipe for locking your code to a single platform. If it
> simplified your code in any way, that might make sense for code that you
> are certain will never be ported anywhere else (though such certainty is
> often delusional). However, it actually makes your code (very slightly)
> more complicated.

I use binary files with this logic on Linux as well. It works the same. My
logic accounts for combinations that I've seen, so any combination of \r or
\n, repeated or not repeated, all parses out properly.

James Kuyper

Jan 21, 2014, 1:00:34 PM

On 01/21/2014 12:27 PM, Rick C. Hodgin wrote:
> On Tuesday, January 21, 2014 12:19:32 PM UTC-5, James Kuyper wrote:
>>> Makes sense. I still would've opted for the deprecated allowance
>>> and phased it out over time.
>>
>> The combination of those two sentences doesn't work. If it ever made
>> sense to accommodate their needs, then it still makes sense
>
> Your explanation as to why it was setup the way it was makes sense. I still
> would've opted for it to not continue over time, but to be allowed for some
> while, ...

If you're only going to deprecate it now, and remove it later, why even
create it in the first place? If it makes sense to remove it, it makes
even more sense not to create it. The arguments in favor of allowing
the extra comma were not time dependent - it wasn't a matter of old
legacy code, but of current desires of those who like to write source
code generators.

> ... and as I say a prior version of a compiler could be used to parse those
> old generated files with a common object format that could be linked together
> for the foreseeable future, just by maintaining compatibility with that obj
> file format.

It has nothing to do with object files; the same object file will be
created with or without the extra comma.

>> - machine
>> generated code is at least as popular as it has ever been, possibly more
>> so. Deprecating that allowance would only make sense if the allowance
>> itself doesn't make sense.
>
> I doubt people today are using the same generated source code files they were
> back then.

I'm sure that some are; code has a long range of lifetimes, and some
code that is still in use has been around a lot longer than you seem to
consider likely. However, more important to the argument I was
describing (NOT endorsing!) is the fact that new source code generators
are being created all the time, and newly generated source code is also
being created. If the desires of the creators of those generators are
not worth catering to now, then it never made sense to cater to those
desires (I might have some sympathy with that conclusion - but you're
the one who said it "makes sense", not me).
--
James Kuyper

Rick C. Hodgin

Jan 21, 2014, 1:22:35 PM

On Tuesday, January 21, 2014 12:55:45 PM UTC-5, Keith Thompson wrote:
> >> In your proposed C-like language, what would this snippet print?
> >> for (int i = 0; i < 2; i ++) {
> >> char *s = "hello";
> >> if (i == 0) {
> >> s[0] = 'H';
> >> }
> >> puts(s);
> >> }
> >

> > In my proposed language, it would print "Hello" both times because the
> > char* s definition would've been pulled out of the loop and defined as a
> > function variable.
>
> That's fine; if you don't want to write code like that, you don't have
> to. But I didn't ask how you'd re-write it; I asked how *that code*
> should behave.

I answered you. How should it behave?

In my compiler, I would pull the variable out and make it a function-variable
defined at the top, so it would've been altered the first time through and
both times would print Hello.
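In standard C, both readings of Keith's snippet can be written without undefined behavior by using an array instead of a pointer to the literal. In the sketch below, the function names are hypothetical, and results are stored in an array instead of printed so they can be checked. A block-scope `char s[] = "hello";` is re-initialized on every iteration, giving "Hello" then "hello". Hoisting the array to function level, as Rick describes for RDC, gives "Hello" both times.

```c
#include <string.h>

/* Array declared inside the loop: a fresh copy of "hello" each iteration. */
void per_iteration(char out[2][6])
{
    for (int i = 0; i < 2; i++) {
        char s[] = "hello";          /* re-initialized every time through */
        if (i == 0)
            s[0] = 'H';
        strcpy(out[i], s);
    }
}

/* Array hoisted to function level: one object, initialized once. */
void hoisted(char out[2][6])
{
    char s[] = "hello";
    for (int i = 0; i < 2; i++) {
        if (i == 0)
            s[0] = 'H';              /* the change persists into i == 1 */
        strcpy(out[i], s);
    }
}
```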

> You're proposing (I think) a change to the language. That change would
> affect compiler writers as well as developers. A compiler writer needs
> to do *something* with the specific code I wrote above.

No ... I'm creating my own new language, RDC, which is C-like, but dumps a
lot of what I view as "hideous baggage left over from a bygone era" ... while
also adding a lot of new features I see as looking to the future of multiple
cores, GUI developer environments, touch screens, eventual 3D interfaces, and
more.

> The language definition can either:
> 1. Define clearly how the code behaves;
> 2. State that the behavior is unspecified, undefined, or
> implementation-defined); or
> 3. Introduce a new rule making the above code a constraint violation or
> syntax error.
> Which of those do you advocate? (Any suggestion that declarations in
> nested blocks should be banned is a non-starter; that's been a feature
> of C and its predecessors as far back as I can find documentation, and a
> lot of existing code depends on it.)

I choose 1 in general, with a periodic injection of 2.

> [...]
> >> In C as it's currently defined, the string literal "hello" corresponds
> >> to an anonymous array object with static storage duration; attempting to
> >> modify it has undefined behavior. As I understand it, you want to
> >> remove the second part of that. The above code has one occurrence of a
> >> string literal, but it's being used in the initializer for two distinct
> >> objects. On the second iteration, does s point to a string with
> >> contents "hello" or "Hello"?
>
> >> Either interpretation is problematic.
> > Exactly. So, you don't code that way. :-)
> I don't code that way because string literals are read-only.

Well ... They shouldn't be. :-)

> > You make everything a function-level variable and it's done. You make
> > all code items read/write unless they are explicitly prefixed with a
> > const or have some macro wrapper like _rw("foo") or _ro("foo") to
> > explicitly name them.
> Macros do not add functionality. They have to expand to *something*.

Call it something else then. I would introduce the cask and be done:

char* list[] = { (|rw|)"foo" };

The (|rw|) cask is injected as a single override in the GUI, and indicates
that the following token is to be read/write.

> > The time for a new language syntax is at hand.
> > I propose new extensions to C in general:
> > in (thread_name) {
> > // Do some code in this thread
> > } and in (other_thread_name) {
> > // Do some code in this thread
> > }
> >
> > And a new tjoin keyword to join threads before continuing:
> > tjoin this, thread_name, other_thread_name
>
> Are you aware that the 2011 ISO C standard includes a specification of
> threading? I suggest you study it before proposing changes.

I'm not proposing changes to C. These are my extensions to a C-like language
called RDC (Rapid Development Compiler) which is very C-like, but it is not C.

> > And I have other ideas. You can read about them on this page. This page
> > specifically relates to extensions to Visual FoxPro, but my intention is
> > my RDC (Rapid Development Compiler) which is C-like, but relaxes a lot of
> > stringent errors in C, reporting them only as warnings, such as pointer-to-
> > pointer conversions, allowing for them to be perfectly valid, and many
> > other changes as well.
> > http://www.visual-freepro.org/wiki/index.php/VXB++
>
> So your problem with C is that it's too stringent. Hmmm.

In some areas, yes. I also don't believe in going deeper than a pointer to
a pointer. I think if you're coding further out than that you're probably
doing something wrong.

I come from an assembly background, and I desire to give the developers a full
set of tools they can use, leaving them, as competent human beings and skilled
developers, to best make the use of those tools. Compiler warnings will exist
where many errors do today ... but so long as there's logic in what's being
done, as in:

int i;
int* iptr;
char* cptr;

i = 5;
iptr = &i;
cptr = iptr; // No cast, no error, because it's simply pointer to pointer
// and valid, but the compiler would generate a warning.
*cptr = '2';

My compiler won't force a cast. It will generate a warning, but no more than
that. :-) The developer should have all of the tools necessary, and without
clunky syntax hoops to jump through (unions galore, and so on).

Only if something weird happens will they get an error:
*cptr = "Hello there, Billy!";

Error! :-)

> > I don't expect to get anywhere trying to change anything in C. :-) It's why
> > I'm moving to my own language. I hit the GCC group a while back asking for
> > the self() extension, which allows recursion to the current function without
> > it being explicitly named or even explicitly populated with parameters.
> > They said it was not a good idea. I asked for something else (can't
> > remember what) and they said the same. So ... it was fuel for me to get
> > started on my own.
>
> Good luck with that. But if you're giving up on C and inventing your
> own new language, comp.lang.c is not the best place to discuss it.

Agreed. This is all just back story as to why I desire to have read/write
string literals in all cases unless explicitly cast as const.

Rick C. Hodgin

Jan 21, 2014, 1:27:58 PM

On Tuesday, January 21, 2014 1:00:34 PM UTC-5, James Kuyper wrote:
> On 01/21/2014 12:27 PM, Rick C. Hodgin wrote:
> > On Tuesday, January 21, 2014 12:19:32 PM UTC-5, James Kuyper wrote:
> >>> Makes sense. I still would've opted for the deprecated allowance
> >>> and phased it out over time.
> >> The combination of those two sentences doesn't work. If it ever made
> >> sense to accommodate their needs, then it still makes sense
> > Your explanation as to why it was setup the way it was makes sense. I
> > still would've opted for it to not continue over time, but to be
> > allowed for some while, ...
> If you're only going to deprecate it now, and remove it later, why even
> create it in the first place? If it makes sense to remove it, it makes
> even more sense not to create it. The arguments in favor of the allowing
> the extra comma were not time dependent - it wasn't a matter of old
> legacy code, but of current desires of those who like to write source
> code generators.

I would've deprecated it back then had I been the decision maker. Today I
will not support it. In the future, if I change my mind on something that I
initially introduce, I will deprecate it and phase it out over time, but I
won't phase out anything unless there's some exceedingly valid reason to do
so, such as we've moved from binary computers to quantum computers, or such.
Would have to be major.

> > ... and as I say a prior version of a compiler could be used to parse
> > those old generated files with a common object format that could be linked
> > together for the foreseeable future, just by maintaining compatibility
> > with that obj file format.
> It has nothing to do with object files; the same object file will be
> created with or without the extra comma.

Yes, but the parsing engine is the compiler, which would've read that syntax
in the beginning, and generated the object file. That old version of the
compiler that supported the extra comma syntax could be used well into the
future as new compilers are written which handle it without the extra comma
allowance. In that way, legacy code that cannot be changed can still be
supported through the object file format of the code generated by the compiler
which supported it.

> >> - machine
> >> generated code is at least as popular as it has ever been, possibly more
> >> so. Deprecating that allowance would only make sense if the allowance
> >> itself doesn't make sense.
> > I doubt people today are using the same generated source code files
> > they were back then.
> I'm sure that some are; code has a long range of lifetimes, and some
> code that is still in use has been around a lot longer than you seem to
> consider likely. However, more important to the argument I was
> describing (NOT endorsing!) is the fact that new source code generators
> are being created all the time, and newly generated source code is also
> being created. If the desires of the creators of those generators are
> not worth catering to now, then it never made sense to cater to those
> desires (I might have some sympathy with that conclusion - but you're
> the one who said it "makes sense", not me).

Yes. I would stand up in front of all of them in a large room and say, "NO!
YOU CANNOT DO THIS ANY LONGER. THERE ARE BETTER WAYS. CLEARER PATHS. YOU
DON'T NEED TO WALLOW IN EXTRA COMMA LAND ANY LONGER. COME OUT AND BE FREE!"

And I think I would get a standing ovation. Perhaps not.

James Kuyper

Jan 21, 2014, 2:28:38 PM

On 01/21/2014 01:27 PM, Rick C. Hodgin wrote:
> On Tuesday, January 21, 2014 1:00:34 PM UTC-5, James Kuyper wrote:
...
>> If you're only going to deprecate it now, and remove it later, why even
>> create it in the first place? If it makes sense to remove it, it makes
>> even more sense not to create it. The arguments in favor of the allowing
>> the extra comma were not time dependent - it wasn't a matter of old
>> legacy code, but of current desires of those who like to write source
>> code generators.
>

> I would've deprecated it back then had I been the decision maker. ...

That's what I don't understand - why introduce a new feature as
"deprecated"? Or, perhaps, by "back then", you're referring to some
particular time after that feature was first introduced? If so, what
time was that?

> ... Today I
> will not support it. ...

I don't see the distinction - deprecating a feature is definitely not
supporting it.

...

> allowance. In that way, legacy code that cannot be changed can still be
> supported through the object file format of the code generated by the compiler
> which supported it.

Well, legacy code was never the primary issue. It was people wanting to
generate new code with that feature.

...

>> I'm sure that some are; code has a long range of lifetimes, and some
>> code that is still in use has been around a lot longer than you seem to
>> consider likely. However, more important to the argument I was
>> describing (NOT endorsing!) is the fact that new source code generators
>> are being created all the time, and newly generated source code is also
>> being created. If the desires of the creators of those generators are
>> not worth catering to now, then it never made sense to cater to those
>> desires (I might have some sympathy with that conclusion - but you're
>> the one who said it "makes sense", not me).
>
> Yes. I would stand up in front of all of them in a large room and say, "NO!
> YOU CANNOT DO THIS ANY LONGER. THERE ARE BETTER WAYS. CLEARER PATHS. YOU
> DON'T NEED TO WALLOW IN EXTRA COMMA LAND ANY LONGER. COME OUT AND BE FREE!"
>
> And I think I would get a standing ovation. Perhaps not.

Certainly not from the people who were requesting the feature. Those
words might make them doubt your emotional stability, but they don't say
anything likely to change their minds.
--
James Kuyper

Rick C. Hodgin

Jan 21, 2014, 2:43:08 PM

> > allowance. In that way, legacy code that cannot be changed can still be
> > supported through the object file format of the code generated by the
> > compiler which supported it.
>
> Well, legacy code was never the primary issue. It was people wanting to
> generate new code with that feature.

AH! I misunderstood. No. In that case I never would've introduced it.

> > Yes. I would stand up in front of all of them in a large room and say, "NO!
> > YOU CANNOT DO THIS ANY LONGER. THERE ARE BETTER WAYS. CLEARER PATHS. YOU
> > DON'T NEED TO WALLOW IN EXTRA COMMA LAND ANY LONGER. COME OUT AND BE FREE!"
> > And I think I would get a standing ovation. Perhaps not.
> Certainly not from the people who were requesting the feature. Those
> words might make them doubt your emotional stability, but they don't
> say anything likely to change their minds.

They wouldn't be the first to doubt my emotional stability. :-)

Öö Tiib

Jan 21, 2014, 2:46:10 PM

Software can use text mode only for files produced by itself for itself
on same platform. That is sort of narrow corner case today.

Life is complicated and world is interconnected and so varying
line endings of different platforms are nuisance that one has to
deal with by supporting them all. If software must work on Mac,
Windows and Linux and should eat text files produced by itself
and other text editors on Mac, Windows and Linux then
"always binary" is good choice.

Kaz Kylheku

Jan 21, 2014, 3:08:36 PM

On 2014-01-21, Rick C. Hodgin <rick.c...@gmail.com> wrote:
> On Monday, January 20, 2014 6:14:48 PM UTC-5, Seebs wrote:
>> On 2014-01-20, Rick C. Hodgin <rick...n@gmail.com> wrote:
>> > On Monday, January 20, 2014 2:10:40 PM UTC-5, glen herrmannsfeldt wrote:
>> >> Oh, also, one of my favorite C features (Java also has), you
>> >> can have the extra comma on the last line. Convenient for program
>> >> generated text, though most likely in the standard as it allows
>> >> for easy preprocessor conditionals.
>> > See, and I think that such an "allowance" is patently absurd and should
>> > not be a part of any language. :-)
>>
>> Why? It's really convenient as a way to eliminate a gratuitous special
>> case.

>
>
> Why do we have "void function(void)" when "function()" would work
> sufficiently at that level in a source file?

The void type was introduced by C++, because Stroustrup wanted stronger type
safety. C++ introduced void * pointers, and using void to declare function
returns.

However, in C++, a function with no parameters is just (); it does not
mean "unspecified parameters".

The C people "ported" void into C, and invented the (void) hack to mean "empty
parameter list", so that () could continue to mean "unspecified number of
parameters" for compatibility with the old-style C that is described in
the first edition of the Kernighan and Ritchie text.

Then, for better compatibility with C (something they no longer give
a damn about today), the C++ people back-ported the (void) kludge into C++.

It's allowed even in contexts that could never be C, like:

MyClass::MyClass(void) { ... }

Needless to say, don't do this. Only use (void) in C++ that also compiles as C,
such as declarations in header files that are included in both C++ and C.

> It's the same here. "Oh, another comma ... was the developer finished? Was
> there supposed to be more? What is missing? What was left out? Please ...

You could say the same thing about statement/declaration-terminating
semicolons.

> Nobody needs that kind of stress in their life. :-)

Decently designed languages do not have comma and semicolon diseases
to begin with.

Eric Sosman

Jan 21, 2014, 3:16:54 PM

I'm one who would not readily change his mind, because (in
part) as things stand I can write stuff like:

const char *archiveFormats[] = {
#if CPIO_SUPPORTED
"cpio",
#endif
#if TAR_SUPPORTED
"tar",
#endif
#if ZIP_SUPPORTED
"ZIP",
#endif
#if APK_SUPPORTED
"apk",
#endif
};

It's *possible* to manage this sort of thing without introducing
an extra comma, but it's ugly as all-get-out:

const char *archiveFormats[] = {
#if CPIO_SUPPORTED
"cpio"
#if TAR_SUPPORTED | ZIP_SUPPORTED | APK_SUPPORTED
,
#endif
#endif
#if TAR_SUPPORTED
"tar"
#if ZIP_SUPPORTED | APK_SUPPORTED
,
#endif
#endif
#if ZIP_SUPPORTED
"ZIP"
#if APK_SUPPORTED
,
#endif
#endif
#if APK_SUPPORTED
"apk"
#endif
};

--
Eric Sosman
eso...@comcast-dot-net.invalid

Kaz Kylheku

Jan 21, 2014, 3:19:50 PM

On 2014-01-21, Rick C. Hodgin <rick.c...@gmail.com> wrote:

> On Tuesday, January 21, 2014 12:19:32 PM UTC-5, James Kuyper wrote:
>> - machine
>> generated code is at least as popular as it has ever been, possibly more
>> so. Deprecating that allowance would only make sense if the allowance
>> itself doesn't make sense.
>
> I doubt people today are using the same generated source code files they were
> back then.

C has built-in code generation: macros.

#define ELEM(A,B,C) { FOO(A), BAR(B), 0, (void *) (C) },

struct foobar array[] = {
ELEM(3, 2, 4)
ELEM(1, 2, 3)
};

This re-generates each time you compile it.

I wouldn't write it that way myself; I'd leave the comma out and have:

struct foobar array[] = {
ELEM(3, 2, 4),
ELEM(1, 2, 3)
};

yet, there is code like that out there, and probably situations in which
it makes sense to hide the comma in the "macrology".

Conditionally generating a comma in the C macro language is difficult to
impossible, and requiring the macro caller to specify it can sometimes
break the abstraction of the macro.

> And I would still argue that it's nothing short of a catering hack
> to include the ability to allow buggy code generator logic to pass through.

That is pure nonsense. If the code generator developer knows that commas can
be treated as terminating rather than separating punctuation, then it's fine to
design the code generator logic that way.

> I'm sorry, but it's absolutely lame. It was a lame decision, in my opinion,

What's lame is not knowing C, but criticizing it.

Nobody cares what you "would have" done back when these decisions were
made, because back then you had -X years of experience in C.

A good 10-15 years of coding should be required for anyone who is to have
any input on the future direction of a language.

James Kuyper

Jan 21, 2014, 3:35:17 PM

On 01/21/2014 02:46 PM, Öö Tiib wrote:
> On Tuesday, 21 January 2014 19:48:01 UTC+2, James Kuyper wrote:
>> On 01/21/2014 12:22 PM, Rick C. Hodgin wrote:
>>
>>> It works quite well. :-) Binary. Always binary. :-)
>>
>> That's a recipe for locking your code to a single platform. If it
>> simplified your code in any way, that might make sense for code that you
>> are certain will never be ported anywhere else (though such certainty is
>> often delusional). However, it actually makes your code (very slightly)
>> more complicated.
>
> Software can use text mode only for files produced by itself for itself
> on same platform. That is sort of narrow corner case today.

In my experience, the special features of text mode as compared to
binary mode are conventions associated with operating systems. As such,
files adhering to those conventions can be used to communicate between
any two programs compiled for that operating system, whether or not
they're running on the same platforms or different platforms.
I wouldn't be surprised to learn that there are conventions for the
layout of text files that are associated with things other than
operating systems - but offhand I can't think of any.

> Life is complicated and world is interconnected and so varying
> line endings of different platforms are nuisance that one has to
> deal with by supporting them all. If software must work on Mac,
> Windows and Linux and should eat text files produced by itself
> and other text editors on Mac, Windows and Linux then
> "always binary" is good choice.

I think using specialized routines to translate line endings (such as
dos2unix) is the more reasonable way to go. Otherwise a file editing
program could end up creating a document containing a mixture of Mac,
Windows, and Linux line endings. Then you have to decide how to
interpret the result: how many line endings does "\r\n\r" encode? How
many does "\n\r\n" encode?

How many programs do you know of that can correctly handle some of the
other, more exotic possibilities in use by currently existing systems,
such as lines whose length is not indicated by a special character at
the end of the line, but by a count at the beginning? Or block-oriented
files where all of the characters between the end of a line and the end
of a block are null (or even more confusing, blanks)? The standard
defines processing of text-mode files in ways that are compatible with
all of those possibilities. As a result, my text-oriented C code doesn't
need to know anything about those possibilities, I just let the
<stdio.h> library take care of it. My code will work on such systems
without modification, and without me having to write it in any way that
is different from the way I would write it if I were only targeting the
unix-like systems where it normally runs.
--
James Kuyper

Ian Collins

Jan 21, 2014, 3:48:27 PM

James Kuyper wrote:
> On 01/21/2014 02:46 PM, Öö Tiib wrote:
>> On Tuesday, 21 January 2014 19:48:01 UTC+2, James Kuyper wrote:
>>> On 01/21/2014 12:22 PM, Rick C. Hodgin wrote:
>>>
>>>> It works quite well. :-) Binary. Always binary. :-)
>>>
>>> That's a recipe for locking your code to a single platform. If it
>>> simplified your code in any way, that might make sense for code that you
>>> are certain will never be ported anywhere else (though such certainty is
>>> often delusional). However, it actually makes your code (very slightly)
>>> more complicated.
>>
>> Software can use text mode only for files produced by itself for itself
>> on same platform. That is sort of narrow corner case today.
>
> In my experience, the special features of text mode as compared to
> binary mode are conventions associated with operating systems. As such,
> files adhering to those conventions can be used to communicate between
> any two programs compiled for that operating system, whether or not
> they're running on the same platforms or different platforms.
> I wouldn't be surprised to learn that there are conventions for the
> layout of text files that are associated with things other than
> operating systems - but offhand I can't think of any.

Most if not all of the programmer's editors I've used on Windows
recognise Unix line endings and gcc on Unix recognises Windows endings.
Text mode is something of a curse!

--
Ian Collins

Joe keane

Jan 21, 2014, 4:03:47 PM

In article <ln8uua1...@nuthaus.mib.org>,
Keith Thompson <ks...@mib.org> wrote:
>But the array whose first element s points to is still
>just 6 characters long, and unlike string literals, an object created by
>a compound literal has automatic storage duration (it ceases to exist when you
>leave the enclosing block).

How about this?

@ cat bar.c
char *bar[2] =
{
"jjj",
(char []) { "kkk" },
};
@ cc -S bar.c
@ cat bar.s
.file "bar.c"
.data
.type __compound_literal.0, @object
.size __compound_literal.0, 4
__compound_literal.0:
.string "kkk"
.globl bar
.section .rodata
.LC0:
.string "jjj"
.data
.align 4
.type bar, @object
.size bar, 8
bar:
.long .LC0
.long __compound_literal.0
.ident "GCC: (NetBSD nb2 20110806) 4.5.3"

Rick C. Hodgin

Jan 21, 2014, 4:39:55 PM

On Tuesday, January 21, 2014 3:16:54 PM UTC-5, Eric Sosman wrote:
> I'm one who would not readily change his mind, because (in
> part) as things stand I can write stuff like:
>
> const char *archiveFormats[] = {
> #if CPIO_SUPPORTED
> "cpio",
> #endif
> #if TAR_SUPPORTED
> "tar",
> #endif
> #if ZIP_SUPPORTED
> "ZIP",
> #endif
> #if APK_SUPPORTED
> "apk",
> #endif
> };
>
> It's *possible* to manage this sort of thing without introducing
> an extra comma, but it's ugly as all-get-out:

> [snip]

Try this, and then just always start at 1 instead of 0, and process until
you reach null:

const char *archiveFormats[] = {
null
#if CPIO_SUPPORTED
,"cpio"
#endif
#if TAR_SUPPORTED
,"tar"
#endif
#if ZIP_SUPPORTED
,"ZIP"
#endif
#if APK_SUPPORTED
,"apk"
#endif
,null
};

Keith Thompson

Jan 21, 2014, 4:58:55 PM

Ian Collins <ian-...@hotmail.com> writes:
> James Kuyper wrote:

Not all Unix tools tolerate Windows-style line endings. For example,
if you write:

if [ "$x" = 42 ] ; then
echo ok
fi

in a bash script, and the script file uses Windows-style line endings,
bash will complain that "then\r" is an unrecognized token. (Except that
it will print the "\r" literally, causing a very confusing error
message.)

Blindly using "foreign" format text files on any system is not a good
idea.

James Kuyper

Jan 21, 2014, 5:01:53 PM

As he said: ugly. The two extra nulls (and "null" needs to be defined)
seem far worse to me than the extra comma - they survive into the object
file, and even into the final executable, taking up extra space. The
extra comma disappears during translation phase 7 and has no impact on
the actual executable.
--
James Kuyper

Keith Thompson

Jan 21, 2014, 5:03:20 PM

"Rick C. Hodgin" <rick.c...@gmail.com> writes:

> On Tuesday, January 21, 2014 11:31:07 AM UTC-5, James Kuyper wrote:
>> On 01/21/2014 07:58 AM, Rick C. Hodgin wrote:

>> > Why do we have "void function(void)" when "function()" would work

>> > sufficiently at that level in a source file? It's explicitly so we
>> > convey that no mistake was given as by accidentally leaving out some
>> > parameters. We declare void to indicate "I purposefully intended to
>> > leave this empty, there are no return variables, there are no
>> > parameters," and so on.
>>
>> Backwards compatibility. In C, as originally designed, there were no
>> function prototypes. When the language was first standardized,
>> prototypes were added. If the committee had followed your suggestion,
>> virtually all existing code would have had to be rewritten in order to
>> compile correctly. That was, at the time of standardization, already
>> billions of lines of code world-wide. It's easy to say "don't worry
>> about having to re-write all the old code" - unless you're the one who
>> has to re-write it.
>>
>> Instead, the committee decided to allow the old syntax to continue
>> meaning what it used to mean: the function takes an unknown but fixed
>> number of arguments (as opposed to variadic functions, which take an
>> unknown and variable number of arguments). As a result, a different
>> syntax was needed to specify that the function doesn't take any arguments.
>
> I'm not sure I would've been keen on that idea. I would rather have
> maintained it as a deprecated functionality that would have been
> slated to be removed in a few version releases. The old compilers
> could've generated object code in a particular version of a compiler
> that could be maintained for backward compatibility without negating
> the language in moving forward. My opinion. :-)

(Reformatting your long lines *again*.)

That's exactly what they did. I gave you a link to a recent draft
of the C standard. Take a look at section 6.11.6:

The use of function declarators with empty parentheses (not
prototype-format parameter type declarators) is an obsolescent
feature.

I'm personally not happy with how long it's taken to actually remove the
feature, but it's been officially obsolescent (which means that it may
be considered for withdrawal in future revisions of the standard) since
1989.

[...]

Rick C. Hodgin

Jan 21, 2014, 5:03:51 PM

On Tuesday, January 21, 2014 4:58:55 PM UTC-5, Keith Thompson wrote:
> > Most if not all of the programmer's editors I've used on Windows
> > recognise Unix line endings and gcc on Unix recognises Windows endings.
> > Text mode is something of a curse!
>
> Not all Unix tools tolerate Windows-style line endings. For example,
> if you write:
>
> if [ "$x" = 42 ] ; then
> echo ok
> fi
>
> in a bash script, and the script file uses Windows-style line endings,
> bash will complain that "then\r" is an unrecognized token. (Except that
> it will print the "\r" literally, causing a very confusing error
> message.)
>
> Blindly using "foreign" format text files on any system is not a good
> idea.

It's why my algorithm looks for \r or \n in any order, and then checks the
character after for the alternate (\r\n or \n\r combinations). If found,
it considers that grouping to be one newline. If not, it considers the single
character to be one newline. Then it continues parsing.

bash sounds like it needs some post-rebirth rehabilitation. :-)

Rick C. Hodgin

Jan 21, 2014, 5:15:33 PM

On Tuesday, January 21, 2014 5:01:53 PM UTC-5, James Kuyper wrote:
> As he said: ugly. The two extra nulls (and "null" needs to be defined)
> seem far worse to me than the extra comma - they survive into the object
> file, and even into the final executable, taking up extra space. The
> extra comma disappears during translation phase 7 and has no impact on
> the actual executable.

There's a part of me that agrees with you. I would go to lengths to avoid
having this kind of issue. Since this is a heavily used feature, I would
probably create some type of generic tool to distribute around which is an
on-the-fly builder capable of preparing lists, and then returning the source
code. And it would know how to handle commas.

Rick C. Hodgin

Jan 21, 2014, 5:24:33 PM

> > I'm not sure I would've been keen on that idea. I would rather have
> > maintained it as a deprecated functionality that would have been
> > slated to be removed in a few version releases. The old compilers
> > could've generated object code in a particular version of a compiler
> > that could be maintained for backward compatibility without negating
> > the language in moving forward. My opinion. :-)
>
> (Reformatting your long lines *again*.)

I see the text in a window on Google Groups which is about 72 characters
wide. I have to manually insert carriage returns to break it up. I
sometimes forget. I apologize.

What news reader are you using? Try groups.google.com and subscribe to the
comp.lang.c group.

> That's exactly what they did. I gave you a link to a recent draft
> of the C standard. Take a look at section 6.11.6:
> The use of function declarators with empty parentheses (not
> prototype-format parameter type declarators) is an obsolescent
> feature.

Awesome! :-) To quote the three drones from Voyager, former members of the
tertiary adjunct of unimatrix one that Seven of Nine was in, one who appears
very much like Admiral Forrest from ST:Enterprise, "we have consensus."

> I'm personally not happy with how long it's taken to actually remove the
> feature, but it's been officially obsolescent (which means that it may
> be considered for withdrawal in future revisions of the standard) since
> 1989.

I hear you. Always that backward compatibility. It's why it's important to
include dates. We have them in our U.S. Constitution even.

From Section 3 of the 21st Amendment:
This article shall be inoperative unless it shall have been
ratified as an amendment to the Constitution ... within seven
years from the date of the submission hereof to the States
by the Congress.

Seven years is a good time period. It's the biblical period of forgiveness
(Deu 15:1), "At the end of every seven years you must cancel debts." How
the world would be better were that guidance followed.

Keith Thompson

Jan 21, 2014, 5:39:14 PM

"Rick C. Hodgin" <rick.c...@gmail.com> writes:

> On Tuesday, January 21, 2014 12:55:45 PM UTC-5, Keith Thompson wrote:
>> >> In your proposed C-like language, what would this snippet print?
>> >> for (int i = 0; i < 2; i ++) {
>> >> char *s = "hello";
>> >> if (i == 0) {
>> >> s[0] = 'H';
>> >> }
>> >> puts(s);
>> >> }
>> >
>> > In my proposed language, it would print "Hello" both times because the
>> > char* s definition would've been pulled out of the loop and defined as a
>> > function variable.
>>
>> That's fine; if you don't want to write code like that, you don't have
>> to. But I didn't ask how you'd re-write it; I asked how *that code*
>> should behave.
>
> I answered you. How should it behave?

I don't believe you did answer me.

> In my compiler, I would pull the variable out and make it a function-variable
> defined at the top, so it would've been altered the first time through and
> both times would print Hello.

Do you mean by that that you would *change the source code I posted* so
that s is declared at a higher level? The result might be a better
program, but it's a different program than the one I posted, so that
doesn't answer my question at all.

Or do you mean that the compiler would implicitly do the equivalent of
moving s to a higher level? If so, it's unclear what that would mean.

How *should* it behave? In standard C, the behavior is undefined,
because it attempts to modify a string literal. I have no interest in
changing that rule (well, I'd prefer string literals to be const, but I
understand why they're not), so I have no further answer. You're the
one proposing changes; I'm asking you for details on how you can make
those changes consistently.

>> You're proposing (I think) a change to the language. That change would
>> affect compiler writers as well as developers. A compiler writer needs
>> to do *something* with the specific code I wrote above.
>
> No ... I'm creating my own new language, RDC, which is C-like, but dumps a
> lot of what I view as "hideous baggage left over from a bygone era" ... while
> also adding a lot of new features I see as looking to the future of multiple
> cores, GUI developer environments, touch screens, eventual 3D interfaces, and
> more.

Ok. Then why are you discussing your non-C language in comp.lang.c?
Perhaps comp.lang.misc would be of interest to you.

[snip]

Keith Thompson

Jan 21, 2014, 5:41:16 PM

j...@panix.com (Joe keane) writes:
> In article <ln8uua1...@nuthaus.mib.org>,
> Keith Thompson <ks...@mib.org> wrote:
>>But the array whose first element s points to is still
>>just 6 characters long, and unlike string literals, an object created by
>>a compound literal has automatic storage duration (it ceases to exist
>>when you leave the enclosing block).
>
> How about this?
>
> @ cat bar.c
> char *bar[2] =
> {
> "jjj",
> (char []) { "kkk" },
> };
> @ cc -S bar.c
> @ cat bar.s

[SNIP]

I don't understand assembly language well enough to figure out what
point you're making.
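Joe's example mixes two kinds of initializer. In standard C (C99 and later), `"jjj"` is a string literal, so writing through `bar[0]` is undefined behavior, while `(char []) { "kkk" }` is a compound literal; at file scope it has static storage duration and is an ordinary writable array. That is likely the point the assembly listing was meant to show, though this is an inference. Applied to the `list` from the start of the thread, a minimal sketch, assuming a C99 compiler (compound literals are not part of standard C++):

```c
#include <stdio.h>
#include <string.h>

/* Each initializer is a compound literal, not a bare string literal.
   At file scope a compound literal has static storage duration, so every
   element points to its own writable char array (sized from its
   initializer, e.g. 4 bytes for "one"). */
char *list[] = {
    (char []) { "one" },
    (char []) { "two" },
    (char []) { "three" },
    (char []) { "four" },
};

/* Prints every entry, one per line. */
void print_list(void)
{
    for (size_t i = 0; i < sizeof list / sizeof list[0]; i++)
        puts(list[i]);
}
```

With this, `memcpy(list[0], "eno", 3);` is well-defined, and a following `print_list()` shows `eno` first. Inside a function body the same compound literals would have automatic storage duration, which is the caveat Keith quotes above.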

glen herrmannsfeldt

Jan 21, 2014, 5:45:56 PM

James Kuyper <james...@verizon.net> wrote:

(snip)

> In my experience, the special features of text mode as compared to
> binary mode are conventions associated with operating systems. As such,
> files adhering to those conventions can be used to communicate between
> any two programs compiled for that operating system, whether or not
> they're running on the same platforms or different platforms.
> I wouldn't be surprised to learn that there are conventions for the
> layout of text files that are associated with things other than
> operating systems - but offhand I can't think of any.

I believe that HTTP (and so HTML) is OS independent and, as far as
I know, uses "\r\n" line endings.

-- glen

glen herrmannsfeldt

Jan 21, 2014, 5:50:29 PM

James Kuyper <james...@verizon.net> wrote:

(snip)

>> It's the same here. "Oh, another comma ... was the developer
>> finished? Was there supposed to be more? What is missing? What was
>> left out? Please ... I need answers. I'm left hanging in a state of
>> confusion that is disrupting my soul. Whatever do I do?"

> I don't use that feature, and I don't like it. However, this feature
> simplifies the creation of machine-generated C code, and people who
> write such generators are apparently sufficiently numerous that the
> committee felt a need to accommodate their desires.

It does, and I do sometimes generate look-up tables.

But I believe that simplifying the use of the preprocessor is
a more important use. One can #ifdef table entries, without
a special case for the last one. (Since you don't know which one
will be the last.)

The second-best choice would be to waste the last entry on a
null, zero, or some other useless item. That complicates a lot of
other code, though.
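glen's point in code, as a minimal sketch: each optional entry carries its own trailing comma, so whichever one ends up last needs no special case, and the element count comes from `sizeof` rather than from a sentinel entry. The `HAVE_FOO`/`HAVE_BAR` switches and `has_feature` are hypothetical names for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical build switches, for illustration only. */
#define HAVE_FOO 1
/* HAVE_BAR is deliberately left undefined. */

static const char *const features[] = {
    "base",
#ifdef HAVE_FOO
    "foo",
#endif
#ifdef HAVE_BAR
    "bar",
#endif
};  /* whichever entry ends up last keeps its trailing comma; C allows it */

/* Count derived from the array itself, so no sentinel entry is needed. */
#define FEATURE_COUNT (sizeof features / sizeof features[0])

/* Returns 1 if name is in the table, 0 otherwise. */
int has_feature(const char *name)
{
    for (size_t i = 0; i < FEATURE_COUNT; i++)
        if (strcmp(features[i], name) == 0)
            return 1;
    return 0;
}
```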

-- glen

Robert Wessel

Jan 21, 2014, 6:54:28 PM

On Tue, 21 Jan 2014 14:24:33 -0800 (PST), "Rick C. Hodgin"
<rick.c...@gmail.com> wrote:

>> > I'm not sure I would've been keen on that idea. I would rather have
>> > maintained it as a deprecated functionality that would have been
>> > slated to be removed in a few version releases. The old compilers
>> > could've generated object code in a particular version of a compiler
>> > that could be maintained for backward compatibility without negating
>> > the language in moving forward. My opinion. :-)
>>
>> (Reformatting your long lines *again*.)
>
>I see the text in a window on Google Groups which is about 72 characters
>wide. I have to manually insert carriage returns to break it up. I
>sometimes forget. I apologize.
>
>What news reader are you using? Try groups.google.com and subscribe to the
>comp.lang.c group.

That's precisely the problem. Google Groups is an atrocious newsgroup
reader; use something better. There are a number of free
services (Eternal September is popular) and a number of free clients.

Keith Thompson

Jan 21, 2014, 6:58:37 PM

"Rick C. Hodgin" <rick.c...@gmail.com> writes:

That's workable if your tool runs only on systems that use one of \r,
\n, \r\n, or \n\r to mark line endings. (And either you treat \n\n as
an empty line, or you can safely ignore empty lines.) But it breaks
down if you want to *write* text files.

C has text mode for a reason. Take a moment to consider the bare
possibility that the people who designed it were not idiots.
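What text mode buys, as a minimal sketch: the program writes and reads a plain '\n', and the C library converts to and from whatever line ending the host uses (on Windows, "\r\n" on disk). The function name and the file path used with it are hypothetical placeholders.

```c
#include <stdio.h>

/* Writes two lines in text mode ("w"), reads the file back in text mode
   ("r"), and counts '\n' characters. Where the native line ending is not
   a single '\n', the library translates in both directions, so the
   program only ever sees '\n'. Returns -1 on an I/O error. */
long text_mode_roundtrip(const char *path)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    fputs("first line\n", f);
    fputs("second line\n", f);
    if (fclose(f) != 0)
        return -1;

    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    long newlines = 0;
    int c;
    while ((c = fgetc(f)) != EOF)
        if (c == '\n')
            newlines++;
    fclose(f);
    remove(path);          /* clean up the scratch file */
    return newlines;
}
```

Opening the same file with "rb" instead would expose the on-disk bytes, which is where a hand-written \r/\n scanner becomes necessary.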

Keith Thompson

Jan 21, 2014, 7:01:14 PM

"Rick C. Hodgin" <rick.c...@gmail.com> writes:

>> > I'm not sure I would've been keen on that idea. I would rather have
>> > maintained it as a deprecated functionality that would have been
>> > slated to be removed in a few version releases. The old compilers
>> > could've generated object code in a particular version of a compiler
>> > that could be maintained for backward compatibility without negating
>> > the language in moving forward. My opinion. :-)
>>
>> (Reformatting your long lines *again*.)
>
> I see the text in a window on Google Groups which is about 72 characters
> wide. I have to manually insert carriage returns to break it up. I
> sometimes forget. I apologize.
>
> What news reader are you using? Try groups.google.com and subscribe to the
> comp.lang.c group.

groups.google.com is the problem. Google provides a web interface to
Usenet, something that predates the web and even the Internet. Google
has done a horribly poor job with their interface and has been
unresponsive to complaints.

I use the news.eternal-september.org free Usenet server. The client I
use is Gnus, which runs under Emacs. Mozilla Thunderbird is another
popular client.

Rick C. Hodgin

Jan 21, 2014, 8:15:56 PM

On Tuesday, January 21, 2014 5:39:14 PM UTC-5, Keith Thompson wrote:

The compiler would receive the definition of char* s where it is, but
it would logically create it as a local variable within the single
function. In short, I would not allow scoped variables within a block
within a function. I would have them all defined as local variables,
and they would all be available for use inside or outside of the block
they were defined in.
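Both readings of Keith's loop can be written in standard C without modifying a string literal. This sketch records what each version would print instead of calling puts(). With `s` declared inside the loop as an array, it is re-initialized from "hello" on every pass, so the output is "Hello" then "hello"; moving the array to function scope (roughly what Rick describes his compiler doing implicitly) initializes it once, and both passes see "Hello". The function names are illustrative.

```c
#include <string.h>

/* s declared inside the loop body: an automatic array re-initialized
   from "hello" each time the declaration is reached, so the edit made
   on pass 0 is lost on pass 1. Returns the two printed lines. */
const char *block_scope_version(void)
{
    static char out[16];
    out[0] = '\0';
    for (int i = 0; i < 2; i++) {
        char s[] = "hello";   /* writable copy, not the literal itself */
        if (i == 0)
            s[0] = 'H';
        strcat(out, s);
        strcat(out, "\n");
    }
    return out;
}

/* s hoisted to function scope: initialized once before the loop, so
   the edit persists into pass 1. */
const char *hoisted_version(void)
{
    static char out[16];
    char s[] = "hello";
    out[0] = '\0';
    for (int i = 0; i < 2; i++) {
        if (i == 0)
            s[0] = 'H';
        strcat(out, s);
        strcat(out, "\n");
    }
    return out;
}
```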

> How *should* it behave? In standard C, the behavior is undefined,
> because it attempts to modify a string literal. I have no interest in
> changing that rule (well, I'd prefer string literals to be const, but I
> understand why they're not), so I have no further answer. You're the
> one proposing changes; I'm asking you for details on how you can make
> those changes consistently.

In my case, it would not be a constant, but would be a string defined to
be the initial value indicated.

> >> You're proposing (I think) a change to the language. That change would
> >> affect compiler writers as well as developers. A compiler writer needs
> >> to do *something* with the specific code I wrote above.
> > No ... I'm creating my own new language, RDC, which is C-like, but dumps a
> > lot of what I view as "hideous baggage left over from a bygone era" ...
> > while also adding a lot of new features I see as looking to the future
> > of multiple cores, GUI developer environments, touch screens, eventual
> > 3D interfaces, and more.
> Ok. Then why are you discussing your non-C language in comp.lang.c?
> Perhaps comp.lang.misc would be of interest to you.

Perhaps. It is/was all backstory to my original question, the explanation
of why I believe the strings in char* list[] = { "one", "two", "three" }
should be read/write.
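Within standard C as it stands, one way to get a read/write table without naming every line separately (the work-around from the start of the thread) is a two-dimensional char array: each row is its own writable array initialized from the literal. The cost is that every row is as wide as the declared column count; 32 below is an assumed maximum that must cover the longest line plus its terminating null.

```c
#include <stddef.h>
#include <string.h>

/* Each row is a separate, modifiable char array copied from its
   initializer; rows shorter than 32 bytes are padded with zeros. */
char editableSourceCode[][32] = {
    "if (something[9999])\r\n",
    "{\r\n",
    " // Do something\r\n",
    "} else {\r\n",
    " // Do something else\r\n",
    "}",
};

/* Number of lines, computed from the array so adding lines needs no
   manual count update. */
#define LINE_COUNT (sizeof editableSourceCode / sizeof editableSourceCode[0])
```

The placeholder can then be patched in place, e.g. `memcpy(&editableSourceCode[0][14], "0001", 4);` turns the first line into `if (something[0001])\r\n`; the zero-padded four-digit format is only an assumption here.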

Rick C. Hodgin

Jan 21, 2014, 8:17:06 PM

> I don't understand assembly language well enough to figure out what
> point you're making.

I do understand assembly language, but I still didn't understand the
point being made.

Rick C. Hodgin

Jan 21, 2014, 8:22:18 PM

> > It's why my algorithm looks for \r or \n in any order, and then checks the
> > character after for the alternate (\r\n or \n\r combinations). If found,
> > it considers that grouping to be one newline. If not, it considers the
> > single character to be one newline. Then it continues parsing.
>
> That's workable if your tool runs only on systems that use one of \r,
> \n, \r\n, or \n\r to mark line endings. (And either you treat \n\n as
> an empty line, or you can safely ignore empty lines.) But it breaks
> down if you want to *write* text files.

Not at all. If it finds \r\n it is a single newline. If it finds \n\r it
is a single newline. If it finds \n\n it stops after the first \n and
considers it its own newline, and then continues parsing and encounters
the second \n and it is also its own newline. \n\n would be a double space.
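The line-splitting rule described here, sketched as a newline counter. The function name is illustrative, and treating "\r\r" the same way as "\n\n" is an assumption; the post only spells out the "\n\n" case.

```c
#include <stddef.h>

/* A '\r' or '\n' ends a line. If the next character is the other member
   of the pair ("\r\n" or "\n\r"), both are consumed as one newline.
   Two identical characters in a row ("\n\n", and by assumption "\r\r")
   count as two newlines, i.e. an empty line. */
size_t count_newlines(const char *text)
{
    size_t count = 0;
    for (const char *p = text; *p != '\0'; p++) {
        if (*p == '\r' || *p == '\n') {
            char other = (*p == '\r') ? '\n' : '\r';
            if (p[1] == other)
                p++;           /* swallow the second half of the pair */
            count++;
        }
    }
    return count;
}
```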

> C has text mode for a reason. Take a moment to consider the bare
> possibility that the people who designed it were not idiots.

It's interesting that such a handy helper feature as text mode exists
to "help" developers, while other, more obvious assistance features are
left out completely, such as guaranteeing that certain variable types are
a specified number of bits on every platform.
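For reference, C99 added `<stdint.h>`, which covers part of this. A minimal sketch, assuming a platform that provides the exact-width types (mainstream ones do); the variable names are illustrative.

```c
#include <limits.h>
#include <stdint.h>

/* Exact-width types: exactly N bits, no padding bits, two's complement
   for the signed ones. They are optional in the standard, but an
   implementation must provide them whenever it has a matching type. */
int32_t  sample_i32 = INT32_MIN;
uint16_t sample_u16 = UINT16_MAX;

/* Minimum-width and "fastest" types are always provided: at least
   32 bits, possibly more. */
int_least32_t sample_least32 = INT_LEAST32_MAX;
int_fast32_t  sample_fast32  = INT_FAST32_MAX;

/* Width of int32_t in bits, computed at compile time (always 32). */
enum { I32_BITS = (int)(sizeof(int32_t) * CHAR_BIT) };
```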

For the record, I believe C is one of the best languages ever constructed.
I also believe it has many, many flaws. I hope to undo many of them with
my effort.

Rick C. Hodgin

Jan 21, 2014, 8:35:38 PM

On Tuesday, January 21, 2014 7:01:14 PM UTC-5, Keith Thompson wrote:
> "Rick C. Hodgin" <rick...n@gmail.com> writes:

> >> (Reformatting your long lines *again*.)
> > I see the text in a window on Google Groups which is about 72 characters
> > wide. I have to manually insert carriage returns to break it up. I
> > sometimes forget. I apologize.
>
> > What news reader are you using? Try groups.google.com and subscribe to the
> > comp.lang.c group.
>
> groups.google.com is the problem. Google provides a web interface to
> Usenet, something that predates the web and even the Internet. Google
> has done a horribly poor job with their interface and has been
> unresponsive to complaints.
>
> I use the news.eternal-september.org free Usenet server. The client I
> use is Gnus, which runs under Emacs. Mozilla Thunderbird is another
> popular client.

I cannot help but consider the fact that Google Groups provides a free
web-based interface which removes shortcomings of text-based Usenet
groups. It allows HTML messages, longer lines with automatic wrapping,
immediate access to many groups, complex searching, and more.

It seems that the future may be speaking, in an attempt to bring Usenet
into the 2010s and beyond.

Text-based interfaces were nice ... they used the technology available at
the time (limited disk space, limited memory, slower clock speeds). But
the technology of the 2010s is significantly beyond anything we've had
previously. Most modern multi-core desktops, with 8+ GB of memory,
1+ TB of disk storage, and an average to high-end GPU, have more computing
power than supercomputers did 15+ years ago.

GUIs provide a far better user experience, and are only becoming more
common as time goes on. Smart phones. Tablets. Touch screen. We're
changing our computing needs.