Why does getenv() return char*, not const char*?

Seungbeom Kim

Apr 25, 2004, 1:55:23 AM

The standard clearly states that the string pointed to by the return
value of getenv() shall not be modified by the program, which seems
to be a perfect case for using const char* as the return type.
Then why is getenv() specified to return char*, not const char*?

--
Seungbeom Kim

lawrenc...@ugsplm.com

Apr 25, 2004, 4:05:38 PM

Seungbeom Kim <sb...@stanford.edu> wrote:
>
> Then why is getenv() specified to return char*, not const char*?

Because getenv() predates const and the committee didn't want to break
all the existing code that used it.

-Larry Jones

Santa's gonna skip this block for years. -- Calvin

Dan Pop

Apr 26, 2004, 9:56:11 AM

Haven't you noticed the parallel with string literals?

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Dan...@ifh.de

David R Tribble

Apr 26, 2004, 11:46:56 AM

Seungbeom Kim wrote:
>> Then why is getenv() specified to return char*, not const char*?

Larry Jones wrote:
> Because getenv() predates const and the committee didn't want to break
> all the existing code that used it.

E.g., code such as:

#include <stdlib.h>

void myfunc(char *tag)
{
    /* ...code which does not modify tag[*]... */
    /* ...but which was written pre-C89... */
}

void bar(char *name)
{
    myfunc(getenv(name));   /* [A] would need a cast if getenv() returned const char * */
}

It's the same reason that some of the C89 (and C99) library functions
take a 'const char *' but return a 'char *', such as:

char * strchr(const char *s, int c);
// Returns a non-const pointer into s[].

If you look closely at the C standard, you'll notice that string literals
(string constants) are not type 'const char[n]', but just 'char[n]'.
Again, this is for the same reason - not to break existing code. It's the
opposite in C++, of course.
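The typing claim can be checked with a compiler. What follows is a minimal sketch, assuming a C11 compiler (it relies on `_Generic` and on `static_assert` from `<assert.h>`); the macro and function names are made up for illustration and are not part of any standard header. It classifies an expression as `char *` or `const char *` without evaluating it.

```c
#include <assert.h>
#include <stdlib.h>

/* 1 if the expression has type char * (after array-to-pointer decay),
   0 if it has type const char *, -1 for anything else.
   The controlling expression of _Generic is never evaluated. */
#define IS_NON_CONST_CHAR_PTR(e) \
    _Generic((e), char *: 1, const char *: 0, default: -1)

/* Checked at compile time: a C string literal decays to plain char *. */
static_assert(IS_NON_CONST_CHAR_PTR("abc") == 1,
              "in C, a string literal decays to plain char *");

/* The same classification, available at run time. */
int literal_is_non_const(void)
{
    return IS_NON_CONST_CHAR_PTR("abc");
}

/* getenv() is declared to return char *, so it classifies the same way. */
int getenv_is_non_const(void)
{
    return IS_NON_CONST_CHAR_PTR(getenv("HOME"));
}
```

Both functions return 1 when compiled as C. Compiled as C++ the literal would have a const-qualified type, which is the difference described above (though `_Generic` itself is a C feature).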

And yes, there have been proposals to make C string constants have const
types:
http://david.tribble.com/text/c9xstr.txt

-drt

Paul D. Smith

Apr 28, 2004, 2:18:47 AM

%% da...@tribble.com (David R Tribble) writes:

drt> Seungbeom Kim wrote:
>>> Then why is getenv() specified to return char*, not const char*?

drt> Larry Jones wrote:
>> Because getenv() predates const and the committee didn't want to break
>> all the existing code that used it.

drt> It's the same reason that some of the C89 (and C99) library functions
drt> take a 'const char *' but return a 'char *', such as:

drt> char * strchr(const char *s, int c);
drt> // Returns a non-const pointer into s[].

A quibble, but it doesn't seem to me that this is true (that the same
reasoning is used for both of these definitions).

For getenv() I don't see any reason why the return value shouldn't be
const except for backward-compatibility to pre-standard programs; as far
as I recall the standard has always said that the strings returned from
getenv() are not writable.

However, having library functions like strchr() etc. is different: these
functions don't return const because if they did, they would be
ridiculously hard to use correctly.

The large majority of the time when you call strchr() you are not
passing it a const; it's obviously not a requirement that only constant
strings can be passed to strchr! Rather, the const in the argument list
tells the user that the function (strchr) won't change the string.

If strchr were to return a const, then the large majority of the time
you would have to cast the return value to get rid of the const before
the value would be useful to you.
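A small sketch of the usability argument, under the C89/C99 declaration of strchr() quoted above. The helper names `truncate_at` and `offset_of` are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Cut a modifiable string at the first occurrence of c.
   Because strchr() returns char *, we can write through the
   result without a cast. Returns 1 if the string was shortened. */
int truncate_at(char *s, int c)
{
    char *p = strchr(s, c);
    if (p == NULL)
        return 0;
    *p = '\0';
    return 1;
}

/* Read-only caller: the same strchr() accepts a const string, and
   the caller chooses to keep the result const. Returns the offset
   of c in s, or strlen(s) if c does not occur. */
size_t offset_of(const char *s, int c)
{
    const char *p = strchr(s, c);
    return p ? (size_t)(p - s) : strlen(s);
}
```

With a `const char *` return type, `truncate_at` would need a cast on the strchr() result even though the buffer it was handed is perfectly writable.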

--
-------------------------------------------------------------------------------
Paul D. Smith <psm...@nortelnetworks.com> HASMAT--HA Software Mthds & Tools
"Please remain calm...I may be mad, but I am a professional." --Mad Scientist
-------------------------------------------------------------------------------
These are my opinions---Nortel Networks takes no responsibility for them.

John Mashey

May 23, 2004, 8:47:44 PM

lawrenc...@ugsplm.com wrote in message news:<q3htl1-...@jones.homeip.net>...

> Seungbeom Kim <sb...@stanford.edu> wrote:
> >
> > Then why is getenv() specified to return char*, not const char*?
>
> Because getenv() predates const and the committee didn't want to break
> all the existing code that used it.

Not only did getenv() precede const, but there is more history.
In some ways, it is slightly strange (historically) that the standard
says:
a) You can't modify strings pointed at by return value of getenv().
b) You *can* modify strings pointed at by argv pointers.

History:
1) Starting in 1975, the PWB/UNIX shell had (single-letter) variables,
of which one in particular ($s) was initialized by the shell to the
user's home directory, which it got from some PWB-extra data kept in
the per-process data area. The $p variable was initialized to the
contents of $s/.path if such existed, or to ":/bin/:/usr/bin" if it
didn't.

2) During 1977, in particular, there was a long set of discussions
about moving to Steve Bourne's new shell, at least partly with the
idea of consolidating the mess of different UNIX shell variants that
had grown up, either directly from the original Thompson shell, or
indirectly from it via PWB or USG shells.

3) The PWB shell variables and the variable path-search features had
proved extremely useful, but were very limited. Among other things,
variables were not inherited in any general way. It seemed that we
needed to do something better in conjunction with wide introduction of
the Bourne shell, if we were going to convince people to switch
happily.

4) There were discussions among many people, but particularly, Steve
Bourne, Dennis Ritchie and I thrashed through a lot of different
possibilities regarding the semantics and implementation of what
became "the environment".
We explored various grand schemes of kernel-internal associative
memories kept per process group, with complicated protection schemes
and concerns about side-effects, and plenty of function/syscalls for
manipulating them.
Nothing was very simple. Fortunately:

5) In typical UNIX fashion, Dennis suggested that the environment
could just be handled as an extra set of argc/argv-like pointers and
strings normally passed automatically upon exec, which most programs
would never modify, but which could be interrogated without a system
call. Programs that wanted to modify the environment could do so
explicitly, just like argv manipulation.

Thus, the mechanisms for initializing the environment would be the
same as for argv. The storage cost would accrue in user programs,
rather than (really precious) kernel memory, although it would add
some overhead to exec.

For minimality, the *only* C-level function provided was getenv(3), on
the belief that many programs needed simple access to environment
variables, but very few needed to delete them, change them, etc, and
if (a few) people were doing that, they could just go ahead and write
code appropriate to their needs, which could either be fairly simple
(like for the "env" command) or more complex (like the shell).

One might fairly complain that we should have thought harder about
supplying a complete set of *env functions [akin to the putenv /
setenv / unsetenv / clearenv functions that have since grown up].
However, note that there have been lots of arguments over the
implementations of these things over the years, and different systems
have done them differently.

Also, for context, recall that there were still many PDP-11s around;
a typical PDP-11/45 had 248KB of memory and a PDP-11/70 1MB, with the
former running perhaps 16 users and the latter up to 48 (but usually
less); both were limited to 64KB instruction plus 64KB data memory.
Heavyweight features were still viewed with suspicion, and we were
interested in supplying a minimalist feature set good enough to
handle the problems we knew we had.

6) For string constants, many implementors have wanted them guaranteed
constant for decades, for storage reduction and performance tricks
like:
a) Keeping only one copy of a given literal string.
b) Including them in a read-only text segment (on some machines with
PC-relative addressing, this can be helpful).
c) Putting them in a read-only data segment, if the OS supported that,
thus letting them be shared amongst processes running the same
executable.

On the other hand, 7th Edition UNIX environment variables were really
thought of as convenient, usually-hidden extra arguments, with no more
read-onlyness than regular arguments.

I would guess that the viewpoint change comes from having a set of
functions to modify the environment, and wanting to better hide the
data.
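The argv-like layout described in item 5 can be sketched as follows. This is an illustration only, not the 7th Edition source: it assumes the conventional "NAME=value" string format, and the function name `lookup_env` is hypothetical. The point it shows is that a lookup is just a scan over strings already in the process's own memory, with no system call involved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Search a NULL-terminated vector of "NAME=value" strings, laid out
   like argv, and return a pointer to the value part, or NULL if the
   name is not present. The returned pointer aims into the vector's
   own storage, so the program owns it just as it owns its arguments. */
char *lookup_env(char *const envp[], const char *name)
{
    size_t len = strlen(name);

    for (size_t i = 0; envp[i] != NULL; i++) {
        /* Match the whole name, followed immediately by '='. */
        if (strncmp(envp[i], name, len) == 0 && envp[i][len] == '=')
            return envp[i] + len + 1;
    }
    return NULL;
}
```

Given a vector holding "HOME=/usr/jm" and "PATH=:/bin:/usr/bin" (sample values invented here), looking up "PATH" yields a pointer to ":/bin:/usr/bin" inside the second string, and looking up "HOM" yields NULL because the name must match in full.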

Paul Eggert

May 24, 2004, 1:08:26 PM

At 23 May 2004 17:47:44 -0700, old_sys...@yahoo.com (John Mashey) writes:

> In some ways, it is slightly strange (historically) that the standard
> says:
> a) You can't modify strings pointed at by return value of getenv().
> b) You *can* modify strings pointed at by argv pointers.

Thanks for your historical summary.

What bugs me most about C99 getenv is that the following code isn't
officially portable:

    char *user = getenv ("USER");
    char *logname = getenv ("LOGNAME");
    return strcmp (user ? user : "?",
                   logname ? logname : "?");

because C99 says that the string that getenv returns may be
overwritten by a subsequent call to getenv. The historical getenv
implementation did not have this problem, I don't offhand know of any
system that does have this problem, and I've seen lots of
quite-portable-in-practice code that assumes that the getenv return
string survives indefinitely (unless you invoke a setenv-like
primitive or futz with the environ directly, of course). Presumably
this glitch was added to the standard for a reason, but in practice I
think the glitch was a mistake, as the standard isn't describing
reality very well.

Natarajan Krishnaswami

May 24, 2004, 2:25:07 PM

In article <7wd64tn...@sic.twinsun.com>, Paul Eggert wrote:
> What bugs me most about C99 getenv is that the following code isn't
> officially portable:

[snip]

> I don't offhand know of any system that does have this problem

IBM's C implementation for z/OS uses one buffer per thread for the
value of environment variables returned by getenv().
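To see why a single buffer breaks the USER/LOGNAME comparison, here is a toy model. It makes no claim about how the z/OS library is actually written: `one_buffer_getenv`, the `store` table and its sample values are all hypothetical stand-ins, used only to show the aliasing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* HYPOTHETICAL variable store, standing in for wherever the system
   really keeps the values. */
static const char *const store[][2] = {
    { "USER",    "alice" },
    { "LOGNAME", "bob"   },
};

/* Toy getenv(): every successful call copies the value into the same
   static buffer, so a later call overwrites what an earlier one
   returned. */
char *one_buffer_getenv(const char *name)
{
    static char buffer[64];

    for (size_t i = 0; i < sizeof store / sizeof store[0]; i++) {
        if (strcmp(store[i][0], name) == 0) {
            strncpy(buffer, store[i][1], sizeof buffer - 1);
            buffer[sizeof buffer - 1] = '\0';
            return buffer;
        }
    }
    return NULL;
}

/* The comparison from the quoted post, run against the toy getenv(). */
int aliased_compare(void)
{
    char *user = one_buffer_getenv("USER");
    char *logname = one_buffer_getenv("LOGNAME");
    /* Both pointers now refer to the same buffer. */
    return strcmp(user ? user : "?", logname ? logname : "?");
}
```

`aliased_compare()` returns 0, reporting the two values as equal even though the store holds different strings, because both pointers end up referring to the one buffer.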

N.

Eric Sosman

May 24, 2004, 2:27:02 PM

OpenVMS, for one, doesn't have "environment variables." But
it has "logical names," which can be used for similar purposes.
It makes sense for getenv() on OpenVMS to query these logical
names, returning a name's "translation" as a simulation of an
environment variable's "value."

Problem: The logical name translations aren't lying around
anywhere in the process' address space. They are, in fact,
scattered all over the system, with scoping and inheritance and
access control and all the other goodies that make OpenVMS the
O/S of choice for fans of baroque design. How do you translate
a logical name? You pass the name and a buffer to a system
service, and the service places the translation (if any) in the
buffer you supplied. I'm sure you see the difficulty here.

Now, a getenv() *could* be written that would work as you
desire in this framework, but it'd be expensive. You'd need to
allocate the translation buffers dynamically, cache the old
translations and rummage through them on each getenv() call to
avoid leaking memory, and so forth. Presumably, the committee
felt that the expense was not justified by the convenience.

--
Eric....@sun.com

Barry Margolin

May 24, 2004, 5:10:20 PM

In article <40B23E76...@sun.com>,
Eric Sosman <Eric....@sun.com> wrote:

> OpenVMS, for one, doesn't have "environment variables." But
> it has "logical names," which can be used for similar purposes.
> It makes sense for getenv() on OpenVMS to query these logical
> names, returning a name's "translation" as a simulation of an
> environment variable's "value."
>
> Problem: The logical name translations aren't lying around
> anywhere in the process' address space. They are, in fact,
> scattered all over the system, with scoping and inheritance and
> access control and all the other goodies that make OpenVMS the
> O/S of choice for fans of baroque design. How do you translate
> a logical name? You pass the name and a buffer to a system
> service, and the service places the translation (if any) in the
> buffer you supplied. I'm sure you seen the difficulty here.
>
> Now, a getenv() *could* be written that would work as you
> desire in this framework, but it'd be expensive. You'd need to
> allocate the translation buffers dynamically, cache the old
> translations and rummage through them on each getenv() call to
> avoid leaking memory, and so forth. Presumably, the committee
> felt that the expense was not justified by the convenience.

And you would still have memory leaks, any time the caller's pointer
goes away or is updated, since C doesn't have a garbage collector.

--
Barry Margolin, bar...@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***

Eric Sosman

May 24, 2004, 5:41:54 PM

The approach I was imagining wouldn't leak the memory, because
getenv() would maintain a cache of old translations internally. In
pseudocode,

    use system service to translate the name
    if (untranslatable)
        return NULL;

    if (the translation is not in the cache)
        add it to the cache
    return pointer to cached translation

Note that the translation is always performed, because the value
of an OpenVMS "logical name" can change while the program runs
even if the program itself took no action to change it (see the
allusions to "inheritance" and "baroque design" above).
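The pseudocode above might look roughly like this in C. It is a sketch under stated assumptions, not OpenVMS code: `translate_name` is a hypothetical stand-in for the system service (here it knows only one name and reads a global so that a change can be simulated), and `caching_getenv`, `cache_entry` and the sample translations are likewise invented for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* HYPOTHETICAL stand-in for the OS translation service: copies the
   current translation of name into buf and returns 0, or returns -1
   if there is none or it does not fit. */
const char *demo_translation = "DISK1:[USER]";

int translate_name(const char *name, char *buf, size_t size)
{
    if (strcmp(name, "HOME") != 0 || strlen(demo_translation) >= size)
        return -1;
    strcpy(buf, demo_translation);
    return 0;
}

struct cache_entry {
    struct cache_entry *next;
    char text[];                  /* C99 flexible array member */
};

static struct cache_entry *cache;

/* Always translate first, then return a pointer to a cached copy.
   Cached copies are never freed or overwritten, so pointers returned
   by earlier calls stay valid. */
char *caching_getenv(const char *name)
{
    char buf[256];

    if (translate_name(name, buf, sizeof buf) != 0)
        return NULL;

    /* Rummage through the old translations. */
    for (struct cache_entry *e = cache; e != NULL; e = e->next)
        if (strcmp(e->text, buf) == 0)
            return e->text;

    /* Not seen before: add it to the cache. */
    struct cache_entry *fresh = malloc(sizeof *fresh + strlen(buf) + 1);
    if (fresh == NULL)
        return NULL;
    strcpy(fresh->text, buf);
    fresh->next = cache;
    cache = fresh;
    return fresh->text;
}
```

The cost is visible in the sketch: every call pays for a translation plus a linear search, and the cache only ever grows, by one entry for each distinct translation seen.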

The main point, though, is that OpenVMS stands as a counter-
example to Paul Eggert's belief that getenv() could easily be made
to behave in the way he'd like. Natarajan Krishnaswami provided
another example.

--
Eric....@sun.com

Douglas A. Gwyn

May 24, 2004, 6:08:02 PM

Eric Sosman wrote:
> Now, a getenv() *could* be written that would work as you
> desire in this framework, but it'd be expensive. You'd need to
> allocate the translation buffers dynamically, cache the old
> translations and rummage through them on each getenv() call to
> avoid leaking memory, and so forth. Presumably, the committee
> felt that the expense was not justified by the convenience.

Exactly. Especially because many applications do not
benefit from the added overhead. If the program needs
to maintain multiple environment values it can use the
equivalent of strdup. (And yes, strdup itself has
been proposed for standardization, more than once.
There seems to be a committee bias against requiring
library functions to perform dynamic memory allocation.)
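A minimal sketch of that suggestion, applied to the USER/LOGNAME comparison from earlier in the thread. Since strdup is not in the C standard, the copy is written with malloc() and memcpy(); the helper names `copy_env` and `compare_user_logname` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy an environment value into storage the caller owns.
   Returns NULL if the variable is unset or allocation fails;
   the caller must free() a non-NULL result. */
char *copy_env(const char *name)
{
    const char *value = getenv(name);
    if (value == NULL)
        return NULL;

    size_t size = strlen(value) + 1;
    char *copy = malloc(size);
    if (copy != NULL)
        memcpy(copy, value, size);
    return copy;
}

/* The comparison without relying on the first getenv() result
   surviving the second call. Note that an allocation failure is
   treated the same as an unset USER. */
int compare_user_logname(void)
{
    char *user = copy_env("USER");             /* copied before the next call */
    const char *logname = getenv("LOGNAME");   /* used at once, then dropped */
    int result = strcmp(user ? user : "?", logname ? logname : "?");

    free(user);
    return result;
}
```

Only the first value needs copying here, because nothing calls getenv() again between the second lookup and its use.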

Keith Thompson

May 24, 2004, 6:42:49 PM

"Douglas A. Gwyn" <DAG...@null.net> writes:
[...]

> There seems to be a committee bias against requiring
> library functions to perform dynamic memory allocation.)

Other than malloc() and friends, of course. 8-)}

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Paul Eggert

May 25, 2004, 4:05:22 AM

At Mon, 24 May 2004 14:27:02 -0400, Eric Sosman <Eric....@sun.com> writes:

> Now, a getenv() *could* be written that would work as you
> desire in this framework, but it'd be expensive.

It not only *could* be written; it *has* been written. OpenVMS
getenv() can maintain an internal cache of values for use with setenv
and later getenv calls. The OpenVMS manual says
getenv/setenv/etc. "deliberately provide UNIX behavior, and are not a
substitute for these OpenVMS runtime library calls."

<http://h71000.www7.hp.com/doc/732FINAL/5763/5763pro_033.html>

Paul Eggert

May 25, 2004, 4:12:31 AM

Thanks for the info; I wasn't aware of that. However, I looked at the
z/OS V1R5.0 C/C++ Run-Time Library Reference (March 2004), and noticed
on page 640 that IBM provides a POSIX environment which supports
**environ in the traditional way. So it sounds like there is
conditional support for a Unix-like getenv() under z/OS if
that's what one really wants. (I sure would: I wouldn't like going
through zillions of programs looking for that particular portability
problem.)

Kevin Bracey

May 25, 2004, 6:05:40 AM

In message <7wd64tn...@sic.twinsun.com>
Paul Eggert <egg...@twinsun.com> wrote:

> What bugs me most about C99 getenv is that the following code isn't
> officially portable:
>
> char *user = getenv ("USER");
> char *logname = getenv ("LOGNAME");
> return strcmp (user ? user : "?",
> logname ? logname : "?");
>
> because C99 says that the string that getenv returns may be
> overwritten by a subsequent call to getenv. The historical getenv
> implementation did not have this problem, I don't offhand know of any
> system that does have this problem,

My system (RISC OS) does. The environment strings (system variables in
RISC OS) are stored in Kernel space, and are only accessible via a call that
copies them into a user buffer. getenv mallocs a buffer of an appropriate
length, and copies the value into it. The buffer gets realloced and/or reused
on subsequent calls.

And the value of a variable can change between calls, so to make the getenv
result static would involve mallocing a new buffer on every call. Or changing
the C library so it cached the value of every system variable at start-up.
The latter would definitely break some code that needed getenv() to return
the current value of a variable.

--
Kevin Bracey, Principal Software Engineer
Tematic Ltd Tel: +44 (0) 1223 503464
182-190 Newmarket Road Fax: +44 (0) 1728 727430
Cambridge, CB5 8HE, United Kingdom WWW: http://www.tematic.com/

those who know me have no need of my name

May 25, 2004, 9:33:02 AM

in comp.std.c i read:

>C99 says that the string that getenv returns may be
>overwritten by a subsequent call to getenv.

c89 says this too.

--
a signature

those who know me have no need of my name

May 25, 2004, 9:33:43 AM

in comp.std.c i read:

>I wouldn't like going through zillions of programs looking for that
>particular portability problem.)

finding bugs in code not written to the standard is indeed annoying.

--
a signature

Eric Sosman

May 25, 2004, 9:42:42 AM

Paul Eggert wrote:
> At Mon, 24 May 2004 14:27:02 -0400, Eric Sosman <Eric....@sun.com> writes:
>
>
>> Now, a getenv() *could* be written that would work as you
>>desire in this framework, but it'd be expensive.
>
>
> It not only *could* be written; it *has* been written. OpenVMS
> getenv() can maintain an internal cache of values for use with setenv
> and later getenv calls. The OpenVMS manual says
> getenv/setenv/etc. "deliberately provide UNIX behavior, and are not a
> substitute for these OpenVMS runtime library calls."

Interesting. I guess it was pressure from folks like you ;-)
that persuaded Digipaqard to add the capability. It wasn't there
when I last worked with VMS some eight or nine years ago -- and
it certainly wasn't there when C89 was being written.

--
Eric....@sun.com

John Mashey

May 25, 2004, 5:26:14 PM

Kevin Bracey <kevin....@tematic.com> wrote in message news:<75bf8fb44...@tematic.com>...

> In message <7wd64tn...@sic.twinsun.com>
> Paul Eggert <egg...@twinsun.com> wrote:
>
> > What bugs me most about C99 getenv is that the following code isn't
> > officially portable:
> >
> > char *user = getenv ("USER");
> > char *logname = getenv ("LOGNAME");
> > return strcmp (user ? user : "?",
> > logname ? logname : "?");
> >
> > because C99 says that the string that getenv returns may be
> > overwritten by a subsequent call to getenv. The historical getenv
> > implementation did not have this problem, I don't offhand know of any
> > system that does have this problem,
>
> My system (RISC OS) does. The environment strings (system variables in
> RISC OS) are stored in Kernel space, and are only accessible via a call that
> copies them into a user buffer. getenv mallocs a buffer of an appropriate
> length, and copies the value into it. The buffer gets realloced and/or reused
> on subsequent calls.
>
> And the value of a variable can change between calls, so to make the getenv

...

If I read the above right, it sounds like the environment strings are
shared with other {processes? threads?} or at least something else can
change them.

The standard clearly allows this, but for history, we didn't do it
that way in V7 UNIX on purpose:

1) Just as arguments are passed in, and then become the property of
the program, to modify or not, so were environment variables. They
weren't shared.

2) When thinking of a shared pool, we were inventing data structures
akin to ALGOL-like block-structured symbol tables, to avoid having a
child process modify a parent's variable values, as that led to 3):

3) Having one pool of values (say, per process group? global?) was
something we explored, but in addition to the other reasons, we
thought that adding shared modifiable environment variables was
equivalent to adding a new and peculiar form of interprocess
communication with questionable semantics, whose ramifications went
far beyond the problems that environment variables were supposed to
handle.

It may well be the case that this works fine on small single-user
systems.
We had enough arguments about IPC mechanisms anyway that we didn't
want to make yet another one.

Anyway, the interesting result is that environment variables can have
semantics that differ radically among systems, even if the syntax is
the same.

Kevin Bracey

May 26, 2004, 7:32:59 AM

In message <ce9d692b.04052...@posting.google.com>
old_sys...@yahoo.com (John Mashey) wrote:

> Kevin Bracey <kevin....@tematic.com> wrote in message news:<75bf8fb44...@tematic.com>...

> > My system (RISC OS) does. The environment strings (system variables in
> > RISC OS) are stored in Kernel space, and are only accessible via a call
> > that copies them into a user buffer. getenv mallocs a buffer of an
> > appropriate length, and copies the value into it. The buffer gets
> > realloced and/or reused on subsequent calls.
> >
> > And the value of a variable can change between calls, so to make the
> > getenv
> ...
>
> If I read the above right, it sounds like the environment strings are
> shared with other {processes? threads?} or at least something else can
> change them.

Yes. They are global to the system.

> 3) Having one pool of values (say, per process group? global?) was
> something we explored, but in addition to the other reasons, we
> thought that adding shared modifiable environment variables was
> equivalent to adding a new and peculiar form of interprocessor
> communication with questionable semantics, whose ramifications went
> far beyond the problems that environment variables were supposed to
> handle.

The system in question is single-user, and was not originally multitasking.
Thus it contains quite a few global settings that programs have to take
care not to alter in a way that will interfere with other programs. It's not
a secure system :)