
how to avoid a memset() optimization


Francis Wai

Nov 7, 2002, 12:51:51 AM
In a recent article (http://online.securityfocus.com/archive/82/297827),
Peter Gutmann raised a concern which has serious implications in
secure programming. His example, along the lines of,

int main()
{
    char key[16];
    strcpy(key, "whatever");
    encrpts(key);
    memset(key, 0, 16);
}

where memset() was optimized away because it is the last expression
before the next sequence point, its side effect is not needed, and its
subject is an auto variable. The compiler sees that it is legitimate to
optimize it away. This is _bad_ news for anyone concerned with
sensitive data being left lying around in memory.

Various suggestions have been made, such as declaring the variable
volatile and having a scrub memory function in a file of its own. I'm
wondering if there are better ways such as telling the compiler not to
optimize away a function call.
[Declaring the array volatile is the right way to do it. The reason
volatile exists is to tell the compiler not to do otherwise valid
optimizations. -John]

Lars Duening

Nov 8, 2002, 10:59:25 AM
Francis Wai <fw...@rsasecurity.com> wrote:

> In a recent article (http://online.securityfocus.com/archive/82/297827),
> Peter Gutmann raised a concern which has serious implications in
> secure programming. His example, along the lines of,
>
> int main()
> {
> char key[16];
> strcpy(key, "whatever");
> encrpts(key);
> memset(key, 0, 16);
> }
>
> where memset() was optimized away because memset() is the last
> expression before the next sequence point and that its side-effect is
> not needed and that the subject of memset() is an auto variable. ...
>
> Various suggestions have been made, such as declaring the variable
> volatile and having a scrub memory function in a file of its own.
>
> [Declaring the array volatile is the right way to do it. The reason
> volatile exists is to tell the compiler not to do otherwise valid
> optimizations. -John]

Which is good news for the C/C++ crowd (at least those with compliant
compilers), but what about compilers for other languages?
[If they have something like volatile, use it. If not, you're on your
own. -John]

Alex Colvin

Nov 8, 2002, 11:02:08 AM

"Francis Wai" <fw...@rsasecurity.com> writes:

>{
> char key[16];
> strcpy(key, "whatever");
> encrpts(key);
> memset(key, 0, 16);
>}

>Various suggestions have been made, such as declaring the variable
>volatile and having a scrub memory function in a file of its own. I'm
>wondering if there are better ways such as telling the compiler not to
>optimize away a function call.

>[Declaring the array volatile is the right way to do it. The reason
>volatile exists is to tell the compiler not to do otherwise valid
>optimizations. -John]

I hesitate to contradict the master, but I vote against 'volatile' for
key[]. If you declare key[] volatile, then you have to cast away the
volatility when passing it to strcpy(), encrpts(), and memset(), which
do not deal with volatile strings. In this example, there's no reason
why they should.

You want the compiler to assume a reference to key[] after memset(),
which is what you're assuming when you worry about someone seeing
it. Try declaring key[] static or external instead. That warns the
compiler that you're assuming a lifetime beyond main().

If you absolutely need key[] to be auto, then you've got a problem.
Consider writing your own memset() that accepts a volatile.
--
mac the naïf
[I don't see any reason that casting away the volatile wouldn't work. -John]
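
For illustration, a minimal sketch of the kind of scrub routine
suggested above (a function in a file of its own that takes a pointer
to volatile); the name secure_scrub is invented here, and the claim is
only that compilers in practice will not discard the stores:

/* secure_scrub.c - clear a buffer through a pointer-to-volatile so the
   stores are not treated as dead within this translation unit. */
#include <stddef.h>

void secure_scrub(volatile char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
        buf[i] = 0;
}

It would be called as secure_scrub(key, sizeof key); no cast is needed,
because a plain char * may be passed where a volatile char * is
expected (the qualifier is only being added).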

Fergus Henderson

Nov 12, 2002, 2:01:36 PM
"Lars Duening" <la...@bearnip.com> writes:

>> [Declaring the array volatile is the right way to do it. The reason
>> volatile exists is to tell the compiler not to do otherwise valid
>> optimizations. -John]
>
>Which is good news for the C/C++ crowd (at least those with compliant
>compilers), but what about compilers for other languages?
>
>[If they have something like volatile, use it. If not, you're on your
>own. -John]

Most popular languages these days support a C interface, so you may be
able to use that to call an external routine written in C, using volatile,
which can clear out the memory.

Version 1.1 of the MS .NET CLR also provides routines for volatile memory
access in the standard library, for use from languages that don't have
direct support for volatile.
--
Fergus Henderson <f...@cs.mu.oz.au> | "I have always known that the pursuit
The University of Melbourne | of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh> | -- the last words of T. S. Garp.

Lars Duening

Nov 12, 2002, 2:05:00 PM
Alex Colvin <al...@world.std.com> wrote:

> "Francis Wai" <fw...@rsasecurity.com> writes:
>
> ...the case of a memory scrub optimized away by the compiler...
>
> >Various suggestions have been made, such as declaring the variable
> >volatile and having a scrub memory function in a file of its own. I'm
> >wondering if there are better ways such as telling the compiler not to
> >optimize away a function call.
>
> >[Declaring the array volatile is the right way to do it. The reason
> >volatile exists is to tell the compiler not to do otherwise valid
> >optimizations. -John]
>
> I hesitate to contradict the master, but I vote against 'volatile" for
> key[]. ...
>
> You want the compiler to assume a reference to key[] after memset(),
> which is what you're assuming when you worry about someone seeing
> it. Try declaring key[] static or external instead. That warns the
> compiler that you're assuming a lifetime beyond main().

All these suggestions have in common that they try to use language
features in order to achieve a meta-language effect, while relying on
particular compiler implementations (in theory a compiler with global
lifetime analysis could optimize away the code to clear a static key[]
as well).

I think the lesson to be learned here is that compiler writers would do
well to give programmers some control over the compiler mechanics not
covered by the language. In this case, a construct

#pragma eliminate_dead_code=no
memset(key, 0, sizeof key);
#pragma eliminate_dead_code=restore

would state the programmer's intentions far more clearly than any
'volatile' construct, and would also allow other optimizations still
to be performed on the key.
[I don't see what the difference is between "volatile" and "don't
eliminate dead code". Volatile exists precisely to tell the compiler
that code that might appear to be dead isn't. -John]

Christian Bau

Nov 12, 2002, 2:03:43 PM
> >{
> > char key[16];
> > strcpy(key, "whatever");
> > encrpts(key);
> > memset(key, 0, 16);
> >}
> [ how to be sure the memset isn't optimized away? ]

> >[Declaring the array volatile is the right way to do it. The reason
> >volatile exists is to tell the compiler not to do otherwise valid
> >optimizations. -John]
>
> I hesitate to contradict the master, but I vote against 'volatile" for
> key[]. If you declare key[] volatile, then you have to cast away the
> volatility when passing it to strcpy(), encrpts(), and memset(), which
> do not deal with volatile strings. In this example, there's no reason
> why they should.
>
> You want the compiler to assume a reference to key[] after memset(),
> which is what you're assuming when you worry about someone seeing
> it. Try declaring key[] static or external instead. That warns the
> compiler that you're assuming a lifetime beyond main().
>
> If you absolutely need key[] to be auto, then you've got a problem.
> Consider writing your own memset() that accepts a volatile.
> --
> mac the naïf
> [I don't see any reason that casting away the volatile wouldn't work. -John]

Calling memset to set a volatile variable or array in C is undefined
behaviour. Modifying or accessing any volatile data through a
pointer-to-non-volatile is undefined behaviour. When a volatile object
is modified, all accesses to that object have to happen exactly as
programmed (no as-if rule here). But if you call memset, it is
completely undefined how many accesses there are, and in which order
they happen, so this just cannot work correctly with volatile data.

[Oops, it's true, and casting away volatile makes the memset
discardable. In this case, since the buffer is small, a for loop
seems reasonable. -John]
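
For a small buffer like the one in the original example, the for loop
mentioned above might look like the following sketch: the array itself
stays non-volatile, and the stores go through a pointer-to-volatile,
which compilers in practice treat as accesses they must perform.

#include <stddef.h>
#include <string.h>

int main(void)
{
    char key[16];
    volatile char *vp = key;   /* the qualifier is added implicitly; no cast */
    size_t i;

    strcpy(key, "whatever");
    /* ... use the key ... */

    /* Clear the key one byte at a time through the volatile lvalue. */
    for (i = 0; i < sizeof key; i++)
        vp[i] = 0;

    return 0;
}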

Clayton Weaver

Nov 12, 2002, 2:08:02 PM
>int main()
>{
> char key[16];
> strcpy(key, "whatever");
> encrpts(key);
> memset(key, 0, 16);
>}

if (!memset(key, 0, 16)) {
    return error;
}

(i.e. waste a jump; it might even be more efficient if it still allows
other optimizations that volatile would prevent).

Regards,

Clayton Weaver
<mailto: cgw...@aol.com>

Charles Bryant

Nov 13, 2002, 12:17:24 PM
Francis Wai <fw...@rsasecurity.com> wrote:
>In a recent article (http://online.securityfocus.com/archive/82/297827),
>Peter Gutmann raised a concern which has serious implications in
>secure programming. His example, along the lines of,
>
>int main()
>{
> char key[16];
> strcpy(key, "whatever");
> encrpts(key);
> memset(key, 0, 16);
>}
>
>where memset() was optimized away because memset() is the last
>expression before the next sequence point and that its side-effect is
>not needed and that the subject of memset() is an auto variable. The
>compiler sees that it is legitimate to optimize it away.

Using a standard function such as memset() may permit such
optimisation, but if you write a special function for the purpose,
for example, clrmem(char *buf, unsigned size), then the compiler
cannot optimise it away. How could the compiler know that clrmem()
doesn't compute a value based on its input and store the result in a
global or static variable for later collection? The only way is if
there is global optimisation which is so aggressive that it deduces
the behaviour of functions in separate files. In fact, it must be the
linker which does this, since at compilation time you might not have
written clrmem(). And even if such a linker conspires with a compiler
to eliminate dead code, you can always write clrmem() in assembly.

Ultimately, in all but the most bizarre systems, there must be a way
to accomplish what you want to do, since exactly the same situation
occurs when you write data into a buffer and some piece of hardware
uses DMA to read the data. Note that such systems couldn't verify
that you really were doing DMA since a device might use DMA to fetch
a descriptor block and then interpret it as instructions for further
DMA operations - ultimately the first block could be executable code
for another CPU, so defeating the optimisation is only impossible in
some sort of closed system where the compiler and linker between them
have total knowledge of all hardware devices that will ever be
designed for the system.

Unfortunately, having apparently proven that you can achieve what you
want, I must now prove the opposite.

Firstly, let me address the issue of 'volatile'. Its appearance in
this context is related to its appearance in relation to
multithreaded programming. Beginners to multithreaded programming
often believe that the use of 'volatile' is necessary and sufficient
to protect simultaneous access to a variable. While it may be on some
systems, in general (and in particular in conjunction with the
popular POSIX threading standard) it is neither necessary nor
sufficient. To see why, consider a system with two CPUs: A and B.
Suppose the function encrpts() is computationally intensive and split
between the two CPUs. The hardware might be connected like this:

CPU A + cache <---> Memory <---> cache + CPU B

Obviously this may result in some of key[] appearing in each cache.
The problem with 'volatile' is that when one CPU (e.g. CPU A)
executes code using 'volatile' semantics, at most it will result in
its cache and main memory being modified. There's no mechanism to
force the cache belonging to CPU B to be modified as well.

Of course this particular problem is very minor, since the program no
longer regards key[] as containing valid data, so CPU B's cache will
get discarded eventually, and only if there's a bug in the code will
it let part of key[] leak out.

However, there's a far more serious problem: the memory page
containing key[] might have been written to a paging file.
While that part of the paging file will eventually be re-used, it may
take an arbitrarily long time, and in the meanwhile the system may be
switched off, leaving the key as a mere bit pattern on the disk
vulnerable to any diagnostic tool that reads what used to be the
paging area.

Ultimately, this is a hardware problem. The requirement is that
hardware be put into a specific state, so the solution must be
hardware specific. The normal way to address this is by providing an
abstract interface which encapsulates the required semantics, leaving
the implementation to vary as necessary. In this case, for example:

int prepare_secure(secdesc_t *area, void *buf, unsigned size);

    Prepares the specified memory area for secure access,
    saving any necessary state in *area. This may involve
    locking the pages it occupies, or merely modifying the
    pager so that if it gets paged out, as soon as it is
    paged in again that disk page is immediately scheduled
    to be overwritten.

int delete_secure(secdesc_t *area);

    Ensures the area previously prepared with
    prepare_secure() is completely erased.

This would make the code:

int main()
{
    char key[16];
    secdesc_t s;

    if (prepare_secure(&s, key, sizeof(key))) {
        printf("Cannot prepare secure area\n");
        abort();
    }
    strcpy(key, "whatever");
    encrpts(key);
    memset(key, 0, 16);

    if (delete_secure(&s)) {
        printf("Cannot delete secure area\n");
        abort();
    }
}

Obviously this cannot be optimised away any more than the call to
printf() can be optimised away.
[Volatile isn't intended to address cache coherency and other multi-processor
issues. You need something beyond C to handle that. But see a subsequent
message where someone actually looked at the relevant parts of the C
standard. -John]
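
A minimal sketch of one way the abstract interface above might be
implemented on a POSIX system follows; the secdesc_t layout is invented
here for illustration, and mlock()/munlock() only keep the pages out of
the paging file, doing nothing about copies the program itself may have
made elsewhere:

#include <stddef.h>
#include <sys/mman.h>

typedef struct {
    void *buf;
    size_t size;
} secdesc_t;

int prepare_secure(secdesc_t *area, void *buf, unsigned size)
{
    area->buf = buf;
    area->size = size;
    /* Pin the pages so they are not written to backing store. */
    return mlock(buf, size);
}

int delete_secure(secdesc_t *area)
{
    volatile char *p = area->buf;
    size_t i;

    /* Scrub through a pointer-to-volatile so the stores are not
       discarded, then release the pages. */
    for (i = 0; i < area->size; i++)
        p[i] = 0;
    return munlock(area->buf, area->size);
}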

Dobes Vandermeer

Nov 13, 2002, 12:17:54 PM
Lars Duening wrote:
>
> Alex Colvin <al...@world.std.com> wrote:
>
> > "Francis Wai" <fw...@rsasecurity.com> writes:
> >
> > ...the case of a memory scrub optimized away by the compiler...
> >
> > >Various suggestions have been made, such as declaring the variable
> > >volatile and having a scrub memory function in a file of its own. I'm
> > >wondering if there are better ways such as telling the compiler not to
> > >optimize away a function call.
> >
>
> #pragma eliminate_dead_code=no
> memset(key, 0, sizeof key);
> #pragma eliminate_dead_code=restore

Maybe I'm just silly, but isn't special-casing memset() so that it can
be optimized away just plain pointless?

The point of an automated optimizer is to perform optimizations that
can't be done in the source language, or that would be too ugly to do.
How often do programmers accidentally leave "dead" memset() calls in
their code, and is it worth the compiler's effort to remove these?

Maybe it's to improve the performance of badly generated code, which contains
a lot of dead memset() calls?

I suppose it's possible that not just memset(), but a whole class of
functions is being targeted here. Perhaps even a solution like:

void *safe_memset(void *buf, int c, size_t len) { return memset(buf, c, len); }

would be detected and removed?

CU
Dobes
[No need. See later messages. -John]

Fergus Henderson

Nov 13, 2002, 12:18:41 PM
"Clayton Weaver" <cgw...@aol.com> writes:

>>int main()
>>{
>> char key[16];
>> strcpy(key, "whatever");
>> encrpts(key);
>> memset(key, 0, 16);
>>}
>
> if (!memset(key, 0, 16)) {
> return error;
> }
>
>(ie waste a jump; might even be more efficient if it still allows
>other optimizations that volatile would prevent).

This is not guaranteed to work. A compiler might easily know that
`memset(key, ...)' returns `key', and that the address of local variables
is never null, and might thus be able to optimize away both the if and
the memset.

Using volatile is better, because it provides more guarantees.

Unfortunately, using volatile is not guaranteed to work either.
Using volatile guarantees that the contents of `key' will be
overwritten, but it doesn't prevent the compiler from having
also stored the contents into other memory areas (e.g. the stack
frame used by encrpts()) which may not get cleared. However,
this failure mode is IMHO a lot less likely.

Jan C. Vorbrüggen

Nov 13, 2002, 12:20:34 PM
Of course, there is also the difficulty in ensuring that a copy of the
data in question doesn't remain in backing store (e.g., a page file) because
on process rundown, the pages being deleted aren't written back but sent to
the free list right away. I'm sure I have read of criminal investigations
yielding evidence by analysis of such files. (Windows is of course also famous
for leaving temporary files around almost indefinitely.)

Oh, and what about the copies in the OS's I/O buffers? or the memory of
attached I/O devices that were used to enter/read them?

This problem isn't easy to solve at all.

Jan

Arthur Chance

Nov 13, 2002, 12:27:54 PM
"Lars Duening" <la...@bearnip.com> writes:
[On wiping sensitive data]

> I think the lesson to be learned here is that compiler writers would do
> well to give programmers some control over the compiler mechanics not
> covered by the language. In this case, a construct
>
> #pragma eliminate_dead_code=no
> memset(key, 0, sizeof key);
> #pragma eliminate_dead_code=restore
>
> would state the programmer's intentions far clearer than any 'volatile'
> construct, and also allow other optimizations still be performed on the
> key.
> [I don't see what the difference is between "volatile" and "don't
> eliminate dead code". Volatile exists precisely to tell the compiler
> that code that might appear to be dead isn't. -John]

I'm not a language lawyer, nor do I play one on TV, and am loath to
disagree with our esteemed moderator, but looking at my copy of the
ANSI C spec I find section 5.1.2.3, third paragraph, second sentence:

"An actual implementation need not evaluate part of an expression if
it can deduce that its value is not used and that no needed side
effects are produced (including any caused by calling a function or
accessing a volatile object)."

If key isn't used after the memset (or a version of it with a volatile
void * argument), then the setting of key is not needed for future
program execution, so it can be omitted whether or not key is declared
volatile. Expecting the compiler to understand that wiping the key is
"needed" for security in this case is an AI-complete problem.

It would seem to me the only way to force the memset would be to
access key afterwards, as in

memset(key, 0, sizeof key);
if (key[0]) impossibleError(); /* presuming suitable defns */

[I went back and looked at 5.1.2.3. From the context it's clear that
the sentence means that calls to functions that modify storage and
accesses to volatile storage are examples of side effects that require
an expression containing them to be evaluated. Hence a
conformant compiler couldn't optimize away the memset() in the first
place. As others have noted, there are lots of other places the key
could be hanging around that no amount of buffer erasing would
handle. -John]

Chris F Clark

Nov 15, 2002, 12:39:13 AM
I have little insight to add to this discussion, but I do wish to
reply to something Charles Bryant wrote:
> In fact, it must be the linker which does this, since at compilation
> time you might not have written clrmem(). And even if such a linker
> conspires with a compiler to eliminate dead code, you can always
> write clrmem() in assembly.

The MIPS compiler suite had exactly this behavior. The compiler spit
out "intermediate" code that the linker linked, optimized (including
dead-code elimination and value propagation), and then translated into
machine language. Not only did the MIPS (now SGI) compilers use this
technology, but the early Unix compilers for DEC Alphas used it too.
(I don't know how the current Compaq compilers deal with this issue.)

Later the DEC group I consulted for(*) took this approach even further,
and wrote "OM" (object modifier) which took executable images (of
machine language instructions) and optimized them. Thus, when OM was
used to eliminate your dead-code, there was no hope (unless you wanted
to construct your dead code in the data segment and jump into it),
since even assembly language routines were susceptible to its reach.

(*Technically, the development of OM was done by DEC WRL (Western
Research Labs), but the Unix group released it as part of their
standard compiler suite.)

Hope this helps,
-Chris

*****************************************************************************
Chris Clark Internet : com...@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
3 Proctor Street voice : (508) 435-5016
Hopkinton, MA 01748 USA fax : (508) 435-4847 (24 hours)

Arthur Chance

Nov 15, 2002, 12:40:17 AM
"Arthur Chance" <usenet-...@qeng-ho.org> writes:
> I'm not a language lawyer, nor do I play one on TV, and am loathe to
> disagree with our esteemed moderator, but looking at my copy of the
> ANSI C spec I find section 5.1.2.3, third paragraph, second sentence:

Murphy's Law struck here. John was right and I misinterpreted the
standard. I realised my mistake 30 seconds after posting but John
obviously didn't see my mail asking him to ignore my erroneous
article.

> [I went back and looked at 5.1.2.3. From the context it's clear that
> the sentence means that calls to functions that modify storage and
> accesses to volatile storage are examples of side effects that require
> an expression containing them to be evaluated. Hence a
> conformant compiler couldn't optimize away the memset() in the first
> place. As others have noted, there are lots of other places the key
> could be hanging around that no amount of buffer erasing would
> handle. -John]

[I didn't see the second message. Oh, well. -John]

Joachim Durchholz

Nov 17, 2002, 10:56:55 PM
Chris F Clark wrote:
>
> The MIPS compiler suite had exactly this behavior. [...]
>
> Later the DEC group I consulted for(*), took this apporach
> even further, and wrote "OM" (object modifier) which took
> executable images (of machine language instructions) and
> optimized them.

Did these tools honor "volatile" modifiers? I'd assume that for the MIPS
compiler suite, but it's hard to imagine that for an OM-style optimizer.

Regards,
Joachim
[Same question for the other sequence point stuff. -John]

Chris F Clark

Nov 20, 2002, 3:32:00 PM
Joachim asked:

> Did these tools honor "volatile" modifiers? I'd assume that for the MIPS
> compiler suite, but it's hard to imagine that for an OM-style optimizer.

John (the moderator) added:


> Same question for the other sequence point stuff.

I don't know for certain, but I doubt it. The MIPS internal
representation was based upon a modified P-code (Pascal) compiler and
I doubt it had a notation for "don't move this across this boundary"--at
least I don't remember one in the parts of the optimizer I worked on.

However, the OM code was based upon the hardware/OS spec, and there
was a "memory barrier" instruction which it was illegal to move
memory accesses across. Thus, one could construct programs that
had "sequence points" or protect "volatile" locations.

I don't know to what extent those instructions were supported at the
high-level language interface (certainly one could stick in inline
assembly to do so), but did the compilers generate memory barriers for
sequence points or around volatile accesses? I would doubt that they
did.

Of course, I also doubt that OM was considered to be part of the ANSI
conforming tool chain for C or FORTRAN anyway, since only certain
specific compiler option settings were listed as being ANSI
conformant, and other combinations did not promise conformance. (I'm
certain that the optimization level that turned on "whole program"
optimization was not part of the ANSI conformant switches.)

That also brings up the whole question of benchmarksmanship. The
compiler options used in running the benchmarks were not the same ones
used for promising ANSI conformance (nor even the same across all
benchmarks). Then again, neither set of options necessarily matched
the ones typical users would have used to compile their programs
either. On the other hand, if one availed oneself of the benchmark
center, there were engineers who would give some help in tuning one's
application and thus finding the right combination of switches, if one
cared enough.

Hope this helps,
-Chris

*****************************************************************************
Chris Clark Internet : com...@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
3 Proctor Street voice : (508) 435-5016
Hopkinton, MA 01748 USA fax : (508) 435-4847 (24 hours)

------------------------------------------------------------------------------

t...@cs.ucr.edu

Nov 24, 2002, 1:28:01 AM
Chris F Clark <c...@shell01.theworld.com> wrote:
+ Joachim asked:
+> Did these tools honor "volatile" modifiers? I'd assume that for the MIPS
+> compiler suite, but it's hard to imagine that for an OM-style optimizer.
+
+ John (the moderator) added:
+> Same question for the other sequence point stuff.
+
+ I don't know for certain, but I doubt it. The MIPS internal
+ representation was based upon a modified P-code (Pascal) compiler and
+ I doubt it had a notation for don't move this across this boundary--at
+ least I don't remember one in the parts of the optimizer I worked on.
+
+ However, in the OM code, since it was based upon the hardware/OS spec
+ and there was a "memory barrier" instruction, which it was illegal to
+ move memory accesses across. Thus, one could construct programs that
+ had "sequence points" or protect "volatile" locations.

Thanks. I had wondered if practicing compiler writers ever pay
attention to memory barriers. (AFAIK, memory barriers are not needed
for standard C/C++, but they're needed for multithreaded extensions,
e.g., pthreads. Such barriers must prevent instructions from leaking
out the tops and/or bottoms of critical regions. That includes
leakage from code motion by compilers as well as from instruction
rearrangement by hardware.)

It appears that for conforming C/C++ implementations, compiler writers
can ignore sequence points except in the case of
- volatile variables,
- architectures where memory locations are in an inaccessible
state for some time after they've been written.
Apparently, the as-if rule takes care of all other cases.

Tom Payne

Charles Bryant

Dec 1, 2002, 10:37:49 PM

Chris F Clark <c...@shell01.TheWorld.com> wrote:
... clearing memory for security and optimising it away ...

>Later the DEC group I consulted for(*), took this apporach even further,
>and wrote "OM" (object modifier) which took executable images (of
>machine language instructions) and optimized them. Thus, when OM was
>used to eliminate your dead-code, there was no hope (unless you wanted
>to construct your dead code in the data segment and jump into it),
>since even assembly language routines were susceptible to its reach.

Most CPUs have instructions other than normal stores which modify
memory. For example the SPARC has cas and ldstub. On the SPARC these
are explicitly for memory synchronisation, so any optimiser which
re-orders or deletes them is simply incorrect. On other architectures
there may instead be instructions which are too complex for an
optimiser to handle, or you could simply write code too complex for
the optimiser. Since, presumably, it isn't going to attempt to solve
the halting problem, there must be code which it cannot optimise away
but which has no effect.
