HardBound and SoftBound (was "The State of Software")

Andy "Krazy" Glew

Aug 2, 2009, 3:46:41 PM

to Robert Myers

Robert Myers::
> ... expresses opinions about the sorry state of software ...

I hope it's okay

a) if I jump into this conversation, originally entitled "The state of
software (was Re: comp.arch has made itself a sitting duck for spam)"

b) change the title, and break into a new thread (IMHO Thunderbird
overthreads stuff, so conversations that have changed subject still
remain buried under threads I have long since started ignoring).

c) talk not about proving software mathematically correct, but about
something else...

I would like to draw the attention of comp.arch to Milo Martin's work on
HardBound and SoftBound. Indeed, I would like to draw the attention of
the whole computer industry to this work.

References:
* http://www.cis.upenn.edu/~milom/
* http://www.cis.upenn.edu/acg/papers/pldi09_softbound.pdf
* http://www.cis.upenn.edu/acg/papers/asplos08_hardbound.pdf

HardBound and SoftBound attack the problem of spatial memory safety,
i.e. buffer overflows and related memory issues, which are the security
holes exploited by so much malware. (Depending on the report, from many
viewpoints the majority.)

HardBound and SoftBound differ from previous work in this area in that
(a) they apply to C and C++ - they do not require new programming
languages like Java and C#,
(b) they apply to existing C and C++ data layout standards - i.e.
they do not require fat pointers and related changes to ABIs, as
technologies such as IBM AS400 require.

I hope and expect that in the future Milo will extend this work to
temporal memory errors (dangling pointers, etc.).

In HardBound Milo and his students proposed instruction set and hardware
extensions to (a) allow a compiler to indicate to hardware the bounds of
an array or other data structure in memory, (b) propagate these
bounds in a dataflow manner, through registers and memory, and (c)
perform bounds checks as necessary.
Key is the association and propagation of bounds through memory.
Instead of relying on "fat pointers", which change the ABI and hence
cause compatibility problems, HardBound places the descriptors in a
separate, disjoint, region of memory.
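
A minimal sketch of the disjoint-metadata idea, in plain C. This is not code from the HardBound or SoftBound papers; the table layout, the hash, and the names shadow_store, shadow_load, shadow_copy and bounds_ok are all made up for illustration. The point it shows is that the pointer stays an ordinary word, while its bounds live in a separate table keyed by the address where the pointer is stored.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One bounds record per pointer-holding memory location. The pointer
   itself stays an ordinary machine word, so data layout is unchanged. */
typedef struct {
    const void *slot;  /* address where the pointer is stored; NULL = unused */
    uintptr_t base;    /* first valid byte of the referenced object */
    uintptr_t bound;   /* one past the last valid byte */
} bounds_t;

#define SHADOW_SLOTS 1024u
static bounds_t shadow[SHADOW_SLOTS];  /* the separate, disjoint metadata region */

/* Hypothetical hash: drop low address bits, wrap into the table. */
static size_t shadow_hash(const void *slot) {
    return (size_t)(((uintptr_t)slot >> 3) % SHADOW_SLOTS);
}

/* Find the record for slot; returns NULL if none was ever stored. */
const bounds_t *shadow_load(const void *slot) {
    size_t i = shadow_hash(slot);
    for (size_t n = 0; n < SHADOW_SLOTS; n++, i = (i + 1) % SHADOW_SLOTS) {
        if (shadow[i].slot == slot) return &shadow[i];
        if (shadow[i].slot == NULL) return NULL;
    }
    return NULL;
}

/* Associate bounds [base, base + size) with the pointer stored at slot. */
void shadow_store(const void *slot, const void *base, size_t size) {
    size_t i = shadow_hash(slot);
    for (size_t n = 0; n < SHADOW_SLOTS; n++, i = (i + 1) % SHADOW_SLOTS) {
        if (shadow[i].slot == NULL || shadow[i].slot == slot) {
            shadow[i].slot = slot;
            shadow[i].base = (uintptr_t)base;
            shadow[i].bound = (uintptr_t)base + size;
            return;
        }
    }
    abort();  /* table full; a real scheme would need a better structure */
}

/* Propagation: copying a pointer between memory locations copies its bounds. */
void shadow_copy(const void *dst_slot, const void *src_slot) {
    const bounds_t *b = shadow_load(src_slot);
    if (b != NULL)
        shadow_store(dst_slot, (const void *)b->base, (size_t)(b->bound - b->base));
}

/* Check: is the access [addr, addr + len) inside the recorded bounds? */
int bounds_ok(const bounds_t *b, const void *addr, size_t len) {
    uintptr_t a = (uintptr_t)addr;
    return b != NULL && a >= b->base && a <= b->bound && len <= b->bound - a;
}
```

In this sketch a compiler would emit shadow_store after an allocation, shadow_copy wherever a pointer is copied through memory, and bounds_ok before a dereference. Hardware support would presumably do the lookup and check as part of address generation instead of through calls.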

SoftBound is similar, but is completely software based. SoftBound is
implemented in LLVM, an Open Source compiler that may possibly be the
next gcc.

Both HardBound and SoftBound trade off surprisingly small performance
losses for this increased security.

Q: how much performance would you trade off to have a guarantee of no
buffer overflows? Myself, I would gladly trade off 20%; and Milo's
papers indicate that the cost is less.

I wish that every C or C++ library function or program in the world was
compiled with something like SoftBound. If necessary, then performance
could be accelerated with something like HardBound.

--
The content of this message is my personal opinion only.
Although I am an employee - currently of Intel,
in the past of other computer companies such as AMD, Motorola, and Gould
- I reveal this only so that the reader may account
for any possible bias I may have towards my employer's products.
The statements I make here in no way represent my employer's position,
nor am I authorized to speak on behalf of my employer.

In fact, this posting may not even represent my personal opinion,
since occasionally I play devil's advocate.

Andrew Reilly

Aug 2, 2009, 8:11:07 PM

On Sun, 02 Aug 2009 12:46:41 -0700, Andy "Krazy" Glew wrote:

> Q: how much performance would you trade off to have a guarantee of no
> buffer overflows? Myself, I would gladly trade off 20%; and Milo's
> papers indicate that the cost is less.

I think that this is the key question. Coding in C (and C++) is still
popular, I believe, because it offers the lure of being able to get to
the "speed of light" of the machine. Assuming that the program can be
written without errors, that the right/best algorithms are used, and
(increasingly) that the work can be distributed usefully over the
available sea of processors. All of which is, I suspect, the guts of
Robert's grief: it's simply premature optimization of the worst kind.

How is a 20% "slow-down" even significant, if the program is only getting
to 5% of peak FPU capacity because it is thrashing the cache or waiting
for a communication from another processor, or only using one of the
eight (or eight hundred?) processors in the system? (Or using an O(N^2)
algorithm where an O(N logN) or O(N) algorithm exists?)

> I wish that every C or C++ library function or program in the world was
> compiled with something like SoftBound. If necessary, then performance
> could be accelerated with something like HardBound.

I wish that everyone would just stop trying to use C for anything other
than low-level operating systems work (which by population, hardly anyone
does these days.)

I very much doubt that there is ever going to be a need for hardware-
based array bounds checking like HardBound *unless* people insist on
continuing to code in C and C++, where the language offers no help. In
languages that have first-class arrays (where the size of the array is
maintained by the language and checked according to the specification)
then it is well known that *most* of the actual checks can be hoisted out
of inner loops as loop-invariant conditions, so that their presence costs
*nothing at all*. (Java makes this more complicated than necessary,
because its overly-prescriptive spec requires that the error be raised
only after the in-bounds accesses and associated code has run. There's
no requirement that that sort of condition be kept in future language
developments.)
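
A small sketch of the hoisting argument, written in C only so that the check is visible. The array_t type and both function names are assumptions for illustration; in a language with first-class arrays the compiler would do this transformation itself. Because the index runs over [0, n), every access is in bounds exactly when n <= len, so the per-element test collapses into one test before the loop.

```c
#include <stddef.h>

/* Hypothetical stand-in for a first-class array: data plus its length. */
typedef struct {
    const double *data;
    size_t len;
} array_t;

/* Naive form: one bounds check per element, inside the loop. */
int sum_checked(array_t a, size_t n, double *out) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i >= a.len) return -1;  /* per-iteration bounds check */
        s += a.data[i];
    }
    *out = s;
    return 0;
}

/* Hoisted form: the loop-invariant condition is tested once, up front. */
int sum_hoisted(array_t a, size_t n, double *out) {
    if (n > a.len) return -1;       /* the only check */
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a.data[i];
    *out = s;
    return 0;
}
```

The two functions are interchangeable here because the loop body has no visible side effects before the failure. If the in-bounds iterations had to be observable before the error is raised, which is the Java constraint mentioned above, the hoisted form would need a fallback path.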

Yes, I think that LLVM and technologies like it (including Microsoft's IL
and the JVM) are going to be key to the future of software. The notion
that the compiler should stop working on the problem of optimizing your
code before you start running it will eventually die out... I imagine a
time when the compiler and run-time are intricately aware of the OS and
scheduler, and always present: code will continue to morph at run-time
(particularly on long-running computations) so that code moves closer to
data across the interconnect (or vice-versa), and observed patterns are
re-compiled into fast-paths: a self-annealing system of code and data.

Correctness of these programs is a completely different question (that
Robert also raised), and there are useful developments in that area too.
As Nick said, you have to start with a language that has a specification
that allows the code to be reasoned about. Scheme is one such, and I
believe that some of the Hindley-Milner languages probably also have
similarly complete semantic specifications. I recommend that anyone who
is interested have a look at the contract system that is being developed
(and already quite functional) within the PLT-scheme group. Here you can
have contracts on functions that don't just encode type constraints, but
can also have arbitrary algorithm-relevant checking too. The important
part of it, though, is the way the contract-enforcement system allocates
"blame" for breaking the contract either to the caller or the callee. It
seems that there is a large school of programmers who think that compile-
time correctness proofs are the only way to go, and maybe they'll be
right eventually, but I'm not convinced that all of the things that one
might want to say about the behaviour of a program are amenable to
proofs, and some measure of testing and shake-down during development
(and even deployment) are worth having (too).

A lot of these technologies are already available. How to convince
programmers to use them, instead of beating themselves up with largely
irrelevant (premature) micro-optimizations is a question that I have no
answer to...

Cheers,

--
Andrew

Robert

Aug 2, 2009, 11:46:05 PM

"Andy "Krazy" Glew" <ag-...@patten-glew.net> wrote in message
news:4A75ED21...@patten-glew.net...

Very interesting. Even if this technique is not provably correct, they
still get useful results. Change the game a bit by introducing new
constraints which reduce the problem scope, or solve a meaningful subset
of the problem.

I am very intrigued by Singularity. They solve the problem by saying each
object has one, and only one owner. All communications between functions,
classes, processes, threads, etc go through a channel. Each channel has a
distinct object interface. NO bogus data can be passed. It will not
compile..

The big hurdle they introduce is that NO external code is allowed. They
generate a run time and VM for it, with your code linked in. The entire
software stack is then available for static analysis at compile time.
VM in the sense of running on bare metal, no OS. Not a Java style VM..
They get a perf BOOST. Further, running without any processes (ie only one
big one) is the fastest. Everything at Ring 0, provably correct (wrt memory)
down to typed assembler. You can do processes, too. Each can have a
different scheduler and allocator, etc.

Robert

Aug 3, 2009, 12:32:36 AM

"Andrew Reilly" <andrew-...@areilly.bpc-users.org> wrote in message
news:7dmo8qF...@mid.individual.net...

> On Sun, 02 Aug 2009 12:46:41 -0700, Andy "Krazy" Glew wrote:
>
>> Q: how much performance would you trade off to have a guarantee of no
>> buffer overflows? Myself, I would gladly trade off 20%; and Milo's
>> papers indicate that the cost is less.
>
> I think that this is the key question. Coding in C (and C++) is still
> popular, I believe, because it offers the lure of being able to get to
> the "speed of light" of the machine. Assuming that the program can be
> written without errors, that the right/best algorithms are used, and
> (increasingly) that the work can be distributed usefully over the
> available sea of processors. All of which is, I suspect, the guts of
> Robert's grief: it's simply premature optimization of the worst kind.
>
> How is a 20% "slow-down" even significant, if the program is only getting
> to 5% of peak FPU capacity because it is thrashing the cache or waiting
> for a communication from another processor, or only using one of the
> eight (or eight hundred?) processors in the system? (Or using an O(N^2)
> algorithm where an O(N logN) or O(N) algorithm exists?)
>
>> I wish that every C or C++ library function or program in the world was
>> compiled with something like SoftBound. If necessary, then performance
>> could be accelerated with something like HardBound.
>
> I wish that everyone would just stop trying to use C for anything other
> than low-level operating systems work (which by population, hardly anyone
> does these days.)

OS's can be written in high level languages. So even there, it is of
dubious value..

> I very much doubt that there is ever going to be a need for hardware-
> based array bounds checking like HardBound *unless* people insist on
> continuing to code in C and C++, where the language offers no help. In
> languages that have first-class arrays (where the size of the array is
> maintained by the language and checked according to the specification)
> then it is well known that *most* of the actual checks can be hoisted out
> of inner loops as loop-invariant conditions, so that their presence costs
> *nothing at all*. (Java makes this more complicated than necessary,
> because its overly-prescriptive spec requires that the error be raised
> only after the in-bounds accesses and associated code has run. There's
> no requirement that that sort of condition be kept in future language
> developments.)

Yes. Change the constraints, build a walled garden. "Doc, it hurts when I
do this!" Answer - "Don't do that.." Some problems need to be solved,
but not by app programmers. Solve it at the right place, in the run time,
and let everybody else get on with solving their problems.

> Yes, I think that LLVM and technologies like it (including Microsoft's IL
> and the JVM) are going to be key to the future of software. The notion
> that the compiler should stop working on the problem of optimizing your
> code before you start running it will eventually die out... I imagine a
> time when the compiler and run-time are intricately aware of the OS and
> scheduler, and always present: code will continue to morph at run-time
> (particularly on long-running computations) so that code moves closer to
> data across the interconnect (or vice-versa), and observed patterns are
> re-compiled into fast-paths: a self-annealing system of code and data.

Some meta-data, some tracing, some heuristics in the run time, maybe!

> Correctness of these programs is a completely different question (that
> Robert also raised), and there are useful developments in that area too.
> As Nick said, you have to start with a language that has a specification
> that allows the code to be reasoned about. Scheme is one such, and I
> believe that some of the Hindley-Milner languages probably also have
> similarly complete semantic specifications. I recommend that anyone who
> is interested have a look at the contract system that is being developed
> (and already quite functional) within the PLT-scheme group. Here you can
> have contracts on functions that don't just encode type constraints, but
> can also have arbitrary algorithm-relevant checking too. The important
> part of it, though, is the way the contract-enforcement system allocates
> "blame" for breaking the contract either to the caller or the callee. It
> seems that there is a large school of programmers who think that compile-
> time correctness proofs are the only way to go, and maybe they'll be
> right eventually, but I'm not convinced that all of the things that one
> might want to say about the behaviour of a program are amenable to
> proofs, and some measure of testing and shake-down during development
> (and even deployment) are worth having (too).
>
> A lot of these technologies are already available. How to convince
> programmers to use them, instead of beating themselves up with largely
> irrelevant (premature) micro-optimizations is a question that I have no
> answer to...

We need some micro-breakthroughs. We can solve memory correctness.
We can detect deadlocks, livelocks, etc. We can do tracing, we can monitor
CPU utilization via ticks, we can preserve meta data about compiled code.
And much more. Each of those can be accommodated by stretching an
existing language. When enough of them become mature and stable, they
can be merged. At that point, hopefully we have enough specs in hand where
we can re-think all the assumptions behind hardware, OS, VM, runtime, app,
etc. Re-draw the lines.

Then some smart person can redefine the whole mess. Like doing orbital
mechanics with cartesian coordinates. Restate the problem, things become
easier..

nm...@cam.ac.uk

Aug 3, 2009, 3:14:34 AM

In article <4A75ED21...@patten-glew.net>,

Andy "Krazy" Glew <"ag-news AT patten DASH my-last-name DOT net"> wrote:
>
>HardBound and SoftBound differ from previous work in this area in that
> (a) they apply to C and C++ - they do not require new programming
>languages like Java and C#,
> (b) they apply to existing C and C++ data layout standards - i.e.
>they do not require fat pointers and related changes to ABIs, such as
>technologies such as IBM AS400 require.
>

> Key is the association and propagation of bounds through memory.
>Instead of relying on "fat pointers", which change the ABI and hence
>cause compatibility problems, HardBound places the descriptors in a
>separate, disjoint, region of memory.

I am sorry, Andy, but this is a complete fantasy. In theory, and
ignoring some extremely nasty ambiguities in C and C++, one MIGHT
be able to check all uses where a reference goes outside the
bounds of the originally allocated maximal object. But that does
permit detecting any aliasing errors, anywhere where an overflow
of a sub-object accesses the wrong sub-object, and so on.

But even that would apply only to a restricted subset of the C (and
the forthcoming C++) languages. In particular, it would not work
for any C or C++ program that does non-trivial memory management,
calls asynchronous external agents (POSIX asynch I/O, MPI etc.),
and so on. While an impractical theoretician might say that those
are outside the C language, that is not a practical attitude, and
probably wrong anyway.

Lastly, one can get a good half of the gains for a vast amount less
effort and overhead, and many of us have done so. I implemented
just such a scheme because it was quicker than trying to port any
of the then available C memory checkers. A couple of days' work.

My point is not to denigrate that work, but to point out that it
isn't and can't be more than a relatively small help for C and C++,
because of the nature of the languages. Such checking CAN be done
effectively for Fortran, has been in the past, and is by at least
the NAG compiler.

Regards,
Nick Maclaren.

nm...@cam.ac.uk

Aug 3, 2009, 3:26:40 AM

>... But that does

>permit detecting any aliasing errors, anywhere where an overflow
>of a sub-object accesses the wrong sub-object, and so on.

It should say "But that does NOT permit detecting ...."

Regards,
Nick Maclaren.

Andy "Krazy" Glew

Aug 3, 2009, 12:03:33 PM

to Robert

Robert wrote:
> "Andrew Reilly":

>> On Sun, 02 Aug 2009 12:46:41 -0700, Andy "Krazy" Glew wrote:
>>
>>> Q: how much performance would you trade off to have a guarantee of no
>>> buffer overflows? Myself, I would gladly trade off 20%; and Milo's
>>> papers indicate that the cost is less.

> Some problems need to be solved,

> but not by app programmers. Solve it at the right place, in the run time,
> and let everybody else get on with solving their problems.

We need to find places, bottlenecks if you will, that all or most codes
pass through, where security constraints can be enforced.

The compiler / JIT / run-time is one such place. Almost, but not quite.
*Most* code is compiled - but it is usually quite easy to escape into
assembly, whether as a separate .s file or as inline assembly.

At the very least, the ability to code in assembly, and, for that
matter, to link to an unchecked assembly language interface, should
require special privilege. In some ways this is what the D programming
language does with its safe and unsafe flavors. I believe that C# / .NET
have similar safe and unsafe flavors.

However - people can still get around the "security". The compiler / JIT
/ run-time is an incomplete gateway.

This is what I like about HardBound. Every piece of code, whether C/C#
or assembly, must go through the hardware address generation logic. It
is a complete bottleneck, a point of high leverage, at which security
related assertions can be made. The instruction set binding of the
security related / bounds checking can be implemented in an advisory
manner: we can allow assembly code to bypass it if necessary. But I can
also imagine "complete" implementations, that enforce strong consistency
checks, a la IBM AS400, on all memory references. Moreover, I can
imagine mixed implementations - where certain subsystems and memory
regions allow the compiler or the assembly language programmer to
eliminate bounds checks, but where other subsystems and memory regions
require stronger security.

I.e. HardBound, as an instruction set binding, can support several
different levels of security, ranging from relaxed to complete, whereas
software-based compiler / JIT / run-time approaches can only support
incomplete security, unless they change the ecosystem significantly.

>> A lot of these technologies are already available. How to convince
>> programmers to use them, instead of beating themselves up with largely
>> irrelevant (premature) micro-optimizations is a question that I have no
>> answer to...

By the way: somebody responding to these posts said, highly paraphrased:
"It is better to do security in software via a compiler / JIT / run-time
like SoftBound, than it is to do it in hardware or an instruction set
like HardBound - since software can optimize away the vast majority of
checks."

I agree: optimizations can eliminate most buffer overflow checks.

But: this statement sounds like premature optimization to me.

Andy "Krazy" Glew

Aug 3, 2009, 12:11:20 PM

to nm...@cam.ac.uk

nm...@cam.ac.uk wrote:
> I am sorry, Andy, but this is a complete fantasy. In theory, and
> ignoring some extremely nasty ambiguities in C and C++, one MIGHT
> be able to check all uses where a reference goes outside the
> bounds of the originally allocated maximal object. But that does
> permit detecting any aliasing errors, anywhere where an overflow
> of a sub-object accesses the wrong sub-object, and so on.

I am sorry, Nick: I happen to know that you are wrong.

For example:

    struct S {
        int a;
        char b[10];
        int c;
    };
    S* p = (S*)malloc(sizeof(S));

Bounds are associated with p.

But if one starts accessing p->b, one gets a narrowed set of bounds
associated only with the subobject p->b. That cannot be used to access
p->a or p->c.
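
A sketch of what the narrowing amounts to, in plain C. The bounds_t type and the functions bounds_of, narrow and in_bounds are hypothetical names for illustration, not the mechanism from the papers: taking the member's address intersects the parent's bounds with the member's own extent, and later accesses are checked against the narrowed range.

```c
#include <stddef.h>
#include <stdint.h>

struct S {
    int a;
    char b[10];
    int c;
};

typedef struct {
    uintptr_t base;   /* first valid byte */
    uintptr_t bound;  /* one past the last valid byte */
} bounds_t;

/* Bounds covering an object of the given size. */
bounds_t bounds_of(const void *obj, size_t size) {
    bounds_t b = { (uintptr_t)obj, (uintptr_t)obj + size };
    return b;
}

/* Narrow to a member: intersect the parent bounds with the member's extent. */
bounds_t narrow(bounds_t outer, const void *member, size_t size) {
    bounds_t b = bounds_of(member, size);
    if (b.base < outer.base) b.base = outer.base;
    if (b.bound > outer.bound) b.bound = outer.bound;
    return b;
}

/* Is the access [addr, addr + len) entirely inside b? */
int in_bounds(bounds_t b, const void *addr, size_t len) {
    uintptr_t a = (uintptr_t)addr;
    return a >= b.base && a <= b.bound && len <= b.bound - a;
}
```

With bounds narrowed to the b member, an access to the last byte of b passes, while accesses to a or c fail even though they are inside the same allocation.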

The only real problem that C and C++ have is that they do not
distinguish "pointer to array of T" and "pointer into array of T" from
"pointer to a non-array instance of T".

nm...@cam.ac.uk

Aug 3, 2009, 12:51:31 PM

In article <4A770C28...@patten-glew.net>,

Andy "Krazy" Glew <"ag-news AT patten DASH my-last-name DOT net"> wrote:
>
>> I am sorry, Andy, but this is a complete fantasy. In theory, and
>> ignoring some extremely nasty ambiguities in C and C++, one MIGHT
>> be able to check all uses where a reference goes outside the
>> bounds of the originally allocated maximal object. But that does
>> permit detecting any aliasing errors, anywhere where an overflow
>> of a sub-object accesses the wrong sub-object, and so on.
>
>I am sorry, Nick: I happen to know that you are wrong.

"It ain't what we don't know that causes the trouble; it's what we
know for sure that ain't so."

I am afraid that you have misunderstood me - I am not saying that
the software/hardware can't do such checking - I am saying that the
resulting restrictions are sometimes a breach of the C standard,
almost always straddle an ambiguity, and break a lot of important
programs.

>For example:
>
> struct S {
> int a;
> char b[10];
> int c;
> };
> S* p = malloc(sizeof(s));
>
>Bounds are associated with p.

So far, so good. Well, ignoring the fact that you meant a typedef
of S, and not a tag definition of S - but that's irrelevant to
your point.

>But if one starts accessing p->b, one gets a narrowed set of bounds
>associated only with the subobject p->b. That cannot be used to access
>p->a or p->c.

Depending on how the code does that, it can be a breach of the C
standard - and it certainly breaks important code.

In clean, simple code like that, it is possible to read the C
standard so that your interpretation is conforming. But, as soon
as you bring in the nastier areas of the standard (such as memcpy
and the fact that any object can be treated as an array of unsigned
char), it falls over. And, yes, real, important code relies on the
fact that it WON'T fail :-(

Instead of 'p->b', try the object pointed to by the following:
(char *)((unsigned char *)&p->a+offsetof(S,b))
So what does it do about that?

I can assure you that real code does that sort of thing, and it is
very often nowhere near as simple, either.
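
To make the disagreement concrete, here is a sketch in C of how that expression fares under two possible checking policies. Both policies and all the names (bounds_t, b_via_a, ok_narrowed, ok_whole) are assumptions for illustration; neither is claimed to be what HardBound or SoftBound actually implements. The address is computed in integer arithmetic so that the sketch itself does not depend on the disputed pointer semantics.

```c
#include <stddef.h>
#include <stdint.h>

struct S {
    int a;
    char b[10];
    int c;
};

typedef struct {
    uintptr_t base;   /* first valid byte */
    uintptr_t bound;  /* one past the last valid byte */
} bounds_t;

static int in_bounds(bounds_t b, uintptr_t addr, size_t len) {
    return addr >= b.base && addr <= b.bound && len <= b.bound - addr;
}

/* Address of p->b reached the roundabout way, starting from &p->a. */
uintptr_t b_via_a(const struct S *p) {
    return (uintptr_t)&p->a + offsetof(struct S, b);
}

/* Policy 1: bounds were narrowed to the member a when &p->a was taken. */
int ok_narrowed(const struct S *p) {
    bounds_t b = { (uintptr_t)&p->a, (uintptr_t)&p->a + sizeof p->a };
    return in_bounds(b, b_via_a(p), 1);
}

/* Policy 2: bounds stay those of the whole allocated object. */
int ok_whole(const struct S *p) {
    bounds_t b = { (uintptr_t)p, (uintptr_t)p + sizeof *p };
    return in_bounds(b, b_via_a(p), 1);
}
```

Under the narrowing policy the one-byte access is rejected, although it lands on a perfectly valid byte of the struct. Under the whole-object policy it is accepted, but then an overflow of b into c would be accepted too. Which of the two the C standard requires for this expression is exactly the point in dispute.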

>The only real problem that C and C++ have is that they do not
>distinguish "pointer to array of T" and "pointer into array of T" from
>"pointer to a non-array instance of T".

I wish :-(

Sorry, but I spent a LONG time as part of the UK contingent on both
the C89 and C99 standards, trying to get this sort of thing made
less inconsistent and less ambiguous. I also spent a decade supporting
a wide range of unusual systems, and seeing where the problems arise
in practice once you start pushing the boundaries. I probably know
as much about this area of C as anyone.

If you disagree, I can send you my Objects diatribe. One of the
things that it does is to show a continuum of examples, from ones
that are clearly illegal to ones that are clearly legal. I have
NEVER had anyone respond with any clear idea of where the boundary
lies, let alone wording that could demonstrate it.

Regards,
Nick Maclaren.

Andy "Krazy" Glew

Aug 4, 2009, 12:53:21 AM

to nm...@cam.ac.uk

nm...@cam.ac.uk wrote:
> I am afraid that you have misunderstood me - I am not saying that
> the software/hardware can't do such checking - I am saying that the
> resulting restrictions are sometimes a breach of the C standard,
> almost always straddle an ambiguity, and break a lot of important
> programs.

Again, I am pretty confident that any standards compliant program runs
correctly, given a compliant compiler.

>> For example:
>>
>> struct S {
>> int a;
>> char b[10];
>> int c;
>> };
>> S* p = malloc(sizeof(s));
>>
>> Bounds are associated with p.
>
> So far, so good. Well, ignoring the fact that you meant a typedef
> of S, and not a tag definition of S - but that's irrelevant to
> your point.

C++

> Sorry, but I spent a LONG time as part of the UK contingent on both
> the C89 and C99 standards,

My experts can beat up you and your experts.

> If you disagree, I can send you my Objects diatribe. One of the
> things that it does is to show a continuum of examples, from ones
> that are clearly illegal to ones that are clearly legal. I have
> NEVER had anyone respond with any clear idea of where the boundary
> lies, let alone wording that could demonstrate it.

Yes, I'd love to see this objects diatribe. I will encourage Milo to
use it as a test case.

Robert

Aug 4, 2009, 1:18:36 AM

"Andy "Krazy" Glew" <ag-...@patten-glew.net> wrote in message

news:4A77BEC1...@patten-glew.net...

> nm...@cam.ac.uk wrote:
>> I am afraid that you have misunderstood me - I am not saying that
>> the software/hardware can't do such checking - I am saying that the
>> resulting restrictions are sometimes a breach of the C standard,
>> almost always straddle an ambiguity, and break a lot of important
>> programs.
>
> Again, I am pretty confident that any standards compliant program runs
> correctly, given a compliant compiler.

Can both be done at the same time? A hypothetical compiler/run time
could do some static analysis, maybe a function that takes two doubles in
and returns the sum.. This would be marked "safe" and run at native speed.
Another function takes in some pointers to foo, which is a concurrently
accessed data structure. Mark it "unsafe".

We have then divided the problem into two piles. The first pile we can
"safely" ignore.. The second could be analyzed further. Run it through
a unit test generator that probes every branch of the routine. Run the test
case under two different environments, classic "loose" C, the other with
whatever new constraints are added. If the piece of code gives the same
results in both, then toss it in the "safe" pile.

The resulting "unsafe" pile could then be run in an interpreter, or sandbox,
giving the "loose" C specified result. This would allow you to gradually
tighten up the constraints of an arbitrary program, while getting
consistent output.

Have your code editor highlight these "bad" sections of code, so the
programmer knows that a given statement causes confusion to the runtime
resulting in a slow code path. These bits of code could then be
refactored..

How should one be alerted to reliance on ambiguous language specs?

nm...@cam.ac.uk

Aug 4, 2009, 4:05:03 AM

In article <4A77BEC1...@patten-glew.net>,

Andy "Krazy" Glew <"ag-news AT patten DASH my-last-name DOT net"> wrote:
>
>> I am afraid that you have misunderstood me - I am not saying that
>> the software/hardware can't do such checking - I am saying that the
>> resulting restrictions are sometimes a breach of the C standard,
>> almost always straddle an ambiguity, and break a lot of important
>> programs.
>
>Again, I am pretty confident that any standards compliant program runs
>correctly, given a compliant compiler.

I am sorry, but you really don't understand the C standard - which
underlies much of C++, unfortunately. Section 4 (Conformance) makes
it clear that a conforming program need work only under the conforming
implementation it is written for. A strictly conforming program must
work anywhere, but there are precisely two of them - one returns
EXIT_SUCCESS and the other returns EXIT_FAILURE.

On a practical point, the reason that so many C and C++ programs come
with their obscene and gargantuan autoconfigurations is precisely
because the authors can't write portable code. Some of that is because
they lack Clue - but as much is because the language makes it so hard
to do.

>> Sorry, but I spent a LONG time as part of the UK contingent on both
>> the C89 and C99 standards,
>
>My experts can beat up you and your experts.

Politically, I quite agree.

>> If you disagree, I can send you my Objects diatribe. One of the
>> things that it does is to show a continuum of examples, from ones
>> that are clearly illegal to ones that are clearly legal. I have
>> NEVER had anyone respond with any clear idea of where the boundary
>> lies, let alone wording that could demonstrate it.
>
>Yes, I'd love to see this objects diatribe. I will encourage Milo to
>use it as a test case.

He will have to build it into code first - it describes the problem,
and does not code a test. I will send it.

Regards,
Nick Maclaren.

nm...@cam.ac.uk

Aug 4, 2009, 4:22:56 AM

In article <h58gqi$8n0$1...@news.eternal-september.org>,

Robert <rob...@nospam.com> wrote:
>
>"Andy "Krazy" Glew" <ag-...@patten-glew.net> wrote in message
>news:4A77BEC1...@patten-glew.net...
>> nm...@cam.ac.uk wrote:
>>> I am afraid that you have misunderstood me - I am not saying that
>>> the software/hardware can't do such checking - I am saying that the
>>> resulting restrictions are sometimes a breach of the C standard,
>>> almost always straddle an ambiguity, and break a lot of important
>>> programs.
>>
>> Again, I am pretty confident that any standards compliant program runs
>> correctly, given a compliant compiler.
>
>Can both be done at the same time?

No. See later as for why.

> A hypothetical compiler/run time
>could do some static analysis, maybe a function that takes two doubles in
>and returns the sum.. This would be marked "safe" and run at native speed.

Well, only if you aren't doing any numeric checking :-)

>Another function takes in some pointers to foo, which is a concurrently
>accessed data structure. Mark it "unsafe".
>
>We have then divided the problem into two piles. The first pile we can
>"safely" ignore.. The second could be analyzed further. Run it through
>a unit test generator that probes every branch of the routine. Run the test
>case under two different environments, classic "loose" C, the other with
>whatever new constraints are added. If the piece of code gives the same
>results
>in both, then toss it in the "safe" pile.

One mistake here is in thinking that there is or can be a testable
distinction between correct and incorrect code. To get there, you
would have to design a language that is almost, but not quite, entirely
unlike standard C. Yes, of course that could be done, and my point
is that it is a prerequisite for doing the job properly.

You also seem to be assuming that the 'error' will lead to wrong
results on YOUR system. In most programming languages, C and C++
not excepted, most of the semantic restrictions are there to allow
for their implementation on "non-mainstream" systems. As I mentioned,
I have a decade's experience supporting some very unusual ones. And
please note that you can't just test on ONE unusual system.

To take a simple example, current systems are essentially serial with
some tweaks like SSE. Now consider traditional vector machines,
highly parallel asynchronous ones (like the Tera MTA), multi-core
parallel ones without cache coherence (like most of the 1960s to
1980s parallel ones), dataflow systems and so on. Each of those
approaches has less in common with each other than with a current
serial system!

Regards,
Nick Maclaren.

Andy "Krazy" Glew

Aug 4, 2009, 10:48:40 AM

to Robert

Robert wrote:
>> Again, I am pretty confident that any standards compliant program runs
>> correctly, given a compliant compiler.
>
> Can both be done at the same time?

Yes. That is, I believe, what the HardBound and SoftBound compilers do.

Robert Myers

Aug 4, 2009, 5:37:19 PM

On Aug 4, 4:22 am, n...@cam.ac.uk wrote:

> One mistake here is in thinking that there is or can be a testable
> distinction between correct and incorrect code. To get there, you
> would have to design a language that is almost, but not quite, entirely
> unlike standard C. Yes, of course that could be done, and my point is
> that it is a prerequisite for doing the job properly.

I'm not sure it's possible even then. Godel/Turing seem to say that
it is demonstrably not possible in all cases. The question is whether
it is possible in any practical sense, and I suspect that the answer
is still no.

That's why I didn't propose machine-verifiable "correctness" as a
desirable property of code, only machine-verifiable properties that
are mathematically defined, leaving open the question as to what those
properties would be.

I'm surprised that you pursue precise definitions of computer
languages in natural language, as such a thing is surely not possible,
and I'm pretty sure you must know that.

I would trust only definitions and tests that can be reduced to
formalism. I'll live with the fact that most programmers will not
really understand the definitions or the tests.

Redefining the world of software (and hardware) isn't going to
happen. We need to start looking for implementable guarantees and
leave it up to the chaotic world of real hardware and software as to
how to satisfy those guarantees, just so long as they are satisfied.

In short, the discussion must be moved from a specification of
methods, which leads only to endless arguments, to a specification of
verifiable results. Even if satisfying all the guarantees (which
could be financial) doesn't result in error-free hardware and
software, it would be a vast improvement on the "trust me" world that
is currently the world of both hardware and software.

Robert.

nm...@cam.ac.uk

Aug 4, 2009, 6:02:20 PM

In article <29e453a0-0162-4c9a...@r2g2000yqm.googlegroups.com>,

Robert Myers <rbmye...@gmail.com> wrote:
>
>> One mistake here is in thinking that there is or can be a testable
>> distinction between correct and incorrect code. To get there, you
>> would have to design a language that is almost, but not quite, entirely
>> unlike standard C. Yes, of course that could be done, and my point is
>> that it is a prerequisite for doing the job properly.
>
>I'm not sure it's possible even then. Godel/Turing seem to say that
>it is demonstrably not possible in all cases. The question is whether
>it is possible in any practical sense, and I suspect that the answer
>is still no.

No, that's a completely separate question. You can bypass the issue
by requiring run-time error detection (which is NOT subject to the
Goedel/Turing limitation), but that is by the way.

I was referring to the fact that the C standard is both ambiguous
and inconsistent, and therefore a huge amount of real code can be
regarded as either correct or incorrect, depending on which wording
in the standard you feel takes precedence or what the words really
mean. In order to write even a run-time tester, you need to be
able to automate the decision of whether code is correct or not.

That problem accounted for something like 70% of all of the 'hard'
bug reports I got on the C compilers - i.e. the user was correctly
reporting that the compiler had broken an explicit statement in the
standard, but the compiler vendor turned round and quoted another
statement that made the program undefined. Well, actually, I often
optimised that out by explaining the mess to the user and/or vendor,
depending on which I felt was straying further from common practice.

Regards,
Nick Maclaren.

Robert Myers

Aug 4, 2009, 8:11:04 PM

On Aug 4, 6:02 pm, n...@cam.ac.uk wrote:

> I was referring to the fact that the C standard is both ambiguous
> and inconsistent,

And I don't think any language standard that could be used for real
applications and that is expressed in natural language will ever be
anything but.

Robert.

Robert

Aug 5, 2009, 12:52:47 AM

>>whatever new constraints are added. If the piece of code gives the same
>>results in both, then toss it in the "safe" pile.
>
> One mistake here is in thinking that there is or can be a testable
> distinction between correct and incorrect code. To get there, you
> would have to design a language that is almost, but not quite, entirely
> unlike standard C. Yes, of course that could be done, and my point is
> that it is a prerequisite for doing the job properly.

Not sure I get your point.. First, any language that is Turing Complete
is equivalent to any other, right? So all languages are essentially the
same at some level. They all boil down to assembler anyway and eventually
AND's, OR's and NOT's, and electrons..

Second, I do not require things be completely formal. Instead of "provably
correct", I would be happy with "can be demonstrated to give equivalent
results with arguments from a particular set." A much easier target.

At this point 2 black boxes can be compared.
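
A minimal sketch of that kind of black-box comparison, in C. The two routines and the argument set are invented for illustration, and agreement on the chosen set says nothing about arguments outside it.

```c
#include <assert.h>
#include <stddef.h>

/* Two hypothetical implementations of the same routine. */
static int sum_loose(const int *v, size_t n)
{
    int s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

static int sum_checked(const int *v, size_t n)
{
    int s = 0;
    for (size_t i = 0; i < n; i++) {
        assert(i < n);      /* stands in for whatever constraint was added */
        s += v[i];
    }
    return s;
}

/* Returns 1 if both black boxes agree on every argument in the set. */
static int agree_on_set(const int *const *args, const size_t *lens,
                        size_t count)
{
    for (size_t k = 0; k < count; k++)
        if (sum_loose(args[k], lens[k]) != sum_checked(args[k], lens[k]))
            return 0;
    return 1;
}
```

A routine would move to the "safe" pile only for the argument set it was actually compared on.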

> You also seem to be assuming that the 'error' will lead to wrong
> results on YOUR system. In most programming languages, C and C++
> not excepted, most of the semantic restrictions are there to allow
> for their implementation on "non-mainstream" systems. As I mentioned,
> I have a decade's experience supporting some very unusual ones. And
> please note that you can't just test on ONE unusual system.
>
> To take a simple example, current systems are essentially serial with
> some tweaks like SSE. Now consider traditional vector machines,
> highly parallel asynchronous ones (like the Tera MTA), multi-core
> parallel ones without cache coherence (like most of the 1960s to
> 1980s parallel ones), dataflow systems and so on. Each of those
> approaches has less in common with each other than with a current
> serial system!

I have no problem with this! I just do not see the necessity. Let's take
some C/C++ compiler and dissect it. Tease out each constraint. Anything
that violates a constraint should cause a compile error with the problem and
the line number. Not so hard..

Say I want to compile an app for architecture X. At this point all of those
other architectures are irrelevant. I can remove the constraints for
architectures Y, Z, etc, and still show the same output. We now have
"safe?! portable C", and "unportable C". The unsafe stuff could still be
emulated on the other architectures.

Further, we could then add constraints. No aliasing or rigorous
overflow testing for instance. Add the rule, fix the compile errors..

This lets us start with a piece of code that gives some kind of useful
output and move to a better place that has fewer headaches. Can there be NO
middle ground, no shortcuts, no stepping stones, nothing?

Is it even useful for C to be the portability mechanism between widely
different architectures? GPU's, phones, CPU's, Transputers, Forth Chips? Or
can we just say write your own mini language for whatever weirdness you
invented, and use a linker to get at the functionality?

Do we need to throw out the baby with the bathwater and start over?
Back to the logic gates? Even then we may drift towards a different local
maximum and see a bigger and better mountain in the distance. Repeat as
necessary..

Terje Mathisen

Aug 5, 2009, 12:59:11 AM

Andy "Krazy" Glew wrote:
> Both HardBound and SoftBound trade off surprisingly small performance
> losses for this increased security.
>
> Q: how much performance would you trade off to have a guarantee of no
> buffer overflows? Myself, I would gladly trade off 20%; and Milo's
> papers indicate that the cost is less.

I expect the real cost to be much closer to zero (1-5%).

>
> I wish that every C or C++ library function or program in the world was
> compiled with something like SoftBound. If necessary, then performance
> could be accelerated with something like HardBound.

I agree, and I don't believe we even need something like your HardBound.

My old Pascal programs (with range checking) ran at more or less the
same speed as my C code.

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

nm...@cam.ac.uk

Aug 5, 2009, 4:19:47 AM

In article <08147630-9b59-4466...@w6g2000yqw.googlegroups.com>,
Robert Myers <rbmye...@gmail.com> wrote:

Ambiguous, yes; inconsistent, no. And, more importantly, there are
levels of quality.

As I have posted before, I can classify 85% of 'hard' Fortran problems
as either user error or compiler error, with only 15% being dubious,
as either clear user error or clear compiler error, with only 15%
being ambiguous in the standard. With C, it is the other way round,
and often I know from discussions on SC22WG14 that the ambiguity is
deliberate.

Regards,
Nick Maclaren.

nm...@cam.ac.uk

Aug 5, 2009, 4:28:58 AM

In article <h5b3lp$m86$1...@news.eternal-september.org>,

Robert <rob...@nospam.com> wrote:
>> One mistake here is in thinking that there is or can be a testable
>> distinction between correct and incorrect code. To get there, you
>> would have to design a language that is almost, but not quite, entirely
>> unlike standard C. Yes, of course that could be done, and my point is
>> that it is a prerequisite for doing the job properly.
>
>Not sure I get your point.. First, any language that is Turing Complete
>is equivalent to any other, right? So all languages are essentially the
>same at some level. They all boil down to assembler anyway and eventually
>AND's, OR's and NOT's, and electrons..
>
>Second, I do not require things be completely formal. Instead of "provably
>correct", I would be happy with "can be demonstrated to give equivalent
>results with arguments from a particular set." A much easier target.

The first point is irrelevant and the second makes no difference.

In terms of mathematical logic, the C standard is riddled with cases
where statements like "X is true" and "not X is true" both occur,
though not usually in such clear forms. A huge number of decisions of
correctness come down to 'knowing' which overrides the other, based
entirely on either SC22WG14 deliberations or 'industry consensus'.
Let's ignore the ambiguities for now.

Even worse, the example Andy Glew brought in is PRECISELY one where
different, critically important, sections of the C community have
taken incompatible choices in precedence. And that is why I am
saying that the approach cannot be made to work - his favoured
interpretation is flatly incompatible with the usages in far too
much critical software.

Regards,
Nick Maclaren.

nm...@cam.ac.uk

Aug 5, 2009, 6:36:47 AM

Sigh. Sorry. It should have read:

>As I have posted before, I can classify 85% of 'hard' Fortran problems

nm...@cam.ac.uk

Aug 5, 2009, 7:11:51 AM

In article <YfydnVdUQ809jOTX...@giganews.com>,

Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
>Andy "Krazy" Glew wrote:
>> Both HardBound and SoftBound trade off surprisingly small performance
>> losses for this increased security.
>>
>> Q: how much performance would you trade off to have a guarantee of no
>> buffer overflows? Myself, I would gladly trade off 20%; and Milo's
>> papers indicate that the cost is less.
>
>I expect the real cost to be much closer to zero (1-5%).

In a language like Pascal or Fortran, yes. For example, the analysis
and prechecking needed to optimise for vector systems and even SSE
is PRECISELY that which is needed to minimise bounds checks for most
array uses. Or implement SoftBound efficiently :-)

The killer is when the language permits almost arbitrary sectioning
of objects, and where overlapping objects are allowed provided only
that the same element is not updated via one and used by another.
I.e. 'old' Fortran, C and C++ - but not 'new' Fortran or Pascal.

I doubt that you could implement reliable bounds checking for only
20% overhead in most C or much of C++ - yes, you could for C++ written
in a good style, but only if it didn't use or rely on its C subset.
E.g. it didn't use <cstring>, dangerous casts, or make significant
use of arrays/pointers.

>I agree, and I don't believe we even need something like your HardBound.
>
>My old Pascal programs (with range checking) ran at more or less the
>same speed as my C code.

Ditto Fortran, given suitable compilers and tasks (Fortran is dire for
character munging, bounds checking or no).

Regards,
Nick Maclaren.

EricP

Aug 5, 2009, 9:52:51 AM

nm...@cam.ac.uk wrote:
> In article <YfydnVdUQ809jOTX...@giganews.com>,
> Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
>> Andy "Krazy" Glew wrote:
>>> Both HardBound and SoftBound trade off surprisingly small performance
>>> losses for this increased security.
>>>
>>> Q: how much performance would you trade off to have a guarantee of no
>>> buffer overflows? Myself, I would gladly trade off 20%; and Milo's
>>> papers indicate that the cost is less.
>> I expect the real cost to be much closer to zero (1-5%).

It's my experience with Fortran 77 and Ada that the cost was 1-5%.

> In a language like Pascal or Fortran, yes. For example, the analysis
> and prechecking needed to optimise for vector systems and even SSE
> is PRECISELY that which is needed to minimise bounds checks for most
> array uses. Or implement SoftBound efficiently :-)
>
> The killer is when the language permits almost arbitrary sectioning
> of objects, and where overlapping objects are allowed provided only
> that the same element is not updated via one and used by another.
> I.e. 'old' Fortran, C and C++ - but not 'new' Fortran or Pascal.

I assume you mean here a variant record structure with
a discriminant field (aka a union with a field indicating
which case was valid). Such that before a field is accessed
the value of the discriminant field must be tested to
ensure it is set correctly.

Such record fields tend to be updated in groups inside
a controlling CASE structure based on the discriminant field.
A smart compiler would realize that the discriminant was already
checked and eliminate follow on checks.
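
As a sketch of the idea, here is a C rendering of a variant record with a discriminant. C performs no such check itself, so the check is written out by hand; the type and field names are made up for the example.

```c
#include <assert.h>

enum shape_kind { SHAPE_SQUARE, SHAPE_RECT };

struct shape {
    enum shape_kind kind;               /* the discriminant */
    union {
        struct { double side; } square;
        struct { double w, h; } rect;
    } u;
};

/* Checked accessor: the discriminant is tested before the field is read. */
static double rect_width(const struct shape *s)
{
    assert(s->kind == SHAPE_RECT);
    return s->u.rect.w;
}

/* Inside the switch the discriminant is already known, so a compiler that
   tracks it needs no further check before touching u.square or u.rect. */
static double shape_area(const struct shape *s)
{
    switch (s->kind) {
    case SHAPE_SQUARE:
        return s->u.square.side * s->u.square.side;
    case SHAPE_RECT:
        return s->u.rect.w * s->u.rect.h;
    }
    return 0.0;     /* unreachable for a valid discriminant */
}
```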

Also in a language designed from the start to support automatic
checks, many checks can be eliminated as redundant.
Doing so requires the compiler to carry, for example,
the legal range meta data along with each object
and eliminate the checks using what is in essence
a fancy propagation-of-constants algorithm.

What I would like in such a language is a source listing
that showed where the compiler had emitted run-time checks
because it was unable to statically determine validity.
Often these can be eliminated by rearranging the code,
or tightening up constraints.
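
One hand-written illustration of such a rearrangement, in C: a per-element check inside a loop is replaced by a single check on the loop limit. The function names are hypothetical, and a real compiler would do this on its internal representation rather than in source.

```c
#include <assert.h>
#include <stddef.h>

#define N 8

/* Run-time check on every access. Returns -1 on a bounds violation. */
static long sum_checked_each(const int a[N], size_t count)
{
    long s = 0;
    assert(a != NULL);
    for (size_t i = 0; i < count; i++) {
        if (i >= N)
            return -1;
        s += a[i];
    }
    return s;
}

/* Same result with one check: i < count and count <= N imply i < N. */
static long sum_checked_once(const int a[N], size_t count)
{
    long s = 0;
    assert(a != NULL);
    if (count > N)
        return -1;
    for (size_t i = 0; i < count; i++)
        s += a[i];
    return s;
}
```

A listing of emitted checks would show one check site per loop iteration in the first form and a single site ahead of the loop in the second.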

> I doubt that you could implement reliable bounds checking for only
> 20% overhead in most C or much of C++ - yes, you could for C++ written
> in a good style, but only if it didn't use or rely on its C subset.
> E.g. it didn't use <cstring>, dangerous casts, or make significant
> use of arrays/pointers.
>
>> I agree, and I don't believe we even need something like your HardBound.
>>
>> My old Pascal programs (with range checking) ran at more or less the
>> same speed as my C code.
>
> Ditto Fortran, given suitable compilers and tasks (Fortran is dire for
> character munging, bounds checking or no).

To make this easier on the compilers the cpu needs conditional
trap instructions. Similar to MOVCC, TRAPCC <condition> <reason_code>
where <reason_code> is just a number defined by the app or OS
to mean a particular reason, like BOUNDS_ERROR or ASSERT_ERROR.
It is easiest if the reason_code is passed by hardware to the
trap handler when the condition fails.

Many languages and debuggers require the ability to associate
a constraint violation back to a particular line of code.
Without conditional trap instructions the compiler must
emit a forward branch to unique destinations that then
call the RaiseException routine.
With lots of checks this adds lots of overhead.

Conditional trap instructions eliminate the excess code and
leave the failing instruction counter pointing at the problem.
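
Roughly, the branch-and-call shape described here looks like this in C. `raise_exception`, the `CHECK` macro and the recording variables are invented stand-ins for a language run-time, not any particular compiler's output.

```c
#include <assert.h>
#include <stddef.h>

enum reason { REASON_NONE, BOUNDS_ERROR, ASSERT_ERROR };

/* Stand-in for the run-time's exception entry point: it only records
   the reason code and source line of the first failure. */
static enum reason last_reason = REASON_NONE;
static int last_line = 0;

static void raise_exception(enum reason why, int line)
{
    if (last_reason == REASON_NONE) {
        last_reason = why;
        last_line = line;
    }
}

/* Without a conditional trap, every check site becomes a compare, a
   forward branch, and an out-of-line call carrying its own reason code
   and line number. */
#define CHECK(cond, why) \
    do { if (!(cond)) raise_exception((why), __LINE__); } while (0)

static int load_checked(const int *a, size_t n, size_t i)
{
    assert(a != NULL);
    CHECK(i < n, BOUNDS_ERROR);
    return i < n ? a[i] : 0;
}
```

With a TRAPCC-style instruction the body of `CHECK` would shrink to a single instruction, and the line number could be recovered from the faulting instruction address instead of being passed as an argument.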

Eric

Andy "Krazy" Glew

Aug 5, 2009, 10:04:00 AM

to Robert Myers

Robert Myers wrote:
> On Aug 4, 4:22 am, n...@cam.ac.uk wrote:
>
>> One mistake here is in thinking that there is or can be a testable
>> distinction between correct and incorrect code. To get there, you
>> would have to design a language that is almost, but not quite, entirely
>> unlike standard C. Yes, of course that could be done, and my point is
>> that it is a prerequisite for doing the job properly.

I recently attended a talk by Andrei Alexandrescu on the "D" programming
language.

D is very much like C. Actually, C++.

However, one of D's goals seems to have been to make the simplest
possible way of writing any given piece of code the correct way.

Andrei began his presentation with slides showing how the "Hello World"
example for many different programming languages - K&R C, C++, Java,
etc. - is actually incorrect:

E.g.

main() {
printf("Hello world");
}

has quite a few bugs in it.

Google D. You might also find my blog's notes on Andrei's presentation
interesting.

--

Anyway, the point of D is not that it is mathematically robust. It is
just that it has made expressing correct code easier.

Andy "Krazy" Glew

Aug 5, 2009, 10:40:39 AM

Terje Mathisen wrote:
> Andy "Krazy" Glew wrote:
>> Both HardBound and SoftBound trade off surprisingly small performance
>> losses for this increased security.
>>
>> Q: how much performance would you trade off to have a guarantee of no
>> buffer overflows? Myself, I would gladly trade off 20%; and Milo's
>> papers indicate that the cost is less.
>
> I expect the real cost to be much closer to zero (1-5%).
>

> My old Pascal programs (with range checking) ran at more or less the
> same speed as my C code.
>
> Terje

I believe that we can get to the 5% range. Indeed, Milo reports such
for HardBound. Even his SoftBound has very good, sub-15%, performance.

I believe that we can get to really good performance for both software
and hardware implementations.

I think that it would be *WONDERFUL* to have such a performance battle
going on.

However, I am not aware of any tool, other than SoftBound, that delivers
full bounds checking for C and C++ in less than 2x slowdowns. At least,
not one which I have been able to use for large amounts of code.

E.g. Purify and friends run fairly well, requiring little code tweaking.
But they are slow. Whereas academic tools like CCured reportedly are
faster, but when I have tried to use them they required so many tweaks to
my code that I gave up.
It is of paramount importance that all such tools accept standard,
almost completely unmodified, C and C++ code. And that they give as few
false positives as possible - i.e. that they don't report bugs where
there are none. Nick is right about that - however, I think that
overall he suffers from an excess of FUD.

It is not even clear that the hardware support of HardBound is necessary
for acceptable performance. However, such hardware support may not only
enable performance, but may also be useful in promoting stronger
security models, avoiding the "assembler code hole". And in promoting
better compatibility.

> My old Pascal programs (with range checking) ran at more or less the
> same speed as my C code.

It is worth noting, however, that most Pascal implementations accomplish
this by having a different ABI - a different layout for objects in
memory. Some people call this a different data model.

For example, most Pascal implementations place the array size near to
the array. E.g. (pardon, my Pascal is rusty)

VAR a: ARRAY[1..5] OF integer;

map to what in C we might declare as

struct { int lwb; int ubp; int array[5]; } a = { 1, 5, { 0 } };  /* ubp - lwb + 1 elements */

This is all well and good. Unfortunately, it is not compatible with the
large number of existing C programs that make assumptions about the
binary layout of data structures in memory.
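
A sketch of how a range-checked store might use bounds kept next to the array, written in C. The struct and function names are hypothetical, and real Pascal compilers differ in where (or whether) they store the bounds.

```c
#include <assert.h>
#include <stddef.h>

/* Bounds stored next to the data, as in the layout described above. */
struct bounded5 {
    int lwb;        /* lowest legal index  */
    int ubp;        /* highest legal index */
    int array[5];   /* ubp - lwb + 1 elements */
};

/* Range-checked a[i] := v. Returns 0 on success, -1 if i is out of range. */
static int store_checked(struct bounded5 *a, int i, int v)
{
    assert(a != NULL);
    if (i < a->lwb || i > a->ubp)
        return -1;
    a->array[i - a->lwb] = v;
    return 0;
}
```

Note that `sizeof(struct bounded5)` is larger than `sizeof(int[5])`, which is exactly the layout difference that breaks code expecting a bare C array.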

By the way, I believe that C compilers can be implemented on systems
that require such extra information in the data structures. It is not a
question of "Can C be implemented on hardware that requires this?" It
is more a question of "Can C be implemented that performs such buffer
overflow checks, and still be binary data compatible with binaries on
the same system that do not?"

--

To accomplish such binary compatibility, you have to place the metadata
somewhere else. Not next to the array.

HardBound and SoftBound proposed to associate such metadata with every
pointer. HardBound proposed to use a linear address transformation, e.g.

address_of_bounds_checking_metadata(T* Ap)
= Metadata_Base + (Ap - Nonmetadata_base) * Scale.

(Personally, I doubt that OSes would be willing to give up 2 virtual
address bits to accomplish this address scaling.)
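
The linear transformation can be written out as ordinary integer arithmetic in C. All three constants are illustrative assumptions chosen for the example, not values taken from the HardBound paper.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values only. */
#define NONMETADATA_BASE UINT64_C(0x0000000010000000)
#define METADATA_BASE    UINT64_C(0x0000400000000000)
#define SCALE            UINT64_C(2)   /* e.g. a base word and a bound word */

/* Address of the bounds metadata that corresponds to address ap. */
static uint64_t metadata_address(uint64_t ap)
{
    assert(ap >= NONMETADATA_BASE);
    return METADATA_BASE + (ap - NONMETADATA_BASE) * SCALE;
}
```

Because the metadata region is `SCALE` times the size of the data it shadows, the scheme reserves a fixed fraction of the virtual address space up front.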

SoftBound uses a data structure - IIRC a hash table. (I think this is a
better approach, more likely to be accepted by the software ecosystem.)
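
To make the "metadata kept somewhere else" idea concrete, a toy table keyed by the address at which a pointer is stored might look like this in C. This is not SoftBound's actual structure; the table size, the hash and every name here are assumptions made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct bounds { const char *base; const char *limit; };   /* [base, limit) */

/* Power of two; the sketch assumes fewer than SLOTS entries and
   supports neither resizing nor deletion. */
#define SLOTS 64

static struct { const void *key; struct bounds b; } table[SLOTS];

static size_t slot_for(const void *key)
{
    return (size_t)(((uintptr_t)key >> 3) & (SLOTS - 1));
}

/* Record bounds for the pointer stored at ptr_location. */
static void meta_store(const void *ptr_location, struct bounds b)
{
    size_t i = slot_for(ptr_location);
    assert(ptr_location != NULL);
    while (table[i].key != NULL && table[i].key != ptr_location)
        i = (i + 1) & (SLOTS - 1);          /* linear probing */
    table[i].key = ptr_location;
    table[i].b = b;
}

/* Fetch bounds for the pointer stored at ptr_location; 0 if none. */
static int meta_load(const void *ptr_location, struct bounds *out)
{
    size_t i = slot_for(ptr_location);
    while (table[i].key != NULL) {
        if (table[i].key == ptr_location) {
            *out = table[i].b;
            return 1;
        }
        i = (i + 1) & (SLOTS - 1);
    }
    return 0;
}
```

The pointer variable and the buffer it points at keep their ordinary C layout; only the side table knows about the bounds.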

Other academic proposals have used different datastructures.

Other academic proposals, such as Jones and Kelly, have associated the
metadata not with the pointer address, but with the address of the
object, the thing pointed to. IMHO that is incorrect, provides lousy
detection in the presence of the sort of nested subobjects that Nick
referred to. (Hmmm, maybe this is why Nick made such a blatantly
incorrect statement about subobjects. I believe some work on this was
done in Nick's area of the world, Cambridge, and that that work
suffered from this problem. I hope that Nick will look again, and realize
that pointer metadata has fewer problems than object metadata.)

In any case: it is the overhead of maintaining this metadata, in a
datastructure that is binary compatible, that is responsible for any
performance loss. The actual bounds checks are trivial.
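
A small C illustration of the difference between object metadata and pointer metadata for a nested subobject. The struct and helper names are hypothetical; the point is only which accesses each kind of bounds would allow.

```c
#include <assert.h>
#include <stddef.h>

struct account {
    char name[8];
    int  is_admin;
};

/* Is a one-byte access at p inside [base, base + size)? */
static int in_bounds(const char *p, const char *base, size_t size)
{
    assert(p != NULL && base != NULL);
    return p >= base && p < base + size;
}

/* Object-based metadata: the bounds are those of the whole allocation. */
static int allowed_by_object_bounds(const struct account *a, const char *p)
{
    return in_bounds(p, (const char *)a, sizeof *a);
}

/* Pointer-based metadata: a pointer derived from a->name carries the
   bounds of that field only. */
static int allowed_by_pointer_bounds(const struct account *a, const char *p)
{
    return in_bounds(p, a->name, sizeof a->name);
}
```

An overflow of `name` by one byte lands on `is_admin`: it is still inside the object, so object bounds accept it, while bounds carried with the pointer to `name` reject it.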

--

> I expect the real cost to be much closer to zero (1-5%).

Here I must caution you:

Although, like you, I expect the real cost to be much closer to zero,
making that a requirement may basically kill any prospect of such
technology being deployed.

For example, do you mean

"I would never use such technology unless the perf cost is <5% on every
program that I use?"

or

"I have some programs for which I would accept a 20% perf loss in order to
be more secure against buffer overflows. However, I expect that these
programs will have perf loss <5% with buffer overflow protection and
suitable optimization?"

I can see how to fairly easily craft glassjaw programs that will run >2X
slower on HardBound and SoftBound. But I am willing to accept Milo's
performance data, that most programs run with much better performance.

Should we reject the technology because it has some glassjaws, even
though on most programs it is acceptably fast?

nm...@cam.ac.uk

Aug 5, 2009, 10:52:00 AM

In article <40744$4a798efa$45c49ea8$20...@TEKSAVVY.COM>,

EricP <ThatWould...@thevillage.com> wrote:
>>>>
>>>> Q: how much performance would you trade off to have a guarantee of no
>>>> buffer overflows? Myself, I would gladly trade off 20%; and Milo's
>>>> papers indicate that the cost is less.
>>> I expect the real cost to be much closer to zero (1-5%).
>
>It's my experience with Fortran 77 and Ada that the cost was 1-5%.

Fortran 77? Oh, come now! There were some pretty good debugging
compilers, but the cost of enabling full bounds checking in any of
the ones I used was a LOT higher than that. And the programming
styles used were not very amenable to optimisation of the checks.

>> The killer is when the language permits almost arbitrary sectioning
>> of objects, and where overlapping objects are allowed provided only
>> that the same element is not updated via one and used by another.
>> I.e. 'old' Fortran, C and C++ - but not 'new' Fortran or Pascal.
>
>I assume you mean here a variant record structure with
>a discriminant field (aka a union with a field indicating
>which case was valid). Such that before a field is accessed
>the value of the discriminant field must be tested to
>ensure it is set correctly.

No, I don't. I am also including such mechanisms as passing array
elements to be used as the first element of an array in the called
routine (Fortran 77, C and C++). As I said, the killer is when the
language allows them to overlap depending on which elements are
accessed - that needs expensive checks on every access (at best).
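
In C the idiom in question looks like the following. The function names are made up for the example; the construct itself is ordinary, legal C.

```c
#include <assert.h>
#include <stddef.h>

/* The callee sees only a pointer and a count; it cannot tell whether
   v is a whole array or the middle of a larger one. */
static int sum(const int *v, size_t n)
{
    int s = 0;
    assert(v != NULL);
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Pass element 2 of a 6-element array as if it were the first element
   of a 3-element array. The two "arrays" overlap. */
static int sum_middle(const int a[6])
{
    return sum(&a[2], 3);
}
```

A checker has to decide what bounds `v` should carry inside `sum`: those of the three elements the caller intended, or those of the enclosing array.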

>Such record fields tend to be updated in groups inside
>a controlling CASE structure based on the discriminant field.
>A smart compiler would realize that the discriminant was already
>checked and eliminate follow on checks.

Don't bet on it. REALLY don't bet on it. In C and its derivatives,
it is common to take the address of such a field and pass it down
to a routine that doesn't know anything about its history. Even
the smartest compiler can't do much about that.

>Also in a language designed from the start to support automatic
>checks, many checks can be eliminated as redundant.

That's what I said in the first place :-)

>What I would like in such a language is a source listing
>that showed where the compiler had emitted run-time checks
>because it was unable to statically determine validity.
>Often these can be eliminated by rearranging the code,
>or tightening up constraints.

Yup. Already on my mental list :-)

>Many languages and debuggers require the ability to associate
>a constraint violation back to a particular line of code.
>Without conditional trap instructions the compiler must
>emit a forward branch to unique destinations that then
>call the RaiseException routine.

That's not true, actually. There are more efficient methods on
most systems. You can implement such precise diagnostics with a
negligible overhead in space and time - been there, done that.
Yes, you need a conditional instruction, but it could be a
conditional branch-and-link, for example, or an 'unusual' one
that could trigger a trap. I used 0C3 on System/370 for that,
on occasion.

Regards,
Nick Maclaren.

Terje Mathisen

Aug 5, 2009, 11:17:56 AM

nm...@cam.ac.uk wrote:
> In article <YfydnVdUQ809jOTX...@giganews.com>,
> Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
>> Andy "Krazy" Glew wrote:
>>> Both HardBound and SoftBound trade off surprisingly small performance
>>> losses for this increased security.
>>>
>>> Q: how much performance would you trade off to have a guarantee of no
>>> buffer overflows? Myself, I would gladly trade off 20%; and Milo's
>>> papers indicate that the cost is less.
>> I expect the real cost to be much closer to zero (1-5%).
>
> In a language like Pascal or Fortran, yes. For example, the analysis
> and prechecking needed to optimise for vector systems and even SSE
> is PRECISELY that which is needed to minimise bounds checks for most
> array uses. Or implement SoftBound efficiently :-)

Right.

>
> The killer is when the language permits almost arbitrary sectioning
> of objects, and where overlapping objects are allowed provided only
> that the same element is not updated via one and used by another.
> I.e. 'old' Fortran, C and C++ - but not 'new' Fortran or Pascal.
>
> I doubt that you could implement reliable bounds checking for only
> 20% overhead in most C or much of C++ - yes, you could for C++ written
> in a good style, but only if it didn't use or rely on its C subset.
> E.g. it didn't use <cstring>, dangerous casts, or make significant
> use of arrays/pointers.

The crux is the need to "support existing plain C code", i.e. compile
all current Linux distros. I really don't think this can be done without
incurring significant overhead (50-100%).

>
>> I agree, and I don't believe we even need something like your HardBound.
>>
>> My old Pascal programs (with range checking) ran at more or less the
>> same speed as my C code.
>
> Ditto Fortran, given suitable compilers and tasks (Fortran is dire for
> character munging, bounds checking or no).

If we really do need to switch to a different language/ABI model, then
something closer to Java seems like a better idea than C.

I do realize that the single feature of C that I liked the most when
switching from Pascal (arbitrary sub-array addressing) is also the most
dangerous/difficult to optimize. :-(

Terje Mathisen

Aug 5, 2009, 11:25:44 AM

Andy "Krazy" Glew wrote:

> Terje Mathisen wrote:
>
> > I expect the real cost to be much closer to zero (1-5%).
>
> Here I must caution you:
>
> Although, like you, I expect the real cost to be much closer to zero,
> making that a requirement may basically kill any prospect of such
> technology being deployed.

I agree.

>
> For example, do you mean
>
> "I would never use such technology unless the perf cost is <5% on every
> program that I use?"

Not at all.

With very few exceptions, I don't run _anything_ today which is
performance-critical enough that a 2X slowdown would disqualify it.

>
> or
>
> "I have some programs for which I would accept a 20% perf loss in order to
> be more secure against buffer overflows. However, I expect that these

I'd love to have a 20% "safe" option!

> programs will have perf loss <5% with buffer overflow protection and
> suitable optimization?"
>
> I can see how to fairly easily craft glassjaw programs that will run >2X
> slower on HardBound and SoftBound. But I am willing to accept Milo's
> performance data, that most programs run with much better performance.
>
> Should we reject the technology because it has some glassjaws, even
> though on most programs it is acceptably fast?

Not at all: We should instead use some of all those spare
cycles/transistors to get rid of 99+% of the current buffer
overflow-style bugs/security risks.

nm...@cam.ac.uk

Aug 5, 2009, 11:26:54 AM

In article <a8SdnSTB3Ms4P-TX...@giganews.com>,

Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
>>
>> I doubt that you could implement reliable bounds checking for only
>> 20% overhead in most C or much of C++ - yes, you could for C++ written
>> in a good style, but only if it didn't use or rely on its C subset.
>> E.g. it didn't use <cstring>, dangerous casts, or make significant
>> use of arrays/pointers.
>
>The crux is the need to "support existing plain C code", i.e. compile
>all current Linux distros. I really don't think this can be done without
>incurring significant overhead (50-100%).

Agreed. As you know, my exemplar of evil code is the X Windowing
System and its copycats (yes, Bill, I mean yours) - and there the
problems are fundamental to the design.

>If we really do need to switch to a different language/ABI model, then
>something closer to Java seems like a better idea than C.

There's small difference in rotten apples ....

>I do realize that the single feature of C that I liked the most when
>switching from Pascal (arbitrary sub-array addressing) is also the most
>dangerous/difficult to optimize. :-(

Look at modern Fortran. You can do that cleanly, safely, and with
good optimisation. Not perfect, but pretty good.

Regards,
Nick Maclaren.

Andy "Krazy" Glew

Aug 5, 2009, 10:51:48 PM

to nm...@cam.ac.uk

nm...@cam.ac.uk wrote:
> No, I don't. I am also including such mechanisms as passing array
> elements to be used as the first element of an array in the called
> routine (Fortran 77, C and C++). As I said, the killer is when the
> language allows them to overlap depending on which elements are
> accessed - that needs expensive checks on every access (at best).

That is the key. The system must support nested and overlapping objects.

There have been many proposals that provide what you, Nick, seemed to
assume: they provide bounds for something like a malloc'ed region, but
do not allow you to create a "narrower" access to a subfield of such a
struct.

Although strict nesting is most common, overlapping but not strictly
nested datastructures must also be supported.

Similarly, it may be permissible to narrow to a subobject, and detect
errors if you go outside the bounds of the subobject. But occasionally
it is necessary to re-widen, back to the original bounds of the
allocation unit, such as malloc. Occasionally it is necessary to
completely re-widen, to all of memory, and hence lose all protection.
Although, as far as I know, having a pointer offset from one malloc'ed
object into another is not standards conforming - unless you know a
priori about other relationships between the malloc'ed chunks of memory,
e.g. unless you know that malloc'ing is being done out of a common array
of bytes.

Anyway: I believe that HardBound and SoftBound do the right things here.

Andy "Krazy" Glew

Aug 5, 2009, 11:08:02 PM8/5/09

to

Terje Mathisen wrote:
> I do realize that the single feature of C that I liked the most when
> switching from Pascal (arbitrary sub-array addressing) is also the most
> dangerous/difficult to optimize. :-(

That is key.

And that is what HardBound and SoftBound accomplish, by associating the
bounds with the pointers, not with the objects.

You can have multiple pointers to the same byte of memory, with
drastically different bounds. Some may have no bounds at all. Some may
be restricted to a single byte. Etc. Etc.

The compiler then has to determine the tightest bounds that are
compatible with the language standard and the compilation flags. Note
that I do expect that occasionally people will compile to a more
restricted and safe model than the most liberal interpretation of the
language standard. Just as occasionally they will compile to a model
that is more relaxed than the language standard permits to be assumed
for portable code.
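
As a rough illustration of bounds that travel with the pointer rather than with the object: the sketch below is not the actual metadata representation used by HardBound or SoftBound, and the `bptr` type and helper names are hypothetical. Addresses are modelled as integers, assuming a flat address space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bounded pointer: the bounds belong to the pointer value,
 * so two pointers to the same byte may carry different bounds. */
typedef struct {
    uintptr_t addr;   /* address the pointer currently holds */
    uintptr_t base;   /* lowest address it may access */
    uintptr_t bound;  /* one past the highest address it may access */
} bptr;

/* Pointer to p that may access [base, base + len). */
static bptr bptr_at(const void *p, const void *base, size_t len) {
    bptr b;
    assert(p != NULL && base != NULL);
    b.addr  = (uintptr_t)p;
    b.base  = (uintptr_t)base;
    b.bound = b.base + len;
    return b;
}

/* Pointer to p with bounds stripped: it may access all of memory. */
static bptr bptr_unbounded(const void *p) {
    bptr b;
    assert(p != NULL);
    b.addr  = (uintptr_t)p;
    b.base  = 0;
    b.bound = UINTPTR_MAX;
    return b;
}

/* Nonzero if `size` bytes starting at the pointer lie inside its bounds. */
static int bptr_can_access(bptr p, size_t size) {
    return p.addr >= p.base && p.addr <= p.bound && size <= p.bound - p.addr;
}
```

In this model a pointer to the first element of an array inside a struct can be given the bounds of the whole struct, of the array alone, of a single byte, or no bounds at all, and each choice gives a different verdict for the same access.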

nm...@cam.ac.uk

Aug 6, 2009, 3:26:28 AM8/6/09

to

In article <4A7A4544...@patten-glew.net>,

Andy \"Krazy\" Glew <"ag-news AT patten DASH my-last-name DOT net"> wrote:
>
>> No, I don't. I am also including such mechanisms as passing array
>> elements to be used as the first element of an array in the called
>> routine (Fortran 77, C and C++). As I said, the killer is when the
>> language allows them to overlap depending on which elements are
>> accessed - that needs expensive checks on every access (at best).
>
>That is the key. The system must support nested and overlapping objects.
>
>There have been many proposals that provide what you, Nick, seemed to
>assume: they provide bounds for something like a malloc'ed region, but
>do not allow you to create a "narrower" access to a subfield of such a
>struct.

Yes, I know. But that is not what C and C++ specify. We are all agreed
that you can do reliable checking in a suitable language.

>Similarly, it may be permissible to narrow to a subobject, and detect
>errors if you go outside the bounds of the subobject.

As neither SC22WG14 nor SC22WG21 are agreed on when and if that is
permissible, checking it is beyond ANY software.

>But occasionally
>it is necessary to re-widen, back to the original bounds of the
>allocation unit, such as malloc. Occasionally it is necessary to
>completely re-widen, to all of memory, and hence lose all protection.

Not just occasionally in most realistic C programs, I am afraid.

>Although, as far as I know, having a pointer offset from one malloc'ed
>object into another is not standards conforming - unless you know a
>priori about other relationships between the malloc'ed chunks of memory,
>e.g. unless you know that malloc'ing is being done out of a common array
>of bytes.

That's still undefined - which portmanteau category includes quite a
lot of compiler extensions.

>Anyway: I believe that HardBound and SoftBound do the right things here.

That may be true in an abstract sense, but it means that they cannot
be used on standard C (or even most C++) code, as those DON'T specify
the right thing.

Regards,
Nick Maclaren.

nm...@cam.ac.uk

Aug 6, 2009, 3:31:22 AM8/6/09

to

In article <4A7A4912...@patten-glew.net>,

Andy \"Krazy\" Glew <"ag-news AT patten DASH my-last-name DOT net"> wrote:

>Terje Mathisen wrote:
>
>> I do realize that the single feature of C that I liked the most when
>> switching from Pascal (arbitrary sub-array addressing) is also the most
>> dangerous/difficult to optimize. :-(
>
>That is key.
>
>And that is what HardBound and SoftBound accomplish, by associating the
>bounds with the pointers, not with the objects.

Fine. As with Algol 68 arrays, Fortran assumed-shape arrays (which
are, not coincidentally, very similar). But the language it is
checking is almost, but not quite, entirely unlike standard C.

Regards,
Nick Maclaren.

Terje Mathisen

Aug 6, 2009, 6:00:36 AM8/6/09

to

Andy "Krazy" Glew wrote:
> Although, as far as I know, having a pointer offset from one malloc'ed
> object into another is not standards conforming - unless you know a
> priori about other relationships between the malloc'ed chunks of memory,
> e.g. unless you know that malloc'ing is being done out of a common array
> of bytes.

This is the standard pattern I use for array processing, i.e. use a
single pointer to one of the arrays plus constant offsets from that
array to each of the others:

I.e. like this simple example:

void dsum(int len, double *sum, double *a, double *b)
{
    while (len-- > 0) {
        *sum++ = *a++ + *b++;
    }
}

which can become

mov ecx,[len]
sub ecx,1
jl done

mov esi,[a]
mov ebx,[b];
mov edi,[sum]

sub esi,edi ;; Convert pointers to offsets
sub ebx,edi

next:
fld qword ptr [esi+edi]
fadd qword ptr [ebx+edi]
fstp qword ptr [edi]
add edi,8
sub ecx,1
jge next

done:

OTOH, at the point where I convert C into asm, I _know_ the exact hw
model I'm targeting, so the code above is safe.

A portable compiler can use a different pattern which gets rid of all
pointer updates in the inner loop, as long as [base+offset] addressing
is available:

void dsum(int len, double *sum, double *a, double *b)
{
    if (len > 0) {
        // Move all pointers beyond the last item:
        sum += len; a += len; b += len;
        int i = -len;
        do {
            sum[i] = a[i] + b[i];
        } while (++i);
    }
}

where the inner loop becomes

next:
fld qword ptr [eax+ecx]
fadd qword ptr [ebx+ecx]
fstp qword ptr [edi+ecx]
add ecx,8
jnz next

This only depends on the ability to move a pointer to one item beyond
the end of the allocated array, without ever accessing it.

Since the same capability is a basic requirement for the entire C++
template library, it won't go away, and any checking sw has to allow it.
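
A checker that tolerates this idiom has to distinguish forming a pointer from dereferencing it. A minimal sketch of that distinction follows; the helper names are hypothetical, and addresses are modelled as integers on the assumption of a flat address space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* May a pointer holding this address exist at all?
 * One past the end (addr == bound) is allowed. */
static int may_form(uintptr_t addr, uintptr_t base, uintptr_t bound) {
    assert(base <= bound);
    return addr >= base && addr <= bound;
}

/* May `size` bytes be read or written at this address?
 * One past the end is rejected because no bytes remain before `bound`. */
static int may_access(uintptr_t addr, uintptr_t base, uintptr_t bound,
                      size_t size) {
    assert(base <= bound);
    return addr >= base && addr <= bound && size <= bound - addr;
}
```

For an array of four doubles, the address just past the fourth element passes `may_form` but fails `may_access`, which is the behaviour the second `dsum` relies on: the pointers sit one past the end, and every actual access uses a negative index that lands back inside the array.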

Wilco Dijkstra

Aug 6, 2009, 7:02:25 AM8/6/09

to

<nm...@cam.ac.uk> wrote in message news:h5e0j4$4n1$1...@smaug.linux.pwf.cam.ac.uk...

I agree with Nick - SoftBound (and HardBound) will fail to detect many common
failures and create false positives - just like all previous approaches.
Function pointers don't work properly - you will need to recompile everything,
including shared libraries or annotate functions by hand as not checkable.
Unions and int<->pointer casts are not supported at all so it ignores many
common pointer problems. The worst issue I ever encountered was a
pointer/integer union which was incorrectly used, which resulted in very weird
behaviour due to minor differences in memory layout changing the meaning
of the integer, and equally updates to the integer caused random effects when
interpreted as a pointer across different architectures and memory models...

At best you can argue the overhead compared to previous approaches has
been reduced. That is great but even an average of 15% is not realistic for
production code given the huge fluctuations (between 0% and 170% overhead
for the fastest store-only checks). Then there is the enormous memory overhead
of 64-87%. I am missing GCC as a commonly used pointer benchmark, I bet it
was left out as it would make their results look very bad.

The bottom line is that it is provably impossible to add good bounds checking
to very badly specified languages like C/C++. I see SoftBound as just another
tool for testing and debugging purposes. Each of the approaches has their
own limitations, so many companies use several tools to catch problems.

Wilco

nm...@cam.ac.uk

Aug 6, 2009, 7:25:34 AM8/6/09

to

In article <3Lyem.13530$wz2....@newsfe08.ams2>,

Wilco Dijkstra <Wilco.remove...@ntlworld.com> wrote:
>
>Function pointers don't work properly - you will need to recompile everything,
>including shared libraries or annotate functions by hand as not checkable.

That is one of the nastiest problems for ANY checking approach when
used on real code. It just isn't realistic to assume that you have
access to all the code, and it is cleanly written in your favoured
language. For example, it isn't rare for LAPACK to be in Fortran,
even when called from C. The converse is even more common - i.e.
clean, checked Fortran calls C code or unchecked libraries, and
most of the errors occur in the latter because of logical errors in
the former.

Indeed, this is the big problem with C++ - one can forbid the 'unsafe'
casts, but <cstring> and some other components are alien code, that
can't even be written in clean C++.

Regards,
Nick Maclaren.

nm...@cam.ac.uk

Aug 6, 2009, 8:10:08 AM8/6/09

to

In article <Re-dnTl5x8tYNOfX...@giganews.com>,

Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:

>Andy "Krazy" Glew wrote:
>
>> Although, as far as I know, having a pointer offset from one malloc'ed
>> object into another is not standards conforming - unless you know a
>> priori about other relationships between the malloc'ed chunks of memory,
>> e.g. unless you know that malloc'ing is being done out of a common array
>> of bytes.
>
>This is the standard pattern I use for array processing, i.e. use a
>single pointer to one of the arrays plus constant offsets from that
>array to each of the others:

In BCPL, it was legal; in C89, it is illegal; in C99, it is well, eagle?

double *a = malloc(100*sizeof(double));
double *b = malloc(100*sizeof(double));
uintptr_t c = (uintptr_t)b-(uintptr_t)a;
*(a+c/sizeof(double)) = 0.0; /* illegal */
*(double *)((uintptr_t)a+c) = 0.0; /* legal */

Regards,
Nick Maclaren.

Wilco Dijkstra

Aug 6, 2009, 8:15:24 AM8/6/09

to

<nm...@cam.ac.uk> wrote in message news:h5eeje$88n$1...@soup.linux.pwf.cam.ac.uk...

I wasn't even thinking about C++ or calls to libraries in different languages.
But indeed, those add up to insurmountable problems. Just imagine a
virtual function with a pointer argument, what should we put in the vtable,
the original interface or the checked interface? What if it gets called from
a function member pointer, dynamic library - which version do we call, how
do we handle transitions between checked and unchecked functions, etc.

There is a lot of snake oil in this area - not just around adding checks to
C but also adding GC and other "safe" features. I once evaluated an
expensive array bounds checking product which claimed it could find
ALL array bounds overflows. Even Pascal, Java and other languages
with built-in array bounds checks cannot make that claim (at best they can
detect all cases that happen at runtime, quite a different matter from finding
all possible overflows).

However all this product could do is spot array accesses that were just outside
the bounds of an allocated heap block (it didn't check local or global arrays
or arrays inside a structure - and accesses that were so out of bounds
that they accessed another heap block were being allowed). I did say what I
thought of it when pressed by a salesrep, oddly enough I never ever heard
from them again...
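
How that product worked internally is not stated here, but the behaviour described is what one would expect from any checker that judges an access by its address alone, with no record of which block the pointer was derived from. The following is a small model of such a checker; the `block` type, the `GUARD` size, and the `classify` function are all illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define GUARD 16  /* hypothetical guard-zone size in bytes */

/* A heap block, with addresses modelled as plain integers. */
typedef struct {
    size_t start;
    size_t len;
} block;

enum verdict { OK_IN_BLOCK, FLAGGED_GUARD, UNKNOWN };

/* Classify an access purely by where it lands. */
static enum verdict classify(const block *blocks, size_t n, size_t addr) {
    size_t i;
    assert(blocks != NULL);
    /* Inside any live block: accepted, whichever block was intended. */
    for (i = 0; i < n; i++) {
        if (addr >= blocks[i].start && addr - blocks[i].start < blocks[i].len)
            return OK_IN_BLOCK;
    }
    /* Within GUARD bytes before or after some block: flagged. */
    for (i = 0; i < n; i++) {
        size_t end = blocks[i].start + blocks[i].len;
        if (addr >= end && addr - end < GUARD)
            return FLAGGED_GUARD;
        if (addr < blocks[i].start && blocks[i].start - addr <= GUARD)
            return FLAGGED_GUARD;
    }
    return UNKNOWN;
}
```

With blocks at addresses 100 and 200, each 32 bytes long, an overrun of the first block by one byte (address 132) is flagged, while an overrun by 68 bytes (address 200) lands in the second block and is accepted.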

But the worst thing is, another part of our company actually bought it! This stuff
is sold on baseless fear - if you care about software quality you have got to
hire the best software engineers and encourage best practices even if that
means your software will take longer to develop and cost more. There are no
quick fixes. This applies equally to Java as to C++.

Wilco

nm...@cam.ac.uk

Aug 6, 2009, 8:25:02 AM8/6/09

to

In article <vPzem.73245$OM.3...@newsfe06.ams2>,

Wilco Dijkstra <Wilco.remove...@ntlworld.com> wrote:
>
>I wasn't even thinking about C++ or calls to libraries in different languages.
>But indeed, those add up to insurmountable problems. Just imagine a
>virtual function with a pointer argument, what should we put in the vtable,
>the original interface or the checked interface? What if it gets called from
>a function member pointer, dynamic library - which version do we call, how
>do we handle transitions between checked and unchecked functions, etc.

Actually, those issues are soluble - just. IBM CEL solved them, in
a closely related context, but it wasn't easy and it wasn't pretty.
I doubt that most modern developers could do it.

>There is a lot of snake oil in this area - not just around adding checks to

>C but also adding GC and other "safe" features. ...

Indeed.

Regards,
Nick Maclaren.

Wilco Dijkstra

Aug 6, 2009, 8:28:44 AM8/6/09

to

"Terje Mathisen" <"terje.mathisen at tmsw.no"> wrote in message news:y_adnVlHbcfkOeTX...@giganews.com...

> Andy "Krazy" Glew wrote:
>> Terje Mathisen wrote:
>>
>> > I expect the real cost to be much closer to zero (1-5%).
>>
>> Here I must caution you:
>>
>> Although, like you, I expect the real cost to be much closer to zero, making that a requirement may basically kill
>> any prospect of such technology being deployed.
>
> I agree.
>>
>> For example, do you mean
>>
>> "I would never use such technology unless the perf cost is <5% on every program that I use?"
>
> Not at all.
>
> With very few exceptions, I don't run _anything_ today which is performance-critical enough that a 2X slowdown would
> disqualify it.
>>
>> or
>>
>> "I have some programs that I would accept a 20% perfloss in in order to be more secure against buffer overflows.
>> However, I expect that these
>
> I'd love to have a 20% "safe" option!

That is never going to happen. However a 100% overhead is more than
good enough for debugging and testing purposes. We used ValGrind a lot
in the past and even though it was slow, it was more than sufficient to track
down a few nasty bugs.

>> programs will have perf loss <5% with buffer overflow protection and suitable optimization?"
>>
>> I can see how to fairly easily craft glassjaw programs that will run >2X slower on HardBound and SoftBound. But I am
>> willing to accept Milo's performance data, that most programs run with much better performance.
>>
>> Should we reject the technology because it has some glassjaws, even though on most programs it is acceptably fast?
>
> Not at all: We should instead use some of all those spare cycles/transistors to get rid of 99+% of the current buffer
> overflow-style bugs/security risks.

Why waste hardware effort on something that is essentially a management
issue? Companies are often unwilling to pay the real cost of correct and safe
software.

Wilco

nm...@cam.ac.uk

Aug 6, 2009, 9:16:56 AM8/6/09

to

In article <_%zem.73246$OM.6...@newsfe06.ams2>,

Wilco Dijkstra <Wilco.remove...@ntlworld.com> wrote:
>
>"Terje Mathisen" <"terje.mathisen at tmsw.no"> wrote in message news:y_adnVlHbcfkOeTX...@giganews.com...
>>

>> I'd love to have a 20% "safe" option!
>
>That is never going to happen. However a 100% overhead is more than
>good enough for debugging and testing purposes. We used ValGrind a lot
>in the past and even though it was slow, it was more than sufficient to track
>down a few nasty bugs.

Actually, I am not sure. Few programs nowadays are limited by CPU,
and the key is to ensure that you don't need more memory locations
(needing more data from each location is less critical).

But I agree that a factor of two is usually acceptable - quite often,
even a factor of ten is!

>> Not at all: We should instead use some of all those spare cycles/transistors
>> to get rid of 99+% of the current buffer overflow-style bugs/security risks.
>
>Why waste hardware effort on something that is essentially a management
>issue? Companies are often unwilling to pay the real cost of correct and safe
>software.

Regrettably, that is the case :-( But it may change if there is a
suitably catastrophic failure that leads to a reaction.

Regards,
Nick Maclaren.

Andy "Krazy" Glew

Aug 6, 2009, 10:50:40 AM8/6/09

to nm...@cam.ac.uk

nm...@cam.ac.uk wrote:
> In article <4A7A4544...@patten-glew.net>,

>> Similarly, it may be permissible to narrow to a subobject, and detect
>> errors if you go outside the bounds of the subobject.
>
> As neither SC22WG14 nor SC22WG21 are agreed on when and if that is
> permissible, checking it is beyond ANY software.

Oh, for Christ's bloody sake, Nick! Do you feed on FUD?

In the example we already discussed (C++ syntax):

struct S {
    int a;
    int b[4];
    int c;
};

S* p = new S;

i = 5;
...
p->b[i] = 99;

Quite clearly a compiler can do the check.

Yes, there are situations where a check of what amounts to the same
address cannot be done. E.g. when the entire struct is being treated as
an array of char. If the compiler can recognize those situations, e.g.
casts, it can strip the bounds and not do any checks.

Moreover, if the working groups for the language standard cannot agree
on checks that are always safe, who the hell cares? (What is it they say
about the IQ of committees?) The compiler can always provide switches
to provide different levels of security
a) compile code with no checks (guaranteed to be C standard compliant)
b) compile code with checks that most reasonable people would agree
to (but not Nick).
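
For the struct example above, the check a compiler could emit amounts to comparing the index against the declared extent of `b`. The sketch below is written in C rather than C++; the function name and the return-code convention are hypothetical, and a real implementation would more likely trap than return a status.

```c
#include <assert.h>
#include <stddef.h>

struct S {
    int a;
    int b[4];
    int c;
};

/* Number of elements in the member b, taken from its declaration. */
#define S_B_LEN (sizeof(((struct S *)0)->b) / sizeof(int))

/* Hypothetical checked form of `p->b[i] = value`.
 * Returns 1 if the store was performed, 0 if the index was rejected. */
static int store_b_checked(struct S *p, size_t i, int value) {
    assert(p != NULL);
    if (i >= S_B_LEN) {
        return 0;  /* out of bounds for b, even if still inside *p */
    }
    p->b[i] = value;
    return 1;
}
```

An index of 4 would land on `c`, inside the allocation but outside `b`, and an index of 5 would land past the end of the struct on typical layouts. Both are rejected by this narrowed check, whereas a check against the whole allocation would accept the first.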

nm...@cam.ac.uk

Aug 6, 2009, 12:03:53 PM8/6/09

to

In article <4A7AEDC0...@patten-glew.net>,

Andy \"Krazy\" Glew <"ag-news AT patten DASH my-last-name DOT net"> wrote:
>
>> In article <4A7A4544...@patten-glew.net>,
>>> Similarly, it may be permissible to narrow to a subobject, and detect
>>> errors if you go outside the bounds of the subobject.
>>
>> As neither SC22WG14 nor SC22WG21 are agreed on when and if that is
>> permissible, checking it is beyond ANY software.
>
>Oh, for Christ's bloody sake, Nick! Do you feed on FUD?
>
>In the example we already discussed (C++ syntax):
>
>struct S {
> int a;
> int b[4];
> int c;
>};
>
>S* p = new S;
>
>i = 5;
>...
>p->b[i] = 99;
>
>Quite clearly a compiler can do the check.

Sigh. I will try to explain to you one last time what the problem
area is. Yes, OF COURSE, there are cases where checking is possible;
but being able to check only the cases that are trivially checkable
is not an interesting property, either theoretically or practically.

Consider an object like:

typedef struct {int A[10]; double B[5]; int C[10];} P;
P *Q = malloc(sizeof(P));
memset(Q->A,0,sizeof(P));
fred(Q->A,(offsetof(P,C)-offsetof(P,A)));

In another module, far, far away:

void fred (int *X, size_t Y) {
int *Z = memset((char *)X+Y,1,sizeof(int));
joe(Z);
}

You might like to say that is clearly illegal, or does not occur in
real code, but you would be wrong on both counts. In particular,
that construction is a CRITICAL part of the design of the INTERFACE
of the X Windowing System (and, if I recall, the Microsoft one, too).
So it CANNOT be changed without rewriting all code that uses them.
It is also used in a fair number of other codes.

So what bounds does the checking software pass when fred is called?

If that problem were limited to 'leaf' functions like memset, the
problem could be hacked up by eliminating checks in those or even
by passing BOTH sets of bounds and using the 'whole allocation'
ones for such uses. But it isn't. In many important programs,
such tricks are used to create a pointer which is then passed on.

So what bounds does the checking software pass when joe is called?

Regards,
Nick Maclaren.
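
The dilemma in this example can be made concrete with a small model. The `check` helper and the two candidate bounds are illustrative assumptions, not a description of what HardBound or SoftBound are specified to do; addresses are treated as integers in a flat address space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { int A[10]; double B[5]; int C[10]; } P;

/* Nonzero if [addr, addr + size) lies inside [base, base + len). */
static int check(uintptr_t addr, size_t size, uintptr_t base, size_t len) {
    assert(len <= UINTPTR_MAX - base);
    return addr >= base && addr - base <= len && size <= len - (addr - base);
}

/* The access made inside fred(): sizeof(int) bytes at (char *)X + Y,
 * with X == q->A and Y == offsetof(P,C) - offsetof(P,A).
 * `base` and `len` are whatever bounds the caller attached to X. */
static int fred_access_ok(const P *q, uintptr_t base, size_t len) {
    uintptr_t target;
    assert(q != NULL);
    target = (uintptr_t)q->A + (offsetof(P, C) - offsetof(P, A));
    return check(target, sizeof(int), base, len);
}
```

If the bounds passed with `X` are those of the member `A`, the access is rejected even though the program intends it. If they are those of the whole allocation, the access is accepted, but so is a plain overrun of `A` into `B`.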

Robert Myers

Aug 6, 2009, 3:43:09 PM8/6/09

to

On Aug 6, 8:28 am, "Wilco Dijkstra"

<Wilco.removethisDijks...@ntlworld.com> wrote:
> "Terje Mathisen" <"terje.mathisen at tmsw.no"> wrote in messagenews:y_adnVlHbcfkOeTX...@giganews.com...
>

> > Not at all: We should instead use some of all those spare cycles/transistors to get rid of 99+% of the current buffer

> > overflow-style bugs/security risks.
>
> Why waste hardware effort on something that is essentially a management
> issue? Companies are often unwilling to pay the real cost of correct and safe
> software.

It's really more of an economic issue and should probably be a public
policy issue. There are lots of incentives to push risk off on the
buyer, and that's what companies do. Ultimately, much of the risk
gets pushed off on the public, along with the unspoken argument that
that's just the way it has to be.

We have plumbing codes, building codes, fire safety codes, codes for
cars, airplanes and buses, and codes for nearly anything that affects
public health and safety... except for anything having to do with
computers.

Incentives have to be built in. If they're not, no one (seller or
buyer) will spend the money. People would rather buy cheap homes
that might be less safe than to pay to have safer ones, but they are
not given the option. Someone from IBM or DEC might correctly argue
that the microcomputer revolution was really about selling risky
products that they historically would not have, but there is no way to
put the genie back into the bottle. The mentality has put downward
pressure on prices in every market segment, including enterprise
software and hardware--with the possible exception of mainframes,
where the downward pressure is on volume.

One fix or another just isn't going to do it, and neither will Bill
Gates wailing about the competency of US grads. The problems are
systemic and pervasive and will yield only to fixes that are systemic
and pervasive.

I don't think that what Andy has proposed is necessarily a waste of
hardware, transistors, performance, or money, but it may lead to even
more sloppy programming. I don't worry much about syntax because it
costs little to get the compiler to find the errors. I'm not so sure
things would be better if programmers were counting on the hardware to
catch programming and conceptual errors.

Robert.

nm...@cam.ac.uk

Aug 6, 2009, 4:00:13 PM8/6/09

to

In article <6ddf56da-5f53-4430...@e4g2000vbe.googlegroups.com>,
Robert Myers <rbmye...@gmail.com> wrote:
>On Aug 6, 8:28 am, "Wilco Dijkstra"
><Wilco.removethisDijks...@ntlworld.com> wrote:
>> "Terje Mathisen" <"terje.mathisen at tmsw.no"> wrote in message news:y_adnVlHbcfkOeTXnZ...@giganews.com...
>
>> > Not at all: We should instead use some of all those spare cycles/transistors to get rid of 99+% of the current buffer
>> > overflow-style bugs/security risks.
>>
>> Why waste hardware effort on something that is essentially a management
>> issue? Companies are often unwilling to pay the real cost of correct and safe
>> software.
>
>It's really more of an economic issue and should probably be a public
>policy issue. There are lots of incentives to push risk off on the
>buyer, and that's what companies do. Ultimately, much of the risk
>gets pushed off on the public, along with the unspoken argument that
>that's just the way it has to be.

It's a mistake to think of it as simply risk. There are more direct
costs, too.

30+ years ago, it was common to spend more time bypassing constraints
in an application than in doing the task in hand. Nowadays, that is
less common, but it is very common to spend much more time bypassing
and recovering from bugs in applications than in doing the task in
hand. Interestingly, this is restoring constraints, because the
users can't get a feature to work rather than it not being present.

Speaking to computer-naive people, that is true of them as well as
me. And, yes, I really do mean over 50% of their time is spent in
bypassing long-standing bugs in incredibly widely-used applications.

Regards,
Nick Maclaren.

Robert Myers

Aug 6, 2009, 5:49:43 PM8/6/09

to

On Aug 6, 4:00 pm, n...@cam.ac.uk wrote:
> In article <6ddf56da-5f53-4430-8b0f-aea9aa948...@e4g2000vbe.googlegroups.com>,

>
> It's a mistake to think of it as simply risk. There are more direct
> costs, too.
>
> 30+ years ago, it was common to spend more time bypassing constraints
> in an application than in doing the task in hand. Nowadays, that is
> less common, but it is very common to spend much more time bypassing
> and recovering from bugs in applications than in doing the task in
> hand. Interestingly, this is restoring constraints, because the
> users can't get a feature to work rather than it not being present.
>
> Speaking to computer-naive people, that is true of them as well as
> me. And, yes, I really do mean over 50% of their time is spent in
> bypassing long-standing bugs in incredibly widely-used applications.
>

The place to look for a threat worth demonizing has to be outside the
IT department.

If it's just a matter of the cost of programmer time, who cares where
or how it's spent, especially if it is the customer's programmer time
that is being wasted? To the customer's programmer, after all, it's a
job. Why should he fuss?

To the people who count beans and who *ahem* manage things, it's all
just an annoying mess, and they don't want to hear about it (well,
until production is stopped for a few hours, days, or weeks). You'll
never get anywhere complaining how stupid it all is or how annoying
for anyone competent.

The direct costs right now can only be measured as incidences of
failure which are dealt with in the usual hurry up hack and patch
fashion. Until risk is monetized, nothing will change.

I see only two real fixes: either regulation (like building codes) or
monetizing the risk, and I'd far prefer to see a market solution, as
opposed to interminable arguments over specific approaches (like the
one in hand).

Robert.

Andrew Reilly

Aug 6, 2009, 7:36:05 PM8/6/09

to

On Thu, 06 Aug 2009 14:16:56 +0100, nmm1 wrote:

> In article <_%zem.73246$OM.6...@newsfe06.ams2>, Wilco Dijkstra
> <Wilco.remove...@ntlworld.com> wrote:
>>
>>"Terje Mathisen" <"terje.mathisen at tmsw.no"> wrote in message
>>news:y_adnVlHbcfkOeTX...@giganews.com...
>>>
>>> I'd love to have a 20% "safe" option!
>>
>>That is never going to happen. However a 100% overhead is more than good
>>enough for debugging and testing purposes. We used ValGrind a lot in the
>>past and even though it was slow, it was more than sufficient to track
>>down a few nasty bugs.
>
> Actually, I am not sure. Few programs nowadays are limited by CPU, and
> the key is to ensure that you don't need more memory locations (needing
> more data from each location is less critical).
>
> But I agree that a factor of two is usually acceptable - quite often,
> even a factor of ten is!

You guys are talking about a technique for post-facto securing old,
existing code, right?

There are plenty of nice, safe languages that are well inside the factor-
of-two performance of C or C++ that seems to be everyone's comfort
threshold. They'd be good for new code, right?

[Although it doesn't seem to be common in the PC world, there are
approximately equivalent software stacks available on some mobile phones
where everything above the kernel is written in Java.]

Cheers,

--
Andrew

Andrew Reilly

Aug 6, 2009, 8:11:21 PM8/6/09

to

On Thu, 06 Aug 2009 17:03:53 +0100, nmm1 wrote:

> typedef struct {int A[10]; double B[5]; int C[10];} P;
> P *Q = malloc(sizeof(P));

> memset(Q->A,0,sizeof(P));

Aside from being a deliberate bug/obfuscation, what's wrong with
memset(Q, 0, sizeof *Q);
in this context?

> fred(Q->A,(offsetof(P,C)-offsetof(P,A)));

>
> In another module, far, far away:
>
> void fred (int *X, size_t Y) {
> int *Z = memset((char *)X+Y,1,sizeof(int));
> joe(Z);
> }
>
> You might like to say that is clearly illegal, or does not occur in real
> code, but you would be wrong on both counts. In particular, that
> construction is a CRITICAL part of the design of the INTERFACE of the X
> Windowing System (and, if I recall, the Microsoft one, too). So it
> CANNOT be changed without rewriting all code that uses them. It is also
> used in a fair number of other codes.

I'm no expert on X. Indeed I try to stay away from GUIs altogether. My
understanding is that X is defined by a wire protocol, so I don't see how
this statement can be true in that case. The usual interface to that
wire protocol used to be Xlib. Is that where this construction shows
up? Xlib seems to be in the process of being replaced by Xcb. I
don't know whether or not that has similar issues.

I think that your main point is valid, though: these things are clearly
possible to be expressed in the language, even though it is clearly
stupid (or at least inadvisable) to do so. What can be expressed will
be, so checking based on a "nice" subset of expressions is not likely to
be useful, overall.

The point that I'm making is that I doubt very much that this sort of
construction is in any way essential or necessary to solving the problem
that was the code's original purpose. It's just the whim of a code-hacker
who was able to do it because C let him. If he were coding in a
language with strong type constraints and first class (i.e., bounds
checkable) arrays, the job would still be able to be done, and wouldn't
be significantly slower. Large chunks of the early Macintosh OS and
applications were written in a Pascal variant. GUIs and video editing
workstations and similar CAD systems were written stem to stern in
Smalltalk and Lisp. That was before processors had reached 1MIPS, so
it's certainly possible to do with today's gigaflop behemoths.

Cheers,

--
Andrew

Andrew Reilly

Aug 6, 2009, 8:49:29 PM8/6/09

to

On Thu, 06 Aug 2009 12:43:09 -0700, Robert Myers wrote:

> I don't think that what Andy has proposed is necessarily a waste of
> hardware, transistors, performance, or money, but it may lead to even
> more sloppy programming. I don't worry much about syntax because it
> costs little to get the compiler to find the errors. I'm not so sure
> things would be better if programmers were counting on the hardware to
> catch programming and conceptual errors.

When you say it as bluntly as that (thanks!) it must be obvious to
everyone that the hardware isn't going to catch the conceptual errors.
In fact, this whole discussion is very much a tail-wagging-the-dog kind
of bug-hunt. Yes, lots of the current security issues that we see
reported come down to stack crashes caused by buffer overruns, and
perhaps some hardware bounds checking could detect that situation and
kill the application before it could do something nasty. (But maybe
not! What if the buffer bound was overrun not by the application but by
the operating system, because the application had asked for more data
than the buffer could hold? Would the hardware bounds checks apply to
peripheral DMA engines?) Remember, too, that quite a few of the current
security problems have nothing to do with software quality: the
specifications were flawed. DNS poisoning attacks? Software's fine and
bug-free: protocol is wrong. Spam? SMTP protocol did not envisage that
attack mode and has no answer.

Back up a bit, though: why are the buffers being overrun? Because a
conceptual error at the algorithm level caused that particular constraint
to not be taken into consideration. Are these the only kinds of
constraint that apply to programs and algorithms? Of course not. But
these are the only sorts of mistakes that this test can even hope to
find. A tiny fraction in the space of possible mistakes.

I'm not arguing that we shouldn't be trying to prevent this class of
mistake. We should be working towards testing (or proving, where
possible) *all* of the goals and constraints of our software. Hardware
tools and better languages can certainly help, but that's also a question
of discipline and training and systems and management. Just as you can
write Fortran in any language, you can write mistakes in any language,
too.

Cheers,

--
Andrew

Wilco Dijkstra

Aug 6, 2009, 8:54:59 PM

to

"Andrew Reilly" <andrew-...@areilly.bpc-users.org> wrote in message news:7e17n4F...@mid.individual.net...

> On Thu, 06 Aug 2009 14:16:56 +0100, nmm1 wrote:
>
>> In article <_%zem.73246$OM.6...@newsfe06.ams2>, Wilco Dijkstra
>> <Wilco.remove...@ntlworld.com> wrote:
>>>
>>>"Terje Mathisen" <"terje.mathisen at tmsw.no"> wrote in message
>>>news:y_adnVlHbcfkOeTX...@giganews.com...
>>>>
>>>> I'd love to have a 20% "safe" option!
>>>
>>>That is never going to happen. However a 100% overhead is more than good
>>>enough for debugging and testing purposes. We used Valgrind a lot in the
>>>past and even though it was slow, it was more than sufficient to track
>>>down a few nasty bugs.
>>
>> Actually, I am not sure. Few programs nowadays are limited by CPU, and
>> the key is to ensure that you don't need more memory locations (needing
>> more data from each location is less critical).
>>
>> But I agree that a factor of two is usually acceptable - quite often,
>> even a factor of ten is!
>
> You guys are talking about a technique for post-facto securing old,
> existing code, right?

Yes, and that is pretty much impossible beyond the easy cases.

> There are plenty of nice, safe languages that are well inside the factor-
> of-two performance of C or C++ that seems to be everyone's comfort
> threshold. They'd be good for new code, right?

There are plenty of "safe" languages but they do not make software any
safer just like airbags and seatbelts do not make cars any safer. What they
do is reduce the damage in a crash - but the crash still happens. I'm not sure
about you, but personally I'd rather avoid the crash in the first place.

My point was that if we continue with the current "standard" of software
development things won't get any better soon. Bandaids like SoftBound
are just symptoms of a much wider problem. Imagine if bridges and
skyscrapers were built the way most companies develop software...

Wilco

Andrew Reilly

Aug 6, 2009, 9:06:06 PM

to

On Fri, 07 Aug 2009 01:54:59 +0100, Wilco Dijkstra wrote:

> There are plenty of "safe" languages but they do not make software any
> safer just like airbags and seatbelts do not make cars any safer. What
> they do is reduce the damage in a crash - but the crash still happens.
> I'm not sure about you, but personally I'd rather avoid the crash in the
> first place.

I'm all for bug-free software (which is much better than software that
simply doesn't crash). For certain classes of problems, it's nice to
have the airbags too, though. Things outside the software can cause
programs to crash: batteries run down, power cords get tripped over,
alpha particles flip bits. It's nice when those events don't permanently
ruin the system...

Cheers,

--
Andrew

Andy "Krazy" Glew

Aug 6, 2009, 11:42:23 PM

to Wilco Dijkstra

Wilco Dijkstra wrote:
> <nm...@cam.ac.uk> wrote in message news:h5eeje$88n$1...@soup.linux.pwf.cam.ac.uk...
>> In article <3Lyem.13530$wz2....@newsfe08.ams2>,
>> Wilco Dijkstra <Wilco.remove...@ntlworld.com> wrote:
>>> Function pointers don't work properly - you will need to recompile everything,
>>> including shared libraries or annotate functions by hand as not checkable.

I was having trouble understanding why you would say this, when I
realized that you were talking about SoftBound. Yes, one of the
weaknesses of SoftBound is that it requires creating a second version of
any function, in order to pass the bounds. However, this does not need
to be exposed to the user: the compiler can create two versions
automatically, and/or the linker can instantiate the other version -
e.g. a checked stub that throws away all of the bounds and then calls
the unchecked version - using exactly the same technology as is used,
for example, to instantiate template library functions in C++.
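
A minimal sketch of the stub idea, assuming a hypothetical `bounds_t` record and made-up function names (`sum_unchecked`, `sum_checked`). SoftBound's real metadata layout and naming are not given in this thread, so treat this only as an illustration of "a checked stub that throws away the bounds and calls the unchecked version":

```c
#include <stddef.h>

/* Hypothetical bounds record; not SoftBound's actual metadata layout. */
typedef struct { const char *base; const char *limit; } bounds_t;

/* Original, uninstrumented function, e.g. from a library that was not recompiled. */
int sum_unchecked(const int *p, size_t n) {
    int s = 0;
    for (size_t i = 0; i < n; i++) s += p[i];
    return s;
}

/* Stub with the "checked" calling convention: it accepts the extra bounds
   argument, discards it, and forwards to the unchecked version. */
int sum_checked(const int *p, bounds_t p_bounds, size_t n) {
    (void)p_bounds;
    return sum_unchecked(p, n);
}
```

Instrumented callers can then always use the checked signature, while the library body stays untouched and unchecked.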

Nevertheless, I 100% agree: this is suboptimal.

HardBound does not have this problem. In HardBound there is only one
type of function. A function compiled with the HardBound instructions
can call, and be called by, non-HardBound code with no problem.

Most of the rest of the post is therefore irrelevant. Good points, if
the assumptions they were based on were correct. But the assumptions
are incorrect.

>> That is one of the nastiest problems for ANY checking approach when
>> used on real code. It just isn't realistic to assume that you have
>> access to all the code, and it is cleanly written in your favoured
>> language. For example, it isn't rare for LAPACK to be in Fortran,
>> even when called from C. The converse is even more common - i.e.
>> clean, checked Fortran calls C code or unchecked libraries, and
>> most of the errors occur in the latter because of logical errors in
>> the former.

I agree: it must be possible to have mixed mode. (Indeed, google to
find posts by me circa 2004, discussing the importance of exactly that.)

But, as I point out above, not applicable to HardBound, and not very
applicable to SoftBound.

I would normally have just deleted this irrelevant comment instead of
replying to it. However, it provides me an opportunity to make another
important point:

There are LOTS of buffer overflows in so called safe languages like Java
and C#. Most of them occur in C language libraries called by the safe
languages. Many of these C language libraries do not need to be unsafe,
but take too much effort to rewrite. Systems such as HardBound and
SoftBound, that allow increased safety for C and C++ code, will also
improve the security of Java and C#.

>> Indeed, this is the big problem with C++ - one can forbid the 'unsafe'
>> casts, but <cstring> and some other components are alien code, that
>> can't even be written in clean C++.
>
> I wasn't even thinking about C++ or calls to libraries in different languages.

But I can assure you that I have. Mixed mode is an absolute requirement.

We probably cannot automatically secure all code - we probably cannot
have the compiler automatically insert bounds checking in all functions.
Not in assembly code, for example. (Although as I have already
mentioned, to me one of the attractions of HardBound is that it can
extend protection to assembly. Not automatically, but it can be used to
detect buffer overflows, if willing to add the annotations. The IBM
AS-400 family operating system, one of the first large OSes written in
C++, is a case in point.)

However, we can extend protection to the large number of C and C++
libraries that are well enough behaved.

Thus requiring hand rewriting of only the relatively small number of
libraries that cannot be automatically compiled with bounds checking.
Like malloc.

> But indeed, those add up to insurmountable problems. Just imagine a
> virtual function with a pointer argument, what should we put in the vtable,
> the original interface or the checked interface? What if it gets called from
> a function member pointer, dynamic library - which version do we call, how
> do we handle transitions between checked and unchecked functions, etc.

Not a problem for HardBound: it does not have two different types of
functions.

Not so bad a problem for SoftBound, since C++ compilers have already
introduced the technology to automatically generate the missing checked
and/or unchecked version, the one calling the other.

> But the worst thing is, another part of our company actually bought it! This stuff
> is sold on baseless fear - if you care about software quality you have got to
> hire the best software engineers and encourage best practices even if that
> means your software will take longer to develop and cost more. There are no
> quick fixes. This applies equally to Java as to C++.
>
> Wilco

What company do you work for, Wilco?

The hope that good programmers can solve this problem is vain. Look at
all of the buffer overflows in so-called "secure" OSes like
OpenBSD, recently posted to this group.

Yes: hire the best programmers possible. Take the time necessary. But
your code almost undoubtedly still has bugs, if you program in C or C++.

Andy "Krazy" Glew

Aug 7, 2009, 12:36:33 AM

to nm...@cam.ac.uk

nm...@cam.ac.uk wrote:
> In article <4A7AEDC0...@patten-glew.net>,
> Andy \"Krazy\" Glew <"ag-news AT patten DASH my-last-name DOT net"> wrote:
> Sigh. I will try to explain to you one last time what the problem
> area is. Yes, OF COURSE, there are cases where checking is possible;
> but being able to check only the cases that are trivially checkable
> is not an interesting property, either theoretically or practically.

Sigh.

I will explain to you one last time that the problem is not whether a
compiler can generate checks for all buffer overflows.

The question is what fraction of buffer overflows can be detected.

What is the fraction of true positives (real bugs, that can be turned
into security holes) that are detected?

What is the fraction of false positives (code that is not buggy, but
that is incorrectly indicated as having a bug)?

It's numbers that matter here. Not pontification. And not worst case.

I hope that Milo Martin will continue his research, and be able to make
such numbers public.

> Consider an object like:
>
> typedef struct {int A[10]; double B[5]; int C[10];} P;
> P *Q = malloc(sizeof(P));
> memset(Q->A,0,sizeof(P));
> fred(Q->A,(offsetof(P,C)-offsetof(P,A)));
>
> In another module, far, far away:
>
> void fred (int *X, size_t Y) {
> int *Z = memset((char *)X+Y,1,sizeof(int));
> joe(Z);
> }
>
> You might like to say that is clearly illegal, or does not occur in
> real code, but you would be wrong on both counts. In particular,
> that construction is a CRITICAL part of the design of the INTERFACE
> of the X Windowing System (and, if I recall, the Microsoft one, too).
> So it CANNOT be changed without rewriting all code that uses them.
> It is also used in a fair number of other codes.
>
> So what bounds does the checking software pass when fred is called?

By the way, in the last example of such code that I saw, it had been
changed to use the address of the struct, not of the first element of
the struct:

memset((int*)Q,0,sizeof(P));
fred((int*)Q,(offsetof(P,C)-offsetof(P,A)));

which happens to make it all work hunky dory.

But I agree: in traditional C, the address of a struct is the address of
the first member thereof. So having the compiler enforce bounds blindly
leads to false positives. Steps that can be taken:

1) Rewrite the code, to take address of struct rather than address of
first element. Compile appropriately. If you feel confident, advertise
your code as "New! Improved! Better Security! Fewer Buffer Overflows!"

Note that it is not a major rewrite.

If there are too many such false positives to rewrite code, then you
take steps such as the following:

0) Perhaps you compile all of your code without bounds checking. But
then you deserve any loss of sales to a competitor who has changed the
code, and/or loss of lawsuits to customers whose systems were broken
into because of the flaw.

1) Perhaps you compile some of your code without bounds checking, but
some of it with.

2) Perhaps the compiler adds idioms to detect address of first element
of struct, and/or address of last (for that other classic, struct { int
size; char data[1]; }).

Note that for this last idiom, changes to the language standard were made
(struct { int size; char data[]; }), and more and more compilers do not
allow data[1] code to be accessed past 1. Change happens... just slowly.
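
For reference, a short sketch of the standard replacement for the `data[1]` trick, the C99 flexible array member. The names `struct packet` and `packet_new` are made up for illustration:

```c
#include <stdlib.h>
#include <string.h>

/* C99 flexible array member: replaces the old "char data[1]" trick. */
struct packet {
    int  size;
    char data[];   /* storage for this member comes from the malloc below */
};

/* Allocate a packet with room for n payload bytes and copy them in.
   Returns NULL on a negative size or allocation failure. */
struct packet *packet_new(const char *src, int n) {
    if (n < 0) return NULL;
    struct packet *p = malloc(sizeof *p + (size_t)n);
    if (p == NULL) return NULL;
    p->size = n;
    memcpy(p->data, src, (size_t)n);
    return p;
}
```

Because the array has no declared length, a bounds checker can take the extent from the allocation rather than from a misleading `[1]`.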

3) Perhaps the compiler or library vendor adds annotations or a database
indicating what interfaces have what. Microsoft already has such an
annotation system. It is used extensively, particularly for standard
libraries like the windowing system. However, we cannot rely on new
programmers always doing the right thing.

Annotations systems are a crutch.

For important, widely used, libraries like the X Window System,
annotations can be added - *because* they are important and widely used,
and the effort is paid back.

If a library is not important and not widely used - well, I still care,
because bugs find such code to break in. But, if it is not widely used,
then by definition there won't be too many false positives.

It is numbers that matter here.

> If that problem were limited to 'leaf' functions like memset, the
> problem could be hacked up by eliminating checks in those or even
> by passing BOTH sets of bounds and using the 'whole allocation'
> ones for such uses. But it isn't. In many important programs,
> such tricks are used to create a pointer which is then passed on.
>
> So what bounds does the checking software pass when joe is called?

This has an unambiguous answer: the bounds of X.

void fred (int *X, size_t Y) {
int *Z = memset((char *)X+Y,1,sizeof(int));
joe(Z);
}

fred is called with pointer argument X, with associated bounds.

memset is called with pointer argument X+Y, which is derived from X, and
hence has the bounds of X. There has been no narrowing operation.

memset returns its first argument. There is no narrowing.

Hence memset returns the bounds of X, and Z is assigned the bounds of X,
and joe is called with the bounds of X.
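
The propagation just described can be modelled in plain C. This is only a sketch: `bptr` and its helpers are hypothetical stand-ins for metadata that HardBound or SoftBound would carry invisibly, not their actual mechanism.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical software model of a pointer plus its bounds. */
typedef struct {
    char *ptr;    /* current pointer value */
    char *base;   /* first valid byte */
    char *limit;  /* one past the last valid byte */
} bptr;

/* Create a pointer whose bounds cover a whole object. */
bptr bptr_make(void *obj, size_t size) {
    bptr p = { (char *)obj, (char *)obj, (char *)obj + size };
    return p;
}

/* Pointer arithmetic: the value moves, the bounds are inherited unchanged. */
bptr bptr_add(bptr p, ptrdiff_t off) {
    p.ptr += off;
    return p;
}

/* Checked memset: returns its first argument, bounds and all,
   or a null bptr if the write would leave the bounds. */
bptr bptr_memset(bptr p, int c, size_t n) {
    bptr bad = { NULL, NULL, NULL };
    if (p.ptr < p.base || p.ptr > p.limit || n > (size_t)(p.limit - p.ptr))
        return bad;
    memset(p.ptr, c, n);
    return p;
}

/* The thread's fred(): Z ends up carrying the bounds that X arrived with. */
bptr fred(bptr X, size_t Y) {
    bptr Z = bptr_memset(bptr_add(X, (ptrdiff_t)Y), 1, sizeof(int));
    return Z;   /* the original passes Z on to joe() */
}
```

If X arrives with the bounds of the whole struct, the write into C succeeds and Z carries those same bounds. If X arrives with bounds narrowed to the member A, the same call is rejected, which is the false-positive case under discussion.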

The question is not how to propagate bounds from function parameters,
across computations, to other subfunction parameters.

The question is what bounds to associate with a pointer initially.
Perhaps with special pleading for the first time a pointer to an object
is passed as a parameter.

This is as it should be. After all, it is (or should be) part of the
semantics of a function how much memory it can tread on.

Nick and Wilco (and others) are correct in saying that C and C++ do not
provide mechanisms that allow the programmer to precisely specify how
much memory is modified.

For that matter, neither C nor C++ provide a mechanism that allows a
compiler to figure out precisely what can be placed in ROM, and what
must be placed in RAM. Nevertheless, we have tools that do this.

It would be nice if C and C++ made a distinction between a pointer to a
scalar of type T, and a pointer into an array of type T. Milo and I
have discussed language extensions that might do this - notations such
as T&& ptr_to_scalar_of_T, instead of T* ptr_into_array_of_T etc. I am
afraid that we do not have a perfect solution. (Actually, C++ comes
close, with T& ref_to_scalar_of_T.)

Andy "Krazy" Glew

Aug 7, 2009, 12:55:13 AM

to

Terje Mathisen wrote:
> Andy "Krazy" Glew wrote:
>> Although, as far as I know, having a pointer offset from one malloc'ed
>> object into another is not standards conforming - unless you know a
>> priori about other relationships between the malloc'ed chunks of
>> memory, e.g. unless you know that malloc'ing is being done out of a
>> common array of bytes.
>
> This is the standard pattern I use for array processing, i.e. use a
> single pointer to one of the arrays plus constant offsets from that
> array to each of the others:

If declared as "struct of array"

struct SOA {
int r[1024];
int g[1024];
int b[1024];
};

this is fine - you just use a pointer to the SOA, and offsets from the
SOA to the arrays.

Even dynamically:
malloc( sizeof(array1) + sizeof(array2) + sizeof(array3) );
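
A small sketch of the struct-of-arrays case, with a hypothetical helper `rgb_sum`. The point is that one base pointer plus constant offsets never leaves the single enclosing object, so whole-object bounds are enough:

```c
#include <stddef.h>

#define N 1024

struct SOA {
    int r[N];
    int g[N];
    int b[N];
};

/* One base pointer plus compile-time offsets reaches all three arrays,
   and every access stays inside the single struct SOA object. */
int rgb_sum(const struct SOA *s, size_t i) {
    const char *base = (const char *)s;
    const int *r = (const int *)(base + offsetof(struct SOA, r));
    const int *g = (const int *)(base + offsetof(struct SOA, g));
    const int *b = (const int *)(base + offsetof(struct SOA, b));
    return r[i] + g[i] + b[i];
}
```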

If doing it between different malloc'ed arrays

char p1[n1];
...
char p2[n2];
...
int delta12 = p2-p1;
...
// use p1 + delta12 instead of p2

Well,

a) you shouldn't do that. But, you and I have both done this, eh, Terje?

b) I don't think it is guaranteed to work according to the C standard.
Indeed, I think that delta12 may be an undefined value.

c) Indeed, if memory serves me correctly, many C compilers for the old
segmented x86 architecture would do exactly this: they would implement
pointer difference by subtracting the segment offset parts, ignoring the
segment base. (What mode was that? Not huge mode; huge mode did the
right thing.)
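
A small model of the arithmetic recalled in (c), assuming real-mode addressing where the linear address is segment*16 + offset. The `farptr` type and the function names are invented for illustration; no actual compiler's output is reproduced here.

```c
#include <stdint.h>

/* Model of a real-mode x86 far pointer: linear address = segment * 16 + offset. */
typedef struct { uint16_t seg; uint16_t off; } farptr;

uint32_t far_linear(farptr p) { return ((uint32_t)p.seg << 4) + p.off; }

/* Pointer difference as recalled above: subtract the offsets, ignore the segments. */
int32_t diff_offsets_only(farptr a, farptr b) {
    return (int32_t)a.off - (int32_t)b.off;
}

/* The "right thing": the difference of the full linear addresses. */
int32_t diff_linear(farptr a, farptr b) {
    return (int32_t)far_linear(a) - (int32_t)far_linear(b);
}
```

The two agree whenever both pointers share a segment, i.e. point into the same object, and disagree as soon as the arrays live in different segments.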

Nobody is saying HardBound and SoftBound will work everywhere. But it
works most places. The question is, what are the numbers? False
positives, false negatives, true positives...

By the way, back when I worked at Gould and was involved in Secure Unix,
the assumption was that false negatives were the bad thing. False
positives were okay, as long as they could easily be converted into true
negatives by only minor recoding.

In the commercial world, false positives are the big barrier to acceptance.

> This only depends on the ability to move a pointer to one item beyond
> the end of the allocated array, without ever accessing it.
>
> Since the same capability is a basic requirement for the entire C++
> template library, it won't go away, and any checking sw has to allow it.

Of course. HardBound and SoftBound check only at dereference, not at
pointer arithmetic.

(Actually, Jones & Kelly, IIRC, checked at pointer arithmetic, but set a
flag, and only reported an error if dereferenced. This had a nice
benefit in that it allowed detecting going more than 1 past the end or
before the beginning.)
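
A sketch of "check at dereference, not at pointer arithmetic", using a hypothetical `iptr` type in place of real HardBound or SoftBound metadata:

```c
#include <stddef.h>

/* Hypothetical checked pointer into an int array. */
typedef struct { int *ptr; int *base; int *limit; } iptr;

/* Arithmetic never traps: stepping to one-past-the-end is legal C. */
iptr iptr_step(iptr p) { p.ptr++; return p; }

/* Only a dereference is checked. Returns 1 and stores the value on success,
   0 if the access would fall outside [base, limit). */
int iptr_load(iptr p, int *out) {
    if (p.ptr < p.base || p.ptr >= p.limit) return 0;
    *out = *p.ptr;
    return 1;
}

/* A classic loop: the pointer ends one past the array but is never
   successfully dereferenced there. */
int sum_all(int *a, size_t n) {
    iptr p = { a, a, a + n };
    int s = 0, v;
    while (iptr_load(p, &v)) { s += v; p = iptr_step(p); }
    return s;
}
```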

Andy "Krazy" Glew

Aug 7, 2009, 1:01:24 AM

to Wilco Dijkstra

Wilco Dijkstra wrote:
> Why waste hardware effort on something that is essentially a management
> issue? Companies are often unwilling to pay the real cost of correct and safe
> software.

Because security is all about finding the points of maximum leverage,
and fixing those.

Instead of fixing millions of programmers
fix the far smaller number of compiler writers
by deploying a feature in the chip
that all of them execute code on.

Saying "security is a management problem" is a lot like saying "we would
not have crime, if only everyone were well behaved". Thanks, but I
prefer to live in a society that has a police force.

Andy "Krazy" Glew

Aug 7, 2009, 1:07:00 AM

to Andrew Reilly

Andrew Reilly wrote:
> You guys are talking about a technique for post-facto securing old,
> existing code, right?

Yes.

> There are plenty of nice, safe languages that are well inside the factor-
> of-two performance of C or C++ that seems to be everyone's comfort
> threshold. They'd be good for new code, right?

Yes, but... lots of these nice safe libraries call libraries written in
native or unsafe code. Which have bugs.

> [Although it doesn't seem to be common in the PC world, there are
> approximately equivalent software stacks available on some mobile phones
> where everything above the kernel is written in Java.]

By the way, what is the trend there? I used to have great hope that
cell phones would lead to safe Java taking over the world. But,
instead, the trend seems to be going the other way: C and C++ based
phone OSes like Android. I think Windows Mobile is C and C++. Isn't
iPhone? The saving grace may be that Android applications are supposed
to be Java. But, people on mailing lists complain bitterly. And, there
are already C compilers that compile to Java bytecodes, including the
classic "The whole world is a single array of int" (I don't need any
steenking type safety).

Andy "Krazy" Glew

Aug 7, 2009, 1:15:02 AM

to Robert Myers

Robert Myers wrote:

> I don't think that what Andy has proposed

Not me. Not in this news string. (Although I proposed similar stuff
earlier in comp.arch.)

Credit Milo Martin and his students such as Joe DeVietti.

I'm just a fan.

> I don't think that what Andy has proposed is necessarily a waste of
> hardware, transistors, performance, or money, but it may lead to even
> more sloppy programming. I don't worry much about syntax because it
> costs little to get the compiler to find the errors. I'm not so sure
> things would be better if programmers were counting on the hardware to
> catch programming and conceptual errors.

I share this concern.

HardBound and SoftBound would lead to more system interruptions.
Instead of a buffer overflow allowing a bad guy to break into your
system, they would take the subsystem down. But the webserver would
reboot or restart.

There might be fewer bugs overall, however, because more would be caught
in testing.

---

It's a lot like Java and JavaScript: the ability to fix bugs easily in
production has led to initial code drops often being buggier than in the
old days.

Terje Mathisen

Aug 7, 2009, 3:36:02 AM

to

Wilco Dijkstra wrote:
> are just symptoms of a much wider problem. Imagine if bridges and
> skyscrapers were built the way most companies develop software...

"... the first woodpecker to come along would destroy civilization"

Actually, this isn't _quite_ true:

We do have woodpeckers these days, in the form of virus/worm/root kit
writers, and we more or less stumble along, spending lots and lots of
money to fix things that shouldn't have needed fixing in the first place.

nm...@cam.ac.uk

Aug 7, 2009, 4:20:23 AM

to

In article <K5mdnTtfNfH-RObX...@giganews.com>,

Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
>Wilco Dijkstra wrote:
>> are just symptoms of a much wider problem. Imagine if bridges and
>> skyscrapers were built the way most companies develop software...
>
>"... the first woodpecker to come along would destroy civilization"
>
>Actually, this isn't _quite_ true:
>
>We do have woodpeckers these days, in the form of virus/worm/root kit
>writers, and we more or less stumble along, spending lots and lots of
>money to fix things that shouldn't have needed fixing in the first place.

What is potentially more serious is an attack of woodworm. When a
woodpecker causes one place to collapse, the parts holding that up
then fail under the strain, causing the parts that support them to
follow suit, ....

And we have that, too - but it's not yet caused catastrophic collapse.

Regards,
Nick Maclaren.

nm...@cam.ac.uk

Aug 7, 2009, 4:25:20 AM

to

In article <AXKem.183591$Tk2.1...@newsfe21.ams2>,

Wilco Dijkstra <Wilco.remove...@ntlworld.com> wrote:
>
>"Andrew Reilly" <andrew-...@areilly.bpc-users.org> wrote in message news:7e17n4F...@mid.individual.net...
>

>> There are plenty of nice, safe languages that are well inside the factor-
>> of-two performance of C or C++ that seems to be everyone's comfort
>> threshold. They'd be good for new code, right?
>
>There are plenty of "safe" languages but they do not make software any
>safer just like airbags and seatbelts do not make cars any safer. What they
>do is reduce the damage in a crash - but the crash still happens. I'm not sure
>about you, but personally I'd rather avoid the crash in the first place.

A more serious problem is that a lot of "safe" languages "eliminate"
errors by defining their effects. While that does eliminate others,
it turns most of the detectable errors into undetectable logic ones.

Java does that with all numeric errors except integer division by
zero.

Garbage collectors do that with scoping errors, leading to memory
leaks and cases where you are accessing the wrong version of an
object.

And so on.

Regards,
Nick Maclaren.

nm...@cam.ac.uk

Aug 7, 2009, 4:39:56 AM

to

In article <7e19p8F...@mid.individual.net>,

Andrew Reilly <andrew-...@areilly.bpc-users.org> wrote:
>
>> typedef struct {int A[10]; double B[5]; int C[10];} P; P *Q =
>> malloc(sizeof(P));
>> memset(Q->A,0,sizeof(P));
>
>Aside from being a deliberate bug/obfuscation, what's wrong with
>memset(Q, 0, sizeof *Q);
>in this context?

Nothing. But it's irrelevant.

>I'm no expert on X. Indeed I try to stay away from GUIs altogether. My
>understanding is that X is defined by a wire protocol, so I don't see how
>this statement can be true in that case. ...

Sorry, but you have been confused by some of the nonsense on the
Web of a Million Lies.

The raw X Protocol isn't that simple, anyway, but let that pass.
What you have missed is that it comes with a huge amount of mandatory,
fiendishly complicated usage conventions, that make it effectively
impossible to program directly (and interoperate with any existing
applications). Those are added by the API library (Xlib), the X
Toolkit, the widget sets and the window managers; that programming
convention is central to several of those.

> Xlib seems to be in the process of being replaced by Xcb. I
>don't know whether or not that has similar issues.

Don't bet on it, and I don't know. After I spent 7 months in the
cess pit that is X programming, I swore never again.

>I think that your main point is valid, though: these things are clearly
>possible to be expressed in the language, even though it is clearly
>stupid (or at least inadvisable) to do so. What can be expressed will
>be, so checking based on a "nice" subset of expressions is not likely to
>be useful, overall.

Precisely. I will give some well-known evidence to Andy Glew, but
with no great hope of effect.

>The point that I'm making is that I doubt very much that this sort of
>construction is in any way essential or necessary to solving the problem
>that was the code's original purpose. It's just the whim of a code-
>hacker who was able to do it because C let him. ...

Close, very close, but at the wrong level. The intent of the DESIGN
was to make a clean, 'object-oriented' interface. So far, so good.
Thereafter, things developed precisely as you said.

Regards,
Nick Maclaren.

nm...@cam.ac.uk

Aug 7, 2009, 5:12:45 AM

to

In article <4A7BAF51...@patten-glew.net>,

Andy \"Krazy\" Glew <"ag-news AT patten DASH my-last-name DOT net"> wrote:
>
>> Sigh. I will try to explain to you one last time what the problem
>> area is. Yes, OF COURSE, there are cases where checking is possible;
>> but being able to check only the cases that are trivially checkable
>> is not an interesting property, either theoretically or practically.
>
>Sigh.
>
>I will explain to you one last time that the problem is not whether a
>compiler can generate checks for all buffer overflows.

That is the first time, as well. At least you have now admitted
that the technique doesn't do a proper job of checking even a single
category of errors.

>The question is what fraction of buffer overflows can be detected.
>
>What is the fraction of true positives (real bugs, that can be turned
>into security holes) that are detected?

As those of us who have experience in this area know very well, those
are NOT the right questions. There is a LOT of mathematical theory
and practical experience, and they confirm each other's conclusions.

Firstly, the question should be what fraction of the expected cost
(i.e. the probability of the problem occurring multiplied by the
cost if it does) is eliminated. Standard cost-benefit analysis, backed
up by game theory. Mere frequency counting is a politician's approach.
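
The difference between the two measures can be sketched in a few lines of C. The function names are hypothetical and any inputs are the caller's own estimates; nothing here comes from measured data.

```c
#include <stddef.h>

/* Expected cost of a set of bug classes: sum of probability times cost. */
double expected_cost(const double *prob, const double *cost, size_t n) {
    double total = 0.0;
    for (size_t i = 0; i < n; i++) total += prob[i] * cost[i];
    return total;
}

/* Fraction of the expected cost that a checker removes, given which
   classes it catches (caught[i] is 1 or 0). */
double fraction_removed(const double *prob, const double *cost,
                        const int *caught, size_t n) {
    double total = expected_cost(prob, cost, n), removed = 0.0;
    for (size_t i = 0; i < n; i++)
        if (caught[i]) removed += prob[i] * cost[i];
    return total > 0.0 ? removed / total : 0.0;
}
```

With two equally likely bug classes where only the cheaper one is caught, a frequency count says the checker finds half the bugs, while `fraction_removed` can report far less than half.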

Secondly, there is a lot of experience that such partial checkers
are often (even usually) counter-productive. Programmers, auditors
and managers start to trust them and stop looking for bugs (or
blame the innocent) when the checker comes up clean.

Thirdly, there is 35+ years of experience with such techniques,
mostly in Fortran. Given what you now admit, SoftBound/HardBound
do roughly the equivalent of the 'common' form of Fortran bounds
checking (i.e. against the declared sizes). Since you seem to be
unaware of this experience, let me explain.

It is tricky to pass array bound information for assumed size (or
even explicit size) arrays in Fortran 77, and most compilers don't.
So most debugging compilers' bounds checks compare only against
the locally declared dimensions - i.e. there is effectively no
checking across procedure calls.

Experience is that this picks up most of the errors made by people
just learning to program, but very few of those made by people with
even a few months' experience. And, worse, it picks up primarily
the errors that are easier to find by hand! So few people bother
to enable it, even if they have it.

This is comparable with the frequent claim that syntax checking
editors increase productivity by 3 times - who spends almost all
of their time writing incorrect syntax? - kiddies learning to
program and senior executives playing at programming, that's who.

NAG does better, as did/(does?) Fujitsu Fortran, WATFOR/WATFIV and
some others. All experience is that the better tools found only
a few more bounds errors by count, but saved between ten and a
hundred times more debugging time than the other ones. Yes, THAT
much.

>What is the fraction of false positives (code that is not buggy, but
>that is incorrectly indicated as having a bug)?

No, it isn't. Again, you are using a political and not a technical
measure. It is the probability of such positives times the cost of
bypassing them or ignoring the problem.

For example, if you get ONE such problem that causes failure and is
infeasible to bypass, it renders the whole mechanism infeasible for
any program that uses the offending technique.

Similarly, if you just produce warnings and have ONE problem that
produces thousands of false positives, each of which has to be
analysed for whether it is genuine, few people will waste time using
the checker.

[ And please don't waste our time saying that such things can be
eliminated by a script - if they could be, SoftBound could do it. ]

Look, Andy, I am prepared to accept that you know a hundred times
as much about hardware design as I do. In this area, the ratio is
likely to be reversed - I have spent most of my working life in
and around it.

Regards,
Nick Maclaren.

Bernd Paysan

Aug 7, 2009, 5:17:07 AM

to

Andrew Reilly wrote:
> Back up a bit, though: why are the buffers being overrun? Because a
> conceptual error at the algorithm level caused that particular constraint
> to not be taken into consideration.

It's also a programming language issue. C simply is sloppy about buffers -
a buffer is a pointer, with unspecified size. That's how K&R designed the
language, that's how the library functions use it. People tried to
deprecate the worst offenders, but people learn the C programming style,
because the teaching material either comes from K&R or has been strongly
influenced by them. And since C is popular, C-style bugs are widespread.
Also, changing the language doesn't help much, because there's so much
legacy code around.

One reason why you want single pointers in C, and not addr/len buffers is
that passing multiple return values is so much more difficult. Compare that
to Forth: This language has a similar close-to-the-metal semantics as C, but
buffer overruns are rare. Forth uses addr/len for buffers quite
consistently, so people follow that style. And passing multiple return
values in Forth is dead easy. It's also much easier to extend the language
to new data types, so if you like to have a string data type which keeps all
the details like growing and shrinking the necessary buffer hidden from the
user, it's easy. It's not that complicated in C++, but in C, it is a pain.
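
For comparison, a sketch of what the addr/len convention looks like when carried into C. The `span` type and `span_copy` are hypothetical; returning a struct is the usual C workaround for the missing multiple return values:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical C rendering of Forth's addr/len pair. */
typedef struct { char *addr; size_t len; } span;

/* Copy as much of src as fits in dst; returns the part of dst actually written.
   Returning a struct hands both values back together. */
span span_copy(span dst, span src) {
    size_t n = src.len < dst.len ? src.len : dst.len;
    memcpy(dst.addr, src.addr, n);
    span written = { dst.addr, n };
    return written;
}
```

Because the length travels with the address, the callee can truncate instead of overrunning, which a bare `char *` interface such as `strcpy` cannot do.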

A number of approaches to "solve" the problem have been made, like Java, C#,
D, but they all are new languages, they don't help the legacy problem. And
often, they just go too far: They change the semantics of the language to be
not close-to-the-metal. Yes, this helps to eliminate bugs, but it also
means that you can't do certain things. If the language prevents you from
doing things you want to, you'll use another language, like C.

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/

Wilco Dijkstra

Aug 7, 2009, 5:38:08 AM

to

"Andy "Krazy" Glew" <ag-...@patten-glew.net> wrote in message news:4A7BB524...@patten-glew.net...

> Wilco Dijkstra wrote:
>> Why waste hardware effort on something that is essentially a management
>> issue? Companies are often unwilling to pay the real cost of correct and safe
>> software.
>
> Because security is all about finding the points of maximum leverage, and fixing those.
>
> Instead of fixing millions of programmers
> fix the far smaller number of compiler writers
> by deploying a feature in the chip
> that all of them execute code on.
>
> Saying "security is a management problem" is a lot like saying "we would not have crime, if only everyone were well
> behaved". Thanks, but I prefer to live in a society that has a police force.

The compiler is indeed the right place to improve things. But C/C++ compilers
are already overly complex beasts, and codegeneration bugs are not at all rare.
Adding additional complex stuff on top of it cannot be good. What is the chance
we get consensus on the exact semantics and behaviour of the checks? If each
compiler does it differently then compilers with the least effective checks will
become mainstream.

Why? The reason is that users simply do not want to know about errors. Some
people compile without optimizations "because the optimizer is buggy" rather
than admitting that their code may be incorrect. Compilers with the least strict
adherence to the C/C++ standards are more successful simply because they
annoy people less with compile time errors. Unbelievable but true. With this
kind of culture, what are the chances the bounds checks even get turned on?

It's a bit like having a police force but telling them not to arrest anyone.

Wilco

Wilco Dijkstra

Aug 7, 2009, 6:29:25 AM8/7/09

to

"Andy "Krazy" Glew" <ag-...@patten-glew.net> wrote in message news:4A7BA29F...@patten-glew.net...

> Wilco Dijkstra wrote:
>> <nm...@cam.ac.uk> wrote in message news:h5eeje$88n$1...@soup.linux.pwf.cam.ac.uk...
>>> In article <3Lyem.13530$wz2....@newsfe08.ams2>,
>>> Wilco Dijkstra <Wilco.remove...@ntlworld.com> wrote:
>>>> Function pointers don't work properly - you will need to recompile everything,
>>>> including shared libraries or annotate functions by hand as not checkable.
>
> I was having trouble understanding why you would say this, when I realized that you were talking about SoftBound.
> Yes, one of the

Yes, I was mostly talking about SoftBound here.

> weaknesses of SoftBound is that it requires creating a second version of any function, in order to pass the bounds.
> However, this does not need to be exposed to the user: the compiler can create two versions automatically, and/or the
> linker can instantiate the other version - e.g. a checked stub that throws away all of the bounds and then calls the
> unchecked version - using exactly the same technology as is used, for example, to instantiate template library
> functions in C++.

So how do you handle the case of taking the address of the checked
version in one file and calling that function pointer from a non-checked file?
At the point of call you don't know that it is the checked version with a
different interface.

> Nevertheless, I 100% agree: this is suboptimal.
>
> HardBound does not have this problem. In HardBound there is only one type of function. A function compiled with the
> HardBound instructions can call, and be called by, non-HardBound code with no problem.
>
> Most of the rest of the post is therefore irrelevant. Good points, if the assumptions they were based on were
> correct. But the assumptions are incorrect.

Sure HardBound works better in mixed mode where some of the code
isn't recompiled.

>> But indeed, those add up to insurmountable problems. Just imagine a
>> virtual function with a pointer argument, what should we put in the vtable,
>> the original interface or the checked interface? What if it gets called from
>> a function member pointer, dynamic library - which version do we call, how
>> do we handle transitions between checked and unchecked functions, etc.
>
> Not a problem for HardBound: it does not have two different types of functions.
>
> Not so bad a problem for SoftBound, since C++ compilers have already introduced the technology to automatically
> generate the missing checked and/or unchecked version, the one calling the other.

It's not the generation that is the issue, it is calling the correct version. If you
can't understand there is a major problem here then you need to actually try
working out the scenario I described at the start. If you find a solution, let me
know and I'll explain why it doesn't work. :-)
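
To make the scenario concrete, here is a sketch of the two interfaces under discussion. Everything in it is hypothetical: the function names, the extra `base`/`bound` parameters and the error return are assumptions for illustration, not the transformation SoftBound actually performs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "checked" interface: the original arguments plus bounds for p. */
size_t count_nonzero_chk(const int *p, const int *base, const int *bound, size_t n)
{
    assert(base != NULL && bound != NULL);
    /* [p, p+n) must lie inside [base, bound). */
    if (p < base || p > bound || n > (size_t)(bound - p))
        return (size_t)-1;            /* report a bounds violation */
    size_t c = 0;
    for (size_t i = 0; i < n; i++)
        c += (p[i] != 0);
    return c;
}

/* Stub with the original interface. It has no real bounds to pass on,
   so it can only make them up from the arguments it was given. */
size_t count_nonzero(const int *p, size_t n)
{
    return count_nonzero_chk(p, p, p + n, n);
}

/* Code in an unchecked file only knows this pointer type. */
typedef size_t (*count_fn)(const int *, size_t);
```

A `count_fn` can hold the address of the stub but not of `count_nonzero_chk`, whose type differs. The sketch shows the two interfaces only; it does not settle which address a function pointer should carry when checked and unchecked files are mixed.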

>> But the worst thing is, another part of our company actually bought it! This stuff
>> is sold on baseless fear - if you care about software quality you have got to
>> hire the best software engineers and encourage best practices even if that
>> means your software will take longer to develop and cost more. There are no
>> quick fixes. This applies equally to Java as to C++.
>>
>> Wilco
>
> What company do you work for, Wilco?

I run my own compiler design / software optimization consulting business.
The example I gave was with a previous company. The bounds checking stuff
was inspired by a drive to ISO9000 certification.

> The hope that good programmers can solve this problem is vain. Look at all of the buffer overflows in so-called
> "secure" OSes like OpenBSD, recently posted to this group.
>
> Yes: hire the best programmers possible. Take the time necessary. But your code almost undoubtedly still has bugs, if
> you program in C or C++.

"Safe" languages are no different, they don't magically remove all the bugs.
You still have buffer overflows - bounds checking just makes it harder for malicious
software to inject its payload (but causing a program to crash is good enough for
a DOS attack).

Wilco

Terje Mathisen

Aug 7, 2009, 6:35:01 AM8/7/09

to

Andy "Krazy" Glew wrote:

> Terje Mathisen wrote:
> If doing it between different malloc'ed arrays
>
> char p1[n1];
> ...
> char p2[n2];
> ...
> int delta12 = p2-p1;
> ...
> // use p1 + delta12 instead of p2
>
> Well,
>
> a) you shouldn't do that. But, you and I have both done this, eh, Terje?

Only in asm, at which point portability (and sw checking) is out of the
question.

>
> b) I don't think it is guaranteed to work according to the C standard.
> Indeed, I think that delta12 may be an undefined value.
>
> c) Indeed, if memory serves me correctly, many C compilers for the old
> segmented x86 architecture would do exactly this: they would implement
> pointer difference by subtracting the segment offset parts, ignoring the
> segment base. (What mode was that? Not huge mode; huge mode did the
> right thing.)

Let's see...

We had

Tiny (64K total for code & data)
Small (64K code + 64K data)
Medium (64K code + 1M data using segment+offset)
Large (1M code, 1M data, both segm:offs)
Huge (1M code segm:offs,
1M data using normalized segment + 0-15 byte offset)

"huge model did the right thing"???

No way! It encoded the 4-bit segment shift directly in all operations,
making it totally impossible to run under any kind of real segmentation
model.
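
For readers who never met these models, the arithmetic behind a normalized huge pointer can be sketched as follows. This models only real-mode address formation (segment * 16 + offset); the type and function names are invented for the example, and the 20-bit mask is an assumption that mimics the 8086's 1 MB wraparound.

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode far pointer: linear address = segment * 16 + offset. */
typedef struct {
    uint16_t seg;
    uint16_t off;
} far_ptr;

uint32_t far_linear(far_ptr p)
{
    /* Mask to 20 bits to model the 8086's 1 MB address wraparound. */
    return (((uint32_t)p.seg << 4) + p.off) & 0xFFFFFu;
}

/* Normalized ("huge") form: the offset is reduced to 0..15. */
far_ptr far_normalize(far_ptr p)
{
    uint32_t lin = far_linear(p);
    far_ptr n = { (uint16_t)(lin >> 4), (uint16_t)(lin & 0xFu) };
    assert(far_linear(n) == lin);     /* same linear address, canonical form */
    return n;
}
```

The shift by 4 is baked into every such operation, which is why this scheme only makes sense when a segment value is a paragraph address and not a selector.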

>
> Nobody is saying HardBound and SoftBound will work everywhere. But it
> works most places. The question is, what are the numbers? False
> positives, false negatives, true positives...
>
>
> By the way, back when I worked at Gould and was involved in Secure Unix,
> the assumption was that false negatives were the bad thing. False
> positives were okay, as long as they could easily be converted into true
> negatives by only minor recoding.
>
> In the commercial world, false positives are the big barrier to acceptance.
>

False positives are like the compiler warning messages under -w3 or -w4:
Lots of code has been developed using rules like "no warning messages!",
leading to huge amounts of hand-coded casts which effectively hide some
of the underlying bugs.
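
A small made-up example of such a cast (the function name is hypothetical, and the result assumes 8-bit chars): the cast makes the narrowing-conversion warning go away, but the truncation it warned about is still there.

```c
#include <assert.h>
#include <string.h>

/* Without the cast, many compilers warn about the narrowing conversion
   at high warning levels. The cast silences the warning; the value is
   still reduced modulo 256 for strings of 256 bytes or more. */
unsigned char short_len(const char *s)
{
    assert(s != NULL);
    return (unsigned char)strlen(s);
}
```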

nm...@cam.ac.uk

Aug 7, 2009, 6:42:42 AM8/7/09

to

In article <0CSem.234715$bA.4...@newsfe16.ams2>,
In article <11079249....@elfi.zetex.de>,

Bernd Paysan <bernd....@gmx.de> wrote:
>Andrew Reilly wrote:
>> Back up a bit, though: why are the buffers being overrun? Because a
>> conceptual error at the algorithm level caused that particular constraint
>> to not be taken into consideration.
>
>It's also a programming language issue. C simply is sloppy about buffers -
>a buffer is a pointer, with unspecified size. That's how K&R designed the
>language, that's how the library functions use it. People tried to

>deprecate the worst offenders, ...

Don't blame K&R C - it is actually cleaner than ISO C! The biggest
problem is that ISO C adds some constraints that appear to restrict
pointers, and hence deceive the naive into believing that such things
can be checked, but many other sections of the standard rely on (and
specify!) the absence of such checks.

If anyone wants to learn more about this mess, my Objects document
describes the situation with many references to the C and C++ standards.
For the practical issues, all you have to do is to look at real code,
and try making the changes Andy suggests on some real application
source - been there, done that, gave it up as a hopeless task :-(

>One reason why you want single pointers in C, and not addr/len buffers is

>that passing multiple return values is so much more difficult. ...

Not really. Structure valued results have been there since C89, and
Fortran solved the problem long before for COMPLEX results on
machines with accumulators. It's a semantic problem, not really an
implementation one.
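
A minimal sketch of a structure-valued result carrying an addr/len pair (the names `slice_t` and `first_word` are invented for the example):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Two results, address and length, returned as one structure value. */
typedef struct {
    const char *addr;
    size_t      len;
} slice_t;

/* Return the first space-delimited word of s as an addr/len pair. */
slice_t first_word(const char *s)
{
    slice_t w;
    assert(s != NULL);
    s += strspn(s, " ");          /* skip leading spaces */
    w.addr = s;
    w.len  = strcspn(s, " ");     /* length up to the next space or the end */
    return w;
}
```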

Wilco Dijkstra <Wilco.remove...@ntlworld.com> wrote:
>
>The compiler is indeed the right place to improve things. But C/C++ compilers
>are already overly complex beasts, and codegeneration bugs are not at all rare.

God help us, yes :-( When the C bandwagon started rolling, it was
claimed to be a small language, unlike the monster that was Fortran
(and that was Fortran 77!) Unfortunately, that was never true; it
was a small COMPILER, and the language was defined by the effects of
the compiler, library and operating system. By the time that
SC22WG14 had specified the language, even as far as they did, it was
a LARGE language - mainly because of the interactions between its
non-orthogonal features.

On most of the systems I used with separate compilers, the C89 compiler
was larger than the Fortran 90/95 one. Nowadays, it isn't feasible to
do the comparison, as C is normally rolled in with C++.

>Adding additional complex stuff on top of it cannot be good. What is the chance
>we get consensus on the exact semantics and behaviour of the checks? If each
>compiler does it differently then compilers with the least effective checks will
>become mainstream.

As I stated explicitly, nil. If there's no consensus in SC22WG14,
don't expect it more widely. And, since 1990, I have observed that
the C compilers that did impose some of the extra constraints have
removed them or downgraded them to warnings.

Regards,
Nick Maclaren.

Andrew Reilly

Aug 7, 2009, 10:07:04 AM8/7/09

to

On Fri, 07 Aug 2009 11:29:25 +0100, Wilco Dijkstra wrote:

> So how do you handle the case of taking the address of the checked
> version in one file and calling that function pointer from a non-checked
> file? At the point of call you don't know that it is the checked version
> with a different interface.

Isn't this the same sort of problem as the FFI shim in all non-C
languages? Treat SoftBound/C as non-C with callouts to regular C through
an FFI and it should all work, more-or-less. (C++ could be trickier:
most FFIs that I know of work at C level.)

Speaking of similar things, I've read (on usenet) of another safe-C
system that compiles C code to a strict model, with wide (bounded)
pointers. Can't remember the name at the moment. Same sort of issue,
though.

Cheers,

--
Andrew

Andrew Reilly

Aug 7, 2009, 10:31:58 AM8/7/09

to

On Thu, 06 Aug 2009 22:07:00 -0700, Andy \"Krazy\" Glew wrote:

> Andrew Reilly wrote:
>> You guys are talking about a technique for post-facto securing old,
>> existing code, right?
>
> Yes.
>
>> There are plenty of nice, safe languages that are well inside the
>> factor- of-two performance of C or C++ that seems to be everyone's
>> comfort threshold. They'd be good for new code, right?
>
> Yes, but... lots of these nice safe libraries call libraries written in
> native or unsafe code. Which have bugs.

That's going to be a given, for a while, isn't it? So it's still a win
if "new code" is self consistent and safe, right? Unsafe libraries that
are popular will become safe over time, or become unpopular.

>> [Although it doesn't seem to be common in the PC world, there are
>> approximately equivalent software stacks available on some mobile
>> phones where everything above the kernel is written in Java.]
>
> By the way, what is the trend there?

Nothing coherent, as far as I can tell.

> I used to have great hope that
> cell phones would lead to safe Java taking over the world. But,
> instead, the trend seems to be going the other way: C and C++ based
> phone OSes like Android.

Android is one of the more Java-centric platforms, I believe. Haven't
checked in detail, though. (Of course, the Linux underpinnings are pure
C.)

> I think Windows Mobile is C and C++.

Yes. Has .NET, but that seems to get as much use as it does on PC
variety Windows.

> Isn't
> iPhone?

That's OS-X (Mach, BSD and Quartz, all C; with Objective C on top). At
least a fair chunk of the applications seem to be in Javascript, which
isn't completely awful.

> The saving grace may be that Android applications are supposed
> to be Java. But, people on mailing lists complain bitterly.

I haven't followed the lists. Android does "not-quite-Java", in that
they replace some of the Java-standard libraries with their own, which is
why they have to do a peculiar not-quite-JVM installation dance, to avoid
legal issues. So it wouldn't surprise me.

> And, there
> are already C compilers that compile to Java bytecodes, including the
> classic "The whole world is a single array of int" (I don't need any
> steenking type safety).

Well, even Sun's JVM compiles bytecodes to native code. GCJ (a GCC front
end) compiles bytecodes to native, but the runtime has to include an
interpreter to cope with code arriving at "run-time".

There's nothing particularly magical about compiling to native or to C.
There are Common Lisp, Scheme and Eiffel (and probably other safe
languages) compilers that compile through C, and that doesn't make them
necessarily less safe (apart from when they call out to non-safe
libraries, but everyone has that problem, and most have "turn off safety"
compile time options.)

Cheers,

--
Andrew

nm...@cam.ac.uk

Aug 7, 2009, 10:35:32 AM8/7/09

to

In article <7e2qo8F...@mid.individual.net>,

Andrew Reilly <andrew-...@areilly.bpc-users.org> wrote:
>On Fri, 07 Aug 2009 11:29:25 +0100, Wilco Dijkstra wrote:
>
>> So how do you handle the case of taking the address of the checked
>> version in one file and calling that function pointer from a non-checked
>> file? At the point of call you don't know that it is the checked version
>> with a different interface.
>
>Isn't this the same sort of problem as the FFI shim in all non-C
>languages? Treat SoftBound/C as non-C with callouts to regular C through
>an FFI and it should all work, more-or-less. (C++ could be trickier:
>most FFIs that I know of work at C level.)

FFI? Foreign Function Interface? People never seem to weary of
inventing TLAs :-(

Yes, it's the same problem, and it doesn't all work, not even
more-or-less. It's fairly easy provided that you never pass a
pointer back, but that immediately leads to you having completely
uncheckable pointers in your checked code.

>Speaking of similar things, I've read (on usenet) of another safe-C
>system that compiles C code to a strict model, with wide (bounded)
>pointers. Can't remember the name at the moment. Same sort of issue,
>though.

Yup. Two or three have been reinvented, hailed as the solution to
all C's unreliability problems, and have sunk like stones. They
all work excellently on simple demonstration programs, and leave
all of the hard problems 'for development' - i.e. they never get
beyond the CS project stage.

Regards,
Nick Maclaren.

Andrew Reilly

Aug 7, 2009, 10:59:24 AM8/7/09

to

On Fri, 07 Aug 2009 15:35:32 +0100, nmm1 wrote:
> FFI? Foreign Function Interface? People never seem to weary of
> inventing TLAs :-(

Yes. Seemed to be fairly pervasive, so I didn't bother to define. Sorry
about that.

> Yes, it's the same problem, and it doesn't all work, not even
> more-or-less. It's fairly easy provided that you never pass a pointer
> back, but that immediately leads to you having completely uncheckable
> pointers in your checked code.
>
>>Speaking of similar things, I've read (on usenet) of another safe-C
>>system that compiles C code to a strict model, with wide (bounded)
>>pointers. Can't remember the name at the moment. Same sort of issue,
>>though.
>
> Yup. Two or three have been reinvented, hailed as the solution to all
> C's unreliability problems, and have sunk like stones. They all work
> excellently on simple demonstration programs, and leave all of the hard
> problems 'for development' - i.e. they never get beyond the CS project
> stage.

It just struck me that there *is* a C compiler for the AS-400 that must
have to deal with exactly these sorts of issues. Presumably it must
compile some fairly useful codebases, or else they wouldn't bother to
maintain it. I wonder what the stories from the trenches are, about what
works and what is impossible to make work?

[AS-400 is essentially HardBound by design, and is in fairly wide use, if
perhaps not within the fields that most of us here inhabit. So it is
something of an existence proof, perhaps.]

Cheers,

--
Andrew

Robert Swindells

Aug 7, 2009, 11:18:10 AM8/7/09

to

Andrew Reilly wrote:
>On Fri, 07 Aug 2009 15:35:32 +0100, nmm1 wrote:
>> Yes, it's the same problem, and it doesn't all work, not even
>> more-or-less. It's fairly easy provided that you never pass a pointer
>> back, but that immediately leads to you having completely uncheckable
>> pointers in your checked code.
>>
>>>Speaking of similar things, I've read (on usenet) of another safe-C
>>>system that compiles C code to a strict model, with wide (bounded)
>>>pointers. Can't remember the name at the moment. Same sort of issue,
>>>though.
>>
>> Yup. Two or three have been reinvented, hailed as the solution to all
>> C's unreliability problems, and have sunk like stones. They all work
>> excellently on simple demonstration programs, and leave all of the hard
>> problems 'for development' - i.e. they never get beyond the CS project
>> stage.
>

>It just struck me that there *is* a C compiler for the AS-400 that must
>have to deal with exactly these sorts of issues. Presumably it must
>compile some fairly useful codebases, or else they wouldn't bother to
>maintain it. I wonder what the stories from the trenches are, about what
>works and what is impossible to make work?

There was also the Symbolics C compiler, it apparently could compile
the MIT X server to run on a Lisp Machine, I don't know whether this
was X11 or just X10R4 though.

I guess it could have used locatives (effectively pointers) for the
nasty stuff.

Robert Swindells

nm...@cam.ac.uk

Aug 7, 2009, 11:30:40 AM8/7/09

to

In article <7e2tqbF...@mid.individual.net>,

Andrew Reilly <andrew-...@areilly.bpc-users.org> wrote:
>
>It just struck me that there *is* a C compiler for the AS-400 that must
>have to deal with exactly these sorts of issues. Presumably it must
>compile some fairly useful codebases, or else they wouldn't bother to
>maintain it. I wonder what the stories from the trenches are, about what
>works and what is impossible to make work?

Not according to my IBM sources, who should know, there isn't! Yes,
there is some sort of compiler, but the originally planned full
interoperability was abandoned as a silly idea.

And, if you want stories from the trenches, I was peripherally
involved with that project at one stage - indeed, I was the person
who first pointed out to the relevant project that they HAD to
include C in it!

Regards,
Nick Maclaren.

Robert Myers

Aug 7, 2009, 1:39:05 PM8/7/09

to

On Aug 6, 8:54 pm, "Wilco Dijkstra"
<Wilco.removethisDijks...@ntlworld.com> wrote:

> My point was that if we continue with the current "standard" of software
> development things won't get any better soon. Bandaids like SoftBound
> are just symptoms of a much wider problem. Imagine if bridges and
> skyscrapers were built the way most companies develop software...

Mistakes *do* happen, sometimes spectacular ones:

http://www.duke.edu/~hpgavin/ce131/citicorp1.htm

<quote>

He later detailed these mistakes in a thirty-page document called
"Project SERENE''; the acronym, both rueful and apt, stands for
"Special Engineering Review of Events Nobody Envisioned." What emerges
from this document, which has been confidential until now, and from
interviews with LeMessurier and other principals in the events, is not
malfeasance, or even negligence, but a series of miscalculations that
flowed from a specific mind-set. In the case of the Citicorp tower,
the first event that nobody envisioned had taken place when
LeMessurier sketched, on a restaurant napkin, a bracing system with an
inherent sensitivity to quartering winds. None of his associates
identified this as a problem, let alone understood that they were
compounding it with their fuzzy semantics. In the stiff, angular
language of "Project SERENE," "consideration of wind from non-
perpendicular directions on ordinary rectangular buildings is
generally not discussed in the literature or in the classroom."

</quote>

This ultimately led to tearing off drywall in the finished building
and welding previously-bolted joints. The consultant was in
Cambridge, Massachusetts, but, I must emphasize, teaching at
*Harvard*, not the lesser institution.

One might draw all kinds of analogies to the sorts of things that go
wrong with computers, with one crucial difference. No one
contemplated allowing the structure to stand as it was built.

Pointers in C are the inadequately strong bolted joints of software
engineering, but the dangerous structure still stands.

Robert.

Jean-Marc Bourguet

Aug 7, 2009, 1:00:01 PM8/7/09

to

nm...@cam.ac.uk writes:

> In article <0CSem.234715$bA.4...@newsfe16.ams2>,

> If anyone wants to learn more about this mess, my Objects document
> describes the situation with many references to the C and C++ standards.

Is it somewhere publicly accessible?

Yours,

--
Jean-Marc

nm...@cam.ac.uk

Aug 7, 2009, 2:02:04 PM8/7/09

to

In article <87zlab8...@news.bourguet.org>,

Jean-Marc Bourguet <j...@bourguet.org> wrote:
>
>> If anyone wants to learn more about this mess, my Objects document
>> describes the situation with many references to the C and C++ standards.
>
>Is it somewhere publicly accessible?

It's fairly widely circulated, but no. I will send you a copy.

Regards,
Nick Maclaren.

Andy "Krazy" Glew

Aug 7, 2009, 9:45:05 PM8/7/09

to Wilco Dijkstra

> In article <7e2qo8F...@mid.individual.net>,

> Andrew Reilly <andrew-...@areilly.bpc-users.org> wrote:
>> On Fri, 07 Aug 2009 11:29:25 +0100, Wilco Dijkstra wrote:

>> Speaking of similar things, I've read (on usenet) of another safe-C
>> system that compiles C code to a strict model, with wide (bounded)
>> pointers. Can't remember the name at the moment. Same sort of issue,
>> though.

Fat pointers.

Works if you are starting from scratch. The IBM AS-400 is a case in point.

Doesn't work at all well if you need compatibility with existing code
that assumes that pointers are "thin" - only 32 or 64 bits.

This is why HardBound and SoftBound are not fat pointers. Instead,
they place the bounds metadata elsewhere - someplace not adjacent to the
pointer itself.
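
A toy sketch of the idea, with the loud caveat that this is not how either system is implemented: the table layout, its fixed size, and the names `meta_set` and `meta_check` are all assumptions for illustration. The point is only that the pointer stays an ordinary thin pointer, while base and bound live in a side table keyed by where the pointer is stored.

```c
#include <assert.h>
#include <stddef.h>

/* Side-table entry: bounds kept apart from the (thin) pointer itself. */
typedef struct {
    const void *slot;    /* address of the pointer variable */
    const char *base;    /* first valid byte */
    const char *bound;   /* one past the last valid byte */
} meta_t;

#define META_MAX 16
static meta_t meta_table[META_MAX];
static size_t meta_count;

/* Record bounds for the pointer stored at slot. Returns 0, or -1 if full. */
int meta_set(const void *slot, const void *base, size_t size)
{
    size_t i;
    assert(slot != NULL && base != NULL);
    for (i = 0; i < meta_count; i++)
        if (meta_table[i].slot == slot)
            break;
    if (i == meta_count) {
        if (meta_count == META_MAX)
            return -1;
        meta_count++;
    }
    meta_table[i].slot  = slot;
    meta_table[i].base  = (const char *)base;
    meta_table[i].bound = (const char *)base + size;
    return 0;
}

/* Return 1 if [p, p+n) lies within the bounds recorded for slot, else 0.
   A slot with no metadata fails the check in this sketch. */
int meta_check(const void *slot, const void *p, size_t n)
{
    const char *cp = (const char *)p;
    size_t i;
    for (i = 0; i < meta_count; i++)
        if (meta_table[i].slot == slot)
            return cp >= meta_table[i].base &&
                   cp <= meta_table[i].bound &&
                   n <= (size_t)(meta_table[i].bound - cp);
    return 0;
}
```

Because `sizeof(char *)` is unchanged, existing structure layouts and ABIs that assume thin pointers are not disturbed; only the side table grows.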

Andy "Krazy" Glew

Aug 7, 2009, 9:59:34 PM8/7/09

to Andrew Reilly

Andrew Reilly wrote:

> On Thu, 06 Aug 2009 17:03:53 +0100, nmm1 wrote:
>
>> typedef struct {int A[10]; double B[5]; int C[10];} P;
>> P *Q = malloc(sizeof(P));
>> memset(Q->A,0,sizeof(P));
>>

>> etc.

>>
>> You might like to say that is clearly illegal, or does not occur in real
>> code, but you would be wrong on both counts. In particular, that
>> construction is a CRITICAL part of the design of the INTERFACE of the X
>> Windowing System (and, if I recall, the Microsoft one, too). So it
>> CANNOT be changed without rewriting all code that uses them. It is also
>> used in a fair number of other codes.
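
The pattern in question can be restated as a small sketch. It compares how the `memset` above would fare under two assumed checking policies: bounds narrowed to the member `A`, and bounds covering the whole allocation. Which policy a given checker applies is an assumption here, not a statement about HardBound or SoftBound, and the helper names are invented.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int A[10]; double B[5]; int C[10]; } P;

/* Would a write of n bytes at offset off stay inside a region of limit bytes? */
int fits(size_t off, size_t n, size_t limit)
{
    return off <= limit && n <= limit - off;
}

/* memset(Q->A, 0, sizeof(P)) judged against the member A alone. */
int ok_with_member_bounds(void)
{
    return fits(0, sizeof(P), sizeof(((P *)0)->A));
}

/* The same call judged against the whole malloc'ed object. */
int ok_with_object_bounds(void)
{
    assert(offsetof(P, A) == 0);      /* A is the first member */
    return fits(offsetof(P, A), sizeof(P), sizeof(P));
}
```

Under member-level bounds the call is reported; under object-level bounds it passes. Whether the report counts as a false positive is the point in dispute.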

I started off accepting Nick's statement, but I have done some X
programming myself. I have seen dirty code, but not in the APIs themselves.

So, last night I re-read http://www.x.org/docs/X11/xlib.pdf. I did not
find any such code.

I also emailed one of the original and still active authors of X.
Briefly, he is not aware of any such dirty code in X APIs. (With the
possible exception of Xt.)

There were a few memory management tricks that might result in wider
than strictly necessary bounds, but nothing that would produce false
positives.

I hope that he will give me permission to post his full reply.

But, in the meantime:

Nick, can you please provide accurate pointers to the X Windowing System
APIs and/or code that contains the code patterns that you think might
produce false positives in HardBound or SoftBound?

Andy "Krazy" Glew

Aug 7, 2009, 10:20:52 PM8/7/09

to nm...@cam.ac.uk

nm...@cam.ac.uk wrote:
> Andy \"Krazy\" Glew <"ag-news AT patten DASH my-last-name DOT net"> wrote:
>>
>> I will explain to you one last time that the problem is not whether a
>> compiler can generate checks for all buffer overflows.
>
> That is the first time, as well. At least you have now admitted
> that the technique doesn't do a proper job of checking even a single
> category of errors.

No, Nick.

What I have admitted is that there are errors that the technique doesn't
catch.

What I have also determined is the limits of accuracy in your
pronouncements. Usually tolerable, because you occasionally have useful
insights, and are often quite amusing in your misapprehensions.

>> What is the fraction of false positives (code that is not buggy, but
>> that is incorrectly indicated as having a bug)?
>
> No, it isn't. Again, you are using a political and not a technical
> measure. It is the probability of such positives times the cost of
> bypassing them or ignoring the problem.

Here you go again. This is the useful insight - one that I completely
concur with. So I can't completely flame you.

> For example, if you get ONE such problem that causes failure and is
> infeasible to bypass, it renders the whole mechanism infeasible for
> any program that uses the offending technique.

The key is that it is easy to bypass the problem. If recoding is
required, it is a simple recoding. If recoding is impossible, compiler
switches.

> Look, Andy, I am prepared to accept that you know a hundred times
> as much about hardware design as I do. In this area, the ratio is
> likely to be reversed - I have spent most of my working life in
> and around it.

You flatter me by saying that I know a lot about hardware.

In the computer hardware community, I am usually considered the software
guy. I worked on both hardware and software to accomplish such checking
in the 1980s at Gould. Of course, back then I was the software guy who
happened to know a lot about hardware. I have continued to follow the
field, and pursue ideas. One of my coworkers, who knows me back to the
1980s, says that this is the greatest area of long-term interest I have
had other than out-of-order processors. (By the way, the topics are
related: both are dataflow analyses.)

Nick: I may not know anything as well as you do. But if there is any
area that I know well, it is this area.

Andy "Krazy" Glew

Aug 7, 2009, 10:22:28 PM8/7/09

to nm...@cam.ac.uk

So, should you stop using wood, and build everything with metal?

Or should you employ a wood preservative and filler that stops up the holes?

Andy "Krazy" Glew

Aug 7, 2009, 10:34:19 PM8/7/09

to Wilco Dijkstra

Wilco Dijkstra wrote:
> "Andy "Krazy" Glew" <ag-...@patten-glew.net> wrote in message news:4A7BB524...@patten-glew.net...
>> Wilco Dijkstra wrote:
>>> Why waste hardware effort on something that is essentially a management
>>> issue? Companies are often unwilling to pay the real cost of correct and safe
>>> software.
>> Because security is all about finding the points of maximum leverage, and fixing those.
>>
>> Instead of fixing millions of programmers
>> fix the far smaller number of compiler writers
>> by deploying a feature in the chip
>> that all of them execute code on.
>

> The compiler is indeed the right place to improve things. But C/C++ compilers
> are already overly complex beasts, and codegeneration bugs are not at all rare.
> Adding additional complex stuff on top of it cannot be good.

I think this is a good argument for doing such checks in hardware, like
HardBound, not in software, like SoftBound.

Compiler changes are necessary to indicate to hardware the bounds of the
object. But, once that is done, hardware can perform the dataflow
propagation and the checks.

--

From time to time people propose that there is no need for hardware
support for security at the OS level - i.e. no need for kernel mode (x86
ring 0) versus user mode (x86 ring 3).

After all, all of this can be done simply by the software, right?

However, after bad experience with the Burroughs machines (software
bugs, in the assembler or loader, as I recall, allowing user code to
become "kernel"), Gligor wrote the policy recommendation that all
hardware should have at least kernel and user mode, and a limited number
of ways to go from one to the other that can be rigorously inspected.

There was a brief resurgence in interest in software only security on
Java based cell phones. One of my Motorola contacts has assured me
that this turned out to be a bad idea. Fortunately, now most cell phone
processors are powerful enough to have user and kernel modes, or the
equivalent.

> [a good discussion on compilers, standards, and Gresham's Law]

Andy "Krazy" Glew

Aug 7, 2009, 10:35:58 PM8/7/09

to Jean-Marc Bourguet

I'm waiting to see it.

nm...@cam.ac.uk

Aug 8, 2009, 3:22:27 AM8/8/09

to

In article <4A7CDC06...@patten-glew.net>,

Andy \"Krazy\" Glew <"ag-news AT patten DASH my-last-name DOT net"> wrote:
>

>I started off accepting Nick's statement, but I have done some X
>programming myself. I have seen dirty code, but not in the APIs themselves.
>
>So, last night I re-read http://www.x.org/docs/X11/xlib.pdf. I did not
>find any such code.
>
>I also emailed one of the original and still active authors of X.
>Briefly, he is not aware of any such dirty code in X APIs. (With the
>possible exception of Xt.)

Aargh! Yes, OF COURSE, I was referring to Xt and its related stuff.
In X11R3, most of the standard clients were written using only Xlib
(because Xt was junk), but nowadays virtually EVERY client is based
on Xt and some widget set that uses similar methods. You will find
that was the stated intention of the X Windowing System from the very
start of X11, if you look.

Writing a raw Xlib client and getting all of the arcane operations
right in order to interoperate correctly with other clients, window
managers and so on is a nightmare task. And the X11 documentation
says precisely that. Back in the days of X11R3/X11R4 the number of
incompatibilities between clients and each other and clients and
window managers was legion.

I said the X Windowing System, and I meant the X Windowing System.

Regards,
Nick Maclaren.

nm...@cam.ac.uk

Aug 8, 2009, 3:28:47 AM

to

In article <4A7CE164...@patten-glew.net>,

Andy \"Krazy\" Glew <"ag-news AT patten DASH my-last-name DOT net"> wrote:
>>

>> What is potentially more serious is an attack of woodworm. When a
>> woodpecker causes one place to collapse, the parts holding that up
>> then fail under the strain, causing the parts that support them to
>> follow suit, ....
>>
>> And we have that, too - but it's not yet caused catastrophic collapse.
>

>So, should you stop using wood, and build everything with metal?

No - that rusts.

>Or should you employ a wood preservative and filler that stops up the holes?

If you had ever dealt with serious attacks of woodworm, you would
know that the former doesn't do anything to restore the integrity
of wood that has already been attacked, or even protect the wood
unless you can treat every part of it (including invisible surfaces).
And you would know that stopping up the holes and/or applying a coat
of paint merely hides the problem.

Regards,
Nick Maclaren.

nm...@cam.ac.uk

Aug 8, 2009, 3:32:32 AM

to

In article <4A7CE48E...@patten-glew.net>,

Andy \"Krazy\" Glew <"ag-news AT patten DASH my-last-name DOT net"> wrote:

>Jean-Marc Bourguet wrote:
>> nm...@cam.ac.uk writes:
>>
>>> In article <0CSem.234715$bA.4...@newsfe16.ams2>,
>>
>>> If anyone wants to learn more about this mess, my Objects document
>>> describes the situation with many references to the C and C++ standards.
>>
>> Is it somewhere publicly accessible?
>
>I'm waiting to see it.

I sent it to you, but you play so many damn-fool games with your
Email address that you may not have seen it. I didn't get a bounce
message, so it went to ONE of your addresses. Please send me a
message with a decent address on and I will resend.

Oh, and to answer your last question on X, try XtSetValues and
XtGetValues. There are others, but those are the simplest.

Regards,
Nick Maclaren.

ken...@cix.compulink.co.uk

Aug 8, 2009, 1:52:15 PM

to

In article <h5j9ff$c1v$1...@smaug.linux.pwf.cam.ac.uk>, nm...@cam.ac.uk ()
wrote:

> or even protect the wood
> unless you can treat every part of it

The question with woodworm is whether or not you still have active
beetle larvae. Treating with a strong contact insecticide should kill any
beetles and emerging larvae before they can pupate. Still, comparing
architecture to software is a useless analogy. The margin of safety in
most construction is such that it would take years of infestation, rot
or other environmental factors to produce structural collapse, whereas one
bug can bring a program down.

Note that this applies to properly designed structures. Something like the
Tay Bridge was the result of poor assessment of structural loads (the
wind speeds at the site were never measured before it was designed) and
shoddy construction, though the combination of cast and wrought iron in
bowstring girders was always a bad idea.

Ken Young

Andy "Krazy" Glew

Aug 8, 2009, 2:03:16 PM

to nm...@cam.ac.uk

nm...@cam.ac.uk wrote:
>> I also emailed one of the original and still active authors of X.
>> Briefly, he is not aware of any such dirty code in X APIs. (With the
>> possible exception of Xt.)
>
> Aargh! Yes, OF COURSE, I was referring to Xt and its related stuff.
> In X11R3, most of the standard clients were written using only Xlib
> (because Xt was junk), but nowadays virtually EVERY client is based
> on Xt and some widget set that uses similar methods. You will find
> that was the stated intention of the X Windowing System from the very
> start of X11, if you look.

Again:

My X author contact only said that Xt possibly had such issues. He
could not point to any particulars. Can you?

Besides, he said that Xt is junk (as you did), and effectively dead. No
new apps should be coded with Xt. Most old apps coded with Xt should
fade away into obsolescence, if they have not already done so.

Widget sets other than Xt do NOT, according to my contact, use similar,
unsafe, methods.

So, again, I ask you, Nick: please show data. Particulars.

> I said the X Windowing System, and I meant the X Windowing System.

The term my friend used, referring to Xt, was "Ding, dong, the witch is
dead."

But then again, he is on to XX R6. And beyond.

Andy "Krazy" Glew

Aug 8, 2009, 2:08:56 PM

to nm...@cam.ac.uk

nm...@cam.ac.uk wrote:
> I sent it to you, but you play so many damn-fool games with your
> Email address that you may not have seen it. I didn't get a bounce
> message, so it went to ONE of your addresses. Please send me a
> message with a decent address on and I will resend.

Gmail doesn't see your email anywhere, even in the spam folder.

What's so hard about figuring out

ag-news AT patten DASH my-last-name DOT net

I'll send you more specifics directly.

Andy "Krazy" Glew

Aug 8, 2009, 2:37:06 PM

to nm...@cam.ac.uk

nm...@cam.ac.uk wrote:
> Oh, and to answer your last question on X, try XtSetValues and
> XtGetValues. There are others, but those are the simplest.

I have looked at them. The APIs for these functions do not seem to be a
problem. Nor do any of the first several hundred examples I found by googling.

Of course, my answer is only based on inspection. I have not actually
compiled the code using compilers such as those for HardBound and
SoftBound. Unfortunately, I do not have access to such tools at the
moment. I wish that I did.

However, since my eyes do not see the problem, I would appreciate you
showing more detail.
a) How do the APIs *mandate* code that will have false positives?
Or, since I am more and more confident that you are just plain wrong in
that regard,
b) what common code bases (e.g. Open Source tools using X, so that I
can look at the code) use Xt, or other such C libraries, in ways that
you think HardBound and SoftBound

More and more, however, I suspect that you have some basic
misunderstanding of what HardBound and SoftBound do. I suspect that
your misunderstanding is based on your having encountered similar tools
in the past, in particular similar research, done in your neighbourhood.
(Hmm, Jones and Kelly were at Imperial, and my understanding is that
Cambridge looks down on Imperial. But there sure are a lot of cam.ac.uk
sites that have such papers.)

nm...@cam.ac.uk

Aug 9, 2009, 4:25:51 AM

to

In article <dsidna9Q2qrSJuDX...@giganews.com>,

<ken...@cix.compulink.co.uk> wrote:
>
>> or even protect the wood
>> unless you can treat every part of it
>
> The question with woodworm is whether or not you still have active
>beetle larvae. Treating with a strong contact insecticide should kill any
>beetles and emerging larvae before they can pupate.

You have missed my point. Tools and (most especially) structural
wood in housing have exposed surfaces, surfaces embedded in other
materials, and inaccessible surfaces. The big problem with treatment
is ensuring that the last ones get treated, as the beetles can get
places that brushes and sprays can't. And it was THAT aspect that
is analogous to security tools for programs.

>Still comparing architecture to software is a useless analogy.

It wasn't mine :-)

>The margin of safety in
>most construction is such that it would take years of infestation, rot
>or other environmental factors to produce structural collapse where one
>bug can bring a program down.

Don't bet on it. Before mandatory treatment of structural timber in
the UK, failures within a year or two were not rare. But I agree that
the analogy with programs is weak.

Regards,
Nick Maclaren.

nm...@cam.ac.uk

Aug 9, 2009, 4:55:48 AM

to

In article <4A7DC5D2...@patten-glew.net>,

Andy \"Krazy\" Glew <"ag-news AT patten DASH my-last-name DOT net"> wrote:
>
>> Oh, and to answer your last question on X, try XtSetValues and
>> XtGetValues. There are others, but those are the simplest.
>
>I have looked at them. The APIs for these functions do not seem to be a
>problem. Nor do any of the first several hundred examples I found by googling.

What they do is to supply a pointer to a generic structure type and
an offset into a specific structure type, which is then used to
access the object pointed to by the former as if it were the latter.
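
To make the pattern concrete, here is a minimal sketch in C. It is not the real Xt code: the names GenericWidget, LabelWidget, Resource, set_value and get_value are hypothetical, and the real XtSetValues works through argument lists and resource tables rather than this simplified signature. The caller hands over a pointer typed as the generic structure, and the callee uses a byte offset computed against the specific structure to reach a field the generic type does not declare.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical generic header shared by all widget structures. */
typedef struct {
    int class_id;
} GenericWidget;

/* Hypothetical specific widget: generic part first, then its own fields. */
typedef struct {
    GenericWidget core;
    int width;
    int height;
} LabelWidget;

/* Resource descriptor: where a field lives inside the specific type. */
typedef struct {
    size_t offset;   /* byte offset into the specific structure */
    size_t size;     /* size of the field in bytes */
} Resource;

/* Write a value through a generic pointer plus an offset that only
   makes sense for the specific type the pointer really refers to. */
static void set_value(GenericWidget *w, Resource r, const void *value)
{
    memcpy((char *)w + r.offset, value, r.size);
}

/* Read a value back the same way. */
static void get_value(const GenericWidget *w, Resource r, void *out)
{
    memcpy(out, (const char *)w + r.offset, r.size);
}
```

A caller would build a Resource with offsetof(LabelWidget, width) and pass (GenericWidget *)&label. A checker that takes its bounds from the static type GenericWidget would presumably reject that access, while one that carries the bounds of the original LabelWidget object would accept it. Which of the two answers is "correct" is exactly the boundary question.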

>More and more, however, I suspect that you have some basic
>misunderstanding of what HardBound and SoftBound do. I suspect that
>your misunderstanding is based on your having encountered similar tools
>in the past, in particular similar research, done in your neighbourhood.
> (Hmm, Jones and Kelly were at Imperial, and my understanding is that
>Cambridge looks down on Imperial. But there sure are a lot of cam.ac.uk
>sites that have such papers.)

Sigh. Look, virtually EVERY modern paper has just been republishing
work that was done in the 1970s. NONE of this is fundamentally new
(nor actually very difficult). What those things do is irrelevant,
because the problem is not primarily doing the checking, but in
deciding which side of the boundary an event lies.

A prerequisite for ANY decent checking is a specification where you
can distinguish illegal from legal use. Do you REALLY disagree
with that?

If HardBound and SoftBound are checking against some language that
isn't standard C, then where is a specification of the language
they ARE checking against?

I wrote that damn Objects document on the basis of bitter experience;
those problems occur widely in real code, INCLUDING in well-written
code. Perhaps half of the autoconfigure/preprocessor crap in open
source packages is there because of them, and a huge proportion of
bugs reported in parallel environments are neither the user's nor
the compiler's bug.

I have sent you another copy.

Regards,
Nick Maclaren.

nm...@cam.ac.uk

Aug 9, 2009, 7:44:08 AM

to

In article <4A7DBDE4...@patten-glew.net>,

Andy \"Krazy\" Glew <"ag-news AT patten DASH my-last-name DOT net"> wrote:
>

>Again:
>
>My X author contact only said that Xt possibly had such issues. He
>could not point to any particulars. Can you?

I have done.

>Besides, he said that Xt is junk (as you did), and effectively dead. No
> new apps should be coded with Xt. Most old apps coded with Xt should
>fade away into obsolescence, if they have not already done so.

Oh, my God! Upon checking, that has started. It explains why I have
been seeing so many of the failure modes that were so common under
X11R2/R3/R4 start to reappear :-(

If you are interested in discussing the inherent design defects of
the ******* X model, I can do so. While that is on-group (my points
are primarily architectural, and are well-known), I am NOT interested
in denial without evidence.

>Widget sets other than Xt do NOT, according to my contact, use similar,
>unsafe, methods.

Probably not. Even by computer scientist standards, Xt was an
incompetent piece of design.

>So, again, I ask you, Nick: please show data. Particulars.

Well, it is you who made the original claims, but have failed to
back them up with evidence. Given that I have heard precisely that
marketing spiel a couple of dozen times before, and it has always
fizzled out before delivery, and I have 20+ years of investigating
practical C programs in detail, I am disinclined to believe bare
claims of miracles.

If you want me to take your hype seriously, let's see some decent
evidence, and not just marketing. No, I am not interested in
performance figures - as I have said, that has been a solved problem
since the 1970s. Boring. What you need to provide is data like
the following:

For a reasonable range of realistic applications (gcc has been
mentioned):

How many demonstrable bugs SoftBound found, with details of
what the bugs were, so other people can check they were real.
How many false positives were encountered, with details of
how they were identified as such.
How many changes were made to the original source, with
precise reasons and details (this is critical).
How many programs were abandoned as hopeless, with details
of why.

My original statement was that the C (and hence C++) standards are
too inconsistent and ambiguous to enable practically worthwhile
checking on commercial codes. That is justified in my Objects
document.

As far as specific evidence is concerned, I am not going to repeat
the investigations I did over 20+ years. Don't be silly.

Regards,
Nick Maclaren.

Andy "Krazy" Glew

Aug 9, 2009, 8:29:28 AM

to nm...@cam.ac.uk

nm...@cam.ac.uk wrote:
> In article <4A7DBDE4...@patten-glew.net>,
> Andy \"Krazy\" Glew <"ag-news AT patten DASH my-last-name DOT net"> wrote:
> Well, it is you who made the original claims, but have failed to
> back them up with evidence. Given that I have heard precisely that
> marketing spiel a couple of dozen times before, and it has always
> fizzled out before delivery, and I have 20+ years of investigating
> practical C programs in detail, I am disinclined to believe bare
> claims of miracles.
>
> If you want me to take your hype seriously, let's see some decent
> evidence, and not just marketing. No, I am not interested in
> performance figures - as I have said, that has been a solved problem
> since the 1970s. Boring. What you need to provide is data like
> the following:
>
> For a reasonable range of realistic applications (gcc has been
> mentioned):
>
> How many demonstrable bugs SoftBound found, with details of
> what the bugs were, so other people can check they were real.
> How many false positives were encountered, with details of
> how they were identified as such.
> How many changes were made to the original source, with
> precise reasons and details (this is critical).
> How many programs were abandoned as hopeless, with details
> of why.

Nick:

I am only referring people to the published HardBound and SoftBound papers.

I am not reporting any of my own work. If I were working, or had worked,
on this, I could not report it to comp.arch.

I agree with you: the true tests will be as you describe. With the goal
of no source code changes, except to fix real bugs, no false positives,
and no "abandonment".

I appreciate you pointing out areas that you believe will break the
HardBound and SoftBound approach. I hope and expect that people doing
work in this area will incorporate these programs in their test suites.

However, since I don't have resources to do such evaluations, at the
moment I can only talk / discuss on comp.arch. When it is my
understanding that the problems you posit will not occur, I say so.
When you refer me to an API that you say intrinsically has the problems,
and I do not agree, I say so.

By the way, I have just this morning received your Objects document, and
will read it on the airplane I am about to get onto. I am posting using
the last net.connectivity I will have today.

Andy "Krazy" Glew

Aug 9, 2009, 8:39:24 AM

to nm...@cam.ac.uk

nm...@cam.ac.uk wrote:
> In article <4A7DC5D2...@patten-glew.net>,
> Andy \"Krazy\" Glew <"ag-news AT patten DASH my-last-name DOT net"> wrote:
>>> Oh, and to answer your last question on X, try XtSetValues and
>>> XtGetValues. There are others, but those are the simplest.
>> I have looked at them. The APIs for these functions do not seem to be a
>> problem. Nor do any of the first several hundred examples I found by googling.
>
> What they do is to supply a pointer to a generic structure type and
> an offset into a specific structure type, which is then used to
> access the object pointed to by the former as if it were the latter.

And I don't see the problem with that.

E.g.

Specific_Structure_Type* p
    = malloc(sizeof(Specific_Structure_Type));

Function((Generic_Structure_Type*)p, offset);

void Function(Generic_Structure_Type* p, int offset) {
    switch (type_code) {  /* however the callee learns the real type */
    case SST: {
        Specific_Structure_Type* p_sst = (Specific_Structure_Type*)p;
        ...
        break;
    }
    }
}

Works fine. As does your example *(field*)((char*)p+offset).

If you use the former, where the Specific_Structure_Type is explicit
within the Function, you get better bounds checking than if you treat
the struct as an array of bytes.
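
To illustrate the distinction, here is a rough software sketch of per-pointer bounds metadata. It rests on several assumptions and is not the HardBound or SoftBound implementation: BoundedPtr, bp_alloc, bp_narrow and bp_in_bounds are made-up names, and the metadata is carried next to the pointer purely to keep the example short.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical pointer-with-bounds record, for illustration only. */
typedef struct {
    char *ptr;    /* current pointer value */
    char *base;   /* lowest valid address */
    char *bound;  /* one past the highest valid address */
} BoundedPtr;

/* Allocate n bytes and record the bounds of the whole allocation. */
static BoundedPtr bp_alloc(size_t n)
{
    char *p = malloc(n);
    BoundedPtr bp = { p, p, p ? p + n : p };
    return bp;
}

/* Narrow the bounds to the sub-object [offset, offset + size), as a
   compiler could when the code names the specific field explicitly.
   The caller must ensure the range lies inside the original bounds. */
static BoundedPtr bp_narrow(BoundedPtr bp, size_t offset, size_t size)
{
    BoundedPtr sub = { bp.base + offset, bp.base + offset,
                       bp.base + offset + size };
    return sub;
}

/* Return 1 if an access of `size` bytes at ptr + offset stays within
   the recorded bounds, 0 otherwise. */
static int bp_in_bounds(BoundedPtr bp, size_t offset, size_t size)
{
    if (bp.ptr == NULL || bp.ptr < bp.base || bp.ptr > bp.bound)
        return 0;
    size_t room = (size_t)(bp.bound - bp.ptr);
    return offset <= room && size <= room - offset;
}
```

With whole-allocation bounds, an access of the form *(field*)((char*)p+offset) passes as long as it stays inside the allocation, so an overflow from one field into its neighbour goes unnoticed. Narrowed bounds catch that case, but only when the code names the sub-object.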

TANSTAAFL.

Robert Myers

Aug 9, 2009, 6:39:17 PM

to

On Aug 9, 4:55 am, n...@cam.ac.uk wrote:
> In article <4A7DC5D2.5000...@patten-glew.net>,

> Andy \"Krazy\" Glew <"ag-news AT patten DASH my-last-name DOT net"> wrote:

> Sigh. Look, virtually EVERY modern paper has just been republishing
> work that was done in the 1970s.

True though such a comment conceivably may be, I don't see how it
advances the discussion.

One related question is Andy's: "What's changed?" Well, lots of
things have changed--far too many to mention, and there is some kind
of plausibility test that your claim doesn't meet, frequently though
it may be made, and not just by you. That is to say, even *if* there
are no new ideas left, which I doubt, the old ideas must be
continually reexamined in the light of new realities, as you yourself
seem to be arguing by demanding specifics about the actual usefulness
of ideas.

If standard C can't be made acceptable, and it seems clear to me that
it can't, then some way must be found to build safer code, even when
the code is in the hands of relatively clueless individuals (structure
no stronger than its point of greatest weakness).

The objection that I have to what's on the table is that it seems like
a lot of work for a fairly narrow, if important, class of problems.
It would seem more worth the effort if some way were found (using
hardware support, as in virtual memory) to allocate privilege in a
more fine-grained way. A compilation unit has privileged access to
certain things (most notably some extent of memory and not others) and
it must either possess those privileges as granted by the OS or
acquire them from (say) a calling procedure, as a child process does
in *nix. One can imagine even more fine-grained control, as in
treating every array as an object whose methods include ways of
acquiring and examining privilege. I'm sure that, as you could
readily answer, some version of these ideas must have been worked out
in zillions of different flavors. You need not require that all
existing code exploit such features or even require their use, only
create attractive opportunities for creating safe products and let the
competitive environment see to their uptake.

You could, for example, simply create safe class libraries. If they
really work, people will start using them. Or you could even create
an ISO standard that fed into some of the ISO standards that people
really pay attention to.

To harken back to the olden days, it was clear, in building time-
sharing systems, that some way must be found to protect the system
from users and the users from one another. No one would have imagined
a time-sharing system where such a thing was left to chance, and yet
that is the situation with the current Internet.

Movement will come when someone important (=Microsoft, Intel, or
possibly AMD/IBM) sees a competitive advantage in materially changing
the current situation. Given the amount of trouble that dangerous
code causes, the *real* question is not how things might be changed,
but why no one that matters is sufficiently motivated to take action.

Robert.

nm...@cam.ac.uk

Aug 9, 2009, 9:01:14 PM

to

In article <0ac6ed4e-97de-4382...@l31g2000vbp.googlegroups.com>,

Robert Myers <rbmye...@gmail.com> wrote:
>
>If standard C can't be made acceptable, and it seems clear to me that
>it can't, then some way must be found to build safer code, even when
>the code is in the hands of relatively clueless individuals (structure
>no stronger than its point of greatest weakness).

Fine. And we KNOW how to do that! Simply use a language with vastly
fewer "gotchas" and preferably better detection of common user errors.
All of the compiler technology is known, proven and usually efficient.

>The objection that I have to what's on the table is that it seems like
>a lot of work for a fairly narrow, if important, class of problems.
>It would seem more worth the effort if some way were found (using
>hardware support, as in virtual memory) to allocate privilege in a

>more fine-grained way. ...

That's been done. It worked, and worked well.

But it's STILL pointless without a language that makes such checking
possible and effective.

Regards,
Nick Maclaren.

Robert Myers

Aug 9, 2009, 10:31:31 PM

to

On Aug 9, 9:01 pm, n...@cam.ac.uk wrote:

>
> But it's STILL pointless without a language that makes such checking
> possible and effective.
>

You know the old saw, "I don't know what language we will be using in
the next millennium, but it will be called FORTRAN" (well, actually,
now Fortran, so even that prediction wasn't correct).

Why not endow C with objects that can be [more] secure? People will
find a way to use pointers recklessly, no matter what you do. At
least you give people the option of using programming methods that are
relatively secure. Then it's up to programmers, managers, and
customers to demand that they be used. It sounds more plausible to me
than chucking C, which I don't think is going to happen.
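
One way to read "objects that can be more secure" is an array type whose only intended access path is bounds-checked. The sketch below is an illustration under that assumption; IntArray and the intarray_* functions are invented names, not part of any standard or of the HardBound/SoftBound work.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical array object: storage plus its own length. */
typedef struct {
    int *data;
    size_t length;
} IntArray;

/* Allocate zero-filled storage for `length` ints; false on failure. */
static bool intarray_init(IntArray *a, size_t length)
{
    /* Ask for at least one element so a NULL result always means failure. */
    a->data = calloc(length ? length : 1, sizeof *a->data);
    a->length = a->data ? length : 0;
    return a->data != NULL;
}

/* Checked store: refuses any index outside [0, length). */
static bool intarray_set(IntArray *a, size_t index, int value)
{
    if (index >= a->length)
        return false;
    a->data[index] = value;
    return true;
}

/* Checked load: leaves *out untouched when the index is out of range. */
static bool intarray_get(const IntArray *a, size_t index, int *out)
{
    if (index >= a->length)
        return false;
    *out = a->data[index];
    return true;
}

/* Release the storage and make later accesses fail the bounds check. */
static void intarray_free(IntArray *a)
{
    free(a->data);
    a->data = NULL;
    a->length = 0;
}
```

Nothing stops a caller from reaching into the data member directly, which is the point made above: the safer option exists, and whether it gets used is a matter of discipline and demand.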

Robert.

Stephen Fuld

Aug 9, 2009, 11:06:35 PM

to

Robert Myers wrote:

snip

> People will
> find a way to use pointers recklessly, no matter what you do.

I'm not sure this is true. If pointers are a distinct type, that
syntactically can't be cast to another type, and can only be compared
with other pointers or to nil, with no arithmetic allowed on them,
that seems to pretty well eliminate most of the bad things you can do
with them. Of course, it wouldn't be C. But you certainly could have a
very expressive language that could solve the problems that C now solves.
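
C itself cannot enforce these rules, but their flavour can be approximated, as a sketch only, by wrapping the pointer in a struct. IntRef and the intref_* functions below are hypothetical names chosen for the illustration. Because the reference is a struct, the compiler rejects arithmetic on it and casts of it to other types; the discipline breaks down as soon as a client touches the member directly, which a language designed this way would forbid.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical reference type. Because it is a struct, C rejects
   `r + 1` and `(long)r` at compile time; a disciplined client never
   touches the `target` member directly. */
typedef struct {
    int *target;
} IntRef;

/* The nil reference. */
static IntRef intref_nil(void)
{
    IntRef r = { NULL };
    return r;
}

/* Allocate a fresh int and return a reference to it (nil on failure). */
static IntRef intref_new(int value)
{
    IntRef r = { malloc(sizeof(int)) };
    if (r.target != NULL)
        *r.target = value;
    return r;
}

/* The only comparisons offered: equality with another reference... */
static bool intref_equal(IntRef a, IntRef b)
{
    return a.target == b.target;
}

/* ...and a test against nil. */
static bool intref_is_nil(IntRef r)
{
    return r.target == NULL;
}

/* Checked dereference: returns false instead of reading through nil. */
static bool intref_get(IntRef r, int *out)
{
    if (r.target == NULL)
        return false;
    *out = *r.target;
    return true;
}

/* Release the referenced int. */
static void intref_free(IntRef r)
{
    free(r.target);
}
```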

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Robert Myers

Aug 9, 2009, 11:39:29 PM

to

On Aug 9, 11:06 pm, Stephen Fuld <SF...@alumni.cmu.edu.invalid> wrote:
> Robert Myers wrote:
>
> snip
>
> > People will
> > find a way to use pointers recklessly, no matter what you do.
>
> I'm not sure this is true. If pointers are a distinct type, that
> syntactically can't be cast to another type, and can only be compared
> with other pointers or to nil, with no arithmetic allowed on them,
> that seems to pretty well eliminate most of the bad things you can do
> with them. Of course, it wouldn't be C. But you certainly could have a
> very expressive language that could solve the problems that C now solves.
>

You'll just drive programming into assembler, which is where the idea
came from to begin with. Much easier now that x86 is everywhere.

Robert.