Norrish PhD thesis

Dennis Ritchie

ungelesen,

15.01.1999, 03:00:0015.01.99

an

On 2 December 1998, Michael Norrish mentioned (in article
with Message-ID yvsr9ui...@albatross.cl.cam.ac.uk) his
PhD thesis (in the form of a subsequent technical report) at

http://www.cl.cam.ac.uk/users/mn200/PhD/thesis-report.ps.gz

I downloaded it pretty soon after the announcement; it's still there,
however.

Entitled "C formalized in HOL", it's an analysis of C89 in formal
terms, and in particular gives a very good account of expression
semantics.
The ultimate goal of studies like this is to understand the possible
behaviors of programs written in conformance with some standard
in order to prove--or at least find reasons to believe--that they
conform
to the standard. One also hopes that this makes them more likely to
be correct, and create a verification system that ensures conforming
behavior on execution.

As various contributors to this group point out, C is a pragmatically
developed language and its roots stick out even in the existing
standard and the proposed one.

Norrish's semantics avoid treating several aspects of the language:
enumerations, type qualifiers, bit fields, unions (parts of the
type system); and switch and goto are omitted from the control-flow
semantics. The first group has features unique in detail to C;
the difficulties with the second, control-flow, has mainly to do with
issues not special to C. Some of the omissions are genuinely
inessential
(e.g. positing the renaming of function-scope static variables as new
uniquely named globals).

Norrish's work as presented here does somewhat less than he evidently
hoped for: he wanted to verify a BDD (binary decision diagram) program,
but didn't. The actual verification examples (factorial and strcpy)
are very simple. In the circumstances, though, they are't trivial.

Norrish does treat and account for issues that consume much space here:
sequence points, pointers, the possible implicit overlayment of objects
owing to casts of pointers to unsigned char *, nearly the whole lot
(including
++ and --). His work on expression evaluation semantics in C is much
more
complete than anything done before.

For comp.std.c, I'd say that C89 escapes--in Norrish's writing--without
many major blows, which I define as a confession that the reader (him)
is
unable to determine what in the world was meant, a giant hole that
needs to be filled in. Some palpable hits are there, like the question
of
whether the standard permits parallel evaluation of functions in
f()+g() .
There are plenty of other (often aside) comments about things hard to
take account of, but in general the approach is positive: it is written
in the spirit of trying to build bridges between the theoretical and the
practical minded, not burn them.

It's a worthwhile read.

Dennis Ritchie

Paul Eggert

ungelesen,

15.01.1999, 03:00:0015.01.99

an

Dennis Ritchie <d...@bell-labs.com> writes:

> http://www.cl.cam.ac.uk/users/mn200/PhD/thesis-report.ps.gz

>...Some palpable hits are there, like the question of

>whether the standard permits parallel evaluation of functions in
>f()+g().

Yes, and this has been a bone of contention in comp.std.c for quite
some time. I maintain that C90 permits it because it doesn't prohibit
it, but many people, including some members of the standards committee,
believe otherwise.

I don't have my copy of draft C9x handy, but I believe that the
official language in the f()+g() area hasn't changed in the latest
draft; instead, a non-normative appendix has been added that is
supposed to explain things. To my mind that appendix is inferior to
Norrish's work, and it would improve the quality of the standard if the
appendix were dropped and and Norrish's work referenced instead.

>It's a worthwhile read.

Yes it is. I read the previous version a couple of years ago, and was
impressed even then.

Francis Glassborow

ungelesen,

16.01.1999, 03:00:0016.01.99

an

In article <77pb4s$iit$1...@shade.twinsun.com>, Paul Eggert
<egg...@twinsun.com> writes

>>...Some palpable hits are there, like the question of
>>whether the standard permits parallel evaluation of functions in
>>f()+g().
>
>Yes, and this has been a bone of contention in comp.std.c for quite
>some time. I maintain that C90 permits it because it doesn't prohibit
>it, but many people, including some members of the standards committee,
>believe otherwise.

I thought this had been the subject of a defect report (I raised the
issue in the UK C Standards Panel in 1992) and that response was that
they must be evaluated sequentially (though in any order.)

Francis Glassborow Chair of Association of C & C++ Users
64 Southfield Rd
Oxford OX4 1PA +44(0)1865 246490
All opinions are mine and do not represent those of any organisation

Michael Norrish

ungelesen,

19.01.1999, 03:00:0019.01.99

an

egg...@twinsun.com (Paul Eggert) writes:

> Dennis Ritchie <d...@bell-labs.com> writes:

> >It's a worthwhile read.

> Yes it is. I read the previous version a couple of years ago, and
> was impressed even then.

Thank you (both) for the kind words.

I'm just back from a longish post-PhD holiday in Australasia, and am
now employed on an unrelated project, but I am happy to discuss the
thesis with anyone interested. The relevant files associated with the
mechanisation in the theorem-prover are also available if anyone would
like to see them.

My e-mail address should be in the headers of this message, but just
in case, it is
Michael...@cl.cam.ac.uk

Regards,
Michael.

Clive D.W. Feather

ungelesen,

01.02.1999, 03:00:0001.02.99

an

In article <77pb4s$iit$1...@shade.twinsun.com>, Paul Eggert
<egg...@twinsun.com> writes

>I don't have my copy of draft C9x handy, but I believe that the
>official language in the f()+g() area hasn't changed in the latest
>draft; instead, a non-normative appendix has been added that is
>supposed to explain things. To my mind that appendix is inferior to
>Norrish's work, and it would improve the quality of the standard if the
>appendix were dropped and and Norrish's work referenced instead.

Having now read Norrish's thesis, I disagree.

Firstly, a side note: the annex in question has been dropped from the
current draft.

Annex D was originally planned as normative text. However, there was
some disquiet in WG14 because of the lack of experience with it, and
this is why it was first made non-normative and then dropped.

There is a project plan to work on a normative amendment which would
restore a version of the annex that people are comfortable with.

I would object to references to the Norrish work for several reasons:
- it is, by its own admission, incomplete;
- it is unreadable and unusable by the average C programmer (who was,
of course, not the target audience);
- it is not a standards document published by an appropriate body (this
means that an ISO standard cannot refer to it);
- I believe it contains errors.

As an example of the last, consider the construct:

(x = 1 && 1) + (x = 0 || 0)

My reading of Norrish is that he does not allow the two operands to be
evaluated at the same time, and so there is definitely a sequence point
(which he expresses as "emptying the bag of side-effects") between the
two assignments to a. Thus he claims the above is well-defined, though
the value assigned to x could be either 0 or 1. I would claim that the
above code is undefined.

Norrish also loses the sequence points when the tests are the wrong way
round. Thus he *will* state that:

(x = 0 && 1) + (x = 1 || 0)

is undefined because he analyses it exactly as if it were:

(x = 0) + (x = 1)

Paul Eggert

ungelesen,

03.02.1999, 03:00:0003.02.99

an

"Clive D.W. Feather" <cl...@on-the-train.demon.co.uk> writes:

>I would object to references to the Norrish work for several reasons:
>- it is, by its own admission, incomplete;

Obviously; it's just a thesis. But it's a reasonable framework and
could be completed.

>- it is unreadable and unusable by the average C programmer (who was,
> of course, not the target audience);

But CD2 Appendix D is also unreadable and unusable by the average C
programmer, and I'd guess that any formal model of evaluation would
probably have this problem. That's OK, since the audience for Appendix
D is not average C programmers; it's implementers and language lawyers.

>- it is not a standards document published by an appropriate body (this
> means that an ISO standard cannot refer to it);

I think you could easily talk Norrish into incorporating the text of
his thesis, suitably modified, into future editions of the standard.
If not the exact text, you could use his framework.

>- I believe it contains errors.

Of course it contains errors! This is partly because the text of the
original C standard was ambiguous, and these ambiguities have caused
many problems of interpretation; this is why a formal method is
needed. Norrish sometimes guesses wrong about the committee's intent,
but that's entirely to be expected. Where the committee disagrees with
Norrish's disambiguation of the spec, it can correct his model.

The issue here is: what method should be used for the formal semantics of C?
Norrish's approach is based on many years of experience by programming
language theorists; it's a much safer technical choice than the untested
approach taken in CD2 Appendix D.

Michael Norrish

ungelesen,

12.02.1999, 03:00:0012.02.99

an

"Clive D.W. Feather" <cl...@on-the-train.demon.co.uk> writes:

> In article <77pb4s$iit$1...@shade.twinsun.com>, Paul Eggert
> <egg...@twinsun.com> writes

>> I don't have my copy of draft C9x handy, but I believe that the
>> official language in the f()+g() area hasn't changed in the latest
>> draft; instead, a non-normative appendix has been added that is
>> supposed to explain things. To my mind that appendix is inferior
>> to Norrish's work, and it would improve the quality of the standard
>> if the appendix were dropped and and Norrish's work referenced
>> instead.

> Having now read Norrish's thesis, I disagree.

> Firstly, a side note: the annex in question has been dropped from the
> current draft.

> Annex D was originally planned as normative text. However, there was
> some disquiet in WG14 because of the lack of experience with it, and
> this is why it was first made non-normative and then dropped.

This is a shame. I think any sort of formal treatment in the standard
would be something to be welcomed.

> There is a project plan to work on a normative amendment which would
> restore a version of the annex that people are comfortable with.

I hope this comes to pass!

> I would object to references to the Norrish work for several reasons:

[...]

> - I believe it contains errors.

> As an example of the last, consider the construct:

> (x = 1 && 1) + (x = 0 || 0)

> My reading of Norrish is that he does not allow the two operands to
> be evaluated at the same time, and so there is definitely a sequence
> point (which he expresses as "emptying the bag of side-effects")
> between the two assignments to a. Thus he claims the above is
> well-defined, though the value assigned to x could be either 0 or
> 1. I would claim that the above code is undefined.

You haven't read the specification of my model correctly. This code
is definitely undefined because there is an execution sequence that
sees it modify the x object twice before reaching a sequence point.
When the (x = 1) is evaluated, it puts a side effect into the bag of
pending side effects. Let's assume that it is applied immediately.
(This is not required but possible.)

The fact that there is a sequence point because of the && allows for
an execution path where the side effect is forgotten about, but my
semantics also allows for the next part of the execution to switch
across to the x = 0. This will then cause undefined behaviour as a
side effect will be first made pending, and then eventually will
attempt to update an already-updated object.

> Norrish also loses the sequence points when the tests are the wrong way
> round. Thus he *will* state that:

> (x = 0 && 1) + (x = 1 || 0)

> is undefined

The nature of the test caused by the logical operators is completely
irrelevant. Both of the expresions you give as examples will be
trapped as undefined in my semantics.

Regards,
Michael.