I don't know how many passes the original Fortran compiler used, but
Bernard Hodson ("Computers Without Machine Code") claimed to have
written a single-pass compiler. The claimed difference between
Fortran and C/C++ was typically not that Fortran could do more checking
at compile time, but that the language's assumptions, e.g., no modifiable
aliasing, meant the compiler did not have to do as much compile-time (or
run-time) checking, provided it could assume valid Fortran code.
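To make the aliasing point concrete, here is a hedged C sketch (the
function and parameter names are invented for the example): without
C99's restrict the compiler has to allow for the possibility that out,
in, and scale overlap, whereas a Fortran compiler is entitled to assume
that dummy arguments do not alias one another.

    /* Without restrict, the compiler generally has to reload *scale on
       every iteration, because a store to out[i] might modify it. */
    void scale_copy(double *out, const double *in, const double *scale, int n)
    {
        for (int i = 0; i < n; i++)
            out[i] = in[i] * *scale;
    }

    /* With restrict (C99) the programmer promises there is no overlap,
       which is roughly the assumption a Fortran compiler gets for free
       on dummy array arguments. */
    void scale_copy_r(double * restrict out, const double * restrict in,
                      const double * restrict scale, int n)
    {
        for (int i = 0; i < n; i++)
            out[i] = in[i] * *scale;
    }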
The amount of runtime vs compile-time checking has little, if anything,
to do with the number of passes. For early Fortran the pass count had
more to do with the limited capabilities of the machine than with
runtime vs compile-time checking. The limited memory of those machines
meant the analysis had to use as little memory as possible, which in
turn dictated doing the analysis as a sequence of well-defined, simple
stages. Generally, the simpler a language is, the easier it is to focus
on error checking.
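One way to picture that staging, as a purely hypothetical sketch (the
stage programs and file names are made up), is a driver that runs each
stage as a separate program communicating through intermediate files, so
only one stage's working set is ever in memory at a time:

    #include <stdio.h>
    #include <stdlib.h>

    /* Run one stage of the pipeline, aborting if it fails. */
    static void run(const char *cmd)
    {
        if (system(cmd) != 0) {
            fprintf(stderr, "stage failed: %s\n", cmd);
            exit(EXIT_FAILURE);
        }
    }

    int main(void)
    {
        run("./lex     prog.f   > prog.tok");  /* tokenise the source */
        run("./parse   prog.tok > prog.ir");   /* build a simple IR   */
        run("./codegen prog.ir  > prog.asm");  /* emit assembly       */
        return 0;
    }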
I don't know of any C compiler that used a single pass, though perhaps
LCC comes close. It is just conceptually simpler to factor the analysis
into a few well-defined stages. Consider Richard O'Keefe's discussion of
the analysis of C9x (17 Oct. 1998, in comp.compilers):
"""
Frankly, I _wouldn't_ describe C9x/C++/Java identifiers using a
regular expression. Remember, there are EIGHT "translation phases" in
C9x:
1. Map source multibyte characters to the source character set.
   This includes converting end of record to newline, and it
   SPECIFICALLY INCLUDES CONVERTING non-basic characters to
   UCNs. So you are allowed to have an <e-acute> character
   in your source code, and it may even be represented in the
   source file by a single 16#E9# byte, but subsequent phases
   of translation will 'see' \u00E9 or possibly even
   \u0065\u0301 <e,floating acute> instead.
   The main consequence of this for your regular expression is
   that if you want to recognise identifiers in SOURCE files,
   you need to handle the full range of local multibyte codes
   AS WELL AS universal character names. If your regular
   expression processor is 8-bit-clean, you might be able to
   get away with
       letter = [a-zA-Z_] | [\0x80-\0xFF]+ | \u[0-9a-fA-Z]{4} | ...
       ident  = letter (letter | digit)*
1a. After this, trigraphs are replaced. (Yes, that means C9x
    really has nine phases, not 8.)
2. \<newline> is spliced out.
3. the input is tokenized as a sequence of pp-tokens and white space
4. preprocessing is done, directives, macros, &c.
   THIS PHASE MAY GENERATE NEW IDENTIFIERS, so foo(x)(y) may
   actually _be_ an identifier even though it doesn't _look_
   like one. (No, you can't generate new UCNs here.)
5. Characters are now converted from the source character set
   to the execution character set.
6. Strings are pasted (narrow strings with narrow strings, wide
   strings with wide strings). The effect of "x" L"y" and
   L"x" "y" is not defined, which is a pity, because that was
   a very nasty problem that they should have fixed.
7. Now pp-tokens are converted to tokens, and of course some
   pp-tokens that look like identifiers are actually keywords.
   White space including comments is finally discarded.
7a. The program is parsed. (Yes, that means there are really
    ten phases, not 8.)
8. External references are resolved and everything is put into an
   "image" suitable for execution in the target environment.
What this means is that if you want a tool to do something useful with
identifiers in C source files, you would have to be very very silly
not to do it by taking a freely available preprocessor (such as the
GNU one) and bolting your tool on the end.
"""