I suggest people interested in this topic ftp the Plan 9 paper set
from research.att.com. Ken Thompson's paper on his new compiler
is quite impressive and is something sobering to think about in this age
of all-the-rage compilers which require large virtual address spaces to
produce code which isn't remotely as good as that produced by Ken's compiler,
in over twice the time required by Ken's compiler.
The entire compiler and linker system is about 19,000 lines of code for
the MIPS architecture and about 16,000 for the 68K.
Once again, we get a too-badly-needed reminder that quantity is utterly
unrelated to quality.
-Mike
>I suggest people interested in this topic ftp the Plan 9 paper set
>from research.att.com. Ken Thompson's paper on his new compiler
>is quite impressive and is something sobering to think about in this age
>of all-the-rage compilers which require large virtual address spaces to
>produce code which isn't remotely as good as that produced by Ken's compiler,
>in over twice the time required by Ken's compiler.
>Once again, we get a too-badly-needed reminder that quantity is utterly
>unrelated to quality.
I've read this paper and I've found it very interesting. For those who
haven't, here's my recollection of where the Big Wins come from:
Everything is done in one pass. That is, cpp, the parser and the code
generator are all bundled into one process. The compiler emits an object file
which is then loaded by a special linker. The big win here is that the compiler
does not keep writing and reading files from /tmp. In contrast, here's what
happens on my sun when I compile a hello-world program:
; cat>a.c
main(){printf("hello,world\n");}
; cc -O -v a.c
/lib/cpp -undef -Dunix -Dsun -Dsparc a.c >/tmp/cpp.29354.0.i
/lib/ccom - -XO2 /tmp/ccom.29354.2.ir </tmp/cpp.29354.0.i >/tmp/ccom.29354.1.s
rm /tmp/cpp.29354.0.i
/usr/lib/iropt -O2 -o /tmp/iropt.29354.3.ir /tmp/ccom.29354.2.ir
rm /tmp/ccom.29354.2.ir
/usr/lib/cg /tmp/iropt.29354.3.ir >/tmp/cg.29354.4.s
rm /tmp/iropt.29354.3.ir
/bin/as -o a.o -O2 /tmp/ccom.29354.1.s /tmp/cg.29354.4.s
rm /tmp/ccom.29354.1.s
/bin/ld -dc -dp -e start -X -o a.out /usr/lib/crt0.o a.o -lc
rm /tmp/cg.29354.4.s
rm a.o
;
This is awful! There are (count em!) 6 stages in the compilation process,
with 5 of those stages talking to /tmp. No wonder a compiler like this is
i/o bound..
Ken's compiler does not, I think, attempt what I term "heroic optimizations";
e.g., I bet it does not fold common subexpressions, and so on. Someone correct
me if I'm wrong, but I thought it was Ken's view that the programmer is largely
responsible for optimizations that can be performed by source-level
transformations. Therefore, Ken's compiler probably spends much less time in
the optimization phase than other compilers do. Also, I think it would be
possible to construct example C programs which, say, the IBM RIOS compiler
generates great code for, but for which Ken's compiler bombs.
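To make that source-level point concrete, here is a made-up fragment
(nothing from the paper, and nothing to do with Ken's compiler); the second
version is what a programmer produces by folding the common subexpression
and hoisting the loop invariant by hand, leaving nothing heroic for the
compiler to do:

/* as first written: dist() is called twice per iteration and
   scale * scale never changes inside the loop */
double before(double (*dist)(int, int), int n, double scale)
{
    double sum = 0.0;
    int i;
    for (i = 0; i < n; i++)
        sum += dist(i, i + 1) * dist(i, i + 1) + scale * scale;
    return sum;
}

/* after the hand transformations */
double after(double (*dist)(int, int), int n, double scale)
{
    double s2 = scale * scale;      /* loop invariant, hoisted by hand */
    double sum = 0.0;
    int i;
    for (i = 0; i < n; i++) {
        double d = dist(i, i + 1);  /* common subexpression, folded by hand */
        sum += d * d + s2;
    }
    return sum;
}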
I also recommend ftp'ing this paper. I don't think Ken's work is the last
word in compilers, merely a testament to sound programming principles, and
to the fact that current Unix vendors' compilers are a crock.
--
Byron Rakitzis
by...@archone.tamu.edu
bj
In article <1991Aug10.1...@walter.bellcore.com>
m...@gizmo.bellcore.com () writes:
mo> I suggest people interested in this topic ftp the Plan 9 paper set
mo> from research.att.com. Ken Thompson's paper on his new compiler
mo> is quite impressive [ ... ]
Now, now, let's not get carried away. He has done a clone of Turbo
Pascal, or of the small/fast Pascal compiler from ETH. It is good
implementation architecture, yes, but it shines only in comparison with
the awful things we usually get and accept.
mo> Once again, we get a too-badly-needed reminder that quantity is
mo> utterly unrelated to quality.
But quantity is related to sales to the incompetent; paraphrasing a well
known proverb "nobody has ever gone bankrupt by underestimating the
competence of America's MIS managers" :-).
In a collection of Dijkstra's papers he tells of a conference on
compiler writing, where each compiler writer that got to speak before
him proudly announced the number of source lines for their fortran,
cobol, pl/1 compilers, in the region of many dozens of thousands of
lines, to an increasingly excited and cheering audience. He eventually
got to talk about his own work, and gave a very embarrassed and coldly
received talk, because he had to admit that his Algol 60 compiler was
only 2,500 lines long :-).
byron> I've read this paper and I've found it very interesting.
If you found this very interesting, your eyes will pop when you read
about TWIG, or about the ETH small Pascal compiler, or about many other
little known feats.
byron> For those who havent, here's my recollection of where the Big
byron> Wins come from:
byron> Everything is done in one pass. That is, cpp, the parser and the
byron> code generator are all bundled into one process. The compiler
byron> emits an object file which is then loaded by a special linker.
Turbo Pascal precisely...
byron> The big win here is that the compiler does not keep writing and
byron> reading files from /tmp.
No, not quite, that should be covered by a decent caching
implementation. The big wins are that the linker is special, and that
the compiler is well written; there were and maybe there still are
excellent reasons to put temporary files in /tmp, and if you don't want
to do it, the Sun and the GNU C compilers allow you to use pipes between
the compiler passes, with the -pipe option, which can be a big win on
multiprocessors (but not on monoprocessors).
byron> In contrast, here's what happens on my sun when I compile a
byron> hello-world program:
byron> ; cat>a.c
byron> main(){printf("hello,world\n");}
A Real C Programmer! :-)
byron> ; cc -O -v a.c
byron> /lib/cpp -undef -Dunix -Dsun -Dsparc a.c >/tmp/cpp.29354.0.i
byron> /lib/ccom - -XO2 /tmp/ccom.29354.2.ir </tmp/cpp.29354.0.i >/tmp/ccom.29354.1.s
byron> rm /tmp/cpp.29354.0.i
byron> /usr/lib/iropt -O2 -o /tmp/iropt.29354.3.ir /tmp/ccom.29354.2.ir
byron> rm /tmp/ccom.29354.2.ir
byron> /usr/lib/cg /tmp/iropt.29354.3.ir >/tmp/cg.29354.4.s
byron> rm /tmp/iropt.29354.3.ir
byron> /bin/as -o a.o -O2 /tmp/ccom.29354.1.s /tmp/cg.29354.4.s
byron> rm /tmp/ccom.29354.1.s
byron> /bin/ld -dc -dp -e start -X -o a.out /usr/lib/crt0.o a.o -lc
byron> rm /tmp/cg.29354.4.s
byron> rm a.o
byron> ;
byron> This is awful! There are (count em!) 6 stages in the compilation
byron> process, with 5 of those stages talking to /tmp. No wonder a
byron> compiler like this is i/o bound..
Well, well. This compiler structure is simply the traditional one, and
it worked very well on the PDP-11, which had limited address and data
space. Using separate processes and /tmp does not really add that much
overhead; if you use a nice size buffer cache, say 2MB, your compiles
can be entirely memory resident.
There is an excellent chance that if the cache is large enough a
compiler intermediate file is unlinked before it is written out, thus
eliminating IO transactions entirely. I have this happen often; in an
edit/compile/link cycle I often don't get a single IO transaction (more
recent Unixes that use a hardened file system unfortunately do write out
inodes immediately). This can also be easily observed under DOS with a
suitable cache management driver.
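For concreteness, here is a toy sketch (not from any real compiler driver)
of the pass-to-pass /tmp protocol in question: one "pass" writes an
intermediate file, the next reads it back, and the file is unlinked
straight away. With a large buffer cache and delayed writes, the data
blocks of such a short-lived file may well be freed before they are ever
flushed to disk:

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char name[64], line[128];
    FILE *out, *in;

    sprintf(name, "/tmp/ctm.%d", (int) getpid());  /* per-process temp name */

    out = fopen(name, "w");                        /* "pass 1" writes it */
    if (out == NULL) { perror(name); return 1; }
    fputs("intermediate representation goes here\n", out);
    fclose(out);

    in = fopen(name, "r");                         /* "pass 2" reads it back */
    if (in == NULL) { perror(name); return 1; }
    while (fgets(line, sizeof line, in) != NULL)
        fputs(line, stdout);
    fclose(in);

    unlink(name);   /* with luck, gone before the next sync ever writes it */
    return 0;
}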
Sun have also implemented a mostly-RAM-resident filesystem precisely to
optimize /tmp performance, a kind of flexible RAM disk. Notice, however,
that this is a seriously less preferable alternative to having a large
cache, as a large cache will also cache source files and include files
and library files wherever they are, not just /tmp files.
Then you are left with the overheads of copying things to/from the
buffer cache, but that is usually a minor problem, especially with a
decent pipe implementation. Context switching is possibly a bigger
problem. And even the copy from/to the cache could be obviated if less
inefficient IPC technology than pipes were used.
Moreover if you look at the times and IO counts of the sequence above,
you will notice that the real drag of the traditional Unix compiler
environment is the linker, not the compiler. Linking is a terrible
thing, and indeed most bullet compilers, such as Turbo Pascal, or
WATFIV, or Thompson's, gain most of their speed from a special linker,
rather than from the compiler.
byron> Ken's compiler does not, I think, attempt what I term "heroic
byron> optimizations" e.g., I bet it does not fold common
byron> subexpressions, and so on.
Triple cheers!
Many "optimizing" compilers have impossibly expensive optimization
phases; for example the SPARC and MIPS compilers have a global
optimization algorithm with a time complexity of O(n^3), and the GNU CC
compiler has a space complexity of O(n^2). These are well past any
reasonable cost/benefit ratio... Clearly anything well done compares
well against these monsters.
So, don't exxxxxagerate the importance of Thompson's new C compiler, or
of Plan 9 itself, or even Unix; none of these things has exactly been
ground breaking work, they are simply well engineered developments of
well known technology. It is amazing that merely good implementation
architecture is newsworthy, but yes, such is the sorry state of the art.
If you want to read something really interesting on the "small is
beautiful" technology front, do a literature search for TWIG, the
Princeton code generator tool, and its successors. I have even seen a
very interesting TWIG implementation in Standard ML; now that's
something amusing.
Incidentally, I don't know what the legal status of TWIG is, but surely
I would love it if the GNU CC code generator were replaced with TWIG, or
its successors, just for fun. I could even do it myself, in my copious
spare time :-).
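To give a flavour of the idea (this is a toy of my own, not TWIG or its
notation): tree-matching code generators label each expression-tree node
bottom-up with the cheapest instruction pattern that covers it, then emit
code by walking the chosen patterns; TWIG and its successors derive the
matcher and the costs from a machine description instead of the hand-coded
cases below:

#include <stdio.h>
#include <string.h>

enum op { CONST, REG, ADD, MUL };

struct node {
    enum op op;
    int value;                      /* for CONST */
    struct node *left, *right;
    int cost;                       /* cheapest cover found so far */
    const char *insn;               /* instruction chosen for this node */
};

static int label(struct node *n)
{
    switch (n->op) {
    case CONST: n->cost = 1; n->insn = "li";  break;  /* load immediate */
    case REG:   n->cost = 0; n->insn = "";    break;  /* already in a register */
    case ADD:
        n->cost = label(n->left) + label(n->right) + 1;
        n->insn = "add";
        /* a cheaper pattern: add-immediate covers ADD(x, CONST) directly */
        if (n->right->op == CONST && n->left->cost + 1 < n->cost) {
            n->cost = n->left->cost + 1;
            n->insn = "addi";
        }
        break;
    case MUL:
        n->cost = label(n->left) + label(n->right) + 2;
        n->insn = "mul";
        break;
    }
    return n->cost;
}

static void emit(struct node *n)
{
    if (n->op == REG)
        return;
    if (strcmp(n->insn, "addi") == 0) {      /* the const child is absorbed */
        emit(n->left);
        printf("\taddi\t#%d\n", n->right->value);
        return;
    }
    if (n->left)  emit(n->left);
    if (n->right) emit(n->right);
    if (n->op == CONST) printf("\tli\t#%d\n", n->value);
    else                printf("\t%s\n", n->insn);
}

int main(void)
{
    struct node c = { CONST, 4 }, r = { REG };
    struct node sum = { ADD, 0, &r, &c };
    label(&sum);
    emit(&sum);        /* chooses the addi pattern: one instruction, cost 1 */
    return 0;
}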
A final remark: isn't it amazing that Unix was supposed to be the system
where the name of the game was composing little modules with a flexible
low overhead glue like pipes or cached files (really the same thing in
the original implementation), and now we have one of the designers doing
precisely the opposite for performance reasons? This is about as ironic
as having a stdio library...
--
Piercarlo Grandi | ARPA: pcg%uk.ac...@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: p...@aber.ac.uk
In article <1991Aug10.1...@walter.bellcore.com>
m...@gizmo.bellcore.com () writes:
mo> I suggest people interested in this topic ftp the Plan 9 paper set
mo> from research.att.com. Ken Thompson's paper on his new compiler
mo> is quite impressive [ ... ]
pcg> On 10 Aug 91 16:16:13 GMT, by...@archone.tamu.edu (Byron Rakitzis) said:
byron> The big win here is that the compiler does not keep writing and
byron> reading files from /tmp.
pcg> No, not quite, that should be covered by a decent caching
pcg> implementation. The big wins are that the linker is special, and
pcg> that the compiler is well written; there were and maybe there still
pcg> are excellent reasons to put temporary files in /tmp, and if you
pcg> don't want to do it, the Sun and the GNU C compilers allow you to
pcg> use pipes between the compiler passes, with the -pipe option, which
pcg> can be a big win on multiprocessors (but not on monoprocessors).
byron> In contrast, here's what happens on my sun when I compile a
byron> hello-world program: [ ... ]
byron> This is awful! There are (count em!) 6 stages in the compilation
byron> process, with 5 of those stages talking to /tmp. No wonder a
byron> compiler like this is i/o bound..
pcg> Well, well. This compiler structure is simply the traditional one,
pcg> and it worked very well on the PDP-11, which had limited address
pcg> and data space. Using separate processes and /tmp does not really
pcg> add that much overhead; if you use a nice size buffer cache, say
pcg> 2MB, your compiles can be entirely memory resident.
pcg> [ ... caching can almost completely eliminate IO, thus making
pcg> a compiler into a purely CPU bound engine ... ]
I have spent some time doing some test compiles on the local, quite
unloaded machines (it's Sunday afternoon, and I am the only logged in
user).
I have tried the MIPS cc and the GNU cc on a DEC 5830, and the Sun cc on
a Sun3/260. I have compiled with these UMB Scheme, a 13,000-line C
package split into half a dozen sources, three of them around 2,000
lines long and a few others smaller. I have chosen this source because
it is very portable and entirely self contained, and of a middling size,
and it has both simple and complex passages.
I have tried each compiler with various optimization levels; no
optimization, peephole optimization, as much global optimization as
possible. I have enabled various reporting options.
The numbers appended are those for the various phases of compiling one
specific source (architecture.c, 647 lines), the time taken by linking,
and the time taken by the entire compile and link process.
I will comment briefly on the figures: compilation in a memory rich
environment is almost all CPU bound, as caching is very effective; even
the sources do not need to be read in, as they will have been brought
into the cache by editing, or simply by running something like 'wc' on
them, as happened to me.
The amount of system CPU time can however be amazing, as it can be
substantially larger than the user time, e.g. under DEC Ultrix.
Evidently DEC have not cared a lot about optimizing disk cache accesses.
Tsk Tsk... Well, DEC Ultrix is designed to sell more CPUs too, after
all; hasn't DEC been importing scores of IBM managers in the past years?
:-) :-)
It is also apparent that Multics style memory mapped files can be a big
win in such a situation (big savings on copying and on caching
overheads). SUN cc piping is incomplete (the assembler cannot be piped
into) and does not make much of a difference. Link time for such a
program that only needs a small number of libraries is small compared to
the compile time needed for the entire set of sources, but not small
compared to that for a single source, and thus likely to be very
perceivable during development.
I think that on a loaded machine the biggest difference is clever
working set management, and simpler code generation, ala TWIG.
Bah, here are the figures:
M I P S C O M P I L E R
% time cc -v *.c -lm -o scheme
/usr/lib/cmplrs/cc/cpp -v architecture.c -DLANGUAGE_C -DMIPSEL -I/usr/include2.0 > /tmp/ctmpa20363
0.2u 0.3s 0:00 95% 8+35k 0+15io 0pf+0w
/usr/lib/cmplrs/cc/ccom -Xv -EL -Xg0 -O1 -XS/tmp/ctmsta20363 < /tmp/ctmpa20363 > /tmp/ctmfa20363
1.1u 0.5s 0:02 83% 36+137k 0+11io 4pf+0w
/usr/lib/cmplrs/cc/ugen -v -G 8 -EL -g0 -O1 /tmp/ctmfa20363 -o /tmp/ctmca20363 -t /tmp/ctmsta20363 -temp /tmp/ctmgta20363
0.1u 0.8s 0:01 66% 39+41k 0+10io 5pf+0w
/usr/lib/cmplrs/cc/as1 -v -G 8 -p0 -EL -g0 -O1 /tmp/ctmca20363 -o architecture.o -t /tmp/ctmsta20363
0.4u 1.0s 0:01 73% 44+58k 0+6io 6pf+0w
/usr/lib/cmplrs/cc/ld -o scheme -G 8 -g0 -nocount /usr/lib/cmplrs/cc/crt0.o -count architecture.o bignum.o compiler.o complex.o debug.o eval.o fixnum.o io.o number.o object.o primitive.o rational.o real.o steering.o -lm -nocount -lc
0.4u 1.8s 0:02 89% 25+123k 0+53io 4pf+0w
51.1u 78.8s 2:52 75% 156+380k 0+904io 216pf+0w
% time cc -v -O1 *.c -lm -o scheme
/usr/lib/cmplrs/cc/cpp -v architecture.c -DLANGUAGE_C -DMIPSEL -I/usr/include2.0 > /tmp/ctmpa20507
0.3u 0.2s 0:00 92% 8+36k 0+17io 0pf+0w
/usr/lib/cmplrs/cc/ccom -Xv -EL -Xg0 -O1 -XS/tmp/ctmsta20507 < /tmp/ctmpa20507 > /tmp/ctmfa20507
1.0u 0.4s 0:01 88% 36+139k 0+11io 4pf+0w
/usr/lib/cmplrs/cc/ugen -v -G 8 -EL -g0 -O1 /tmp/ctmfa20507 -o /tmp/ctmca20507 -t /tmp/ctmsta20507 -temp /tmp/ctmgta20507
0.2u 0.7s 0:01 76% 38+39k 0+10io 5pf+0w
/usr/lib/cmplrs/cc/as1 -v -G 8 -p0 -EL -g0 -O1 /tmp/ctmca20507 -o architecture.o -t /tmp/ctmsta20507
0.4u 1.1s 0:02 64% 44+60k 0+5io 6pf+0w
/usr/lib/cmplrs/cc/ld -o scheme -G 8 -g0 -nocount /usr/lib/cmplrs/cc/crt0.o -count architecture.o bignum.o compiler.o complex.o debug.o eval.o fixnum.o io.o number.o object.o primitive.o rational.o real.o steering.o -lm -nocount -lc
0.4u 1.8s 0:02 86% 25+121k 0+53io 4pf+0w
52.0u 79.5s 3:10 68% 156+378k 0+921io 216pf+0w
% time cc -O2 -v *.c -lm -o scheme
/usr/lib/cmplrs/cc/cpp -v architecture.c -DLANGUAGE_C -DMIPSEL -I/usr/include2.0 > /tmp/ctmpa20637
0.3u 0.3s 0:00 81% 8+33k 0+6io 0pf+0w
/usr/lib/cmplrs/cc/ccom -Xv -EL -Xg0 -O2 -XS/tmp/ctmsta20637 < /tmp/ctmpa20637 > /tmp/ctmfa20637
1.1u 0.4s 0:01 86% 36+138k 0+9io 4pf+0w
/usr/lib/cmplrs/cc/uopt -v -G 8 -EL -g0 -O2 /tmp/ctmfa20637 /tmp/ctmoa20637 -t /tmp/ctmsta20637 /tmp/ctmosa20637
1.4u 1.3s 0:05 55% 80+74k 0+9io 42pf+0w
/usr/lib/cmplrs/cc/ugen -v -G 8 -EL -g0 -O2 /tmp/ctmoa20637 -o /tmp/ctmca20637 -t /tmp/ctmsta20637 -temp /tmp/ctmgta20637
0.2u 0.7s 0:01 58% 38+40k 0+12io 5pf+0w
/usr/lib/cmplrs/cc/as1 -v -G 8 -p0 -EL -g0 -O2 /tmp/ctmca20637 -o architecture.o -t /tmp/ctmsta20637
0.4u 1.1s 0:02 65% 44+56k 0+7io 6pf+0w
/usr/lib/cmplrs/cc/ld -o scheme -G 8 -g0 -nocount /usr/lib/cmplrs/cc/crt0.o -count architecture.o bignum.o compiler.o complex.o debug.o eval.o fixnum.o io.o number.o object.o primitive.o rational.o real.o steering.o -lm -nocount -lc
0.4u 1.9s 0:02 87% 25+122k 0+51io 4pf+0w
108.6u 111.3s 4:49 76% 228+538k 0+1128io 310pf+0w
% time cc -O3 -v *.c -lm -o scheme
/usr/lib/cmplrs/cc/cpp -v architecture.c -DLANGUAGE_C -DMIPSEL -I/usr/include2.0 > /tmp/ctmpa20903
0.3u 0.3s 0:00 95% 8+34k 0+16io 0pf+0w
/usr/lib/cmplrs/cc/ccom -Xv -EL -Xg0 -O3 -XS/tmp/ctmsta20903 < /tmp/ctmpa20903 > /tmp/ctmfa20903
1.1u 0.5s 0:01 90% 36+138k 0+11io 4pf+0w
/usr/lib/cmplrs/cc/ujoin -v -o architecture.u /tmp/ctmfa20903 /tmp/ctmsta20903
0.0u 0.2s 0:00 43% 11+13k 0+11io 8pf+0w
/usr/lib/cmplrs/cc/uld /usr/lib/cmplrs/cc/crt0.o architecture.u bignum.u compiler.u complex.u debug.u eval.u fixnum.u io.u number.u object.u primitive.u rational.u real.u steering.u -lm -lc -ko /tmp/ctmlua20903
2.9u 2.8s 0:05 97% 29+172k 1+113io 5pf+0w
/usr/lib/cmplrs/cc/usplit -v -o /tmp/ctmsa20903 -t /tmp/ctmsta20903 /tmp/ctmlua20903
0.0u 2.9s 0:03 92% 10+9k 0+107io 0pf+0w
/usr/lib/cmplrs/cc/umerge -v -noinline -g0 -O3 /tmp/ctmsa20903 -o /tmp/ctmma20903 -t /tmp/ctmsta20903
0.5u 1.6s 0:02 77% 20+273k 0+86io 14pf+0w
/usr/lib/cmplrs/cc/uopt -v -G 8 -EL -g0 -O3 /tmp/ctmma20903 /tmp/ctmoa20903 -t /tmp/ctmsta20903 /tmp/ctmosa20903
62.7u 30.8s 1:48 86% 89+1267k 0+122io 4pf+0w
/usr/lib/cmplrs/cc/ugen -v -G 8 -EL -g0 -O3 /tmp/ctmoa20903 -o /tmp/ctmca20903 -t /tmp/ctmsta20903 -temp /tmp/ctmgta20903
7.5u 19.2s 0:38 69% 45+443k 0+166io 5pf+0w
/usr/lib/cmplrs/cc/as1 -v -G 8 -p0 -EL -g0 -O3 /tmp/ctmca20903 -o /tmp/ctmba20903 -t /tmp/ctmsta20903
12.0u 33.8s 0:59 77% 49+456k 0+56io 6pf+0w
/usr/lib/cmplrs/cc/ld -o scheme -G 8 -g0 -nocount /usr/lib/cmplrs/cc/crt0.o -count /tmp/ctmba20903 -lm -nocount -lc
0.2u 1.8s 0:02 82% 26+143k 1+63io 4pf+0w
117.6u 115.4s 4:44 81% 235+2729k 5+1353io 130pf+0w
G N U C O M P I L E R
% time gcc -v *.c -lm -o scheme
/aber/pcg/LibDec./gcc-cpp -v -undef -D__GNUC__ -D__ANSI_COMPAT -D__LANGUAGE_C -D__MIPSEL -D__R3000 -D__SYSTYPE_BSD -D__bsd4_2 -D__host_mips -D__mips -D__ultrix -D__unix -DLANGUAGE_C -DMIPSEL -DR3000 -DSYSTYPE_BSD -Dbsd4_2 -Dhost_mips -Dmips -Dultrix -Dunix -D____ANSI_COMPAT__ -D____LANGUAGE_C__ -D____MIPSEL__ -D____R3000__ -D____SYSTYPE_BSD__ -D____bsd4_2__ -D____host_mips__ -D____mips__ -D____ultrix__ -D____unix__ -D__LANGUAGE_C__ -D__MIPSEL__ -D__R3000__ -D__SYSTYPE_BSD__ -D__bsd4_2__ -D__host_mips__ -D
__mips__ -D__ultrix__ -D__unix__ architecture.c /usr/tmp/cca22082.cpp
/aber/pcg/LibDec./gcc-cc1 /usr/tmp/cca22082.cpp -quiet -dumpbase architecture.c -version -o /usr/tmp/cca22082.s
/usr/lib/cmplrs/cc/as0 -v -G 8 -EL -g0 -O1 /usr/tmp/cca22082.s -o /tmp/ctmaa22088 -t /tmp/ctmsta22088
ld -o scheme -G 8 -EL /lib/crt0.o architecture.o bignum.o compiler.o complex.o debug.o eval.o fixnum.o io.o number.o object.o primitive.o rational.o real.o steering.o -lm /aber/pcg/LibDec./gcc-gnulib -lc /aber/pcg/LibDec./gcc-gnulib
100.7u 92.0s 3:57 81% 233+559k 0+863io 489pf+0w
% time gcc -v -O1 *.c -lm -o scheme
/aber/pcg/LibDec./gcc-cpp -v -undef -D__GNUC__ -D__ANSI_COMPAT -D__LANGUAGE_C -D__MIPSEL -D__R3000 -D__SYSTYPE_BSD -D__bsd4_2 -D__host_mips -D__mips -D__ultrix -D__unix -DLANGUAGE_C -DMIPSEL -DR3000 -DSYSTYPE_BSD -Dbsd4_2 -Dhost_mips -Dmips -Dultrix -Dunix -D____ANSI_COMPAT__ -D____LANGUAGE_C__ -D____MIPSEL__ -D____R3000__ -D____SYSTYPE_BSD__ -D____bsd4_2__ -D____host_mips__ -D____mips__ -D____ultrix__ -D____unix__ -D__LANGUAGE_C__ -D__MIPSEL__ -D__R3000__ -D__SYSTYPE_BSD__ -D__bsd4_2__ -D__host_mips__ -D
__mips__ -D__ultrix__ -D__unix__ -D__OPTIMIZE__ architecture.c /usr/tmp/cca22229.cpp
/aber/pcg/LibDec./gcc-cc1 /usr/tmp/cca22229.cpp -O -mgpOPT -quiet -dumpbase architecture.c -version -o /usr/tmp/cca22229.s
/usr/lib/cmplrs/cc/as0 -v -G 8 -EL -g0 -O2 /usr/tmp/cca22229.s -o /tmp/ctmaa22235 -t /tmp/ctmsta22235
0.4u 0.6s 0:01 64% 27+40k 0+15io 7pf+0w
/usr/lib/cmplrs/cc/as1 -v -G 8 -p0 -EL -g0 -O2 /tmp/ctmaa22235 -o architecture.o -t /tmp/ctmsta22235
0.4u 1.0s 0:01 80% 44+53k 0+6io 6pf+0w
ld -o scheme -G 8 -EL /lib/crt0.o architecture.o bignum.o compiler.o complex.o debug.o eval.o fixnum.o io.o number.o object.o primitive.o rational.o real.o steering.o -lm /aber/pcg/LibDec./gcc-gnulib -lc /aber/pcg/LibDec./gcc-gnulib
118.9u 84.2s 4:17 78% 297+642k 0+935io 494pf+0w
% time gcc -v -O3 *.c -lm -o scheme
/aber/pcg/LibDec./gcc-cpp -v -undef -D__GNUC__ -D__ANSI_COMPAT -D__LANGUAGE_C -D__MIPSEL -D__R3000 -D__SYSTYPE_BSD -D__bsd4_2 -D__host_mips -D__mips -D__ultrix -D__unix -DLANGUAGE_C -DMIPSEL -DR3000 -DSYSTYPE_BSD -Dbsd4_2 -Dhost_mips -Dmips -Dultrix -Dunix -D____ANSI_COMPAT__ -D____LANGUAGE_C__ -D____MIPSEL__ -D____R3000__ -D____SYSTYPE_BSD__ -D____bsd4_2__ -D____host_mips__ -D____mips__ -D____ultrix__ -D____unix__ -D__LANGUAGE_C__ -D__MIPSEL__ -D__R3000__ -D__SYSTYPE_BSD__ -D__bsd4_2__ -D__host_mips__ -D
__mips__ -D__ultrix__ -D__unix__ -D__OPTIMIZE__ architecture.c /usr/tmp/cca22490.cpp
/aber/pcg/LibDec./gcc-cc1 /usr/tmp/cca22490.cpp -O -fstrength-reduce -fomit-frame-pointer -finline-functions -mgpOPT -quiet -dumpbase architecture.c -version -o /usr/tmp/cca22490.s
/usr/lib/cmplrs/cc/as0 -v -G 8 -EL -g0 -O3 /usr/tmp/cca22490.s -o /tmp/ctmaa22494 -t /tmp/ctmsta22494
0.3u 0.6s 0:01 75% 28+40k 0+14io 7pf+0w
/usr/lib/cmplrs/cc/as1 -v -G 8 -p0 -EL -g0 -O3 /tmp/ctmaa22494 -o architecture.o -t /tmp/ctmsta22494
0.4u 1.1s 0:01 83% 44+54k 0+6io 6pf+0w
ld -o scheme -G 8 -EL /lib/crt0.o architecture.o bignum.o compiler.o complex.o debug.o eval.o fixnum.o io.o number.o object.o primitive.o rational.o real.o steering.o -lm /aber/pcg/LibDec./gcc-gnulib -lc /aber/pcg/LibDec./gcc-gnulib
134.0u 84.9s 4:23 83% 323+790k 0+960io 494pf+0w
S U N C O M P I L E R
athene% time cc -I. -v -time *.c -lm -o scheme
architecture.c:
/lib/cpp -I. -undef -Dunix -Dsun -Dmc68000 -Dmc68020 architecture.c >/tmp/cpp.10036.0.i
cpp: time U:1.6s+S:0.3s=1.9s REAL:6.6s 29%. core T:0k D:4k. io IN:0b OUT:4b. pf IN:0p OUT:78p.
/lib/ccom - -fsoft -mc68020 </tmp/cpp.10036.0.i >/tmp/ccom.10036.1.s
ccom: time U:5.8s+S:0.4s=6.2s REAL:11.0s 56%. core T:0k D:24k. io IN:4b OUT:4b. pf IN:7p OUT:102p.
rm /tmp/cpp.10036.0.i
/bin/as -o architecture.o -mc68020 /tmp/ccom.10036.1.s
as: time U:2.0s+S:0.4s=2.5s REAL:3.1s 79%. core T:0k D:6k. io IN:4b OUT:7b. pf IN:23p OUT:80p.
rm /tmp/ccom.10036.1.s
/bin/ld -dc -dp -e start -X -o scheme /usr/lib/crt0.o /usr/lib/Fcrt1.o -L/usr/lib/fsoft architecture.o bignum.o compiler.o complex.o debug.o eval.o fixnum.o io.o number.o object.o primitive.o rational.o real.o steering.o -lm -lc
ld: time U:7.0s+S:1.3s=8.3s REAL:21.2s 39%. core T:0k D:27k. io IN:66b OUT:37b. pf IN:74p OUT:128p.
319.6 real 228.0 user 16.0 sys
athene% time cc -I. -v -time -O1 *.c -lm -o scheme
architecture.c:
/lib/cpp -I. -undef -Dunix -Dsun -Dmc68000 -Dmc68020 architecture.c >/tmp/cpp.10169.0.i
cpp: time U:1.5s+S:0.2s=1.8s REAL:2.8s 62%. core T:0k D:3k. io IN:1b OUT:4b. pf IN:1p OUT:67p.
/lib/ccom - -fsoft -mc68020 </tmp/cpp.10169.0.i >/tmp/ccom.10169.1.s
ccom: time U:5.6s+S:0.3s=5.9s REAL:6.7s 87%. core T:0k D:23k. io IN:1b OUT:3b. pf IN:0p OUT:108p.
rm /tmp/cpp.10169.0.i
/lib/c2 -20 </tmp/ccom.10169.1.s >/tmp/c2.10169.2.s
c2: time U:3.4s+S:0.3s=3.7s REAL:4.5s 83%. core T:0k D:12k. io IN:3b OUT:3b. pf IN:25p OUT:80p.
rm /tmp/ccom.10169.1.s
/bin/as -o architecture.o -mc68020 /tmp/c2.10169.2.s
as: time U:2.2s+S:0.2s=2.4s REAL:2.9s 81%. core T:0k D:6k. io IN:2b OUT:8b. pf IN:0p OUT:80p.
rm /tmp/c2.10169.2.s
/bin/ld -dc -dp -e start -X -o scheme /usr/lib/crt0.o /usr/lib/Fcrt1.o -L/usr/lib/fsoft architecture.o bignum.o compiler.o complex.o debug.o eval.o fixnum.o io.o number.o object.o primitive.o rational.o real.o steering.o -lm -lc
ld: time U:6.9s+S:1.1s=8.0s REAL:10.3s 78%. core T:0k D:26k. io IN:58b OUT:37b. pf IN:66p OUT:110p.
468.5 real 324.1 user 23.9 sys
athene% time cc -I. -v -time -O4 *.c -lm -o scheme
architecture.c:
/lib/cpp -I. -undef -Dunix -Dsun -Dmc68000 -Dmc68020 architecture.c >/tmp/cpp.10331.0.i
cpp: time U:1.4s+S:0.4s=1.8s REAL:2.1s 85%. core T:0k D:4k. io IN:2b OUT:4b. pf IN:2p OUT:66p.
/lib/ccom - -XO4 /tmp/ccom.10331.2.ir -fsoft -mc68020 </tmp/cpp.10331.0.i >/tmp/ccom.10331.1.s
ccom: time U:5.8s+S:0.4s=6.2s REAL:6.8s 90%. core T:0k D:27k. io IN:0b OUT:23b. pf IN:0p OUT:140p.
rm /tmp/cpp.10331.0.i
/usr/lib/iropt -O4 -c -o /tmp/iropt.10331.3.ir /tmp/ccom.10331.2.ir
iropt: time U:15.8s+S:1.2s=17.1s REAL:19.3s 88%. core T:0k D:93k. io IN:3b OUT:40b. pf IN:19p OUT:195p.
rm /tmp/ccom.10331.2.ir
/usr/lib/cg -fsoft -mc68020 /tmp/iropt.10331.3.ir >/tmp/cg.10331.4.s
cg: time U:2.7s+S:0.3s=3.0s REAL:3.6s 84%. core T:0k D:7k. io IN:3b OUT:4b. pf IN:19p OUT:95p.
rm /tmp/iropt.10331.3.ir
/lib/c2 -20 </tmp/cg.10331.4.s >/tmp/c2.10331.5.s
c2: time U:3.7s+S:0.3s=4.0s REAL:5.7s 70%. core T:0k D:13k. io IN:0b OUT:2b. pf IN:0p OUT:81p.
rm /tmp/cg.10331.4.s
/bin/as -o architecture.o -O -mc68020 /tmp/ccom.10331.1.s /tmp/c2.10331.5.s
as: time U:2.0s+S:0.4s=2.5s REAL:7.8s 32%. core T:0k D:6k. io IN:2b OUT:6b. pf IN:0p OUT:77p.
rm /tmp/ccom.10331.1.s
/bin/ld -dc -dp -e start -X -o scheme /usr/lib/crt0.o /usr/lib/Fcrt1.o -L/usr/lib/fsoft architecture.o bignum.o compiler.o complex.o debug.o eval.o fixnum.o io.o number.o object.o primitive.o rational.o real.o steering.o -lm -lc
ld: time U:6.1s+S:0.9s=7.1s REAL:8.1s 87%. core T:0k D:23k. io IN:40b OUT:35b. pf IN:50p OUT:108p.
1269.5 real 880.4 user 64.8 sys
athene% time cc -I. -v -time -pipe *.c -lm -o scheme
architecture.c:
/lib/cpp -I. -undef -Dunix -Dsun -Dmc68000 -Dmc68020 architecture.c | /lib/ccom - -fsoft -mc68020 >/tmp/ccom.10772.1.s
cpp: time U:1.4s+S:0.2s=1.7s REAL:5.1s 33%. core T:0k D:3k. io IN:2b OUT:0b. pf IN:2p OUT:65p.
ccom: time U:5.5s+S:0.1s=5.7s REAL:7.5s 76%. core T:0k D:23k. io IN:0b OUT:3b. pf IN:0p OUT:102p.
/bin/as -o architecture.o -mc68020 /tmp/ccom.10772.1.s
as: time U:2.1s+S:0.1s=2.2s REAL:2.5s 90%. core T:0k D:5k. io IN:0b OUT:7b. pf IN:0p OUT:78p.
rm /tmp/ccom.10772.1.s
/bin/ld -dc -dp -e start -X -o scheme /usr/lib/crt0.o /usr/lib/Fcrt1.o -L/usr/lib/fsoft architecture.o bignum.o compiler.o complex.o debug.o eval.o fixnum.o io.o number.o object.o primitive.o rational.o real.o steering.o -lm -lc
ld: time U:7.3s+S:1.0s=8.3s REAL:9.8s 85%. core T:0k D:28k. io IN:18b OUT:39b. pf IN:16p OUT:115p.
258.9 real 228.4 user 14.6 sys
athene% time cc -I. -v -time -pipe -O1 *.c -lm -o scheme
architecture.c:
/lib/cpp -I. -undef -Dunix -Dsun -Dmc68000 -Dmc68020 architecture.c | /lib/ccom - -fsoft -mc68020 | /lib/c2 -20 >/tmp/c2.10828.2.s
cpp: time U:1.6s+S:0.2s=1.8s REAL:6.7s 27%. core T:0k D:4k. io IN:0b OUT:0b. pf IN:0p OUT:66p.
ccom: time U:5.7s+S:0.3s=6.0s REAL:12.1s 50%. core T:0k D:24k. io IN:0b OUT:0b. pf IN:0p OUT:101p.
c2: time U:3.5s+S:0.2s=3.8s REAL:12.3s 31%. core T:0k D:12k. io IN:1b OUT:5b. pf IN:1p OUT:80p.
/bin/as -o architecture.o -mc68020 /tmp/c2.10828.2.s
as: time U:1.9s+S:0.2s=2.2s REAL:2.5s 88%. core T:0k D:5k. io IN:1b OUT:6b. pf IN:1p OUT:78p.
rm /tmp/c2.10828.2.s
/bin/ld -dc -dp -e start -X -o scheme /usr/lib/crt0.o /usr/lib/Fcrt1.o -L/usr/lib/fsoft architecture.o bignum.o compiler.o complex.o debug.o eval.o fixnum.o io.o number.o object.o primitive.o rational.o real.o steering.o -lm -lc
ld: time U:6.9s+S:1.1s=8.1s REAL:10.8s 75%. core T:0k D:27k. io IN:44b OUT:40b. pf IN:52p OUT:117p.
462.2 real 324.4 user 22.6 sys
athene% time cc -I. -v -time -pipe -O4 *.c -lm -o scheme
architecture.c:
/lib/cpp -I. -undef -Dunix -Dsun -Dmc68000 -Dmc68020 architecture.c | /lib/ccom - -XO4 /tmp/ccom.10990.2.ir -fsoft -mc68020 >/tmp/ccom.10990.1.s
cpp: time U:1.7s+S:0.3s=2.0s REAL:7.7s 26%. core T:0k D:4k. io IN:2b OUT:0b. pf IN:2p OUT:62p.
ccom: time U:5.7s+S:0.5s=6.3s REAL:10.5s 59%. core T:0k D:26k. io IN:1b OUT:24b. pf IN:0p OUT:135p.
/usr/lib/iropt -O4 -c -o /tmp/iropt.10990.3.ir /tmp/ccom.10990.2.ir
iropt: time U:15.8s+S:1.6s=17.5s REAL:24.2s 72%. core T:0k D:95k. io IN:6b OUT:39b. pf IN:21p OUT:196p.
rm /tmp/ccom.10990.2.ir
/usr/lib/cg -fsoft -mc68020 /tmp/iropt.10990.3.ir | /lib/c2 -20 >/tmp/c2.10990.5.s
cg: time U:2.8s+S:0.3s=3.1s REAL:6.6s 47%. core T:0k D:7k. io IN:5b OUT:2b. pf IN:22p OUT:95p.
c2: time U:3.7s+S:0.3s=4.0s REAL:8.1s 50%. core T:0k D:13k. io IN:0b OUT:2b. pf IN:5p OUT:77p.
rm /tmp/iropt.10990.3.ir
/bin/as -o architecture.o -O -mc68020 /tmp/ccom.10990.1.s /tmp/c2.10990.5.s
as: time U:1.9s+S:0.2s=2.2s REAL:2.6s 84%. core T:0k D:5k. io IN:2b OUT:6b. pf IN:3p OUT:75p.
rm /tmp/ccom.10990.1.s
/bin/ld -dc -dp -e start -X -o scheme /usr/lib/crt0.o /usr/lib/Fcrt1.o -L/usr/lib/fsoft architecture.o bignum.o compiler.o complex.o debug.o eval.o fixnum.o io.o number.o object.o primitive.o rational.o real.o steering.o -lm -lc
ld: time U:6.2s+S:1.0s=7.2s REAL:8.5s 84%. core T:0k D:23k. io IN:61b OUT:35b. pf IN:69p OUT:120p.
1442.2 real 898.0 user 70.3 sys
I just love "optimizing" compilers.
Why not just drop the horror-quotes, and _optimizing_ for that matter,
and say _compilers_. If you want to talk about compilers without
optimization, say _dumb_ perhaps, or _naive_, or _experimental_.
The complexities you mention aren't really unusual.
Just doing global data flow analysis (no optimization, just collecting
information) is at least O(n) bit vector steps, where "n" is
the number of basic blocks. Since the length of the bit vectors
and the number of basic blocks are probably proportional to the procedure
length, we're already up to O(n^2) space and time.
Of course, bit vectors are pretty dense and working over basic blocks
is pretty quick, so most compilers are able to accomplish data flow
analysis very quickly, notwithstanding the time complexity.
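For readers who haven't seen it done, a minimal sketch of that kind of
bit-vector analysis (live variables here, with one machine word per block
as the "vector", so at most 32 variables; the use/def sets and the flow
graph below are invented, where a real compiler would derive them from the
intermediate code):

#include <stdio.h>

#define NBLOCKS 4

typedef unsigned long bits;                 /* one bit per variable */

bits use[NBLOCKS], def[NBLOCKS];            /* per-block gen/kill sets */
bits live_in[NBLOCKS], live_out[NBLOCKS];
int  succ[NBLOCKS][2] = { {1,-1}, {2,3}, {1,-1}, {-1,-1} }; /* toy flow graph */

int main(void)
{
    int b, s, changed;

    use[0] = 0x1; def[0] = 0x2;             /* made-up use/def facts;   */
    use[1] = 0x2; def[1] = 0x4;             /* variable i is bit i      */
    use[2] = 0x6; def[2] = 0x1;
    use[3] = 0x4; def[3] = 0x0;

    do {                                    /* iterate to a fixed point */
        changed = 0;
        for (b = NBLOCKS - 1; b >= 0; b--) {
            bits out = 0, in;
            for (s = 0; s < 2; s++)         /* out[b] = union of in[succ] */
                if (succ[b][s] >= 0)
                    out |= live_in[succ[b][s]];
            in = use[b] | (out & ~def[b]);  /* whole-block transfer function */
            if (in != live_in[b] || out != live_out[b]) {
                live_in[b] = in;
                live_out[b] = out;
                changed = 1;
            }
        }
    } while (changed);

    for (b = 0; b < NBLOCKS; b++)
        printf("block %d: in=%#lx out=%#lx\n", b, live_in[b], live_out[b]);
    return 0;
}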
It's perhaps harder to discuss the cost/benefit ratio.
I will note that I spend much more time writing and using
programs than compiling them.
>If you want to read something really interesting on the "small is
>beautiful" technology front, do a literature search for TWIG, the
>Princeton code generator tool, and its successors.
^^^^^^^^^ Bell Labs
Look in
Code Generation Using Tree Matching and Dynamic Programming
Aho, Ganapathi, and Tjiang
TOPLAS, October 1989
Preston Briggs
pcg> Many "optimizing" compilers have impossibly expensive optimization
pcg> phases; for example the SPARC and MIPS compilers have a global
pcg> optimization algorithm with a time complexity of O(n^3), and the GNU CC
pcg> compiler has a space complexity of O(n^2). These are well past any
pcg> reasonable cost/benefit ratio... Clearly anything well done compares
pcg> well against these monsters.
So what? The standard assumption is that programs are run many more
times than they are compiled. Fast compile times are great for
development, but I'd like my kernel, shell and editor to be as optimized
as possible. Of course, whether you can trust these optimizers is
another question.
--
Jeff Sparkes jspa...@bnr.ca Bell-Northern Research, Ottawa (613)765-2503
A hole is a solution. A drill is a tool. - Guy Kawasaki, "The Macintosh Way"
Cheers, Henrik
Would a text version of this and the other Plan 9 papers be possible? The
Postscript versions are pretty, but awfully large and a bit of a pain to
print. The large bitmapped image in 158d, as well as the smaller diagram
in 158a and 158b, would be unreproducible, but I think the rest of the papers
stand on their own merits.
I agree... it's an impressive paper. Or at least it describes an impressive
compiler.
--
Peter da Silva; Ferranti International Controls Corporation; +1 713 274 5180;
Sugar Land, TX 77487-5012; `-_-' "Have you hugged your wolf, today?"
I think this does a major disservice to Thompson's work. I'm not that
familiar with the ETH compiler, but Turbo Pascal is a horrible example. Not
only was it for a single architecture, but it let that architecture show
through to the user level. It didn't even implement a full Pascal... it was
a subset (no standard Pascal I/O, for example) with extensions. The Thompson
compiler is extremely hardware-independent and generates good 68020, MIPS,
and Crisp code. It also implements full ANSI standard C, not a subset.
Oh yes, not only was the compiler 8086 dependent, it was also restricted
to small model on the 8086. Programs larger than 64K had to use overlays.
> none of these things has exactly been
> ground breaking work, they are simply well engineered developments of
> well known technology.
Isn't that ground-breaking itself? Particularly in contrast to things like
OS/2... let alone the gross disasters that MS-DOS and the Mac system software
have grown into?
| I also recommend ftp'ing this paper. I don't think Ken's work is the last
| word in compilers, merely a testament to sound programming principles, and
| to the fact that current Unix vendors' compilers are a crock.
An overstatement, I think. Optimizing compilers are not going to
produce as much improvement on well written code as they will on student
first attempts, because the common subexpressions have been done in
source, the invariant stuff is out of loops, the loops are unrolled, the
register type has been used here and there... all by a one pass code
generator called a programmer.
But there are a lot more bad programs than good, and I don't think
that current compilers are unjustified. They serve the needs of the
average programmer, and the employer of that programmer, who buys the
compiler based on performance on the buyer's code.
--
bill davidsen (davi...@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
GE Corp R&D Center, Information Systems Operation, tech support group
Moderator comp.binaries.ibm.pc and 386-users digest.
| Now, now, let's not get carried away. He has done a clone of Turbo
| Pascal, or of the small/fast Pascal compiler from ETH. It is good
| implementation architecture, yes, but it shines only in comparison with
| the awful things we usually get and accept.
Single pass compilers are hardly that recent, and small compilers are
not new. The original B compiler (two pass because of memory limits) was
19 pages of code for one pass and either 22 or 29 for the other (that's
going back a way).
Note that true one pass compilers cannot generate the best possible code
for some machines, notably those which have a short branch instruction,
since forward jumps must be coded as long jumps. I believe the Plan 9
compiler delays code generation for a bit, so peephole optimization can
be done. That permits a little code motion, better register use, etc.
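A made-up illustration of the forward-branch point (the "instruction set"
here is invented): a strictly one-pass emitter sees the branch before it
sees the label, so it has to reserve the long, worst-case encoding and
backpatch the offset later; shrinking it to the short form afterwards
would amount to another pass over the code:

#include <stdio.h>

static unsigned char code[256];
static int pc;                      /* next byte to emit */
static int fixup;                   /* where the pending branch offset goes */

static void emit(int byte) { code[pc++] = (unsigned char) byte; }

static void branch_forward(void)    /* target not yet seen: long form */
{
    emit(0xE9);                     /* invented long-branch opcode, 2-byte offset */
    fixup = pc;
    emit(0); emit(0);               /* placeholder, patched when the label appears */
}

static void define_label(void)      /* target finally reached: backpatch */
{
    int off = pc - (fixup + 2);
    code[fixup]     = (unsigned char) (off & 0xFF);
    code[fixup + 1] = (unsigned char) ((off >> 8) & 0xFF);
}

int main(void)
{
    branch_forward();               /* "goto L" with L still ahead */
    emit(0x90); emit(0x90);         /* the couple of instructions skipped over */
    define_label();                 /* "L:" */
    printf("offset patched to %d; a short branch would have fit,\n"
           "but the one-pass emitter had already committed to the long form\n",
           code[fixup]);
    return 0;
}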
>Isn't that ground-breaking itself? Particularly in contrast to things like
>OS/2... let alone the gross disasters that MS-DOS and the Mac system software
>have grown into?
And UNIX, and ....
And now for the $65,536 question:
"Name a sucessful, widely-used OS that hasn't grown into
a gross disaster. You have 10 seconds. Tick. Tick."
--
+--------------------------------------------------------------------------+
| Robert J. Woodhead, Biar Games / AnimEigo, Incs. tre...@foretune.co.jp |
| ``If you want to stab someone in the back, Bernard, you must first get |
| behind them!'' -- Sir Humphrey Appleby on the mechanics of politics. |
> Optimizing compilers are not going to
>produce as much improvement on well written code as they will on student
>first attempts, because the common subexpressions have been done in
>source, the invariant stuff is out of loops, the loops are unrolled, the
>register type has been used here and there... all by a one pass code
>generator called a programmer.
I'm going to poke a little at this...
IMHO, traditional optimizations (say, up to vectorization) are mostly
useful for cleaning up addressing expressions -- things that arise
when compiling
A(i, j) = A(i-1, j) * B(i, j-1)
It's the sort of thing C programmers can get at using pointer
arithmetic, but most other languages must leave alone.
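For instance (array names and sizes invented), the straightforward
subscripted form below is roughly what the statement above compiles from,
and the pointer-walking version is roughly what strength reduction and
common subexpression elimination turn it into, and what a C programmer
could have written directly:

#define N 100

void update(double a[N][N], double b[N][N], int n)
{
    int i, j;

    /* subscripted form: every access recomputes row-times-N-plus-column */
    for (i = 1; i < n; i++)
        for (j = 1; j < n; j++)
            a[i][j] = a[i-1][j] * b[i][j-1];
}

void update_reduced(double a[N][N], double b[N][N], int n)
{
    int i, j;

    /* after strength reduction: the index arithmetic becomes pointers
       that the inner loop just increments */
    for (i = 1; i < n; i++) {
        double *ap  = &a[i][1];     /* a[i][j]   */
        double *apu = &a[i-1][1];   /* a[i-1][j] */
        double *bp  = &b[i][0];     /* b[i][j-1] */
        for (j = 1; j < n; j++)
            *ap++ = *apu++ * *bp++;
    }
}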
Field accesses are a wonderful source of constant additions to be folded,
pointer accesses offer lots of opportunity for common subexpression
elimination, and array accesses are always great for strength reduction.
If we happen to catch some extra left in by the programmer...
well, that's fine too.
If you do a good job on your basic optimization, and use some care in
instruction selection and register allocation, the compiler makes the sort
of code you'd hope for -- clean and straightforward, nothing fancy.
I remember C. A. R. Hoare writing persuasively for a language
in which "a simple straightforward 'non-pessimising' compiler will
produce straightforward object programs of acceptable efficiency."
To an extent, C and its early compilers accomplish this, but the
amount of detail required of the programmer seems extreme.
Relying on "a one pass code generator called a programmer" is inadequate.
Programmers aren't 1-pass; they go over the code repeatedly, during
the initial creation and, more importantly, during maintenance.
Therefore, I'd advocate writing clean, straightforward code that
correctly expresses the algorithm you've carefully selected
and let the compiler worry about the low level tedium that
they were designed to handle.
Then, if the resulting code is inadequate (or worse, incorrect),
push hard on the vendor to improve his compiler.
Preston Briggs
> An overstatement, I think. Optimizing compilers are not going to
>produce as much improvement on well written code as they will on student
>first attempts, because the common subexpressions have been done in
>source, the invariant stuff is out of loops, the loops are unrolled, the
>register type has been used here and there... all by a one pass code
>generator called a programmer.
the only program i ever generated in one pass did little more than
print out `Hello world!', and even then i cheated by looking
up the hard parts :-).
about doing optimizations at the source level ... yes, but.
not all common subexpressions are available at the source level, and many
of those that are tend to turn a readable program into a morass of pointer
arithmetic and mysterious extra variables. excessive loop unrolling
is maybe attractive for people who get paid by the line, hardly
for the maintenance programmer. using the `register' storage class
specifier (in languages where such an anachronism is available)
to achieve speedups is hardly worthwhile with modern compilers.
they tend to ignore it.
the gist is, a reasonable amount of optimization seems worthwhile even
for `well written code'.
--
Alexander Vrchoticky | al...@vmars.tuwien.ac.at
TU Vienna, CS/Real-Time Systems | +43/222/58801-8168
"it's never enough to see how others fall in love" (the blue aeroplanes)
Note that the code generator proper is hand-written for each of these
architectures. To quote the paper: "There is a considerable amount of
talk in literature about automating this part of the compiler with a
hardware description. Since this code generator is so small (less than
500 lines of C) and easy, it hardly seems worth the effort." I guess
this is less than a gcc hardware description anyway.
It also implements full ANSI standard C, not a subset.
Not correct. To quote the paper again: "The compiler implements ANSI C
with some restrictions and extensions." "Several of the poorer
features were left out."
This is just nit-picking. I also really liked the paper.
J
| I think this does a major disservice to Thompson's work. I'm not that
| familiar with the ETH compiler, but Turbo Pascal is a horrible example. Not
| only was it for a single architecture, but it let that architecture show
| through to the user level.
The 8080 and 8086 are from the same vendor, but hardly the same
architecture. Unless you mean they both use a stack, in which case I
guess SPARC would be the same, too.
Let's keep the Intel bashing out of the discussion, please?
> And UNIX, and ....
I wouldn't describe UNIX, even VR4, as a gross disaster. Gross, yes, but
people aren't trying to shoehorn it into stuff it's obviously unsuited for
in any sort of mass market.
I might consider some of the less-well-considered forays into secure or real
time systems disasters, though.
> "Name a sucessful, widely-used OS that hasn't grown into
> a gross disaster. You have 10 seconds. Tick. Tick."
AmigaDOS. It's not as successful or widely used as MacOS or MS-DOS, but
3 million units shipped isn't bad. It's more than any "successful" UNIX
workstation I know of.
I wasn't aware of any Intel bashing. I'd have made the same comments about
any other CPU-specific compiler.
I'd forgotten the 8080 version of the compiler, but now that you mention it
the 8086 compiler does share a lot of the limitations of the 8080 one: small
memory model, same overlay structure, and so on.
> The 8080 and 8086 are from the same vendor, but hardly the same
> architecture.
If you stick to small model on the 8086, they're pretty close.
I had a look on research.att.com, and couldn't find anything that was
obviously a Plan 9 paper. Could someone specifically cite a file that I
should FTP to get this new(er) plan 9 paper that discusses the compiler
in question? I only have the old plan 9 paper that made the rounds last
year.
Thanks,
Crispin
-----
Crispin Cowan, CS grad student, University of Western Ontario
Phyz-mail: Middlesex College, MC28-C, N6A 5B7
E-mail: cri...@csd.uwo.ca Voice: 519-661-3342
Temporarily at: w1c...@watson.ibm.com
"If you want an operating system that is full of vitality and has a
great future, use OS/2." --Andy Tanenbaum
when i was using the occam transformation system (an ML program
that applied semantics preserving transformations to occam-1
programs) i briefly experimented with an idea for using it as
an occam preprocessor. in this way a program could be written
in a clear and straightforward manner along with transformation
directives - e.g. loops to unwind, functions to inline. in this
way the programmer both used his/her knowledge to guide the
optimization as well as ensuring that the "optimized" code compiled by the
simple non-pessimising compiler (an occam-1 compiler can be
done in a page or two of prolog if you are that way inclined - i'm not)
is semantically equivalent to the written code.
there are obvious problems with this approach in developing all the
transformations required etc ... especially if the language doesn't
have a "nice" semantics like occam ... but perhaps it may show
a potential approach. the advantage is that it separates the issues
of algorithm correctness (the "simple code") from optimization (the
transformations) and compilation (the simple compiler algorithm)
rather than having the optimization mixed in at both ends.
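the same idea sketched in C rather than occam (the directive syntax is
invented for illustration): the programmer writes and maintains the clear
form and marks it, and a separate source-to-source tool produces the
unrolled form that the simple non-pessimising compiler actually sees:

/* what the programmer writes and maintains */
/* TRANSFORM: unroll 2 */
void scale(double *v, int n, double k)
{
    int i;
    for (i = 0; i < n; i++)
        v[i] *= k;
}

/* what the transformation tool hands to the compiler (n assumed even
   here; a real tool would also emit the leftover iteration) */
void scale_unrolled(double *v, int n, double k)
{
    int i;
    for (i = 0; i < n; i += 2) {
        v[i]     *= k;
        v[i + 1] *= k;
    }
}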
--
--------------------------------------------------------------------------
david shepherd: d...@inmos.co.uk or d...@inmos.com tel: 0454-616616 x 379
inmos ltd, 1000 aztec west, almondsbury, bristol, bs12 4sq
"pugh, pugh, barney mcgrew, cuthbert, dibble, grubb !"
preston> davi...@crdos1.crd.ge.com (bill davidsen) writes:
davidsen> Optimizing compilers are not going to produce as much
davidsen> improvement on well written code as they will on student first
davidsen> attempts, because the common subexpressions have been done in
davidsen> source, the invariant stuff is out of loops, the loops are
davidsen> unrolled, the register type has been used here and there...
davidsen> all by a one pass code generator called a programmer.
This is one of my favourite arguments, and on its architectural aspect
I agree with you (and Hoare, as reported by Preston Briggs later on)
that for conventional languages the architectural boundary between the
effort spent by the programmer and that spent by the compiler should be
such that the compiler need not do extensive program
rewriting/reorganization...
preston> I'm going to poke a little at this... IMHO, traditional
preston> optimizations (say, up to vectorization) are mostly useful for
preston> cleaning up addressing expressions -- things that arise when
preston> compiling
preston> A(i, j) = A(i-1, j) * B(i, j-1)
preston> It's the sort of thing C programmers can get at using pointer
preston> arithmetic, but most other languages must leave alone.
But that's the problem with those languages -- they have the unfortunate
property (an architectural tragedy!) of not being low level enough that
the programmer can specify what sort of code he wants (as, say, in C and
other true-to-form MOHLLs/systems implementation languages), and not high
level enough that the compiler can generate the sort of code it wants
(as say in SQL and other true to form application oriented languages).
preston> I remember C. A. R. Hoare writing persuasively for a language
preston> in which "a simple straightforward 'non-pessimising' compiler
preston> will produce straightforward object programs of acceptable
preston> efficiency."
And I myself, on these screens, in a less important way, have been
echoing Hoare's opinions (which are Thompson's as well, I understand).
preston> To an extent, C and its early compilers accomplish this,
Yes, it being a systems implementation language.
preston> but the amount of detail required of the programmer seems
preston> extreme.
That's a question of taste, but you only need to take great care when it
matters, which is usually rarely. Most of almost every application does
not need any automatic/manual optimization.
preston> Therefore, I'd advocate writing clean, straightforward code
preston> that correctly expresses the algorithm you've carefully
preston> selected
Yes, precisely, but I see that very few people are using APL/S instead
of Fortran or SQL/Equel instead of Cobol.
preston> and let the compiler worry about the low level tedium that they
preston> were designed to handle.
If the language defines a suitably abstract interface between programmer
and compiler, that's wonderful. If the interface is at the wrong level
of abstraction, or incomplete, this becomes program rewriting by the
compiler, and it is dangerous and expensive.
Unfortunately a lot of people with the bucks to purchase compilers have
Cobol/Fortran/... dusty decks.
I believe the classic term for non-optimising compilers is "Q&D" (quick
and dirty), at least according to the famous "Dragon Book".
+---------------------------------------------------------------------+
| Zik | "I don't want the world - |
| Michael Saleeba | I just want your half" |
| z...@bruce.cs.monash.oz.au | - TMBG |
+---------------------------------------------------------------------+
I claim no great knowledge of compilers, other than slogging
through the dragon book, but I've stepped through a lot of code, with
optimization on, and the assembly never showed this type of optimization.
I also will point out that not all code is written where it is compiled. I've
bought numerous source code libraries for X.25, or other packages, that I
would never alter
("Sorry, no support for you - you altered our source! HAHAHA..")
so optimization is very important for many of my uses.
- Michael Goldman
>I am surprised that there has been some discussion of how much optimization
>should take place. IMHO, the more the better. In fact, the one optimization
>I haven't seen is that of figuring out that a particular value from a
>series of pointers (e.g. a->b->..->c[N]) is used several times and should
>be registered. I certainly can't do it in a declaration, so I end up
>just storing - temp = a->...c[N]. The overhead in retrieving a->b->..->c[N]
>is horrendous, and well worth a stupid "temp".
And you should do it, not the compiler. In many cases,
such chains contain pointers that are extremely volatile
and can be changed by the OS and/or other processes. You
know when this is a possibility, but the compiler cannot
be expected to be so omniscient.
For example, many Macintosh memory manager calls return a
handle (a pointer to a pointer to the memory allocated);
the handle itself doesn't change, but the underlying pointer
does. One must be very careful in the use of "dereferenced"
handles to "lock" the objects in question down so they cannot
be modified. Very often the overhead of the lock/unlock is
more than a little additional pointer following.
Perhaps the ideal solution would be a compiler comment semantic
that tells the compiler that such finagling as you desire is
safe or unsafe. The default should be that it is unsafe, imho.
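A small sketch of the manual caching under discussion (the structures are
invented; on the Mac the "lock it down first" step would be the Memory
Manager's HLock/HUnlock around the dereferenced handle): the chain is
walked once into a local, on the programmer's own knowledge that nothing
will move or change the intermediate pointers in between:

#include <stdio.h>

struct c_block { int weight[8]; };
struct b_block { struct c_block *c; };
struct a_block { struct b_block *b; };

int total(struct a_block *a, int n)
{
    /* cache the chain once instead of re-walking a->b->c for each use */
    int *cell = &a->b->c->weight[n];
    int t = 0;

    t += *cell;          /* several uses of the same element ...       */
    t += *cell * 2;      /* ... pay for the pointer chasing only once  */
    return t;
}

int main(void)
{
    struct c_block c = { { 1, 2, 3, 4, 5, 6, 7, 8 } };
    struct b_block b = { &c };
    struct a_block a = { &b };

    printf("%d\n", total(&a, 2));    /* prints 9 */
    return 0;
}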
>Programmers aren't 1-pass; they go over the code repeatedly, during
>the initial creation and, more importantly, during maintenance.
>
>Therefore, I'd advocate writing clean, straightforward code that
>correctly expresses the algorithm you've carefully selected
>and let the compiler worry about the low level tedium that
>they were designed to handle.
I'd like to second this notion. In my view, the source code should be
readable above all else (save, of course, an appropriate algorithm choice).
And yes, that means even if nobody is ever going to see it but you, if you
ever intend to modify the code.
Mostly, things like collapsing common sub-expressions, moving loop-
invariant code, and unrolling loops hurt the readability of the code,
to say nothing of simpler things like constant-expression evaluation,
etc.
I *want* the compiler to be able to do those things for me, so I can write
code in what I consider the cleanest, most easily understandable way. Then
the compiler can make up for the more obvious inefficiencies of strictly
interpreting the source. I resist the notion that my code is not "well-
written" simply because I have *not* handled the usual source-code trans-
formations myself.
If these transformations are something that lots of folks want to leave
*out* of a compiler, then maybe there is a market for a source-code
transformation tool that does these things before feeding the code to
the postulated simpler compiler. Actually, such a thing might even be
educational; examining the transformations made on one's program might
be revealing!
Terry R. Friedrichsen
te...@venus.sunquest.com (Internet)
uunet!sunquest!terry (Usenet)
te...@sds.sdsc.edu (alternate address; I live in Tucson)
Of course, if nearly any store operation is performed between the two
successive pointer reference chains, the compiler is thwarted - it's
nearly impossible to determine if one of the stores has trashed a
pointer along the way.
Fortunately, a piece of hardware can take care of some of this pretty
nicely at run time. It's called a data cache. The second time you
follow the reference chain, the pointers are probably already in the
cache.
The Multiflow TRACE machines didn't have a data cache. For C programs
like this, with valiant efforts of compiler optimization and hardware
thrown at the problem, it could sort of break even by making up
performance in other parts of the program.
trebor> mi...@sono.uucp (Michael Goldman ) writes:
miklg> I am surprised that there has been some discussion of how
miklg> much optimization should take place. IMHO, the more the
miklg> better.
Well, if only optimization were cost-free! One writes any crappy program
that more or less doe swhat is required,a nd the compiler optimizes it
so that it looks as if it were written with care. Hey, some people have
written optimizers that detect common algorithms (e.g. three nested
loops for matrix multiplication) and rewrite them with more efficient
ones!
But architecture is about striking compromises, being aware of
cost/benefit ratios, at the level of the whole system in all its
aspects, not about wishful thinking. And like all good things,
optimization is subject to the law of diminishing returns. Architects
try to determine where the knee of the curve is, marketing people try to
persuade you that more is better. You seem to be a success story for
the latter.
miklg> In fact, the one optimization I haven't seen is that of figuring
miklg> out that a particular value from a series of pointers (e.g.
miklg> a->b->..->c[N]) is used several times and should be registered.
miklg> I certainly can't do it in a declaration, so I end up just
miklg> storing - temp = a->...c[N]. The overhead in retrieving
miklg> a->b->..->c[N] is horrendous, and well worth a stupid "temp".
Here you have the completely wrong attitude. If that value that you call
temp has some significance of its own, it *ought* to be given a special
descriptive name of its own. Some say that this is hand optimization; I
reckon it is descriptive, legible programming. Consider
'next->next->info[i]' vs. 'temp' vs 'optimal_weight'...
trebor> And you should do it, not the compiler. In many cases,
trebor> such chains contain pointers that are extremely volatile
trebor> and can be changed by the OS and/or other processes.
Or even by the program itself, as the given pointer values can be
aliased, e.g. just consider if the linked structure you are using has
loops, where for example 'this->next->next == this'...
trebor> You know when this is a possibility, but the compiler cannot be
trebor> expected to be so omniscient. [ ... Mac example ... ] Perhaps the
trebor> ideal solution would be a compiler comment semantic that tells
trebor> the compiler that such finagling as you desire is safe or
trebor> unsafe. The default should be that it is unsafe, imho.
And here we are again at the architectural, design problem: if the
language defines an interface between programmer and compiler that is
too detailed for the compiler to safely apply program transformations,
but not detailed enough that the programmer can indicate which
transformations are possible/appropriate, the alternatives are:
1) take the loss but whine :-).
2) have mind reading compilers.
3) have compilers make dangerous assumptions by default.
4) use a language that defines a more detailed, safe, interface.
5) use a language that defines a less detailed, safe, interface.
1) is what most people are content with; 2) is advocated by marketing
departments; 3) by aggressive optimization proponents as an approximation
to 2); 4) (and also 5, where appropriate) by you and me.
Unfortunately you and me (and a few others) are a small minority. The
majority could not care less about architecture. They want more, more,
at no cost!
| Relying on "a one pass code generator called a programmer" is inadequate.
Which is exactly the point I made in the following paragraph you
didn't quote. I'm glad we agree. I pointed out that typical programmers
don't write code with even the obvious optimizations.
| Programmers aren't 1-pass; they go over the code repeatedly, during
| the initial creation and, more importantly, during maintenance.
I'll pass on that. I have the feeling that in good modular code the
programmer writes it, looks for typos, and then leaves it alone.
[about a->b->...->c[n] being used more than once]
> And you should do it, not the compiler. In many cases,
> such chains contain pointers that are extremely volatile
> and can be changed by the OS and/or other processes. You
> know when this is a possibility, but the compiler cannot
> be expected to be so omnicient.
A normal CSE in an ANSI C compiler can handle this correctly; if there
The real problem for compilers in doing good optimization is that no
existing language has enough info to allow the compiler to do the best
optimization, and it's not clear that the source code to provide that
information (a) is smaller or easier to read than the hand
optimization, or (b) is as portable and maintainable as the hand
optimization.
The assumptions you have to make to optimize may not be true if
another part of the program changes. This presents a maintenance
problem, in that hand optimized source code is unambiguous and will
work or not, but bad hints to a compiler will not cause problems on
that machine if the compiler ignores the hints.
Consider the case where you hint that certain variables are
non-volatile, or that a procedure has no side effects. A non-optimizing
compiler won't care, and the code will work fine, *even if the
assumptions are wrong*. When the same code is moved to an optimizing
compiler it will break, with little clue to the problem.
Considering the wealth of information needed to safely do maximum
optimization, I have serious doubts that a useful language would be
readable while containing all that info, or that the effort needed to
provide it and keep it correct would be cost effective. Consider the
maze of C programs using _const_ and _volatile_, and casts of address
arguments to indicate lack of side effects... all to allow the compiler
to optimize, and all reducing portability, because they can't be tested
on a compiler which doesn't use them.
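To make the failure mode concrete, here is a minimal (and deliberately
buggy) sketch: the flag really is modified asynchronously, but the
missing volatile tells the compiler otherwise. An unoptimized build
happens to work; an optimizing compiler may hoist the load and spin
forever. The names are invented for illustration.

#include <signal.h>

static int done = 0;            /* should be: volatile sig_atomic_t done */

static void handler(int sig)    /* asynchronously sets the flag */
{
    (void)sig;
    done = 1;
}

void wait_for_signal(void)
{
    signal(SIGINT, handler);
    while (!done)               /* optimizer may load `done' only once */
        ;
}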
To a certain extent there is some "purism" involved. To defend coding
a quick sort instead of a bubble sort as "a better algorithm" and
condemn moving a few expressions out of loops or calculation of common
subexpressions as "worthless hand optimizing" and "making code hard to
read" is inconsistent at best.
I tested code using Duff's device to unroll a loop, only by a factor of
three, and compiled on the SCO C compiler, Xenix and SPARC gcc, SunOS
levels 2,3,4, Vax C for Ultrix (vcc), Digital RISC C and MIPS C, Convex
C and vectorizing C, Encore parallel C, and Cray (UNICOS) C (plain cc).
On every compiler without exception the loop was not unrolled, and a
saving of 30% in CPU time was gained from hand unrolling.
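For readers who have not seen it, the construct looks roughly like this
(a sketch unrolled by four rather than the factor of three used in the
test above; the function name and types are invented, and it assumes
count > 0). The original copies into a single memory-mapped output
location, which is why `to' is not incremented.

void duff_copy(int *to, const int *from, int count)
{
    int n = (count + 3) / 4;

    switch (count % 4) {
    case 0: do { *to = *from++;
    case 3:      *to = *from++;
    case 2:      *to = *from++;
    case 1:      *to = *from++;
            } while (--n > 0);
    }
}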
I certainly don't claim that this shows hand optimization is needed
for all programs. Indeed, the only place I ever felt unrolling was
needed was in a loop called 600 million times, and that was not only
unrolled, but the assembler output was hand polished to make best use
of the FPU. But I don't see that the most common compilers do as good a
job as a good programmer; the programmer can indeed make the algorithm
better while the compiler can do machine dependent stuff, and things
the programmer missed.
Since my original post was actually in support of optimizing compilers
because most programmers today don't even do simple optimization, I
stand by that, and add that common production compilers don't do as well
as I had assumed they did. I also conclude that using profiling to find
hot spots, then working on them is still appropriate for some programs.
> Tiggr
Clarification: Robert said _processes_, not functions. Your point assumes
that the address space of the code being compiled is protected, e.g. by a
memory manager or by guaranteeing non-preemptive single threaded execution.
This may be a convenient programming model, but there are others
(e.g. memory-mapped I/O).
Adam
Perhaps the ideal solution would be a compiler comment semantic
that tells the compiler that such finagling as you desire is
safe or unsafe.
Like noalias?
Jeffrey Siegal
Well, you also have to be careful about stores that the compiler cannot
be sure don't mess up the intermediate pointers.
However, I think that you will find that the better cases of the current
crop of optimizers are really pretty good at these things, although
they indeed cannot be omniscient.
Some of this discussion reminds me of the arguments used N years ago
about whether one dare take the performance hit of moving from
Assembly to C .... Plauger once wrote a great article about his
intermediate language "LIL" (a structured assembly language, basically)
and why its time had come ... and passed. (Basically, Dennis kept making
C generate better code :-)
There is still no substitute for humans thinking about good algorithms.
It's been clear for years and years that having humans worry about low-level
details that practical compilers could handle is a waste of time.
Every year, the level of detail that a compiler could do gets a little
better.
--
-john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: ma...@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash
DDD: 408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: MIPS Computer Systems MS 1/05, 930 E. Arques, Sunnyvale, CA 94088-3650
Common Subexpression Elimination should eliminate this sort of memory
reference chain - except that memory aliasing often makes it difficult
to tell if the value of a->b used in the chain a->b->c... has been
changed.
In the C environment, with parameter pointers all over the place, this
would require interprocedural analysis. Of course, most people don't
even do "global" analysis within a procedure to handle this sort of thing.
Separate compilation means that you have to worry about worst case
aliasing. One thing that I would like to see more of is compilation of
two (or more) versions of a procedure - one assuming maximum aliasing,
one assuming minimum aliasing. Which to use could be determined
statically: the compiler, when it generates a call, would say whether
the call had potentially aliased parameters or not, and the linker
would choose the appropriate version of the procedure. This
information could be propagated down the call tree. Or the information
could be used dynamically (for really big procedures).
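As a rough illustration of the idea (all names invented; a real
implementation would live in the compiler and linker, not in source):
compile one version of a routine that assumes no aliasing and one that
tolerates worst-case aliasing, and let the call site pick.

#include <stddef.h>

/* Version the compiler may unroll, schedule and vectorize freely. */
static void copy_noalias(int *dst, const int *src, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Worst-case version: tolerates overlapping source and destination. */
static void copy_maxalias(int *dst, const int *src, size_t n)
{
    size_t i;
    if (dst <= src || dst >= src + n)
        for (i = 0; i < n; i++)         /* forward copy is safe */
            dst[i] = src[i];
    else
        for (i = n; i > 0; i--)         /* overlap: copy backwards */
            dst[i - 1] = src[i - 1];
}

/* Stand-in for the static choice the linker would make per call site. */
#define COPY(dst, src, n, may_alias) \
    ((may_alias) ? copy_maxalias((dst), (src), (n)) \
                 : copy_noalias((dst), (src), (n)))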
--
Andy Glew, gl...@ichips.intel.com
Intel Corp., M/S JF1-19, 5200 NE Elam Young Pkwy,
Hillsboro, Oregon 97124-6497
--
Send compilers articles to comp...@iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers. Meta-mail to compilers-request.
In the example case of multi-hop pointer chasing in C, there is no way in the
world for any compiler to divine whether it is safe to eliminate such
chains ("volatile" declarations being a cruel hoax at best). And if the
programmer can't help, then it either doesn't matter (ie, if the programmer
doesn't know any reason it's dangerous, it is probably safe), or the guy gets
what he deserves for writing a program he doesn't understand. (I'm thinking
here of places in device drivers where such things MUST be used for reasons
the programmer has no control over but which must be understood in
excruciating detail.)
So, it doesn't break my heart that the random C compiler doesn't do Cosmic
Subexpression Elimination. Frankly, I wouldn't trust it if it thought it
could because I have as much faith in the compiler's dark corners as I do any
other piece of software's dark corners. Good compilers are no substitute for
good programmers, and any attempt to make them so is at best misguided.
HOWEVER, this is NOT to say that good compilers cannot materially assist any
programmer of any skill level. It's just that the best approach is
controllable intervention just in case the compiler's omniscience falters.
(yes, I know that most C compilers have knobs, too.)
-Mike O'Dell
The problem is that many of the things people talk about as "algorithm"
are actually machine-specific optimizations. For example, the desirable
degree of loop unrolling is heavily influenced by things like cache
design. In an ideal world, these are things you *would* want to leave to
the compiler, so that your code would run well on any machine, rather
than well on one and poorly on others. When you stir in issues like
cache associativity and TLB misses, a remarkable number of "algorithm"
issues become "machine dependent stuff".
In real life, of course, we live with limited compilers, and do end up
doing some of the work ourselves. But we should not kid ourselves by
saying that this is natural and desirable, or pretending that we aren't
doing machine-specific optimizations by hand.
--
Any program that calls itself an OS | Henry Spencer @ U of Toronto Zoology
(e.g. "MSDOS") isn't one. -Geoff Collyer| he...@zoo.toronto.edu utzoo!henry
I disagree, for two reasons. First, current optimizers are incompetent.
Last year I posted to comp.lang.misc a thirty-line description of an
optimization which I had done. It was obvious that the technology
necessary for that optimization---e.g., range analysis, simple algebra,
range invariant introduction---was way beyond the reach of compilers we
know how to write.
Several people thought I was trying to show off. What they didn't
realize is that this optimization was no more than the statement that if
you've used some of the bits in an 8-bit byte, then you have between 0
and 7 bits left. In context this is obvious to even novice programmers.
How long do you think it'll be before compilers can do this? (For the
curious: The resulting code is in bit_printbuf() in bitout.c in my
yabbawhap package, comp.sources.unix volume 24. Can your compiler prove
that curbb8 is between 0 and 8 after line 29?)
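For readers without the package handy, here is a generic bit-output
routine in the same spirit. This is not Dan's bit_printbuf() -- the
names and layout are invented -- it merely illustrates the invariant:
the count of buffered bits always stays between 0 and 7, which is
obvious to a human reader and invisible to the compiler.

#include <stdio.h>

static unsigned curbyte = 0;   /* bits accumulated so far */
static int curbits = 0;        /* how many, always in 0..7 */

void putbits(unsigned value, int nbits, FILE *f)
{
    while (nbits > 0) {
        int room = 8 - curbits;                 /* 1..8 free positions */
        int take = nbits < room ? nbits : room;
        unsigned chunk = (value >> (nbits - take)) & ((1u << take) - 1u);

        curbyte = (curbyte << take) | chunk;
        curbits += take;
        nbits   -= take;
        if (curbits == 8) {                     /* byte full: emit it */
            putc((int)curbyte, f);
            curbyte = 0;
            curbits = 0;                        /* invariant restored */
        }
    }
}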
The original whap program ran three to four times slower than compress,
depending on the machine. The MIPS compiler produced a 25% speedup; the
somewhat smarter Astronautics ZS compiler produced a 40% speedup on a
similar architecture. After several dozen hand optimizations like the
bit output optimization above, I had whap running nearly as fast as
compress with no help from the compiler. With -O it runs faster on some
machines (e.g., a Sun 3).
Keep in mind that AP coding inherently does about three times more
dictionary manipulations than LZW coding. I didn't use a radically
different data structure in whap than the hash trie in compress. Where'd
this 2x-3x speedup come from? Hand optimization, at a level way below
``replace insertion sort with heap sort'' but still above what the MIPS
or any other compiler could comprehend.
I think that most CPU-intensive programs can see similar benefits from
hand optimization. Available compiler technology simply isn't competent
to achieve these results. People are!
(Note that I am not arguing for optimizers to be eliminated entirely.
There are some transformations---notably smart instruction scheduling---
which are not only impossible to express in high-level code but which
compilers really can do reasonably well. My final yabbawhap code still
gets a speedup of over 20% under the ZS scheduler. I see no reason to
give that up.)
Henry says that you should leave transformations to the compiler merely
because they produce worse code on some machines. I can sympathize with
this sentiment. The programs in my yabbawhap package, for instance, run
faster on most architectures with -DPTRS, and faster on the Convex with
-UPTRS. But did I ignore pointer optimizations merely because they
should be turned off on some machines? No! I provided the PTRS define
for the user to choose between the code without the transformation and
the code with the transformation. (The total cost in source lines of
supporting both versions was 24 lines. Long live macros.)
If you follow Henry's advice, you're leaving yourself at the mercy of
compilers which, as noted above, are just too stupid to perform most
optimizations which are obvious to any human who takes the time to look.
Chances are your code will work poorly on most machines. If, on the
other hand, you provide a switch to choose between different versions,
your code will work well on all machines.
One disadvantage of the latter strategy is that (feedback compilers
aside) a human has to get into the loop to set the switch at some point.
The yabbawhap README has to take time out to note that on a Convex you
should set -UPTRS. Even this can be solved. In my Q language I have a
``quickpick'' control structure which asserts that any number of
sections of code do the same thing:
quick
pick // new syntax...
a = 3; b = 5; if(x) a = 7; b = 11; fi
pick
if(x) a = 7; b = 11; else a = 3; b = 5; fi
end
The compiler is free to choose the form which results in fewer
instructions, or a lower expected run time. (Currently q2c always
chooses the first form, but it's not a high-quality compiler.) Of
course, this can be combined with preprocessor macros to let a human
override the compiler's decisions.
Obviously this could be added to any language. I think it would
completely answer Henry's objections to ``machine-specific
optimizations'' done by hand. I think it's perfectly natural to specify
the same code in several different ways for several different
architectures, just as I find the following data-rotation code in C
quite natural (y is a local array):
#ifdef VECTOR
for (i = 0; i < k; i++) y[i] = x[i + r - k];
for (i = k; i < r; i++) y[i] = x[i - k];
for (i = 0; i < r; i++) x[i] = y[i];
#else
for (i = 0; i < k; i++) y[i] = x[i + r - k];
for (i = r - 1; i >= k; i--) x[i] = x[i - k];
for (i = 0; i < k; i++) x[i] = y[i];
#endif
I'm not going to write just the #ifdef VECTOR lines and pretend that a
scalar compiler will know what I mean, or vice versa. I think of the
code in both ways, and like Schrodinger's cat I'll gladly defer the
decision between one and the other until the last possible moment.
---Dan
Yes, it was a great article. Lil was one of the last successors of
PL/360, which was a *great* way to write IBM 360 machine code. But
those days are gone, and programmers should be moving on.
>There is still no substitute for humans thinking about good algorithms.
Right. And, if application performance is the issue, then humans
should probably be thinking about how to improve locality, or how to
mask latencies.
--
Don D.C.Lindsay Carnegie Mellon Robotics Institute
| In real life, of course, we live with limited compilers, and do end up
| doing some of the work ourselves. But we should not kid ourselves by
| saying that this is natural and desirable, or pretending that we aren't
| doing machine-specific optimizations by hand.
People can get carried away with this, but in general the simple
optimizations are not dependent on any particular machine. Moving code
out of loops seems to fall in that case, because doing something N times
may take less than N times the resources it takes to do it once, but not
less time than *only* doing it once. And invariant code moved out of a
loop makes it clear that something is done once, making the code easier
to understand.
Putting common subexpressions in *well named* variables doesn't seem
to hurt readability, and most people find something like "LimitValue" to
be easier to read and understand than "(MaxVal + Overflow +
FudgeFactor)" if used in many places, and a lot easier to type, too! The
example posted using dozens of meaningless temp variables was a good
example of hard to maintain programming, but the whole program fragment
looked like an argument for more meaningful names.
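A tiny sketch of the point (MaxVal, Overflow and FudgeFactor are just
the hypothetical names from the paragraph above):

int within_limit(int request, int MaxVal, int Overflow, int FudgeFactor)
{
    int LimitValue = MaxVal + Overflow + FudgeFactor;   /* named CSE */

    if (request > LimitValue)
        return 0;                       /* over the limit */
    if (request == LimitValue)
        return 1;                       /* right at the limit */
    return 2;                           /* comfortably under it */
}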
Getting the most out of the Cray is an art, but the code is very
specific, and I certainly wouldn't argue that it should be in the
source. That's the kind of thing a compiler can and does find, as
opposed to common subexpressions in multiple statements, possibly
involving things the compiler shouldn't assume are unchanging.
I think this is too strong a statement... MIPS compilers (and at least
a few others surely must) can be quite sure in some cases that it is
perfectly safe to eliminate such chains, and do. Of course, there are
many cases in which it would be safe, but the compiler cannot know,
whereas a human might. (And I think this is what Mike meant?)
However there are still many cases where it
is possible for the compiler to know:
a = p->q->r;
b = p->q->t;
suppose a and b are register variables: it is clearly safe to generate code
like:
    lw  r1,p        # p
    lw  r2,0(r1)    # p->q
    lw  r3,4(r2)    # p->q->r
    lw  r4,8(r2)    # p->q->t   (skipping the reload of r2)
However, any good global optimizer does enough tracking of addresses,
and certainly knowing if the address of a variable has potentially
gotten loose, to effectively act like local variables are
declared register.
I have NO idea of the statistical usefulness of such optimizations.
I do KNOW that it works, and that simple changes that moved code from
safe to unsafe caused the extra reload to happen (as in making "a"
an external variable.)
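In C terms, the fragment above looks roughly like this (the struct
layout is invented purely for illustration):

struct leaf   { int r; int t; };
struct branch { struct leaf *q; };

int use_chain(struct branch *p)
{
    int a = p->q->r;    /* loads p, then p->q, then the r field */
    int b = p->q->t;    /* p->q can stay in a register */
    return a + b;
}

Make a an external variable instead of a local, and the compiler can no
longer prove that the store to a leaves p->q untouched, so the second
chain gets a full reload -- exactly the effect described above.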
--
-john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: ma...@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash
DDD: 408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: MIPS Computer Systems MS 1/05, 930 E. Arques, Sunnyvale, CA 94088-3650
--
This brings up something I've wondered about. Why do "Standard Unix" linkers
only include entire object files- ie every function inside the object file-
rather than the "smart linking" which is standard on Macs and PCs where only
the actual functions you use in a particular object file are used? It was my
understanding that NeXT uses a linker which does this smart linking, but that
AIX, SunOS and most other vendors do not.
Maynard Handley
lindsay> In article <70...@spim.mips.COM> ma...@mips.com (John Mashey) writes:
mash> Some of this discussion reminds me of the arguments used N years
mash> ago about whether one dare take the performance hit of moving from
mash> Assembly to C .... Plauger once wrote a great article about his
mash> intermediate language "LIL" (a structured assembly language,
mash> basically) and why its time had come ... and passed. (Basically,
mash> Dennis kept making C generate better code :-)
lindsay> Yes, it was a great article. Lil was one of the last successors
lindsay> of PL/360, which was a *great* way to write IBM 360 machine
lindsay> code. But those days are gone, and programmers should be moving
lindsay> on.
Seconded by me, for what it's worth. But note: C is really PL/360 a few
years later... The writing is on the wall.
mash> There is still no substitute for humans thinking about good algorithms.
lindsay> Right. And, if application performance is the issue, then humans
lindsay> should probably be thinking about how to improve locality, or how to
lindsay> mask latencies.
Locality? Latencies? Oh no, another sucker like me that has not yet
understood that cheap RAM prices mean that we no longer have to worry
about locality, and big fast caches mean that we no longer have to
worry about latencies (and ever faster CPUs mean that we no longer have
to worry about compiler speed, ...).
I hope you don't have family to support; this article of yours will put
you too on the blacklist of all manufacturers out there: "he is one of
those subversive locality/latency bolsheviks that want to destroy the
American Programming Dream". :-)
According to te...@venus.sunquest.com (Terry R. Friedrichsen):
>Mostly, things like collapsing common sub-expressions, moving loop-
>invariant code, and unrolling loops hurt the readability of the code...
I disagree. I consider code to be clearer if I can mentally file an
expression away under a (variable) name for later use.
--
Chip Salzenberg at Teltronics/TCT <ch...@tct.com>, <uunet!pdn!tct!chip>
"He's alive; he's fat; and he's fighting crime. He's Elvis: FBI."
>This brings up something I've wondered about. Why do "Standard Unix" linkers
>only include entire object files- ie every function inside the object file-
>rather than the "smart linking" which is standard on Macs and PCs where only
>the actual functions you use in a particular object file are used?
Mainly because most Unix object module formats don't have any format to tell
you where functions/procedures begin and end. They provided libraries or
archives to allow the level of "optional" inclusion you are referring to;
i.e. if you *want* separately included functions, build separate object
files, and use the archiver to make a library.
-------
Marc Mengel
men...@fnal.fnal.gov
[Actually, DOS and most other object files have the same problem, so you
have to make libraries explicitly. -John]
Because of the Daemon of Eternal Compatibility with the Evils of the Past, in
conjunction with the Daemon of Premature Standardization on the Evils of the
Future. People who write compilers for modern languages like Ada and Oberon
often use custom linkers, but that doesn't help the general problem.
It wasn't standard on PC's for a while, mind you! (In fact, the uSoft
linker for MSC still links in the entire module.)
The reason it generally doesn't exist under *nix is because it's generally
not worth the bother. It's *possible* to do it, but, on most systems, you
don't gain much. (On a PC, you want to save as much space as possible, and
compacting text space has the added advantage that intersegment calls and
jumps can turn into intrasegment calls and jumps, which are quite a bit
faster.)
--
Sean Eric Fagan
s...@kithrup.COM
>Locality? Latencies? Oh no, another sucker like me that has not yet
>understood that cheap RAM prices mean that we no longer have to worry
>about locality, and big fast caches mean that we no longer have to
>worry about latencies (and ever faster CPUs mean that we no longer have
>to worry about compiler speed, ...).
>
>I hope you don't have family to support; this article of yours will put
>you too on the blacklist of all manufacturers out there: "he is one of
>those subversive locality/latency bolsheviks that want to destroy the
>American Programming Dream". :-)
And why the heck not? You know that as soon as the next price/performance level
pops downward, you'll be bemoaning the lack of the 128-bit word, lack of
disk storage ("Why, 1.2 Gb at 7 millisecond access time JUST isn't enough") or
some other windmill.
"Honey, you know I would NEVER wear black silk underwear"
-- > SYS...@CADLAB.ENG.UMD.EDU < --
(-: Face red, angry reaction suppressed, hold my breath and count to ten :-)
You gain a huge amount! I've got an RT on my desk with 2 meg of RAM and 180
meg of disk. I run X-windows, and every time I fire up a window, the system
sits there thrashing for 15 seconds trying to get the thing crammed into
memory. The X library is huge, and most X applications don't need half of it.
I'd guess that most X windows users could get by with half the memory they
currently need if the linkers were smart enough to include only those parts
that are required.
I admit that 2 meg is considered small for an RT, but you can't get RT memory
very easily these days, except from IBM, and they charge an arm and a leg (I
only have two of each and I want to keep them!).
Furthermore, it hardly takes any bother to make a linker that can selectively
link only those routines in the library that are needed! I've written one in
a few weeks! Actually, I've written two of them, one is described in a paper
I published in Software Practice and Experience, Volume 13, Number 8, August
1983, titled Assembly Language as Object Code. The upshot of that paper is
that the source-code inclusion, conditional assembly, and symbolic address
calculation facilities of any decent assembler are exactly sufficient to do
smart linking. There's a paper I cite there, by Fraser and Hanson, where they
show a similar result, starting with a flexible linkage editor and
demonstrating that it could be used as a rudimentary assembly language.
Doug Jones
jo...@cs.uiowa.edu
[I'd be interested to hear whether shared libraries that are page faulted
into the address space alleviate the performance problems of library bloat.
I can imagine that it might or might not, but data would be interesting.
-John]
Sorry, I cannot resist. Those of us old enough to have used lots of
computers before Unix existed remember when all {linkers|loaders|link-editors}
worked this way. The correct terminology is not "Unix linkers" vs.
"Smart linkers" but "Dumb linkers" vs. "Standard linkers." ;->
[Eh? The PDP-6 loader in the late 1960s certainly did it the dumb way. And
wasn't the 1965-vintage OS/360 link editor pretty dumb, too?]
What unix are you using? There *is* a difference between libraries and simply
groups of object (.o) files that are specifically included into an executable.
My unix experience (admittedly 4.2/4.3 with Unicos as my only System V system)
has been that *real* libraries are scanned and only those routines that are
called are included. I just finished using "nm" to verify this on an
executable.
Granted, if you include a .o file with an unused entry point (procedure), it
is kept in the executable.... and omitting it would make debugging more
difficult in certain cases.... ie, a routine *only* called when you type in a
debugger command... such as one to check the validity of some complex data
structure you think might be crapped out... but you only do it from the
debugger prompt.... omitting that would require a dummy call somewhere to get
it included.... then the optimizer might determine that the "dummy call" is
never reached and remove that........ use a library, the -l link option to
specify said library and stop complaining..... surely there are enough
optimization problems (and the like) to minimize the importance of this when
there *is* a workaround....
[The point was that if there is more than one routine in a .o file, even a .o
file in a library, the entire .o is dragged into the executable, even though
the rest of the routines are not useful and may themselves have external
references that cause yet more unneeded stuff to be included. -John]
|> i.e. if you *want* separately included functions, build separate object
|> files, and use the archiver to make a library.
1) This doesn't work with all the nice and big existing libraries;
2) It's no good to create zillions of individual object files out
of zillions of *.c's;
3) If we really care about compilers eliminating dead code, why have
a linker so dumb that it links in megabytes of dead functions?;
I think that ld(1) is close to braindamaged. Just to mention it, the fact that
it doesn't complain about multiple definitions is (ok, ANSI) really dangerous.
Cheers, Henrik
MPCI at LLNL
IBM Research
U. of Munich
PS: I know it's not specifically comp.arch, but I really dislike ld.
PS/2: See you at HotChips !
|> The reason it generally doesn't exist under *nix is because it's generally
|> not worth the bother. It's *possible* to do it, but, on most systems, you
|> don't gain much.
Why do you think they invented shared libraries? Because the duplication of
large object files in memory and on discs is not very clever. So, I guess
having significantly smaller objects is not that undesirable.
Cheers, Henrik
MPCI at LLNL
IBM Research
U. of Munich
PS: See you at HotChips !
[All this ld bashing is getting tedious, so I am declaring this particular
line of discussion closed, unless someone wants to talk about the design of
a linker that he really likes. Personally, I never understood why block
structured linkers that let you export symbols to a group of modules without
exporting them to the whole world never caught on. It would make stuff like
multiple yacc parsers in a single program much easier. -John]
HOWEVER, in my experience neither the appropriate use of the existing tools
NOR a new "smart" linker (which accomplishes that same end via a different
path) will solve the true problem -- library bloat.
Library bloat is almost always caused by poor design or coding practices in
the run time libraries. Usually the situation is like this:
I call some library routine which is simple and therefore small. I discover
much to my horror that my executable has grown in size much more than I
expected. Obviously I got more than I asked for; conclusion: a smarter linker
would not have included the extraneous stuff.
Usually this is an incorrect conclusion; what I have usually seen is that the
"simple" library routine calls some other library routine (frequently to
construct an error message) that calls another library routine, ... Eventually
you find a call to the formatted I/O library which is invariably HUGE. For
example:
routine(...)
{
    ...
    sprintf(bigstring, "%s%s%s", stringa, stringb, stringc);
    ...
}
Here the sprintf routine is being used to concatenate three strings. However,
due to the poor design of the C file/string I/O library the sprintf library
routine must be prepared to handle all possible format strings (even messy
ones like floating point numbers). This "extraneous" code is what causes the
code bloat.
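For comparison, the same concatenation without dragging in the
formatted-I/O machinery (a trivial sketch; the names follow the
fragment above):

#include <string.h>

void concat3(char *bigstring, const char *stringa,
             const char *stringb, const char *stringc)
{
    strcpy(bigstring, stringa);         /* only the string routines   */
    strcat(bigstring, stringb);         /* get pulled in, not         */
    strcat(bigstring, stringc);         /* _doprintf and friends      */
}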
sysmgr> In article <PCG.91Au...@aberda.aber.ac.uk>, p...@aber.ac.uk
sysmgr> (Piercarlo Grandi) writes:
pcg> Locality? Latencies? [ ... anybody who worries about them just ...
pcg> ] is one of those subversive locality/latency bolsheviks that want
pcg> to destroy the American Programming Dream". :-)
sysmgr> And why the heck not? You know that as soon as the next
sysmgr> price/performance level pops downward, you'll be bemoaning the
sysmgr> lack of the 128-bit word, lack of disk storage ("Why, 1.2 Gb at
sysmgr> 7 millisecond access time JUST isn't enough") or some other
sysmgr> windmill.
Well, I am not one of the horsehair-and-fasting ascetic guys that hark
back to the good days of the punched card and the 16KB central memory.
To me, more is better, but not merely in quantity; to me price
performance matters too. If a 16MB central memory is full of badly
designed windowing code, I cringe, but not because I'd rather have a
1MB central memory, but because I'd rather have 16MB full of more
functional better designed code.
Unfortunately programming talent is the scarcest resource, and it pays
to waste more physical ones in order to be able to afford less skilled
programmers. The problem therefore lies not in the overabundance of
resources, but the scarcity of talent to make abundant resources as
productive as they could be.
My memory is now fading, but I believe the following is essentially correct.
On the PQCC project we (mostly Joe Newcomer) invented a "msgscan" procedure,
much like [...]printf in its user interface, that didn't have to suffer from
"library bloat". The main idea was that, instead of the code for %f directly
calling the floating point procedures, it'd indirect through a table of
procedures indexed by format code (all with a standard calling sequence). By
controlling the creation of the table you could control which routines got
loaded. To play it real safe, you could include the whole world of
formatting, but you didn't have to. Empty table entries were initialized to
point to a generic "missing format code" procedure. We mostly did this to be
able to dynamically add user-defined format codes.
It's interesting how many problems can be solved by adding a level of
indirection.
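A minimal sketch of that table-driven scheme in C (all names invented;
Joe's original was surely more capable, and the %d handler here calls
printf only for brevity):

#include <stdarg.h>
#include <stdio.h>

/* One handler per format letter; each consumes its own arguments.
   Passing va_list by pointer keeps the handlers in step. */
typedef void (*fmt_handler)(va_list *ap);

static void missing(va_list *ap)            /* default for empty slots */
{
    (void)ap;
    fputs("<missing format code>", stdout);
}

static void fmt_d(va_list *ap) { printf("%d", va_arg(*ap, int)); }
static void fmt_s(va_list *ap) { fputs(va_arg(*ap, char *), stdout); }

static fmt_handler handlers[256];           /* indexed by format letter */

void msg_register(int code, fmt_handler h)
{
    handlers[(unsigned char)code] = h;
}

void msgscan(const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    while (*fmt) {
        if (*fmt != '%') { putchar(*fmt++); continue; }
        if (*++fmt == '\0')
            break;
        {
            fmt_handler h = handlers[(unsigned char)*fmt++];
            (h ? h : missing)(&ap);
        }
    }
    va_end(ap);
}

int main(void)
{
    msg_register('d', fmt_d);               /* only what you register */
    msg_register('s', fmt_s);               /* gets referenced at all */
    msgscan("value = %d, name = %s\n", 42, "no format-code bloat");
    return 0;
}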
With a smarter linker, I suppose the near-equivalent might be some facility of
mapping references to a particular external name to a different name (the
generic one), though that's less dynamic than Joe's scheme. With all the
interest in module interconnection languages over the last 15 years, I'm
surprised I haven't seen more people do this.
[Some years ago, I saw a Fortran compiler that looked at all of the format
statements in a routine and emitted dummy external references to force in the
runtime code corresponding to the format types actually used. My memory
fades too, but I think that it was the old F4/F40 compiler for the PDP-6 and
PDP-10. By the way, the insides of F4 were extremely similar to IBM's
Fortran G for OS/360. I always wondered who wrote it. -John]
Well, I'm not a linker whiz. But dynamic linking is definitely my area, so
I'll say a few words about it. Warning: I'll make a few simplifying
assumptions.
* The designers of VMS (1970s) considered allocating 1Gb for shared
libraries. The goal was that standard libraries would be placed at
standard locations. However even at that time they decided that they
would rapidly run out of address space. Not all systems would have
all libraries, so the disk space requirements could be much less than
1Gb, but assigning addresses across all systems would use a lot of
address space.
* In most systems, the dynamically-linked libraries can be placed at
any memory location. Implications: they must be position-independent
code, and the caller (your program) doesn't know the link address
statically (at compile/static link time).
* It is generally assumed that the text segments will not be modified
and can thus be shared. The libraries may be mapped at different
addresses in different processes. As a result, some kind of
indirection is required on library calls. The one I've seen most
commonly on RISC machines is to jump to some code in the (non-shared)
data segment that is set at dynamic link time to jump to the target
library routine.
* In SunOS, the code and data segments are adjacent, so a pc-relative
reference always lets you find the library's static data. In the
RS/6000 there is a special data segment register. On each
library call, the data segment register must be spilled and loaded
with the target library's data segment pointer; on return, the
original data segment register is reloaded from the stack.
* In System V, each (dynamically linked) library is assigned a major
version number and a minor version number. Each revision that
maintains the same external interface gets a new minor number.
Changes to the external interface get a new major number. When an
executable is compiled, symbols are resolved but not bound, and the
name and major and minor number of the library used to resolve the
symbol are stored in the executable. At runtime the dynamic linker
searches for the newest minor number with a matching major number.
* The System V dynamic linker can be told to link all libraries at
program startup, or to link libraries incrementally. If a library is
linked incrementally so that each function is linked to each call
site only if it is called, and the linking is done when the routine
is first called.
* Under System V incremental linking, the dynamic link tables are
initialized so that a call to a not-yet-linked routine will instead
call the dynamic linker. The dynamic linker will then map the
library and invoke the given routine. Libraries that cause
unresolved data references are mapped at program startup and the
data references are resolved. If the newly-linked library itself
causes an unresolved data reference, then the dependent library is
mapped and so on.
* The System V dynamic linker has a search path for libraries. In
addition, the user may set an environment variable that adds to the
search path. For suid programs, the user-defined search path is
ignored.
* In SunOS, when a library is found, the library is mapped in to the
process's address space, but pages are not brought in to memory until
they are referenced. Pages may already be in memory if another
process has been using that library.
* Incremental linking is tricky because, e.g., the process may get a
signal and try to call library function X at the same time that the
dynamic linker is trying to resolve function X.
* In System V, the dynamic linker acts as the program's interpreter, in
the same sense that programs starting with the `#!' magic number name an
interpreter.
However, in SunOS, a special magic number is used. Dynamically-linked
programs are effectively scripts, so that the dynamic linker is started before the
program starts executing.
* All System V-compliant application programs are required to import
certain machine-specific information using the dynamic linker.
To summarize:
* Shared libraries can be mapped at different addresses in different
processes. Thus, the routine addresses are not known at static link
time and the library must use position-independent code.
* Dynamic linking can be done entirely on program startup, or can be
incremental so the full linking cost is only paid for routines that
are actually called during a given execution of the program.
* Dynamically linking external static data requires binding at startup.
* Version control is via major and minor numbers. A system must keep
all major numbers needed by all programs on the given system.
* Libraries are searched at compile time so unresolved symbols can be
detected statically.
* The same library is discovered dynamically using a library name and
a search path.
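Schematically, the jump-slot indirection and lazy binding described in
the bullets above look something like this in C (all names invented; a
real dynamic linker works on relocation entries, not source code):

#include <stdio.h>

typedef void (*slot_fn)(void);

static void real_libfunc(void)          /* stands in for the library routine */
{
    puts("library routine called");
}

static void resolve_libfunc(void);      /* first-call stub */

/* One writable slot per imported routine; it lives in the program's own
   (non-shared) data segment, so the shared text never needs patching. */
static slot_fn libfunc_slot = resolve_libfunc;

static void resolve_libfunc(void)
{
    /* A real dynamic linker would map the library and look the symbol
       up here; we just bind the slot and call the target. */
    libfunc_slot = real_libfunc;
    real_libfunc();
}

#define libfunc() (libfunc_slot())      /* what a call site compiles into */

int main(void)
{
    libfunc();      /* first call goes through the resolver */
    libfunc();      /* later calls jump straight to the routine */
    return 0;
}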
I think that covers most of the major points. For further information,
see:
%Q AT&T
%T System V Application Binary Interface
%D 1990
%I Prentice-Hall
%P 5-12 to 5-21
%Q AT&T
%T System V Application Binary Interface SPARC Processor Supplement
%D 1990
%I Prentice-Hall
%P 5-1 to 5-11
;-D oN ( The missing link ) Pardo
[The System V libraries mentioned above are the V.4 ones that are almost but
not quite compatible with the ones in SunOS. V.3.2 has an awful reserved
address scheme similar to the one rejected for VMS. -John]
The FORTRAN 4 compiler for RT-11 on PDP-11s did that. The linker allowed a
runtime object module to specify only one pointer in a global table:
.csect sys$p ro,glb,ovr
.=.+26
.word i$fmt
will initialise the 12th word in sys$p and leave the rest alone.
(The formats were translated to an intermediate form where each format letter
became an offset into the table. Of course, the runtime format interpreter
module had to include references to all the format conversion routines.)
--
Lars Mathiesen, DIKU, U of Copenhagen, Denmark [uunet!]mcsun!diku!thorinn
tho...@diku.dk
That's true as far as it goes, but there's a subtle problem there: smart
programmers are a renewable resource only given the right conditions. The
wrong conditions (overabundant hardware resources chief among them) produce
a monotonically decreasing skill level of the average programmer. The
solution to the problem of programmers writing slower, less efficient code
does not lie in "Oh, that program's slow. Let's buy more MIPS" exclusively,
for that way lies GNU EMACS; rather, programmers must be shown the hard
way what happens when they don't pay attention to efficiency.
--
Jay Maynard, EMT-P, K5ZC, PP-ASEL | Never ascribe to malice that which can
jmay...@oac.hsc.uth.tmc.edu | adequately be explained by stupidity.
"If I can do without comp.sys.amiga in my bedroom, you can do without
rec.arts.startrek." -- Peter da Silva, to his wife Stephanie, in news.groups
That's really an elegant way of expressing it. Unfortunately it's
only part of the answer. Limited languages have only two scopes: local
(or procedure) and global. C adds "compilation unit" scope for both
variables and procedures, and this makes it very hard to separate parts
of such a unit, since even though the name of a static element is not
visible, the visible procedures may provide the address of the items.
The full answer is what you said, "that's the way we always did it,"
but also "because it's quite hard to do right and we wanted it to work
correctly."
Honestly, unless someone has been very careless with their library,
the penalty is pretty small in terms of current memory sizes, and on
many systems the library is shared and it makes no difference since it is
all in memory anyway.
I am very much in favor of tight code and good practices, but not at
the cost of reliability, and not at the price of diverting resources
which could solve problems which are more pressing than this one.
--
bill davidsen (davi...@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
GE Corp R&D Center, Information Systems Operation, tech support group
Moderator comp.binaries.ibm.pc and 386-users digest.
"Stupidity, like virtue, is its own reward" -me
But the sprintf (_doprintf, really) bashing is misguided. Even the floating
point part of sprintf can be coded in a couple of pages of C, and the object
file size is not that big. It may have mattered on IBM PC or microcontrollers
but it is _nothing_ compared to, let us say, X libraries. I am sure that the
general idea is correct---if you try to link standalone executable images,
you will get code bloat. Yet another argument for shared libraries.
--
przemek klosowski (prz...@ndcvx.cc.nd.edu)
Physics Department
University of Notre Dame IN 46556
[It's true. On my SysV machine, the entire shared libc is only 26K. The X
shared library is 213K. -John]
| 2) It's no good to create zillions of individual object files out
| of zillions of *.c's;
Given that random libraries search quickly and the file is executed
more times than it's linked, why not? If I accept the original premise
that this code is bad, what's wrong with the "many parts" approach?
Certainly we have the tools to handle either large compilation units
or small, by including the source code for example, and if the time to
link is too high with fine granularity, a coarse grained library can be
kept also.
| I think that ld(1) is close to braindamaged. Just to mention it, the fact that
| it doesn't complain about multiple definitions is (ok, ANSI) really dangerous.
I have not seen any linker which helps less than the VMS linker, which
neither detects duplicate declarations in various modules nor complains
about variables declared extern and then never defined.
>HOWEVER, in my experience neither the appropriate use of the existing tools
>NOR a new "smart" linker (which accomplishes that same end via a different
>path) will solve the true problem -- library bloat.
>I call some library routine which is simple and therefore small. I discover
>much to my horror that my executable has grown in size much more than I
>expected. Obviously I got more than I asked for, conclusion: a smarter linker
>would not have include the extraneous stuff.
An anecdote concerning library bloat: the early implementations of the Sun
NeWS server provided a client-side library for sending PostScript to the
server. As per X11, they suggested that you set a NEWSSERVER environment
variable so that the client knows the network address of the server, but
unlike X11 they didn't allow you to put a machine *name*, only the actual IP
address. I sent them patches to allow a machine name, and was told that they
had deliberately not done that because "you need gethostbyname and that pulls
in all of the Yellow Pages code" with the result that their little
demonstration binaries grew by several hundred percent....
>Library bloat is almost always caused by poor design or coding practices in
>the run time libraries.
Or raging functionality. Actually, I can't see any way that this could be
avoided, though it is true that YP stuff in libc makes use of things like
sprintf.
--
% William Roberts Internet: li...@dcs.qmw.ac.uk
% Queen Mary & Westfield College UUCP: li...@qmw-dcs.UUCP
% Mile End Road Telephone: +44 71 975 5234
% LONDON, E1 4NS, UK Fax: +44 81-980 6533
On the other hand, on my system:
hpmonk% ld -u _printf /lib/libc.a
hpmonk% size a.out
22984 + 3568 + 18436 = 44988
hpmonk% size /lib/libc.sl
480812 + 44728 + 96976 = 622516
This bloat comes from support for such things as quad precision conversion
("long double") and multibyte Kanji characters / "NLS" (Native Language
Support). All of which would get dragged in even by a "smart" linker, unless
it actually parsed printf() arguments, as someone suggested for FORTRAN.
My libX11 is 216K, which compares nicely to your 213K. Thus our systems appear
to be otherwise apples-to-apples.
--------------
Marc Sabatella (ma...@hpmonk.fc.hp.com)
If your 16MB machine is full of "badly designed windowing code," you
have no one but yourself to blame. No one's forcing you to use it, and
no one's stopping you from doing a better job. If you're using it, then
bad as it is, it must be doing something you like. If you're not
building a better one yourself, then it can only be because the size of
the task prevents you in some way from doing so.
> Unfortunately programming talent is the scarcest resource, and it pays
> to waste more physical ones in order to be able to afford less skilled
> programmers. The problem therefore lies not in the overabundance of
> resources, but the scarcity of talent to make abundant resources as
> productive as they could be.
The idea that operating systems (or compilers, or windowing systems,
or...) are large and inefficient because of insufficient skill and care
by programmers is a pernicious myth.
Whether we like it or not, we live in a time where hardware performance
and practical memory sizes are increasing at an astonishing rate, while
software productivity has remained more or less constant. In such a
time, the proper emphasis of software development is on the timely
delivery of functionality, not on software performance or compactness.
Consider, for example, a lonely OS group within a company that has to
choose between two tasks, supporting a new processor or improving its OS
performance by 20%. Suppose further that processor speeds are doubling
every two years or so, that the OS consumes about half the machine's
cycles, and that either effort will take about six months. If they
decide to work on software, they will only improve their machine's
performance by about 10% total, compared to a machine performance curve
that increased by 20% in the same amount of time. The choice of working
on software performance leads to lower performance for the system
overall.
In order for the OS group to focus on performance, they would have
to expect a 40% performance payoff for those six months of work. Even
if they can do that once, I seriously doubt they can sustain that over
several releases.
Of course, the real situation is more complicated--software performance
bugs can eat up cycles even faster than the hardware folks can provide
them, processor performance is not continuous, bad software adds up,
etc.--but the principle holds in practice. Under current conditions,
the most important thing that software folks can do to improve system
performance is to keep hardware releases on time. Even when the people
involved don't understand that (and they often don't), competitive
pressures often force them to make the right choices.
--bob--
Speaking only for myself.
For my $0.02, PCG is dead right.
I have no idea how we as a community, go about solving this problem.
I do know that if we don't do it soon, it will come back to haunt us all.
We already live in an age where commercial customers cannot move forward
onto newer technology because of the impossibility of re-creating or
upgrading their software base. In the commercial world it is typical for
70% of the entire DP budget to be spent on maintenance. Meanwhile, the
application backlog gets longer and longer.
Slow iron has long since ceased to be the problem. OS technology (even
ack - UNIX) far outstrips the enabling technologies (database, tp monitor
etc). However, at least these are more or less late 70s or early 80s
technology. The vast bulk of the world's application base is built with
60s technology. If you don't believe that this hurts, go ask a few
commercial customers.
Chris.
--
"Garbage. This show would insult a 6 year old! and I should know"
- Calvin
This is a largely specious argument. If you look at the software lifecycle,
a well designed piece of software will spend 90% of its life in production.
Therefore it will be in maintenance for 90% of its life. As we know, no
business sits still very long and software must change along with the
organization. If the software cannot, it is either thrown out or ignored.
The stuff that's thrown out gets reasserted as requirements and thus
another application is specified and put into the backlog. Don't say
"backlog" without classifying its contents. The stuff that's ignored was
probably a bad idea to begin with.
It is quite true that there are many sites who are locked into the past.
It is common to talk to people who have no source for their objects, who
must run in some crungy emulation mode and whose software can't be extended.
They have to buy expensive proprietary stuff because they can't work their
way out of the mess they created. So be it. It's called organizational
Darwinism. The problem is not their implementation, it's the organization.
>
>Slow iron has long since ceased to be the problem. OS technology (even
>ack - UNIX) far outstrips the enabling technologies (database, tp monitor
>etc). However, at least these are more or less late 70s or early 80s
>technology. The vast bulk of the world's application base is built with
>60s technology. If you don't believe that this hurts, go ask a few
>commercial customers.
I do ask commercial customers. They're a whole lot smarter than that and
mostly a whole lot less encumbered by technology than we, the posters in
comp.arch are.
Most organizations do not worship their technology, they respect their
margins. If 60s technology works, WHO CARES? The dumbest computer sales
exec in the world understands that there are a lot of accounts who are
doing well and have great contingency plans, even if their current technology
is ancient. This is called smart business driven by need, not whizziness.
When the time comes to do something new, you'll hear about it and they'll
tell you exactly what they need.
Closing:
The assertion that great technology will bootstrap a corporate elevation in
enlightenment has been wrong since the idea of technology was invented.
There were a lot of extremely successful businesses who didn't buy in to
Hollerith technology when it was The Thing and still survived to become
successful and productive. Put yourself into the shoes of an IBM card system
salesman and think about it.
Technology applied at the right place and the right time is a very powerful
tool and can make bigger gains in business productivity than many other
techniques. But it is only a technique and swaggering grandly about how bad
applications are because they date from this or that era ignores the real
problem and merely reopens the argument.
--
Dave Fuller This is the biased hyper-signature, or the
Sequent Computer Systems null signature divided by zero, minus any
(503) 578-5063 (voice) ideas that I represent Sequent.
dafu...@sequent.com It means specific things to certain people.
Disagree. Compare the code in UNIX V6 or V7 with UNIX SysIII and cry.
It's not just the quality of the coding, it's the quality of the design, or
rather the lack of design, which bloats current software.
>Of course, the real situation is more complicated--software performance
>bugs can eat up cycles even faster than the hardware folks can provide
>them, processor performance is not continuous, bad software adds up,
multiplies, at the least, might even be exponential. And, layers of vast
bad software have a terrible effect on computer architecture.
System V R 4 and BSD UNIX have the edge on smaller systems because they are
seen as being "better", somehow, because they derived from AT&T's code. There
are plenty of systems that are smaller with as much or better functionality
that can't get a toehold because they don't provide a bug-for-bug compatible
interface.
X-Windows has the edge on smaller window systems because it's "free" and
because it's from MIT. There really isn't enough additional functionality
in X over your classic Rob Pike window system to justify the size, but
because of political decisions (based in part on opposition to Sun's NeWS
system which, while unwieldy, does have functionality to match) nobody can
compete.
And the really rotten thing is that this is all used as a marketing tool by
IBM and Microsoft to push their own proprietary systems on the consumer. "See",
they say, "OS/2 runs happily in 2 MB" (TWO MEGABYTES? Jesus... that's the MAX
you can fit in a 3b1)... but you need 4MB (or 8, or 12, or whatever number
they can support) for UNIX.
The UNIX API is first class. The implementations are suffering badly from
feeping creaturism. It's long since past the time they should have been
done over from scratch. But they won't be, because of the politics.
--
Peter da Silva
Ferranti International Controls Corporation
Sugar Land, TX 77487-5012; +1 713 274 5180
"Have you hugged your wolf today?"
I can't tell whether you've missed the point, or hit it square on the
head. When people talk of the design of a system such as SysIII (or
HP-UX, with which I'm more familiar), they make the implicit assumption
that someone actually designed it, and that if that person could only
have been more intelligent, things would have ended up differently.
Poppycock.
Software systems aren't designed, they accrete. Features are added and
deleted by people whose only job is to match the features in the OS and
the features needed by customers. These people often aren't system
designers, they're marketing types trying to figure out how to keep
customers happy. The best designers then get assigned to the most
difficult tasks, and so on down the line, with everything scheduled and
estimated so that everyone is working on the largest problem that they
can solve in their allotted time.
At least that's how the system is supposed to work. Sometimes it breaks
down and people get enough time to think about how they'd like to build
things, if they had the time and the company could afford the risk (the
system works hard to prevent that much slack time, as it saps morale
:-)). More competence would not result in a better overall system, it
would just add more features to the TO-DO list.
What's the result?
Ugly, unaesthetic systems that become unmaintainable over time.
Who cares?
Eventually, the Old Warhorse will be rewritten and cleaned up, or will
be replaced by the New Best Thing. Everyone will know a little more
about what a system does and doesn't need. In the meantime, bloated,
overgrown systems provide customers with ever-increasing functionality,
and as a customer of computer systems, this makes me quite happy.
I started out working on V6 and V7, and I agree that they are much more
coherent and elegant than the systems that followed them. If I were
looking for a system to hack in, I'd much rather start with one of them.
If I were looking for a system to use, I'd much rather have this one,
warts and all. What can I say, I like being able to have satellite
weather photos as my background, communicate with far-away places
without having to specify a complete path, and import the occasional
game without having to teach it about my graphics device. I even like
having a graphics device.
Go figure.
--bob--
All too true.
>Features are added and deleted by people whose only job is to match
>the features in the OS and the features needed by customers. These
>people often aren't system designers, they're marketing types trying
>to figure out how to keep customers happy.
I will not dispute this, although being largely a research type I have
not experienced it myself.
>The best designers then get assigned to the most difficult tasks,
>and so on down the line ....
Unfortunately, *this* is clearly not the case. The more senior people
can avoid the dirtier work. If it is difficult and interesting, they
will do it, but if it is difficult and dull, the newest hires wind up
with the task. This is, I think, how we got YP/NIS and the SunOS lock
manager....
Wondering what this has to do with computer architecture,
--
In-Real-Life: Chris Torek, Lawrence Berkeley Lab CSE/EE (+1 415 486 5427)
Berkeley, CA Domain: to...@ee.lbl.gov
new area code as of September 2, 1991: +1 510 486 5427
to which, in article <17...@dog.ee.lbl.gov>, I responded:
>Unfortunately, *this* is clearly not the case ... [examples being the
early, horrible YP and lock-manager code from Sun, which *do* take on
hard problems but were designed and/or coded poorly.]
Some private mail suggests that I should revise the suggestion that:
>The more senior people can avoid the dirtier work. If it is difficult
>and interesting, they will do it, but if it is difficult and dull, the
>newest hires wind up with the task.
Perhaps a better explanation is that the best designers take on those
tasks that are *perceived as* the most difficult. Unfortunately, these
are not always the ones that *are* the most difficult, and often a
bad design escapes and then must be supported forever. The latter is
due to, as Bob English noted, the `accretion' effect: you cannot rip
out a bad system because customers now depend on it.
(This is one of the reasons I prefer my ivory tower, even if it *is*
actually an old trailer perched up in the Berkeley hills. :-) )
The "politics" you speak of must result from the enormous artificial
problems vendors create for users by "horizontally" fragmenting a
market. E.g., when 10 vendors create similar products, they have a
very large number of inconsequential design decisions to make, often
of no greater real impact than deciding which side of the road traffic
should use. Apolitical vendors will make these decisions at random,
creating frivolous incompatibilities between competing products in a
given category.
These frivolous incompatibilities destroy incalculable amounts of
user wealth, because they make computers harder to use. Users are as
greedy as vendors, so they understandably have a strong interest in
persuading vendors to behave consistently wherever performance is not
an issue. Since each vendor has an interest in coercing all other vendors
except itself to change, the approach to vendor consistency requires
a political process.
>System V R 4 and BSD UNIX have the edge on smaller systems because they are
>seen as being "better", somehow, because they derived from AT&T's code. There
>are plenty of systems that are smaller with as much or better functionality
>that can't get a toehold because they don't provide a bug-for-bug compatible
>interface.
A few MB of RAM is much cheaper today than the productivity loss during
the time for a worker to master a new system of any useful complexity.
The payoff from the new system must be much greater than merely
conserving $200 worth of RAM. (Lots of workers cost their companies
that much in a couple of days or less.)
>X-Windows has the edge on smaller window systems because it's "free" and
>because it's from MIT. There really isn't enough additional functionality
>in X over your classic Rob Pike window system to justify the size, but
>because of political decisions (based in part on opposition to Sun's NeWS
>system which, while unwieldy, does have functionality to match) nobody can
>compete.
Users have real work to do. They usually don't care about which among N
largely equivalent windowing systems they use. But they do know that
using more than one incompatible windowing system is a useless
complication that keeps them away from what they bought their computer
to do: creating wealth.
>And the really rotten thing is that this is all used as a marketing tool by
>IBM and Microsoft to push their own proprietary systems on the consumer. "See",
>they say, "OS/2 runs happily in 2 MB" (TWO MEGABYTES? Jesus... that's the MAX
>you can fit in a 3b1)... but you need 4MB (or 8, or 12, or whatever number
>they can support) for UNIX.
The average information worker routinely manipulates (or routinely
*could profit from manipulating*) chunks of information larger than
2 MB. (However, most of them still do this inefficiently, e.g., with
paper.) As far as most users are concerned, a computer which addresses
a maximum of 2 MB belongs in a museum or a toy store.
>The UNIX API is first class. The implementations are suffering badly from
>feeping creaturism. It's long since past the time they should have been
>done over from scratch. But they won't be, because of the politics.
Even the best programmers find that to add features, they must usually
add code. However, computer users try to use computers to solve real
problems. Real problems have an infinite number of features. Computer
users tend to reject software that is simpler than the problems they
are trying to solve.
Few mechanics can get by with just one hammer and one screwdriver.
Instead, they have to lug around huge cases of tools, which cost a lot
of money and take up a lot of space. That's because mechanics get paid
to fix things, not to glorify the aesthetics of minimalism.
--
Dan Mocsny
Internet: dmo...@minerva.che.uc.edu
>In article <+9LD...@xds13.ferranti.com> pe...@ficc.ferranti.com (peter da silva) writes:
>These frivolous incompatibilities destroy incalculable amounts of
>user wealth, because they make computers harder to use. Users are as
>greedy as vendors, so they understandably have a strong interest in
>persuading vendors to behave consistently wherever performance is not
>an issue. Since each vendor has an interest in coercing all other vendors
>except itself to change, the approach to vendor consistency requires
>a political process.
>>X-Windows has the edge on smaller window systems because it's "free" and
>>because it's from MIT. There really isn't enough additional functionality
>>in X over your classic Rob Pike window system to justify the size, but
>>because of political decisions (based in part on opposition to Sun's NeWS
>>system which, while unwieldy, does have functionality to match) nobody can
>>compete.
>Users have real work to do. They usually don't care about which among N
>largely equivalent windowing systems they use. But they do know that
>using more than one incompatible windowing system is a useless
>complication that keeps them away from what they bought their computer
>to do: creating wealth.
This would be a good point, except that X routinely *does* prevent
graphics users (not necessarily ``computer graphics'' users) from
getting their work done in a timely manner where they are made to
use it. The principal graphics resource of the machine, the screen,
is being monopolised by a junta of programs which have only the
most rudimentary graphics facilities, that make even dragging a
curve around seem difficult. Since X took over the world I have
met people at shows who genuinely believe `all graphics is difficult'.
What would they do when confronted with something really tricky?
>--
>Dan Mocsny
>Internet: dmo...@minerva.che.uc.edu
--
Ian D. Kemmish Tel. +44 767 601 361
18 Durham Close uad...@dircon.UUCP
Biggleswade ukc!dircon!uad1077
Beds SG18 8HZ United Kingdom uad...@dircon.co.uk
It has a lot to do with software architecture (which was part of this
group the last time I checked). Sometime in the next ten years, The
Next Best Thing will arrive. It may, in fact, already be here, but not
be acknowledged as such. If it can solve in design the issues that have
been solved by accretion in the current generation of OS, then it may
avoid some of the layers of sediment that coat our current systems.
That will only happen, however, if those designing TNBT stop thumbing
their noses at current systems and start asking why people find those
systems so useful.
Architecture isn't just about pretty solutions and elegance. It's about
building structures that people can actually live in.
--bob--
I've never met the Hewlett-Packard Company. I don't believe it holds
opinions.
Possibly true, but playing politics in the marketplace costs more time
and money.
> A few MB of RAM is much cheaper today than the productivity loss during
> the time for a worker to master a new system of any useful complexity.
Every system out there has a limit to the amount of RAM you can put in it.
At that point those few MB of RAM become very expensive. The argument that
RAM is cheap so we can treat it as free is enticing, but in the real world
it doesn't hold water.
> Users have real work to do. They usually don't care about which among N
> largely equivalent windowing systems they use. But they do know that
> using more than one incompatible windowing system is a useless
> complication that keeps them away from what they bought their computer
> to do: creating wealth.
This assumes that different window systems are basically equivalent. The
additional cost of programming X, and the additional resources needed to
get acceptable performance out of X, argue otherwise. (Acceptable
performance is possible, I am surprised to add... just this year I saw the
first system I'd ever used that gave me acceptable performance on X: a Sun
Sparcstation II. At 27.5 MIPS with 16 MB of RAM it was as responsive as my
0.7 MIPS Amiga 1000 with half a meg of RAM.)
> >The UNIX API is first class. The implementations are suffering badly from
> >feeping creaturism. It's long since past the time they should have been
> >done over from scratch. But they won't be, because of the politics.
> Even the best programmers find that to add features, they must usually
> add code.
At some point you usually need to sit down and redesign things to allow a
new set of features that don't fit in the basic paradigm. For example, poll
on System V and select on BSD implement a portion of asynchronous I/O on
some system calls. Each has a separate mechanism of implementing asynchronous
opens. And so on... you now have a whole bunch of separate mechanisms to add
individual features that could be better provided by a single common mechanism
of threads, or an asynchronous interface to system calls in general such as
AmigaOS provides.
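To make the duplication concrete, here is a quick sketch in C of the same
tiny job, blocking until a descriptor is readable, done once the BSD way
and once the System V way. (Treat the include locations as a guess for any
particular machine of this vintage.)

#include <sys/types.h>
#include <sys/time.h>           /* select() on BSD-derived systems */
#include <poll.h>               /* poll() on System V-derived systems */

/* BSD flavor: block until fd has data to read. */
int
wait_readable_bsd(int fd)
{
    fd_set rfds;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    return select(fd + 1, &rfds, (fd_set *)0, (fd_set *)0,
        (struct timeval *)0);
}

/* System V flavor: exactly the same job, different interface. */
int
wait_readable_sysv(int fd)
{
    struct pollfd pfd;

    pfd.fd = fd;
    pfd.events = POLLIN;
    return poll(&pfd, 1, -1);   /* -1 timeout: block indefinitely */
}

Neither buys the application anything the other doesn't; it just means
every portable program ends up carrying both.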
The resulting operating system would be smaller, more efficient, simpler,
and more importantly *provide more features* because it's using a more
powerful design. A coherent design is itself a valuable feature.
> However, computer users try to use computers to solve real
> problems. Real problems have an infinite number of features. Computer
> users tend to reject software that is simpler than the problems they
> are trying to solve.
Exactly right. And as long as basic design decisions can't be made over in
the light of new knowledge, they will never get the software they need.
> Few mechanics can get by with just one hammer and one screwdriver.
> Instead, they have to lug around huge cases of tools, which cost a lot
> of money and take up a lot of space. That's because mechanics get paid
> to fix things, not to glorify the aesthetics of minimalism.
Few mechanics carry wrench sets calibrated to 1/64th inch increments to
allow for the differences between different manufacturers' bolts, or
dozens of sets of 3/8th inch nuts with different thread pitches. If I buy
a Sears staple gun I can use Swingline, Sears, or Arrow staples in it and
I don't have to have a complex set of adjustments on the staple gun to
accommodate them.
In the hardware world all these arbitrary decisions were resolved by
standards committees long ago. They looked at the various manufacturers'
decisions and chose one mechanism, one set of bolt sizes, one thread
pitch, one staple width, and so on.
Having every mechanism for interprocess communication in the kernel is
as crazy as having a staple gun that automatically adapts to 1/2" and 3/8"
width staples... better to standardize on half inch staples and let vendors
compete on price and *useful* features. Even if a multimode staple-gun
would only cost $50 extra... a mere pittance compared to the value of
the average mechanic. Problem is, the staple gun is a $10 item.
Don't you think it would be useful if you could go out and buy a computer
for $1000 instead of $5000 that provided all the functionality of that
Sparcstation-2, simply because it had a better basic design and didn't need
that extra $500 of RAM, the extra $700 CPU, the extra $2000 hard disk, the
extra $1000 of programmer time in the system software, the extra $500 for
the application because of the greater complexity in the design, and so
on... down to the extra $50 in the power supply?
Letting *more people* have access to these tools in itself would create
wealth. Making them cheaper is in and of itself a desirable goal, because
it lets you do that.
Nice concept, but not true. There are:
Metric
National Fine
National Coarse
Pipe
These are the common right-hand thread sizes. Some left-hand thread
stuff is still around, mostly old Japanese equipment and some oddball
auto lug nuts (used on one side of the car so they wouldn't loosen due
to the wheel spinning).
Staples come in various depths as well as widths, some staples are
crowned as well. Then there are the various screw types, each made
for a slightly different purpose, but requiring the appropriate tool to
drive them.
Try using your staple gun (I presume a home-construction type) to drive
office staples; office staples are generally a different thickness.
Wire gauge sizes are even different based on whether the wire is ferrous
or non-ferrous metal. Precious metal is still another standard, and then
there is the metric wire gauge as well.
This is not to say that everything you said is wrong, just that the
hardware analogy is incorrect. Pick the tool for the job. Scalable
architectures may be the way to go, but even then the model had better
be very flexible.
Pardon the spelling, but this side of my environment doesn't have all
my tools yet, and I don't own one of those blasted paper versions of a
dictionary.
/Tjp
> Nice concept, but not true. There are:
> Metric
> National Fine
> National Coarse
> Pipe
And how many of these threads are expected to be in the typical well-stocked
mechanics' toolbox? Metric and "standard" (I don't know if "standard" is
"national fine" or "national coarse", but there's just one of them on sale
at your typical big hardware store).
> Try using your staple gun (I presume a home-construction type) to drive
> office staples; office staples are generally a different thickness.
Yeh, but I can buy the right staples for my gun from Sears, K-mart, or
Furrow, and I don't have to worry if they're Swingline, Arrow, or house
brand.
There may be a few oddballs out there, but I don't have to buy tools to
deal with them all. I can get the parts to work with the standard set of
tools. Compare that to System V.4: it's got screwdrivers for phillips,
slotted, torx, square, lock, hex, martian, and 4-dimensional screws built
in and you can't buy it without. I just want to buy it with one or two
screwdrivers... and if I want to add blivet-drivers I'll buy them separately.
Plus, each screwdriver comes in a swiss-army-knife configuration.
Your "System V.4" toolbox would weight 30 tonnes and require a C5A to
carry it around. If I want a "socket" interface, then I'll stick a
shared library in there. And until I have an application that needs
it I'll leave it on the installation tapes, thank you very much.
We've managed to do that with device drivers, thanks to UNIX providing a
nice high level abstraction for devices. It needs the same level of
abstraction for system calls and compatibility kits.
> This is not to say that every thing you said is wrong, just that the
> hardware analogy is incorrect. Pick the tool for the job. Scalable
> architechers may be the way to go, but even then the model better be
> very flexible.
Of course.
I have taps and dies for select sizes of all of them.
>
>> Try using your staple gun (I presume a home-construction type) to drive
>> office staples; office staples are generally a different thickness.
>
>Yeh, but I can buy the right staples for my gun from Sears, K-mart, or
>Furrow, and I don't have to worry if they're Swingline, Arrow, or house
>brand.
Availability is not the issue; I can get most kinds of staples at Sears, etc...
and there are benefits associated with particular types.
>There may be a few oddballs out there, but I don't have to buy tools to
>deal with them all. I can get the parts to work with the standard set of
>tools. Compare that to System V.4: it's got screwdrivers for phillips,
>slotted, torx, square, lock, hex, martian, and 4-dimensional screws built
>in and you can't buy it without. I just want to buy it with one or two
>screwdrivers... and if I want to add blivet-drivers I'll buy them separately.
>Plus, each screwdriver comes in a swiss-army-knife configuration.
My car has Metric, NF, NC, torx, clutch-head, phillips, reed-prince, slotted,
and probably a few more types of fasteners. Whilst an air-driven chisel would
probably loosen all of them, the right tool would still be better.
>Your "System V.4" toolbox would weight 30 tonnes and require a C5A to
>carry it around. If I want a "socket" interface, then I'll stick a
>shared library in there. And until I have an application that needs
>it I'll leave it on the installation tapes, thank you very much.
I'm not attacking your assumptions, only correcting your analogy. U*IX is
not the best vehicle; it is arguably the best one in widespread, cross-vendor
use _*NOW*_. (U*IX includes its work-alikes and derivatives.) This means
that until we replace it, we had better be able to work on it.
>We've managed to do that with device drivers, thanks to UNIX providing a
>nice high level abstraction for devices. It needs the same level of
>abstraction for system calls and compatibility kits.
Didn't I imply this with what I said below? (The abstract concept, not
the specifics.)
>> This is not to say that everything you said is wrong, just that the
>> hardware analogy is incorrect. Pick the tool for the job. Scalable
>> architectures may be the way to go, but even then the model had better
>> be very flexible.
>
>Of course.
You certainly have a long winded way of agreeing with me :-)
/Tjp
> Availability is not the issue...
Availability is the issue. When I buy an application (staple gun) I know I can
get the staples (system calls) for it. I can buy a manual and electric gun
(different applications) that use the same staples, so I don't need to buy
special staples for both.
So my toolbox can contain two staple guns, and one box of staples.
When I buy a UNIX system (toolbox) I *have* to stock it with dozens of
different kinds of staples (system calls) that do the same thing in very
slightly different ways... not because they need to but because that's just
what the programmer decided to use. And I can't even tell what those calls
are!
And I don't even want to work on my car with this thing... I just want to
drive it to work. Why should I need an extra 300K of kernel modules to do
that? An auto mechanic might need 40 wrenches, but the car owner usually
doesn't work on the parts of the car where the weird stuff happens.
Now if UNIX was modular, I could have a "sysv-ipc.library", and a
"socket.library" and a "xenix-ipc.library", and they wouldn't take up
any space in the kernel unless I had them loaded. If I was strapped
for space I could remove most of them from my disk and buy applications
that required the libraries I wanted. But even Mach implementations
don't provide this modularity *.
So the alternative is to provide a single interface that's a superset
of the two *in functionality*, with shared libraries to emulate them.
Anything less is simply going to make UNIX bigger and bigger, and less
and less competitive in the marketplace.
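To sketch what that could look like to an application (the library name
"sysv-ipc.so" and the idea that msgget() lives in a separate shared object
are assumptions for illustration, not a description of any shipping UNIX),
something along these lines with a SunOS-style dlopen():

#include <stdio.h>
#include <dlfcn.h>              /* dlopen()/dlsym(), SunOS 4 style */

int
main()
{
    void *ipc;
    int (*my_msgget)();

    ipc = dlopen("sysv-ipc.so", 1);     /* hypothetical library; 1 = lazy binding */
    if (ipc == 0) {
        fprintf(stderr, "no SysV IPC emulation installed\n");
        return 1;
    }
    my_msgget = (int (*)()) dlsym(ipc, "msgget");
    if (my_msgget == 0) {
        fprintf(stderr, "emulation library lacks msgget\n");
        return 1;
    }
    /* ... call (*my_msgget)(key, flags) exactly as if the kernel
     * provided it; programs that never run this pay nothing ... */
    dlclose(ipc);
    return 0;
}

The point isn't this particular mechanism; it's that the compatibility code
sits on disk until somebody actually asks for it.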
--
Peter da Silva
Ferranti International Controls Corporation
Sugar Land, TX 77487-5012; +1 713 274 5180
"Have you hugged your wolf today?"
* AmigaOS does, as does QNX. I don't know of anything else.
Let me point out once again: the ORIGINAL analogy was wrong, not the basis
for the arguments. MANY different hardware (as in hardware store) "standards"
exist. There are more different fasteners (one category, like inter-whatever
communications) than you might imagine. Among the different types are
different styles, then different sizes. I really hope that this situation
does not happen to computer communications APIs or any other API.
#
#And I don't even want to work on my car with this thing... I just want to
#drive it to work. Why should I need an extra 300K of kernel modules to do
#that? An auto mechanic might need 40 wrenches, but the car owner usually
#doesn't work on the parts of the car where the weird stuff happens.
Driving it to work is analogous to running an application that was already
set up, NOT to setting an application up; that is more like getting gasoline,
where you must choose leaded/unleaded, what octane, etc...
Tuning up your car might be like tweaking the runtime environment for
performance. Still not a great analogy, but getting closer.
#
#Now if UNIX was modular, I could have a "sysv-ipc.library", and a
#"socket.library" and a "xenix-ipc.library", and they wouldn't take up
#any space in the kernel unless I had them loaded. If I was strapped
#for space I could remove most of them from my disk and buy applications
#that required the libraries I wanted. But even Mach implementations
#don't provide this modularity *.
...
#* AmigaOS does, as does QNX. I don't know of anything else.
...
If I didn't care for my current contract ;-> I could make comments about
U*IX implementations, but as it is paying the bills at the moment ...
Sigh ...
#
#So the alternative is to provide a single interface that's a superset
#of the two *in functionality*, with shared libraries to emulate them.
#Anything less is simply going to make UNIX bigger and bigger, and less
#and less competitive in the marketplace.
This will only make the sum total of the software bigger. Hardly an optimal
goal. Better would be to standardize a flexible and scalable interface and
provide shared libraries to translate for older/obsolete interfaces. This way
conforming new applications would not need the burden of an extra layer of
code for common tasks.
My vote goes for defining a single "standard" and making it scalable so as to
take advantage of new technology while protecting the current development
(based on the new standard) from becoming obsolete.
/Tjp - Some editors are better than others, I just want this one to work.
Sorry if some spelling errors creep in, but this thing is quirky at
beast. ;-)
Bruce J. Bell br...@tybalt.caltech.edu
>Now if UNIX was modular, I could have a "sysv-ipc.library", and a
>"socket.library" and a "xenix-ipc.library", and they wouldn't take up
>any space in the kernel unless I had them loaded. If I was strapped
>for space I could remove most of them from my disk and buy applications
>that required the libraries I wanted. But even Mach implementations
>don't provide this modularity *.
>* AmigaOS does, as does QNX. I don't know of anything else.
Er, excuse me, but doesn't OS/2 provide this "library" modularity via
DLLs?
I'm well aware of QNX's client/server architecture, where the kernel
essentially provides IPC services; you just plug in the various pieces/parts
to get what you want. This appears to be a Good Thing, at least on
paper.
>Now if UNIX was modular, I could have a "sysv-ipc.library", and a
>"socket.library" and a "xenix-ipc.library", and they wouldn't take up
>any space in the kernel unless I had them loaded. If I was strapped
>for space I could remove most of them from my disk and buy applications
>that required the libraries I wanted. But even Mach implementations
>don't provide this modularity *.
Um, my UNIX machine (the AT&T unix-pc) does this already, via dynamic
linking of kernel device drivers. It works great. The machine on my desk at
work does this too (a DECstation 3100) via the config program, although I
have to recompile the kernel to change the configuration.
I would appeal to kernel designers everywhere to implement dynamically
loadable drivers: it's not hard, and it makes driver development a breeze. It
also helps make my 68010-based UNIX machine faster than a 68020 box running
Apple's A/UX.
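For the curious, the interface for such a thing can be tiny. What follows is
a purely hypothetical sketch (none of these names come from the unix-pc,
Ultrix, or any other real kernel): the module exports one table of entry
points, and the loader binds it to a device name at run time.

/* Entry points a loadable driver hands to the kernel (hypothetical). */
struct driver_ops {
    int (*open)(int minor);
    int (*close)(int minor);
    int (*read)(int minor, char *buf, unsigned len);
    int (*write)(int minor, char *buf, unsigned len);
    int (*ioctl)(int minor, int cmd, char *arg);
};

/* Assumed to be supplied by the kernel's module loader. */
extern int driver_install(char *name, struct driver_ops *ops);
extern int driver_remove(char *name);

/* A trivial "null" device as the loadable module. */
static int null_open(int minor) { return 0; }
static int null_close(int minor) { return 0; }
static int null_read(int minor, char *buf, unsigned len) { return 0; }
static int null_write(int minor, char *buf, unsigned len) { return len; }
static int null_ioctl(int minor, int cmd, char *arg) { return -1; }

static struct driver_ops null_ops = {
    null_open, null_close, null_read, null_write, null_ioctl
};

/* Called by the loader when the module is loaded or unloaded. */
int load_driver()   { return driver_install("null", &null_ops); }
int unload_driver() { return driver_remove("null"); }

Development then becomes edit, compile, reload: no kernel relink, no reboot.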
--
################################# :alex.
#Disclaimer: Anyone who agrees # Systems Programmer
#with me deserves what they get.# University of Maryland Baltimore County
################################# al...@umbc3.umbc.edu
> >* AmigaOS does, as does QNX. I don't know of anything else.
> Er, excuse me, but doesn't OS/2 provide this "library" modularity via
> DLLs?
Can you write a new file system as a DLL and have it show up as a regular
file system? Or a pseudo-device? Shared libraries are necessary for building
an operating system out of components, but they're not sufficient. What I'm
thinking of is the equivalent of transferring the UNIX "toolbox" approach
to the kernel.
> I'm well aware of QNX's client/server architecture, where the kernel
> essentially provides IPC services; you just plug in the various pieces/parts
> to get what you want.
That's basically what the Amiga O/S does. It's also what Mach is supposed to
do, but they cheaped out on the O/S and made the whole thing one monolithic
library.
> This appears to be a Good Thing, at least on paper.
Works pretty well in practice, too. There are a wide variety of libraries
and handlers on the Amiga, many in the public domain or shareware.
Sounds like what you really want is Plan 9, which is after all what
started this long and convoluted thread. Pretty much the only way that
any sort of service is made available is by a kernel or user task
installing it as a directory. Unlike Unix, the name space is potentially
created per task, more likely per process group. You get nice features
like the window server installing /dev/cons as a pointer to the tty stream
for the current process group's window, using a standard system service
rather than a special-purpose hack like the Unix /dev/tty. Serial ports
appear as a directory with one entry for the data and another for control.
Installed directories can either replace previous entries with the same
name or sit "in front" of them so that, for example, there's no executable
search path -- all the directories that in Unix would be in the path are
stacked up on /bin. It's really cool.
As you would hope, the whole thing appears to be small and fast.
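A minimal sketch of the union-directory idea, written against the bind()
call and flags (MREPL, MAFTER) described in the Plan 9 papers; the directory
names are just plausible examples, not a real configuration:

#include <u.h>
#include <libc.h>

void
main(void)
{
    char *argv[2];

    /* Stack several directories onto /bin; there is no $PATH. */
    bind("/mips/bin", "/bin", MREPL);         /* architecture binaries */
    bind("/rc/bin", "/bin", MAFTER);          /* shell scripts behind them */
    bind("/usr/ken/bin", "/bin", MAFTER);     /* personal tools last */

    argv[0] = "date";
    argv[1] = nil;
    exec("/bin/date", argv);    /* found via the union, wherever it lives */
    exits("exec failed");
}

The window system plays the same game per window, which is how /dev/cons
ends up pointing at the right place.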
John R. Levine, IECC-by-the-sea, Harvey Cedars NJ "Where TAT-9 comes ashore"
jo...@iecc.cambridge.ma.us, {ima|spdcc|world}!iecc!johnl
Plan 9 and QNX are similar in many respects. Processes under QNX can also
have their own I/O name space, implemented as prefix tables for mapping local
and remote portions of the namespace. Open file descriptors, environment
variables, etc. are inherited during process creation on the network, so
writing things like a network-distributed Make is very easy (it's a
standard utility). The architecture is microkernel-based (only 8 Kbytes).
The microkernel does only process scheduling and message passing. A process
manager takes care of memory allocation, process creation, accounting, etc.
Unlike Plan 9, remote processes are assigned pids in the local process table,
indiscernible from local pids. A diskless workstation can start a file system
process from the command line ( loaded from the network ), attach drivers to
it, and then "kill" it when no longer needed. Everything in the system is
startable/stoppable at runtime in this manner. This allows the OS to be as
small or large as needed. Embedded systems can choose not to start the fsys,
dev and net processes to get very tiny for real mode 8088 or protected mode
286, ROM-based systems. On the high end, they could include everything to
produce a POSIX-compliant system running network distributed on a few hundred
486-based machines.
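To give a flavour of that send/receive/reply style, here is a rough sketch
of a file-system request when the file system is just another process. The
primitive names and prototypes below are invented for illustration; they
are not the actual QNX calls.

#include <string.h>

/* Hypothetical blocking message-passing primitives (illustrative only). */
extern int msg_send(int server, void *req, int reqlen, void *rep, int replen);
extern int msg_receive(void *req, int reqlen);      /* returns sender id */
extern int msg_reply(int sender, void *rep, int replen);

struct fs_request { int op; char path[64]; };
struct fs_reply   { int status; };

/* Client side: one synchronous transaction per request. */
int
fs_remove(int fs_server, char *path)
{
    struct fs_request req;
    struct fs_reply rep;

    req.op = 1;                         /* "remove" operation code */
    strncpy(req.path, path, sizeof req.path - 1);
    req.path[sizeof req.path - 1] = '\0';
    msg_send(fs_server, &req, sizeof req, &rep, sizeof rep);
    return rep.status;
}

/* Server side: the whole file system is an ordinary, killable process. */
void
fs_server_loop(void)
{
    struct fs_request req;
    struct fs_reply rep;
    int sender;

    for (;;) {
        sender = msg_receive(&req, sizeof req);   /* block for next request */
        rep.status = -1;                          /* ... perform req.op here ... */
        msg_reply(sender, &rep, sizeof rep);      /* unblock the client */
    }
}

Because the client blocks until the reply arrives, the context switch is
essentially the whole cost of the call, which is why the 16 microsecond
figure above matters.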
>From: Bill Beebe
>Re: Small, fast software (Re: Writing fast compilers...)
>Date: Tue Sep 10 22:29:35 1991
>Organization: W. J. Vermillion - Winter Park, FL
>
>I'm well aware of QNX's client/server architecture, where the kernel
>essentially provides IPC services; you just plug in the various pieces/parts
>to get what you want. This appears to be a Good Thing, at least on
>paper.
Well, it doesn't run on paper. :-) It's a shrink-wrapped product that runs on
generic 80x86-based machines. With a deterministic, 16 microsecond context
switch on a 33 MHz 486, the IPC overhead of allowing client processes to use
various system services via message passing is negligible. Network, disk and
device I/O approach the physical limits of the hardware.
Yep. I've read the Plan-9 papers and it would be quite sufficient to satisfy
me. Unfortunately, it's (a) not available, and (b) currently oriented around
a departmental computing service. Solving (b) is a SMOP, but (a) is a killer.
And, yes, it looks really cool.
While I'm on the subject I have been taken to task for denigrating Mach. I
understand there is a microkernel* implementation of UNIX under Mach in the
works, and it's probably really cool too... but what's currently available
is the Mach-BSD implementation. It's hard enough getting available software
for a standalone PC, let alone stuff that's still in the mill.
On the subject of available microkernel implementations, CTOS (currently
sold as BTOS by Unisys) should go into the list. CTOS has the additional
advantage to the hip computerist of running on a totally modular bookshelf
computer...
--
-- Peter da Silva
-- Ferranti International Controls Corporation
-- Sugar Land, TX 77487-5012; +1 713 274 5180
-- "Have you hugged your wolf today?"
* Microkernel being the current term for what used to be known as a kernel
before UNIX appropriated the word.
>> Er, excuse me, but doesn't OS/2 provide this "library" modularity via
>> DLLs?
>Can you write a new file system as a DLL and have it show up as a regular
>file system? Or a pseudo-device? Shared libraries are necessary for building
>an operating system out of components, but they're not sufficient. What I'm
>thinking of is the equivalent of transferring the UNIX "toolbox" approach
>to the kernel.
Yes, that's why you can select between HPFS or DOS-based FAT file systems
on the hard drive.
Uhm, that required a fairly major rewrite of OS/2, not just the addition of
a single library.
As far as I know, it is *still* impossible, in OS/2, to write a new
filesystem -- although I suspect uSoft may be doing something in that
direction.
Note the followup-to line.
--
Sean Eric Fagan | "I made the universe, but please don't blame me for it;
s...@kithrup.COM | I had a bellyache at the time."
-----------------+ -- The Turtle (Stephen King, _It_)
Any opinions expressed are my own, and generally unpopular with others.
Can you use them both at once (acting on separate partitions, of course),
along with maybe a file system that makes FTP look like a volume or directory
or such? Or do you make the choice in CONFIG.SYS of which one you want, and
that's all you get until you reboot? (And, on an unrelated matter, does either
of these file systems handle lower-case letters in a sensible fashion, i.e.
either UNIX lower!=upper or Mac lower==upper for all purposes, but with the
file name kept as the user typed it in? Both of these are rational, defensible
choices, unlike MS-DOS' method.)
Maynard Handley
>Maynard Handley
Any given partition on any device under OS/2 can be formatted as either
FAT or HPFS. You can not switch file systems without reformatting the
partition (obviously..).
The HPFS file system preserves case in the file name, but things like
open calls are not case sensitive. The FAT file system is the same under
OS/2 as it is under DOS (naturally enough), so the same limitations
(dumb file names, etc) apply.
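A small illustration of "case-preserving but case-insensitive" naming (the
directory structure here is invented, not how HPFS actually stores things):
names are stored exactly as typed, but lookups ignore case.

#include <ctype.h>

struct name_entry { char name[32]; };   /* stored exactly as the user typed it */

/* Compare two names without regard to case. */
static int
name_eq(char *a, char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* "ReadMe.Txt" and "README.TXT" find the same entry, but a directory
 * listing still shows the original spelling. */
struct name_entry *
dir_lookup(struct name_entry *dir, int n, char *want)
{
    int i;

    for (i = 0; i < n; i++)
        if (name_eq(dir[i].name, want))
            return &dir[i];
    return 0;
}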
Another unmentioned (I think) point of interest is that OS/2 supports IFS,
Installable File Systems. You can buy a toolkit from IBM to create your
own file system, which would replace HPFS or FAT. I don't know of any
other file systems commercially available - but it can, in theory at
least, be done.
Chris
--
Christopher A. Boaro Internet: ch...@scscom.com
Systems Programmer UUCP: ..!uunet!ncrcom!ncratl!scscom!chris
SCS/Compute, Inc. AT&Tnet: +1 404 368 1040
Can there be more than one file system around? Different file systems for
different partitions, devices such as CD-ROMs, network stuff? Or are you
only allowed to have one, i.e. you have to replace HPFS or FAT with an IFS
that would cover all devices?
>
>Chris
>--
>Christopher A. Boaro Internet: ch...@scscom.com
>Systems Programmer UUCP: ..!uunet!ncrcom!ncratl!scscom!chris
>SCS/Compute, Inc. AT&Tnet: +1 404 368 1040
K. C. Lee "The (MS) DOG ate my file." -me, 101 excuses.
You can have as many file systems around as you want. People who want to
boot DOS need to keep a FAT file system around since DOS has no way of dealing
with HPFS (or any other IFS). DOS boxes (virtual machines) are, however,
able to access HPFS volumes, although the HPFS advantage of allowing long,
arbitrary file names is not accessible. It was intended that 3rd party
file systems would be available. For example, database software could benefit
from a properly designed custom file system, particularly for large databases.
Given the amount of code that we have seen recycled from earlier OS/2
1.x revs into Windows 3.0/3.1, it would not, IMHO, surprise me a bit to see
HPFS (as an IFS) reappear in the NT mode kernel.
Jon Mellott
High Speed Digital Architecture Laboratory
Internet: j...@alpha.ee.ufl.edu
>In article <1991Sep16.2...@atl.scscom.UUCP>, ch...@atl.scscom.UUCP
>(Chris Boaro) says:
>>
>>Any given partition on any device under OS/2 can be formatted as either
>>FAT or HPFS. You can not switch file systems without reformatting the
>>partition (obviously..).
> HPFS stuff deleted....
>>
>>Another unmentioned (I think) point of interest is OS/2 supports IFS,
>>Installable File Systems. You can buy a toolkit from IBM to create your
>>own file system, which would replace HPFS or FAT. I don't know of any
>>other file systems commercially available - but it can, in theory at
>>least, be done.
>Can there be more than one file system around? Different file systems for
>different partitions, devices such as CD-ROMs, network stuff? Or are you
>only allowed to have one, i.e. you have to replace HPFS or FAT with an IFS
>that would cover all devices?
If I understand your question, the answer is yes. That is, you can have
any file system in use on any given partition. For example, if you have
a 150 meg hard disk, you could have a 30 meg FAT partition and a 120 meg
HPFS partition.
Chris