On 20/03/14 15:35, BartC wrote:
>
>
> "David Brown" <david...@hesbynett.no> wrote in message
> news:lg6pe0$l3s$1...@dont-email.me...
>> On 17/03/14 11:21, Nils M Holm wrote:
>
>>> The original version of the compiler was for education and was hosted
>>> on and targeted at FreeBSD/386. Later I kept hacking it for fun and
>>> added back-ends for the x86-64, 8086 and, lately, the ARMv6. I also
>>> added runtime support for Linux, various BSDs and DOS. Support for
>>> Windows and Darwin was added by contributors.
>>>
>>> What I like about the compiler is that it is simple, easy to hack, and
>>> bootstraps in 4 seconds on a 700 MHz Raspi. Of course, its code generator
>>> is rather limited, and its code runs (on average) almost twice as long
>>> as code generated by GCC -O0, but it is sufficient for most stuff I do.
>
>> Thanks - that puts things in a more complete context. "Real" compilers
>> such as gcc are far from simple or understandable, even for people who
>> have worked with them for years. llvm is a bit clearer and more
>> structured, but it too is a huge project.
>
>> So a limited small compiler
>> for educational purposes seems like a good idea, even if it cannot be
>> used for "real" programs.
>
> Why not? There are considerable advantages to having a simple, small
> compiler, some of which the OP has pointed out.
Yes, there are advantages to having a small and simple compiler in some
contexts. But this particular compiler is very limited in the subset of
C it supports, and that greatly reduces its usefulness for normal work.
And since the generated code runs at about half the speed of gcc -O0
code, then (assuming optimised code runs 2.5 to 5 times faster than -O0
code) it will be 5 to 10 times slower than /real/ code generated by
/real/ compilers using /real/ compiler flags. Would you be happy for
all the software on your computer to run at 10-20% of normal speed, just
because the software vendor wanted to use a simpler compiler? Of course
not. So for normal work, you use a normal compiler - for educational
work or other specialised usage, you might want to use a more niche
compiler.
>
> I've always used my own compilers *and* languages, and they have been used
> to create real, commercial products too.
There are vast numbers of programming languages, many of which are
supported by different tools - and there are many good reasons for using
them. Very often you will pick the right programming language for the
task (occasionally making your own language if that's the best
solution), then pick the compiler or interpreter to suit (again,
occasionally writing your own). But if you pick a major language - such
as C - for your program, you do not then pick a very small and very
limited compiler as your development tool unless you have very
specialised needs - such as being able to run it on a tiny host, or
being able to understand the compiler's source code.
>
> I think there is a place for a simple, static compiled language other than
> the same boring choice of always using C (or sometimes, C++; same thing
> really). It can be simpler and tidier too because it will have less
> baggage.
I don't disagree with that (except that C and C++ are not the same thing
at all - and the distance between them has been growing rapidly). I
don't think the complexity or size of C compilers is the reason for
this, however - the aim would be to make a better statically compiled
language than C (for whatever value of "better" suits your purpose).
> (Although my own effort is likely to stay private.)
>
> As for speed, my last working unoptimised compiler, for x86, was on a par
> with gcc -O0 (and an experimental optimised version could just match other
> non-gcc optimising compilers). However, because typically I make it very,
> very easy to have inline asm code, it is a simple matter to optimise
> specific routines this way, and approach or surpass gcc -O1.
With inline assembly, you are no longer working in C (obviously), so it
is of no relevance in a C speed contest.
There are many reasons why one might usefully write a C compiler: for
fun, education (either your own or other people's), for specialised
processors, for specialised hosts, or as a basis for an "extended C"
language. But you don't write one for a standard processor architecture
with the aim of being fast unless you have the resources to compete with
the big names (gcc, llvm, Intel, MS, and big embedded toolchain vendors),
or unless it is for fun or education. Otherwise you need to do an
enormous amount of work (probably tens of man-years) to get close on
speed, features, and correctness, for a tool that almost no one will
ever use.
>
> As for ARM, I haven't had a go at that yet. I also noticed the lack of
> absolute addressing. But what puts me off though is, after generating ARM
> ASM, I end up having to use gcc anyway! (I assume the gnu assembler that's
> been mentioned is the one inside gcc.)
The gnu assembler is part of the gnu binutils project (along with the
linker and librarian) - it is not part of gcc. And of course if you are
writing your own compiler, you can write your own assembler and linker -
it's a lot easier than writing an optimising compiler.
>
> But a funny thing about ARM (specifically the one in the 'Pi') and gcc: the
> first C program I tried, even compiled with -O3, ran at about one third the
> expected speed. This was because gcc, thinking some pointer values were
> misaligned (they weren't misaligned in my program, and this model of ARM
> didn't have that issue anyway) was doing byte-at-a-time accesses to load
> and store values! It didn't bother to mention this small detail. I can
> tell you that any code of mine, no matter how bad, at least wouldn't
> have done that!
>
gcc will assume that pointers are properly aligned according to their
types - you have to go out of your way to lose that (such as by casting
pointers). If you have code that takes a pointer-to-char and you pass
it pointers to 32-bit values, then of course the compiler has to
generate byte-sized accesses, because that is the only legal choice in C.
There are several ways to tell the compiler about larger alignment and
access sizes, but you have to give the compiler the information before
it can generate such code. (And if your compiler breaks the relevant
rules for C and/or the hardware, that's your choice - but it is not then
a C compiler.)
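
A minimal C11 sketch of those options, for illustration only. The
function names are hypothetical, and __builtin_assume_aligned is a
GCC/clang extension rather than standard C. Which versions actually end
up with word-sized accesses depends on the compiler version, optimisation
level and target flags (for example whether unaligned access is enabled
on ARM), so treat the comments as typical behaviour, not a guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Through unsigned char pointers the compiler may only assume byte
   alignment, so in general it must use byte-sized loads and stores. */
void copy_bytes(unsigned char *dst, const unsigned char *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Through uint32_t pointers the compiler may assume every element is a
   correctly aligned 32-bit object, so it can use full 32-bit accesses.
   The caller must actually keep that promise; the asserts check it in
   debug builds. */
void copy_words(uint32_t *dst, const uint32_t *src, size_t n)
{
    assert((uintptr_t)dst % _Alignof(uint32_t) == 0);
    assert((uintptr_t)src % _Alignof(uint32_t) == 0);
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Reads a 32-bit value from memory of unknown alignment. memcpy is valid
   C for any alignment; gcc and clang commonly reduce it to a single load
   on targets that allow unaligned access. */
uint32_t load_u32(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Keeps the char pointer types but, on GCC/clang, promises the compiler
   that both pointers are 4-byte aligned (hypothetical helper). */
void copy_bytes_aligned4(unsigned char *dst, const unsigned char *src,
                         size_t n)
{
#if defined(__GNUC__)
    dst = __builtin_assume_aligned(dst, 4);
    src = __builtin_assume_aligned(src, 4);
#endif
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}
```

Buffers passed to copy_bytes_aligned4 or reinterpreted for copy_words
should be declared with _Alignas(uint32_t) (or come from malloc) so the
promise is true. Compiling with gcc -O2 -S and comparing the inner loops
is the simplest way to see what a particular compiler and target
actually generate.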