There has been a lot of work put into optimizing C compilers: the efforts
of FSF, Sun and MIPSCO to mention but three. There hasn't been a great
deal of work put into optimizing BASIC compilers, at least not that I know
of.
There seems to be a more general problem though: compilers of high-level
languages tends to produce optimal machine code, whereas compilers of
low-level languages tend to produce sub-optimal machine code.
Some time ago, code produced by an Algol68S compiler was compared to code
produced by an optimizing Fortran compiler on an ICL1906. The program
compiled from Algol68 ran 30% faster than that produced from the Fortran
compiler. The reason for this seems to be that the high-level language
compiler has a better idea of what the programmer is trying to do and can
make appropriate optimizations-- such optimizations frequently show up as
doing nothing at all where another programming language would require code
to be executed, or use a single machine instruction for coping where (C
for example), would use a loop. This form of optimization, of course, is
largely wasted on RISC machines, although the compiler can still exercise
the right to copy, say, 4 bytes at a time whereas the C program forces a
byte by byte copy.
--
John Haxby
Digital <j...@rdg.dec.com>
Reading, England <...!uknet!wessex!jch>
1. All numeric computations are done in floating point by default.
One must explicitly name a variable with a special integer suffix
code, to declare it as an integer.
2. String operations rely on the dynamic memory allocation and
garbage collection. Both are slower than the staticaly
allocated string memory / programer controlled allocation and
disposal schemes that C programmers use.
Furthermore, in PC environments, C programs often directly control the
hardware of the output devices (e.g. screen) by using pointers to the
memory associated with them (e.g. screen buffer). The same operation can
be expressed in BASIC only in a more roundabout way (usualy through a POKE
operation) and will therefore be slower.
Diomidis
--
Diomidis Spinellis Internet: <d...@doc.ic.ac.uk> UUCP: ...!uknet!icdoc!dds
Department of Computing, Imperial College, London SW7
2. To support the strings in BASIC, we need (a) a heap, (b) some garbage
collecting or reference counting technique. These will introduce some
overhead in memory management.
These language features are not unique to BASIC. And we can see that the
implementations of any languages with these features can hardly beat the
implementations of C in execution efficiency.
Hao-yang Wang
Pai Technology, Inc
Slower in what sense, rate of compilation, or the run time speed of
executables generated. C requires more than one pass for compilation, and
therefore compiles slower than Basic. Basic is much easier to compile and
interpret than C (recall in the beginning of the PC movement Basic was the
first higher level language for many machines, Altair, Commodore, Apple,
IBM). The address space for a Basic interpreter/compiler is much smaller
than for C.
C is a more accurate representation of the underlying hardware, and it is
possible for the programmer to take advantage of that knowledge. Basic is
not a stack oriented language and cannot take advantage of the hardware
stack the same way that 'C' can. The C programmer has access to low level
(by that I mean close to the hardware) operations. C supports bit map
management, pointers, compile time constant folding, the register
qualifier etc.
Also since Basic is not suitable for implementation of large projects and
system software, much less work goes into optimization.
Basic programmers just don't have the same tools at their disposal.
Bill
--
| mani...@cs.rpi.edu - in real life Bill Maniatty
[How much slower is compiled Basic than C, any why?]
Here are some thoughts. Warning: I have not looked at Basic in some
years.
2. Why is the basic compilers creating a slower product than the C
compilers?
Possible answers
a. Because of the beginner-reputation of basic,
basic programmers write more bad code than for example C programmers,
because "serious" programmers quickly change to C.
In the PC world, many implementations were interpreters, so this may have
given Basic a bad reputation for speed. This also lowers expectations for
compilers. Of course, the original Dartmouth Basic was compiled.
c. Because it is much harder to write a compiler for basic (even modern
basic) than for C, which is designed for the purpose of being
compiled, not for the programmers.
This is part of it, no doubt. It also depends what you are doing. I
would expect a good Basic compiler to be comparable to C for straight
floating-point code. Old Basics used floating-point for ALL arithmetic,
so integer code will be much slower. String manipulation will be slower,
since Basic's facilities are higher-level.
...So why is not Basic the biggest programming language today?
Well, my theory is this: When C was invented, it was made to be
simple to write a compiler for, because the compiler constructors
were not that good then.
Yes, this is certainly true. In fact, large areas of C's semantics (if
you can dignify them with that name) are defined by what is easy to
implement, not what makes sense.
C is also a lower-level language. C doesn't really `understand' strings,
only pointers to byte positions. So it is possible to write lower-level
code. It is also possible to write high-level code to a certain extent,
but it requires discipline.
[.... By the way, the guys who wrote Basic in
the first place definitely wanted it to be easy to compile -- the
first compiler at Dartmouth compiled so fast it was hard to measure
its speed. -John]
Basic was certainly designed to be easy to compile, but not
necessarily to be fast to run.
-s
We market a business-oriented BASIC-style language called BBx. It is
interpreted. Although it is quite different from Microsoft BASIC there
are still common qualities of any BASIC dialect.
A few years ago there was some discussion about producing a compiler for
it. We found several reasons why a compiler would not be a great idea.
The main reason is that the runtime dynamics of BASIC don't translate well
to a compiled environment. The most important aspects are dynamic
string/array management and dynamic flow control involving error/exception
handling. The latter is especially hard to optimize.
A C programmer tends to approach these things with a more "static" frame
of mind resorting to a dynamic model only if it is really necessary (and
knowing there will be a performance hit). Furthermore, BASIC normally
performs lots of runtime checks such as array subscripts and ranges.
Granted, a BASIC compiler could allow the option of turning these checks
off but I'm assuming a proper semantical translation.
Since our product is used primarily for business data processing
applications which spend most of their time performing file and user i/o,
the need for a compiler isn't as critical. Indeed we have been known to
beat some compiled languages in file-oriented benchmarks.
I just don't think interpreted languages compile well. If they do then
they really aren't exploiting the special strengths of interpretation.
You end up with the worst of both worlds.
--
Scott Amspoker
Basis International, Albuquerque, NM
sc...@bbx.basis.com
We market a business-oriented BASIC-style language called BBx. It is
interpreted. Although it is quite different from Microsoft BASIC there
are still common qualities of any BASIC dialect.
A few years ago there was some discussion about producing a compiler for
it. We found several reasons why a compiler would not be a great idea.
The main reason is that the runtime dynamics of BASIC don't translate well
to a compiled environment. The most important aspects are dynamic
string/array management and dynamic flow control involving error/exception
handling. The latter is especially hard to optimize.
Dynamic strings and arrays are supported in many programming languages
besides Basic. Many of these languages have compilers (in fact, many have
only compilers and not interpreters).
Reasonable error and exception handling are also supported by many
languages.
A C programmer tends to approach these things with a more "static" frame
of mind resorting to a dynamic model only if it is really necessary (and
knowing there will be a performance hit). Furthermore, BASIC normally
performs lots of runtime checks such as array subscripts and ranges.
Yes, so do other languages with compilers. A compiled range check is much
more efficient than an interpreted one, especially when analysis shows (as
is often the case) that no check needs to be performed at all.
Granted, a BASIC compiler could allow the option of turning these checks
off but I'm assuming a proper semantical translation.
Proper semantic translation does not preclude correct optimizations.
Consider the following loop:
FOR I = 1 TO N
A(I) = A(I)*3
NEXT I
Array bounds checking need only be performed once (to assure that 1..N
fall within A's subscript bounds).
Since our product is used primarily for business data processing
applications which spend most of their time performing file and user i/o,
the need for a compiler isn't as critical. Indeed we have been known to
beat some compiled languages in file-oriented benchmarks.
This is a quite different argument. This shows why you believe you do not
need a compiler, but it does not show that Basic cannot be compiled.
I just don't think interpreted languages compile well.
?! There is no such thing as an `interpreted language'. There are
interpretive implementations. True, some languages lend themselves more
to easy interpretation, and others to easy compilation, but that's an
implementor's problem. Anyway, Basic's first implementation (at
Dartmouth) was compiled.... You might consider Lisp to be an `interpreted
language'. Compilers typically speed up Lisp programs by an order of
magnitude or so over interpretive execution, often more for numerical
code. The resulting code may or may not be faster than compiled C or
Fortran code, but it is certainly faster than interpreted Lisp code.
If they do then they really aren't exploiting the special strengths
of interpretation.
The special strengths of interpretation have mostly to do with debugging
and quick turn-around of code corrections.
You end up with the worst of both worlds. --
Nonsense. There are also many environments (e.g. most Lisp environments)
which permit mixing interpreted and compiled code. This gives you in many
ways the best of both worlds.
-s
I just don't think interpreted languages compile well. If they do then
they really aren't exploiting the special strengths of interpretation.
You end up with the worst of both worlds.
I think interpreted languages _could_ compile well if _lots_ of effort
were expended. However, that kind of effort probably isn't worth it --
the performance gains compared to the effort, that is -- when contrasted
to the effort expended in buying new, faster processors, making the
interpreters run faster (often quite easy), translating programs to
maintenable code in a language designed for compilation, and so on.
I remember a discussion with rms (GNU EMACS author) regarding TECO, the
language in which he first wrote EMACS. TECO was (and still is,
primarily) an interpreted language (well, it's an editor, kind of like the
MS-DOS DEBUG program is a partition editor :-). He told me about how
someone we both knew had developed (or help develop) a compiler for TECO
and was prepared to demonstrate its superiority in benchmarks. But
submitting one simple case (something like deleting every other line in
the file) proved the opposite. (Of course, TECO is probably like APL in
that the time it takes to read in source code and figure out what it means
is minimal, as compared to C, FORTRAN, and COBOL. So the penalty for
being an interpreter could be seen as significantly lower for terse
languages like TECO and APL, higher for verbose languages like BASIC,
FORTRAN, and COBOL.)
As already mentioned, the way a language is _expected_ by a program to be
implemented has a noticable effect on how code is written for that
language. So do language features, of course.
C has facilities for easily denoting static and automatic storage, but not
for heap storage or automatically readjustable storage. C can't even have
functions that return strings without requiring the programmer to impose
one method or another for managing the strings (contrasted to PL/I, which
has the facility and leaves imposition of the technique to the compiler
designers). Thus, C programmers are acutely aware of what techniques are
likely to be expensive -- if it's easy to do in the raw C language, it's
likely to be fast, otherwise.... Add to that the awareness that C is a
compiled language, and C programmers realize that it's often better to
have lots of _written_ code that expresses a complicated implementation in
an optimal manner using fundamental features of the language.
BASIC has facilities for easily denoting readjustable storage such as
strings, arrays, and so on, and it tends to be implemented as an
interpreter. (The Dartmouth implementation has never been an interpreter,
I understand, nor was the implementation whose internals I hacked as a
teenager -- but the latter offered an interpretive environment yet no
compiled environment, e.g. you couldn't really get the output of the
compiler, just execute it by typing RUN.) Thus, programmers naturally
tend to use the more concise notations for higher-level constructs offered
by BASIC rather than try and reengineer them using what they think will be
faster (but more verbose) low-level operations, especially given that an
interpreter tends to "dislike" verbosity. (On the other hand, I often
wrote utility programs in BASIC and consciously chose to use integer
variables to manipulate characters in critical areas of the code since I
knew that the string operations were much slower. But then I knew my
particular processor compiled, rather than interpreted, my programs --
once per RUN command.)
Of course, what we are discussing generally applies to one or two "slices"
of the compiled vs. interpreted issue. One such slice is whether there is
a separate compilation phase. Another is the way in which the user
interacts with the processor when developing code. There is another major
issue regarding interpretation which puts both BASIC and C in the
"compiled" camp: whether the running program has a means to modify itself
at run time. LISP, for example, belongs in the "interpreted" camp.
Looked at in this light, it would seem that a good BASIC compiler would be
easier to build than a good LISP compiler, since the latter has to worry
about the running program, in effect, changing itself. Yet, from what
little I know of LISP, there apparently are a number of quite good LISP
compilers out there. This suggests that perhaps the effort needed to make
a really good BASIC compiler isn't quite so high. Some of the techniques
I've learned about in doing a good job of compiling FORTRAN programs
(especially dusty decks that use no formal loop control but GOTO instead),
plus others I've dreamt up that aren't needed (generally) in compilers for
languages like C, wouldn't be hard to implement in this day and age and
would probably make lots of BASIC programs run quite fast.
Hmm, anyone want to fund me to develop GNU BASIC now that I'm winding up
doing GNU FORTRAN? I might as well reach even earlier into my childhood!
(Heck, I'm even open to doing GNU AID [by JOSS] or a GNU simulator for the
PDP-8. :-)
--
James Craig Burley, Software Craftsperson bur...@gnu.ai.mit.edu
This is German software and can be interpreted or compiled. There are a
good few lessons to others about how to make GUI programming simple.
Development is ongoing.
A number of years ago I needed to write a specialist editor/comms program
on the AtariST. This was done using an early version of the language (the
compiler arrived with me during writing the program). In a week or so of
odd moments I succeeded in producing a program with a full GUI, error
handling (alert boxs etc.). No assembler whatever was needed and
execution speed was excellent when compiled with comparable execution
speed to C or Modula2. The same people marketed a serious drafting
package...
What about Basic2C?
There are a number of major commercial business packages written in the UK
which use compiled Basic. They work fine but unless you look or know how
can you tell the language used?
I don't use that implementation of Basic now but it looks much improved.
(mostly Modula2 or C for pc's, Forth and assembler for embedded)
TC.
E-mail: tcha...@black.demon.co.uk or tcha...@cix.compulink.co.uk
I have been writing a compiler based loosely on AmigaBASIC - a Microsoft
derivative - for the best part of a year now, and have naturally had to
deal with these kind of issues. Re: strings memory allocation, I have
compromised. Strings in ACE have a default length of 1K. Small, I know.
However, the programmer can optionally declare a variable of any type
including strings. The only non-optional declarations are for arrays. With
strings, the programmer can specify smaller (say 100 bytes or less) or
larger strings (limited only by available memory). For many applications,
1K is ample. Indeed, 512 bytes is often enough, and I may yet reduce the
default to that. I know that many of you are thinking, "geez what a waste
of memory." I concede that it is not by any means optimal, but it does
avoid having to worry about garbage collection. More importantly, it works
quite well!
By the way, if you are interested, I hope to have ACE in the PD sometime
in the next few months. It has everything from simple BASIC stuff to
structures, pointers, turtle graphics, powerful sound and wave statements,
case...end case etc.
Regards...
--
db...@leven.appcomp.utas.edu.au
David Benn - Applied Computing, University of Tasmania at Launceston.
I'm surprised nobody has mentioned the Deutsch-Shiffman Smalltalk-80
virtual machine implementation or the SELF system. Neither system is a
`traditional' compiler, but both run fast and are examples of `dynamic'
languages (little information for static optimization) with efficient
implementations. Their results shold apply to BASIC and to some other
languages.
Smalltalk programs lack type declarations. Programmers are encouraged to
modularize extensively, resulting in very small procedures (methods).
Some common control structures are implemented using method invocation.
Smalltalk is more dynamic than basic, yet a ``fairly straightforward''
dynamic compilation implemenation of the Smalltalk-80 Virtual machine runs
benchmarks within an order of magnitude of compiled C, while an optimized
interpreter is 2X slower.
SELF is more dynamic than even Smalltalk-80. Even primitive control
structures (if/then/else, loops) are built using method invocation [*].
Yet on a set of benchmarks, SELF programs run with an optimizing dynamic
compiler run at about half the speed of modestly-optimized C programs (on
a SPARC, using the default pcc-based compiler with optimization turned
on). With an interpretive implementation, SELF programs might be two
orders of magnitude slower than it is; indeed, programs compiled without
inlining (procedure integration) are about that much slower.
The disadvantages of dynamic compilation generally include (a) the need to
pay for compilation at runtime and (b) a lack of ways to share code (e.g.,
each instance of program `foo' has a private copy of the compiled machine
code). The first sometimes results in long startup times or variability
in execution time. The latter results in large(r) memory demands.
[*] Note that in Smalltalk, some control structures appear to be message
sends but are inlined; changing the implementation code for the control
structures does not change the system's behavior. In SELF, the control
structures are treated uniformly as message sends.
5 References Follow:
%A Craig Chambers
%T The Design and Implementation of the S\s-2ELF\s+2 Compiler,
an Optimizing Compiler for Object-Oriented Programming Languages
%R Ph.D. dissertation, Department of Computer Science, Stanford
University
%A Craig Chambers
%A David Ungar
%T Making Pure Object-Oriented Languages Practical
%J OOPSLA '91 Proceedings; SIGPLAN Notices
%V 26
%N 11
%P 1-15
%C Phoenix, AZ
%D November 1991
%A Craig Chambers
%A David Ungar
%A Elgin Lee
%T An Efficient Implementation of SELF, a Dynamically-Typed
Object-Oriented Language Based on Prototypes
%J OOPSLA '89 Proceedings
%D October 1-6 1989
%P 49-70
%A Craig Chambers
%A David Ungar
%T Customization: Obptimizing Compiler Technology for SELF, a
Dynamically-Typed Object-Oriented Programming Language
%J 0-89791-306-X/89/006/0146
%D 1989
%P 146-160
%A Peter Deutsch
%A Alan M. Schiffman
%T Efficient Implementation of the Smalltalk-80 System
%J 11th Annual Symposium on Principles of Programming Languages
(POPL 11)
%D January 1984
%P 297-302
;-D on ( Just call me Mr. Runtime! ) Pardo
[Many APL systems (other than IBM's) also compile on the fly since it can't
usually tell what the types and array rank of variables will be until a
procedure is actually called. -John]
I think interpreted languages _could_ compile well if _lots_ of effort
were expended. However, that kind of effort probably isn't worth it --
Could you please define `interpreted language'? Is it anything more than
`a language whose first implementation was interpretive'? In the present
discussion, the language under question is Basic, whose first (and only
for a long time) implementation was in fact compiled. Other languages you
might call `interpreted', like Lisp, Prolog, ML, and Snobol, in fact have
very good compilers, which make major differences in runtime.
the performance gains compared to the effort, that is -- when contrasted
to the effort expended in buying new, faster processors,
The compiler speedup is complementary to the hardware speedup.
making the interpreters run faster (often quite easy),
Interpreters can, of course, be bummed (and often are). But in many
cases, this still leaves you far from the compiled speed.
translating programs to
maintenable code in a language designed for compilation, and so on.
This is a possible approach. However, you lose whatever advantages the
`interpreted' language had, e.g. more appropriate abstractions, easier
debugging, fast turnaround, etc. There are of course compiler-based
systems that provide easy debugging and fast turnaround e.g. most Lisp
compilers, but also the CenterLine C (formerly Saber C) system.
I remember a discussion with rms (GNU EMACS author) regarding TECO, the
language in which he first wrote EMACS. TECO was (and still is,
primarily) an interpreted language (well, it's an editor, kind of like the
MS-DOS DEBUG program is a partition editor :-).
Teco is a weird language. Most commands are single-character, and the
main loop of the interpreter essentially dispatches directly to the
relevant routine. So it is a sort of human-readable (well, this is
debatable, too) byte-compiled form.
He told me about how someone we both knew had developed (or help
develop) a compiler for TECO and was prepared to demonstrate its
superiority in benchmarks. But submitting one simple case
(something like deleting every other line in the file) proved the
opposite.
Teco is an extreme case, but even so, you could probably _slightly_ reduce
runtime by compiling to machine code. I certainly agree that in most
cases, it would not be worthwhile to try to compile Teco code....
(Of course, TECO is probably like APL in that the time
it takes to read in source code and figure out what it means is
minimal, as compared to C, FORTRAN, and COBOL. So the penalty for
being an interpreter could be seen as significantly lower for terse
languages like TECO and APL,
Teco is not just terse, it requires no lexing or parsing. Like APL, many
of its operators are bulk operators and so most runtime is within their
inner loops. But even APL has been compiled with good results....
higher for verbose languages like
BASIC, FORTRAN, and COBOL.)
...There is another major issue regarding interpretation which puts
both BASIC and C in the "compiled" camp: whether the running
program has a means to modify itself at run time. LISP, for
example, belongs in the "interpreted" camp.
This is not really a problem. The amount of Lisp code which actually
modifies the program written by the programmer is just minuscule. Much
more code _generates_ pieces of Lisp, typically using the `macro' feature.
But almost all cases of macro usage are handled straightforwardly in Lisp
compilers. There are a few cases where Lisp code generates code on the
fly, then executes it. This is handled by having an interpreter loaded
along with the compiled code, and assuring that code can work correctly in
such a mixed environment. In fact, most Lisp implementations _assume_ a
mixed environment....
-s
Compiled Prolog is almost invariably a slightly different language from
the original interpreted Prolog. The reason for this is the following:
>bur...@geech.gnu.ai.mit.edu (Craig Burley) writes:
> ...There is another major issue regarding interpretation which puts
> both BASIC and C in the "compiled" camp: whether the running
> program has a means to modify itself at run time. LISP, for
> example, belongs in the "interpreted" camp.
>
>This is not really a problem. The amount of Lisp code which actually
>modifies the program written by the programmer is just minuscule. Much
>more code _generates_ pieces of Lisp, typically using the `macro' feature.
>But almost all cases of macro usage are handled straightforwardly in Lisp
>compilers. There are a few cases where Lisp code generates code on the
>fly, then executes it. This is handled by having an interpreter loaded
>along with the compiled code, and assuring that code can work correctly in
>such a mixed environment. In fact, most Lisp implementations _assume_ a
>mixed environment....
(I think that a good definition of interpreted language is one where even
if you compile it, you must load an interpreter or compiler with the
object code.)
Many old Prolog programs modify themselves using assert and retract.
Prolog compilers handle this by using the above techniques, but also by
making a slight change to the language: programmers are required to
declare predicates that will be modified with assert/retract as "dynamic".
Without these declarations, it is very difficult get reasonable efficiency
of the compiled code.
--
Fergus Henderson f...@munta.cs.mu.OZ.AU
I used a TECO variant on the DECSYSTEM-20 that we had at school. The
manual claimed to compile the code on the fly before it executed it. The
claimed speedup was on the order of a factor of 5 due to some optimization
tricks. It was called something like NTECO, or something like that.
Warner
--
Warner Losh i...@Solbourne.COM
bur...@geech.gnu.ai.mit.edu (Craig Burley) writes:
I think interpreted languages _could_ compile well if _lots_ of effort
were expended. However, that kind of effort probably isn't worth it --
Could you please define `interpreted language'? Is it anything more than
`a language whose first implementation was interpretive'? In the present
discussion, the language under question is Basic, whose first (and only
for a long time) implementation was in fact compiled. Other languages you
might call `interpreted', like Lisp, Prolog, ML, and Snobol, in fact have
very good compilers, which make major differences in runtime.
I was using the apparent prevailing definition of "interpreted language"
which I then subsequently broke down in a few (but not all) ways. The
prevailing definition, to me, seemed to be "a language I believe is most
often implemented as an interpreter", where "I" is whoever posted the
original question.
Meanwhile, as I think I also pointed out in my post, BASIC was indeed
originally a compiled language, and the first implementation I ever used
(and hacked myself) was compiled. (Note, however, that anyone starting
out using that particular implementation of BASIC would likely assume it
was _interpreted_, and their coding style sometimes changed in obvious
ways once they learned otherwise. A lot of what I was addressing in my
post had to do with how knowledge of the implementation can govern how
code is written, and this can then affect how easy it is to build a
different implementation that mproves performance for existing code.)
the performance gains compared to the effort, that is -- when contrasted
to the effort expended in buying new, faster processors,
The compiler speedup is complementary to the hardware speedup.
Agreed. Memory also is becoming much more plentiful, which generally
improves the ease with which compilers are written more than it does for
interpreters. Nevertheless, interpreters are still, to varying degrees
for each language, easier to write than compilers. Most people experience
BASIC as a personal computer implementation, essentially all of which have
been interpreters. That's probably because building a half-decent
compiler to run on a 4K machine is much harder than building a half-decent
interpreter for that machine.
making the interpreters run faster (often quite easy),
Interpreters can, of course, be bummed (and often are). But in many
cases, this still leaves you far from the compiled speed.
Obviously. But writing the compiler is the trick. Simply writing _a_
compiler is not enough -- you must write _the_ compiler that produces
output that consistently and greatly exceeds the performance of the
fastest interpreter likely, since the fastest interpreter is probably much
easier to build than even a half-decent compiler, and is probably more
portable.
translating programs to
maintenable code in a language designed for compilation, and so on.
This is a possible approach. However, you lose whatever advantages the
`interpreted' language had, e.g. more appropriate abstractions, easier
debugging, fast turnaround, etc. There are of course compiler-based
systems that provide easy debugging and fast turnaround e.g. most Lisp
compilers, but also the CenterLine C (formerly Saber C) system.
Yes, but from what little I've seen, people who write code in BASIC rarely
have a problem with the speed of the resulting program. When they do,
they often are "ready" to move "up" to a language designed for high-speed
execution via compilation, such as C, Pascal, or Fortran, if not Assembly.
It would seem to make sense that there would be a large market for a
top-quality compiler for "interpreted languages" (again, languages for
which most users have access only to an interpreted implementation) such
as BASIC, Hypertalk, and so on, and the existence of this market would
cause such a compiler to be built. As far as I know, such compilers don't
permeate the market the way compilers for C, Pascal, Fortran, or perhaps
even LISP do. I can't say why that is, except to do the usual "punt to
market forces".
I remember a discussion with rms (GNU EMACS author) regarding TECO, the
language in which he first wrote EMACS. TECO was (and still is,
primarily) an interpreted language (well, it's an editor, kind of like
the MS-DOS DEBUG program is a partition editor :-).
Teco is a weird language. Most commands are single-character, and the
main loop of the interpreter essentially dispatches directly to the
relevant routine. So it is a sort of human-readable (well, this is
debatable, too) byte-compiled form.
Obviously it is more human-readable than an object file. One doesn't have
to worry about endianness. :-)
He told me about how someone we both knew had developed (or help
develop) a compiler for TECO and was prepared to demonstrate its
superiority in benchmarks. But submitting one simple case
(something like deleting every other line in the file) proved the
opposite.
Teco is an extreme case, but even so, you could probably _slightly_ reduce
runtime by compiling to machine code. I certainly agree that in most
cases, it would not be worthwhile to try to compile Teco code....
Right, and this makes the likelihood of being able to fund development of
a top-quality compiler for TECO more difficult. However, it is
technically feasible, probably roughly as feasible as doing a top-quality
BASIC compiler.
Remember, the original question was "why is compiled BASIC slower than
[compiled] C?" I'm saying the answer has a lot to do with the fact that
there doesn't seem to be enough of a market for a top-quality BASIC
compiler that might be able to challenge the superiority of C compilers.
On the other hand, an OOP BASIC compiler might be able to make a good run
at most C++ implementations.
(Of course, TECO is probably like APL in that the time
it takes to read in source code and figure out what it means is
minimal, as compared to C, FORTRAN, and COBOL. So the penalty for
being an interpreter could be seen as significantly lower for terse
languages like TECO and APL,
Teco is not just terse, it requires no lexing or parsing. Like APL, many
of its operators are bulk operators and so most runtime is within their
inner loops. But even APL has been compiled with good results....
Yes, and even TECO, BASIC, and all other languages _can_ be compiled with
excellent results, perhaps even superior to what can be achieved via
translation to C. I know, for instance, that a lot of Fortran code cannot
be translated to C without losing some speed (unless the compilers
globally optimize entire programs) and without using extensions to the
language that I've rarely seen implemented.
The question is, why aren't there lots of great compilers for BASIC? The
answer seems to be "because there aren't lots of people demanding them".
...There is another major issue regarding interpretation which puts
both BASIC and C in the "compiled" camp: whether the running
program has a means to modify itself at run time. LISP, for
example, belongs in the "interpreted" camp.
This is not really a problem. The amount of Lisp code which actually
modifies the program written by the programmer is just minuscule. Much
more code _generates_ pieces of Lisp, typically using the `macro' feature.
But almost all cases of macro usage are handled straightforwardly in Lisp
compilers. There are a few cases where Lisp code generates code on the
fly, then executes it. This is handled by having an interpreter loaded
along with the compiled code, and assuring that code can work correctly in
such a mixed environment. In fact, most Lisp implementations _assume_ a
mixed environment....
I don't feel it is a _problem_, but it is, I think, a more important
_issue_ in defining what constitutes an _interpreted_ vs. _compiled_
language. Ideally, one might say a compiled language is one where all the
facilities needed by a program written in that language are identifiable
by analyzing the source program; an interpreted language is one where all
the facilities are available to the running program at all times. By
"facilities" I mean things like library routines, language constructs, I/O
capabilities, and so on.
By this definition, for existence, one could say IBM's JCL is a fairly
pure compiled language, at least as well as I understood it way back when.
Your source program had to identify _all_ the input and output files (data
sets) explicitly -- no runtime determination of which file to open and when
to open it. This seems artificially constraining, but it's really great
for some things that are "meta-language" issues, like scheduling of tasks
(you don't get "deadly embrace" in an environment where multiple JCL jobs
run at the same time like you might in a traditional interactive-shell-
script-as-batch-job environment).
On the other hand, LISP is a fairly pure interpreted language, since,
while running, a program can read in or otherwise construct an entirely
new program and run that. It also can examine and modify itself as
needed.
Fortran and C lean more toward the compiled side, Fortran-77 more so since
its memory needs are statically determinable without global
interprocedural analysis. But they both offer the ability to dynamically
determine names of files to be opened and closed, and both offer dynamic
determination of formatting for strings. Hence, any Fortran or C program
that uses a run-time format or printf-type string, especially if that
run-time string is pulled from an input stream, the compiler must ensure
that _all_ facilities accessible via that mechanism are available at run
time.
In some ways, BASIC is even more of a compiled language than Fortran and
C. (However, this is based on awfully old and probably faulty knowledge
of the full scope of the language.) I don't recall that it has anything
quite like _run-time_ formatting of strings. On the other hand, its
memory needs, unlike Fortran, are not statically determinable. (The
implementation I started out with just let arrays, even string arrays,
grow as much as needed, like C can do via malloc.)
Of course, a big determinant of how easy it is to build a compiler that
produces excellent code is how extensive the built-in type mechanism is
and how much it is used. C and even Fortran are pretty well endowed in
this respect because they offer a fair number of frequently useful types
and programmers tend to use those types appropriately. LISP, Smalltalk,
and BASIC, to varying degrees, offer less of some types (varying kinds of
integers and floats) and more of others (varying-length strings, dynamic
and/or sparse arrays), and programmers in those languages might not always
be as careful to use the most restrictive type minimally necessary to get
the job done as they are with Fortran and C. So, the compiler's job must
include the ability to determine, in place of the programmer, what more
restricted type can be gotten away with to produce more efficient code. C
and Fortran compilers generally don't have to do this to do what is widely
considered to be a good job of compilation. A good BASIC compiler
probably would.
Practically speaking, lots of what makes a compiler or interpreter hard or
easy to implement is not so much the theoretical things as the practical.
C programmers have long designed and implemented code with the idea that
memory is basically flat, pointers can be passed around freely, and so on.
A C interpreter would have to deal with things like how to do ioctl() or
other things where it might need to determine how the programmer expected
memory to be laid out. That might be easy or hard depending on how the
interpreter is designed and what other problems it is solving. A Fortran
interpreter that executes as it reads code without first reading the
entire procedure can't be built anyway, since a DATA statement that
specifies initial values for variables can be at the end of a procedure.
BASIC, on the other hand, ws pretty much designed for this kind of
execution, from what I can remember -- it didn't, years ago, even have to
worry about "separately compiled modules" (where the initial value for a
global variable used by, say, the main program is contained in some other
file or procedure).
And it's actually easier in some ways to build a compiler for a structured
language, as far as how to handle a GOTO from one block to another, as in:
... { ... goto foo; ... }
... { ... { ... { ... foo: ... } ... } ... } ...
The compiler of code like this in ANSI C or Fortran-77 has much less to
worry about than a canonical interpreter, the latter having to worry about
playing around with stacked blocks and so on. This may seem obvious to
most of the readers here, but 've seen at least one implementation of an
interpreter where the designer seemed to have forgotten about cases like
this (it was for a shell language much like PL/I), or at least the code
didn't handle it right.
Ultimately, interpreters and compilers seem to be one of those areas in
which theory is fine, but practice is what governs both perceptions and
realities. There don't seem to be adequate definitions of terms (but then
I'm no theorist) nor summaries of key language facilities that distinguish
between the various languages described (to which could be usefully added
and researched: Forth, MUMPS, Eiffel, ADA...). I wonder how hard or easy
(or useful) it would be to design a language that had facilities that met
today's and tomorrow's projected needs, yet lent itself well to pure
interpretation, pure global compilation, and other flavors in between the
extremes?
--
James Craig Burley, Software Craftsperson bur...@gnu.ai.mit.edu
Are you sure it is?
>2. Why is the basic compilers creating a slower product than the C
>compilers?
[stuff deleted]
> e. Other (?)
Because C is pretty trendy at the moment and so most of the effort in
improving compilers is going into C. Compiled BASIC has never been very
trendy.
I've usually found a good FORTRAN compiler to be better than any C
compiler. That's not because FORTRAN is 'better' than C, it's just
because a lot of people were once interested in making FORTRAN programs
run very fast, so they spent the money to sort the compilers out.
Paul
--
+ Paul Goffin + Crosfield Electronics Ltd. U.K. +44 442 230000x3357 +
[Fortran is easier to optimize than C, mostly because it doesn't allow a lot
of the aliasing possible in C. -John]
True. I could write a small interpreter, be it for Basic or Lisp, with a
fairly reasonable effort, because I could write it without any competence
in (my case) Sparc assembly. If I were to write a compiler for the same
language, I would have to start worrying about register windows and all
that stuff.
When it comes to the early personal computer Basic interpreters, though, I
would suspect that people writing them were more fluent with assembly that
high level languages.
--
Pertti Kellom\"aki
Tampere Univ. of TeXnology
Software Systems Lab
... or by having a COMPILER loaded along with the compiled code ...
Also, much of the kind of programming that people used to do by generating
and executing code on the fly, is now done instead by using functional
programming and closures. In other words, using "lambda" instead of "eval".
Robert Munyer
rob...@metropolis.com