Yet-Another-Crazy-Language-No-one-Will-Ever-Use-Anyway-Except-The-Designer-So-What's-The-Point
!!!
Here it is:
I'd like to create a Virtual Assembly Language. That is, a very
low-level language that uses assembly-like syntax (i.e. no control
structures, no OOP, no if/else statements, etc.). This basically
amounts to designing a simplified, virtual processor. I posted about
this idea in several newsgroups a while back (around the beginning of
the year). I got three main types of feedback:
- The main type I got was people flaming me for being off-topic (which
I guess I sort of was in some cases). Highly helpful!
- Then there were those that said it couldn't be done.
- Others (sometimes, in fact, the same ones as previously!) tried to
convince me that it had already been done many times.
As far as I'm concerned, the last two types are mutually exclusive, so I
couldn't really take people seriously when they preached both. It ends
up giving one the impression that all they really want is for me to
abandon this silly idea, so, of course, I just ignored them!
Usenet's all very well, but it's very conservative. New ideas don't
always go down well.
Anyway, the more I work on this idea, the more feasible it seems, so
I'm not going to be giving up too soon!
I realised that in order to design it in such a way as to make it easy
to implement (in an optimal way), it is necessary to understand the
basic workings of several radically different architectures, so I
recently downloaded lots of info, and I'm currently making my way
through the SPARC Architecture Manual V8.
If you know a bit (it needn't be a lot) about other architectures, you
could be a great help! Please let me know!
You're probably wondering what I'm doing all this for. Here are some of
the goals/advantages:
- Programs that are now written in real assembly language could be
written in virtual assembly language instead, which would make them
portable while still providing many low-level features and being very
fast. After all, even something as simple-seeming as an if/else
statement gets compiled into several lines of assembly language which
are hard to make optimal in a general sort of way. This is of course a
classic argument in favor of assembly language. The reason it applies
here is that my virtual assembly language will deal directly with
registers, jumps, etc. This may seem unportable, but I've thought a
lot about it, and actually it can be quite portable. Details if you're
interested.
- Higher-level languages could be compiled into it the same way they
are compiled into assembly language, only for a different, virtual
processor. This means only one compiler is needed, and the VAL
(Virtual Assembly Language) code assembles into a compact binary
format, so source code needn't be distributed in order for a program
to be portable (as it must be in the case of C).
- It could be very useful in a distributed operating system, because
if all programs were written using it, the same system could
integrate any machine: the virtual code for a program would be sent
upon request and translated to real assembly as it comes in. This
couldn't be done with high-level languages, because the compilation
would take too long, and the source code would waste a tremendous
amount of network bandwidth.
If you think of any others, please tell me!! <g>
Thanks a lot!
Jonathan Neve.
--
The manual said "Requires Windows95 or better", so I installed Linux!
|> Usenet's all very well, but it's very conservative. New ideas don't
|> always go down well.
Unfortunately:
1) The basic idea, and variants thereof, is over 40 years old and
has been tried many times, generally unsuccessfully. I will conjecture on
the reasons for this a bit later, but there is not the slightest doubt
in the world that:
(a) It's been proposed before.
(b) And it has not been very successful over any long period.
AND (c) People familiar with computing history understand this, and hence,
it's not a case of Usenet rejecting a brilliant new idea, but of it not
being interested in a rather tired old one ... sorry.
2) This goes back, at least, to UNCOL: Universal Computer-Oriented
Language, proposed in the mid-1950s... It is easy to find references
to UNCOL, for example:
http://www.hole69.freeserve.co.uk/Uncol.html
It also mentions ANDF, one of the more recent attempts, and has a good quote:
"These are only some of the attempts in creating an UNCOL. Many have
been unsuccessful and are rarely talked about, thus making information
on this very difficult to obtain. Also, there have been many attempts
of creating parts of an UNCOL with the hope of developing these into a
true UNCOL when machine development and human knowledge has the
ability to do so."
"Problem with finding an UNCOL is that it's really hard, on a par with, say, automated translation of poetry from English into Chinese."
Of course, at the binary level, one has Smalltalk, JAVA byte-codes, etc,
but that's not what was proposed.
2) There's an IEEE Standard for Microprocessor Assembly Language,
694-1985(R1994)
http://standards.ieee.org/catalog/olis/micro.html
(I don't know, offhand, if anything in particular obeys this standard).
3) Over the years, there have been numerous languages attempting to be
slightly above assembler, although not necessarily in the way that you suggest.
I think of IBM's PL/S (~1970ish), Intel PL/M, etc. BCPL was higher,
but not very high, and there were numerous others.
4) Another interesting example was LIL (Little Implementation Language),
written by P. J. Plauger at Bell Labs in the early 1970s, i.e.,
contemporaneous with C. I won't bother to hunt down the post-mortem
memo he wrote, but the insights on LIL's inability to find a niche
remain. In particular, he noted that every time it looked like LIL
might start getting more use, Dennis improved C.
5) Once upon a time, most vendor OS's and systems code were written in
assembly code, with a few notable exceptions, like Burroughs' use of
ALGOL (and ESPOL, the systems derivative), MULTICS (PL/I).
This was certainly true in the early 1970s.
There were two distinct, but inter-related issues:
(a) people started with assembly language for a machine, or a family,
possibly with multiple feature sets, and wanted something a little
higher-level, but not so high-level that they couldn't get to specific
machine features, or couldn't get very good code.
Attempts to solve this problem typically involved making assembly
languages with more features, or fairly low-level languages (like BCPL).
(b) Good compiler construction was hard work, portable compilers were
few and far between, code generation strategies were rather ad hoc, etc.
The wish to fix this was embodied in UNCOL, with the idea that if you
had M languages and N machines, instead of having M*N compilers, you
could have M translators to UNCOL, and N translators from UNCOL to machine
code. As noted, this never really made it.
6) What really happened (in mid-1970s)
(a) Compiler technology got better, and tools for building
compilers got better, and code generation got much more formalized.
[I.e., this is when things like lex and yacc got written,
as well as a lot of the Aho/Ullman work on compiler theory.]
(b) Inside Bell Labs, C ran over the other internal contenders
for this niche, i.e. "high-level assembler language", and was
good enough that people shifted away from assembler code into
C for large amounts of systems code. People ported Dennis'
PDP-11 compiler to various other machines, but there was still a real
hassle with the differences of system environments.
Nevertheless, by the mid-1970s, people were shifting, even for systems
code on small minicomputers, into the mode that has persisted:
(1) Use assembler code for access to machine-dependent features.
(2) Write assembler code for heavily-used libraries.
(3) Otherwise, use C (or even better, shell, awk, other
scripting languages, etc).
(c) BTL then ported UNIX to an Interdata 8/32, and as an integral
part of that effort, Steve Johnson wrote the Portable C Compiler,
geared to make for relatively easy retargeting. A key reference
would be: S. C. Johnson and D. M. Ritchie, "Portability of C
Programs and the UNIX System", Bell System Technical Journal,
July-August 1978, Vol. 57, No. 6, Part 2, 2021-2048.
[UNIX got popular; C got popular, even outside UNIX environments.]
(d) In the early 1980s, another round of improvements came in
compiler optimization techniques [especially: IBM, HP, Stanford],
and as RISCs came in, with good optimizing compilers, there was
even less need to write assembly code, as the compilers were
sometimes amazingly good. [In some cases, at MIPS, in the early days,
I wrote assembler code for routines for which one typically
wrote assembler code ... only to find out that the compiler-generated
code was getting "good enough".]
7) By now:
(a) People reserve assembler language code for the very
most machine-specific code, particularly that which is not well-expressed
by high-level languages. When they want that detailed control,
that's what they want ... the *last* thing in the world they want
is some slightly-above-assembler-language thing that doesn't let
them do what they need to do, in the name of a portability that
isn't very useful.
(b) And otherwise, write in whatever language makes sense, of
which C would be the *lowest* level.
(c) These days, it is far easier to go from high-level languages
to machine code, with the sorts of machine-specific tables that
retargetable compilers have had for years, than it is to want
to have humans writing masses of higher-than-assembler, lower-than-C
code.
8) So, in summary:
(a) This is an old, old idea.
(b) It's been tried, in various forms, many times.
(c) It hasn't been successful, and is not very likely to be.
--
-John Mashey EMAIL: ma...@sgi.com DDD: 650-933-3090 FAX: 650-933-2663
USPS: SGI 1600 Amphitheatre Pkwy., ms. 562, Mountain View, CA 94043-1351
SGI employee 25% time, non-conflicting,local, consulting elsewise.
This is not AT ALL what I'm trying to do! Rather, I want to go the
other way: write assembly language code for a virtual processor, which
is designed in such a way as to make the translation not too damaging
to the speed of the code, no matter which processor it is translated
on. I don't want "something a little higher-level".
> Attempts to solve this problem typically involved making assembly
> languages with more features, or fairly low-level languages (like BCPL).
>
> (b) Good compiler construction was hard work, portable compilers were
> few and far between, code generation strategies were rather ad hoc, etc.
> The wish to fix this was embodied in UNCOL, with the idea that if you
> had M languages and N machines, instead having M*N compilers, you
> could have M translators to UNCOL, and N translators from UNCOL to machine
> code. As noted, this never really made it.
Fine. However, I'm not trying to do that. It might be another
advantage if I could, but I could do without the ability to have
high-level languages compile efficiently. I don't quite see why not,
though.
> 6) What really happened (in mid-1970s)
> (a) Compiler technology got better, and tools for building
> compilers got better, and code generation got much more formalized.
> [I.e., this is when things like lex and yacc got written,
> as well as a lot of the Aho/Ullman work on compiler theory.]
>
> (b) Inside Bell Labs, C ran over the other internal contenders
> for this niche, i.e. "high-level assembler language", and was
> good enough that people shifted away from assembler code into
> C for large amounts of systems code. People ported Dennis'
> PDP-11 compiler to various other machines, but there was still a real
> hassle with the differences of system environments.
> Nevertheless, by the mid-1970s, people were shifting, even for systems
> code on small minicomputers, into the mode that has persisted:
> (1) Use assembler code for access to machine-dependent features.
> (2) Write assembler code for heavily-used libraries.
> (3) Otherwise, use C (or even better, shell, awk, other
> scripting languages, etc).
"even better"?!? That sort of depends what you're trying to do! I'm
quite glad all my apps aren't written in Java or something crazy like
that! Scripting languages are fine for, well... scripts!! It's not
enough just to say that the hardware is getting better and faster all
the time, so we can afford sloppy, cumbersome, slow code.
>
> (c) BTL then ported UNIX to an Interdata 8/32, and as an integral
> part of that effort, Steve Johnson wrote the Portable C Compiler,
> geared to make for relatively easy retargeting. A key reference
> would be: S. C. Johnson and D. M. Ritchie, "Portability of C
> Programs and the UNIX System", Bell System Technical Journal,
> July-August 1978, Vol. 57, No. 6, Part 2, 2021-2048.
>
> [UNIX got popular; C got popular, even outside UNIX environments.]
>
> (d) In the early 1980s, another round of improvements came in
> compiler optimization techniques [especially: IBM, HP, Stanford],
> and as RISCs came in, with good optimizing compilers, there was
> even less need to write assembly code, as the compilers were
> sometimes amazingly good. [In some cases, at MIPS, in the early days,
> I wrote assembler code for routines for which one typically
> wrote assembler code ... only to find out that the compiler-generated
> code was getting "good enough".]
>
> 7) By now:
> (a) People reserve assembler language code for the very
> most machine-specific code, particularly that which is not well-expressed
> by high-level languages. When they want that detailed control,
> that's what they want ... the *last* thing in the world they want
> is some slightly-above-assembler-language thing that doesn't let
> them do what they need to do, in the name of a portability that
> isn't very useful.
I intend, precisely, to give "detailed control" and "things which
aren't well expressed by high-level languages".
For example, my virtual processor will provide a very large set of
virtual registers. Anything that can't fit in the real registers would
get placed in memory by the translator. This may seem simplistic, but
actually there's a lot more to it than that, and that part of the
language has been very well thought out.
> (b) And otherwise, write in whatever language makes sense, of
> which C would be the *lowest* level.
> (c) These days, it is far easier to go from high-level languages
> to machine code, with the sorts of machine-specific tables that
> retargetable compilers have had for years, than it is to to want
> to have humans writing masses of higher-than-assembler, lower-than-C
> code.
So what? I never said I was designing the language for "ease of
programming" or any such thing. This language would be useful in
(almost) the same situations in which assembly language is.
> 8) So, in summary:
> (a) This is an old, old idea.
> (b) It's been tried, in various forms, many times.
> (c) It hasn't been successful, and is not very likely to be.
>
Jonathan Neve.
>
> An even bigger problem is posed by the number and type of registers
> available. You can't possibly hand produce assembly code optimal for
> everything from the x86 to the Itanium,
Hand-produce?!? What do you think the translator's for? You should
hand-produce code that is optimal for the Virtual Processor. That's
all. I realise that this could be a problem, but I've found a solution
to it.
> and if you leave details such as
> register allocation to the assembler then you might as well just use C
> in the first place.
Quite true. But I don't.
>
> You're doomed, *doomed* I tell you.
Hmm...
Jonathan Neve.
So, what is your solution then? Can you give some examples illustrating
at least the basic concepts behind your Virtual Assembly Language?
Tzvetan
In universities all over, students are building assemblers,
compilers, and virtual CPUs as part of their CS
coursework. One tool I think they use is:
http://www.gnu.org/manual/bison/html_chapter/bison_toc.html
Creation of new languages is not my field.
I hope this helps you find tools and info on the subject.
Oh, and if you're doing this because you want to know
how, or think it's interesting, who cares if it's old news?
--
------------------------------------------
Mark & Candice White
System programming hobbyists.
http://members.home.net/mhewii/welcome.htm
An even bigger problem is posed by the number and type of registers
available. You can't possibly hand produce assembly code optimal for
everything from the x86 to the Itanium, and if you leave details such as
register allocation to the assembler then you might as well just use C
in the first place.
You're doomed, *doomed* I tell you.
-- Bruce
>I don't think you can portably get any lower-level than C or Scheme.
see the C-- project (http://www.cminusminus.org/)
--
mac the naïf
"You might also generate C, if you can afford its calling conventions.
And forget about proper tail calls, computed gotos, accurate garbage
collection, efficient exceptions, or source-level debugging."
OK, sounds cool.
Is this real enough that we should seriously consider e.g. retargetting
d2c (http://www.gwydiondylan.org) at it?
-- Bruce
Well is there anything NEW and ADVANCED that is less than several
meg to run...? I have several 386 machines with 40 meg HD's here.
Ben.
--
"We do not inherit our time on this planet from our parents...
We borrow it from our children."
"24 bit CPU's R us" http://www.jetnet.ab.ca/users/bfranchuk/index.html
> > >I don't think you can portably get any lower-level than C or Scheme.
> > > see the C-- project (http://www.cminusminus.org/)
> <cut>
> > Is this real enough that we should seriously consider e.g. retargetting
> > d2c (http://www.gwydiondylan.org) at it?
>
> Well is there anything NEW and ADVANCED that is less than several
> meg to run...?
This is a comment to what, exactly? The 96 MB RAM recommendation for
using the CodeWarrior plug-in?
d2c is fairly piggy on RAM. It peaks at about 65 - 70 MB RAM usage when
bootstrapping itself (on Linux -- it takes about 15 minutes on my Athlon
700). On the other hand I've seen C++ compilers get up into those
regions as well. Don't even ask about the Cecil compiler...
> I have several 386 machines with 40 meg HD's here.
I've got a Mac 128 with a 400 KB floppy, and a Sinclair ZX-81 with 1 KB
RAM. I don't try to do real work on them any more and the Athlon cost
me around a quarter of what the Mac 128 did (or an eighth of what the
first Mac II or PC/AT cost) -- and that's not counting 15 years of
inflation.
-- Bruce
> > Well is there anything NEW and ADVANCED that is less than several
> > meg to run...?
>
> This is a comment to what, exactly? The 96 MB RAM recommendation for
> using the CodeWarrior plug-in?
>
> d2c is fairly piggy on RAM. It peaks at about 65 - 70 MB RAM usage when
> bootstrapping itself (on Linux -- it takes about 15 minutes on my Athlon
> 700). On the other hand I've seen C++ compilers get up into those
> regions as well. Don't even ask about the Cecil compiler...
Bootstrappable -- that's good.
I was thinking of the C-- stuff for memory logging.
For what little programming I do (C mostly)
I use an 80x25 text editor, nothing fancy: le under Linux, ne or t
under DOS. t is a nice 4096-byte-sized com file.
It just seems odd that most computer work does not require
large machines, regardless of what the PR people say, and things
are somewhat oddly scaled. A 40k C file, a 40 meg C compiler,
and a 400 meg OS? (sizes picked at random). While I don't expect
to go back to using a PDP-8, I can't afford a new computer every
2 years.
"Lean and Mean" are forgotten words for computer development.
I like Linux, but it is getting a bit bloated in the boot floppies.
(I have a CD-ROM but I can't boot from it.) I use Debian 2.1
and keep the boot floppies around for tar-ing my HD to zip disks
and fixing up my HD on system crashes. (Debian I can upgrade over the
modem, unlike all the other Linux versions.)
> > I have several 386 machines with 40 meg HD's here.
>
> I've got a Mac 128 with a 400 KB floppy, and a Sinclair ZX-81 with 1 KB
> RAM. I don't try to do real work on them any more and the Athlon cost
> me around a quarter of what the Mac 128 did (or an eighth of what the
> first Mac II or PC/AT cost) -- and that's not counting 15 years of
> inflation.
True - but remember the famous quote "All computers wait at the same speed"
I am just grumbling over the large size of everything tonight.
> It just seems odd that most computer work does not require
> large machines regardless what the PR people say
I agree. The last time I bothered to upgrade the speed of my main Mac
was three years ago when I put a 266 MHz "G3" CPU card into an already
nearly three year old machine. (I've since bought -- nearly two years
ago -- a laptop computer at the same performance level).
I have no current plans to get a faster machine. It's just fine for
everything I do on it.
On the PC side, I now have (as I said) a 700 MHz Athlon. It replaces a
200 MHz machine (Pentium Pro) which certainly doesn't owe me anything at
this point in time and the main problem with it was that 72-pin ECC
memory is getting nigh on impossible to find now.
> "Lean and Mean" are forgotten words for computer development.
I try pretty hard to produce lean and mean programs of my own -- and I
test them on old, slow machines, and if they're not usable there then I
do something about it. But I like a pretty grunty machine for doing the
development on.
-- Bruce
Take care
SG
Note that gcc provides something fairly similar to computed gotos
(label variables) as an extension to C, C++, and possibly other
languages.
--
-- Jonathan Thornburg <jth...@thp.univie.ac.at>
http://www.thp.univie.ac.at/~jthorn/home.html
Universitaet Wien (Vienna, Austria) / Institut fuer Theoretische Physik
"Nuclear powered vacuum cleaners will probably be a reality within
10 years." -- Alex Lewyt (President of the Lewyt Corporation,
a leading vacuum-cleaner manufacturer), 10 June 1955
> In article <bruce-37E6E7....@news.akl.ihug.co.nz>,
> Bruce Hoult <br...@hoult.org> wrote [possibly quoting someone else,
> I can't quite tell who]
It was from the web page referred to by Alex Colvin, at
<http://www.cminusminus.org/>.
> >"You might also generate C, if you can afford its calling conventions.
> >And forget about proper tail calls, computed gotos, accurate garbage
> >collection, efficient exceptions, or source-level debugging."
>
> Note that gcc provides something fairly similar to computed gotos
> (label variables) as an extension to C, C++, and possibly other
> languages.
Yes. It also has nested functions, tail call elimination (in some
circumstances, I believe), and other good stuff that you don't find in
(other) C compilers.
-- Bruce
> 2) This goes back, at least, to UNCOL: Universal Computer-Orientated Language,
> proposed in the mid-1950s... It is easy to find references to UNCOL, example:
> http://www.hole69.freeserve.co.uk/Uncol.html
> It also mentions ANDF, one of the more recent attempts, and has a good quote:
The only thing that made serious use of ANDF that I am aware of is the
TenDRA compiler suite. I'm not sure ANDF is well-suited for direct
execution; it's a bit abstract (imho) for that. It's still a good
intermediate language.
[snip]
> 2) There's an IEEE Standard for Microprocessor Assembly Languauge,
> 694-1985(R1994)
> http://standards.ieee.org/catalog/olis/micro.html
> (I don't know, offhand, if anything in particular obeys this standard).
Wasn't it the standard that covered SPARC (V8?)? IIRC, of course.
--
Sander
FLW: "I can banish that demon"
> Here it is:
> I'd like to create a Virtual Assembly Language. That is, a very
> low-level language, that uses assembly-like syntax(ie: no control
> structures, no OOP, no if/else statements, etc). This basicly amounts to
> designing a simplified, virtual processor.
Hasn't this already been done under the name of "Java bytecode"?
Matthew Huntbach
david | david...@cognent.com
Jonathan Neve.
Anyway, my idea is to give a very large number of virtual registers
(2 to the power of 64, so that the register number can be expressed
using 64 bits), referred to by number. For ease of programming, a name
could be given to a VR (Virtual Register), so that it could be used
sort of like an ordinary variable. The Virtual Instructions could use
the VRs as if the real architecture were as simple as that. In actual
fact, the translator would be fairly complex (especially on crazy
processors like the x86, where no register is _truly_
general-purpose!). What I propose is that the translator interpret and
translate simultaneously. That is, it acts as though it really were a
processor, with a great number of registers, etc. When it sees an
instruction telling it to give a certain value to some VR, it simply
does so, without producing any code at all. Later, it's highly likely
that it will come across an instruction using that VR. When it does,
the translation is easy, because it knows where it should find the
value (i.e. in which register, or as an immediate value, etc.).
For example, the following code:
set $1,34
set $2,45
simply gives values to the first and second VRs. If space had not yet
been allocated for the corresponding translator data structures
(because, of course, with so many VRs, a plain old table just won't
do! ;-)), then it is allocated. This code still gives no indication to
the translator as to which _real_ register it should use, so no code
is produced. If later it finds:
add $1,$2,$3
(using the SPARC-like triadic notation, which I quite like!), then it
will produce code to assign the values to the correct registers, and
to perform the add operation. The translator, of course, is
platform-dependent, so it will know exactly where the add instruction
leaves its result. On the other hand, the programmer that wrote the
above line evidently expects to be able to access that result using
the $3 VR. To keep up the myth that the VRs are real, the translator
places a code in the data structure corresponding to the $3 VR, saying
which _real_ register its contents are currently placed in, and sets a
flag in it indicating whether or not that value is somewhere real.
Before the add instruction, this flag was cleared in both $1 and $2,
because the code to place them somewhere real had not yet been
produced. When this flag is clear, the value in the VR is just that: a
numeric value. If it's set, the value tells the translator which
register the real value is in, or (depending on another flag) its
memory location. All through the translation of the program, the VRs
are kept up to date with the current values that would be in the
registers if the _real_ code generated up to this point were executed.
That's why it's somewhat as though it were interpreted while being
translated. The VRs would be sort of like interpreting the program one
instruction at a time, and updating some local variables to reflect
the new state of the registers every time. The only difference is that
when the value is on the real hardware, the VRs merely tell where it
is to be found; they do not (nor can they) contain the real values.
Also, of course, the VRs do not correspond directly to any
predefinable processor registers: they are assigned storage on the
real hardware whenever it's needed, but there's no telling which one
it will be.
If a value is assigned to a VR that already referred to a value
present on the hardware, then the translator frees up the resources
used by the previous value, by marking as free the register it was in
(the translator should maintain a table of real registers as well as
VRs, so that it can check whether a given register (or any) is free,
and perhaps for some other similar purposes), or by decrementing the
translator's next_free memory pointer (if the translator is so
designed). Thus, any previous values will be overwritten, as indeed
one would hope and expect.
Coming back to my previous example, suppose the add instruction were
followed by a sub instruction (I really don't have much imagination,
do I? I somehow can't get out of the basic arithmetic instructions <g>).
Say the code were:
sub $3,$2,$1
The translator would find that $3 is on the machine already, and would
locate it. If necessary, it would then produce code to move it to a
place more convenient for the sub instruction. The same is true for $2:
even though its contents are predictable, as it hasn't been touched
since the value was put there, there's no sense in loading it into the
registers if it's already there. The translator then generates the sub
code, and sets the $1 VR to point to the location of the result of the
sub operation.
Details might vary more or less depending on the architecture the code
is to be produced for, but I think the scheme described here could be
rather easily implemented, and could allow for maximum usage of the
registers without the programmer having to know how many there are, or
how they are organized. If he/she tries to use one that doesn't exist,
main memory can be used. Or rather, the previous contents of the
coveted register(s) can be sent to memory, because chances are, the
newer the value, the sooner it will need to be used.
As I said, this is the only part of the language I can talk about to
any great length, and this posting is long enough already, so I'll
leave all the rest for later.
>
> Tzvetan
Jonathan Neve.
For what it's worth, I have heard that the Elate/Intent embedded CPU
systems from TAO use a vaguely similar VM, with an effectively
infinite number of (typed) virtual registers, but with the difference
that the VM instruction set (designed for compactness rather than
anything else) is compiled down to real machine instructions on the
spot, so that once load/link/whatever is done, no one sees the VM
version.
paul
> Note that gcc provides something fairly similar to computed gotos
> (label variables) as an extension to C, C++, and possibly other
> languages.
Fortran (g77) doesn't need it - we have ASSIGNED GOTO in the Standard.
--
Toon Moene - mailto:to...@moene.indiv.nluug.nl - phoneto: +31 346 214290
Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
GNU Fortran 77: http://gcc.gnu.org/onlinedocs/g77_news.html
GNU Fortran 95: http://g95.sourceforge.net/ (under construction)
Unfortunately, there is nothing in your idea that would make register
allocation simpler. The virtual assembler/translator would have to do
the same thing that a compiler does, and fight the same problems, but
with more severe limitations. Generally a compiler can extract
additional information from the source code and use it to
heuristically employ different code generation strategies.
Another problem with your approach is instruction scheduling.
Scheduling is directly related to register allocation, in the sense
that the two have conflicting goals. The register allocator attempts
to decrease the number of
necessary registers while the scheduler needs more registers in order to
move instructions around. Anyway, obviously scheduling will also have to be
done by the virtual assembly translator. The programmer will have no way of
affecting the results - the quality of the generated code will depend on the
quality of the translator. Usually when using assembler one has exactly the
opposite goal - to be able to manually schedule everything and control the
output quality.
Another problem is spilling (loading/storing register contents to and
from memory). Spilling can safely be done by a compiler, but not by
the virtual assembly translator. When writing low-level code there are
cases where there is nowhere to spill at all - the stack may not have
been set up yet, the segment registers may have undefined values, etc.
Unfortunately there is no area that will always be available for
spilling - especially if you consider multitasking, interrupt code,
etc. I can imagine cases where there should be absolutely no accesses
to memory whatsoever.
So, when you consider the restrictions imposed by the virtual assembler it
is hard to imagine cases when one would want to use it to manually write code.
I agree that there is another potential area of application : distributing
platform independent code. Although it is considered that UNCOL has failed,
we are observing that JVM code is being used in essentially the same way. It
seems to me that JVM is too narrowly geared towards Java, so another
intermediate language could be more suitable for common use.
Tzvetan
> > So, what is your solution then? Can you give some examples illustrating
> > at least the basic concepts behind your Virtual Assembly language.
>
> The solution I mentioned concerns only the use of registers, as that's
> what Bruce was talking about. Pulled out of context it sounds like a
> rather lofty statement, far more general than I meant it. Unfortunately,
I have not been working on this language long enough for all points to be well
> thought out or clear to me. In fact, register allocation is the one I've
> thought the most about, and, as I said, I think I've found a solution
> concerning such problems. The rest of the language is still expressed in
> rather vague terms, and not fully specified by any means.
That's fine. If you can manage to improve the state of the art for
register allocation then that's a major advance in itself.
I really don't see how this differs from being exactly what a
conventional modern compiler will do for virtually any language, where
the programmer writes...
int a = 34;
int b = 45;
int c = a + b;
... except that modern compilers will go one step further and actually
do ...
set $3,79
... by performing the add at compile-time, not runtime.
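That compile-time folding step can be sketched in a few lines. This is a Python toy with a made-up three-address IR, purely for illustration; real compilers do the same thing on their own intermediate representations:

```python
def fold(ir):
    """Fold integer constants in a toy three-address IR.
    Instructions are (dest, op, args); 'const' loads a literal.
    Assumes fully folded values need no runtime code of their own."""
    env, out = {}, []          # env: names with known constant values
    for dest, op, args in ir:
        if op == "const":
            env[dest] = args[0]
        elif op == "add" and all(a in env for a in args):
            env[dest] = env[args[0]] + env[args[1]]   # done at compile time
        else:
            out.append((dest, op, args))              # must run at runtime
    return env, out

# int a = 34; int b = 45; int c = a + b;
env, remaining = fold([
    ("a", "const", [34]),
    ("b", "const", [45]),
    ("c", "add", ["a", "b"]),
])
```

The whole computation folds away: `env["c"]` comes out as 79 with no runtime instructions left, matching the single `set $3,79` above.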
Any modern compiler *already* allows you to have an infinite number of
"virtual registers" (which are called "variables", or "bindings"), and
the compiler already allocates them to registers as required. In fact,
recent compilers go a good deal further than that, in that they consider
each assignment to a variable to create a *new* variable, which may live
in a different real register.
The big problem is not in doing this -- it's pretty trivial -- but in
establishing the policy for what to do when you run out of real
registers. How do you decide exactly which variable to kick out into
RAM? Ideally, it will be the one that is not needed for the longest
time in the future, but that is a very very hard problem.
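The "evict the value not needed for the longest time" policy (Belady's rule, optimal but requiring knowledge of the future reference sequence) can be simulated directly. A Python sketch with invented value names:

```python
def spills(uses, k):
    """Simulate furthest-next-use eviction with k physical registers.
    'uses' is the complete future sequence of value references."""
    regs, spilled = set(), []
    for i, v in enumerate(uses):
        if v in regs:
            continue
        if len(regs) == k:
            def next_use(r):
                later = [j for j in range(i + 1, len(uses)) if uses[j] == r]
                return later[0] if later else len(uses)  # never used again
            victim = max(regs, key=next_use)   # needed furthest in the future
            regs.remove(victim)
            spilled.append(victim)
        regs.add(v)
    return spilled

# With 2 registers and uses a,b,c,a,b: when c arrives, 'a' is needed
# sooner than 'b', so 'b' is the value kicked out to RAM first.
out = spills(["a", "b", "c", "a", "b"], k=2)
```

A real allocator can only approximate this, of course, since the future reference sequence is exactly what is not known at compile time.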
In practical terms, this is a big problem on machines such as the x86,
which have few registers, but it's much less of a problem on typical
RISC machines which have 32 registers. Few functions have more than 32
arguments and local variables -- I think the only time in recent memory
when I've come across such a situation was in the RC5 encryption
cracking program, which really seemed to want about 34 registers.
Compilers also do lots of nice things for you that I don't think your
pseudo-assembly language is likely to do. For example, compilers often
automatically unroll or "software pipeline" loops.
> As I said, this is the only part of the language I can talk about to any
> great length, and this posting is long enough already, so I'll leave
> all the rest for later.
I'm afraid that, so far, I don't see anything new or valuable.
People have for decades been searching for (trying to design) an
architecture-neutral low level language, to serve as a target for
compilers, to allow distribution of "object code" between different
types of machines, and to provide a bit of security through obscurity
for commercial products. The only ones which have in any big way caught
on are ANSI C and the Java Virtual Machine (and the UCSD P-engine before
it).
Today there are a *lot* of language compilers out there that produce C
source code as their output. This has some problems, but register
allocation is *not* one of them.
Today there is also a lot of work being done on compiling Java bytecodes
to machine code, and compilers are starting to appear for other
languages that produce Java bytecodes.
Everything you say is quite sensible. The problem is that you don't
seem to be aware of the existing state of the art, and you might want to
study that a bit more.
-- Bruce
> "John R. Mashey" wrote:
> >
> > 2) This goes back, at least, to UNCOL: Universal Computer-Orientated Language,
> > proposed in the mid-1950s... It is easy to find references to UNCOL, example:
> > http://www.hole69.freeserve.co.uk/Uncol.html
> >
> > It also mentions ANDF, one of the more recent attempts, and has a good quote:
> >
> > "These are only some of the attempts in creating an UNCOL. Many have been unsuccessful and are rarely talked about, thus making
> > information on this very difficult to obtain. Also, there have been many attempts of creating parts of an UNCOL with the hope of developing
> > these into a true UNCOL when machine development and human knowledge has the ability to do so."
> >
> > "Problem with finding an UNCOL is that it's really hard, on a par with, say, automated translation of poetry from English into Chinese."
> >
> > Of course, at the binary level, one has Smalltalk, JAVA byte-codes, etc,
> > but that's not what was proposed.
> >
<snip>
> > (b) Good compiler construction was hard work, portable compilers were
> > few and far between, code generation strategies were rather ad hoc, etc.
> > The wish to fix this was embodied in UNCOL, with the idea that if you
> > had M languages and N machines, instead having M*N compilers, you
> > could have M translators to UNCOL, and N translators from UNCOL to machine
> > code. As noted, this never really made it.
>
> Fine. However, I'm not trying to do that. It might be another
> advantage if I could, but I could do without the ability to have
> high-level languages compile efficiently. I don't quite see why not,
> though.
So you're not trying to write a machine independent assembly language, such
that it only needs a simple translator to turn it into machine code for the
specific architecture?
If that's so, then I don't understand what you are proposing, and I'm not
sure many others here do either.
If you are, then it sounds like a worthy goal, but John's points stand. I
suspect your language will end up being so general that it will look a lot
like a HLL. You will still have the register allocation task, in mapping
from your large virtual register set to a smaller real one (of varying
sizes). Essentially you have a register for each variable in the high level
program. The translator must decide when they are no longer needed, so the
register can be reused, or when it would be better to write it to memory to
free it for some other use. This is one of the major tasks of a compiler,
and is critical for efficiency.
You might want to take a look at intermediate languages from various
compilers; this sounds a lot like what you are proposing. The gcc
intermediate format has *many* target architectures.
> Scripting languages are fine for, well... scripts!! It's not enough
> just to say that the hardware is getting better and faster all the
> time, so we can afford sloppy, cumbersome, slow code.
No, but we can afford "fast enough" code. I don't really care if it takes
.5sec or 0.05sec to run my script. On my machine it doesn't make any
difference to me.
> I intend, precisely, to give "detailed control", and "things which
> aren't well expressed by high-level languages".
> For example, my virtual processor will provide a very large set of
> virtual registers. Anything that can't fit in the real registers would
> get placed in memory by the translator. This may seem simplistic, but
> actually there's a lot more to it than that, and that part of the
> language has been very well thought out.
Do tell! Providing detailed control to the programmer without architecture
specific knowledge sounds like quite a task.
Scientists will always be sceptical, it's part of the job description! The
only way some people will be convinced is if you do it. I'd love to see
some simulator results giving hard evidence that it will work.
David
--
J A David McWha | ja...@SPAMcs.FREEwaikato.ac.nz
WarpEngine Group, | URL: http://www.cs.waikato.ac.nz/~jadm
Computer Science, | "Gather ye rosebuds while ye may,
University of Waikato | Old time is still a-flying..."
jth...@mach.thp.univie.ac.at (Jonathan Thornburg) writes:
>Note that gcc provides something fairly similar to computed gotos
>(label variables) as an extension to C, C++, and possibly other
>languages.
However, that does not mean that gcc makes C-- unnecessary. C-- was started
by people writing compilers targeting gcc, who have found that while gcc
is a better target than plain C and about as good as a general purpose
language can get, it still causes a lot of headaches, for the reasons
given in the list above. I also work on a compiler that targets gcc,
and my experience agrees with theirs.
Zoltan Somogyi <z...@cs.mu.OZ.AU> http://www.cs.mu.oz.au/~zs/
Department of Computer Science and Software Engineering, Univ. of Melbourne
> > > Here it is:
> > > I'd like to create a Virtual Assembly Language. That is, a very
> > > low-level language, that uses assembly-like syntax(ie: no control
> > > structures, no OOP, no if/else statements, etc). This basically amounts to
> > > designing a simplified, virtual processor.
> > Hasn't this already been done under the name of "Java bytecode"?
> !!!!!!!!... er... NOOO!!!!
> Java?!? Java bytecode is interpreted. Java is a very high-level OOP
> language. Fine. But not much like assembly language.
My understanding is that Java byte-code is a low-level language into
which Java is compiled, it's like an assembly language, but it's not an
assembly language for a real machine, so that it can be used on any
machine. Sounds to me very much like your Virtual Assembly Language.
I'd have thought you'd appreciate what I pointed out, since others have
suggested your idea is nonsense. I'm saying that someone else has had a
similar idea, and it has been very successful.
> Besides, Java bytecode is designed to be interpreted, not statically
> compiled.
And your Virtual Assembly Language is going to be ... ? Well, I didn't
see anything in your original message that suggested it wouldn't be
compiled. And a compiler for Java bytecode is not a highly complex thing.
> That means that it's not likely to be optimal. Sure, the JVM is, as
> its name implies, a virtual machine. But its assembly language (as it
> were!) is somewhat abstract and, I dare say, very high-level (as
> compared to real machine code). This is also due in part to the fact
> that Java tries to be (and, indeed, is) operating system independent.
> I do NOT plan to do any such thing.
But I thought the whole point of what you were doing was so that it could
run on anything.
Matthew Huntbach
>I'm afraid that, so far, I don't see anything new or valuable.
>People have for decades been searching for (trying to design) an
>architecture-neutral low level language, to serve as a target for
>compilers, to allow distribution of "object code" between different
>types of machines, and to provide a bit of security through obscurity
>for commercial products. The only ones which have in any big way caught
>on are ANSI C and the Java Virtual Machine (and the UCSD P-engine before
>it).
Yeah, the wheels surely get invented often, don't they?
I was thinking about the same thing, but more with regards
to an OS design. Inspired by the idea of exokernel, I was
wondering if certain protection mechanisms could be bypassed
by the use of an intermediary byte code. The performance
gains that the MIT Exokernel OS claims are pretty wild (though
it doesn't provide a clear reference point and the web page
hasn't been updated in over two years). It would have the
side effect of allowing processor-specific optimization and
portable/mobile code.
>Today there is also a lot of work being done on compiling Java bytecodes
>to machine code, and compilers are starting to appear for other
>languages that produce Java bytecodes.
Just curious: if you translate Java bytecodes for a
specific processor, how much (in terms of speed of the
resulting machine code) do you lose by the loss of the
original source code (be it Scheme, C, Java, whatever)?
How much do you gain by knowing the exact configuration
of the machine it runs on? When Java first came out, I
read some claims that JIT compilers could actually beat
native compilers because they can do machine-specific
optimizations whereas other compilers need to guess
about the target machine. I haven't seen these claims
much lately.
Dan.
> Thanks for the history lesson! Still, not very convincing!
Firstly you should probably research John's history a bit before
saying things like this. Secondly, your idea of virtual registers is
even present in an open-source compiler which you could take a
detailed look at before doing a lot more work on your project, look at
the following
http://www.gnu.org/philosophy/stallman-kth.html
and search for "pseudo register number"
--
Chris Morgan <cm at mihalis.net> http://mihalis.net
Temp sig. - Enquire within
>
> > Besides, Java bytecode is designed to be interpreted, not statically
> > compiled.
>
> And your Virtual Assembly Language is going to be ... ? Well, I didn't
> see anything in your original message that suggested it wouldn't be
> compiled. And a compiler for Java bytecode is not a highly complex
> thing.
>
?
I didn't say that my Virtual Assembly Language wouldn't be compiled. On
the contrary. But it wouldn't be interpreted.
> > That means that it's not likely to be optimal. Sure, the JVM is, as
> > its name implies, a virtual machine. But its assembly language (as
> > it were!) is somewhat abstract and, I dare say, very high-level (as
> > compared to real machine code). This is also due in part to the fact
> > that Java tries to be (and, indeed, is) operating system
> > independent. I do NOT plan to do any such thing.
>
> But I thought the whole point of what you were doing was so that it
> could run on anything.
Well, yes and no. Any processor. But to be OS-independent, you need to
redefine a whole abstract OS interface (as do Java and ANSI C). I'm not
sure I plan to do that. However, having said this, it would certainly
be useful. Perhaps it could be added on later. I don't know. I just
don't want to concentrate on that for now, as there are more vital
things, like making it processor independent in the first place. And
there are still plenty of things that a virtual assembly language
without OS-independence can be useful for. When real assembly language
is used, it's not OS-independent either. So in places where asm is used
now (ok, perhaps not all, but at least some), my virtual asm could be
used, provided, of course, that it succeeds in its goal of being
low-level enough to go fast, but abstract enough to be portable. There
are a couple of other things it could be useful for, as I describe in
some other message. However, as I said, adding OS-independence might be
a worthwhile feature. My ideas on this project are subject to change,
so I'd like to get the main part done first (ie: processor
independence), and leave the rest for later, if at all.
Jonathan Neve.
> Matthew Huntbach
>
> Typically in a compiler the intermediate code is similar to your
> Virtual Assembly language - it uses an unlimited amount of temporaries
> (or registers if you like) that are later assigned to actual registers
> by the register allocator and the code generator.
> I have to warn you that register allocation is one of the most
> important aspects of a compiler. It is not a trivial task by any means
> and the quality of register allocation is in direct correlation with
> the resulting speed.
>
> Unfortunately there is nothing in your idea that would make the
> register allocation simpler.
I'm starting to realize that, alas, there's nothing very innovative
about the strategy I suggested! Still, there _IS_ one thing I've found
(since I posted my last message) that makes this strategy, though not
very different from what is commonly done, somewhat more efficient than
the way it's done in, say, C - unless, of course, I'm yet again
completely mixed up as to the inner workings of a C compiler! ;-(
That is, that the Virtual Registers are referred to by number, not by
name. To make the language easier to use, I had already planned to
provide a simple Virtual Instruction, probably called 'def', that would
assign a name to one of the Virtual Registers. When a new variable is
needed, the programmer will have to know which of the Virtual Registers
are currently in use. There's no good reason why he/she would use a
register they had never used before if it turns out that there's
another variable that they no longer need. In C, however, it's generally
considered bad practice to do such a thing, because, the name being
indissociable from the "virtual register", the good descriptive name
given to the first variable would then be used to describe some other,
entirely different variable. This, and/or another Virtual Instruction
called 'free' that would free up the space used by a variable as soon
as you are sure it's not needed any more, might be enough to get rid of
garbage collection of any other kind. This may be more or less
important as far as speed goes, but it _seems_ a good thing, even if it
is more of a psychological reason than anything else. Because of course,
the same thing can be done in C; however, I think good programming style
would usually avoid it.
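As a concrete (and entirely hypothetical) sketch of this 'def'/'free' idea - the class and method names below are my own invention for illustration, not part of any specified language:

```python
class VirtualRegisters:
    """Toy model of the 'def'/'free' idea: names bind to numbered
    virtual registers, and free() returns a number to the pool so a
    later define() can reuse it under a fresh, descriptive name."""
    def __init__(self):
        self.names = {}        # name -> virtual register number
        self.pool = []         # numbers released by free()
        self.next_num = 0

    def define(self, name):    # models the 'def' virtual instruction
        num = self.pool.pop() if self.pool else self._fresh()
        self.names[name] = num
        return num

    def free(self, name):      # models the 'free' virtual instruction
        self.pool.append(self.names.pop(name))

    def _fresh(self):
        self.next_num += 1
        return self.next_num - 1

vr = VirtualRegisters()
vr.define("loop_count")        # bound to virtual register 0
vr.define("total")             # bound to virtual register 1
vr.free("loop_count")          # register 0 goes back into the pool
reused = vr.define("result")   # reuses register 0 under a clean name
```

The point being illustrated: the register number gets recycled, but each incarnation carries its own accurate name, avoiding the C-style misuse of a stale descriptive variable name.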
> The virtual assembler/translator would have to do the same thing that
> a compiler does, fight the same problems but with more severe
> limitations. Generally a compiler can extract additional information
> from the source code and use it to heuristically employ different code
> generation strategies.
How's that? Why wouldn't the code generation strategies employed depend
directly upon the processor being compiled for? If there's exactly one
translator per processor, wouldn't things be fairly straightforward?
> Another problem with your approach is instruction scheduling.
> Scheduling is in direct relation with register allocation in the
> sense that they have conflicting goals. The register allocator
> attempts to decrease the number of necessary registers while the
> scheduler needs more registers in order to move instructions around.
> Anyway, obviously scheduling will also have to be done by the virtual
> assembly translator. The programmer will have no way of affecting the
> results - the quality of the generated code will depend on the
> quality of the translator.
I think this might be going a bit far. Unfortunately, I don't have much
theory of compiler technology, nor any real experience with them, so
it'll be hard for me to justify this. Still, why would I have to
rearrange the code? Couldn't I simply design each Virtual Instruction
very carefully, so that translation is fairly straightforward, and in
such a way that each Virtual Instruction corresponds to a predefinable
instruction or sequence of instructions, or, when that's not possible,
to a predefined compile-time action within the compiler that would then
affect the production of code later on? I'm fully aware, of course, that
there might well be some very good answers to these questions, alas!
> Usually when using assembler one has exactly the opposite goal - to
> be able to manually schedule everything and control the output
> quality.
I don't see why one wouldn't still be able to. Why couldn't the virtual
processor define the way code is made optimal for the virtual processor,
and not try to do any optimization whatsoever? If the programmer coded
something suboptimal (as defined by the virtual processor), then he/she
would probably get rather bad code. Of course, the optimization rules
could be made simple, and/or they could be made to simplify the
translator so that it could be sure to produce better code, by being
able to rely on certain otherwise unreliable features of the source
code (ie: in its organisation, or something like that). Of course, that's
stated in a pretty vague and imprecise way, but would something like
that be _totally_ inconceivable?!? Don't bother jumping on me if it is,
because, as I said above, I don't know enough about compilers to really
be sure of what I'm saying. I'm not challenging you, I'm asking your
opinion!
> Another problem is spilling (loading/storing register contents from
> memory). Spilling can safely be done by a compiler, but not by the
> virtual assembly translator. When writing low level code there are
> cases when there is nowhere to spill at all - the stack may not have
> been set up yet, the segment registers may have undefined values, etc.
What's to stop the programmer (or the translator) from setting that up,
if it's necessary? In what way is an HLL compiler different? And how
would a compiler "safely spill"?
> Unfortunately there is no area that will always be available for
> spilling - especially if you consider multitasking, interrupt code,
> etc. I can imagine cases when there should be absolutely no accesses
> to memory whatsoever.
Hmm. On a processor like the x86, I don't see how you can write more
than a couple of lines of code without ever accessing memory. Could you
give me some examples?
> So, when you consider the restrictions imposed by the virtual
> assembler it is hard to imagine cases when one would want to use it
> to manually write code.
>
> I agree that there is another potential area of application:
> distributing platform independent code. Although it is considered
> that UNCOL has failed, we are observing that JVM code is being used
> in essentially the same way. It seems to me that JVM is too narrowly
> geared towards Java, so another intermediate language could be more
> suitable for common use.
>
> Tzvetan
Jonathan Neve.
A real assembler's generated code *is* OS-independent, because the assembler
simply specifies which instructions should be generated. It's up to the
programmer to observe any particular software conventions (e.g. linkage and
calling conventions).
There are OS dependencies in how the assembler is used, of course:
(1) The assembler is a program designed to run in a particular environment.
(Consider cross-assemblers, for example.)
(2) The object file format is often dictated by the OS in which the object
program is expected to run. The logical content thereof is however
independent of the OS, and a separate program could blindly translate
from one format to another.
(3) The collection of available or useful system macros (in a macro assembler)
will depend on the intended run-time environment. This often includes
prologue, epilogue and call macros (implementing a particular calling
convention).
Sometimes an assembler has directives that only make sense for a particular
object file format (and binder/linker/loader) -- but these can simply be
ignored (or perhaps re-interpreted) if one wants to use the assembler to
generate code for an unrelated OS.
Many HLL compilers -- like early K&R C compilers -- had these properties too.
OS dependencies were relegated to library routines. ANSI C changed that by
making the standard library part of the language -- useful for many, perhaps,
but unfortunate in other ways. To some extent, it deprived C of its early
status as a high-level machine-independent assembler!
Michel.
> > And your Virtual Assembly Language is going to
> > be ... ? Well, I didn't see anything in your original
> > message that suggested it wouldn't be
> > compiled. And a compiler for Java bytecode is
> > not a highly complex thing.
> ?
> I didn't say that my Virtual Assembly Language
> wouldn't be compiled. On the contrary. But it
> wouldn't be interpreted.
Sorry, that's my mistake - I had meant to write "interpreted" not
"compiled" in the lines you quote above.
Matthew Huntbach
If you're not impressed with Mashey's "older" historical references,
perhaps you should take a look at Colusa's Omniware approach, which
doesn't date quite so far back.
http://www.w3j.com/1/wahbe.165/paper/165.html
There is also a PLDI '96 paper on research done on this system entitled
"Efficient and language-independent mobile programs".
Microsoft assimilated Colusa around that time..... [ the technology
hasn't seen the light of day since. Although perhaps some of it is
in the upcoming .Net ]
Chasing through the references of these two papers should provide more
leads.
You may also want to search for papers on "Mobile Code". For instance,
a brief query on google.com brought up:
http://www.cnri.reston.va.us/home/koe/bib/mobile.bib.html
The only difference between your proposal and many of the mobile code
folks is that they are concerned with how compressible the opcodes
are... truly universal high speed bandwidth isn't just over the
horizon, as many would have you believe.
The Colusa RISC architecture was modeled more on a generic version of
the popular archs at the time (MIPS, Sparc, Alpha...).
In particular their translator does NOT need a super sophisticated
register allocator.
> > possibly with multiple feature sets, and wanted something a little
> > higher-level, but not so high-level that they couldn't get to specific
> > machine features, or couldn't get very good code.
> This is not AT ALL what I'm trying to do! Rather, I want to go the other
> way:
> write assembly language code for a virtual processor, which is designed
> in such
> a way as to make the translation not too damaging to the speed of the
> code, no matter
> which processor it is translated on. I don't want "something a little
> high-level".
You may not intend to do this, but if you want broad architecture
coverage that is what you will end up doing. Unless you stick with the
lowest common denominator, you will end up with "high level"
transformations on some architectures. For instance, the various
"multimedia" SIMD instructions that PowerPC Altivec, Sparc, and x86
provide. On architectures without these you'll have to emulate
whatever lowest common denominator you come up with. That will be "a
little higher-level" than "down to the metal" for that architecture.
> > is some slightly-above-assembler-language thing that doesn't let
> > them do what they need to do, in the name of a portability that
> > isn't very useful.
>
> I intend, precisely, to give "detailed control", and "things which
> aren't well expressed by high-level languages".
> For example, my virtual processor will provide a very large set of
> virtual registers. Anything that can't fit in the real registers would
> get placed in memory by the translator. This may seem simplistic, but
> actually there's a lot more to it than that, and that part of the
> language has been very well thought out.
This statement seems inconsistent. You provide an abstraction which
provides detailed control. "Detailed control" may mean that you do NOT
want any spill code in your inner loop. If your register allocator
manages to plop a spill in the middle of a loop then that won't be
true.
IMHO, "detailed control" means that there is not some sophisticated
optimizer between me and the code that will be executed.
"More registers than you'd ever need" is more of an "ease of
programming" feature than providing the lowest level control. Closer
in utility may be the "spill over" register mechanism that some have
used. For instance, see Knuth's new MMIX design:
http://www-cs-faculty.stanford.edu/~knuth/mmix-news.html
P.S. While it wasn't in Mashey's thread, eventually the rubber has to
hit the road: the users of this assembly language will have to call
the OS to get something done (e.g. I/O). Perhaps your primary targets
are assembly programs that run on raw hardware and do no I/O? There
could be some glue library that comes with your translator that
translated programs must bind to.
Lyman
Coalescing is assigning the same hardware register to different virtual
registers. Often it can remove register moves by assigning the source
and destination to the same hardware register.
> [...] When a new variable is
> needed, the programmer will have to know which of the Virtual Registers
> are currently in use. There's no good reason why he/she would use a
> register they had never used before, if it turns out that there's
> another variable that they no longer need
Well, again this is exactly what a good compiler does. It will reuse the
registers as much as possible by coalescing different variables into the
same registers. It calculates the live range of a variable (the range of
code where the value stored in a variable is used) and is able to assign the
same register to two variables if their live ranges do not overlap. It can
do the opposite too - use different registers for one variable.
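The live-range mechanism described here can be sketched as a minimal linear-scan pass. This is a Python toy (spilling ignored, data invented for illustration), not any particular compiler's algorithm:

```python
def assign_registers(ranges):
    """Map each variable to a hardware register number, sharing a
    register whenever two live ranges do not overlap.
    ranges: {var: (first_def, last_use)} as instruction indices."""
    assignment = {}
    reg_end = []                       # reg_end[r]: last use of the value in r
    for var, (start, end) in sorted(ranges.items(), key=lambda kv: kv[1]):
        for r, last in enumerate(reg_end):
            if last < start:           # register r's previous value is dead
                reg_end[r] = end
                assignment[var] = r
                break
        else:                          # no register free: take a new one
            assignment[var] = len(reg_end)
            reg_end.append(end)
    return assignment

# 'a' dies at index 3 and 'c' is born at index 4: their live ranges
# don't overlap, so they end up sharing one register.
regs = assign_registers({"a": (0, 3), "b": (1, 6), "c": (4, 6)})
```

Three variables fit in two registers here, which is the coalescing-like reuse the paragraph above describes.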
> How's that? Why wouldn't the code generation strategies employed depend
> directly upon the processor being compiled for? If there's exactly one
> translator per processor, wouldn't things be fairly straight-forward?
Well, this is a more general issue and I admit it is not directly related to
your virtual assembler.
Sometimes the compiler has more information available to it (dependent on
the language semantics) than what can be deduced from the output binary
code. For example in some languages functions are not allowed to have side
effects. The compiler can safely assume that calling a function will not
modify any variables and perhaps optimize accordingly (e.g. preserving
variables in registers across function calls). A language-neutral
translator/optimizer that is looking at the output code generated by a
compiler cannot assume that calling a function has no side effects, so it is
more limited in what it can do.
> it'll be hard for me to justify this. Still, why would I have to
> rearrange the code? Couldn't I simply design each Virtual Instruction
> very carefully, so that translation is fairly straightforward, and in
> such a way that each Virtual Instruction corresponds to a predefinable
> instruction or sequence of instructions, or, when that's not possible,
> to a predefined compile-time action within the compiler that would then
> affect the production of code later on? I'm fully aware, of course, that
> there might well be some very good answers to these questions, alas!
Scheduling (arranging the instructions in a way that allows them to use more
CPU resources in parallel) has to be done at the low instruction level
looking at a large window of instructions. Even if you separately schedule
the code generated by each virtual instruction, still the combined result
will not be optimal.
One reason for people writing assembly is to be able to schedule small
snippets of code optimally. If they have to rely on the translator to
do it, they could just as well use C. As with register allocation,
there is no reason why a virtual assembly translator could do better
than a C compiler.
(Lately however this doesn't seem so important as many CPUs are able to do
scheduling by themselves.)
> I don't see why one wouldn't still be able to. Why couldn't the virtual
> processor define the way code is made optimal for the virtual processor,
> and not try to do any optimization whatsoever? If the programmer coded
> something suboptimal (as defined by the virtual processor), then he/she
> would probably get rather bad code.
The problem is that the programmer will not be able to control the quality
of the generated native code. In some important aspects (register
allocation, scheduling) it will depend almost entirely on the translator.
The translator may well be able to do a very good job of it, but still
there is no reason why it would be better than a C compiler.
> Of course, the optimization rules
> could be made simple, and/or they could be made to simplify the
> translator so that it could be sure to produce better code, by being
> able to rely on certain otherwise unreliable features of the source
> code(ie: in its organisation, or something like that). Of course, that's
> stated in a pretty vague and imprecise way, but would something like
> that be _totally_ inconceivable?!? Don't bother jumping on me if it is,
> because, as I said above, I don't know enough about compilers to really
> be sure of what I'm saying. I'm not challenging you, I'm asking your
> opinion!
Unfortunately, I don't see a way to achieve this. You are welcome to suggest
something new and we will discuss it.
> What's to stop the programmer (or the translator) from setting that up,
> if it's necessary? In what way is an HLL compiler different? And how
> would a compiler "safely spill"?
There are different approaches to spilling, dependent not only on the CPU and
OS but also on the context the code is executing in. A compiler assumes the
code will be executing in a "safe" and predictable environment - at least
the stack and the data segment have been set up by the loader and the runtime
library. Often the compiler can even assume that the stack can grow and is
practically unlimited. Assembler is used under more "severe" conditions -
there may be no stack at all for example.
It is conceivable to add directives like ".no_spill", ".spill_to_stack",
".spill_to_memory", ".spill_area_address/size", etc. to the virtual
assembler but I am not sure that they would encompass all the cases. Also
they will add significant complexity to things that are otherwise simple.
> Hmm. On a processor like the x86, I don't see how you can write more
> than a couple lines of code without ever accessing the memory. Could you
> give me some examples?
For example in the PC BIOS there is an area of code (larger than a few
instructions) where the memory controller has not been initialized yet and
no memory at all is available.
Tzvetan
|> In practical terms, this is a big problem on machines such as the x86,
|> which have few registers, but it's much less of a problem on typical
|> RISC machines which have 32 registers. Few functions have more than 32
|> Compilers also do lots of nice things for you that I don't think your
|> pseudo-assembly language is likely to do. For example, compilers often
|> automatically unroll or "software pipeline" loops.
Actually, note that ~32 integer + ~32 floating point registers are typically
simultaneously visible in most RISCs, and in fact, given the loop-unrolling
mentioned above, 32 FP registers can easily be filled, especially in
vector/matrix codes. For typical integer C code, I'd agree with Bruce,
except when compilers are doing heavy interprocedural inlining and
optimization ... quite a bit of C is still reminiscent of its PDP-11 heritage.
|> Everything you say is quite sensible. The problem is that you don't
|> seem to be aware of the existing state of the art, and you might want to
|> study that a bit more.
It is sad, but true, that the less you know about the past, the easier it
is to invent wonderful "new" ideas, especially if you haven't actually
yet implemented them and can stay with vague descriptions. This field is
filled with numerous ideas that sounded good, were produced by world-class
experts ... and still didn't work out.
Jonathan Neve
That is roughly what Microsoft has produced with its Common Language
Runtime. It is supposed to give better support for a wider range of
languages than the JVM, including things like Fortran, Haskell, Eiffel
and Smalltalk. Search for "C#" or "C Sharp" for details.
(This supposes a suitable definition of "platform independent", of
course.)
Microsoft also have some interesting research papers on this subject
generally. For example, "Finite State Code Generation" talks about
virtual machines that make code production especially easy. From the
abstract:
GBURG translates the two-page LVM-to-x86 specification
into a code generator that fits entirely in an 8 KB I-cache
and that emits x86 code at 3.6 MB/sec on a 266-MHz P6.
I gather one difference between Microsoft's CLR and Sun's JVM is that the
JVM is designed to be easy to interpret, and Microsoft now believes that
it is OK to require a compilation step.
Dave Harris, Nottingham, UK | "Weave a circle round him thrice,
bran...@cix.co.uk | And close your eyes with holy dread,
| For he on honey dew hath fed
http://www.bhresearch.co.uk/ | And drunk the milk of Paradise."
> (This supposes a suitable definition of "platform independent", of
> course.)
What exactly do you mean by that?
> Microsoft also have some interesting research papers on this subject
> generally. For example, "Finite State Code Generation" talks about
> virtual machines that make code production especially easy. From the
> abstract:
>
> GBURG translates the two-page LVM-to-x86 specification
> into a code generator that fits entirely in an 8 KB I-cache
> and that emits x86 code at 3.6 MB/sec on a 266-MHz P6.
>
> I gather one difference between Microsoft's CLR and Sun's JVM is that the
> JVM is designed to be easy to interpret, and Microsoft now believes that
> it is OK to require a compilation step.
> Dave Harris, Nottingham, UK | "Weave a circle round him thrice,
> bran...@cix.co.uk | And close your eyes with holy dread,
> | For he on honey dew hath fed
> http://www.bhresearch.co.uk/ | And drunk the milk of Paradise."
Who's that a quote from?
Jonathan Neve.
--
The manual said: "Requires Windows95 or better", so I installed Linux!
> > That is roughly what Microsoft has produced with its Common Language
> > Runtime. It is supposed to give better support for a wider range of
> > languages than the JVM, including things like Fortran, Haskell, Eiffel
> > and Smalltalk. Search for "C#" or "C Sharp" for details.
the following seems to fill in a few details:
http://msdn.microsoft.com/msdnmag/issues/0900/Framework/print.asp
http://msdn.microsoft.com/msdnmag/issues/1000/Framework2/print.asp
The Microsoft Intermediate Language (MSIL) is slightly higher level than
the concept being discussed here. However, it also wants to be portable
across 32 and 64 bit CPUs.... which is another not previously mentioned
gotcha that makes universal translation with low performance loss difficult.
> > (This supposes a suitable definition of "platform independent", of
> > course.)
> What exactly do you mean by that?
Currently CLR is portable across the following "platforms" (from the
first article)
"...
Execute on many platforms [--] Today, there are many different flavors of
Windows: Windows 95, Windows 98, Windows 98 SE, Windows Me,
Windows NT® 4.0, Windows 2000 (with various service packs), Windows CE, and
soon a 64-bit version of Windows 2000. ... "
Notice the common adjective in those platform names? :-)
Personally, I would say useful platform independence would entail
running on operating systems that aren't all written by the same
company. The only CPU variability is CE and 64-bit W2000.
Unfortunately, I didn't manage to find a detailed description of MSIL. I imagine
it is tweaked toward C#, but not as closely as Java bytecodes are tweaked toward
Java.
> > I gather one difference between Microsoft's CLR and Sun's JVM is that the
> > JVM is designed to be easy to interpret, and Microsoft now believes that
> > it is OK to require a compilation step.
I would say that the major differences lie in that the JVM implements
some semblance of security. I'm not so sure about CLR (the diagram in
the first story looks suspiciously like a proprietary JVM).
Lyman
Although details at this level are hard to come by, I am sure
CLR will have some semblance of security. The high level
language C# is mostly "safe" (no unchecked casts or pointer
arithmetic, etc), with a few unsafe constructs that have to be
labelled as such. I expect this is reflected in the bytecode,
and that the VM can have the option of trivially rejecting
code which uses "unsafe" constructs, to give a security model
very like Java's.
Whether Microsoft's implementation will be open to peer review
is, of course, another matter.
>> Note that gcc provides something fairly similar to computed gotos
>> (label variables) as an extension to C, C++, and possibly other
>> languages.
>Fortran (g77) doesn't need it - we have ASSIGNED GOTO in the Standard.
The case for a computed goto is that one does not always want
to make the entire list of ASSIGN statements.
--
This address is for information only. I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
hru...@stat.purdue.edu Phone: (765)494-6054 FAX: (765)494-0558
> I'd like to create a Virtual Assembly Language.
As others have already pointed out please take a look at
C-- (http://www.cminusminus.org/) and links found on that site.
Dis Virtual Machine (part of the Inferno project) is an interesting
portable VM, whose instruction set tries to resemble existing
processors.
http://www.vitanuova.com/papers/dis.html
http://www.vitanuova.com/papers/hotchips.html
Petri
--
<(O)> Petri Kuittinen, also known as Eye, Dj Eye or Peku <(O)>
<(O)> ADDRESS: Postipuuntie 10 A 14, FIN-02600 Espoo, Finland <(O)>
<(O)> EMAIL: e...@iki.fi WWW: http://www.iki.fi/~eye/ PHONE: 09-5472380 <(O)>
~What we cannot speak about we must pass over in silence. -L. Wittgenstein
I am fairly confident in the "ill will" of MSFT in this that they
intend some proprietary "spin" much as Sun had some similar intent
with JVM.
If it were not for a general mistrust of anything MSFT proposes, I'd
think it a _good_ thing to see some further experimentation in the
area of Virtual Machines.
The thing about CLR that is particularly interesting is its explicit
intent to support multiple languages, in contrast with JVM being
designed solely to support one language, namely Java.
The _problems_ with CLR are severalfold:
- Being a MSFT thing, there is cause for suspicion of there being
serious underlying agenda for this to give MSFT power over what is
to be deployed atop it;
- It likely has some _strong_ Win32 biases;
- Blah, blah, blah, limited disclosure of portability issues, blah,
blah, blah...
- It _SHOULD_ be treated as a "Version 0.1 Experimental thing," where
the design is still in flux. Implementing compilers for a number of
different languages on various platforms would show up shortcomings
and thereby allow iteration to versions more powerful and more
suitable to general deployment.
The point here is that there seems to me to be merit to there being a
"lively" set of VM developments so that we may move towards a more
powerful model for VMs, including such things as:
- Supporting garbage collection well;
- Supporting continuations well.
I rather think that more exploration of this is useful; there may be
more abstractions that will prove useful than are presently commonly
implemented.
If CLR exposes a route to people being open to there being more
exploration of VMs, that is a Good Thing.
Of course, if all it exposes is a route to "One VM to Bind Them, and
In The Darkness Rule Them, In The Land of Redmond, Where the
Marketers Lie," well, the conclusion should be obvious :-).
--
(concatenate 'string "cbbrowne" "@" "acm.org")
<http://www.hex.net/~cbbrowne/java.html>
"Life. Don't talk to me about life." -- Marvin the Paranoid Android
re "Art of Assembly"!
--
.hpr - h.-peter recktenwald, berlin
mailto:ph...@snafu.de - http://home.snafu.de/phpr - t: +49 30 85967858
Passing on or use of the above details for commercial purposes is
prohibited under § 28 para. 3 BDSG (German Federal Data Protection Act).
david | david...@cognent.com
I always thought it was a RISC machine thinking it was a CISC machine.
You have a lot of fancy addressing modes,
but you still have to generate a load/store architecture
if you want to use stacks ( pop ax, add bx,ax ) or bytes
(mov al,memory, btw al,add bx,ax).
So what happens with a 6800, or 6502, or even a CPU like mine, that is
a Memory to Accumulator design. I have 2 accumulators
and 3 index registers, but I don't have register
to register operations other than transfers.
Ben.
--
"We do not inherit our time on this planet from our parents...
We borrow it from our children."
"New 24 bit CPU" http://www.jetnet.ab.ca/users/bfranchuk/index.html
The "safe" subset is large and useful - it can do pretty much everything
you could do in Java, for example. The "unsafe" parts of C# are for
things which would need native code in Java. We can reasonably hope that
unsafe code will be rarely or never used, outside of trusted libraries.
The language makes it easy to separate the "safe" from the "unsafe". This
means the security model can really be better than Active/X. It can
simply reject any untrusted code that uses "unsafe" constructs.
Why?
> - Programs that are now written in real assembly language could be
>written in virtual assembly language, which would make it portable,
>while still providing many low-level features and being very fast.
If it provides low-level features, it's not portable.
>After all, even something as simple-seeming as an if/else statement
>gets compiled into several lines of assembly language which are hard
>to make optimal in a general sort of way.
But that's okay, because every system uses a different optimizer.
>This is of course a classic
>argument in favor of assembly language. The reason it applies here is
>because my virtual assembly language will deal directly with
>registers, jumps, etc.
How?
I think you need to look up "Scott Nudds". The mighty flame wars involving
him cover this.
To summarize: You will not improve on C. If you allow better access to
low-level functionality, programs will no longer be portable. If you maintain
portability, your access to low-level functionality will not be any better.
To understand this, imagine for a moment three architectures. On one, there
are a number of special purpose registers, each of which can only be used
with certain instructions. On another, all registers are generic. On the
third, there are "data" and "address" registers, which have special semantics,
but which are otherwise interchangeable (i.e., any data register may be
substituted for any other in any operation).
One of the three architectures has a single group of branches: branch if
less than, branch if less than or equal, branch if equal, branch if
greater than or equal, branch if greater, and branch if not equal;
they always operate on a pair of data registers, and must always jump to a
value in an address register.
One of them has only branch if less than, but allows the use of immediate data
or optionally even an immediate address.
One of them has no branch options, but always compares the value in
the "first" data register with the value pointed at by the "first" address
register. Depending on which of *three* states the branch takes, it will
jump to the second, third, or fourth address register. However, a flag allows
you to specify that instead it should jump to the 0th, 1st, or 2nd address
stored at the location in the second address register.
Oh, and on one of them, the next instruction after a branch is always
executed, no matter which way the branch goes.
Please explain how your system would handle
if (*a < b && c > *d) {
++e;
}
on these architectures. Now, compare it to the "long assembly" output a C
compiler would probably produce for each of these systems. Note that, while
the assembly produced for each specific system is doubtless unportable, the
C code works fine on all of them, and in each case, the assembly produced is
substantially cleaner than any "generic assembly" you care to invent.
The belief in a portable assembler seems to stem from a lack of experience
with a variety of genuinely *different* architectures.
-s
p.s.: I stole the various features I described from a variety of different
systems. A couple were even made up, but rest assured, real architectures are
much stranger.
p.p.s.: How does your assembler allow you to optimize effectively for systems
with different limitations on pipelining?
--
Copyright 2000, All rights reserved. Peter Seebach / se...@plethora.net
C/Unix wizard, Pro-commerce radical, Spam fighter. Boycott Spamazon!
Consulting & Computers: http://www.plethora.net/
Peter Seebach <se...@plethora.net> wrote in message
news:39c99954$0$28243$3c09...@news.plethora.net...
It might be useful from an educational standpoint; actually, based on
the remainder of your post, if the gentle reader went further and
built _several_ such assembly languages with different architectural
assumptions, that would address the mistaken notion that there could
be One True Assembly Language.
>>Programs that are now written in real assembly language could be
>>written in virtual assembly language, which would make it portable,
>>while still providing many low-level features and being very fast.
>If it provides low-level features, it's not portable.
True; there is still useful insight available.
Note that Donald Knuth has revised his "MIX" assembly language used
for TAOCP, producing the "RISCy" version, "MMIX."
<http://www-cs-faculty.stanford.edu/~knuth/mmixware.html>
MMIXmasters <http://www.mmixmasters.org/~mmixmasters/> are converting
TAOCP volume 1-3 programs to MMIX...
>>This is of course a classic
>>argument in favor of assembly language. The reason it applies here is
>>because my virtual assembly language will deal directly with
>>registers, jumps, etc.
>
>How?
>
>I think you need to look up "Scott Nudds". The mighty flame wars
>involving him cover this.
>
>To summarize: You will not improve on C. If you allow better access
>to low-level functionality, programs will no longer be portable. If
>you maintain portability, your access to low-level functionality will
>not be any better.
Mind you, if you provide some set of higher level _language_
constructs such as arrays, lists, and garbage-collected structures,
you might be able to do better than C, but that _does_ assume having a
higher level language involved in addition to assembler.
>To understand this, imagine for a moment three architectures. On one, there
>are a number of special purpose registers, each of which can only be used
>with certain instructions. On another, all registers are generic. On the
>third, there are "data" and "address" registers, which have special semantics,
>but which are otherwise interchangable (i.e., any data register may be
>substituted for any other in any operation).
Hmmm... IA-32, "generic RISC," and 68000. Right?
>The belief in a portable assembler seems to stem from a lack of
>experience with a variety of genuinely *different* architectures.
Indeed. Which probably means that there is a lot of merit to the
construction exercise, as it may demonstrate that there truly are
"different" architectures.
--
(concatenate 'string "aa454" "@" "freenet.carleton.ca")
<http://www.hex.net/~cbbrowne/linux.html>
Why are there flotation devices under plane seats instead of
parachutes?
#In our last episode (21 Sep 2000 05:15:01 GMT),
#the artist formerly known as Peter Seebach said:
#>In article <39A4198B...@acm.org>, Jonathan Neve
#<jon...@acm.org> wrote:
#>>I'd like to create a Virtual Assembly Language. That is, a very
#>>low-level language, that uses assembly-like syntax(ie: no control
#>>structures, no OOP, no if/else statements, etc).
#>
#>Why?
#
#It might be useful from an educational standpoint; actually, based on
#the remainder of your post, if the gentle reader went further and
#built _several_ such assembly languages with different architectural
#assumptions, that would address the mistaken notion that there could
#be One True Assembly Language.
#
#>>Programs that are now written in real assembly language could be
#>>written in virtual assembly language, which would make it portable,
#>>while still providing many low-level features and being very fast.
#
#>If it provides low-level features, it's not portable.
#
Not true, it has been done with the Univac 1107 assembler and linker.
Nothing about the machine architecture was built into the assembler.
I used it to assemble code for a plotter by defining its instructions.
Any field of any size could be a general relocatable expression.
Parameter passing to macros was very general.
You could actually do Gaussian elimination at assembly time
and assemble only the results.
The only thing it couldn't do was instruction scheduling, at least not
directly.
Ken Walter
Remove -zamboni to reply
All the above is hearsay and the opinion of no one in particular
Check out
UNCOL -- the Universal Computer Oriented Language -- RIP
Fraser Duncan's excellent little machine independent assembler.
(I'll never forget this founding father of Algol looking at his code
and saying: "it's more readable than Algol 60".)
The UNIX Assembler also known as "as".
rbotting at CSUSB edu
Computer Scientist, Ex Sys Admin, Consultant, Researcher, and Reviewer
http://hometown.aol.com/rjbotting/myhomepage/business.html
>Hmmm... IA-32, "generic RISC," and 68000. Right?
Actually, I was just making stuff up, but it wouldn't surprise me if my vague
knowledge of those architectures fed into my example. I don't know any
assembly languages at all. ;)
>Indeed. Which probably means that there is a lot of merit to the
>construction exercise, as it may demonstrate that there truly are
>"different" architectures.
But for that to work, the guy has to have access to at least two, and he
probably doesn't. Or if he does, they're the Pentium and the Pentium Pro.
-s
What about SPIM? It's a MIPS emulator that they use to teach assembly at
many colleges.
--
David Starner - dstar...@aasaa.ofe.org
http/ftp: dvdeug.dhis.org
And crawling, on the planet's face, some insects called the human race.
Lost in space, lost in time, and meaning.
-- RHPS
:-) on [mostly]
Actually, this is a terrific idea, and would probably help the signal-to-noise
ratio here. In fact, maybe we could put part of comp.arch off into
a new newsgroup, such that:
(1) It is dedicated to opinions and proposals, but only of
posters who can convincingly claim to have minimal or, even better,
*zero* relevant experience in the areas of the posted opinions.
[I.e., "No Asm experience" is a good recommendation in this case.]
(2) It is to be avoided by anyone with actual experience,
and in particular, comments like "this idea is at least 30 years
old (references), has been tried many times, and hasn't been successful"
should be considered unacceptable, as they would distract from the
discussions of (1).
:-) off
--
-John Mashey EMAIL: ma...@sgi.com DDD: 650-933-3090 FAX: 650-851-4620
USPS: SGI 1600 Amphitheatre Pkwy., ms. 562, Mountain View, CA 94043-1351
SGI employee 25% time; cell phone = 650-575-6347.