Ideally, I want a programming language that compiles into nice, tight
machine code. Currently I'm using Hitech C, but the code it produces isn't
great, and it's very very big and cumbersome. Something more low level
would suit me.
So, what sort of thing is available? I originally thought that Forth would
suit me, but most Forth systems turn out to be semi-interpreted and not
actually that efficient. Plus, I couldn't find one for the Z80 that didn't
need a vast run-time.
Turbo Pascal is supposed to be pretty good, but I don't know how to make it
produce standalone code.
Any suggestions? I'm willing to learn any decent programming language...
--
+- David Given --McQ-+
| d...@cowlark.com | "Two of my imaginary friends reproduced once ...
| (d...@tao-group.com) | with negative results." --- Ben, from a.s.r
+- www.cowlark.com --+
Writing RISC code with anything other than a high level compiler is
masochism as far as I'm concerned. Particularly on those nasty little
PICs. The Z-80 is one of the most orthogonal CISC architectures I've ever seen,
so even a RISC programmer should be able to learn it without too much
trouble.
> Ideally, I want a programming language that compiles into nice, tight
> machine code. Currently I'm using Hitech C, but the code it produces isn't
> great, and it's very very big and cumbersome. Something more low level
> would suit me.
You're not going to get much more efficient than C. Just avoid floating
point data types and the library routines as much as possible, particularly
the floating point math libraries which get very large. Then it will
produce code that is nice and compact.
> So, what sort of thing is available? I originally thought that Forth would
> suit me, but most Forth systems turn out to be semi-interpreted and not
> actually that efficient. Plus, I couldn't find one for the Z80 that didn't
> need a vast run-time.
>
> Turbo Pascal is supposed to be pretty good, but I don't know how to make
> it produce standalone code.
>
> Any suggestions? I'm willing to learn any decent programming language...
Nothing will be more efficient than C except assembly language. You can
always pretend you're coding on a PIC and just ignore 155 of the 178
instructions the Z-80 offers.
Amardeep
PICs? RISC? Uncontrollable laughter!
The problem I have is that I spend about three-quarters of my time
shuffling registers around to get my values into registers that the next
instruction has an encoding for. I'm used to processors where all the
registers are identical and can be used in all instructions.
[...]
> You're not going to get much more efficient than C. Just avoid floating
> point data types and the library routines as much as possible, particularly
> the floating point math libraries which get very large. Then it will
> produce code that is nice and compact.
Hardly.
For example: Hitech C wants to pass parameters on the stack. Always. This
means that each function needs to contain code to calculate the position of
the frame pointer, because the Z80 doesn't do stack-relative addressing.
And its optimiser is a bit noddy, so this code gets emitted in every
function, regardless of whether it's used or not.
This means that a function that only uses one or two arguments contains
dozens of bytes worth of argument processing instructions, every time. If
it passed the first few parameters in registers, none of this would be
necessary. Smaller code, faster code.
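For illustration, here is roughly what that overhead looks like; this is a
hypothetical sketch, not HiTech's actual output:

_func:  push ix         ; save the caller's frame pointer
        ld ix,0
        add ix,sp       ; IX = new frame pointer
        ld l,(ix+4)     ; fetch the first 16-bit argument,
        ld h,(ix+5)     ; a byte at a time
        ; ...body...
        pop ix
        ret

With a register convention the argument would simply arrive in HL and the
whole prologue disappears.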
One of the problems with C is that it's reentrant. The Z80 doesn't do that
sort of thing well, hence the stack frame calculations described above. A
language that didn't have reentrant functions would suit me just as well
and produce much smaller code.
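To see why, compare a hypothetical static (non-reentrant) calling sequence;
func_arg is a made-up label:

        ld hl,1234
        ld (func_arg),hl    ; caller stores the argument at a fixed address
        call func
        ...
func:   ld hl,(func_arg)    ; callee fetches it in one instruction
        ret
func_arg: defw 0            ; statically allocated argument slot

No frame pointer, no stack arithmetic; the cost is that func can't call
itself.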
Hell, I'd even consider PLM if I could make any sense of it. (And if it
produced Z80 source rather than 8080 source.)
[...]
> Nothing will be more efficient than C except assembly language. You can
> always pretend you're coding on a PIC and just ignore 155 of the 178
> instructions the Z-80 offers.
You've never programmed a PIC, have you?
--
+- David Given --McQ-+ "The cup of Ireland's misfortunes has been
| d...@cowlark.com | overflowing for centuries, and it is not full yet."
| (d...@tao-group.com) | --- Sir Boyle Roche
+- www.cowlark.com --+
>Turbo Pascal is supposed to be pretty good, but I don't know how to make it
>produce standalone code.
In a very old issue of the German magazine c't there was an article
about this subject. I do not know the year/number; it must have been
from about 1985...1988, IIRC.
If any German reader here has the DVD of the '80s issues, (s)he might
look. I do have all issues (about 5 meters of shelf) in storage, so I
might get it out and do a scan.
greetings, Holger
> Turbo Pascal is supposed to be pretty good, but I don't know how to make it
> produce standalone code.
With version 3.0 this is the C option for Com file.
The scanned manual is at:
http://oldcomputers.dyndns.org/public/pub/rechner/epson/~fjkraan/comp/tp30/doc/
Fred Jan Kraan
> I'm trying to write a program for CP/M. I want it to be fast and
> efficient, but I'm not particularly good at Z80 machine code (I'm used
> to RISC; dealing with all... those... weird... registers makes my head
> hurt.)
Hmmm, fast and efficient (small?)... hard to do both at once, but if you
use a high-level language to get the logic right, then re-code critical
areas in assembly to get the speed, you might come to an acceptable
point.
> Ideally, I want a programming language that compiles into nice, tight
> machine code. Currently I'm using Hitech C, but the code it produces
> isn't great, and it's very very big and cumbersome. Something more low
> level would suit me.
HiTech's C (at least the one they released for free in 1987) is
reasonably efficient, and can emit assembly source. You can break it at
various points to see intermediate stages of optimization. With this,
you can manually optimize to your heart's content.
For another approach, Peter Hochstrasser's Modula-2 compiler (on my web
page) also runs on Z80s under CP/M-compatible systems, can produce
assembly source, and does about as good a job as HiTech's C compiler,
but from Modula-2 source. I've used it to program everything from a
modem program (with screen-oriented block graphics down to raw IO port
banging) to hand-optimized modules for parts of a compiler for another
language.
BTW, producing a .COM file from Turbo Pascal locks you into a CP/M system
size (the available TPA or larger) based on the machine used for
compilation. In other words, if you compile a program to a .COM file on a
60kB system, you will not be able to run it on a 53kB CP/M system, because
the program will write to high memory (relative to the compile-time base)
and clobber the OS on the smaller system. Also, Turbo Pascal will not
(AFAIK) produce standalone code which runs without CP/M; HiTech C and
Peter's Modula-2 system can (UZI180, a Unix Z80 implementation for the
Z180, on my site, is written exclusively in HiTech C and assembly).
Hal
When trying to use a RISC programming model on a CISC CPU such problems are
inevitable. CISC CPU's use data structures in RAM for operand storage
instead of loading everything into registers before operating on them.
PIC CPU cores are indeed RISC. You seem to be confusing orthogonal
instruction set architecture (all registers having the same addressing
modes) with RISC (register-centric operations with few or no memory
operands). The PICs are not orthogonal, but they are definitely RISC.
The Z-80 is somewhat orthogonal in that, although one operand must always
be in the accumulator, any other register (or RAM address) may supply the
other operand for most operations. It is not fully orthogonal because of the
specialized role of certain registers in CISC instructions such as block
operations and indexed memory addressing modes. When using a CISC
architecture it is important to treat the RAM array as an extension to the
register set. The most powerful instructions of the CPU rely on this
paradigm.
> > You're not going to get much more efficient than C.
>
> Hardly.
>
> For example: Hitech C wants to pass parameters on the stack. Always. This
> means that each function needs to contain code to calculate the position
> of the frame pointer, because the Z80 doesn't do stack-relative addressing.
> And its optimiser is a bit noddy, so this code gets emitted in every
> function, regardless of whether it's used or not.
Your objection is due to compiler implementation efficiency, not C language
efficiency. If you want to avoid the overhead of those function calls, use
inline functions or macros, or even the dreaded "GOTO".
> > Nothing will be more efficient than C except assembly language. You can
> > always pretend you're coding on a PIC and just ignore 155 of the 178
> > instructions the Z-80 offers.
>
> You've never programmed a PIC, have you?
I most certainly have, much to my chagrin. It has about 23 instructions and
one register (W). A real PITA. I'll take the non-orthogonal 8051 over it
in a heartbeat.
No problem, z80 or even 8080. My code generator for the 8080
created code of the form:
lxi h, offset   ; HL = frame offset
dad sp          ; HL = SP + offset
mov e, m        ; low byte
inx h
mov d, m        ; high byte: DE = the 16-bit value
to access a 16 bit value on the stack. Since the code generator
kept track of what things were pointing to, subsequent accesses
might replace the first two instructions (4 bytes) with just "inx
h" (1 byte), or even nothing to reaccess the same location (to
store, for example). Thus a stack access consumed somewhere
between 3 and 7 bytes of code. No frame pointer needed.
Note that offset was a constant at execution time. The generator
just had to keep track of pushes and pops to compute it.
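To make the bookkeeping concrete, here is a hypothetical pair of
consecutive accesses under that scheme:

        lxi h, 4        ; first local at SP+4
        dad sp
        mov e, m
        inx h
        mov d, m        ; DE = first local; HL now points at SP+5
        inx h           ; second local at SP+6: one byte, not four
        mov c, m
        inx h
        mov b, m        ; BC = second local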
--
Chuck F (cbfal...@yahoo.com) (cbfal...@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!
> For example: Hitech C wants to pass parameters on the stack. Always. This
> means that each function needs to contain code to calculate the position of
> the frame pointer, because the Z80 doesn't do stack-relative addressing.
Yes, it does. You load the sp into either ix or iy, and access the stack
parameters via offsets.
>
> This means that a function that only uses one or two arguments contains
> dozens of bytes worth of argument processing instructions, every time. If
> it passed the first few parameters in registers, none of this would be
> necessary. Smaller code, faster code.
Agreed. But that is true on virtually all modern processors.
>
> One of the problems with C is that it's reentrant. The Z80 doesn't do that
> sort of thing well, hence the stack frame calculations described above. A
> language that didn't have reentrant functions would suit me just as well
> and produce much smaller code.
I never really encountered any problem generating reentrant code for the
Z80.
The Z80 is NOT orthogonal (I have no idea what that other poster was
smoking when he said that). However, there is no real problem generating
code for a non-orthogonal processor, it just means the encoding is more
work. Since the Z80 is a simpler processor than many of the later
processors that were orthogonal, the encoder is fairly compact.
The "C is the most efficient language" part is funny too. My compiler was
beating all the Cs available on the Z80 back in 1980 by a factor of
twice or more. Most of the C compilers for Z80 didn't do register allocation,
whereas mine did (BTW, it was Pascal).
Most compiler writers would not term an accumulator oriented instruction
set as "orthogonal". Accumulator orientation tends to cause lots of
register shuffling to get anything done.
> > Turbo Pascal is supposed to be pretty good, but I don't know how to
> > make it produce standalone code.
> With version 3.0 this is the C option for Com file.
Prior to doing that, at the main menu you should go into the compiler
Options menu by pressing 'O', then press 'C' (to compile as a COM file),
then 'Q' to get back to the main menu, where you can press 'C' to compile
(which produces the COM file).
Regards,
Ross.
> I'm trying to write a program for CP/M. I want it to be fast and
> efficient, but I'm not particularly good at Z80 machine code (I'm used
> to RISC; dealing with all... those... weird... registers makes my head
> hurt.)
> Ideally, I want a programming language that compiles into nice, tight
> machine code. Currently I'm using Hitech C, but the code it produces
> isn't great, and it's very very big and cumbersome. Something more low
> level would suit me.
> So, what sort of thing is available? I originally thought that Forth
> would suit me, but most Forth systems turn out to be semi-interpreted
> and not actually that efficient. Plus, I couldn't find one for the Z80
> that didn't need a vast run-time.
Forth is supposed to be good, but you'll be writing programs which run
under Forth (well, some Forths work that way; I'm less sure about CP/M
Forth compilers, to the point where I'm confused! ;-) Still, Forth can
become a powerful tool for writing programs with.
> Turbo Pascal is supposed to be pretty good, but I don't know how to
> make it produce standalone code.
Okay, I'll just stop you here. If you're looking for highly optimised
code, then Turbo Pascal may not be what you're after (unless you deem
10k a decent size); in comparison to compiled assembly files it's a
real blowout.
A word of warning: if you're looking for a language to have your
program doing millions of calculations, then Turbo Pascal mightn't be
suitable, unless you're doing everyday calculations (which can be worked
out & stored in arrays). TP does remarkably better when you use arrays
with the values precomputed, rather than giving TP the problems (this
works well when you have a loop & something in it is calculated over &
over with variants; answers like that can be stored in arrays & a
remarkable time difference is the result).
> Any suggestions? I'm willing to learn any decent programming
language...
You could probably have a look at SmallC (if you think it's decent),
you'd just need to find one which comes with a few routines, the earlier
versions are pretty bare & assembly knowledge is fairly essential.
SmallC translates your SmallC code into assembly, which can then be
fairly easily assembled & linked! :-)
Cheers,
Ross.
Up to persuading it to pass parameters in registers? Hmm.
Do you have a source for a decent Small C for the Z80? The versions on the
archive all seem to support the 8080 only.
Of course, what I'd really like is a really nice gcc port.
[...]
> The trick in Z80 is knowing when to pass data in a register or use a
> register pair as a pointer for passing data. The stack, while compact
> coding, tends to be inefficient for data passing due to the lack of
> stack relative addressing, though it can be faked (cost is in code).
> Either way, getting to know the iron is the best way to make your code
> efficient regardless of the compiler.
Oh, yes. What I'm doing now is compiling stuff and then studying the output
to make sure it's doing what I expect.
Z80s are so ubiquitous --- there *have* to be good tools out there for
programming them! I suspect the trick is finding free, open good tools...
--
+- David Given --McQ-+
| d...@cowlark.com | "While I write this letter, I have a pistol in one
| (d...@tao-group.com) | hand and a sword in the other." --- Sir Boyle Roche
+- www.cowlark.com --+
Agreed. I was only trying to point out that it is unnecessary to fill up
the registers with all operands since half of the operands of most
operations may remain in RAM. I suppose "slightly" orthogonal is kind of
like "slightly" dead so I'll happily eat my words.
And somewhere in between you have the option to limit the (high)
memory that will be assumed.
yours, Holger
Might it be this one:
rub: Know-how
tit: Pascal in EPROMs
sbt: Kleinstcomputer komfortabel programmieren
aut: Klaus Münter
red: ja
zts: c't
hft: 1985/ 12; S.74
stw: Turbo-Pascal
The Z80 benefits from register allocation in compilers because most
operations tend to use the same factors over and over. For example, many
array accesses involve the fetch of a member, an operation on it, then
replacement in the array.
In addition, the Z80 makes external RAM operations fairly expensive,
since a bigger instruction, such as a fully addressed fetch, is needed.
I have found in hand programming the Z80 that 16 bit registers, usually for
holding an address, are needed so badly that using "ex (sp),hl" to get an
extra "virtual" register on top of the stack can sometimes help. Of course,
I never programmed that trick into a compiler :)
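For illustration, a hypothetical loop using that trick:

        push de         ; park a third pointer on top of the stack
loop:   ex (sp),hl      ; swap it with HL
        ld a,(hl)       ; use the "virtual register" as a pointer
        inc hl
        ex (sp),hl      ; put it back and recover the old HL
        djnz loop
        pop de          ; recover the (advanced) pointer when finished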
--
+----------------------------------------------------------------+
| Charles and Francis Richmond richmond at plano dot net |
+----------------------------------------------------------------+
>*Holger Petersen* wrote on Wed, 04-02-04 19:10:
>>In a very old issue of the german magazine C'T there was an article
>aut: Klaus Münter
>hft: 1985/ 12; S.74
Yes. Two pages; but the listing (from a dot-matrix printer :-) is not
very readable, with comments in German, as is the text.
I will scan it on Sunday.
Yours, Holger
> Yeah, kind of like the x86 instruction set, which has a lot
> of specialized registers and takes some shuffling around...
> It's hard for me to imagine why any university would have
> an intro to assembly language class and teach x86 assembly.
> IMHO, that is one of the *worst* assembly languages around,
> unless you want to look at something like the National
> SC/MP processor. But ISTM that ancient processors like the
> SC/MP were crippled because of lack of real estate on the
> chip.
>
> --
> +----------------------------------------------------------------+
> | Charles and Francis Richmond richmond at plano dot net |
> +----------------------------------------------------------------+
The answer should be terribly obvious: PCs are ubiquitous. Except for
those weirdos with Macs, you can be pretty sure that just about anyone
entering a university either has an x86 machine, or access to one. And
there are plenty of simulators for those few that don't. Teaching on a
"perfect" processor is pointless, because it's not real world.
--jc
The point of using something reasonable, like the 68000, is that
it has a more regular instruction set. The student can pick up
the essentials of what assembly language is all about, *without*
having to deal with all the special registers and hoo-haa. Then
if the student has to face the morass of x86 assembly, the student
is grounded in what assembly language is trying to do...which may
*not* be so evident when dealing with all the special register
mumbo-jumbo. IMHO.
Personally, I don't see you people's problem with the x86. I've been writing
assembly for years, on a number of platforms, from 8051 to 68000 to AM29000
to MSP430 and others. x86 is NOT THAT BAD. True suckage is only seen in
the PIC architecture, or the 8042. x86 gets pretty nice in protected mode.
You can effectively lose the 4:16 segment aspect of programming, which is,
to me, the only real downside of the x86.
Sure, a 68000 may be better. So's a PDP-11. But this nearly psychotic
reaction to the x86 arch just says to me that you shouldn't be writing code
on it. Don't like it? Don't use it. Simple philosophy. Works for me and
PICs. And in 20+ years of programming, they're the only microcontroller
I've EVER run across that I truly hate. It's my problem, and I solve it by
not using it.
My advice? Try that with the x86. You'll obviously be happier.
--jc
That would be great if the goal were to give students warm fuzzies about
assembly language. In practice, when you spoon-feed students with easy
technologies (68000 assembly, Pascal, etc.), they go into shock when they come
face-to-face with less sanitized variants (x86 assembly, C/C++, etc.) to the
point where they often can't (or won't) make the transition. Coddling students
isn't doing them a favor. The talented ones aren't challenged and the untalented
ones find out too late that professional macramé was their real calling.
Claudio Puviani
Come on, there is very little left in the 80386 and beyond instruction
sets that is non-orthogonal. The eax register is the original accumulator,
but in name only. There are few operations that must be done only with
that register. These include signed and unsigned division, unsigned multiply,
and eax typically can perform immediates like sub r,i with one less
instruction byte. There are a few instructions like movs that require
the edi and esi registers, but with superscalar processors, rolling your
own sequence that uses other registers is just as fast or even faster.
Now it could be that much of what you are calling the "worst" assembly
language is the monstrosity known as MASM, but most serious programmers
dropped that assembler long ago for gas or nasm, which have untyped
instructions and data. Now I personally like the PowerPC better, but
that is mainly for architectural cleanliness, not orthogonality problems.
> BDS C was one of the faster compilers and produced fairly efficient
> code but the resulting code was anything but pretty. I still have and
> use it. If I'm not going to "edit" the assembly code I use it and
> like the performance especially for standalone Z80 apps.
Sure, if you look at the output of any CP/M compiled language, you'd find
something odd about it. Wouldn't there be a program out there to pretty up
the source code?
Cheers,
Ross.
Obviously, everything is machine code in the end. But it is difficult
and tedious to write in assembler. High level languages are a way to let
you write code faster and easier; but the price you pay is that it will
take more memory and execute slower.
Many compilers will output assembly language, so you can see what it is
doing and hand-optimize it. This can range from easy (eliminate some
unnecessary PUSH/POP pairs) to exceedingly difficult (having to alter
the fundamental model for the way the compiler does things).
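As a made-up example of the easy end of that range:

        push hl         ; compiler output: HL saved...
        pop hl          ; ...and immediately restored: a dead pair
        ld a,(hl)

The hand-optimized version is just the ld a,(hl).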
Here's what works for me: I start by looking at the problem I need to
solve. What does the program have to do? Then pick a language that makes
it easy to program it. For example, C or Pascal for well-defined data
processing applications, that will move and manipulate data; FORTH for
ill-defined I/O intensive applications; BASIC if you just need to throw
something together in a hurry; etc. Every language has strengths and
weaknesses. Plus, we all have favorite languages that we prefer even
when they aren't the best choice (to a man with a hammer, every problem
looks like a nail. :-)
Once I get the application working, then I can see if it is fast enough
or fits in the available memory. Sometimes, it is just fine as-is!
(Quick; ship it before they want to add any more features!)
But often, I have to go back in and optimize it. Some gains can be had
in the high-level language, by changing the structure or algorithms
used. This works best when you can trade off speed for memory, or vice
versa.
But if you need significantly more speed *and* less memory, I have to
get into the assembler level. Certain key routines can be rewritten. In
rare cases, I've had to completely rewrite the program in assembler. But
at least having a working example gives me a basis to do this much more
efficiently.
> I originally thought that Forth would suit me, but most Forth
> systems turn out to be semi-interpreted and not actually that
> efficient.
FORTH typically runs at about half the speed, but has the peculiarity
that I can produce programs with it that are *smaller* than assembler.
This is mainly because FORTH uses tricks that people wouldn't normally
do in assembler. Obviously you can use the same tricks writing in
assembler, but the bookkeeping becomes intractable and errors are
likely.
> Plus, I couldn't find one for the Z80 that didn't need a vast run-time.
Huh? Have you talked to FORTH Inc? With FORTH you use the development
environment to create a word that does your task, then cross-compile
just this word to your target system (which can be a different computer,
with even a different CPU). Only those routines actually used in your
word are compiled.
--
Lee A. Hart Ring the bells that still can ring
814 8th Ave. N. Forget your perfect offering
Sartell, MN 56377 USA There is a crack in everything
leeahart_at_earthlink.net That's how the light gets in - Leonard Cohen
You could have a look at:
z88dk - The z88 Development Kit
z88dk is a z80 C cross compiler supplied with an assembler/linker and a set
of libraries implementing the C standard library for a number of different
z80 based machines. The name z88dk originates from the time when the project
was founded and targeted only the Cambridge z88 portable.
Best Regards
Dennis Gröning
> >Sure, if you look at the output of any CP/M compiled language, you'd
> >find something odd about it. Wouldn't there be a program out there to
> >pretty up the source code?
> As a cpm user and C user as well.... Most CP/M C compilers didn't do
> much optimization and those that did (BDS C did) tended to do some
> ugly looking things. Not to say they didn't work either. What was
> ugly about the BDS C compiler (early versions) was they had a
> separate parameter stack from the subroutine stack. That made for
> ugly looking code that compiled fast and ran fairly fast.
Hi Allison,
How do you mean "ugly looking things"? I felt you were referring to the
way the C code was compiled & translated into perhaps an assembly file
which was messy to look at (& needed a source code beautifier to solve
this issue! :-)
But I suppose, if you're referring to the end code being all over the
place & hard to follow (like assembly code which just goes anywhere! :-),
then perhaps that would be bad for anyone trying to translate it (or
perhaps optimise it a little).
> I did at one time have a post optimizer/editor written in TECO that
> was effective in showing redundant code and allowing a hand edit.
> Generally I've never been a C fan; used it enough but it was not my
> favorite. For oddball non-von stuff like PIC, 804x, 8051, asm was
> preferred. The Z80 case is that it isn't that dirty, nor is it text
> book perfect either, but it is functionally effective. In the scale
> of things many cpus that were prettier, like the ti9900, PDP-11
> and a few others, were never seen as often as the z80 and its rival
> the 6502. The only ones that most people relate to that are more
> prevalent are the PCs, X86, and many people don't program, or avoid
> directly programming, the iron with those.
> That leads to a point as well. Many cpus in a vacuum are better
> looking than within their environment constraints. When you
> have IO and memory maps to fight with, many otherwise nice looking cpus
> start to look very ugly, or can even look presentable, depending on the
> other system choices. An example was the XT PC. Back then the 8088
> was well known to many, and in clean systems like multibus based ones
> it was fast, but the PC implementation was not as nice. I still feel
> the bare 8088/86 is really not much better to program than the z80,
> and it took the 386 to get past the limitations that were the
> significant tie breaker.
In terms of what a Z80 could do & what an 8086/88 could do?
That would explain why Z80s were around much longer than planned,
though the 6502 & enhancements of it also lived for some time in
machines like Apple IIs & C64s; not sure what others would have :-(
> In the end if you really want to program z80 efficiently a good
> macroasm and a well developed library of macros and "stock"
> code can do very well.
Yes, while as a TP programmer I don't mind the size of the code, the
speed is perhaps more important, which I've resolved using arrays (within
calculating programs with loops! :-)
Cheers,
Ross.
(...)
> Any suggestions?
Using Google -- Groups -- Advanced search
search for: "Wanted: SAL/80 "compiler"
published 30 July 2001.
This text should interest you.
(Unfortunately, as usual when I ask for something,
nobody found it. If I don't do it, it seems never
to be done... Pay me a trip to California (say
during 2 weeks) and I am sure that I will bring
back lots of useful CP/M stuff.)
Yours Sincerely,
"French Luser"
>David Given wrote:
>>
>... snip ...
>>
>> For example: Hitech C wants to pass parameters on the stack. Always. This
>> means that each function needs to contain code to calculate the position of
>> the frame pointer, because the Z80 doesn't do stack-relative addressing.
>> And its optimiser is a bit noddy, so this code gets emitted in every
>> function, regardless of whether it's used or not.
>
>No problem, z80 or even 8080. My code generator for the 8080
>created code of the form:
>
> lxi h, offset
> dad sp
> mov e, m
> inx h
> mov d, m
>
>to access a 16 bit value on the stack. Since the code generator
>kept track of what things were pointing to, subsequent accesses
>might replace the first two instructions (4 bytes) with just "inx
>h" (1 byte), or even nothing to reaccess the same location (to
>store, for example). Thus a stack access consumed somewhere
>between 3 and 7 bytes of code. No frame pointer needed.
>
>Note that offset was a constant at execution time. The generator
>just had to keep track of pushes and pops to compute it.
That's the trick I wish I'd thought of for BDS C.
If speed isn't as important as space, there's this trick that I
actually implemented in BDS C: to access a local off the stack whose
position relative to the frame fits in an 8-bit offset, it took a
two-byte "instruction":
rst n ; where n was 1-7, depending on which
db offset ; restart vectors are available
Down at restart vector "n" is a little subroutine (it may just have
been a jump vector, since the entire subroutine may not have fit into
8 bytes in order to keep from crowding out the _next_ restart
vector...) to fetch the 8- (or 16-, there were two flavors) -bit
value from the stack frame. (My frame pointer was in BC, grumble
grumble, but it could/should as easily just have been the SP).
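A hypothetical reconstruction of such a handler, in 8080 mnemonics with
the frame pointer in BC as described (it comes to 12 bytes, which shows
why the 8-byte restart slot could only hold a jump to it):

rstvec: xthl            ; HL <-> return address, which points at the db
        mov a,m         ; A = the 8-bit frame offset
        inx h           ; step the return address past the offset byte
        xthl            ; corrected return address back on the stack
        add c           ; HL = BC + offset (clobbers HL)
        mov l,a
        mvi a,0
        adc b
        mov h,a
        mov a,m         ; fetch the 8-bit local
        ret             ; resume after the two-byte "instruction"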
-leor
Leor Zolman
BD Software
le...@bdsoft.com
www.bdsoft.com -- On-Site Training in C/C++, Java, Perl & Unix
C++ users: Download BD Software's free STL Error Message
Decryptor at www.bdsoft.com/tools/stlfilt.html
Hi Allison,
> >How do you mean "ugly looking things"? I felt you were referring to
> >the way the C code was compiled & translated into perhaps an assembly
> >file which was messy to look at (& needed a source code beautifier to
> >solve this issue! :-)
> No, I meant things like many moves from register to register and pops
> and pushes that could be coded tighter or have a high redundant content
> (safe but slow).
Oh okay.
> >But I suppose, if you're referring to the end code being all over
> >the place & hard to follow (like assembly code which just goes
> >anywhere! :-), then perhaps that would be bad for anyone trying to
> >translate it (or perhaps optimise it a little).
> There is that also.
> Look at CPM, it was written in PLM and there are sections where you
> see:
> push b
> mov b,h
> mov c,l
> pop h
> To swap the contents of the BC pair and HL pair (example only).
> The reason is PLM has an internal convention for parameter passing
> and HL is generally a 16 bit pointer and BC is often a value(byte or
> word). That also comes from the artifact that in the 8080 (remember
> this is the starting point) the HL pair is basically the default 16bit
> accumulator. This is why getting close to the iron is important, as
> you begin to see how some conventions came about.
> >In terms of what a Z80 could do & what a 8086/88 could do?
> ??? I'll launch a guess here. To me the 8088 was Intel's way of
> out-Z80ing the z80, reaching for 16 bits while sticking to the
> 8080/8085 as the base. The result is a machine that often looks like a
> double wide 8080 with a funky MMU. The Z80 with a simple MMU is still
> cleaner and is more orthogonal to my eye. They both suffer from
> specialized registers and instructions that revolve around them. In the
> end it was that 8080-->8088 similarity and translatability that helped
> launch it. Zilog did the Z8000 and while I thought it was better, it's
> not very z80. Even Intel had a different direction going for 32 bits
> (iAPX432) that barely flew before extinction.
I felt you were saying that the 8086/88 & Z80 were on par with one
another & it wasn't until the 80386 came out that it was capable
of doing more. Naturally the 8086/88 was 16bit & the Z80 8bit;
when I think about it the 8086/88 had more registers, however I was
unsure if you were referring to the types of programs being on par
with one another.
Of course Intel perhaps felt that they needed to develop more powerful
CPUs, since the Z80 had some advantages over the 8080.
> >That would explain why Z80's were around much longer than planned,
> >though the 6502 & enhancements of that also lived for some time in
> >machines like Apple IIs, C64s, not sure what others would have :-(
> The 6502 was also common in embedded systems and formed the heart of
> a lot of modems and the like because of its sheer speed and good
> programmability. It was also popular as it was available as an
> economical (cheap and low gate count) gate array library because
> of its internal simplicity.
> Some cpus like the 8048 and others lived because they were ugly
> but useful and cheap to produce.
> The Z80 managed to live long as there were improved versions (Z180)
> and it was powerful enough that if it was too small the next step up
> was substantial. It didn't hurt that early on it was one of the few
> that had direct support for Dram, which was an inexpensive way to
> get to large (64k or larger) memory arrays. Designers like easy
> (low cost, simple...) hardware interfaces as much as pretty
> architecture.
> So the Z80 and 6502 (and 8085) were popular if only because they were
> easy to apply, cost effective, usually powerful enough and,
> with CP/M for the z80/8085 and Apple for the 6502, well supported,
> creating a large pool of designers familiar with them.
> >Yes, while as a TP programmer I don't mind the size of the code, the
> >speed is perhaps more important, which I've resolved using arrays
> >(within calculating programs with loops! :-)
> True, once you understand how the language implementation works on
> a given cpu you can make intelligent tradeoffs. The Borland TP and TC
> products also produced fairly compact code in exchange for more rapid
> development and some of the first useful IDEs (Integrated Development
> Environments) since the launch of UCSD Pascal. That little detail is
> important as often just getting it done quickly with fewer errors is
> an adequate offset against more costly memory (ram or rom).
> When you look at context, consider this. Today people use a PC
> that has 120GB of disk, 1GB of ram, and maybe 3GHz of 32bit
> processing speed to develop an app for a PIC like the 16F84A! Back when
> the 6502 and Z80 were the game, having 64k of ram, a 2 or 4MHz cpu and
> a floppy (100kb to 300kb range) was a BIG deal.
I'd reckon that people worry least about the size of their programs or
their speed these days; back when 386s/486s ruled the roost, that's all
people ever talked about, but since that's a problem long gone with
newer, faster computers with more disk space, it's hardly a problem.
Cheers,
Ross.
I suspect we could swap war stories about coding for some time.
You can still find some of mine on my page, download/cpm section.
My code almost always was restricted to the 8080 set, because
that's what my own embedded hardware used and I couldn't justify
taking the time and effort to design a z80 cpu board. We really
should have gotten together about 1980 or so. :-) I had Pascal,
you had C.
--
Chuck F (cbfal...@yahoo.com) (cbfal...@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!
>My code almost always was restricted to the 8080 set, because
>that's what my own embedded hardware used and I couldn't justify
>taking the time and effort to design a z80 cpu board.
I studied the Z80 set for something I could take advantage of in the
compiler, and came up almost completely empty-handed... the few diddly
things I did find that might have aided general purpose code
generation were so meager that they would not have been worth the
confusion their being made an option would have caused (just think
about everyone having to keep track of whether something was compiled
for the correct processor...)
So there are exactly three ways in which the BDS C package took
advantage of Z80-specific features:
1. The movmem function used the Z80 block move (using the parity-bit
set on incrementing A trick, or whatever it was, to auto-detect which
processor was present; one plausible reconstruction appears below)
2. The compiler itself used the same trick (it moved big chunks of
stuff around in memory during a compile. Those with IMSAIs and their
flashing LEDs got a real show for their springing for a front
panel...)
3. Everything ran at 4MHz instead of 2 (the only *real* benefit)
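For the record, one plausible reconstruction of that detection trick (the
details may differ from what BDS C actually shipped): incrementing A from
7Fh overflows, which sets the parity/overflow flag on a Z80 but leaves odd
parity, flag clear, on an 8080:

        mvi a,07fh
        inr a           ; A = 80h: overflow on Z80, odd parity on 8080
        jpe isz80       ; flag set: Z80, use LDIR in movmem
        ; fall through: 8080, byte-at-a-time copy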
>As a cpm user and C user as well.... Most CP/M C compilers didn't do
>much optimization and those that did (BDS C did) tended to do some
>ugly looking things.
Guilty.
> Not to say they didn't work either. What was
>ugly about the BDS C compiler (early versions) was they had a
>separate parameter stack from the subroutine stack. That made for
>ugly looking code that compiled fast and ran fairly fast.
Elsewhere in this thread I just described how I kept a
base-of-local-stack-frame pointer in BC for use in addressing locals,
rather than just addressing relative to the SP (as a moving target).
But BC's value was copied from the SP upon function entry. _That_ was
stupid, ugly and wasteful of resources (it resulted in my having no
registers left for use as "register" variables), but there were hardly
two separate stacks... that would have been _seriously_ ugly ;-)
-leor
P.S. But I can see how you might have thought that. It would possibly
have seemed more believable than the reality: that I had been too
unimaginative to figure out how to free up BC. What's that they say
about the Amazing Talking Dog? "The amazing part is not 'How Well the
Dog Talks', it's that the sucker can talk at all'. Most of the time I
was working on the design of BDS C, I wasn't all that confident it
would actually even work...
Wracking my poor brain on that one, I cannot recall a single place I
ever needed to have code save the SP anywhere, efficiently or
otherwise.
>and a few more load and store via register pairs that were
>more symmetrical. The LDIR was a good addition though.
I remember hearing about all the cool new indexed addressing
instructions the Z80 supposedly had, and my disappointment at
discovering they seemed to be limited to loading and storing 8 bits at
a time! I know it was an "8 bit processor", but geez... that was
pretty useless from a C compiler implementor's point of view.
-leor
>
>Allison
Definitely needed, especially for debugging or creating stack
markers. You can see some samples in DDTZ and DOSPLUS25.
The other area that gave 8080s trouble was generalized use of the
i/o ports. The z80 could address a port through the c register (with
b on the upper address lines). I simply created the following:
   in port    ; 2 bytes (opcode plus port number)
   ret        ; 1 byte
   nop        ; 1 byte of padding
on the fly and pushed it as two 16 bit words on the stack. Then follow
it by:
   lxi h, 0
   dad sp     ; HL = SP, the address of the stacked routine
   pchl       ; call the stacked routine
   ... which returns here. Pop twice to clean up.
The only difference between input and output was the single
instruction, so the generator/executor was mostly common code.
Replace follow ... above by ret and it all becomes a subroutine.
Perfectly safe against interrupts too. Wrap it all in push h pop
h and the only register disturbed is a (and flags). That gave me
putport and getport routines in higher level languages.
> With version 3.0 this is the C option for Com file.
> The scanned manual is at:
> http://oldcomputers.dyndns.org/public/pub/rechner/epson/~fjkraan/comp/tp30/doc/
> Fred Jan Kraan
Thank you for doing this. I have recently downloaded version 3
of Turbo Pascal from the Borland museum site and this makes it
a lot easier than trying to revive 20-year old memories! I have
not done anything serious with TP 3 yet but I have enjoyed
playing with it. Programming seems a lot more fun, somehow.
Dave Daniels
And then the keyboard had *another* 8051.
A DECmate II had a slew of them. There was an 8051 in the floppy
controller, an 8051 in the keyboard, and another 8051 in the hard
disk controller. Sometimes it made me wonder why they didn't
just program a fourth 8051 with a PDP-8 emulation and toss the
6120 altogether...
--
Roger Ivie
ri...@ridgenet.net
(Rated a 10 on the Fox Scale of Forth-Hatred)
Handy for multitasking. I did the firmware for a quad VAXBI<->IEEE-488
device using a small multitasking kernel. Development was under CP/M
using M80 and ZSID. For quite a lot of the development time, the only
processor in my VAXBI backplane was the Z80. That's right, I had a
VAXBI CP/M machine. Booted lots quicker than my VAXstation 8000.
Oh yeah. Forgot to mention one of the really cool bits. The Z80 code knew
how to look up user-space addresses in the VAX page table. I spent too
much time with the DR-780 when I was a VAX newbie; it colored my opinion
about how things *should* work.
And why not? Micro$oft deliberately gives out cheap student versions
and looks the other way with a nudge and wink when loads of stuff is
bootlegged, so students become totally enmeshed in that sort of
environment. Later their employers face the choice of either buying all
Micro$oft, whether they want to or not and often against better
judgement, or re-educating new employees at full pay and their expense.
So why should not the same approach be used to entrench good practice?
--
Tschö wa
Axel
> > > Turbo Pascal is supposed to be pretty good,
> > > but I don't know how to make it produce
> > > standalone code.
> > With version 3.0 this is the C option for Com file.
> > The scanned manual is at:
> Thank you for doing this. I have recently downloaded version 3
> of Turbo Pascal from the Borland museum site and this makes it
> a lot easier than trying to revive 20-year old memories! I have
> not done anything serious with TP 3 yet but I have enjoyed
> playing with it. Programming seems a lot more fun, somehow.
'Cause you could also buy the manual fairly easily from one of the online
bookstores; price varies depending on condition! ;-)
'Cause there's a whole heap of other legal online material, like tutorials
specific to this compiler & the Norton Guide turned into web pages (even
though it's a bit vague, it does give examples of all the TP commands
in the form of programs), but no-one's interested! ;-)
Ross.
> >they go into shock when they come face-to-face with less sanitized
> >variants (x86 assembly, C/C++, etc.) to the point where they often
> >can't (or won't) make the transition.
> And why not? Micro$oft deliberately gives out cheap student versions
> and looks the other way with a nudge and wink when loads of stuff is
> bootlegged so students become totally enmeshed in that sort of
> environment.
Not that I'm a bootlegger, but I can understand why people did it; for
example Visual BASIC 5 came as a freebie in one of the books we got.
Can't remember how much the book was, but it came with a CD-ROM with this
interpreted version of VB5, 'cause it had all the functionality the
full-fledged thing did (except you couldn't compile programs! ;-)
Of course I've seen better languages which were available (at the time)
with a student discount version, but like all things they were
languages beyond the scope of that course! ;-)
Cheers,
Ross.
Hi Allison,
> >I felt you were saying that the 8086/88 & Z80 were on par with one
> >another & it wasn't until the 80386 came out that it was capable
> >of doing more. Naturally the 8086/88 was 16bit & the Z80 8bit;
> >when I think about it the 8086/88 had more registers, however I was
> >unsure if you were referring to the types of programs being on par
> >with one another.
> They were, more or less, especially at the same clock speed, as
> most common instructions needed about the same number of cycles
> to execute. The 16bit aspect of the 8088 was not that significant as
> the z80 had a fairly good 16 bit capability. What both didn't have
> was the ability to move around in the 1mb space without crutches.
> In reality the 8088 was limited to not enough of everything and it
> was the jump to 32 bits that pushed it over the hump. Graphics want
> SPACE, and a cpu that can only address a contiguous linear space of 64k
> (z80 and 8088/6) is limited. One might argue the 286 was the tip of
> the hump but it was only a bridge to the 386.
I have seen some areas in assembly where some 286 instructions were
added which took more code on 8086/88. Some would also argue it took
programmers some time to write programs which took full advantage of the
386! :-)
> >I'd reckon that people worry least about the size of their programs
> >or their speed these days; back when 386s/486s ruled the roost, that's
> >all people ever talked about, but since that's a problem long gone with
> >newer, faster computers with more disk space, it's hardly a problem.
> Not really. It's obvious that Itanium (64bit Intel) and Alpha were
> trying to solve the "space" problem. Disk space, well that's for
> movies and music, if you see what they are loaded with. <g>
Not familiar with the Itanium & Alpha; I hazard a guess that they are
relatively new processors?
Cheers,
Ross.
>And why not? Micro$oft deliberately gives out cheap student versions
>and looks the other way with a nudge and wink when loads of stuff is
>bootlegged, so students become totally enmeshed in that sort of
>environment. Later their employers face the choice of either buying all
>Micro$oft, whether they want to or not and often against better
>judgement, or re-educating new employees at full pay and their expense.
>
>So why should not the same approach be used to entrench good practice?
Wow, how quickly a very interesting discussion, which I enjoy reading,
can be turned into Microsoft bashing ....
>In article <6seb20tvn2p0dbdhj...@4ax.com>, Leor Zolman wrote:
>> On Sun, 08 Feb 2004 04:14:55 GMT, nos...@nouce.bellatlantic.net wrote:
>>>
>>>On the whole the only instructions that the z80 had that really helped
>>>were the ability to save the SP (on the 8080 you had to clear HL and add
>>>sp to it)
>>
>> Wracking my poor brain on that one, I cannot recall a single place I
>> ever needed to have code save the SP anywhere, efficiently or
>> otherwise.
>
>Handy for multitasking.
Didn't mean to imply that all these Z80 ops weren't useful, just that
I didn't have a use for them in BDS C. In fact, about the closest I
came to multitasking with CP/M was when I fired up the assembler to
build the compiler and then went down to the kitchen for lunch.
-leor
An indication of the extent of the Micro$oft octopus.
Sure. You could take a subjective philosophical stand and teach students only
the easy technologies like Java. I'm sure you'd feel great thumbing your nose at
indifferent mega-corporations while your students stood in unemployment lines
because better trained students got all the jobs.
Since I interview about 75 software engineering candidates a year, I can give
you a pretty good realistic perspective. People who put the effort into
mastering the difficult and less pleasant aspects of the field stand a good
chance of being hired. People who come in with the wrong experience or lofty
ideals and brain-dead assertions like "I don't need to know because the compiler
takes care of it" or "speed doesn't matter because computers are getting faster"
just get a doughnut and a handshake.
Claudio Puviani
Perhaps your development on the compiler would have been faster
paced... if you had written your *own* assembler first.
"It's not that they do it well, it's that they do it at all."
The Itanium is well known to processor historians as the Intel iAPX432
successor for the 20th, no, make that 21st, well, maybe the 22nd century...
Jack Peacock
>Leor Zolman wrote:
>>
>> [snip...] [snip...] [snip...]
>>
>> Didn't mean to imply that all these Z80 ops weren't useful, just that
>> I didn't have a use for them in BDS C. In fact, about the closest I
>> came to multitasking with CP/M was when I fired up the assembler to
>> build the compiler and then went down to the kitchen for lunch.
>>
>It would *not* have taken so *long* to assemble your compiler...
>if the assembler kept the symbol table some other way than
>using a sequential search. Is a hash table *that* hard to
>program??? :-(
>
>Perhaps your development on the compiler would have been faster
>paced...if you had written you *own* assembler first.
It got faster after I got my first hard disk (an 8" Morrow M10, 10
MB, $3600, shook the room on spin-up). I think before that I may have
been just a little bit I/O bound...
-leor
Gee, might that be why it's broken? <grin>
Or if he had eaten quicker lunches? :-)
>Leor Zolman wrote:
>>
>> [snip...] [snip...] [snip...]
>>
>> P.S. But I can see how you might have thought that. It would possibly
>> have seemed more believable that the reality that I had been too
>> unimaginative to figure out how to free up BC. What's that they say
>> about the Amazing Talking Dog? "The amazing part is not 'How Well the
>> Dog Talks', it's that the sucker can talk at all'. Most of the time I
>> was working on the design of BDS C, I wasn't all that confident it
>> would actually even work...
>>
>ISTM that Samuel Johnson had a quote about women and public speaking:
>
>"It's not that they do it well, it's that they do it at all."
That got me curious...this seems to be one of the more credible
citations I found via Google, citing Samuel Johnson's gibe against
women preaching in the 1770s:
"Sir, a woman's preaching is like a dog's walking on his
hinder legs. It is not done well; but you are surprised to
find it done at all."
So, there _was_ a dog involved...but no, it wasn't doing the speaking.
-leor
The Alpha is not dead. Intel got it (yes, they did). Parts of its internals
will doubtless show up in future Intel products.
Z80 instructions that I would sorely miss if having to code in 8080:
LDIR/LDDR
- sooo useful for block memory copies/moves
LD (addr),reg16/LD reg16,(addr)
- so much easier/quicker/more logical/side-effect free than
PUSH BC:POP HL:LD (addr),HL and you can store/fetch the SP
LD reg8,(XY+dis)/LD (XY+dis),reg8
- almost essential for control block manipulation. So much easier
to be able to do LD (IX+2),E:LD (IX+3),D:LD (IX+4),0:LD (IX+5),0
Everything else would be an irritant for not being there, but I
could get by without.
--
JGH - www.mdfs.net
I can *so* agree with that. When we finally got on to C in my third year at
University I was so angry that I had wasted two years pissing about with
'toy' languages when I realised that I could have gone straight on to C
two years earlier, especially as I'd left school with rudimentary C
programming skills and just needed to be taught cleaner and more
disciplined programming techniques.
Year 1, Semester 1 - Karel The Robot
Year 1, Semester 2 - Pascal
Year 2, Semester 1 - Modula-2
Year 2, Semester 2 - More Modula-2
Year 3, Semester 1 - C
I can see the logical progression for teaching empty vessels with no
knowledge or concept of programming, but they should have dropped me
straight into Year 3.
(If you're interested, you can see some of what I did here:
y2-s1: http://www.mdfs.net/User/JGH/Progs/FromUni/
y2-s2: http://www.mdfs.net/Software/PDP11/Assembler/ )
--
JGH - www.mdfs.net
That reminds me. There was an article (in Byte magazine?) many years ago
that compared the instruction set usage for various CPUs. The author had
analyzed programs for the 8080, Z80, 6502, 8088, 286 (and perhaps a few
other CPUs; I don't remember).
The interesting point was that there were many instructions that were
rarely used, and quite a few that were NEVER used. And, the more complex
the CPU, the more instructions that fell into this never-used category.
The 6502 and 8080 had the broadest range of instruction usage, while the
286 was the worst. The author concluded that programmers tended to learn
or use only a subset of the CPU's instruction set, or used compilers
that never used certain instructions.
I know from personal experience that a lot of CP/M code uses only 8080
opcodes, even when the software was sold explicitly for a computer with
a Z80. But it is conceivable that the software's author might someday want to sell
it for a computer with an 8085 (the Zenith Z-100, for example).
I'm curious. What would happen today if you scanned modern programs and
prepared a histogram of instruction set usage? Are the added
instructions of a Pentium actually being used? Or is code still being
written as if it were running on an earlier CPU?
One example was an interrupt handler. Some program is running, with its
stack somewhere with an unknown amount of stack space available. An
interrupt occurs. The interrupt handler can't depend on having enough
stack space to push all the registers it needs to save. So, the fast and
safe thing to do is to have the interrupt handler (sketched below):
1. save the stack pointer
2. create its own stack
3. save registers
4. perform its operations
5. restore registers
6. restore stack pointer
7. return
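In Z80 terms that sequence might look like this (all labels made up):

isr:       ld (save_sp),sp  ; 1. save the interrupted program's SP
           ld sp,isr_stack  ; 2. switch to the handler's private stack
           push af          ; 3. save registers
           push bc
           push de
           push hl
           ; 4. ... service the device ...
           pop hl           ; 5. restore registers
           pop de
           pop bc
           pop af
           ld sp,(save_sp)  ; 6. restore the stack pointer
           ei
           reti             ; 7. return
save_sp:   defw 0
           defs 32          ; private stack area
isr_stack: equ $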
In general, the instruction set benefits of the Z80 were minor compared
to its hardware advantages. It ran off a single 5v supply, had a simple
TTL-compatible single-phase clock, built-in refresh for dynamic memory,
much faster and more powerful interrupt handling, better I/O
capabilities, non-multiplexed address/data/control pins, and much more.
The instruction set additions were largely made to support these extra
hardware features.
Agreed.
> LD (addr),reg16/LD reg16,(addr)
> - so much easier/quicker/more logical/side-effect free than
> PUSH BC:POP HL:LD (addr),HL and you can store/fetch the SP
only useful for BC. For DE use:
xchg; lhld <addr>; xchg
> LD reg8,(XY+dis)/LD (XY+dis),reg8
> - almost essential for control block manipulation. So much easier
> to do LD (IX+2),E:LD (IX+3),D:LD (IX+4),0:LD (IX+5),0
Some indexing subroutines fill this need, and allow for indexing
by multiples of 1, 2, or 4 with no fuss:
index4: add a       ; double A (falls through to double again: A*4)
index2: add a       ; A*2
index:  add l       ; HL = HL + A
        mov l,a
        rnc         ; return if no carry into H
        inr h
        ret
mvi a,value; call indexN; mov a,m ...
Prior to the 90's, CPUs tended to be designed for the human programmer and for
usage patterns that weren't yet clear, so the instruction sets tended to reflect
what the engineers thought might be useful for those humans.
Today, almost all code is machine generated and the usage patterns are well
known. When instructions are added to a CPU, they're added to improve a
particular usage pattern and there's every expectation that they'll be used once
compilers catch up. In some cases, as with SIMD, the need is so great that usage
precedes the compiler updates.
Given that, I'd expect that a profile of the instruction usage of recent
applications would show that older, human-friendly instructions are falling into
disuse while newer, compiler-friendly or performance-driven instructions are
used most.
It would be nice if someone less lazy than I would verify that. ;-)
Claudio Puviani
Some of the 386 instructions are being "deprecated" by Intel because their
performance is no better than an equivalent sequence of ordinary instructions.
AMD64 has actually been *eliminating* these instructions. They can do that
because, by definition, the 64 bit mode needs a recompile, and that compiler
can drop any remaining usage.
I will try to get a picture of the board and post it on
the web...
What people failed to realize was that relative jumps were generally slower
than the equivalent absolute jumps, and that their range was
strictly limited. What Zilog failed to provide was a PC relative
range unlimited jump, although you could build one from a restart.
Same thing with PC relative subroutine calls. I used the "synthesize some
useful instructions with RST" trick, too (relative jumps and calls, as well
as integer multiply and some other things - well before the Z180 came out),
but real PC relative subroutine calls would have been very useful to have.
Which is why I so liked the 6809, but I am *not* going to re-ignite the Z80
vs. 6809 flamewars here! ;-)
I always used absolute jumps in disk read/write loops, even if I had enough
clock cycles to allow relative jumps. I always felt better when I had room
to spare. Not to mention the ability to run on more systems (such as 8085s,
etc.).
> > I have seen some areas in assembly where some 286 instructions
> > were added which took more code on 8086/88. Some would also
> > argue it took programmers some time to write programs which
> > took full advantage of the 386! :-)
> That reminds me. There was an article (in Byte magazine?) many
> years ago that compared the instruction set usage for various
> CPU. The author had analyzed programs for the 8080, Z80, 6502,
> 8088, 286 (and perhaps a few other CPUs; I don't remember).
That would seem like an interesting read.
I know my Amstrad (Z80) had some instructions which most
monitor programs wouldn't know about. Most of them, I believe, involved
extra registers (XH was one & I think XL was another) that you could use
for storing values (though I think it was just the accumulator you could
use them in conjunction with). Other Z80 machines presumably have these
too (after all, that's where these instructions come from).
> The interesting point was that there were many instructions
> that were rarely used, and quite a few that were NEVER used.
> And, the more complex the CPU, the more instructions that
> fell into this never-used category. The 6502 and 8080 had
> the broadest range of instruction usage, while the 286 was
> the worst. The author concluded that programmers tended to
> learn or use only a subset of the CPU's instruction set, or
> used compilers that never used certain instructions.
Which is why having the opcodes was essential to using those instructions.
> I know from personal experience that a lot of CP/M code uses only
> 8080 opcodes, even when the software was sold explicitly for a
> computer with a Z80. But it is conceivable that the author might
> someday want to sell it for a computer with an 8085 (the Zenith
> Z-100, for example).
It's natural that they looked at all three processors & wrote with
portability in mind. Perhaps it was a shame Borland didn't
take the same approach, but I guess there were certain issues which made
it difficult to port to the 8080.
> I'm curious. What would happen today if you scanned modern
> programs and prepared a histogram of instruction set usage?
> Are the added instructions of a Pentium actually being used?
> Or is code still being written as if it were running on an
> earlier CPU?
I wouldn't be so sure; occasionally I've downloaded a monitor-based
program from Simtel which works well with 8086/8088 programs, but
struggles because it comes up against instructions it doesn't know.
Cheers,
Ross.
DEC was testing VMS and TRU64 UNIX on IA64 pre-release chips and emulators as
early as 1999, and I suspect that effort started in 1998. When I was at [bank
name omitted] in 2001, we were given an Itanium box to evaluate performance.
This isn't an exhaustive history by any stretch of the imagination, but at least
it gives specific reference points in time. Since I'm ideologically opposed to
EPIC, I didn't follow the evolution of the IA64 very closely.
Claudio Puviani
I wrote one relative program, completely relocatable, in Z80, not
using restart, for the fun of it. I never tried it again. After that,
I decided it was a lot easier to dynamically relocate programs
based on a table of program addresses (like .dll or CP/M GENSYS).
Basically, the program used "djnz" to get past the 128-byte relative jump limit.
jr over        ; skip the relay point, as required
relay1:
djnz relay2    ; B not yet zero: pass it on to the next relay
; code to be executed when the count runs out here
The idea was you loaded up the B register with the number of "relay
jumps" you wanted to perform, then jumped to the nearest relay
point. The CPU would then pass through that many relay
jumps ("djnz" counts them down in B), and drop out at the proper location.
It was a nightmare. Never again.
Why are you "ideologically opposed" to EPIC?
Just curious.
I know I'm getting old when I find course textbooks to be enjoyable reading!
I was just forced to buy
Computer Architecture: A Quantitative Approach
by John L. Hennessy, David A. Patterson, David Goldberg
and despite being the required textbook for a class,
it's a rather good book, explaining the evolution of processors,
RAM, how they work together and such.
The book elaborates that design is mostly evolutionary, not revolutionary.
It's not just people's resistance to change, but other pressures such as
- price/performance
- where's the performance bottleneck now
For a long time, RAM was so expensive that it was the most precious resource.
Code compaction was essential.
I remember early machines gave errors for "unaligned data"
(accessing a word that's not on a word boundary).
Then later machines had byte-oriented access
(albeit with a time penalty for the address-addition
instead of just or-ing the low bits).
Now that RAM is cheap and the memory access is the bottleneck,
it's back to word aligned access (particularly for RISC CPUs).
I'm lending the book to a friend so I can't cite it specifically,
but I kinda agree with the original assertion.
I have "warm fuzzy feelings" from using IBM system 360/370 assembler
since the instructions did a lot.
Assembler programs assembled and ran in seconds
as compared to minutes for compiled languages.
I worked at Concurrent Computer Corporation, maker of real time systems
when they were converting from their microcoded CISC CPU
[which was definitely "inspired" by the IBM 360]
to the MIPS R3000 RISC.
Internal classes were held which clearly showed why it was a good move
and after it all, I was truly convinced and converted. To oversummarize:
- RISC CPUs were already faster in speed than CISC
and the gap was widening
- time-to-market improves with RISC, since there's less custom
microcode development and the parts are easier to interface
- hand coding RISC is nearly impossible since there are too many details
to remember such as delayed branch, settling condition codes, etc.
Even hand-written assembler depended on the software compiler/assembler
to optimize properly.
As to guessing wrong about the needs of the CPU and architecture,
everybody was guilty. AT&T tried the "CRISP" C-RISC CPU
with instructions specifically to help C calling structures
and constructs typical of the C programming language.
Sun's SPARC RISC CPU is still used only by them, despite the clever
overlapping register "windows" to assist fast function calls.
Then there are performance bottlenecks.
Memory access too slow? Add cache.
Execution too slow? Use pipelining, add execution units.
>Today, almost all code is machine generated and the usage patterns are well
>known. When instructions are added to a CPU, they're added to improve a
>particular usage pattern and there's every expectation that they'll be used once
>compilers catch up. In some cases, as with SIMD, the need is so great that usage
>precedes the compiler updates.
Ah, someone's doing their homework!
SIMD (Single Instruction, Multiple Data) is an old old concept
from when it was called a vector processor (yes, way way before MMX).
But there are ways to expand upon that:
- very wide instructions: instead of dynamic ILP
(Instruction Level Parallelism: finding instructions on-the-fly
that can be run in parallel without tripping over each other),
just have a W I D E instruction handling all the execution units,
so it's pre-determined that there's no "data hazard"
(one instruction reading data before another has finished producing it;
to read more, look up RAW/WAR/WAW hazards:
read-after-write, write-after-read, write-after-write;
see the two-line illustration after this list).
- DSP (Digital Signal Processors) are allegedly notorious for
defying compilers since they have so much to keep busy at once!
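Here's a two-line illustration of a RAW dependence (Z80 code as a
stand-in; the idea is machine-independent):
add hl,de   ; writes HL
ld a,h      ; reads HL: read-after-write, so the two can't overlap
A pipelined or parallel machine has to stall, or forward the result of
the first instruction, before the second can proceed.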
>Given that, I'd expect that a profile of the instruction usage of recent
>applications would show that older, human-friendly instructions are growing into
>disuse while newer, compiler-friendly or performance-driven instructions are
>used most.
Another data point to confuse the issue: embedded systems vs. non-embedded.
Embedded systems emphasize cost, power-consumption (battery lifetime)
and performance over development cycles.
That's probably why portable equipment (PDAs, Cell phones, etc)
have different CPUs from PCs.
Restarts were too precious to waste on CPU extensions. 0 was
gobbled for initialization, and 7 was reserved for debugging. The
rest were used (in my systems) for prioritized interrupts.
There was really no need for location independent code in most
cases. The page relocatable (PRL) technique could relocate
anything in place, at the cost of 1/8th + 256 bytes of load module
expansion (one relocation bit per program byte, plus a one-page
header). This was used for such things as DDTZ or various RSXs
(in CP/M). I published code to automate the generation of such
packages, and I think it is still there in the DDTZ release.
PRL: Page Relocatable
RSX: Resident System eXtension.
I've gone on to learn 8080, Z80, HP2100, 80x86 assembly, and have written
a lot of FORTRAN and C since then, along with some Pascal.
C++ represents a learning cliff that's still largely before me.
Dave
"J.G.Harston" <j...@arcade.demon.co.uk> wrote in message
news:73210e4c.04020...@posting.google.com...
Yes, I've looked at that.
The *old* z88dk had a compiler based on lcc. It generated nice code ---
parameter passing in registers, reasonable optimisation as far as I could
tell --- but was so buggy it fell over all the time. Hence the 'as far as I
could tell'; I never actually made it compile any non-trivial programs.
Plus the license is hellish.
The *new* z88dk has a compiler based on sdcc. It's been a little while
since I've looked at this, but the code generated by sdcc is pretty lousy:
very little optimisation, plus it often prefers doing things
in a roundabout way that uses up bytes and registers. Hitech C is
better.
I might give sdcc another look, though; it might have improved.
--
+- David Given --McQ-+ "Why should we put ourselves out of our way to
| d...@cowlark.com | serve posterity? For what has posterity ever done
| (d...@tao-group.com) | for us?" --- Sir Boyle Roche
+- www.cowlark.com --+
With the exception of RST-0 and RST-7, I tended to use them only for
extensions (only RST-2 (RST+byte, extended register-oriented instructions)
and RST-4 (RST+3 bytes, with a word pointer to the parameter(s))). My hardware
(those that ran with interrupts) all used the Zilog mode 2 with an 8-bit
address insert, or mode 0 with a CALL address16; usually the latter, as it
was easy enough to do and easier to place the address where I wanted it.
This was for cases where I was working at system level and mostly for
closed apps as public versions had to be more careful of usage of
resources to be portable.
Allison
using google
The whole idea behind EPIC is that a smart compiler will pre-schedule the
instructions rather than having the CPU itself try to do it. Looks good at first
glance, but then reality quickly intrudes. Across a family of EPIC chips
(versions and models) or across variants from different vendors, optimal
scheduling will change significantly since such things as instruction timings,
number of cores, pipeline sizes, etc. will vary. This means that when you build
a program, the compiler will optimize it for a specific model of processor and
other processors, while still able to run the code, will execute it
sub-optimally.
Scenario 1: you own an EPIC box. You get/write an application that is optimized
for it. You upgrade the box. Now, your application runs much slower than it
should because its instruction scheduling is based on different specs. You can
either rebuild it (if you have the source AND a compiler that knows about your
new EPIC chip) or live with the mediocre speed.
Scenario 2: you run an IT shop and after a while, you end up with a large number
of different EPIC systems deployed. You need to deploy an application to them.
Do you (a) rebuild that application for every single model in your organization
or (b) optimize for one version and let everyone else run suboptimally?
Remember that mis-scheduling things would have about the same damping effect as
inserting wait states, and we've all seen the difference that can make.
The money that went into EPIC would have been better spent improving dynamic
schedulers, as far as I'm concerned.
Claudio Puviani
Then you weren't using an 8080 or 8085 :-) I was.
This isn't new to EPIC; the same general debate has been a minor schism in
the RISC world for decades, first applied to pipeline hazards. One faction
argued that the compiler can detect pipeline hazards and thus code around
them; the other faction wanted to add forwarding logic to the CPU. History
has pretty much decided that transistors are cheap and no one sweats
pipeline hazards much any more.
Personally, I sit in both camps for parallel scheduling. I think the
compiler can see a lot more of the code and ought to be able to make some
good global scheduling decisions. I'd also prefer the dynamic instruction
scheduler to exist and make good *local* decisions to keep the pipelines
filled. As the number of execution pipelines grows, I can understand the
hardware guys' reluctance to have full n^2 dynamic schedulers.
Kelly
>"Claudio Puviani" <puv...@hotmail.com> wrote in message
>news:CO6Wb.4481$rv1.2...@news4.srv.hcvlny.cv.net...
>> The whole idea behind EPIC is that a smart compiler will pre-schedule the
>> instructions rather than having the CPU itself try to do it.
>
>This isn't new to EPIC; the same general debate has been a minor schism in
>the RISC world for decades, first applied to pipeline hazards. One faction
>argued that the compiler can detect pipeline hazards and thus code around
>them; the other faction wanted to add forwarding logic to the CPU. History
>has pretty much decided that transistors are cheap and no one sweats
>pipeline hazards much any more.
Exactly. And in the case of things like load delays, it was clear to
everyone except academics that memory hierarchies would continue
to evolve, and binaries that enshrined a particular implementation's
latencies were a bad idea.
The same problem beset VLIW machines, which ran code tailored
for them beautifully, but if a larger one were built, with more execution
units, the old code got no benefit.
EPIC was designed to fix this problem by replacing the fixed-size
"parallel dispatch" container with a variable number of "bundles"
with embedded "punctuation" that indicates how many bundles can
be dispatched in parallel without hazards. The bundling templates
also allow for more efficient use of the slots in various kinds of code,
improving code compaction over VLIW.
>Personally, I sit in both camps for parallel scheduling. I think the
>compiler can see a lot more of the code and ought to be able to make some
>good global scheduling decisions. I'd also prefer the dynamic instruction
>scheduler to exist and make good *local* decisions to keep the pipelines
>filled. As the number of execution pipelines grows, I can understand the
>hardware guys' reluctance to have full n^2 dynamic schedulers.
The N^2 cost of scheduling is decisive, from an evolutionary standpoint.
As silicon continues to support more computational units, N must grow,
and we would soon reach the point where more silicon is spent doing
dynamic scheduling than doing useful computation.
The EPIC approach allows the best of both worlds--the compiler uses
its huge instruction "window" to build the best static schedule it can
(which is phenomenally good for loops), and the silicon can later do
dynamic scheduling of the hazard-free dispatch groups of bundles
(as the number of simultaneously dispatched bundles grows).
Initial implementations of Itanium only dispatched two bundles
(six operations) simultaneously, but four-bundle and more versions
will surely come with increased integration density.
Compilers properly compile to a target of "N" simultaneous bundles,
marked with punctuation, and an implementation chews off as many
simultaneous bundles as its resources permit.
Now, the typical concurrency a compiler can target often exceeds
the two bundles initially implemented, so there is little need for
dynamic scheduling. Later, when the chip resources permit more
simultaneous bundles than a compiler typically finds, dynamic
scheduling can help find additional opportunities. This is particularly
true in non-looping, function-call-rich code.
-michael
Check out amazing quality sound for 8-bit Apples on my
Home page: http://members.aol.com/MJMahon/
I try to ensure that all the back-end Z80 code I write that
calls other code (such as CP/M's .COM load-and-execute code) defines
a register as holding the entry address, to make the above task easier.
ie, something like:
LD HL,(addr)
JP (HL) ; HL defined as holding entry address
or
PUSH BC
RET ; BC defined as holding entry address
This was particularly useful when I was coding on the ZX Spectrum as
USR x entered code with BC=x, and I extended this to be the defined
entry state for CALL x and RUN "machinecode". Then the start of the
code could, if it needed, do things like:
ORG 0
; On entry: BC=entry address
LD HL,table
ADD HL,BC ; HL=address of table
etc...
.table
One of these days I'll finish extracting all that code from my
microdrive cartridges and upload it...
--
JGH - www.mdfs.net
> > z88dk is a z80 C cross compiler supplied with an assembler/linker and a set
> > of libraries implementing the C standard library for a number of different
> > z80 based machines. The name z88dk originates from the time when the project
> > was founded and targetted only the Cambridge z88 portable.
> The *old* z88dk had a compiler based on lcc.
No.
> The *new* z88dk has a compiler based on sdcc.
No.
http://www.z88dk.org/old/history.html:
"Pre-release v0.1 August 1998
First release of the Z80 compatible Small-C compiler based on the zcc v0.96
sources by Ken Yap."
http://www.z88dk.org/old/zcc.html:
"The Compiler is based upon the Small C/Plus compiler which has long history
but can be basically be attributed to in chronological order: Cain, Van
Zandt, Hendrix and Ronald Yorston. This last person in particular developed
the compiler very considerably beyond the original specification for Small C
set down by Ron Cain. James R. Van Zandt modified it to include floating
point using floating routines originally written by Neil Colvin."
BR
Dennis Groning
Using the 16-bit index registers as 8-bit high and low registers. Allowable
in almost all non-CB and non-ED operations that would otherwise use L or H:
LD A,IXL
ADD A,IYL
LD IXH,B
CP IXL
LD IYH,90
INC IYL
See http://www.mdfs.net/Docs/Comp/Z80/OpList for a full list.
You're right in that most monitors/disassemblers would list these as:
FD ??
4C LD C,H
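They're not really new opcodes at all, just a DD or FD prefix in front
of the ordinary H/L encoding, which is why an old disassembler prints
the prefix as unknown and then decodes the rest as the plain H/L form.
For example:
7D      LD A,L
DD 7D   LD A,IXL
FD 7D   LD A,IYL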
--
JGH - www.mdfs.net
I once wrote a PRL relocator intended for use on the Spectrum. You assemble
it to a Hex file, and use HEXPAT (or similar) to overlay it on the first
256 bytes of the PRL file. Then on the Spectrum, load the result at address
xx00h and run it at xx06h.
Here it is:
;
ORG 0106h
;
;Enter with B = high byte of load address. Program at (B+1)*256.
;PRL at B*256. Entry point at B*256+6.
;
PUSH IX
LD C,0 ;BC->PRL header
PUSH BC
POP IX ;IX=base page address
LD L,C
LD H,B ;HL=base page address
LD C,(IX+1)
LD B,(IX+2) ;BC=length of program, bytes.
INC H ;HL=start of program image.
LD D,H
LD E,L ;DE=start of program image.
ADD HL,BC ;HL=address of relocation bitmap.
PUSH DE ;DE=address of program start.
LD E,D
DEC E ;E =high byte of program address - 1.
PUSH HL ;HL=address of relocation map; stack it.
LD H,E ;H =offset to add to 0100h.
LD E,00H ;DE=address of program start.
;
; Each iteration of this loop relocates one byte.
; BC = count of bytes remaining.
; DE = address of byte to relocate.
; H = offset to add to relocated bytes.
; L = relocation bitmap for current byte
; (SP) = address of the bitmap entry where we got L.
;
LOOP1: LD A,B
OR C ;If BC (bytes remaining)=0 then leave.
JR Z,ENDLOOP
DEC BC ;BC=BC-1
LD A,E ;
AND 07H ;If E is not divisible by 8, skip.
JR NZ,NOINC
EX (SP),HL ;HL=current relocation map address.
LD A,(HL) ;A =current relocation map entry.
INC HL ;Move to next position in map.
EX (SP),HL
LD L,A ;L:=bitmap for next 8 bytes.
NOINC: LD A,L ;
RLA ;Is the current byte relocatable?
LD L,A
JR NC,NOREL
LD A,(DE) ;If so, add the necessary offset.
ADD A,H
LD (DE),A
NOREL: INC DE ;Go to next entry in loaded program.
JR LOOP1
;
ENDLOOP:
POP DE ;Pop and discard relocation map pointer.
POP DE ;Pop and discard program start address.
;
; Program fixup is now complete.
; IX still = address of PRL header.
;
PUSH IX
POP HL
INC H ;HL = address of actual program entry.
LD (IX+6),0C3h
LD (IX+7),L
LD (IX+8),H ;Replace relocation with a jump to reloc'ed
;program, so this does not run again.
POP IX ;Restore caller's IX.
JP (HL) ;Run the program.
;
end
--
------------- http://www.seasip.demon.co.uk/index.html --------------------
John Elliott |BLOODNOK: "But why have you got such a long face?"
|SEAGOON: "Heavy dentures, Sir!" - The Goon Show
:-------------------------------------------------------------------------)
You're quite right, I was thinking of the Game Boy development kit:
I've had a look at z88dk; possibly I haven't been driving the compiler
properly, but it doesn't seem to want to use ix and iy. It also uses lots
of external helper functions, leading to code that's riddled with calls.
Not necessarily a bad thing, but this is the sort of thing it comes up
with:
int bar(int a)
{
return a + foo();
}
-->
._bar
ld hl,2 ;const
add hl,sp
call l_gint ;
push hl
call _foo
pop de
add hl,de
ret
[fiddles]
Aha. If I tell it everything's unsigned, it skips the helper call and does this instead:
._bar
ld hl,2 ;const
add hl,sp
ld l,(hl)
ld h,0
push hl
call _foo
pop de
add hl,de
ret
I shall investigate further. I think Hitech C is still producing better
code, but it's definitely worth investigating. I wish it passed parameters
in registers, but I like the way it doesn't use a separate frame pointer.
(Now I just need a CP/M native version...)
--
+- David Given --McQ-+
| d...@cowlark.com | UN-altered REPRODUCTION and DISSEMINATION of this
| (d...@tao-group.com) | IMPORTANT information is ENCOURAGED
+- www.cowlark.com --+
Sadly, few of these assumptions have borne significant fruits in practice in the
past 6 years. Whether that's due to immature hardware, immature compilers, or
faulty premises is yet to be shown. I'd love to be proven wrong on this, but I
want the proof on silicon, not on paper. :-)
Claudio Puviani
Wow! That's really useful. Thanks.
--
JGH - www.mdfs.net
>"Michael J. Mahon" <mjm...@aol.com> wrote
<snip>
>> The EPIC approach allows the best of both worlds--the compiler uses
>> its huge instruction "window" to build the best static schedule it can
>> (which is phenomenally good for loops), and the silicon can later do
>> dynamic scheduling of the hazard-free dispatch groups of bundles
>> (as the number of simultaneously dispatched bundles grows).
>>
>> Initial implementations of Itanium only dispatched two bundles
>> (six operations) simultaneously, but four-bundle and more versions
>> will surely come with increased integration density.
>>
>> Compilers properly compile to a target of "N" simultaneous bundles,
>> marked with punctuation, and an implementation chews off as many
>> simultaneous bundles as its resources permit.
>>
>> Now, the typical concurrency a compiler can target often exceeds
>> the two bundles initially implemented, so there is little need for
>> dynamic scheduling. Later, when the chip resources permit more
>> simultaneous bundles than a compiler typically finds, dynamic
>> scheduling can help find additional opportunities. This is particularly
>> true in non-looping, function-call-rich code.
>
>Sadly, few of these assumptions have borne significant fruits in practice in the
>past 6 years. Whether that's due to immature hardware, immature compilers, or
>faulty premises is yet to be shown. I'd love to be proven wrong on this, but
>I want the proof on silicon, not on paper. :-)
I agree with you--the promise is largely unfulfilled.
Initial silicon implementations were immature and hampered by
unnecessary hardware concessions to x86 compatibility (completely
and efficiently achievable with dynamic translation techniques) and
by a needlessly deep and restrictive cache hierarchy.
Initial compiler efforts have been too much about learning and not
enough about doing the required scheduling and code motion.
Finally, and potentially more worrisome, there is the tendency of object-
oriented software designs to replace data access loads and stores
with short function calls. This is not easily optimized across separately
compiled "objects", but may eventually require dynamic optimization
at run time--a more radical approach which also enables more rapid
evolution of hardware architecture.
> Sadly, few of these assumptions have borne significant fruits in practice in the
> past 6 years. Whether that's due to immature hardware, immature compilers, or
> faulty premises is yet to be shown. I'd love to be proven wrong on this, but I
> want the proof on silicon, not on paper. :-)
Well, usually, the proof is in the pudding :-)
'Andreas
--
Wherever I lay my .emacs, there's my $HOME.
> "Claudio Puviani" <puv...@hotmail.com> writes:
>
> > Sadly, few of these assumptions have borne significant fruits in practice in the
> > past 6 years. Whether that's due to immature hardware, immature compilers, or
> > faulty premises is yet to be shown. I'd love to be proven wrong on this, but I
> > want the proof on silicon, not on paper. :-)
>
> Well, usually, the proof is in the pudding :-)
I believe the common expression to be a mutation of "The proof of the
pudding is in the eating." :P </NITPICK>
-uso.
> I've had a look at z88dk; possibly I haven't been driving the compiler
> properly, but it doesn't seem to want to use ix and iy.
I am not really into how the compiler works. You could ask Dominic Morris
who is the driving force in the project.
> I shall investigate further. I think Hitech C is still producing better
> code, but it's definitely worth investigating. I wish it passed parameters
> in registers, but I like the way it doesn't use a seperate frame pointer.
> (Now I just need a CP/M native version...)
Perhaps you can compile it under CP/M yourself. There are already some CP/M
libs.
Dominic did most of the work on an Amiga running FreeBSD. I helped him out
with testing and porting to MS-DOS using Borland 3.1, and later Win32 using
MS Visual C. So the source is fairly portable.
BR
Dennis
There were also .REL files, as created by Microsoft's M80 assembler.
These were essentially the same size as a .COM file, but were
relocatable by the L80 linker.
Alan Bomberger used .REL files for his Write-Hand-Man CP/M-80 programs.
I have the source for it, and he wrote a little loader that attached to
the CCP so you could load a .REL file as well as a .COM file at the CP/M
prompt. While .COM files always load at 100h, the .REL files were loaded
somewhere in high memory, so they could co-exist with a .COM file. This
gave the ability (for example) to be running Wordstar, hit a 'hot' key
which pops up a calculator, do some calculations, then return to
Wordstar right where you left off.
--
"Never doubt that a small group of committed people can change the
world. Indeed, it's the only thing that ever has!" -- Margaret Meade
--
Lee A. Hart 814 8th Ave N Sartell MN 56377 leeahart_at_earthlink.net
> I've had a look at z88dk; possibly I haven't been driving the compiler
> properly, but it doesn't seem to want to use ix and iy.
Which is a good thing because accessing stack parameters
with ix and iy is just too slow. It uses hl to access
parameters on stack and is smart enough to use push / pop
combinations if that is faster. This is the fastest way
to do things if you're always passing params on the stack.
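To put rough numbers on it, here's one 16-bit argument fetched from SP+2
both ways (a sketch with standard Z80 T-state counts, not actual compiler
output):
; with an IX frame pointer (preserving the caller's IX):
push ix       ; 15 T-states
ld ix,0       ; 14
add ix,sp     ; 15
ld e,(ix+4)   ; 19  (the saved IX shifts the offset by 2)
ld d,(ix+5)   ; 19
pop ix        ; 14  = 96 T-states
; via HL, z88dk style:
ld hl,2       ; 10
add hl,sp     ; 11
ld e,(hl)     ; 7
inc hl        ; 6
ld d,(hl)     ; 7   = 41 T-states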
> Aha. If I tell it everything's unsigned, it doesn't do this:
>
> ._bar
> ld hl,2 ;const
> add hl,sp
> ld l,(hl)
> ld h,0
> push hl
> call _foo
> pop de
> add hl,de
> ret
Code is also optimized for unsigned char and char types.
> I shall investigate further. I think Hitech C is still producing better
> code, but it's definitely worth investigating.
I also think Hitech C generates slightly better code. z88dk
has a respectable set of peephole optimization rules, but you
will see instances where the compiler could benefit from some
dead-variable pruning of the assembly output: there's really no
need for a "ld h,0" whose result is thrown away a few instructions
later, as in "ld l,(hl); ld h,0; ld hl,(_global)". I guess if this really
gets you in a wad, you could fix it up by hand.
What I like about z88dk is that it seems to generate bug-free code
and has a large library for many platforms. Many library routines
are written in assembler, which is what I like to see -- bring the
assembler "close to the surface" for best performance.
As for passing params in registers, well, I've yet to see that
in any free compilers, but I'd like to see it as much as you.
With z88dk you can qualify a function as "__FASTCALL__", which
will pass a single parameter of up to 16 bits in the HL register
pair.
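So a call like foo(42) can shrink to something like this (a sketch of the
idea, not verified z88dk output; foo is a placeholder):
ld hl,42      ; the single parameter travels in HL
call _foo     ; no push before, no pop to clean up after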
Alvin