Sometime back, I vaguely recall a rather heated discussion on which
arrangement was better: big-endian or little-endian. What was the outcome
of this discussion? Did anyone "win"?
As I see it, there are advantages to both approaches. Since I find
the terms big-endian and little-endian terribly nondescriptive of
what they represent (can anyone supply a mnemonic?), I'll use unambiguous
terms.
Least-significant-byte first has the advantage of being able to
specify the low order byte or word of a longword in memory by specifying the
same address for all three. I.e. if you had the value 0xAABBCCDD stored at
location <foo>, then you'd specify <foo> to get at the low order byte
(0xDD), word (0xCCDD), and full longword. No funny pointer arithmetic is
needed.
Most-significant-byte first lets you read a hex dump more easily.
If you're reading a hex dump as individual bytes, you don't have to mentally
swap bytes around to read addresses and such; they will appear in an order
such that you can read them off directly.
Both representations are useful. Are there any other respective
advantages I've left out?
Flames to /dev/null, or my mailbox, please...
_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
Leo L. Schwab -- The Guy in The Cape INET: well!ew...@ucbvax.Berkeley.EDU
\_ -_ Recumbent Bikes: UUCP: pacbell > !{well,unicom}!ewhac
O----^o The Only Way To Fly. hplabs / (pronounced "AE-wack")
"Work FOR? I don't work FOR anybody! I'm just having fun." -- The Doctor
Mark Horton
Big endian has the significant advantage that, when properly aligned,
character strings can be compared using the full width of the machine's
ALU. For 32-bit machines, this means that two four-character (sub)strings
can be compared at one time. This is because the lowest address always
points to the *first* character in the string. Little endian requires
character-at-a-time processing or hardware gymnastics.
Since it forces inefficiency, little-endian is for CISCs. :-) :-)
Oops, I meant to say that the first character is always in a more
significant position than the second and succeeding characters. This
corresponds to the convention that the first character in a string is
the most important in determining its position in an alphabetically
sorted list of strings. Thus, after properly aligning, (sub)strings
can be compared as if they were simple (unsigned) integers.
Does that make the IBM 370 a RISC? ;-)
> Big endian has the significant advantage that, when properly aligned,
> character strings can be compared using the full width of the machine's
> ALU. For 32-bit machines, this means that two four-character (sub)strings
> can be compared at one time. This is because the lowest address always
> points to the *first* character in the string. Little endian requires
> character-at-a-time processing or hardware gymnastics.
I do the same thing on little endian. It all depends on how you store
the characters. Read the "HOLY WAR" article for a detailed explanation.
The problem is, there are no consistent little-endian machines; the
big-endian infiltrators have sabotaged every last one (that I know of).
The major (dis)advantages are:
BIGend (numeric) compares / divides are faster
LITTLEend adds / multiplies are faster
--
Stuart D. Gathman <stu...@bms-at.uucp>
<..!{vrdxhq|daitc}!bms-at!stuart>
Actually, where most little-endian machines screw up is storing the
bits in the byte in the wrong order. It is good to hear that somebody got it
right and stored a one as 100000...0000 rather than 00000001000...000.
(That is what you meant, wasn't it?) Note that this implies one
multiplies by 2 by using a RIGHT shift (else there is an inconsistency
in the little-endian view in the registers)! The Inmos sounds interesting.
David Hutchens
hu...@hubcap.clemson.edu
...!gatech!hubcap!hutch
Sorry, no. Little endian means that if two addressed objects (on the
Transputer, the smallest object that can be addressed is a byte) are
part of the same number, the object (byte) with the lower address is
less significant.
Note two things:
-> There is no ordering implied on bits within bytes; a byte is an atomic
object, and you can't say which bit of it comes "first." (Of course,
in serial communications, the other side of the Holy War, this is
significant.)
-> Both big- and little-endian types agree that more significant bits should
be to the left, conceptually (the Arabic heritage, remember?); they
*don't* agree on whether addresses increase left-to-right (big-endian) or
right-to-left (little-endian). See On Holy Wars and a Plea For Peace for
more diagrams. A one is stored as 00000000 00000000 00000000 00000001,
with the bytes' addresses being base+3 base+2 base+1 base+0.
Thus, it is impossible for a byte-addressed machine to store the bits in
a byte in the wrong order, unless it has bitfield instructions or some
such bit-addressing kludge.
--
-Colin (uunet!microsof!w-colinp)
You mean the beginning of a double looks like a float?
f(x) float x; { g(&x); } /* g() is actually passed a (double *) */
> Actually, where most little-endian machines screw up is storing the
> bits in the byte in the wrong order. It is good to hear that somebody got it
On most of these machines, bits are stored vertically :-/. [half :-)]
If you can't index or address bits, there is no order. If it makes you happy,
call a right shift (to less significance) a down shift, a left shift an up
shift. The big/little thing only has meaning in addressing parts.
Another notational screw-up is where to put address 0 when drawing memory. I
always put it at the top ("up there at the bottom of memory").
>Since it forces inefficiency, little-endian is for CISCs. :-) :-)
I could be wrong, but I think a fully consistent little-endian machine
(e.g. nsc 32xxx) does not have this disadvantage.
All this was covered about 2 years ago on this group: the conclusion then
was that little-endian had a small advantage on tiny machines (e.g. 8008
class and slower) needing to do BCD arithmetic, big endian machines have
the "advantage" that it is easier to read dumps, and there are no other
significant differences. VAXes, of course, are not consistent little-
endian or big-endian, but then, we are not supposed to have to read dumps
anymore anyway, remember? :-)
--
Hugh LaMaster, m/s 233-9, UUCP ames!lamaster
NASA Ames Research Center ARPA lama...@ames.arc.nasa.gov
Moffett Field, CA 94035
Phone: (415)694-6117
No no no. Little-endian machines do store one as 100000...0000, where the
leftmost digit (the 1) is the LSBit. This is consistent with the fact that
in little-endian machines the leftmost byte of any sequence of bytes is also
the LSByte. It is you (and I, and almost everybody else for that matter)
who insisted on writing bit sequences in big-endian order - MSBit first.
It is we who write one as 00000001000...000; any screw-ups happen only
in our heads.
In general, in little-endian machines, the leftmost storage unit of any
sequence of storage units is also the least significant storage unit of the
sequence. However, we have the habit of writing the sub-fields of a storage
unit in big-endian order. We also have the habit of storing strings
in big-endian order - the most significant character goes in the left-
most storage cell. Both of these give a little advantage to big-endian
machines. However, if our minds happened to work the other way - if we
were used to writing the sub-fields of a storage unit so that the leftmost
field is the LSField, and if we were to store the first character of a
string in the rightmost storage cell - then the relationship between the
two conventions would be reversed. It is all in our heads.
As for the advantage that is traditionally claimed by the little-endian
machines (that it simplifies many extended-precision arithmetic operations,
including BCD string operations and double word operations), we really have
to look a little closer to see whether it is caused by the difference in
the byte-ordering convention. Take the double word operation for example,
"add-immediate double-word value to register" instruction may look like this
in a
little-endian machine: <op-code> <LSWord> <MSWord>
big-endian machine : <op-code> <MSWord> <LSWord>
This little endian machine has a slight edge because half of the addition
operation can take place while MSWord is being fetched. (I am not saying
that any such disadvantages cannot be compensated in a real big-endian
machine.) However, if we look carefully, we see that in the case of the
above little-endian machine, the PC moves from low address to high address
which is the direction of increasing word-significance; yet, the big-endian
example has PC move in the direction of decreasing word-significance.
Suppose we have a big-endian machine in which instruction number 0 is stored
in the least significant program word (high address, or right most) and the
PC moves from high address to low address, we will have:
big-endian machine : <MSWord> <LSWord> <op-code>
This big-endian machine will be exactly as good as previous little-endian
machine in handling double word operations. Similarly for character string
comparisons, a little-endian machine will be exactly as good as a big-endian
machine if we store the strings backward from the way we store them now
(which means we change the way we index the bytes, s[3] = address of s - 3).
There is nothing wrong with either the big-endian or the little-endian
byte-ordering conventions; any difficulties that we encounter are in our
heads; any perceived differences between machines of the two conventions
are really artifacts of how each machine moves its PC and how it indexes
arrays, and have nothing to do with byte ordering.
In fact, given any processor, one can construct another processor that is
exactly as good as the first one, but having the opposite convention.
Assuming of course that we don't read the hex-dumps. The things that can
be argued about are things like the direction of PC movement and array
indexing conventions. Byte ordering by itself does not make any real
difference.
--
/*------------------------------------------------------------------------*\
| Wen-King Su wen-...@vlsi.caltech.edu Caltech Corp of Cosmic Engineers |
\*------------------------------------------------------------------------*/
You're wrong. On an NSC 32k, addresses are in the wrong order (actually, I
think it might just be displacements), because the upper 1 or 2 bits
determine the size of the address (and means that you can't use a
displacement of 2 gigs unsigned, or 1 gig signed. Everybody sigh in unison
8-)). Also, I'd bet that the FP format is backwards (wrt big vs. little
endian).
Now, *Cybers* don't have this problem, you betcha. It's kinda nice not
having to worry about byte addressing...
--
Sean Eric Fagan | "Merry Christmas, drive carefully and have some great sex."
se...@sco.UUCP | -- Art Hoppe
(408) 458-1422 | Any opinions expressed are my own, not my employers'.
I was assuming an equality comparison. Most people seem to assume strcmp(),
for which it does make a difference (this could lead to a very long discussion
of how important strcmp-like comparisons are, etc., which I will avoid.)
call sub (1)
and pass a number to it (a longword - 4 bytes) which would be interpreted
correctly whether the receiving formal parameter was a byte, a word (2 bytes),
or a longword (4 bytes). This is not possible in a "big endian" machine -
you have to know how many bytes of high order 0's to write before you get to
the low order byte. Considering that the Fortran of the day had no way to
declare the formal parameters for subroutines, and the importance of
Fortran in the early days of the VAX (and the fact that the VAX was built
with a great deal of input from the software guys), could this be the REAL
motivation for "little endian"?
Of course the fact I even thought of the possibility of such a trick
probably shows I'm just an old Fortrash hacker ...
Bruce C. Wright
>I was assuming an equality comparison. Most people seem to assume strcmp(),
>for which it does make a difference (this could lead to a very long discussion
>of how important strcmp-like comparisons are, etc., which I will avoid.)
Well, I won't. :-)
The literature on sorting algorithms focuses on the use of a "<=" oracle,
by analogy with the mathematical definition of "well order", which is what
a sort is supposed to do. In a previous life I derived a sort algorithm
that used a three-way oracle (strcmp, in fact) to good advantage.
I based the work on the fact that a large part of the comparison expense
for strings is in scanning the initial equal part; the three-way answer
comes for free after that. My algorithm maintained in-core data in a
trinary tree with a degenerate (linear) subtree for the equals case. The
expected data had significant clumping around discrete values so the extra
space was well justified. The disk-resident format for intermediate runs
included a bit for "known equal" so the tests didn't have to be repeated
during merging. It was a very fast sort, given the expected input
distribution.
(It used a number of other tricks, including bidirectional run management
and very-high-order merging. The other tricks exploited the unavoidable
disk block caching in unix, but the trinary tree is quite general.)
So, don't discount strcmp's value. Very many programs use it for
an equality test only, but sorting still consumes a great deal of
computer time in the real world, and when sorting we need to
know which way it went. It would be a good idea for computer
architects to bear this in mind: As mundane as sorting may
seem, it is the benchmark of choice for a great many check-signers.
--
Steve Nuchia South Coast Computing Services
uunet!nuchat!steve POB 890952 Houston, Texas 77289
(713) 964 2462 Consultation & Systems, Support for PD Software.
This cannot have anything to do with byte-ordering, because the two
byte-ordering conventions are totally symmetrical and isomorphic. Any
difference between two machines must have been a result of some asymmetry
that was imposed on the machine when the machine was designed. In
the example above, the asymmetry was imposed when the following question
was answered:
If a data unit consists of a sequence of bytes, what should the
address of the data unit be: the address of the MSByte or the address
of the LSByte?
In VAX, and in most little-endian machines, the address of the LSByte was
used to represent the address of the data unit. In 68K and most big-endian
machines, the address of the MSByte was used. The choice is quite arbitrary,
but the important thing is that it imposes an asymmetry. The supposed
"advantage" of the little-endian byte-ordering is really the advantage of
choosing the address of the LSByte to be the address of a multi-byte unit.
We can build a big-endian machine with exactly the same advantage if we make
the same choice for it as we have made for VAX. In this case, a 'long' that
occupies byte address 0x20 0x21 0x22 0x23, will have 0x23 as its address.
In general, given any little-endian machine, we can build a big-endian
machine that is exactly as good as the little-endian machine (in fact,
they will be duals), and vice versa. Byte-ordering should cease to be the
focal point of any arguments; talks about the decisions that lead to the
asymmetries should replace it.
> call sub (1)
>and pass a number to it (a longword - 4 bytes) which would be interpreted
>correctly whether the receiving formal parameter was a byte, a word (2 bytes),
>or a longword (4 bytes). This is not possible in a "big endian" machine -
>you have to know how many bytes of high order 0's to write before you get to
>the low order byte. Considering that the Fortran of the day had no way to
>declare the formal parameters for subroutines, and the importance of
>Fortran in the early days of the VAX (and the fact that the VAX was built
>with a great deal of input from the software guys), could this be the REAL
^^^^^^^^^^^^^^^^^^^^^^
>motivation for "little endian"?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If I thought that it was, my respect for DEC would take a BIG drop.
Passing a longword and receiving a word is an ERROR. Sure, it works fine
for your example, but what if the statement were:
call sub (70000)
Now both the big- and little-endian machines are receiving the "wrong"
value. On the big-endian machine, the developer will probably find
his/her mistake during initial checkout. On the little-endian machine
the error _may_ not be found until the module has been in production,
spewing out wrong answers for months. I don't know what kind of
environment you work in, but where I work this kind of error could cost my
company $k in data that had to be reprocessed. (Not to mention egg on
our corporate face if a client were to discover the gaffe.)
And now to lighten up.... No, this cannot be the _REAL_ motivation
for the little-endian data format, because this is INTEGER data (-::-)
snicker ... snicker ...
Carrington Dixon
UUCP: { convex, killer }!mic!d25001
Well, you can assert that the FORTRAN language itself is an error; this is
essentially what you are saying. The point is that there is NO WAY (repeat:
NO WAY) in the Fortran 77 standard to declare a formal argument list. That
means that there is NO WAY to declare to the compiler that it is to pass
an INTEGER*2 value (as opposed to, say, an INTEGER*4 value) as a parameter
to the subroutine. In other words, the INTENT OF THE PROGRAMMER was all
along to pass a word rather than a longword, but the DEFINITION OF THE
LANGUAGE does not allow this to be explicitly declared.
Now I am not going to defend FORTRAN as a "safe" language, or an "elegant"
language, or even a "good" language. It is however a very commercially
significant language - which is not at all the same thing (eg, COBOL).
FORTRAN is still in considerable use (even for new development in some
environments).
> the error _may_ not be found until the module has been in production,
> spewing out wrong answers for months. I don't know about what kind of
> environment you work in, but where I work this kind of error could cost my
> company $k in data that had to be reprocessed. (Not too mention egg on
> our corporate face if a client were to discover the gaffe.)
The classic FORTRAN error had nothing whatsoever to do with this kind
of error, but with the terseness that FORTRAN uses for its syntax:
a statement something like
do 100 i=1,10
got permuted to something like
do 100 i=1.10
The former statement starts a loop varying "I" from 1 to 10, and the latter
assigns a value of 1.10 to a variable named "DO100I". Because of the
structure of FORTRAN, where there is no explicit end-loop construct (the
end is specified by the statement label "100"), the error went undetected ...
until the satellite got dumped in the ocean and NASA had lots of egg on
its face.
In other words, if you want to flame anything about safe computing, you
should probably be flaming FORTRAN, not DEC or the VAX or me.
Bruce C. Wright
That's an URBAN LEGEND!!! Fortran may deserve bashing, but the rumor
that it crashed a satellite launcher is false (and should be put to
rest).
The real story is as follows: a formula was written with a logical
"complement". Somewhere between the original formula and its
implementation in ASSEMBLY language, the complement was dropped.
This was discussed at length a while back on comp.risks and several
other newsgroups. Followups to comp.software-eng.
;-D on ( I take risks, I read USENET! ) Pardo
-----------> Excerpt from comp.risks (RISKS Digest 5.73): <------------
Date: Sun, 13 Dec 87 05:30:10 PST
From: hoptoad.UUCP!g...@cgl.ucsf.edu (John Gilmore)
To: RI...@KL.SRI.COM
Subject: Finally, a primary source on Mariner 1
My friend Ted Flinn at NASA (fl...@toad.com) dug up this reference
to the Mariner 1 disaster, in a NASA publication SP-480, "Far Travelers --
The Exploring Machines", by Oran W. Nicks, NASA, 1985. "For sale by the
Superintendent of Documents, US Government Printing Office, Wash DC."
Nicks was Director of Lunar and Planetary Programs for NASA at the time.
The first chapter, entitled "For Want of a Hyphen", explains:
"We had witnessed the first launch from Cape Canaveral of a spacecraft
that was directed toward another planet. The target was Venus, and the
spacecraft blown up by a range safety officer was Mariner 1, fated to
ride aboard an Atlas/Agena that wobbled astray, potentially endangering
shipping lanes and human lives."
..."A short time later there was a briefing for reporters; all that
could be said -- all that was definitely known -- was that the launch
vehicle had strayed from its course for an unknown reason and had been
blown up by a range safety officer doing his prescribed duty."
"Engineers who analyzed the telemetry records soon discovered that two
separate faults had interacted fatally to do in our friend that
disheartening night. The guidance antenna on the Atlas performed
poorly, below specifications. When the signal received by the rocket
became weak and noisy, the rocket lost its lock on the ground guidance
signal that supplied steering commands. The possibility had been
foreseen; in the event that radio guidance was lost the internal
guidance computer was supposed to reject the spurious signals from the
faulty antenna and proceed on its stored program, which would probably
have resulted in a successful launch. However, at this point a second
fault took effect. Somehow a hyphen had been dropped from the guidance
program loaded aboard the computer, allowing the flawed signals to
command the rocket to veer left and nose down. The hyphen had been
missing on previous successful flights of the Atlas, but that portion of
the equation had not been needed since there was no radio guidance
failure. Suffice it to say, the first U.S. attempt at interplanetary
flight failed for want of a hyphen."
------------------------------------------------------------
From: mink%c...@harvard.harvard.edu (Doug Mink)
To: ri...@csl.sri.com
Subject: Mariner 1 from NASA reports
JPL's Mariner Venus Final Project Report (NASA SP-59, 1965)
gives a chronology of the final minutes of Mariner 1 on page 87:
4:21.23 Liftoff
4:25 Unscheduled yaw-lift maneuver
"...steering commands were being supplied, but faulty application
of the guidance equations was taking the vehicle far off course."
4:26:16 Vehicle destroyed by range safety officer 6 seconds before
separation of Atlas and Agena would have made this impossible.
In this report, there is no detail of exactly what went wrong, but "faulty
application of the guidance equations" definitely points to computer error.
"Astronautical and Aeronautical Events of 1962," is a report of NASA to the
House Committee on Science and Astronautics made on June 12, 1963. It
contains a chronological list of all events related to NASA's areas of
interest. On page 131, in the entry for July 27, 1962, it states:
NASA-JPL-USAF Mariner R-1 Post-Flight Review Board determined that
the omission of a hyphen in coded computer instructions transmitted
incorrect guidance signals to Mariner spacecraft boosted by two-stage
Atlas-Agena from Cape Canaveral on July 21. Omission of hyphen in
data editing caused computer to swing automatically into a series of
unnecessary course correction signals which threw spacecraft off
course so that it had to be destroyed.
So it was a hyphen, after all. The review board report was followed by a
Congressional hearing on July 31, 1962 (ibid., p.133):
In testimony before House Science and Astronautics Committee, Richard
B. Morrison, NASA's Launch Vehicles Director, testified that an error
in computer equations for Venus probe launch of Mariner R-1 space-
craft on July 21 led to its destruction when it veered off course.
Note that an internal review was called AND reached a conclusion SIX DAYS
after the mission was terminated. I haven't had time to look up Morrison's
testimony in the Congressional Record, but I would expect more detail
there. The speed with which an interagency group could be put together
to solve the problem so a second launch could be made before the 45-day
window expired, and the lack of speed with which more recent problems
(not just the Challenger, but the Titan, Atlas, and Ariane problems
of 1986) were handled, say something about 1) how risks were accepted in the 60's,
2) growth in complexity of space-bound hardware and software, and/or
3) growth of the bureaucracy, each member of which is trying to avoid
taking the blame. It may be that the person who made the keypunch
error (the hyphen for minus theory sounds reasonable) was fired, but
the summary reports I found indicated that the spacecraft loss was
accepted as part of the cost of space exploration.
Doug Mink, Harvard-Smithsonian Center for Astrophysics, Cambridge, MA
Internet: mi...@cfa.harvard.edu
UUCP: {ihnp4|seismo}!harvard!cfa!mink
---------> End excerpt from comp.risks (RISKS Digest 5.73): <----------
--
pa...@cs.washington.edu
{rutgers,cornell,ucsd,ubc-cs,tektronix}!uw-beaver!june!pardo
>Well, you can assert that the FORTRAN language itself is an error; this is
>essentially what you are saying. The point is that there is NO WAY (repeat:
>NO WAY) in the Fortran 77 standard to declare a formal argument list. That
We both agree that there is no provision in FORTRAN to catch
mismatched arguments at compile time. We even seem to agree that this
is a failing of that language. Thus there is a large category of errors
that FORTRAN cannot find at compile time. I maintain that those who
wish to create "correct" programs will want to test these modules in
order to find as many errors as possible before dumping the mess on some
hapless user.
With this in mind, I maintain that some data formats lend themselves
to finding such latent errors more readily than do others and that it
would be pernicious of any vendor to choose its data formats with an
eye to making such checkout as difficult as possible. DEC and little-
endian integers was just the example at hand; I can think of other
architectures that allow the equally unfortunate passing of double-reals
and receiving single-reals with similar problems in runtime diagnoses.
>In other words, if you want to flame anything about safe computing, you
>should probably be flaming FORTRAN, not DEC or the VAX or me.
>
> Bruce C. Wright
I thought my response was a little mild to qualify as a full-
fledged usenet flame, but I suppose that opinions may differ.
For the record, I do not think that DEC was guilty of choosing
its data formats in some blind and misguided attempt to follow FORTRAN's
lead into the dismal swamp. They chose the "little-endian" format for
other reasons. I am sure that they were under no delusion that they had
to perpetuate FORTRAN's shortcomings in their hardware.
Incidentally, I think that the phrase that you were trying to use
(twice) was "in error." I might be offended if I thought that you
really meant that I was "an error."
>Another notational screw-up is where to put address 0 when drawing memory. I
>always put it at the top ("up there at the bottom of memory").
I think the two main reasons why more programmers appear to like
big-endian (no flames, just local observation) and hardware people
appear to like little-endian are:
1) Little-endian used to make it easier to support big integers on small-
buswidth machines (minor issue, solved or irrelevant now in general).
2) Hardware people like to draw diagrams with 0 at bottom-right, software
people, used to printers and screens that print top to bottom, left to right,
like to put 0 at upper-left. It also makes dumping memory with strings easier
to read.
--
Randell Jesup, Commodore Engineering {uunet|rutgers|allegra}!cbmvax!jesup
DEC VAX DUMP prints out in a format that makes both integers and strings
easy to read. Namely, it prints out each in their ``natural'' order:
Integers in little-endian (right to left), and strings from left to right.
Here's an example:
Virtual block number 1 (00000001), 512 (0200) bytes
4E4D4C4B 4A494847 46454443 4241002F /.ABCDEFGHIJKLMN 000000
69685420 5A595857 56555453 5251504F OPQRSTUVWXYZ Thi 000010
74736574 20612079 6C6E6F20 73692073 s is only a test 000020
00000000 00000000 00000000 FFFF0021 !............... 000030
00000000 00000000 00000000 00000000 ................ 000040
<----- numbers go this way <---*---> strings go this way --->
People who expect the first word (000000) to appear first (at left) will be
surprised by this, but it's perfectly consistent with the way we write
our numbers and strings.
Bernard A. Badger Jr. 407/984-6385 |``Use the Source, Luke!''
Secure UNIX Products |It's not a bug; it's a feature!
Harris GISD, Melbourne, FL |Buddy, can you paradigm?
Internet: bbadger@cobra@trantor.harris-atd.com|Recursive: see Recursive.
>DEC VAX DUMP prints out in a format that makes both integers and strings
>easy to read. Namely, it prints out each in their ``natural'' order:
>Integers in little-endian (right to left), and strings from left to right.
> 4E4D4C4B 4A494847 46454443 4241002F /.ABCDEFGHIJKLMN 000000
> 69685420 5A595857 56555453 5251504F OPQRSTUVWXYZ Thi 000010
> 74736574 20612079 6C6E6F20 73692073 s is only a test 000020
> <----- numbers go this way <---*---> strings go this way --->
>
>People who expect the first word (000000) to appear first (at left) will be
>surprised by this, but it's perfectly consistent with the way we write
>our numbers and strings.
I don't know about you (or your hardware), but I tend to write from
left to right, not right to left. :-) And I don't start writing in the
middle of the page, and go both left and right from there. :-)
Sure you can write this way, or even make things scroll up, but
most terminals/whatever are easier to deal with in a sequential, left to
right, top to bottom fashion. It's marginally more annoying to deal with
in your way. Also, I get a headache trying to find the word/byte/whatever
I'm looking for in a listing like that, I have to reverse my thinking. :-)
Personally, that's a nice kludge to get around the fact that little-
endian is "naturally" written right to left, bottom to top by most people.
However, people don't read that way, certainly not text.
I think little-endian is a long-standing joke played by hardware
engineers on software writers. :-)
Where `people' are defined to be those who happen to be members of the
Western cultures that read left to right. What does that make the others?
> I think little-endian is a long-standing joke played by hardware
> engineers on software writers. :-)
Big-endian is a long-standing mistake imposed on us by merchants from the
Middle Ages who missed the point. In transcribing the number system from
the Arabic, they should have had the sense to reverse the digits to compensate
for the strange Western custom of writing from left to right. ( :-), I suppose).
> --
> Randell Jesup, Commodore Engineering {uunet|rutgers|allegra}!cbmvax!jesup
--
Griff Smith AT&T (Bell Laboratories), Murray Hill
Phone: 1-201-582-7736
UUCP: {most AT&T sites}!ulysses!ggs
Internet: g...@ulysses.att.com
Yes, sorry, I forgot to qualify that as people in "Western"
cultures. This is the smallest problem with existing systems/software
for non-"Western" people (does your software support kanji? Arabic?)
A look at ancient writing of numbers, both in symbols and spelled out,
indicates that it is pretty much big-endian. Except for the units and
tens digits, I know of no language in either the Semitic or the Indo-European
group which does not express numbers with the most significant part first.
For example, in Hebrew (and probably also in Arabic, they are sufficiently
similar), one would say the equivalent of two hundred and thirty, NOT
thirty and two hundred. It would be written right-to-left big-endian,
just as the language is written.
These languages then introduced (mostly) decimal representations, using
different characters for multiples of different powers of 10. Again, they
were written big-endian. Then the idea of using the same symbol in each
place, with a zero to hold the place, originated in India. The Indian
writing is left-to-right. After the Moslem invasion of India, they adopted
the Indian decimal notation without change. That is why the Arabic expression
appears as little-endian.
There does not seem to be any support from "natural" languages for the
little-endian approach.
--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hru...@l.cc.purdue.edu (Internet, bitnet, UUCP)
The important part about little-endian vs. big-endian (which can cause problems)
is the overlaying of dissimilar data types. If I overlay a byte onto a word on a
VAX (or any other little-endian processor), put in a word value < 256, and do a
byte read from the same address, I will get the correct value. If I do the
same thing on a big-endian processor, I will get zero.
Of course you don't usually overlay floating point numbers ... so the order of
the bytes in a floating-point number is (usually) irrelevant ...
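The overlay effect described above can be sketched in portable C. This is an
editor's illustration, not from the original posting; the function names are
invented, and memcpy stands in for the Fortran/assembler-style overlay:

```c
#include <stdint.h>
#include <string.h>

/* Store a 16-bit word, then read back the single byte at the word's
 * starting address -- the "overlay" the poster describes.  On a
 * little-endian machine this yields the low-order byte (the correct
 * value when the word is < 256); on a big-endian machine it yields
 * the high-order byte, i.e. zero. */
uint8_t first_byte_of_word(uint16_t value)
{
    uint8_t b;
    memcpy(&b, &value, 1);   /* read the byte at the word's address */
    return b;
}

/* 1 if this host is little-endian, 0 if big-endian. */
int is_little_endian(void)
{
    return first_byte_of_word(0x0001) == 0x01;
}
```

On a VAX (or x86) `first_byte_of_word(0x002F)` returns 0x2F; on a big-endian
machine it returns 0x00, exactly the discrepancy the poster warns about.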
Don Stokes
Systems Programmer
Government Printing Office, Wellington, New Zealand.
|There does not seem to be any support from "natural" languages for the
|little-endian approach.
Four and twenty blackbirds, baked in a pie....
|--
|Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
|Phone: (317)494-6054
|hru...@l.cc.purdue.edu (Internet, bitnet, UUCP)
Michael McNamara
m...@ardent.com
What about Danish: fem og halvfirsindtyve (75 (my Danish is rusty))
Or norwegian: en og femti (51). This fooled me once into believing
one could rent a room in Paris for Fr 1.50... :-)
Or better yet, German: Zwei und Vierzig (42!)
I believe Danish, Norwegian and German count as "natural" languages.
At least in Denmark, Norway and German[y|ies] :-)
John.
_______________________________________________________________________________
| | | | |\ | | /|\ | John Kallen "The light works. The gravity
| |\ \|/ \| * |/ | |/| | | PoBox 11215 works. Anything else we must
| |\ /|\ |\ * |\ | | | | Stanford CA 94309 take our chances with."
_|_|___|___|____|_\|___|__|__|_j...@csli.stanford.edu___________________________
Taking the first line of the dump as an example,
>> 4E4D4C4B 4A494847 46454443 4241002F /.ABCDEFGHIJKLMN 000000
note that the first two bytes of the file specify a single integer number,
LSB order: 002F ==> byte(0) = 2F, byte(1) = 00. It's certainly easier to
read written MSB (002F) than in storage order (2F00).
If the next element of the file were ``really'' an INTEGER*4 variable
(please excuse the use of FORTRAN in mixed company :-), you would catenate
the "4443 4241" into 44434241. But if it turned out to be two INTEGER*2
values you would read "4241" first, then "4443".
This does result in your eyes moving RtL to increment addressing -- as when
counting to a specific offset in a record structure -- and then scanning
back from LtR to read an integer. This is far easier to put up with than
printing hexadecimal output with addresses increasing from left-to-right on
a little-endian machine!
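The byte-catenation just described can be written out as portable C. A sketch
(editor's illustration; the function name is made up), assembling an INTEGER*2
from two bytes stored LSB-first, so that the dump bytes "2F 00" read back as
002F regardless of the host's own byte order:

```c
#include <stdint.h>

/* Assemble a 16-bit (INTEGER*2) value from two bytes stored
 * LSB-first in the file, independent of host byte order. */
uint16_t int2_from_lsb(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}
```

Applied to the dump above: the bytes "41 42" yield 0x4241 and "43 44" yield
0x4443, matching the two INTEGER*2 readings in the text.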
As far as consistency goes, I always liked the fact that on little-endian
architectures, the bit numbering (0..31) makes bit $ k $ represent
$ 2^k $ no matter what the word size is. Whereas on big-endian 32-bit words
bit $ k $ equals $ 2 ^ {31 - k} $ and on 16-bit (half) words, the value is
$ 2 ^ {15 - k}$.
That is:
LSB (little-endian):
3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
2^7 = 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
So 2^7 sets bit number 7.
MSB (big-endian):
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
2^7 = [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0]
So 2^7 sets bit number 24.
MSB (big-endian), 16-bit word:
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
2^7 = 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
So 2^7 sets bit number 8.
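The two numbering conventions can be checked with a little arithmetic. A C
sketch (editor's addition; the function names are invented): under LSB-0
numbering bit k is always worth 2^k, while under MSB-0 numbering its value
depends on the word width w:

```c
#include <stdint.h>

/* Little-endian (LSB-0) numbering: bit k is worth 2^k,
 * regardless of word size. */
uint32_t bit_value_lsb0(int k)
{
    return (uint32_t)1 << k;
}

/* Big-endian (MSB-0) numbering: bit k is worth 2^(w-1-k),
 * so its value changes with the word width w in bits. */
uint32_t bit_value_msb0(int k, int w)
{
    return (uint32_t)1 << (w - 1 - k);
}
```

This reproduces the diagrams above: 2^7 is bit 7 in LSB-0 numbering, but
bit 24 in a 32-bit MSB-0 word and bit 8 in a 16-bit MSB-0 word.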
Normally we can sweep these distinctions under a rug of abstraction. It's
only when we start to examine machine code or numeric representations that
we operate on that low a level.
> I think little-endian is a long-standing joke played by hardware
> engineers on software writers. :-)
Right. So if we just play along with the joke in DUMP output, we won't have
to tangle up our bits too badly. Of course, then there's communications
software where some data is MSB and some is LSB, depending whether you're
using the host format or the network format. In that case, no matter which
way we print our dump lines, some data will be written with the LSB on the
left.
P.S. You mentioned the bottom/top issue: whether to print the low addresses
at the top (normal first-things-first order) or at the bottom (like most
hardware address space diagrams, or STACK dumps). Again the most convenient
order depends on the use that is made of the data, what its internal format
*is*. Both forms of output are useful. The VAX DUMP doesn't have a "FFFFFFFF
at top" option. Too bad.
Bernard A. Badger Jr. 407/984-6385 | ``Use the Source, Luke!''
Secure UNIX Products | That's not a bug! It's a feature!
Harris GISD, Melbourne, FL 32902 | Buddy, can you paradigm?
Internet: bba...@x102c.harris-atd.com | 's/./&&/' Tom sed [sic] expansively.
>Or better yet, German: Zwei und Vierzig (42!)
----------
German doesn't count, how about
Drei Hundert Zwei und Vierzig (342)
German reads like the aforementioned "VAX" core dump!
--------------
Marc Sabatella
HP Colorado Language Lab
marc%hpf...@hplabs.hp.com
Rich Alderson
Stanford University
Ah, but consider the German for 1988: neunzehn hundert acht und achtzig
(nine-and-ten hundred eight and eighty). Middle-endian. AHA! Germans
are PDP-11s!
:-)
--
John Woods, Charles River Data Systems, Framingham MA, (508) 626-1101
...!decvax!frog!john, jo...@frog.UUCP, ...!mit-eddie!jfw, j...@eddie.mit.edu
Presumably this means that it is vital to get the wrong answers quickly.
Kernighan and Plauger, The Elements of Programming Style
In German: 24 == vierundzwanzig
In Dutch it's expressed similarly
Also compare English thirteen, fourteen, ... nineteen.
If you read my posting, I did state that there was reversal of the units and
tens digits in many languages. This occurs regularly in the Germanic languages,
as many have posted. In Spanish, it only occurs from 11-15, and in French,
from 11-16. A correction to my statement about Hebrew; it also applies there
to hundreds, but either order can occur, and in fact both orders occur in the
same passage.
However, my statement still holds. To give a counterexample, it would be
necessary to come up with examples where such numbers as 46,378 have the
378 before the 46,000. I know of no such examples. The clear resolution
of this problem occurs in these cases of multi-"byte" expressions.
The early symbolic representation of numbers by alphabetic characters or other
symbols is, in every case to my knowledge, in the same order as the written
letters. Even the Roman numerals do this, in that if a less significant
symbol appears before a more significant one, it is treated anomalously.
But the Roman numerals were not used for calculating. The early numerical
representations used letters, but because of no 0 symbol, different letters
were used in different places, or other devices were used. I know of no
ancient little-endian devices. In Hebrew, 378 would always be 300 first,
then 70, then 8, in the right-to-left direction of the writing, even though
both word orders occur, and the other order would be unambiguous.
The apparent little-endianness of Arabic is due to the direct importation of
the left-to-right symbolic numerical writing from India.
Nein! Die PDP-11en sind Deutsche!
(No! PDP-11s are German!)
It seems that several of the current RISC architectures provide for
either big- or little-endian operation (Rx000, 88k, 29k, ?). They do
this by providing a global operating-status bit that can be set by the
user to select either big- or little-endian.
It would be better to provide load and store operations that are
specifically big or little endian:
LLE.W -- load little endian word
SLE.W -- store little endian word
LLE.L -- load little endian long
SLE.L -- store little endian long
LBE.W -- load big endian word
SBE.W -- store big endian word
LBE.L -- load big endian long
SBE.L -- store big endian long
In this scheme, applications would be responsible for keeping themselves
in sync with the endianness of their data, just as they already must
keep themselves in sync with the byte-length of their data. Programmers
could choose the endianness of data as they saw fit--based either on
convenience or performance considerations. Even if LBE.L takes more cycles
than LLE.L, it might still be used advantageously in a string comparison
algorithm--or a 68k emulator.
If the Rx000 used this approach, then DEC would still have been free
to write little-endian code (and code generators), but other peoples'
big-endian code would still run on DEC machines--as long as such code
brings its big-endian data sets along with it.
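The proposed LLE/LBE instruction pairs have a software counterpart that some
readers may find clarifying: explicit-endianness loads written portably in C.
This is an editor's sketch, not part of any ISA discussed; the function names
load_le32/load_be32 are made up:

```c
#include <stdint.h>

/* Load a 32-bit little-endian value from memory,
 * independent of the host's own byte order (cf. LLE.L). */
uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0]        | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Load a 32-bit big-endian value from the same bytes (cf. LBE.L). */
uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

As in the proposal, the application picks the accessor matching its data's
endianness; the host's native order never enters into it.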
--
Alan Lovejoy; alan@pdn; 813-530-2211; ATT-Paradyne: 8550 Ulmerton, Largo, FL.
Disclaimer: I do not speak for ATT-Paradyne. They do not speak for me.
___________ This Month's Slogan: Reach out and BUY someone (tm). ___________
Motto: If nanomachines will be able to reconstruct you, YOU AREN'T DEAD YET.
What about seventeen (seven-ten)? These are marked because they're
petrified. But it looks like this used to be the norm.
Perhaps I am missing something...but it appears to me that this
increases the size of the instruction set (and therefore gate size),
so we will have a more complex machine, less space for registers/other
goodies, and what we gain seems to be rather a small win.... perhaps
you could provide some statistics demonstrating how this will improve
performance. (The compatibility win is obvious, but it is unclear that it is
sufficient to justify gunking up an otherwise clean design.)
Keith H. Bierman
It's Not My Fault ---- I Voted for Bill & Opus
Here it is again, adding instructions to a RISC machine... won't
be long before we have a RISC machine with more instructions
than a VAX.... :-)
John Hascall
ISU Comp Center
Ames, IA
I used to work for a German company, and you haven't seen confusion
until you've seen a bunch of German engineers trying to say "68000"
in English, and it keeps coming out "86000", for exactly that reason.
It's an understandable mistake, and we rather got used to it after
a while.
--
Clayton E. Cramer
{pyramid,pixar,tekbspa}!optilink!cramer
Disclaimer? You must be kidding! No company would hold opinions like mine!
The Datapoint 6600 (definitely not a RISC -- think of an 8008 grown in
an entirely different fashion from the one Intel chose) had something
along these lines. It had a special register which it used to relocate
one page of memory cells which were then accessible with an 8-bit
offset. The instructions used to reference this page included:
DPL double paged load
DPS double paged store
DPLR double paged load reversed
DPSR double paged store reversed
Since I was writing SNA software for this beast, I frequently wished
that Datapoint's SIL had included a way to declare a field big-endian so
the compiler would know it could generate these instructions directly.
Ah, memories!
>Alan Lovejoy; alan@pdn; 813-530-2211; ATT-Paradyne: 8550 Ulmerton, Largo, FL.
>Disclaimer: I do not speak for ATT-Paradyne. They do not speak for me.
>___________ This Month's Slogan: Reach out and BUY someone (tm). ___________
>Motto: If nanomachines will be able to reconstruct you, YOU AREN'T DEAD YET.
--
Bob Teisberg @ Tandem Computers, Inc. | ...!rutgers!cs.utexas.edu!halley!rrt
14231 Tandem Blvd. |
Austin, Texas 78728 | (512) 244-8119
And again...
Sigh, RISC doesn't mean a small number of instructions. RISC means
simple instructions that, for one thing, all fit in the same uniform
pipeline. The number of instructions is a factor, there's no reason
to have more than are really necessary, but if the instructions are
of the right nature, simple to decode and uniform in semantics, you
can have a gazillion of them. Thus, the emphasis is on instruction
*encodings* and *semantics*, NOT THE QUANTITY.
So, adding the loads and stores in both little- and big-endian varieties
is not necessarily an anti-RISC idea. In fact, since they add to the
richness of the instruction set without requiring new hardware (the
little/big endian hardware is already there in some machines!), they
are very good candidates for new instructions. What about encoding?
One extra bit will do, and it is not hard to decode; it should not
increase decode time.
What about use? Yes the code generators can be trained to deal with
this. But what about passing pointers to the operating system? I
guess we would need two versions of many (most?) system calls....
What else?
If, considering all the software implications (which I have not had
time to do), this works, I wish we had implemented it in the 29K.
This would require a large field in the coding of these instructions. Are
these bits available for use? Maybe this is a real good idea when we go
to 64 bit instructions. There are other ways to do this (ie. an "endian"
bit in a configuration register) but they aren't very RISCY.
Now that I think of it 64 bit instructions aren't very RISCY either.
--
Mark Clauss Hardware Engineering, NCR Wichita <Mark....@Wichita.NCR.COM>
NCR:654-8120 <{uunet}!ncrlnk!
(316)636-8120 <{ece-csc,hubcap,gould,rtech}!ncrcae!ncrwic!mark.clauss>
<{sdcsvax,cbatt,dcdwest,nosc.ARPA,ihnp4}!ncr-sd!
And was "corrected" by someone* thusly:
> And again...
> Sigh, RISC doesn't mean a small number of instructions. RISC means....
REDUCED Instruction Set Computer (i.e., a reduced number of instructions)
True, many RISC machines incorporate a number of other features which,
because they have been used by a number of RISC machines, have come to
be considered a part of RISC--but there is no reason that these features
could not be part of a CISC machine (other than chip real estate).
I think the real problem here is a poorly named acronym, but it
probably sounded "cute" (I, for one, am quite tired of papers
titled "A RISCy blah blah blah" etc).
Perhaps we could have a new buzzword contest, how about SOC (simple,
orthogonal computer)?
My $.02 (or less) worth,
John Hascall
ISU Comp Center
* My apologies for losing the attribution above, but rn barfed on the
overly long "References:" field and I had to do this by hand.
> > Sigh, RISC doesn't mean a small number of instructions. RISC means....
> REDUCED Instruction Set Computer (i.e., a reduced number of instructions)
Maybe reduced number of KINDS of instructions.
If you have an add instruction for a byte and another for a word ...
if you have an add instruction for signed and another for unsigned ...
do you think these are ciscy?
Having an add instruction for a little-endian word and another for a
big-endian word strikes me as a little silly (maybe a big silly :-),
but still riscy.
Incidentally, wouldn't little-beginnian and big-beginnian be more
accurate?
--
Norman Diamond, Sony Computer Science Lab (diamond%csl.s...@relay.cs.net)
The above opinions are my own. | Why are programmers criticized for
If they're also your opinions, | re-inventing the wheel, when car
you're infringing my copyright. | manufacturers are praised for it?
At the risk of starting more pointless RISC/CISC flameage, let me add
my 2 cents worth here: (I know many of you out there won't agree....:-)
The term RISC has been terribly misused, but my personal definition has
been widened to include machines that don't have a "small" number of
instructions.
E.g. the Multiflow Trace (which I am using to compose this mail) has
a VERY large space of possible instructions. I would still term this
machine RISCy as each functional unit is controlled directly
by the instruction word, and is decoupled from instruction packets that
are wired to other functional units. Hence the original purpose of the
RISC idea is served.
Conventional RISCs are designed to approach 1 "op" per cycle. We designed
a multiple-functional unit machine that executes >1 ops / cycle.
The VLIW compiler is considerably "smarter" than a typical
RISC compiler, and the compiler <-> hardware fusion is even more important
than for a simple RISC, but the basic mind set is still the same.
Paul K. Rodman
rod...@mfci.uucp
| Perhaps I am missing something...but it appears to me that this
| increases the size of the instruction set (and therefore gate size),
| so we will have a more complex machine, less space for registers/other
| goodies, and what we gain seems to be rather a small win....
This is really not in keeping with (my) concept of RISC. Rather than
many new instructions, one instruction with an argument would allow
setting the endianness of the load/store operations.
SEL ; Set endian little
MOV R14,XX ; Start compare...
Please note: I am not saying that this is a good thing to do, just
that it would seem easier to have one instruction control the endianness.
Sure makes a neat way to reverse the bytes in a word, i.e.:
SEL ; little endian
MOV R14,XX ; Load xx into R14
SEB ; big endian
MOV XX,R14 ; store, bytes reversed
Hummm... maybe this is a good idea, after all.
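The effect of that load-one-way, store-the-other trick is exactly a byte
swap, which can be written directly in C. An editor's sketch (the function
name is invented):

```c
#include <stdint.h>

/* Reverse the four bytes of a word: the net effect of loading a
 * value in one endian mode and storing it back in the other. */
uint32_t byte_swap32(uint32_t x)
{
    return ((x & 0x000000FFu) << 24)
         | ((x & 0x0000FF00u) <<  8)
         | ((x & 0x00FF0000u) >>  8)
         | ((x & 0xFF000000u) >> 24);
}
```

Applying it twice is the identity, just as switching endian modes for the
load and again for the store would leave the word unchanged.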
--
bill davidsen (we...@ge-crd.arpa)
{uunet | philabs}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me