Endianness

Rudy Velthuis

Apr 21, 2017, 4:46:58 PM

I can't really find where it is stated, but I have the impression that
Forth is, more or less, big-endian.

For instance, I have seen code like:

: U. 0 D. ;

This means that, at least for that Forth, the TOS is the high order
cell of a double word. Is this true for all Forth implementations, or
is this just implementation defined?

If this is the case, is this true for bytes in a cell too, or is that
implementation defined?

Is it also true, or implementation defined, that the data stack
generally grows downward?

--
Rudy Velthuis http://www.rvelthuis.de

"I know of no crime that has not been defended by the church,
in one form or other. The church is not a pioneer; it accepts a
new truth, last of all, and only when denial has become
useless."
-- Robert G. Ingersoll

Alex

Apr 21, 2017, 5:48:39 PM

On 4/21/2017 21:46, Rudy Velthuis wrote:
> I can't really find where it is stated, but I have the impression that
> Forth is, more or less, big-endian.
>
> For instance, I have seen code like:
>
> : U. 0 D. ;
>
> This means that, at least for that Forth, the TOS is the high order
> cell of a double word. Is this true for all Forth implementations, or
> is this just implementation defined?

True for all.

>
> If this is the case, is this true for bytes in a cell too, or is that
> implementation defined?

Endianness is to do with byte order. Undefined.

>
> Is it also true, or implementation defined, that the data stack
> generally grows downward?
>

Undefined.

--
Alex

Elizabeth D. Rather

Apr 21, 2017, 6:16:18 PM

On 4/21/17 11:48 AM, Alex wrote:
> On 4/21/2017 21:46, Rudy Velthuis wrote:
>> I can't really find where it is stated, but I have the impression that
>> Forth is, more or less, big-endian.
>>
>> For instance, I have seen code like:
>>
>> : U. 0 D. ;
>>
>> This means that, at least for that Forth, the TOS is the high order
>> cell of a double word. Is this true for all Forth implementations, or
>> is this just implementation defined?
>
> True for all.

Per Forth2012 (and Forth94) under Usage Requirements/Cell Pair Types:

3.1.4.1 Double-cell integers
On the stack, the cell containing the most significant part of a
double-cell integer shall be above the cell containing the least
significant part.
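
As a rough illustration of that rule (a sketch, not text from the standard): the name MY-U. below is hypothetical, chosen to avoid redefining U., and D. comes from the optional Double-Number word set.

```forth
\ Sketch only: MY-U. is a hypothetical name, chosen to avoid redefining U.
: MY-U. ( u -- )  0 D. ;   \ push a zero high-order cell on top, print as a double

42 MY-U.                   \ prints 42
1 0 D.                     \ low cell 1, high cell 0 on top: prints 1
0 1 D.                     \ low cell 0, high cell 1 on top: prints 2^(bits per cell)
-1 S>D  2DROP              \ S>D leaves ( -1 -1 ): the sign-extended high cell is on top
```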

>> If this is the case, is this true for bytes in a cell too, or is that
>> implementation defined?
>
> Endianness is to do with byte order. Undefined.

Correct. Byte order is unspecified (generally a property of the
processor).

>> Is it also true, or implementation defined, that the data stack
>> generally grows downward?
>>
>
> Undefined.

Also correct, but downward is the most common implementation.
Traditionally, the dictionary grows up from low memory and the data
stack grows down from high memory, and the space in between is
undefined. The standards do not specify implementation, however.

Cheers,
Elizabeth

--
Elizabeth D. Rather
FORTH, Inc.
6080 Center Drive, Suite 600
Los Angeles, CA 90045
USA

HAA

Apr 21, 2017, 9:13:05 PM

Rudy Velthuis wrote:
> ...

> If this is the case, is this true for bytes in a cell too, or is that
> implementation defined?

This is where things can get messy. Let's say you need to store a
character in a variable. Usual practice is to use VARIABLE for this
because there's nothing conveniently smaller. This leads to use
of @ ! in Forth code to handle characters which is an obvious type
mismatch. Worse still if you forget and use C@ because then
your code is no longer endian portable. Ideally Forth would come
with CVARIABLE as standard to encourage use of appropriately
typed operators but it never happened.

Elizabeth D. Rather

Apr 21, 2017, 10:11:44 PM

Many (if not most) Forths include a CVARIABLE, certainly the ones that
deal with microcontrollers. There was a strong effort to include
CVARIABLE in Forth94, but it failed, mostly on the argument that the
need to save bytes was outdated (I disagreed then and still do). I
suggest these solutions:

1. If you're using a Forth that has CVARIABLE (or if you want to add
one) just use it and declare a dependence on it. Here you go:
: CVARIABLE ( -- ) CREATE 1 ALLOT
   DOES> ( -- a ) ;

2. Just use VARIABLE, and do C@ and C! and don't worry about which byte
the character is in. It will be consistently endian portable on any
particular device. After all, you never do a C! on one device hoping to
get that byte with a C@ on a different one.

You won't get in trouble unless you use a mixture of cell-wise and
byte-wise operators on the same memory location!
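
A minimal sketch of why mixing the two is the risky case, assuming a byte-addressed system where a cell is wider than a char. PROBE and LITTLE-ENDIAN? are hypothetical names used only for this illustration.

```forth
\ Sketch only: PROBE and LITTLE-ENDIAN? are hypothetical names.
\ Assumes a byte-addressed system where a cell is wider than a char.
VARIABLE PROBE

: LITTLE-ENDIAN? ( -- flag )
   1 PROBE !          \ cell-wide store of the value 1
   PROBE C@ 1 = ;     \ char fetch at the lowest address sees 1 only if the low byte comes first

\ Consistent char access is safe on either kind of machine:
65 PROBE C!
PROBE C@ .            \ prints 65
```

The flag returned by LITTLE-ENDIAN? differs between machines, which is exactly the dependence that appears once ! and C@ are mixed on one location.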

Rudy Velthuis

Apr 22, 2017, 12:42:26 AM

Ok, thanks to both of you. So the higher-order cell must be above
the lower-order cell. Good to know for sure, although I already
implemented things that way.

--
Rudy Velthuis http://www.rvelthuis.de

"Programming today is a race between software engineers striving
to build bigger and better idiot-proof programs, and the
universe trying to build bigger and better idiots. So far, the
universe is winning."
-- Rick Cook

Rudy Velthuis

Apr 22, 2017, 12:45:48 AM

HAA wrote:

> Rudy Velthuis wrote:
> > ...
> > If this is the case, is this true for bytes in a cell too, or is
> > that implementation defined?
>
> This is where things can get messy. Let's say you need to store a
> character in a variable. Usual practice is to use VARIABLE for this
> because there's nothing conveniently smaller. This leads to use
> of @ ! in Forth code to handle characters which is an obvious type
> mismatch. Worse still if you forget and use C@ because then
> your code is no longer endian portable.

I'd say it is, because C@ (and C!) deal with (how do they call them?
address units?) bytes, and in every language I know, no matter if it is
on a big- or little-endian platform, bytes are always stored at
consecutive addresses. It only gets hairy if you store a byte with ! or
fetch it with @. But you shouldn't do that anyway, in no language I
know.

--
Rudy Velthuis http://www.rvelthuis.de

"Nothing exists except atoms and empty space; everything else
is opinion."
-- Democritus

Elizabeth D. Rather

Apr 22, 2017, 3:46:02 AM

It makes sense, because often you know you have a number that doesn't
require two cells, and you can simply drop the hi

HAA

Apr 22, 2017, 10:22:04 PM

Rudy Velthuis wrote:
> HAA wrote:
>
> > Rudy Velthuis wrote:
> > > ...
> > > If this is the case, is this true for bytes in a cell too, or is
> > > that implementation defined?
> >
> > This is where things can get messy. Let's say you need to store a
> > character in a variable. Usual practice is to use VARIABLE for this
> > because there's nothing conveniently smaller. This leads to use
> > of @ ! in Forth code to handle characters which is an obvious type
> > mismatch. Worse still if you forget and use C@ because then
> > your code is no longer endian portable.
>
> I'd say it is, because C@ (and C!) deal with (how do they call them?
> address units?) bytes, and in every language I know, no matter if it is
> on a big- or little-endian platform, bytes are always stored at
> consecutive addresses. It only gets hairy if you store a byte with ! or
> fetch it with @. But you shouldn't do that anyway, in no language I
> know.

As far as Forth is concerned a character/byte on the stack *is* an
integer and using ! to store is quite legal assuming destination has
at least 1 cell of space.

Julian Fondren

Apr 23, 2017, 1:49:15 AM

Why are you saying this like it's novel, or like it refutes what it
replies to? Fetching a cell from an address containing only a char is
(probably) a type error. In Forth, you manage your types yourself.
Your errors are still errors. "You shouldn't index out of array
bounds, in no language I know." also isn't rebutted by "Forth doesn't
check for that". If you have an off-by-one error in a loop over
array contents you can still go on to display garbage to the user,
etc.

HAA

Apr 23, 2017, 4:58:41 AM

This is what was said:

> > Rudy Velthuis wrote:
> > > ...
> > > It only gets hairy if you store a byte with ! or
> > > fetch it with @. But you shouldn't do that anyway, in no language I
> > > know.

And I was refuting it - politely.

Julian Fondren

Apr 23, 2017, 5:13:24 AM

On Sunday, April 23, 2017 at 3:58:41 AM UTC-5, HAA wrote:
> This is what was said:
>
> > > Rudy Velthuis wrote:
> > > > ...
> > > > It only gets hairy if you store a byte with ! or
> > > > fetch it with @. But you shouldn't do that anyway, in no language I
> > > > know.
>
> And I was refuting it - politely.

OK. Then this is what I have to say about that:

> > Fetching a cell from an address containing only a char is
> > (probably) a type error. In Forth, you manage your types yourself.
> > Your errors are still errors. "You shouldn't index out of array
> > bounds, in no language I know." also isn't rebutted by "Forth doesn't
> > check for that". If you have an off-by-one error in a loop [over]

Andrew Haley

Apr 23, 2017, 5:45:55 AM

HAA <som...@microsoft.com> wrote:
> Rudy Velthuis wrote:
>> ...
>> If this is the case, is this true for bytes in a cell too, or is that
>> implementation defined?
>
> This is where things can get messy. Let's say you need to store a
> character in a variable. Usual practice is to use VARIABLE for this
> because there's nothing conveniently smaller. This leads to use
> of @ ! in Forth code to handle characters which is an obvious type
> mismatch.

This doesn't really make any sense to me. You can always access
individual bytes of storage, so creating a VARIABLE and accessing it
with C@ and C! is just fine.

> Worse still if you forget and use C@ because then your code is no
> longer endian portable. Ideally Forth would come with CVARIABLE as
> standard to encourage use of appropriately typed operators but it
> never happened.

I guess you'd save a tiny amount of storage, which might be relevant
on a tiny machine. To get any benefit you might have to allocate
CVARIABLEs in a block, because cell-sized variables might have to be
aligned, as might dictionary headers.
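
One way to read "allocate in a block" is sketched below. FLAGS, FLAG-A, FLAG-B and FLAG-C are hypothetical names, and the layout is only an assumption about how one might pack several char-sized variables behind a single header.

```forth
\ Sketch only: all names here are hypothetical.
CREATE FLAGS  0 C, 0 C, 0 C,  ALIGN    \ three zeroed chars, then realign data space

: FLAG-A ( -- c-addr )  FLAGS ;
: FLAG-B ( -- c-addr )  FLAGS CHAR+ ;
: FLAG-C ( -- c-addr )  FLAGS 2 CHARS + ;

1 FLAG-A C!  2 FLAG-B C!  3 FLAG-C C!
FLAG-B C@ .                            \ prints 2
```

The ALIGN after the last C, keeps later cell-sized data aligned, which is the cost that would otherwise be paid once per char-sized variable.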

Andrew.

hughag...@gmail.com

Apr 23, 2017, 4:41:27 PM

On Friday, April 21, 2017 at 7:11:44 PM UTC-7, Elizabeth D. Rather wrote:
> On 4/21/17 3:12 PM, HAA wrote:
> > Ideally Forth would come
> > with CVARIABLE as standard to encourage use of appropriately
> > typed operators but it never happened.
>
> Many (if not most) Forths include a CVARIABLE, certainly the ones that
> deal with microcontrollers. There was a strong effort to include
> CVARIABLE in Forth94, but it failed, mostly on the argument that the
> need to save bytes was outdated (I disagreed then and still do). I
> suggest these solutions:

Was this really the kind of thing that the ANS-Forth committee geniuses were debating? This is so trivial as to be meaningless.

> Here you go:
> : CVARIABLE ( -- ) CREATE 1 ALLOT
>    DOES> ( -- a ) ;

You are so bad at Forth programming that you screwed up even this super-trivial Forth code!

Here you go:

: cvariable create 0 c, ;

The 0 C, is necessary so the program does the same thing every time if the programmer forgets to store a value in the variable before fetching from it (a common bug). With your code, you are initializing the variable to whatever happened to be in memory. The program may work after some compiles, and may not work after other compiles --- this is a difficult bug to track down.

Note that this is the same design flaw that Anton Ertl made in his definition of {: --- he had the locals after the | being initialized to an unknown value --- the correct way to do this (what I did in the novice-package) is initialize them to zero.
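
A short usage sketch of the zero-initialized definition above. FLAG is a hypothetical variable name, and the last line assumes an ASCII character set.

```forth
\ Sketch only: FLAG is a hypothetical name.
: CVARIABLE ( "name" -- )  CREATE 0 C, ;   \ reserve one char, initialized to 0

CVARIABLE FLAG
FLAG C@ .          \ prints 0, even though nothing was stored yet
CHAR A FLAG C!     \ store a character with the char-sized operator
FLAG C@ EMIT       \ prints A
```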

HAA

Apr 23, 2017, 8:10:28 PM

Andrew Haley wrote:
> HAA <som...@microsoft.com> wrote:
> > Rudy Velthuis wrote:
> >> ...
> >> If this is the case, is this true for bytes in a cell too, or is that
> >> implementation defined?
> >
> > This is where things can get messy. Let's say you need to store a
> > character in a variable. Usual practice is to use VARIABLE for this
> > because there's nothing conveniently smaller. This leads to use
> > of @ ! in Forth code to handle characters which is an obvious type
> > mismatch.
>
> This doesn't really make any sense to me. You can always access
> individual bytes of storage, so creating a VARIABLE and accessing it
> with C@ and C! is just fine.

It is fine. But when a programmer unfamiliar with the code notes
storage labelled VARIABLE, he may well conclude @ and ! were
appropriate and mistakenly apply them. Endianness may bring forth
the error, or hide it. Finding the bug could prove interesting.

Elizabeth D. Rather

Apr 23, 2017, 8:40:16 PM

So long as you consistently use *either* c@/c! *or* @/! you're fine. You
only get in trouble if you've done a ! and then do a c@ and wonder what
you've got.

Cheers,
Elizabeth

HAA

Apr 23, 2017, 11:41:58 PM

Too many assumptions. You use C@ C! on a VARIABLE. I come along
later to do a mod and see VARIABLE. I naturally assume @ ! is required.
Everything still works. Nobody knows there is now a ticking time bomb
in the code.

Elizabeth D. Rather

Apr 23, 2017, 11:58:53 PM

From a team POV, you're absolutely right. I was coming from a purely
technical POV.

In today's world, it's necessary to have C@ and C! for writing to 8-bit
ports or putting a character into a string. Maybe CVARIABLE is useful
for notifying someone reading the code about the nature of the item, but
except in seriously resource-constrained microcontrollers it won't save
space, which I assume is the rationale for dropping it. I can still
remember the fight in the early 90's over declaring BASE to be a cell wide!

Julian Fondren

Apr 24, 2017, 12:18:24 AM

On Sunday, April 23, 2017 at 10:41:58 PM UTC-5, HAA wrote:
> You use C@ C! on a VARIABLE. I come along
> later to do a mod and see VARIABLE. I naturally assume @ ! is required.
> Everything still works. Nobody knows there is now a ticking time bomb
> in the code.

Q. golly gosh, how could such a bomb be avoided?

A1.

CREATE var 0 c,

A2.

1 BUFFER: var

A3.

class stuff
cvariable a
cvariable b \ actually one byte away from a
cvariable c
end-class

A4.

VARIABLE var \ only accessed with C@ and C!

A5.

VARIABLE var
: reset ( -- ) 0 var c! ;
: incr ( -- ) 1 var +! ;
: odd? ( -- f ) var c@ 1 and ;

( never again refer to var )

A6.

PACKAGE variables-what-not-to-be-touched

PRIVATE

VARIABLE var

PUBLIC

: reset ... ;
...

END-PACKAGE

A7.

int main() {
char var;
...
}

Julian Fondren

Apr 24, 2017, 12:24:19 AM

On Sunday, April 23, 2017 at 11:18:24 PM UTC-5, Julian Fondren wrote:
> VARIABLE var
> : reset ( -- ) 0 var c! ;
> : incr ( -- ) 1 var +! ;

I meant to do that.

This code has an environmental dependency on little-endianness.

The application involves incrementing a bunch of bits until they
finally disturb the 8 bits accessed by ODD?

It's a niche app. You probably haven't heard of it.
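
For contrast, a sketch of the same three words with the byte-order dependency removed by using cell-wide operators throughout. COUNTER is a hypothetical name, and ODD? returns 0 or 1 rather than a canonical flag, as in A5.

```forth
\ Sketch only: COUNTER is a hypothetical name.
VARIABLE COUNTER

: RESET ( -- )      0 COUNTER ! ;        \ cell store clears every byte of the cell
: INCR  ( -- )      1 COUNTER +! ;       \ cell-wide increment
: ODD?  ( -- 0|1 )  COUNTER @ 1 AND ;    \ cell fetch, so byte order does not matter

RESET  INCR  ODD? .    \ prints 1
INCR   ODD? .          \ prints 0
```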

hughag...@gmail.com

Apr 24, 2017, 12:42:19 AM

On Sunday, April 23, 2017 at 8:58:53 PM UTC-7, Elizabeth D. Rather wrote:
> On 4/23/17 5:41 PM, HAA wrote:
> > Elizabeth D. Rather wrote:
> >> So long as you consistently use *either* c@/c! *or* @/! you're fine. You
> >> only get in trouble if you've done a ! and then do a c@ and wonder what
> >> you've got.
> >

> > Too many assumptions. You use C@ C! on a VARIABLE. I come along
> > later to do a mod and see VARIABLE. I naturally assume @ ! is required.
> > Everything still works. Nobody knows there is now a ticking time bomb
> > in the code.
>
> From a team POV, you're absolutely right. I was coming from a purely
> technical POV.

You're coming from a "purely technical POV"??? LOL

> In today's world, it's necessary to have C@ and C! for writing to 8-bit
> ports or putting a character into a string.

Actually, in "today's world" micro-controllers don't have 8-bit ports. The MiniForth that came out in 1994/1995 didn't have 8-bit ports --- IIRC, it didn't have C@ and C! either (I may have written them using some logic to only access the lower byte; I don't remember).

Also, in Straight Forth it won't be necessary to have C@ and C! for accessing characters in strings. You give a quotation to a HOF and the HOF calls the quotation for every char, giving the char to the quotation as a parameter. So, you don't need C@. You don't need C! either because you don't write into existing strings --- you build a new string.

I will get rid of C@ and C! altogether. As HAA points out, it is error-prone. I will just have cells but not have chars.

I may have 8-bit chars in strings, but that is an internal representation that won't be exposed to the user --- the user doesn't know if the strings are of 8-bit or 32-bit or 64-bit elements --- different implementations of Straight Forth may have different representations of strings internally (ASCII, Extended-ASCII, UTF-8, UTF-16, UTF-32, etc.).

Julian Fondren

Apr 24, 2017, 12:58:22 AM

On Sunday, April 23, 2017 at 11:42:19 PM UTC-5, hughag...@gmail.com wrote:
> I may have 8-bit chars in strings, but that is an internal representation that won't be exposed to the user --- the user doesn't know

How incredibly Forth-ish.

Anton Ertl

Apr 24, 2017, 12:58:26 PM

"Rudy Velthuis" <newsg...@rvelthuis.de> writes:
>I'd say it is, because C@ (and C!) deal with (how do they call them?
>address units?) bytes,

They deal with chars. But "bytes" is good enough for a
general discussion.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2016: http://www.euroforth.org/ef16/

Anton Ertl

Apr 24, 2017, 1:04:54 PM

"Rudy Velthuis" <newsg...@rvelthuis.de> writes:
>I can't really find where it is stated, but I have the impression that
>Forth is, more or less, big-endian.
>
>For instance, I have seen code like:
>
>: U. 0 D. ;
>
>This means that, at least for that Forth, the TOS is the high order
>cell of a double word. Is this true for all Forth implementations, or
>is this just implementation defined?

It is true of standard implementations.

>If this is the case, is this true for bytes in a cell too, or is that
>implementation defined?

Implementation defined.

>Is it also true, or implementation defined, that the data stack
>generally grows downward?

Implementation defined.

Also of relevance: 2! ( x1 x2 a-addr -- ) stores x2 at a-addr and x1
at the next cell, so if you use it to store a double, you have stored
the double in big-endian order indeed; 2@ reads the items in the
corresponding order.
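
The memory order described above can be checked with a small sketch. PAIR is a hypothetical name, and the values 11 and 22 are arbitrary stand-ins for the low-order and high-order cells of a double.

```forth
\ Sketch only: PAIR is a hypothetical name.
CREATE PAIR  2 CELLS ALLOT

11 22 PAIR 2!        \ x1 = 11 (low-order cell), x2 = 22 (high-order cell, on top)
PAIR @ .             \ prints 22: x2 is stored at a-addr, the lower address
PAIR CELL+ @ .       \ prints 11: x1 is stored in the next cell
PAIR 2@ . .          \ prints 22 then 11: 2@ restores the original stack order
```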

HAA

Apr 25, 2017, 7:21:58 PM

Julian Fondren wrote:
> On Sunday, April 23, 2017 at 10:41:58 PM UTC-5, HAA wrote:
> > You use C@ C! on a VARIABLE. I come along
> > later to do a mod and see VARIABLE. I naturally assume @ ! is required.
> > Everything still works. Nobody knows there is now a ticking time bomb
> > in the code.
>
> Q. golly gosh, how could such a bomb be avoided?
>
> A1.
>
> A2.
>
> A3.
>
> A4.
>
> A5.
>
> A6.
>
> A7.
>

So you admit there's an issue resulting from a variety of responses. Good.

Julian Fondren

Apr 25, 2017, 7:56:03 PM

On Tuesday, April 25, 2017 at 6:21:58 PM UTC-5, HAA wrote:
> So you admit there's an issue resulting from a variety of responses. Good.

Ah, yes.

I didn't want to admit this, but you've cleverly sussed it out.

Well, I'll go on and state it clearly then.

Mother, please forgive me:

IT IS POSSIBLE

TO HAVE BUGS

IN SOFTWARE

Rudy Velthuis

Apr 28, 2017, 11:37:37 PM

Julian Fondren wrote:

> On Saturday, April 22, 2017 at 9:22:04 PM UTC-5, HAA wrote:
> > Rudy Velthuis wrote:
> > > HAA wrote:
> > >
> > > > Rudy Velthuis wrote:
> > > > > ...
> > > > > If this is the case, is this true for bytes in a cell too, or
> > > > > is that implementation defined?
> > > >
> > > > This is where things can get messy. Let's say you need to
> > > > store a character in a variable. Usual practice is to use
> > > > VARIABLE for this because there's nothing conveniently smaller.
> > > > This leads to use of @ ! in Forth code to handle characters
> > > > which is an obvious type mismatch. Worse still if you forget
> > > > and use C@ because then your code is no longer endian portable.
> > >
> > > I'd say it is, because C@ (and C!) deal with (how do they call
> > > them? address units?) bytes, and in every language I know, no
> > > matter if it is on a big- or little-endian platform, bytes are
> > > always stored at consecutive addresses. It only gets hairy if you
> > > store a byte with ! or fetch it with @. But you shouldn't do that
> > > anyway, in no language I know.
> >
> > As far as Forth is concerned a character/byte on the stack is an
> > integer and using ! to store is quite legal assuming destination has
> > at least 1 cell of space.
>
> Why are you saying this like it's novel, or like it refutes what it
> replies to? Fetching a cell from an address containing only a char is
> (probably) a type error.

That is what I meant.

--
Rudy Velthuis http://www.rvelthuis.de

"Properly read, the Bible is the most potent force for atheism
ever conceived." -- Isaac Asimov

Bernd Paysan

May 7, 2017, 6:55:53 PM

Am Fri, 21 Apr 2017 16:11:37 -1000 schrieb Elizabeth D. Rather:
> There was a strong effort to include CVARIABLE in Forth94, but it
> failed, mostly on the argument that the need to save bytes was outdated
> (I disagreed then and still do). I suggest these solutions:

In a hosted system, the alignment restrictions for headers make it
difficult to save any space with CVARIABLE. Neither Gforth nor VFX has
CVARIABLE, and I'm surprised that SwiftForth doesn't have it either.

Only on embedded systems with headers in ROM (or on the development
system) and the variable space in the precious tiny RAM do you really
benefit from CVARIABLE.

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
net2o ID: kQusJzA;7*?t=uy@X}1GWr!+0qqp_Cn176t4(dQ*
http://bernd-paysan.de/