20100301 Updated after discussion on groups and lists.
         Changed word names to those used by Mitch Bradley, as they
         have more precedence than those originally proposed.
20100225 Corrections to section numbering.
         Corrections to reference implementation.
         Added Josh Grams' unit tests.
         Moved L words to EXT.
20100224 Revised reference implementation.
20090923 Restored B@ and B!.
20090921 Forth200x review release.
20090902 Forth200x review update.
20090829 First writing according to the Proposals Process as
         described in the Draft 200x Standard.
Problem
=======
ANS Standard Forth lacks a way of accessing memory elements of a fixed width in a portable way. This is useful for sharing data between applications in the same or different machines.
We assume that the data is byte-oriented.
Solution
========
A new set of words is proposed, the MEMORY-ACCESS wordset, so that the desired memory size can be selected.
Typical use
===========
CREATE DATA 1 L, 2 W, 3 B,
DATA DUP L@ . 4 BYTES + DUP W@ . 2 BYTES + B@ .
Remarks
=======
1. These words form the new Memory-Access wordset.
2. The names follow the notation:
<endian> - <size> <action> <space>
where:
   endian: BE for big-endian
           LE for little-endian
   size:   B for 8 bit byte
           W for 16 bit word
           L for 32 bit long-word
           X for 64 bit extended-word
   action: !, @ or , with the usual meaning
Systems with multiple address spaces, e.g. Harvard architectures and cross-compilers often require an address space indicator:
   space:  C code space
           D data space
           R register space
           T target address space in cross-compiler
           J JTAG or debug link
Note that all operations are unsigned. Should we allow for signed operations?
3. The term 'native order' has been chosen because 'local order' might be confused with some order related to local variables. The word 'host' is also used in the Berkeley Sockets API.
4. When operating on data larger than an address unit, memory operations shall be capable of unaligned operation, e.g. when fetching a 32 bit item from a non 32-bit aligned address, the operation will succeed.
5. Systems shall not implement words requiring or returning items larger than the cell size. It only causes portability issues rather than solving them.
The rationale for these remarks is that data transfer standards exist that are big-endian, e.g. TCP/IP, and little-endian, e.g. USB. This forces us to make a clear distinction between big-endian, little-endian and native data.
For cell addressed machines which use an address unit larger than 8 bits, it is assumed that the upper part of a cell is simply ignored. This proposal makes no attempt to deal with packing of bytes for memory efficiency. Providing that other operations such as TYPE produce the expected result, the implementation may deal with packing of data as it sees fit. However, the model of one byte per cell will always work with least implementation complexity.
Proposal
========
18. The optional Memory-Access word set
18.1 Introduction
-----------------
All memory access operations in this wordset are defined on one or more successive address-aligned bytes. In MEMORY-ACCESS, wherever it says "store" and "fetch", the following applies:
(1) It is assumed that bytes do not require address alignment.
(2) For address units larger than 8 bits, each address unit contains one byte stored in the eight least significant bits. Additional bits shall be ignored by the fetch operations, and should be set to zero by the store operations.
(3) For address units smaller than 8 bits, end orientation of a byte stored in successive address units is implementation dependent.
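Rules (1) and (2) can be modelled on an ordinary system by pretending that one cell is one address unit. The following is only a sketch under that assumption; these definitions of BYTES, B! and B@ are illustrative and are not taken from the reference implementation.

```forth
\ Sketch: model a machine whose address unit is wider than 8 bits by
\ giving every byte a whole cell (an assumption made for illustration).
: BYTES ( n1 -- n2 )  CELLS ;                 \ n bytes occupy n "address units"
: B!    ( x addr -- ) SWAP $FF AND SWAP ! ;   \ store: upper bits set to zero
: B@    ( addr -- x ) @ $FF AND ;             \ fetch: upper bits ignored

VARIABLE UNIT
$1FF UNIT B!   UNIT @ .     \ only the 8 LSBs of $1FF were stored
-1 UNIT !      UNIT B@ .    \ stray upper bits are masked on fetch
```

On a byte-addressed system the same three words would more plausibly collapse to a no-op, C! and C@.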
The words in this wordlist generally take the form:
<endian> - <size> <action> <space>
where:
   endian: BE for big-endian
           LE for little-endian
   size:   B for 8 bit byte
           W for 16 bit word
           L for 32 bit long-word
           X for 64 bit extended-word
   action: !, @ or , with the usual meaning
Systems with multiple address spaces, e.g. Harvard architectures and cross-compilers often require an address space indicator:
   space:  C code space
           D data space
           R register space
           T target address space in cross-compiler
           J JTAG or debug link
18.2 Additional terms
---------------------
big-endian: The most significant byte of a multi-byte value is stored at the lowest memory address. Also known as network order.
little-endian: The least significant byte of a multi-byte value is stored at the lowest memory address.
native-order: The byte ordering for multi-byte values which suits the system architecture. (See big-endian and little-endian.)
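The two byte orders defined above can be illustrated by building the 16 bit fetches from single-byte fetches. This is a minimal sketch, assuming a byte-addressed system with 8 bit characters so that the proposed B@ can be modelled with C@; SAMPLE is a hypothetical name used only for the demonstration.

```forth
\ Assumption: byte-addressed system, 1 BYTES = 1 address unit.
: B@ ( addr -- x ) C@ ;

: BE-W@ ( addr -- x )   \ most significant byte at the lowest address
  DUP B@ 8 LSHIFT  SWAP 1+ B@  OR ;

: LE-W@ ( addr -- x )   \ least significant byte at the lowest address
  DUP B@  SWAP 1+ B@ 8 LSHIFT  OR ;

CREATE SAMPLE  $12 C,  $34 C,
SAMPLE BE-W@ .   \ the two bytes read as $1234
SAMPLE LE-W@ .   \ the same two bytes read as $3412
```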
String              Value data type  Constant?  Meaning
MEMORY-ACCESS       flag             no         Memory-access word set present
MEMORY-ACCESS-EXT   flag             no         Memory-access extensions word set present
18.5 Compliance and labeling
----------------------------
18.5.1 ANS Forth systems
------------------------
The phrase "Providing the Memory Access word set" shall be appended to the label of any Standard System that provides all of the Memory Access word set.
The phrase "Providing name(s) from the Memory Access Extensions word set" shall be appended to the label of any Standard System that provides portions of the Memory Access Extensions word set.
The phrase "Providing the Memory Access Extensions word set" shall be appended to the label of any Standard System that provides all of the Memory-Access and Memory Access Extensions word sets.
18.5.2 ANS Forth programs
-------------------------
The phrase "Requiring the Memory Access word set" shall be appended to the label of Standard Programs that require the system to provide the Memory Access word set.
The phrase "Requiring name(s) from the Memory Access Extensions word set" shall be appended to the label of Standard Programs that require the system to provide portions of the Memory Access Extensions word set.
The phrase "Requiring the Memory Access Extensions word set" shall be appended to the label of Standard Programs that require the system to provide all of the Memory Access and Memory Access Extensions word sets.
18.6 Glossary
-------------
18.6.1 Memory-Access words
--------------------------
18.6.1.aaaa B! "b-store" MEMORY-ACCESS
   ( x addr -- )
   Store the 8 LSBs of x at addr. In systems where the address unit is larger than 8 bits, the upper bits are set to zero.

18.6.1.aaab B@ "b-fetch" MEMORY-ACCESS
   ( addr -- x )
   Fetch the 8 LSBs of x from addr. If the cell size is greater than 8 bits, the result is zero-extended.

18.6.1.aaac BE-W! "b-e-w-store" MEMORY-ACCESS
   ( x addr -- )
   Store the 16 LSBs of x at addr in big-endian format irrespective of alignment. In systems where the address unit is larger than 16 bits, the upper bits are set to zero.

18.6.1.aaad BE-W, "b-e-w-comma" MEMORY-ACCESS
   ( x -- )
   Reserve 16 bits of data space and store the 16 LSBs of x in them in big-endian format irrespective of alignment.

18.6.1.aaae BE-W@ "b-e-w-fetch" MEMORY-ACCESS
   ( addr -- x )
   Fetch the 16 LSBs of x from addr in big-endian format irrespective of alignment. If the cell size is greater than 16 bits, the result is zero-extended.

18.6.1.aaag LE-W! "l-e-w-store" MEMORY-ACCESS
   ( x addr -- )
   Store the 16 LSBs of x at addr in little-endian format irrespective of alignment. In systems where the address unit is larger than 16 bits, the upper bits are set to zero.

18.6.1.aaah LE-W, "l-e-w-comma" MEMORY-ACCESS
   ( x -- )
   Reserve 16 bits of data space and store the 16 LSBs of x in them in little-endian format irrespective of alignment.

18.6.1.aaai LE-W@ "l-e-w-fetch" MEMORY-ACCESS
   ( addr -- x )
   Fetch the 16 LSBs of x from addr in little-endian format irrespective of alignment. If the cell size is greater than 16 bits, the result is zero-extended.

18.6.1.aaaj W! "w-store" MEMORY-ACCESS
   ( x addr -- )
   Store the 16 LSBs of x at addr in native order irrespective of alignment. In systems where the address unit is larger than 16 bits, the upper bits are set to zero.

18.6.1.aaak W, "w-comma" MEMORY-ACCESS
   ( x -- )
   Reserve 16 bits of data space and store the 16 LSBs of x in them in native order irrespective of alignment.

18.6.1.aaal W@ "w-fetch" MEMORY-ACCESS
   ( addr -- x )
   Fetch the 16 LSBs of x from addr in native order irrespective of alignment. If the cell size is greater than 16 bits, the result is zero-extended.
On Fri, 12 Mar 2010 00:57:36 -0000, Peter Knaggs <p...@bcs.org.uk> wrote:
> 2. The names follow the notation:
> <endian> - <size> <action> <space>
> where:
>    endian: BE for big-endian
>            LE for little-endian
>    size:   B for 8 bit byte
>            W for 16 bit word
>            L for 32 bit long-word
>            X for 64 bit extended-word
>    action: !, @ or , with the usual meaning

> Systems with multiple address spaces, e.g. Harvard architectures
> and cross-compilers often require an address space indicator:

>    space:  C code space
>            D data space
>            R register space
>            T target address space in cross-compiler
>            J JTAG or debug link

> Note that all operations are unsigned.
> Should we allow for signed operations?
Although there was a reference to signed numbers in the first version, this has been removed from the revised version as there are no signed words in the proposal.
As Bruce McFarling has identified, this would not be so easy. We could define a set of sign extension words (B>S, W>S and L>S) but they would only extend the host's native sign processing. As most of the time people will want to move between the native sign format and two's complement, maybe B>S and friends should be defined to convert from 2c to native.
I don't like this idea, and would rather have two simple words to convert a native format cell to/from a 2c value. On the other hand, how often are negative values represented in this type of data?
> Store the 16 LSBs of x at addr in native order irrespective of
> alignment. In systems where the address unit is larger than 16 bits,
> the upper bits are set to zero.

> 18.6.1.aaak W, "w-comma" MEMORY-ACCESS
>    ( x -- )

> Reserve 16 bits of data space and store the 16 LSBs of x in them in
> native order irrespective of alignment.

> Fetch the 16 LSBs of x from addr in native order irrespective of
> alignment. If the cell size is greater than 16 bits, the result is
> zero-extended.

> Store the 32 LSBs of x at addr in native order irrespective of
> alignment. In systems where the address unit is larger than 32 bits,
> the upper bits are set to zero.

> Fetch the 32 LSBs of x from addr in native order irrespective of
> alignment. If the cell size is greater than 32 bits, the result is
> zero-extended.
WALIGN, WALIGNED, LALIGN and LALIGNED are defined, yet they are not actually required, as the only words that would use them (W! W, W@ L! L, and L@) include the phrase "irrespective of alignment" in their definition.
Should we thus:
(a) remove the xALIGN xALIGNED words or
(b) remove the "irrespective of alignment" in the native order access words.
I prefer option (b) as that would allow a system to use the optimum memory access method.
On Fri, 12 Mar 2010 00:57:36 -0000, Peter Knaggs <p...@bcs.org.uk> wrote:
> 18.6.1.aaad BE-W, "b-e-w-comma" MEMORY-ACCESS
>    ( x -- )

> Reserve 16 bits of data space and store the 16 LSBs of x in them in
> big-endian format irrespective of alignment.

> 18.6.1.aaah LE-W, "l-e-w-comma" MEMORY-ACCESS
>    ( x -- )

> Reserve 16 bits of data space and store the 16 LSBs of x in them in
> little-endian format irrespective of alignment.

> 18.6.2.aaab BE-L, "b-e-l-comma" MEMORY-ACCESS EXT
>    ( x -- )

> Reserve 32 bits of data space and store the 32 LSBs of x in them in
> big-endian format irrespective of alignment.

> Reserve 32 bits of data space and store the 32 LSBs of x in them in
> little-endian format irrespective of alignment.
Are these words that commonly used, or are they really only included just to complete the set? I can not see any occasion when I would actually use them.
On Mar 11, 8:28 pm, "Peter Knaggs" <p...@bcs.org.uk> wrote:
> Are these words that commonly used, or are they really only included
> just to complete the set? I can not see any occasion when I would
> actually use them.
Even if creating, eg, a pre-defined data packet ... BYTES ALLOT and MOVE suffice *provided* there are no alignment restrictions.
Indeed:
   ( int16 ) HERE 2 BYTES ALLOT W!
... also works in a pinch on the rare occasion that it might come up.
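The HERE ... ALLOT phrase above can be wrapped into a comma word. The following is a sketch only, assuming a byte-addressed system where BYTES is a no-op and B! is C!; a little-endian store is used so that the result does not depend on the host's native order. TEMPLATE is a hypothetical name.

```forth
\ Assumptions: byte-addressed system, 8 bit characters.
: BYTES ( n1 -- n2 ) ;
: B!    ( x addr -- ) C! ;

: LE-W! ( x addr -- )
  OVER $FF AND OVER B!                  \ low byte at addr
  SWAP 8 RSHIFT $FF AND SWAP 1+ B! ;    \ high byte at addr+1

: LE-W, ( x -- )  HERE 2 BYTES ALLOT LE-W! ;   \ reserve, then store

CREATE TEMPLATE  $1234 LE-W,
```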
On Mar 11, 8:23 pm, "Peter Knaggs" <p...@bcs.org.uk> wrote:
> Should we thus:
> (a) remove the xALIGN xALIGNED words or
> (b) remove the "irrespective of alignment" in the native order access
> words.

> I prefer option (b) as that would allow a system to use the optimum
> memory access method.
But this wordset is most importantly for externally defined binary data, and a sequence of externally defined data *might not be defined in such a way that it will be aligned for maximum local efficiency*. Indeed, some data layouts will have been arrived at in contexts and at a time where hardware that benefited from alignment above the 16-bit level was not a concern.
Under (a), any sequence of 8bit bytes can be interpreted as any sequence of defined-width integers, as originally specified, irrespective of the native alignment restrictions.
If this is inefficient for tight inner loops, then given the memory access words proposed here, and knowledge of the local implementation cell size, it's straightforward to copy the data into locally-aligned data structures for processing and then use the memory access words to copy them back. And, of course, that structure can be designed with the specific data specification at hand, rather than trying to solve the problem in general, which sometimes ends up swatting a fly with a sledgehammer.
So I'd prefer to keep the cans of worms closed, and opt for (a). This also has the advantage of further reducing the namespace hit of the proposal.
On Fri, 12 Mar 2010 01:28:56 -0000, "Peter Knaggs" <p...@bcs.org.uk> wrote:
>On Fri, 12 Mar 2010 00:57:36 -0000, Peter Knaggs <p...@bcs.org.uk> wrote:
>> 18.6.1.aaad BE-W, "b-e-w-comma" MEMORY-ACCESS
>> 18.6.1.aaah LE-W, "l-e-w-comma" MEMORY-ACCESS
>> 18.6.2.aaab BE-L, "b-e-l-comma" MEMORY-ACCESS EXT
>> 18.6.2.aaaj LE-L, "l-e-l-comma" MEMORY-ACCESS EXT

>Are these words that commonly used, or are they really only included
>just to complete the set? I can not see any occasion when I would
>actually use them.
Try writing a USB descriptor or TCP/IP template without them.
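As an illustration of the point, here is a sketch of an 18 byte USB device descriptor template laid down with comma words. B, and LE-W, are modelled on a byte-addressed system (an assumption), the vendor and product IDs are placeholders, and DEV-DESC and /DEV-DESC are hypothetical names.

```forth
\ Assumption: byte-addressed system, so B, can be modelled with C, .
: B,    ( x -- ) C, ;
: LE-W, ( x -- ) DUP $FF AND C,  8 RSHIFT $FF AND C, ;

CREATE DEV-DESC
  18 B,              \ bLength
  1 B,               \ bDescriptorType = DEVICE
  $0200 LE-W,        \ bcdUSB 2.00
  0 B,  0 B,  0 B,   \ bDeviceClass, bDeviceSubClass, bDeviceProtocol
  64 B,              \ bMaxPacketSize0
  $1234 LE-W,        \ idVendor  (placeholder)
  $5678 LE-W,        \ idProduct (placeholder)
  $0100 LE-W,        \ bcdDevice
  1 B,  2 B,  3 B,   \ iManufacturer, iProduct, iSerialNumber
  1 B,               \ bNumConfigurations
HERE DEV-DESC - CONSTANT /DEV-DESC   \ size of the template in bytes
```

Without the comma words, each 16 bit field would have to be split by hand into two B, calls in the right order.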
Stephen
--
Stephen Pelc, stephen...@mpeforth.com
MicroProcessor Engineering Ltd - More Real, Less Time
133 Hill Lane, Southampton SO15 5AF, England
tel: +44 (0)23 8063 1441, fax: +44 (0)23 8033 9691
web: http://www.mpeforth.com - free VFX Forth downloads

"Peter Knaggs" <p...@bcs.org.uk> writes:
>As Bruce McFarling has identified this would not be so easy. We could
>define a set of sign extension words (B>S, W>S and L>S) but they would
>only extend the hosts native sign processing. As most of the time people
>will be wanting to move to/from the native sign and two's complement then
>maybe B>S and friends should be defined to convert from 2c to native.
I think that it is a bad idea to fetch some funny format into a stack cell, if the only thing one can do with it is to convert it to a proper format later. It's better to do the conversion on fetching (or, in the other direction, on storing). Yes, if we want big-endian, little-endian, and native byte order, then this means we have to introduce more words. I think that this is a small price to pay for avoiding the confusion that comes from introducing several new on-stack number formats.
As an additional benefit, it is also easier to implement such words efficiently, because most architectures have sign-extending fetching words (e.g., movsx on IA-32/AMD64).
>I don't like this idea, and would rather two simple words to convert
>a native format cell to/from a 2c value.

It's unclear to me what you propose here. Do you want sign-extending 2s-complement loading words, plus on-stack conversion words for converting from 2s-complement to the native format? That would combine the disadvantages of the on-stack conversion with those of conversion on fetch/store; the only saving grace is that, because all implementations will have 2s-complement as native format, the additional words will be noops, so, like CHARS, they will not be used.

>On the other hand, how often
>are negative values represented in this type of data.
What is "this type of data"? Stuff coming from files or over the net? That probably depends on the application. When writing a disassembler, I need sign-extension pretty often.
> 20100301 Updated after discussion on groups and lists,
>          Changed word names to those use by Mitch Bradly as they
>          have more precedence than those originally proposed.

> 2. The names follow the notation:
> <endian> - <size> <action> <space>
So is OpenFirmware going to change, then? Last I knew they used the <size><action><endian> form (e.g. W@BE).
> WALIGN, WALIGNED, LALIGN and LALIGNED are defined yet the are not
> actually required as the only words that would use them (W! W, W@ L!
> L, and L@) include the phrase "irrespective of alignment" in their
> definition.

> Should we thus:
> (a) remove the xALIGN xALIGNED words or
> (b) remove the "irrespective of alignment" in the native order access
> words.

> I prefer option (b) as that would allow a system to use the optimum
> memory access method.
Ditto. There are certainly protocols where you need misaligned access, but the obvious definitions using BYTES, B@, and B! should (I think) work everywhere this proposal is implemented, so I don't see that as a big problem.
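One reading of "the obvious definitions" is a fetch assembled one byte at a time, which never touches more than a single address unit per access and so works at any address. This is a sketch under the assumptions that cells are at least 32 bits wide and that B@ behaves like C@; PKT is a hypothetical buffer.

```forth
\ Assumptions: cell size >= 32 bits, byte-addressed system.
: B@ ( addr -- x ) C@ ;

: BE-L@ ( addr -- x )
  0 SWAP                       \ accumulator under the address
  4 0 DO
    DUP B@  ROT 8 LSHIFT OR    \ shift accumulator, merge next byte
    SWAP 1+                    \ advance to the next byte
  LOOP DROP ;

CREATE PKT  0 C,  $DE C,  $AD C,  $BE C,  $EF C,
PKT 1+ BE-L@ HEX . DECIMAL    \ 32 bit fetch from an odd address
```

The cost is four single-byte accesses per fetch, which is the trade-off the following replies argue about.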
On Fri, 12 Mar 2010 13:35:56 GMT, Josh Grams <j...@qualdan.com> wrote:
>Ditto. There are certainly protocols where you need misaligned access,
>but the obvious definitions using BYTES, B@, and B! should (I think)
>work everywhere this proposal is implemented, so I don't see that as a
>big problem.
Yes it will work but application programmers will hate it. Having unaligned operation solves all sorts of problems when dealing with some of the really horrid hardware and buffering schemes in existence.
There are some parts of TCP that are hard to do if BE-L@/! do not work at least on 16 bit aligned data rather than 32 bit aligned. For little-endian there's always USB.
Stephen
On Mar 12, 4:53 am, stephen...@mpeforth.com (Stephen Pelc) wrote:
> Try writing a USB descriptor or TCP/IP template without them.
Aha. Without the commas present, I'd likely write it with WFIELD: etc. for a structure and use the structure items and LE-W@ BE-W@ etc. to write the template.
But I can see how one could usefully clone a pre-defined template and only fill in the fields to be modified with the structure items.
On Mar 11, 8:23 pm, "Peter Knaggs" <p...@bcs.org.uk> wrote:
> WALIGN, WALIGNED, LALIGN and LALIGNED are defined yet the are not
> actually required as the only words that would use them (W! W, W@ L!
> L, and L@) include the phrase "irrespective of alignment" in their
> definition.
Mitch Bradley in the 200x email list has pointed out information that means this premise is false.
That is, the premise is that the sole producer/consumer of the data is implementation memory access. But for, eg, network data packets, another producer/consumer of the data will be DMA hardware, which may require alignment of their base on appropriate boundaries.
> Should we thus:
> (a) remove the xALIGN xALIGNED words or
No, since they are required to set data up for some operations involving a primary target of this wordset.
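For setting up a buffer whose base must sit on a given boundary, alignment words along the following lines would do. This is a sketch assuming natural alignment (2 and 4 bytes) and that 1 BYTES is one address unit; the proposal's own definitions may differ. PKT-AREA and DMA-BASE are hypothetical names.

```forth
\ Assumptions: natural alignment, 1 BYTES = 1 address unit.
: WALIGNED ( addr -- addr' )  1 +  1 INVERT AND ;   \ round up to a multiple of 2
: LALIGNED ( addr -- addr' )  3 +  3 INVERT AND ;   \ round up to a multiple of 4
: WALIGN   ( -- )  HERE DUP WALIGNED SWAP - ALLOT ;
: LALIGN   ( -- )  HERE DUP LALIGNED SWAP - ALLOT ;

\ Over-allocate by 3 bytes, then pick a 32 bit aligned base inside the area.
CREATE PKT-AREA  64 3 + ALLOT
PKT-AREA LALIGNED CONSTANT DMA-BASE
```

The items inside the buffer can still be read and written with the unaligned access words; only the base handed to the hardware needs the alignment.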
> (b) remove the "irrespective of alignment" in the native order access
> words.
No, since working with data organized according to some arbitrary externally defined specification does not permit assuming that individual items in a block of data will be aligned for greatest access speed on the hardware of the system.
> I prefer option (b) as that would allow a system to use the optimum
> memory access method.
A two lane each way wide open expressway has an overpass that runs over a dirt and gravel road. Which right of way is optimal? Answer: the one that connects to your destination.
The optimum memory access method is the one that allows *the task to be performed*. @ and ! and kindred are the "implementation can align for best speed if desired" words.
W@ and W! and kindred are the "greatest flexibility in coping with arbitrary data layouts" words. Imposing alignment restrictions *on* them breaks their functionality.
> > 20100301 Updated after discussion on groups and lists,
> >          Changed word names to those use by Mitch Bradly as they
> >          have more precedence than those originally proposed.

> > 2. The names follow the notation:
> >    <endian> - <size> <action> <space>

> So is OpenFirmware going to change, then? Last I knew they used the
> <size><action><endian> form (e.g. W@BE).
OpenFirmware doesn't have them, though an OpenFirmware implementation may of course provide them. From the email list, Mitch Bradley's implementation provides the above:
"My OFW implementation includes the following words in the global Forth dictionary: le-l@, le-l!, le-w@, le-w!, be-l@, be-l!, be-w@, and be-w!."
> I think that it is a bad idea to fetch some funny format into a stack
> cell, if the only thing one can do with it is to convert it to a
> proper format later. It's better to do the conversion on fetching (or,
> in the other direction, on storing). Yes, if we want big-endian,
> little-endian, and native byte order, then this means we have to
> introduce more words. I think that this is a small price to pay for
> avoiding the confusion that comes from introducing several new
> on-stack number formats.
If it is unsigned it will be in the correct format, the proposed words would convert from the unsigned format to the native signed format.
> As an additional benefit, it is also easier to implement such words
> efficiently, because most architectures have sign-extending fetching
> words (e.g., movsx on IA-32/AMD64).
True.
>> I don't like this idea, and would rather two simple words to convert
>> a native format cell to/from a 2c value.
> It's unclear to me what you propose here.
I am not actually making a proposal at the moment. I wanted to provoke a discussion on this to see how the group felt on the matter.
> On Fri, 12 Mar 2010 00:57:36 -0000, Peter Knaggs <p...@bcs.org.uk> wrote:
> > 2. The names follow the notation:
> > <endian> - <size> <action> <space>
> > where:
> >    endian: BE for big-endian
> >            LE for little-endian
> >    size:   B for 8 bit byte
> >            W for 16 bit word
> >            L for 32 bit long-word
> >            X for 64 bit extended-word
> >    action: !, @ or , with the usual meaning

> > Systems with multiple address spaces, e.g. Harvard architectures
> > and cross-compilers often require an address space indicator:

> >    space:  C code space
> >            D data space
> >            R register space
> >            T target address space in cross-compiler
> >            J JTAG or debug link

> > Note that all operations are unsigned.
> > Should we allow for signed operations?

> Although there was reference to signed numbers in the first version,
> this has been removed from revised version as there are no signed words
> in the proposal.

> As Bruce McFarling has identified this would not be so easy. We could
> define a set of sign extension words (B>S, W>S and L>S) but they would
> only extend the hosts native sign processing.
Yes, except since "S" is overloaded as an abbreviation for both "sign" and "string" I prefer:
What they would do is to sign extend from the external data size to the native cell (and conversely). The fact that the specification needs not be concerned with native cell size or numeric representation is just a side effect of providing the main functionality as compactly as possible.
The fact that an exotic numeric representation of non-exotic cell size would get standard names to use for what would likely be already existing two's>native and native>two's conversions is also a side-effect.
The extension of that proposal might also be used to reserve the meaning of the signed equivalents of the Memory Access words.
> As most of the time people will be wanting to move to/from the native sign and two's complement then maybe B>S and friends should be defined to convert from 2c to native.
Yes, as externally defined binary data formats relying on numeric representations other than two's complement are nearly as rare as Forth systems that do, standardizing conversions from external sign-complement, one's complement etc. of defined size to the native cell seems to not be worthwhile.
> I don't like this idea, and would rather two simple words to convert
> a native format cell to/from a 2c value. On the other hand, how often
> are negative values represented in this type of data.
That's what these do. Unlike unsigned, there is no such thing as "a" 2c signed value ... you need to know width to know when you wrap around from Max-Positive to Max-Negative.
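The dependence on width can be made concrete with a pair of conversion words. This is a sketch only: the names B>S and W>S are borrowed from earlier in the thread rather than from any agreed proposal, and the input is assumed to be the zero-extended value that B@ or W@ would leave on the stack.

```forth
\ Convert a zero-extended two's complement field of known width into a
\ native signed cell. Flip the field's sign bit, then subtract its weight.
: B>S ( u -- n )  $80   XOR  $80   - ;    \ 8 bit field
: W>S ( u -- n )  $8000 XOR  $8000 - ;    \ 16 bit field

$FFFF W>S .    \ all ones in a 16 bit field is minus one
$FF   B>S .    \ all ones in an 8 bit field is also minus one
$FF   W>S .    \ the same bits read as a 16 bit field stay positive
```

The last two lines show the point made above: the same bit pattern converts differently depending on the width it is taken to have.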
"Peter Knaggs" <p...@bcs.org.uk> writes: >On Fri, 12 Mar 2010 10:49:03 -0000, Anton Ertl ><an...@mips.complang.tuwien.ac.at> wrote:
>> I think that it is a bad idea to fetch some funny format into a stack
>> cell, if the only thing one can do with it is to convert it to a
>> proper format later. It's better to do the conversion on fetching (or,
>> in the other direction, on storing). Yes, if we want big-endian,
>> little-endian, and native byte order, then this means we have to
>> introduce more words. I think that this is a small price to pay for
>> avoiding the confusion that comes from introducing several new
>> on-stack number formats.

>If it is unsigned it will be in the correct format, the proposed words
>would convert from the unsigned format to the native signed format.
These are signed non-native numbers that have been zero-extended when fetching them onto the stack. They have no meaning (at least as far as the application is concerned) as unsigned numbers, not more than e.g., their meaning as bit patterns, floating-point numbers or addresses. Given that, I think your sentence makes no sense, especially not the bit about the "correct format".
>I am not actually making a proposal at the moment. I wanted to provoke
>a discussion on this to see how the group felt on the matter.
My feeling is that we should have signed and unsigned variants of these words, and that the syntax should be as usual: no prefix for signed, U for unsigned.
Gforth does not get it quite right: It has SW@ UW@ W! SL@ UL@ L!, and an undocumented W@ and L@ (for backwards compatibility) that are aliases for UW@ and UL@.
Note that if we assume 2s-complement arithmetic (which is sensible IMO) and if a cell is at least as wide as the target memory location, we don't need to differentiate between signed and unsigned stores, that's why there is only W! and L!.
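A small demonstration of why one store word suffices, assuming 2s-complement arithmetic and a byte-addressed system. The LE-W! and LE-W@ definitions here are illustrative sketches, and SLOT is a hypothetical name.

```forth
\ Assumptions: 2s-complement cells, byte-addressed system.
: B@ ( addr -- x ) C@ ;
: B! ( x addr -- ) C! ;
: LE-W! ( x addr -- )
  OVER $FF AND OVER B!
  SWAP 8 RSHIFT $FF AND SWAP 1+ B! ;
: LE-W@ ( addr -- u )  DUP B@  SWAP 1+ B@ 8 LSHIFT OR ;

CREATE SLOT 2 ALLOT
-2    SLOT LE-W!  SLOT LE-W@ HEX . DECIMAL   \ signed value stored
$FFFE SLOT LE-W!  SLOT LE-W@ HEX . DECIMAL   \ unsigned value, same bytes
```

Both stores leave identical bytes in memory, so only the fetch side needs separate signed and unsigned variants.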
"Peter Knaggs" <p...@bcs.org.uk> writes: >2. The names follow the notation:
> <endian> - <size> <action> <space>
<space> is often used for the BL character in syntax descriptions. Better call it <address space>.
> where:
>    endian: BE for big-endian
>            LE for little-endian
The noun for "endian" is "byte order". Are you missing a line for "native"?
> Systems with multiple address spaces, e.g. Harvard architectures
> and cross-compilers often require an address space indicator:

>    space:  C code space
>            D data space
>            R register space
>            T target address space in cross-compiler
>            J JTAG or debug link
I don't see any address spaces in the proposed words. I assume that they all refer to data space (nothing else is standardised). Maybe you should just leave this complication away. Given that they are not defined anywhere I would not know how to implement such spaces.
> Note that all operations are unsigned.
> Should we allow for signed operations?
Yes!
I.e., have
<byte order>-<unsigned?><size><action>
where
   unsigned?: (empty) for signed
              or alternatively S for signed (if there is a conflict with "")
              U for unsigned
>5. Systems shall not implement words requiring or returning items
>   larger than the cell size. It only causes portability issues
>   rather than solving them.
Then you have to divide it into several extensions (for voting and extension queries).
Also, I really don't see Gforth implementing X@ only on 64-bit platforms. Either everywhere or nowhere.
You could also have variations of the L and X words that deal with doubles on the stack.
>For cell addressed machines which use an address unit larger than
>8 bits, it is assumed that the upper part of a cell is simply
>ignored. This proposal makes no attempt to deal with packing of
>bytes for memory efficiency.
Good.
However, I see a difference between native accesses and the others: I expect "native" data to come from other subsystems on the same machine instead of from files or over the network, and that data will not be unpacked. I am not sure if the native non-cell access words are useful on cell-addressed machines at all. But if they are, it's most likely quite different from the BE and LE variants. Maybe the native ones should also have a separate extension from the other stuff.
>Providing that other operations such
>as TYPE produce the expected result, the implementation may deal
>with packing of data as it sees fit. However, the model of one
>byte per cell will always work with least implementation
>complexity.
I don't see how TYPE can work as expected with, e.g., /STRING unless each character has its own address. So I think the first sentence might be misleading, and the second sentence could be formulated more strongly.
Another reason why this might be misleading is that TYPE works with chars, and this proposal deals with bytes, so on the surface they have nothing to do with each other. And on a cell-addressed machine there may actually be a real difference between bytes and characters.
>All memory access operations in this wordset are defined on one or
>more successive address-aligned bytes. In MEMORY-ACCESS, wherever
>it say "store" and "fetch", the following applies:

>(1) It is assumed that bytes do not require address alignment.

>(2) For address units larger than 8 bits, each address unit
>    contains one byte stored in the eight least significant bits.
>    Additional bits shall be ignored by the fetch operations, and
>    should be set to zero by the store operations.
Good.
>(3) For address units smaller than 8 bits, end orientation of a
>    byte stored in successive address units is implementation
>    dependent.
Yes. Does this play a role? Do we have wording for the memory order of cell parts or dfloat parts in the present standard?
>18.2 Additional terms
>---------------------

>big-endian:
>   The most significant byte of a multi-byte value is stored at the
>   lowest memory address. Also known as network order.

>little-endian:
>   The lest significant byte of a multi-byte value is stored at the
>   lowest memory address.
s/lest/least/
>native-order:
>   The byte ordering for multi-byte values which suites the system
>   architecture. (See big-endian and little-endian.)
That will be interesting for cell-addressed machines:-).
>String              Value data type  Constant?  Meaning
>MEMORY-ACCESS       flag             no         Memory-access word set present
>MEMORY-ACCESS-EXT   flag             no         Memory-access extensions word
>                                                set present
Given the reactions to the "wordset queries" RfD, which show that wordset queries are a feature that is hardly used, and also not universally implemented, my present plan is to leave wordset queries as a Forth-94 feature that will be supported in Forth200x only as Forth-94 compatibility feature (i.e., you can query for the Forth-94 versions of the wordsets, not for the Forth200x versions), and to make it a deprecated feature in the next standard and remove it in some later standard.
So I don't think we should introduce wordset queries for any new wordsets.
>18.6 Glossary
>-------------

>18.6.1 Memory-Access words
>--------------------------

>18.6.1.aaaa B! "b-store" MEMORY-ACCESS
>   ( x addr -- )

>   Store the 8 LSBs of x at addr. In system where the address unit is
>   larger than 8 bits, the upper bits are set to zero.
IMO repeating the second sentence everywhere is just bloat. You spelled it out earlier.
>18.6.1.aaab B@ "b-fetch" MEMORY-ACCESS
> ( addr -- x )
> Fetch the 8 LSBs of x from addr. If the cell size is greater than 8 bits, the result is zero-extended.
The cell size is guaranteed to be greater than 8 bits.
>18.6.1.aaac BE-W! "b-e-w-store" MEMORY-ACCESS
> ( x addr -- )
> Store the 16 LSBs of x at addr in big-endian format irrespective of alignment. In systems where the address unit is larger than 16 bits, the upper bits are set to zero.
This should be: Store the 8 LSBs zero-extended at addr+1 (or, if you want to cater for 1 BYTES>1, at addr BYTE+), and the next 8 bits zero-extended at addr.
Likewise for all the other words dealing with multiple bytes, except maybe the native order words.
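The byte-wise description above can be sketched directly in terms of C! and C@. This is only an illustration, assuming a byte-addressed system with 8-bit characters (so addr+1 is addr CHAR+); the BYTES and BYTE+ words of the proposal are deliberately not used.

```forth
\ Sketch only: assumes 8-bit characters, 1 CHARS = 1 address unit.
: BE-W! ( x addr -- )
   OVER 8 RSHIFT $FF AND OVER C!   \ bits 8..15 go to addr
   SWAP $FF AND SWAP CHAR+ C! ;    \ bits 0..7 go to addr+1
: BE-W@ ( addr -- u )
   DUP C@ 8 LSHIFT                 \ byte at addr is the high half
   SWAP CHAR+ C@ OR ;              \ byte at addr+1 is the low half
```

Since each half is moved with a character access, the definitions work at any address, which is what "irrespective of alignment" asks for.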
>18.6.1.aaad BE-W, "b-e-w-comma" MEMORY-ACCESS
> ( x -- )
> Reserve 16 bits of data space and store the 16 LSBs of x in them in big-endian format irrespective of alignment.
No. Reserve 2 BYTES (as in the proposed word BYTES) of data space ...
> If the data-space pointer is not 16-bit aligned, reserve enough space to align it.
I assume that natural alignment to 2 bytes is meant here, irrespective of whether the hardware benefits from this or not. That should be spelled out (and in the other alignment words).
I miss the X words. Maybe this is just as well. Or maybe we should have the X words in the extension wordset, with a double as on-stack representation.
>Should a program ever need to detect the native order of the system, it can do so by using the following code:
> $1234 PAD ! PAD B@ $34 =
>This is true when the program is running on a little-endian system and false otherwise.
It will also return true on a cell-addressed machine. Of course there is no native byte order on a cell-addressed machine.
>Testing
>=======
Yes! Tests! Very good!
Overall I think this proposal is turning out pretty well.
"Peter Knaggs" <p...@bcs.org.uk> writes:
>WALIGN, WALIGNED, LALIGN and LALIGNED are defined yet they are not actually required as the only words that would use them (W! W, W@ L! L, and L@) include the phrase "irrespective of alignment" in their definition.
They may be useful for defining data structures that are aligned in that way. OTOH, such external data structures are typically pretty rigidly defined (or you have other problems), so you might just as well insert "1 BYTES +" or somesuch in the appropriate place, or define the fields with explicit offsets from the base, e.g.,
14 bytes 0 +FIELD foo-bar drop
16 bytes 0 +FIELD foo-flip drop
>Should we thus:
>(a) remove the xALIGN xALIGNED words or
>(b) remove the "irrespective of alignment" in the native order access words.
I don't have a clear picture yet of how the native order access words will be used. And if we require alignment for them (their Gforth implementations certainly do), in which situations will it not be possible to use the LE or BE variants to perform the unaligned access?
For the purposes I have in mind, either the data is aligned (coming from another system component), the hardware does not care about alignment (IA-32, IA-64, PPC), or I know the byte order (for an assembler/disassembler, or for access to some hardware).
But OTOH I expect that these words will be used rarely enough that the unaligned access overhead will rarely be incurred (and on a lot of hardware that overhead depends only on whether the data is really unaligned).
> "Peter Knaggs" <p...@bcs.org.uk> writes:
>> 2. The names follow the notation:
>> <endian> - <size> <action> <space>
> <space> is often used for the BL character in syntax descriptions. Better call it <address space>.
Agreed.
>> where:
>> endian: BE for big-endian
>>         LE for little-endian
> The noun for "endian" is "byte order". Are you missing a line for "native"?
True. How does <order> sound, with a blank used to indicate the native order?
>> Systems with multiple address spaces, e.g. Harvard architectures and cross-compilers often require an address space indicator:
>> space: C code space
>>        D data space
>>        R register space
>>        T target address space in cross-compiler
>>        J JTAG or debug link
> I don't see any address spaces in the proposed words. I assume that they all refer to data space (nothing else is standardised). Maybe you should just leave this complication away. Given that they are not defined anywhere I would not know how to implement such spaces.
So we make D (data space) the default.
>> Note that all operations are unsigned. Should we allow for signed operations?
> Yes!
> I.e., have
> <byte order>-<unsigned?><size><action>
> where
>   unsigned?: for signed or alternatively
>              S for signed (if there is a conflict with "")
>              U for unsigned
I would prefer to make the unsigned version the default.
>> 5. Systems shall not implement words requiring or returning items larger than the cell size. It only causes portability issues rather than solving them.
> Then you have to divide it into several extensions (for voting and extension queries).
The L (32-bit) words are now in the EXT.
> Also, I really don't see Gforth implementing X@ only on 64-bit platforms. Either everywhere or nowhere.
The X (64-bit) words are not actually in the proposal. The text is simply there to reserve the names.
> You could also have variations of the L and X words that deal with doubles on the stack.
True, but I see two problems: (a) the order of the items on the stack, L is not so bad as the words simply take/return a double, X is not so simple; (b) the name of such words is unclear to me, unless we go for an <action> of 2! 2@ and 2,
>> (3) For address units smaller than 8 bits, end orientation of a byte stored in successive address units is implementation dependent.
> Yes. Does this play a role? Do we have wording for the memory order of cell parts or dfloat parts in the present standard?
ANS:
> 3.1.4.1 Double-cell integers
> On the stack, the cell containing the most significant part of a double-cell integer shall be above the cell containing the least significant part. ...
> Placing the single-cell integer zero on the stack above a single-cell unsigned integer produces a double-cell unsigned integer with the same value. See 3.2.1.1 Internal number representation.
So I guess we probably can state the order of the bits, although I am not sure whether to do so.
>> String             Value data type  Constant?  Meaning
>> MEMORY-ACCESS      flag             no         Memory-access word set present
>> MEMORY-ACCESS-EXT  flag             no         Memory-access extensions word set present
> Given the reactions to the "wordset queries" RfD, which show that wordset queries are a feature that is hardly used, and also not universally implemented, my present plan is to leave wordset queries as a Forth-94 feature that will be supported in Forth200x only as Forth-94 compatibility feature (i.e., you can query for the Forth-94 versions of the wordsets, not for the Forth200x versions), and to make it a deprecated feature in the next standard and remove it in some later standard.
> So I don't think we should introduce wordset queries for any new wordsets.
It is still the only standard way of discovering if the wordset is available. [DEFINED] and [UNDEFINED] are part of the optional tools ext word set.
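As a rough illustration of the query being defended here, the following sketch wraps the proposed query string in a word that always yields a flag. The name MEMORY-ACCESS? is made up for this example, and the string MEMORY-ACCESS is only the one suggested in the RfD, so a system that does not know it will simply report false.

```forth
\ Sketch only: MEMORY-ACCESS? is a hypothetical helper name.
\ ENVIRONMENT? returns false for an unknown string, or the
\ query result (here a flag) followed by true for a known one.
: MEMORY-ACCESS? ( -- flag )
   S" MEMORY-ACCESS" ENVIRONMENT?
   IF    ( flag )          \ string known: keep the flag it reported
   ELSE  FALSE             \ string unknown: treat as "not present"
   THEN ;
```

A program could then guard its own fallback definitions with `MEMORY-ACCESS? 0= [IF] ... [THEN]` without depending on the TOOLS EXT words [DEFINED] and [UNDEFINED].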
>> Store the 16 LSBs of x at addr in big-endian format irrespective of alignment. In systems where the address unit is larger than 16 bits, the upper bits are set to zero.
> This should be: Store the 8 LSBs zero-extended at addr+1 (or, if you want to cater for 1 BYTES>1, at addr BYTE+), and the next 8 bits zero-extended at addr.
As we have defined big- and little-endian we may as well refer to it. Alternatively, I would prefer something like "Store the upper 8 bits at the lowest memory address, and the lower 8 bits at the next available memory address".
> Likewise for all the other words dealing with multiple bytes, except maybe the native order words.
Surely this is more bloat; simply referring back to big- and little-endian resolves this.
>> 18.6.1.aaad BE-W, "b-e-w-comma" MEMORY-ACCESS
>> ( x -- )
>> Reserve 16 bits of data space and store the 16 LSBs of x in them in big-endian format irrespective of alignment.
> No. Reserve 2 BYTES (as in the proposed word BYTES) of data space ...
How about: Reserve sufficient data space to store 16 bits, storing the 16 LSBs ...
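To make the wording concrete, here is a minimal sketch of BE-W, built only from C, so that reserving and storing happen together. It assumes a byte-addressed system where one character is one 8-bit address unit; it is not taken from the reference implementation.

```forth
\ Sketch only: assumes 8-bit characters, 1 CHARS = 1 address unit.
: BE-W, ( x -- )
   DUP 8 RSHIFT $FF AND C,   \ bits 8..15 first, at the lower address
   $FF AND C, ;              \ bits 0..7 next, at the following address
```

Because each C, reserves exactly the space it fills, no alignment of the data-space pointer is involved.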
>> If the data-space pointer is not 16-bit aligned, reserve enough space to align it.
> I assume that natural alignment to 2 bytes is meant here, irrespective of whether the hardware benefits from this or not. That should be spelled out (and in the other alignment words).
> I miss the X words. Maybe this is just as well. Or maybe we should have the X words in the extension wordset, with a double as on-stack representation.
To bring it into accordance with the existing ALIGN words it should read:
If the data-space pointer is not 16-bit aligned, reserve enough data space to make it so.
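One way to read "16-bit aligned" as natural alignment to 2 bytes is sketched below. It assumes one address unit holds exactly one 8-bit byte, so an aligned address is simply an even one; systems with other address units would need a different definition.

```forth
\ Sketch only: assumes one address unit = one 8-bit byte.
: WALIGNED ( addr -- w-addr )
   1+  1 INVERT AND ;            \ round up to the next even address
: WALIGN ( -- )
   HERE DUP WALIGNED SWAP -      \ 0 or 1 address units of padding
   ALLOT ;                       \ reserve them
```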
On Sat, 13 Mar 2010 19:51:18 -0000, Peter Knaggs <p...@bcs.org.uk> wrote:
> Should we consider adding BFIELD: WFIELD: LFIELD: and XFIELD: to this wordlist as suggested by A.10.6.2. BEGIN-STRUCTURE.
On Mar 13, 5:04 am, an...@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
> These are signed non-native numbers that have been zero-extended when fetching them onto the stack. They have no meaning (at least as far as the application is concerned) as unsigned numbers, not more than e.g., their meaning as bit patterns, floating-point numbers or addresses. Given that, I think your sentence makes no sense, especially not the bit about the "correct format".
Provided it fits into the cell, under the proposal, there is one unique bit pattern on the stack that means a given unsigned integer value, with, for address units of eight bits or larger, two distinct patterns of the lower eight bits at each memory address that correspond to that unique bit pattern on the stack.
So you only need to know that it represents an unsigned integer value within the implementation range of unsigned integer values, and that the correct end orientation was used to fetch the value to the stack, and there is no ambiguity as to what unsigned integer value that it represents.
By contrast, if it represents a signed two's complement integer, you cannot determine which one it is without the original width. Hence, either:
(1) a corresponding signed version for each and every defined width memory access word
(2) conversion of a defined width two's complement signed value to and from local
(3) a signed version of the local defined width fetch and store alone, and defined width big/little endian converters.
When this RfD is again brought to a CfV and passes, having fixed the glitches in the language from the first introduction, the horse will have left the barn on (3), leaving (1) or (2).
A proposal could well combine (1) and (2), provided that (1) is in the EXT section of the proposal. Either (1) is a fetch first, then a sign extension, in which case (2) is a factor, or (1) is supported by sign extending hardware, which can also be used to provide (2) efficiently.
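Option (2), used as a factor, could look like the sketch below: a zero-extended fixed-width value on the stack is turned into the native signed number it encodes. The names B>S and W>S are invented for this example and are not part of the RfD; two's complement arithmetic is assumed.

```forth
\ Sketch only: hypothetical names, two's complement assumed.
: B>S ( u -- n )  $FF AND    $80 XOR    $80 - ;    \ sign-extend from bit 7
: W>S ( u -- n )  $FFFF AND  $8000 XOR  $8000 - ;  \ sign-extend from bit 15
```

A signed fetch in the sense of option (1) would then be an unsigned fetch followed by the matching converter, e.g. a big-endian 16-bit fetch followed by W>S.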