write binary files

rvm

Dec 27, 2010, 10:36:52 AM

Hello!

Is there a standard way to write binary files?

The question arose since the word WRITE-FILE writes characters, not
blocks of binary data.
11.6.1.2480 WRITE-FILE FILE ( c-addr u fileid -- ior )
Write u characters from c-addr to the file identified by fileid
starting at its current position.

A binary file may not map onto characters.

The word MOVE was introduced since "ANS Forth needs a move instruction
capable of dealing with address units". What about files?

I think the common practice is to use WRITE-FILE to write binary
blocks (consecutive address units).
The same applies to FILE-SIZE, FILE-POSITION, REPOSITION-FILE, and
READ-FILE.
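
A minimal sketch of that common practice, assuming a system where
1 CHARS is one address unit, so that the count passed to WRITE-FILE is
also a count of address units. The names SAMPLE, FILL-SAMPLE and
SAVE-BLOCK and the file name sample.bin are made up for illustration;
THROW comes from the optional Exception word set.

```forth
\ Assumption: 1 CHARS = 1 address unit, so u below counts address units.
\ SAMPLE, FILL-SAMPLE, SAVE-BLOCK and "sample.bin" are hypothetical names.

CREATE SAMPLE  4 CELLS ALLOT           \ four cells of raw data

: FILL-SAMPLE ( -- )                   \ store 1 2 3 4 into the four cells
   4 0 DO  I 1+  SAMPLE I CELLS + !  LOOP ;

: SAVE-BLOCK ( addr u c-addr2 u2 -- )  \ write u units at addr to the named file
   W/O BIN CREATE-FILE THROW  >R       \ open in binary mode, keep fileid on R
   R@ WRITE-FILE THROW                 \ write the raw block
   R> CLOSE-FILE THROW ;

FILL-SAMPLE
SAMPLE 4 CELLS S" sample.bin" SAVE-BLOCK
```

On a system where a character is wider than an address unit, the same
call would be asked to write 4 CELLS characters, which is exactly the
ambiguity the question is about.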

--
Ruvim

Brad

Dec 27, 2010, 11:02:04 AM

On Dec 27, 8:36 am, rvm <ruvim.pi...@gmail.com> wrote:
> Hello!
>
> Is there a standard way to write binary files?
>
> The question arose since the word WRITE-FILE writes characters, not
> blocks of binary data.
> 11.6.1.2480 WRITE-FILE FILE ( c-addr u fileid -- ior )
> Write u characters from c-addr to the file identified by fileid
> starting at its current position.
>

No, that's it. I have never seen a Forth whose WRITE-FILE used
anything but 8-bit bytes. Practically speaking, in all Forths that run
on an OS, characters are bytes.

-Brad

Anton Ertl

Dec 27, 2010, 11:21:12 AM

rvm <ruvim...@gmail.com> writes:
>Hello!
>
>Is there a standard way to write binary files?
>
>The question arose since the word WRITE-FILE writes characters, not
>blocks of binary data.
>
> 11.6.1.2480 WRITE-FILE FILE ( c-addr u fileid -- ior )
> Write u characters from c-addr to the file identified by fileid
> starting at its current position.
>
>A binary file may not map onto characters.

So, yes, there is no standard way to deal with binary files.

In practice, though, there are no maintained systems with 1 chars !=
1, and the systems that have address units >1 byte are embedded
systems that often don't support files; however, that is changing
(such systems are getting file systems), and one of the issues we have
to deal with in standardization is how to deal with binary files on
such systems. We have discussed these issues, but have not found
consensus on a solution yet.

>I think the common practice is to use WRITE-FILE to write binary
>blocks (consecutive address units).
>The same applies to FILE-SIZE, FILE-POSITION, REPOSITION-FILE, and
>READ-FILE.

Yes, certainly for systems on byte-addressed machines.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2010: http://www.euroforth.org/ef10/

BruceMcF

Dec 28, 2010, 7:49:47 PM

On Dec 27, 11:21 am, an...@mips.complang.tuwien.ac.at (Anton Ertl)
wrote:

> In practice, though, there are no maintained systems with 1 chars !=
> 1, and the systems that have address units >1 byte are embedded
> systems that often don't support files; however, that is changing
> (such systems are getting file systems), and one of the issues we have
> to deal with in standardization is how to deal with binary files on
> such systems. We have discussed these issues, but have not found
> consensus on a solution yet.

Since the file systems on those systems tend to be flash media storing
data in some commonplace filesystem, defined in terms of bytes, a
critical question is whether to preserve *processes* across the
au>bytes divide. An au>bytes system that implements access to a
byte-oriented filesystem could well have a byte-oriented solution
implemented underneath ``bin'' WRITE-FILE etc., in which case a
``byte'' filetype modifier would only require providing standard access
to already existing capabilities rather than adding new capabilities.

The approach that has the least namespace proliferation is a ``byte''
file access modifier, which replaces the ``bin'' file access modifier
and under which counts in existing binary oriented words are taken as
byte counts rather than character counts, and:

BYTE-UNPACK ( a ca u -- )
copy u packed bytes starting at address a to character address ca,
one byte per character.

BYTE-PACK ( ca a u -- )
copy u bytes, each stored as the lower eight bits of a character, to
address a as a sequence of packed bytes.

BYTES ( u -- u' ) The number of address units required to store u
packed bytes.

For byte=au=char systems:

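: BYTE ( fam -- fam' ) BIN ;          \ byte mode is plain binary mode
: BYTE-UNPACK ( a ca u -- ) MOVE ;    \ packed bytes and chars share one layout
: BYTE-PACK ( ca a u -- ) MOVE ;      \ so packing is a plain copy
: BYTES ( u -- u' ) ; IMMEDIATE       \ one byte per address unit: no-op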
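
A short usage sketch of the proposed words on such a byte=au=char
system. BYTE-UNPACK, BYTE-PACK and BYTES are the proposed words from
this post, not standard ones; PACKED-BUF, TEXT-BUF and ROUND-TRIP are
hypothetical names used only for illustration.

```forth
\ The byte=au=char definitions, repeated so the sketch loads on its own.
: BYTE-UNPACK ( a ca u -- ) MOVE ;
: BYTE-PACK ( ca a u -- ) MOVE ;
: BYTES ( u -- u' ) ; IMMEDIATE

CREATE PACKED-BUF  8 BYTES ALLOT       \ room for 8 packed bytes
CREATE TEXT-BUF    8 CHARS ALLOT       \ room for 8 characters

: ROUND-TRIP ( -- )
   S" FORTH-94" TEXT-BUF SWAP MOVE     \ put 8 characters in the char buffer
   TEXT-BUF PACKED-BUF 8 BYTE-PACK     \ chars -> packed bytes
   TEXT-BUF 8 CHARS ERASE              \ wipe the char buffer
   PACKED-BUF TEXT-BUF 8 BYTE-UNPACK ; \ packed bytes -> chars again

ROUND-TRIP
```

Code written this way sizes its buffers with BYTES and CHARS instead of
bare counts, so only the four definitions at the top would need to
change on a system where an address unit holds more than one byte.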

The Beez'

Dec 29, 2010, 4:04:50 AM

On Dec 27, 4:36 pm, rvm <ruvim.pi...@gmail.com> wrote:

> Is there a standard way to write binary files?

Technically, you're right. That should have been "address-units".
Practically, it doesn't make much difference, since all chars are one
address-unit (as most observed). Note the difference between TEXT
files and BINARY files was introduced when C made the jump to non-*Nix
platforms, where the EOL was NOT a single LF (CR/LF, for example).

In Forth the sequence is output by one single word, CR. Unfortunately,
the whole I/O library was modeled after C instead of inventing a true
Forth one. In 4tH, output words are channeled through the 4tH I/O
system, so there is no need for this "extra" I/O layer. Consequently,
CR is the single word that decides whether you're writing a TEXT or a
BINARY file and does the translation between OS-es transparently.
"char == address unit" helps, of course. ;-)

Hans Bezemer

Andrew Haley

Dec 29, 2010, 8:36:19 AM

Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>
> So, yes, there is no standard way to deal with binary files.
>
> In practice, though, there are no maintained systems with 1 chars !=
> 1, and the systems that have address units >1 byte are embedded
> systems that often don't support files; however, that is changing
> (such systems are getting file systems), and one of the issues we have
> to deal with in standardization is how to deal with binary files on
> such systems. We have discussed these issues, but have not found
> consensus on a solution yet.

I think the best way to deal with this is to let people who have
experience of porting to and using these systems come up with
suitable designs. Greg Bailey's proposals at
http://www.mpeforth.com/arena/octets.txt seem sensible to me, but I
haven't used them.

Andrew.

BruceMcF

Dec 29, 2010, 12:31:50 PM

On Dec 29, 4:04 am, "The Beez'" <hans...@bigfoot.com> wrote:
> On Dec 27, 4:36 pm, rvm <ruvim.pi...@gmail.com> wrote:
> > Is there a standard way to write binary files?
>
> Technically, you're right. That should have been "address-units".
> Practically, it doesn't make much difference, since all chars are one
> address-unit (as most observed).

Which is why the portable binary file problem is writing in terms of
bytes when the au wanders away from being a byte. This was observed
early on when it was noticed that a compliant BLOCK system on a 16-bit
char system was not portable with the more common 8-bit char systems.

The Beez'

Dec 30, 2010, 3:35:25 AM

On Dec 29, 6:31 pm, BruceMcF <agil...@netscape.net> wrote:
> Which is why the portable binary file problem is writing in terms of
> bytes when the au wanders away from being a byte. This was observed
> early on when it was noticed that a compliant BLOCK system on a 16-bit
> char system was not portable with the more common 8-bit char systems.

Well, I wouldn't oppose a change in the standard to address-units.
IMHO this word should really be there to write chunks of raw memory,
not to write chars.

Hans Bezemer

BruceMcF

Dec 30, 2010, 3:52:42 PM

On Dec 30, 3:35 am, "The Beez'" <hans...@bigfoot.com> wrote:
> Well, I wouldn't oppose a change in the standard to address-units.
> IMHO this word should really be there to write chunks of raw memory,
> not to write chars.

Starting with that as the base:

RAW ( fam -- fam' ) put the file into a mode to read and write strings
of address units (all counts given in characters in the relevant words
become counts in address units), and then there is no need for a sizer
word. If an address unit is not an integral multiple of the base unit
of the filesystem, the mapping from filesystem to memory is system
dependent.

Since most external filesystems are defined in terms of bytes
(octets), reliable translation from external filesystem specifications
to multiple implementations which might have a different number of
octets per address unit then requires:

BYTES ( u -- au ) au is the smallest number of address units that are
sufficient to contain u bytes (octets).
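
As a sketch of what BYTES could look like where an address unit holds
more than one octet: it is a rounded-up division. The constant
OCTETS/AU is a hypothetical name, and the value 2 (16-bit address
units) is only an assumed example.

```forth
\ Hypothetical: number of octets held by one address unit on this system.
2 CONSTANT OCTETS/AU

\ au is the smallest number of address units that can hold u octets,
\ i.e. u divided by OCTETS/AU, rounded up. UM/MOD keeps it unsigned.
: BYTES ( u -- au )
   OCTETS/AU 1- +            \ bias so the division rounds up
   0 OCTETS/AU UM/MOD NIP ;  \ unsigned divide, keep the quotient

5 BYTES .   \ prints 3
4 BYTES .   \ prints 2
```

With OCTETS/AU set to 1 this reduces to the no-op definition used on
byte=au=char systems.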

Indeed, now byte access is even more critical, because in the large
majority of implementations no translation is necessary, which means
that importing routines from the large majority of implementations
requires that the translation be a single, well-defined step rather
than spread across the ported code. That would be:

BYTE-UNPACK ( a ca u -- ) copy u bytes from address a to the u
character locations beginning at ca. The layout of bytes at address a
is compatible with RAW mode read and write.

and

BYTE-PACK ( ca a u -- ) copy the u bytes contained in the u
character locations beginning at ca to address a. The layout of bytes
at address a is compatible with RAW mode read and write.

rvm

Jan 7, 2011, 11:06:25 PM

I totally agree with making the change in the standard.
Something like this:
11.6.1.2480 WRITE-FILE FILE ( addr u fileid -- ior )
Write u consecutive address units from addr to the file identified
by fileid starting at its current position.

(and similarly for the other file operations)

I know two Forth implementations that have a 16-bit char size, an 8-bit
address unit, and address-unit sizing for file operations. These are
dsForth for Windows CE and the Unicode version of SP-Forth/3 (used in
nnbackup).

And I don't know of any implementations whose char sizing for file
operations differs from the address-unit size.

Besides, a large amount of existing code that writes binary files writes
them in address units (notably bytes).

--
Ruvim

Elizabeth D Rather

Jan 8, 2011, 12:45:59 AM

On 1/7/11 6:06 PM, rvm wrote:
> On Dec 30 2010, 11:35 am, "The Beez'"<hans...@bigfoot.com> wrote:
>> On Dec 29, 6:31 pm, BruceMcF<agil...@netscape.net> wrote:
>>> Which is why the portable binary file problem is writing in terms of
>>> bytes when the au wanders away from being a byte. This was observed
>>> early on when it was noticed that a compliant BLOCK system on a 16-bit
>>> char system was not portable with the more common 8-bit char systems.
>>
>> Well, I wouldn't oppose a change in the standard to address-units.
>> IMHO this word should really be there to write chunks of raw memory,
>> not to write chars.

I'm not sure. I think that files are written in 8-bit bytes. That is
by no means a universal "address unit". The Intellasys/Greenarrays
parts have larger address units, for example, and I have encountered a
number of other platforms with AU other than bytes. Although Forth94
strenuously avoided fixed-size entities, IMO this instance is not
the only one for which fixed-size bytes or octets would be appropriate.
Communications protocols also come to mind.

Cheers,
Elizabeth

>> Hans Bezemer
>
> I totally agree with making the change in the standard.
> Something like this:
> 11.6.1.2480 WRITE-FILE FILE ( addr u fileid -- ior )
> Write u consecutive address units from addr to the file identified
> by fileid starting at its current position.
>
> (and similarly for the other file operations)
>
> I know two Forth implementations that have a 16-bit char size, an 8-bit
> address unit, and address-unit sizing for file operations. These are
> dsForth for Windows CE and the Unicode version of SP-Forth/3 (used in
> nnbackup).
>
> And I don't know of any implementations whose char sizing for file
> operations differs from the address-unit size.
>
> Besides, a large amount of existing code that writes binary files writes
> them in address units (notably bytes).
>
> --
> Ruvim

--
==================================================
Elizabeth D. Rather (US & Canada) 800-55-FORTH
FORTH Inc. +1 310.999.6784
5959 West Century Blvd. Suite 700
Los Angeles, CA 90045
http://www.forth.com

"Forth-based products and Services for real-time
applications since 1973."
==================================================

Anton Ertl

Jan 8, 2011, 9:22:54 AM

rvm <ruvim...@gmail.com> writes:
>I totally agree with making the change in the standard.
>Something like this:
>11.6.1.2480 WRITE-FILE FILE ( addr u fileid -- ior )
> Write u consecutive address units from addr to the file identified
>by fileid starting at its current position.
>
>(and similarly for the other file operations)
>
>I know two Forth implementations that have a 16-bit char size, an 8-bit
>address unit, and address-unit sizing for file operations. These are
>dsForth for Windows CE and the Unicode version of SP-Forth/3 (used in
>nnbackup).

That's interesting. And I guess that these systems are maintained,
unlike JaxForth (IIRC this was the other system with this property).

Systems with 1 chars > 1 au are a bad idea, mainly because most
programs dealing with chars have a dependency on 1 chars = 1. If
some systems chose to implement this bad idea, they have to live with
the consequences.
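
A program with that dependency can at least state it when it loads. A
minimal sketch; REQUIRE-BYTE-CHARS is a hypothetical name, not a
standard word.

```forth
\ Refuse to load on a system where a character is not one address unit.
\ REQUIRE-BYTE-CHARS is a made-up name for illustration.
: REQUIRE-BYTE-CHARS ( -- )
   1 CHARS 1 <> ABORT" this program assumes 1 CHARS = 1" ;

REQUIRE-BYTE-CHARS
```

On a system with 1 CHARS = 1 the check passes silently; elsewhere the
load stops with the message instead of failing later in some obscure
way.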

Ironically, the suggested change would break existing standard
programs that actually work on systems with 1 CHARS > 1 and use
WRITE-FILE. And more ironically, this breakage would only affect such
systems. I.e., your suggested change would reduce the number of
programs that run on these systems even more (while these programs
would continue to run on systems with 1 CHARS = 1).

The time-tested approach to such a need is not to change existing
words in an incompatible way, but to introduce new words that have the
needed functionality.

BruceMcF

Jan 8, 2011, 6:55:20 PM

On Jan 8, 9:22 am, an...@mips.complang.tuwien.ac.at (Anton Ertl)
wrote:

> The time-tested approach to such a need is not to change existing
> words in an incompatible way, but to introduce new words that have the
> needed functionality.

Is there any other approach that is as concise and as easy for the
majority of systems to support as an "octet" or "byte" file access
method modifier word and a "BYTES" buffer sizing word?