Buffer access with bounds checking...

Mark Wills

Aug 31, 2012, 9:51:58 AM

While writing about memory buffer overruns in a different thread
earlier today, I was inspired to have a bash at writing some
code that would allow safe read/write access to memory buffers.

I came up with the code and wonder if it could be simplified or
improved any.

It's very simple. When a buffer is created, the first four cells
are reserved for the following:
* The pfa of the buffer (I'll explain in a minute)
* The size of the buffer in bytes
* The lowest legally accessible address
* The highest legally accessible address

It's possible to compute the last two items on the fly of
course, but I chose to do the math once and store the computed
result, rather than compute it on each buffer access, for
performance reasons.

The pfa of the buffer is stored so that accesses to the *same*
buffer can be detected, thus the buffer management variables
do not have to be re-computed.

I think the code below is portable (I gave it a quick spin in
MINOS and it ran fine). Disclaimer: my system doesn't have
CELLS+, so the code defines it first.

It struck me after writing it that if one used an offset to
reference a buffer's contents rather than an absolute address
then the code could be simplified somewhat.

-------------------------

variable _bufPfa
variable _bufSize
variable _lowBound
variable _topBound

: cells+ postpone cells postpone + ; immediate \ on Forth-83 systems use COMPILE instead of POSTPONE

: buffer ( int: size "name" -- children: -- address)
  create here ,            \ compile pfa
  dup dup ,                \ compile buffer size
  here 2 cells+ ,          \ pre-computed lower bound
  here 1 cells+ + 1- ,     \ pre-computed upper bound
  allot
  does>
    dup @ _bufPfa @ <> if
      dup @ _bufPfa !
      dup 1 cells+ @ _bufSize !
      dup 2 cells+ @ _lowBound !
      dup 3 cells+ @ _topBound !
    then
    4 cells+ ;

: sizeOf ( buffer -- u)
\ report size of buffer
drop _bufSize @ ;

: <>bounds ( address -- address flag)
\ check if address is within buffer bounds
dup dup _lowBound @ >= swap _topBound @ <= AND ;

: b@ ( address -- u)
\ fetch a cell from the buffer address
<>bounds if @ else true abort" Out of bounds in B@" then ;

: b! ( u address -- )
\ write a cell to the buffer address
<>bounds if ! else true abort" Out of bounds in B!" then ;

: bc@ ( address -- u)
\ fetch a char from the buffer address
<>bounds if c@ else true abort" Out of bounds in BC@" then ;

: bc! ( u address -- )
\ write a char to the buffer address
<>bounds if c! else true abort" Out of bounds in BC!" then ;

-------------------------
Tests:

100 buffer fred
: test
fred dup sizeOf 0 do
i over i + bc!
loop drop ;

999 fred 50 + b!
fred 50 + b@ .
999 ok

fred 104 + bc@ .
Out of bounds in BC@

Andrew Haley

Aug 31, 2012, 12:04:50 PM

Indeed, then all that b@ and b! have to do is check the bounds and
then do the right thing.

So:

: buffer ( size -) create dup 1- , allot ;

88 constant bounds-error

: index ( buffer offset - a)
over @ over u< bounds-error and throw
cell+ + ;

: b@ ( buffer offset - x) index @ ;

... etc.
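
A minimal sketch of how the remaining words might be filled in, assuming the `buffer` and `index` definitions above. The char accessors `bc@` and `bc!` borrow their names from the first post and are otherwise an assumption. Note that `index` validates a single address unit only, so a cell-wide `b@` at the last offset could still read past the end of the data.

```forth
\ Sketch only: offset-based buffer with char accessors built on INDEX.
: buffer ( size "name" -- )  create dup 1- , allot ;   \ first cell holds the highest valid offset

88 constant bounds-error      \ arbitrary throw code, as in the post above

: index ( buffer offset -- addr )
  over @ over u< bounds-error and throw   \ throw if highest offset < offset
  cell+ + ;                               \ skip the header cell, add the offset

: bc@ ( buffer offset -- c )  index c@ ;
: bc! ( c buffer offset -- )  index c! ;

100 buffer fred
65 fred 0 bc!          \ store at the first char
66 fred 99 bc!         \ store at the last valid char
fred 0 bc@ .           \ prints 65
```

An access such as `fred 100 bc@` would throw `bounds-error`, which a caller can intercept with CATCH.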

Andrew.

Alex McDonald

Aug 31, 2012, 12:20:40 PM

You might want to reconsider B@ B! and so on; iirc they've been
proposed as byte equivalents of C@ C!.

Doug Hoffman

Aug 31, 2012, 2:15:38 PM

I'm a proponent of error checking during development such as your buffer
bounds checking. I zealously use array index checking. Of course for
final (debugged/tested) code the checks can be bypassed for efficiency.

-Doug

Mark Wills

Aug 31, 2012, 3:56:39 PM

That's a great idea, Doug. Bypassing the checks could be done using
immediate words. For example:

variable checkBounds
true checkBounds !

: bounds checkBounds @ if postpone (bounds) then ; immediate \ COMPILE on Forth-83 systems

When you're happy that your code is debugged, set checkBounds to false
in the source code and re-compile. Bounds checking won't be compiled
into the program at all.
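
A minimal sketch of the compile-time switch in use. The body of `(bounds)`, the `lo` and `hi` limit variables, and the `checked@` and `demo` names are all hypothetical stand-ins chosen for illustration. It uses POSTPONE, the ANS Forth spelling; a Forth-83 system would use COMPILE.

```forth
\ Sketch only: (bounds) and the lo/hi limits are hypothetical stand-ins.
variable checkBounds   true checkBounds !

variable lo   variable hi                  \ inclusive address limits of the region
: (bounds) ( addr -- addr )
  dup lo @ hi @ 1+ within 0= abort" Out of bounds" ;

\ Compile the check into the caller only while checkBounds is true.
: bounds ( -- )  checkBounds @ if postpone (bounds) then ; immediate

: checked@ ( addr -- x )  bounds @ ;       \ built with the check compiled in

create demo 4 cells allot
demo lo !   demo 3 cells + hi !            \ last address where a cell fetch is legal
123 demo !
demo checked@ .                            \ prints 123
```

Any word defined after `false checkBounds !` gets no check at all, so the switch only affects code compiled after it is changed.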

Regards

Mark

Mark Wills

Aug 31, 2012, 3:57:34 PM

Oh no! That's the great "a char is the same as a byte" debate re-
ignited ;-)

For the record: A char is the same as a byte.

I'll get my coat.

Paul Rubin

Aug 31, 2012, 4:32:15 PM

Doug Hoffman <glid...@gmail.com> writes:
> for final (debugged/tested) code the checks can be bypassed for
> efficiency.

The checks should probably be left in, except in the specific places
where the efficiency hit is really noticeable in the overall program
performance, or if there is some other self-checking and program restart
capability in case of something going wrong.

humptydumpty

Aug 31, 2012, 5:21:26 PM

Hi!

A version:
---
: cell+ 1 cells + ;
: $@ ( a -- ca u ) dup @ swap cell+ swap ;
\
\ cache map:
\ |_bufPFA|_bufSize|_lowBound|_topBound|
\
create cache 4 cells allot
cache cell+ constant _bufSize
cache 2 cells + constant _lowBound
cache 3 cells + constant _topBound

: buffer ( size "name" -- ; -- a )
create dup , allot
does> dup cache @ <>
IF dup cache !
dup $@ _bufSize ! _lowBound !
dup $@ + 1- _topBound !
THEN cell+
;
---
Have a nice day,
humptydumpty

Doug Hoffman

Aug 31, 2012, 5:45:51 PM

On 8/31/12 4:32 PM, Paul Rubin wrote:
> Doug Hoffman <glid...@gmail.com> writes:
>> for final (debugged/tested) code the checks can be bypassed for
>> efficiency.
>
> The checks should probably be left in, except in the specific places
> where the efficiency hit is really noticeable in the overall program
> performance

Why?

-Doug

Paul Rubin

Aug 31, 2012, 6:07:57 PM

Doug Hoffman <glid...@gmail.com> writes:
>> The checks should probably be left in, except in the specific places
>> where the efficiency hit is really noticeable in the overall program
>> performance
>
> Why?

To misquote Kernighan and Plauger from a while back: having the checks
during testing and taking them out for production is like wearing a
parachute on the ground, but taking it off once you're in the air.

Rod Pemberton

Aug 31, 2012, 6:46:01 PM

"Mark Wills" <markrob...@yahoo.co.uk> wrote in message
news:ca821896-0ed0-4314...@r4g2000vbn.googlegroups.com...
[...]

> For the record: A char is the same as a byte.

Is that apparently open-ended claim _just_ for Forth because this is c.l.f.?

Because, if the open-ended claim is for other languages too, like C, then
you're just wrong. In C, a byte must be the same size as or larger than a
char. There is a minimum size the char must be in bits too. C's strings
are terminated by a nul (all bits cleared) byte. I.e., on a 16-bit word
addressable machine, the terminating string nul could be 16-bits, while the
characters could be 8-bits or 9-bits etc. In such a situation, C ignores
those upper bits for a char, but not for the nul. I.e., C's "nul character"
string terminator is not a character at all, but a C byte. C doesn't define
a byte as 8-bits per ASCII or EBCDIC.

Rod Pemberton

Rod Pemberton

Aug 31, 2012, 6:48:37 PM

"Mark Wills" <markrob...@yahoo.co.uk> wrote in message

news:86bccc99-b093-42c4...@p12g2000vbm.googlegroups.com...

> While writing about memory buffer overruns in a different thread
> earlier today, I was inspired to have a bash at writing some
> code that would allow safe read/write access to memory buffers.
>
> I came up with the code and wonder if it could be simplified or
> improved any.
>
> It's very simple. When a buffer is created, the first four cells
> are reserved for the following:
> * The pfa of the buffer (i'll explain in a minute)
> * The size of the buffer in bytes
> * The lowest legally accessible address
> * The highest legally accessible address
>

The problem here is the same issue that C has with pointers. It's the same
reason why Java removed pointers for safety. C has the issue of determining
whether a pointer points to or into a valid C object, or not.

If you have an address in Forth, you don't know if it points into a buffer,
which buffer, or somewhere else, unless you check the ranges of _all_ known
buffers. So, let's say you've got over 50 buffers and an address. Does the
address point into a buffer and which one? You'll need a dictionary of
buffer entries. You need to loop through all of them for each address to
determine if the address is within the range of one of the buffers. As you
demonstrated, this only works _if_ you already know which buffer an address
points into.
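
A minimal sketch of the lookup described above, assuming every buffer is linked into one list when it is created. All names here (`tracked-buffer`, `which-buffer`, `>data`, `>size`, `in-buffer?`) are hypothetical and not taken from any post in this thread.

```forth
\ Sketch only: every name here is hypothetical.
variable buffers   0 buffers !        \ head of a linked list of all known buffers

\ Header layout: link to previous buffer | size in address units | data ...
: tracked-buffer ( size "name" -- )
  create  here  buffers @ ,  buffers !   \ link the new header into the list
  dup , allot ;                          \ store the size, reserve the data

: >data ( buf -- addr )  2 cells + ;
: >size ( buf -- u )     cell+ @ ;

: in-buffer? ( addr buf -- flag )        \ true if data <= addr < data+size
  dup >data swap >size over + within ;

: which-buffer ( addr -- buf | 0 )       \ linear search: cost grows with buffer count
  buffers @
  begin dup while
    2dup in-buffer? if nip exit then
    @                                    \ follow the link to the next header
  repeat nip ;

100 tracked-buffer foo
50 tracked-buffer bar
foo >data 10 + which-buffer foo = .      \ prints -1 (true)
```

Every checked access pays for a walk over the list, which is the cost the paragraph above points at.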

The other option is to not allow pointers to point into an object. Then,
the pointer always points to the object's header, or to invalid data. But,
that complicates accessing data within an object, such as a character from a
string, or an item from a structure.

Of course, if you don't realize it, what you're doing is implementing part
of a type system. You're storing all the relevant info a compiler would
need to check the type, but doing so just for your buffer.

Rod Pemberton

Bernd Paysan

Aug 31, 2012, 7:41:35 PM

Mark Wills wrote:

> While writing about memory buffer overruns in a different thread
> earlier today, I was inspired to have a bash at writing some
> code that would allow safe read/write access to memory buffers.
>
> I came up with the code and wonder if it could be simplified or
> improved any.
>
> It's very simple. When a buffer is created, the first four cells
> are reserved for the following:
> * The pfa of the buffer (i'll explain in a minute)
> * The size of the buffer in bytes
> * The lowest legally accessible address
> * The highest legally accessible address
>
> It's possible to compute the last two items on the fly of
> course, but I chose to do the math once and store the computed
> result, rather than compute it on each buffer access, for
> performance reasons.

I'm happy with your style. For performance reasons, please store two
things: start address and size, let's say into _start and _bound. Then
you can do

Variable _start
Variable _bound

: bound-error ( -- ) true abort" out of bound" ;

: ?bound ( addr [size] -- )
]] dup _start @ - _bound @ Literal - u<= IF [[ ; immediate
: bound? ( -- ) ]] ELSE bound-error THEN [[ ; immediate

: b@ ( addr -- x ) [ cell ] ?bound @ bound? ;
: b! ( x addr -- ) [ cell ] ?bound ! bound? ;
: bc@ ( addr -- c ) [ 1 ] ?bound c@ bound? ;
: bc! ( c addr -- ) [ 1 ] ?bound c! bound? ;

And for checking whether to update?

: buffer: ( size "name" -- )
Create dup , allot
Does> ( -- addr ) dup @ _bound ! cell+ dup _start ! ;

You should run some benchmarks which one is faster - IMHO just writing
two variables and not checking if they have already been used is faster.
And you definitely should check close to the range ends (checking is
done with 64 bit Gforth):

100 buffer: foo
foo 92 + b@ drop ok
foo 93 + b@ drop
:21: out of bound
foo 93 + >>>b@<<< drop
Backtrace:
$7F91EF5C3E20 throw
$7F91EF61F788 c(abort")
$7F91EF61FA50 bound-error
foo 99 + bc@ drop ok
foo 100 + bc@ drop
:23: out of bound
foo 100 + >>>bc@<<< drop
Backtrace:
$7F91EF5C3E20 throw
$7F91EF61F788 c(abort")
$7F91EF61FBE0 bound-error

If you test with bigForth/MINOS, bulk postponing is not compiled into
the main part, so you need to define

: [[ ; \ token to end bulk-postponing
: ]] BEGIN >in @ ' ['] [[ <> WHILE >in ! postpone postpone REPEAT
drop ; immediate

before. Well, I really should put that in the core, it is so useful for
this kind of thing. The check I do breaks if your buffer is actually
too small for one cell...
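
A minimal sketch of one way the too-small-buffer case might be handled, written as an ordinary word rather than a macro. `fits?` is a hypothetical helper and not part of the code above; `_start`, `_bound` and `buffer:` are repeated only so the fragment stands alone. It assumes a cell is wider than one address unit.

```forth
\ Sketch only: FITS? is a hypothetical helper, not part of the code above.
variable _start
variable _bound

: buffer: ( size "name" -- )
  create dup , allot
  does> ( -- addr )  dup @ _bound !  cell+ dup _start ! ;

: fits? ( addr size -- flag )     \ true if size units at addr lie inside the buffer
  _bound @ over u< if 2drop false exit then   \ access wider than the whole buffer
  _bound @ swap -                 \ addr max-offset
  swap _start @ -                 \ max-offset offset (wraps to a huge value below start)
  u< 0= ;                         \ in bounds unless max-offset < offset

: b@ ( addr -- x )  dup 1 cells fits? 0= abort" out of bound" @ ;

100 buffer: foo
1 buffer: tiny
tiny 1 cells fits? .              \ prints 0: a cell never fits in a 1-unit buffer
```

The early exit avoids the unsigned wrap-around that occurs when the access size is subtracted from a smaller buffer size.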

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://bernd-paysan.de/

Bernd Paysan

Aug 31, 2012, 8:37:40 PM

Bernd Paysan wrote:
> I'm happy with your style.

There's a "not" missing here... I'm actually not that happy with your
style...

Mark Wills

Sep 1, 2012, 3:49:57 AM

On Aug 31, 11:07 pm, Paul Rubin <no.em...@nospam.invalid> wrote:

Why is that? If your program has been properly tested then there
shouldn't be a problem.

The argument you quoted reminded me of an argument we had at work
(well, it wasn't really an *argument*) about detecting buffer
over/underflows in a serial protocol converter. I advocated removing
error checking once the embedded software had been fully tested (when
I say tested I mean tested by test engineers whose full-time job it is
to find a way to break your code! Not tested by the dude that wrote
the code!). The test guys said that buffer overflows should be
trapped, logged, and the system halted. I said "Please prove it is
possible to overrun the serial input and output buffers". I also
argued that a halt was useless - from the user's perspective (who would
be some 1800 meters above the embedded device, in the warmth and
dryness of a nice drilling rig) a halt was a crash. It's not like he's
going to send an ROV down to retrieve the device, bring it to the
surface and dump the logs. Best you can do is re-start the thing.

Of course, there are situations where that wouldn't be appropriate;
flight systems on aircraft for example. I don't know what the strategy
is with those types of software (I presume they are somewhere at
SIL-3?)... I mean, okay, you've detected a run-time fault, for example
dereferencing null memory. Now what? Trapping the condition only gets
you halfway! I guess all you can do is fail over to the redundant
device (which should be running *different* software) and reset.
That's what we do in the subsea industry - we fail over to the
redundant device - but the devices tend to be identical. Though we're
mostly in the non SIL or SIL-1 territory.

Mark Wills

Sep 1, 2012, 3:59:59 AM

Well, I'm very much a novice ploughing my own furrow, so to speak. I'd
welcome suggestions on how my style could be improved. I did wonder
what the semantic differences between ?bound and bound? were.

Code can be a very personal thing, can't it? I've looked at other
people's VB code over the years and not liked it, though in reality
there wasn't anything wrong with it, and in some cases actually came
to appreciate a particular style. We learn from it. So, I'm always
happy to receive suggestions on how my Forth style can be improved.
Did you develop your own style, or is it a 'shop' style, or are you
adhering to long-standing Forth conventions?

I guess I should take another look at Thinking Forth! All my stuff is
packed up in boxes at the moment though.

I used to get 'style admonishments' by personal email by the late Jeff
Fox! Alas, those days are sadly gone.

Andrew Haley

Sep 1, 2012, 5:05:11 AM

Rod Pemberton <do_no...@notemailnot.cmm> wrote:
> "Mark Wills" <markrob...@yahoo.co.uk> wrote in message
> news:ca821896-0ed0-4314...@r4g2000vbn.googlegroups.com...
> [...]
>
>> For the record: A char is the same as a byte.
>
> Is that apparently open-ended claim _just_ for Forth because this is c.l.f.?
>
> Because, if the open-ended claim is for other languages too, like C, then
> you're just wrong. In C, a byte must be the same size as or larger than a
> char.

Eh? In C a char is the same size as a byte. The number of bits in a
byte is implementation-dependent. A char is large enough to store any
member of the basic execution character set.

Andrew.

Marcel Hendrix

Sep 1, 2012, 5:11:52 AM

Bernd Paysan <bernd....@gmx.de> writes Re: Buffer access with bounds checking...
[..]

> You should run some benchmarks which one is faster - IMHO just writing
> two variables and not checking if they have already been used is faster.
> And you definitely should check close to the range ends (checking is
> done with 64 bit Gforth):

[..]

Your scheme is faster, but not much.

#100000000 VALUE #times

: stest ( -- )
CR ." (Paysan) #times = " #times U>D (n,3)
CR ." protected: " timer-reset
#times 0 ?DO foo #88 + b@
foo #80 + b@ +
foo #72 + b@ +
foo #64 + b@ + DROP
LOOP
.elapsed

CR ." UNprotected: " timer-reset
#times 0 ?DO foo #88 + @
foo #80 + @ +
foo #72 + @ +
foo #64 + @ + DROP
LOOP
.elapsed ;

FORTH> stest
#times = 100,000,000
protected: 1.273 seconds elapsed.
UNprotected: 0.345 seconds elapsed. ok

Mark's:

100 buffer fred
: test
fred dup sizeOf 0 do
i over i + bc!
loop drop ;

: stest2 ( -- )
CR ." (Wills) #times = " #times U>D (n,3)
CR ." protected: " timer-reset
#times 0 ?DO fred #88 + b@
fred #80 + b@ +
fred #72 + b@ +
fred #64 + b@ + DROP
LOOP
.elapsed

CR ." UNprotected: " timer-reset
#times 0 ?DO fred #88 + @
fred #80 + @ +
fred #72 + @ +
fred #64 + @ + DROP
LOOP
.elapsed ;

FORTH> stest2
#times = 100,000,000
protected: 1.827 seconds elapsed.
UNprotected: 0.720 seconds elapsed. ok

-marcel

Doug Hoffman

Sep 1, 2012, 6:59:04 AM

As Mark Wills points out, the checks are not fail-safe mechanisms (like
a parachute). They are there only to assist writing software that does
not fail. The checks are not required even during development.
Properly debugged and tested programs should not fail, whether or not
they were developed with bounds/index/message/etc. checks. Having the
checks during development can only reduce possible headaches for
the programmer.

A check in a production program could at best flag a condition for a
fail-safe if the debugging/testing was inadequate, but still doesn't
help the end user. Creating that fail-safe (or parachute) is another
topic altogether. Leaving the checks in *will* give the end user a
larger and slower program.

-Doug

Rod Pemberton

Sep 1, 2012, 1:45:37 PM

"Andrew Haley" <andr...@littlepinkcloud.invalid> wrote in message
news:55OdnX4ccq1aUtzN...@supernews.com...

> Rod Pemberton <do_no...@notemailnot.cmm> wrote:
> > "Mark Wills" <markrob...@yahoo.co.uk> wrote in message
> > news:ca821896-0ed0-4314...@r4g2000vbn.googlegroups.com...
> > [...]
> >
> >> For the record: A char is the same as a byte.
> >
> > Is that apparently open-ended claim _just_ for Forth because this is
> > c.l.f.?
> >

Actually, Forth-94 doesn't define a byte, but it says Forth-83 defines it as
8-bits. A C byte is what the Forth specifications call an "address unit".

> > Because, if the open-ended claim is for other languages too, like C,
> > then you're just wrong. In C, a byte must be the same size as or larger
> > than a char.
>
> Eh? In C a char is the same size as a byte.

No, it's not. I just explained it to you. Read the C specifications some
time.

For C, a char fits into a byte. A byte is comprised of one or more
addressable units of bits sufficiently large to contain a character. On
modern 8-bit byte-addressable machines, they are usually implemented as the
same size.

E.g., let's take a 16-bit word addressable machine with 9-bit characters.
In this case, a char in C is 9-bits and C's byte is 16-bits, not 8-bits.
The size for the char returned by sizeof() will be one(1) by definition even
though the char is 9-bits and consumes two 8-bit bytes. That's because C
defines a byte to a non 8-bit definition. It defines a byte as the address
unit or units large enough to contain a character. The null character in C
is a C byte, 16-bits not 9-bits, with all bits cleared. I.e., 0x0000 would
be a null, but 0xFE00 (lower 9-bits cleared) would not be. I.e., a null
character is not a character in C but a byte. The higher bits are ignored
for non-null characters, e.g., 0x0041, 0xFE41, 0xA541, etc, would all be an
ASCII 'A'. Now, if for some reason a C implementation implemented 9-bit
characters on an 8-bit word addressable machine, the same would hold
true. The difference being that then a C byte would be comprised of two
8-bit address units. The C byte must be large enough to contain the C
character.

Rod Pemberton

Paul Rubin

Sep 1, 2012, 5:06:34 PM

Mark Wills <forth...@gmail.com> writes:
> The test guys said that buffer overflows should be
> trapped, logged, and the system halted. I said "Please prove it is
> possible to overrun the serial input and output buffers".

But that's backwards. They shouldn't have to prove something is
possible. If you're asserting the checks should be removed, you are the
one who has to prove overruns are impossible.

> I also argued that a halt was useless - from the user's perspective
> (who would be some 1800 meters above the embedded device, in the
> warmth and dryness of a nice drilling rig) a halt was a crash.

If there is a buffer overrun, the result might be much worse than a mere
crash (where the thing stops operating). The gadget might keep operating
while doing something completely crazy, setting itself on fire,
whatever.

> Best you can do is re-start the thing.

Yes. That sounds better than letting the program keep running into the
weeds. It's no longer under the programmer's control, so it's better to
shut it off.

> I mean, okay, you've detected a run-time fault, for example
> dereferencing null memory.

If the fault didn't show up during testing, chances are it was caused by
some weird, non-deterministic condition unlikely to repeat. So log the
error, restart the program, and analyze the log later.

Erlang is written around this idea, that software failures are
inevitable, so there are extensive provisions for recovering from them.
Programs are organized into isolated processes and there is a
supervision tree that restarts crashed ones.
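
In Forth, the log-and-restart idea could be sketched with CATCH and THROW. This is only an illustration under stated assumptions: `supervise`, `log-error`, `failures` and the deliberately unreliable `flaky` task are hypothetical names, the "log" is just a counter, and the supervised word is assumed to have no stack effect.

```forth
\ Sketch only: SUPERVISE, LOG-ERROR and FAILURES are hypothetical names.
variable failures   0 failures !           \ stand-in for a real error log
: log-error ( n -- )  drop 1 failures +! ;

: supervise ( xt max-tries -- flag )       \ true if xt completed without throwing
  0 ?do
    dup catch                              \ run the task; 0 means it finished normally
    ?dup 0= if drop true unloop exit then
    log-error                              \ record the throw code, then try again
  loop drop false ;

variable attempts   0 attempts !
: flaky ( -- )                             \ fails twice, then succeeds
  1 attempts +!  attempts @ 3 < if 88 throw then ;

' flaky 5 supervise .                      \ prints -1 (true)
failures @ .                               \ prints 2
```

Capping the number of restarts keeps a task that fails every time from looping forever; what to do once the cap is reached depends on the application.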

> That's what we do in the subsea industry - we fail over to the
> redundant device - but the devices tend to be identical. Though we're
> mostly in the non SIL or SIL-1 territory.

Yeah, I gather that ultra-critical stuff has backups using completely
different hardware and software developed by separate teams.

Andrew Haley

Sep 2, 2012, 5:19:01 AM

Rod Pemberton <do_no...@notemailnot.cmm> wrote:
> "Andrew Haley" <andr...@littlepinkcloud.invalid> wrote in message
> news:55OdnX4ccq1aUtzN...@supernews.com...
>> Rod Pemberton <do_no...@notemailnot.cmm> wrote:
>> > "Mark Wills" <markrob...@yahoo.co.uk> wrote in message
>> > news:ca821896-0ed0-4314...@r4g2000vbn.googlegroups.com...
>> > [...]
>> >
>> >> For the record: A char is the same as a byte.
>> >
>> > Is that apparently open-ended claim _just_ for Forth because this is
>> > c.l.f.?
>> >
>
> Actually, Forth-94 doesn't define a byte, but it says Forth-83 defines it as
> 8-bits. A C byte is what the Forth specifications call an "address unit".
>
>> > Because, if the open-ended claim is for other languages too, like C,
>> > then you're just wrong. In C, a byte must be the same size as or larger
>> > than a char.
>>
>> Eh? In C a char is the same size as a byte.
>
> No, it's not. I just explained it to you. Read the C specifications some
> time.
>
> For C, a char fits into a byte.

No, a _character_ fits into a byte. chars and characters are not the
same thing.

> A byte is comprised of one or more addressable units of bits
> sufficiently large to contain a character. On modern 8-bit
> byte-addressable machines, they are usually implemented as the same
> size.
>
> E.g., let's take a 16-bit word addressable machine with 9-bit characters.
> In this case, a char in C is 9-bits and C's byte is 16-bits, not 8-bits.

No. A char on such a system is 16 bits. A character may be 9 bits,
but a char isn't.

> The size for the char returned by sizeof() will be one(1) by
> definition even though the char is 9-bits and consumes two 8-bit
> bytes.

No. On such a system a byte is 16 bits; there are no 8-bit bytes.

All objects in C can be accessed as arrays of chars. When you copy
one object to another a char at a time, all of the bits of the object
are copied. This is a fundamental property of C.

> That's because C defines a byte to a non 8-bit definition. It
> defines a byte as the address unit or units large enough to contain
> a character. The null character in C is a C byte, 16-bits not
> 9-bits, with all bits cleared.

Correct.

> I.e., 0x0000 would be a null, but 0xFE00 (lower 9-bits cleared)
> would not be. I.e., a null character is not a character in C but a
> byte. The higher bits are ignored for non-null characters, e.g.,
> 0x0041, 0xFE41, 0xA541, etc, would all be an ASCII 'A'.

Would

char foo = 0xFE41;
('A' == foo)

return 1 on such a system? I don't think so. The upper bits of a
char are not "ignored".

Andrew.

Andrew Haley

Sep 2, 2012, 5:21:02 AM

Paul Rubin <no.e...@nospam.invalid> wrote:
> Mark Wills <forth...@gmail.com> writes:
>> The test guys said that buffer overflows should be
>> trapped, logged, and the system halted. I said "Please prove it is
>> possible to overrun the serial input and output buffers".
>
> But that's backwards. They shouldn't have to prove something is
> possible. If you're asserting the checks should be removed, you are the
> one who has to prove overruns are impossible.
>
>> I also argued that a halt was useless - from the user's perspective
>> (who would be some 1800 meters above the embedded device, in the
>> warmth and dryness of a nice drilling rig) a halt was a crash.
>
> If there is a buffer overrun, the result might be much worse than a mere
> crash (where the thing stops operating). The gadget might keep operating
> while doing something completely crazy, setting itself on fire,
> whatever.

How do you know that?

>> Best you can do is re-start the thing.
>
> Yes. That sounds better than letting the program keep running into the
> weeds. It's no longer under the programmer's control, so it's better to
> shut it off.

Maybe it isn't. That depends on the application area. You can't
possibly know until you know what it's doing.

Andrew.

Anton Ertl

Sep 2, 2012, 6:27:03 AM

Paul Rubin <no.e...@nospam.invalid> writes:

>Mark Wills <forth...@gmail.com> writes:
>> I also argued that a halt was useless - from the user's perspective
>> (who would be some 1800 meters above the embedded device, in the
>> warmth and dryness of a nice drilling rig) a halt was a crash.
>
>If there is a buffer overrun, the result might be much worse than a mere
>crash (where the thing stops operating). The gadget might keep operating
>while doing something completely crazy, setting itself on fire,
>whatever.

...

>Yes. That sounds better than letting the program keep running into the
>weeds. It's no longer under the programmer's control, so it's better to
>shut it off.

That thinking blew up the Ariane 5. It would have been totally safe
to ignore the overflow, but the default was to do something that some
people considered better (IIRC it sent error messages on the
internal bus), probably because it's a better-defined result.

Oh, and the software in the Ariane 5 had been proven to be correct.

>Erlang is written around this idea, that software failures are
>inevitable, so there are extensive provisions for recovering from them.
>Programs are organized into isolated processes and there is a
>supervision tree that restarts crashed ones.

That sounds much more sensible for this kind of stuff than just
stopping. OTOH, if the failover works well, the bugs never get fixed.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2012: http://www.euroforth.org/ef12/

Rod Pemberton

Sep 2, 2012, 4:15:23 PM

"Andrew Haley" <andr...@littlepinkcloud.invalid> wrote in message

news:0fWdnfhfNvUYud7N...@supernews.com...

> Rod Pemberton <do_no...@notemailnot.cmm> wrote:
> > "Andrew Haley" <andr...@littlepinkcloud.invalid> wrote in message
> > news:55OdnX4ccq1aUtzN...@supernews.com...
> >> Rod Pemberton <do_no...@notemailnot.cmm> wrote:
> >> > "Mark Wills" <markrob...@yahoo.co.uk> wrote in message
> >> > news:ca821896-0ed0-4314...@r4g2000vbn.googlegroups.com...
> >> > [...]
> >> >
> >> >> For the record: A char is the same as a byte.
> >> >
> >> > Is that apparently open-ended claim _just_ for Forth because this is
> >> > c.l.f.?
> >> >
> >
> > Actually, Forth-94 doesn't define a byte, but it says Forth-83 defines
> > it as 8-bits. A C byte is what the Forth specifications call an
> > "address unit".
> >
> >> > Because, if the open-ended claim is for other languages too, like C,
> >> > then you're just wrong. In C, a byte must be the same size as or
> >> > larger than a char.
> >>
> >> Eh? In C a char is the same size as a byte.
> >
> > No, it's not. I just explained it to you. Read the C specifications
> > some time.
> >
> > For C, a char fits into a byte.
>
> No, a _character_ fits into a byte. chars and characters are not the
> same thing.
>

Wrong. 'char' is the C keyword declaring an object to be of type character.

> > A byte is comprised of one or more addressable units of bits
> > sufficiently large to contain a character. On modern 8-bit
> > byte-addressable machines, they are usually implemented as the same
> > size.
> >
> > E.g., let's take a 16-bit word addressable machine with 9-bit
> > characters. In this case, a char in C is 9-bits and C's byte is
> > 16-bits, not 8-bits.
>
> No. A char on such a system is 16 bits. A character may be 9 bits,
> but a char isn't.
>

Wrong. That's a byte, not a char.

> > The size for the char returned by sizeof() will be one(1) by
> > definition even though the char is 9-bits and consumes two 8-bit
> > bytes.
>
> No. On such a system a byte is 16 bits; there are no 8-bit bytes.

I said the C byte is 16-bits. What you mean is that C doesn't have 8-bit
bytes but has 16-bit bytes. That's true. However, the host machine does
have 8-bit bytes, where a C byte of 16-bits consumes two of them.

> All objects in C can be accessed as arrays of chars. When you copy
> one object to another a char at a time, all of the bits of the object
> are copied. This is a fundamental property of C.
>

No. You're _almost_ correct though.

If you replace your use 'char' here with 'byte', you will be. I.e., all C
objects are arrays of bytes. I can quote for you the C specification, C
Rationale, Johnson & Ritchie, Douglas Gwyn, etc. I.e., none mention
'char' while all mention 'byte'.

> > I.e., 0x0000 would be a null, but 0xFE00 (lower 9-bits cleared)
> > would not be. I.e., a null character is not a character in C but a
> > byte. The higher bits are ignored for non-null characters, e.g.,
> > 0x0041, 0xFE41, 0xA541, etc, would all be an ASCII 'A'.
>
> Would
>
> char foo = 0xFE41;
> ('A' == foo)
>
> return 1 on such a system? I don't think so. The upper bits of a
> char are not "ignored".

You're confusing what is accessible in the C context with what is outside
it. Within the C context, you can't set foo equal to 0xFE41. C's context
only allows 9-bits to be set. Those upper bits are inaccessible from C.

Rod Pemberton

j...@rainbarrel.com

Sep 3, 2012, 2:08:37 PM

> The pfa of the buffer is stored so that accesses to the *same*
> buffer can be detected, thus the buffer management variables
> do not have to be re-computed.
>
> I think the code below is portable (gave it a quick spin in
> MINOS and it ran fine (disclaimer: my system doesn't have
> CELLS+)
>
> It struck me after writing it that if one used an offset to
> reference a buffers' contents rather than an absolute address
> then the code could be simplified somewhat.
>

Diaperglu Forth uses a buffer id which is basically an index into an array of buffer handles. The buffer handles track:
. a pointer to the buffer's memory
. the size of the buffer allocated from the operating system (length of buffer from the operating system's point of view)
. how much to grow the buffer when it overflows
. the maximum allowed size of the buffer
. the current length of the buffer (length of buffer from the user's point of view)
. a 'next free index' for maintaining a linked list of free buffer handles
. a current offset pointer - a convenience for things like parsing
. a 'magic' value so the system can do error checking that this structure is in fact a buffer handle (this is probably not necessary)
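
A much reduced sketch of the buffer-id idea in Forth, assuming a fixed table of handles that track only a memory pointer and an allocated size. This is not Diaperglu's implementation; every name and size here is hypothetical, and growth, the free list and the magic value are left out.

```forth
\ Sketch only: a fixed table of handles; all names and sizes are hypothetical.
8 constant #handles
2 cells constant /handle                    \ fields: memory pointer | allocated size
create handles  #handles /handle * allot
handles #handles /handle * erase            \ pointer = 0 marks a free handle

: >handle ( id -- h )                       \ reject ids outside the table
  dup #handles u< 0= abort" bad buffer id"
  /handle * handles + ;
: h-size ( h -- a-addr )  cell+ ;           \ the pointer field is at h itself

: new-buffer ( size -- id )
  #handles 0 do
    i >handle @ 0= if                       \ first free slot
      dup allocate throw  i >handle !
      i >handle h-size !
      i unloop exit
    then
  loop  true abort" no free buffer handles" ;

: free-buffer ( id -- )
  >handle dup @ 0= abort" buffer not allocated"
  dup @ free throw  0 swap ! ;

: buf-addr ( offset id -- c-addr )          \ checked address of one byte
  >handle dup @ 0= abort" buffer not allocated"
  2dup h-size @ u< 0= abort" offset out of bounds"
  @ + ;
: buf-c@ ( offset id -- c )    buf-addr c@ ;
: buf-c! ( c offset id -- )    buf-addr c! ;

16 new-buffer constant b1
65 3 b1 buf-c!
3 b1 buf-c@ .                               \ prints 65
```

Because callers hold a small integer rather than an address, an obviously bad id or a freed buffer can be rejected before any memory is touched.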

The system you came up with looks like a great start and will probably work well for an embedded system where you have access to the entire system's memory, but will not scale to systems such as Windows or Mac OS X where you are allocating memory from the operating system. Also, it would be nice to be able to free the buffers when you are done with them. The ability to free buffers is pretty much mandatory in any larger application where you are interacting with an operating system such as Windows or Mac OS X. (If I'm wrong about your code not supporting the ability to free the buffers, I apologize.)

Why I tracked the size of the buffer from the operating system's point of view: On Linux, FreeBSD, and probably Mac OS X, you have to allocate memory in units of the system page size. On Linux this is in blocks of 1k bytes. On Mac OS X 5.8 this is in units of 4k bytes.

Using a buffer id instead of addr length makes it faster. If it's just addr length, and you are going to overflow the buffer, and want the buffer to automatically grow, you have to find the buffer handle structure.... I suppose you could just use the address of the buffer handle structure... but using an ID makes it possible to detect obviously erroneous buffer ids.

On the other hand... having one large array for buffer handles could have problems of its own... but it's thoroughly tested, working, written in C, and available under the GNU copyleft license from my website, rainbarrel.com.

In any case, I hope a buffer management system becomes part of the Forth standard. Right now we just have the ALLOCATE, FREE, and RESIZE commands in the standard, which means anyone who is using them on Windows, FreeBSD, Linux, or Mac OS X will have to do their own tracking and bounds checking.

Andrew Haley

Sep 4, 2012, 1:02:27 PM

Rod Pemberton <do_no...@notemailnot.cmm> wrote:
> "Andrew Haley" <andr...@littlepinkcloud.invalid> wrote in message

> news:0fWdnfhfNvUYud7N...@supernews.com...

>> All objects in C can be accessed as arrays of chars. When you copy
>> one object to another a char at a time, all of the bits of the object
>> are copied. This is a fundamental property of C.
>
> No. You're _almost_ correct though.

LOL! Thank you for your kindness.

> If you replace your use 'char' here with 'byte', you will be. I.e.,
> all C objects are arrays of bytes. I can quote for you the C
> specification, C Rationale, Johnson & Ritchie, Douglas Gwyn, etc.
> I.e., none mention 'char' while all mention 'byte'.

And how would you access this array of bytes, if not via a character
type?

>> > I.e., 0x0000 would be a null, but 0xFE00 (lower 9-bits cleared)
>> > would not be. I.e., a null character is not a character in C but a
>> > byte. The higher bits are ignored for non-null characters, e.g.,
>> > 0x0041, 0xFE41, 0xA541, etc, would all be an ASCII 'A'.
>>
>> Would
>>
>> char foo = 0xFE41;
>> ('A' == foo)
>>
>> return 1 on such a system? I don't think so. The upper bits of a
>> char are not "ignored".
>
> You're confusing what is accessible in the C context with what is outside
> it. Within the C context, you can't set foo equal to 0xFE41. C's context
> only allows 9-bits to be set. Those upper bits are inaccessible from C.

Consider this routine:

void memcopy(char *dest, char *src, size_t n) {
    size_t i;   /* same unsigned type as n, so the comparison is well defined */
    for (i = 0; i < n; i++)
        dest[i] = src[i];
}

Used like this:

some_object a, b;

...

memcopy((char *)&a, (char *)&b, sizeof a);

Are you trying to tell us that this is not portable C? And if it is
not, how would you go about writing it?

Andrew.