Bounded pointers

Bounded pointers Ivan Godard 10/15/13 6:09 AM
A while back there were several threads here about bounds checking of
access to memory via pointers. The proposed solutions required, if not
full capabilities, at least a double-width data structure comprising an
address and a length. At the time, I said we had an improvement on that
and would describe it when the relevant filing had been submitted.

That filing is now in, so here's the description. The basic idea is to
represent the address in floating point, and to flag any
address-manipulation operation that alters the exponent. The method is
not as general as a full descriptor and has several drawbacks: checked
regions must be of power-of-two sizes, and erroneous internal
misaddressing (such as mis-alignment of an indexed element) is not
caught. On the other hand, checked pointers are of the same size as
unchecked pointers, and many off-by-one or wild address errors will be
caught by the hardware.

We have (so far) not included this idea in the Mill definition; we are
not sure that the game is worth the candle. It costs six bits out of the
64-bit address space, and as a SAS machine we are careful of address
space. In addition, it will break some language-invalid but functionally
correct code, such as code that moves a pointer off a target array but
then moves it back before dereferencing.

So I solicit your opinion: is this idea of sufficient utility to merit
inclusion?

Here's the description from the provisional:
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1.4        Bounded pointers

Languages such as C that permit arbitrary creation and manipulation of
pointers are notorious for wild-address bugs. A pointer can easily be
made to address a location that does not contain program data, or
contains different data than the logic of the program expects. When the
incorrect pointer is dereferenced to access the pointed-to location the
result may be incorrect program execution or even system crash.

Hardware paging and protection may detect some of these errors, but the
granularity of detection is very large and many bugs are uncaught.
Certain programming languages, such as Java, are immune to pointer wild
addresses; all changes to a pointer value are explicitly checked for
consistency with the program model and will raise an exception if the
check fails. Such checks can also be applied manually in languages like
C. However, the checks have cost in execution time and program and data
space, and are eschewed by many programmers.

The general form of a pointer check presumes that the target of a
pointer is a single program object, frequently an array, and that a
valid pointer will always refer to a location within the address range
occupied by the object. Hence, checking commonly requires retaining the
least and greatest addresses of the occupied range, and verifying that
subsequent pointer manipulation does not cause the pointer to exceed
those bounds. This system is simple and general, but triples the data
required for a checked pointer over that for an unchecked pointer, and
the check itself can add several instructions at each point the pointer
is used. Also note that the range check does not catch errors due to
misaligned pointers within the range.

The present invention provides an economical alternative to the general
range check of pointers. It is not completely general, because the
checked region is required to be a naturally-aligned space of
power-of-two size. However, in practice allocators for power-of-two
spaces are simple and cheap, and the waste space arising from rounding
allocations up to power-of-two boundaries can be reused for other
purposes. The only significant drawback to the present invention, when
compared to the completely general way of checking, is that erroneous
pointer movements within the smallest power-of-two range encompassing
the object are not caught, whereas they would be by the general
solution. Hence the present invention may be considered to be an
economical aid to debugging rather than a panacea.

The present invention alters the representation of an address within a
pointer. Without loss of generality, this discussion will assume 64-bit
addresses. The pointer is given an added field large enough to hold a
bit number, a six-bit field in this case. The remaining bits of the
pointer, 58 in this case, contain the actual address. This reduces the
size of the potential address space from 2^64 to 2^58, which is
unimportant in practice on modern CPUs.

The bit number in the bit number field is interpreted as splitting the
address field into two parts, a chunk address and an offset within the
chunk. Thus if the bit number is 3, then the least significant three
bits of the address comprise the offset and the most significant 55 bits
of the address field comprise the chunk. The pointer in this example
thus indicates a location within an eight byte chunk, beginning at the
offset from the start of the chunk; the actual address is the
concatenation of the chunk and offset fields, interpreted as a byte
address. Necessarily the chunk must have power-of-two size and natural
alignment, as a consequence of the representation used.
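In C terms, the split can be modeled as below. This is a sketch only: the filing does not fix where in the word the fields sit, so placing the six-bit field at the top, and all the bp_* names, are assumptions for illustration.

```c
#include <stdint.h>

/* Assumed layout (not specified in the filing): 6-bit bit-number field
   in the top of a 64-bit pointer, 58 address bits below it. */
#define ADDR_BITS 58
#define ADDR_MASK ((1ULL << ADDR_BITS) - 1)

static inline uint64_t bp_make(uint64_t addr, uint64_t bitnum) {
    return (bitnum << ADDR_BITS) | (addr & ADDR_MASK);
}
static inline uint64_t bp_bitnum(uint64_t p) { return p >> ADDR_BITS; }
static inline uint64_t bp_addr(uint64_t p)   { return p & ADDR_MASK; }

/* With bit number k, the low k address bits are the offset and the
   remaining 58-k bits are the chunk; the byte address is their
   concatenation, i.e. simply the address field itself. */
static inline uint64_t bp_offset(uint64_t p) {
    return bp_addr(p) & ((1ULL << bp_bitnum(p)) - 1);
}
static inline uint64_t bp_chunk(uint64_t p) {
    return bp_addr(p) >> bp_bitnum(p);
}
```

With bit number 3 and address 0x1005, the offset is 5 and the chunk is 0x200, matching the eight-byte-chunk example in the text.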

In use, any modification of the pointer that changes only the offset is
deemed safe and correct, while any operation (other than certain special
operations as described later) that modifies the bit number or chunk
fields is deemed unsafe and incorrect. Thus pointer manipulation within
a chunk is permitted, while manipulation across chunks is not.

This check must be applied at every use of the pointer. While a program
using this pointer representation could perform the checks by added
instructions, it is more economical and less prone to error if the
checks are incorporated into the pointer-manipulating operations in
hardware.

A typical pointer manipulation operation is to increment a pointer's
address to refer to a successor element in an array. The increment
comprises adding a small constant equal to the size of the array element
to the bits of the pointer. On many architectures, pointer
incrementation is performed by the same hardware operation that does
integer arithmetic on pointer-sized numbers. In the context of the
present invention, the need for the extra check mandates that the
pointer-increment operation must be distinct from integer addition, and
the programmer (or the compiler) must distinguish the cases and use the
operation appropriate to the case at hand.

In operation, the pointer-increment operation adds the increment to the
pointer and checks whether the bit number or chunk fields changed. If
not, the increment was solely within the offset and is permitted.
However if the bit number or chunk fields did change then a program
error has been detected and will be reported using whatever facilities
the rest of the architecture provides for reporting of hardware-detected
inconsistencies.
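The check itself is cheap. Under the same assumed layout as before (six-bit bit-number field at the top of the word; the bp_add name is hypothetical), any change outside the low offset bits, whether to the chunk or to the bit-number field itself, is the detected error:

```c
#include <stdint.h>
#include <stdbool.h>

#define ADDR_BITS 58

/* Checked pointer increment: perform the 64-bit add, then fault (here:
   return false) if any bit above the offset field changed. A carry that
   reaches the chunk or bit-number field is caught by the same mask. */
static bool bp_add(uint64_t p, int64_t delta, uint64_t *out) {
    uint64_t off_mask = (1ULL << (p >> ADDR_BITS)) - 1;
    uint64_t q = p + (uint64_t)delta;
    if ((p ^ q) & ~off_mask)
        return false;          /* chunk or bit-number field changed */
    *out = q;
    return true;
}
```

Note that the check is a single xor-mask-test on the result, which is why it can plausibly run in parallel with the add itself, as the last paragraph of the description claims.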

The mechanism to support pointer increment used in this example is also
used for pointer decrement operations and for computing effective
addresses in load, store, load-effective-address, and other operations
in which an address is derived from but not identical to a pointer.
However, all these operations presume an initial pointer whose target
address is being modified. The CPU architecture using the present
invention must provide means by which a pointer may be constructed.
Typical means include conversion between integer and pointer, and
construction from a literal value contained in the program; both these
operations are likely to be implicit (requiring no code) except on
architectures in which data is identified by type at run time.

"Refine" is an operation that can be profitably added to the instruction
set of a CPU using the present invention. The refine operation
decrements the bit number field of a pointer, checking for underflow.
The effect of this is to leave the pointer referring to the same byte
address (the concatenated chunk and offset bits are unchanged, although
the chunk field is larger and the offset smaller), but the size of the
encompassing chunk is reduced for checking purposes. The inverse
operation, which increments the bit number field, is inherently risky
(the range of memory within which the pointer is deemed to be valid is
increased), but may be of use in allocators.
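A sketch of refine under the same assumed field placement (names hypothetical): the 58 address bits, and hence the byte address, are untouched; only the bit number, and so the size of the checked region, changes.

```c
#include <stdint.h>
#include <stdbool.h>

#define ADDR_BITS 58
#define ADDR_MASK ((1ULL << ADDR_BITS) - 1)

/* Decrement the bit-number field, checking for underflow. The chunk
   grows by one bit and the offset shrinks by one bit, so the pointer
   still names the same byte, inside a half-sized chunk. */
static bool bp_refine(uint64_t p, uint64_t *out) {
    uint64_t bitnum = p >> ADDR_BITS;
    if (bitnum == 0)
        return false;                       /* underflow: fault */
    *out = ((bitnum - 1) << ADDR_BITS) | (p & ADDR_MASK);
    return true;
}
```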

If the target datum is not itself of a natural power-of-two size then
the chunk used in pointers referring to it must be larger than the
object itself; the excess may lie in front of the object, behind it, or
both. Moving the pointer from the object into these excess regions will
be an undetected error in the present invention. It is most common for
programs to step through arrays in increasing-address order, so placing
the excess in front of the object (with no excess after it) is more
likely to catch array overruns than other arrangements would.

Operations that use pointers but do not modify them must be aware of but
ignore the bit number field. Thus relational operations, which detect
whether a pointer refers to the same location as another pointer, or to
a greater or lesser address, must compare the concatenated chunk and
offset regardless of the bit number. Similarly, the delta operation,
which determines the byte distance from one address to another, must
subtract only the concatenated chunk and offset, and ignore the bit
number fields.
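Concretely, under the assumed layout used in the earlier sketches (six-bit bit-number field on top; function names hypothetical), relationals and delta simply mask the bit-number field away, so two pointers to the same byte compare equal even if refined to different chunk sizes:

```c
#include <stdint.h>

#define ADDR_MASK ((1ULL << 58) - 1)

/* Compare only the 58 address bits: returns -1, 0, or +1. */
static int bp_cmp(uint64_t a, uint64_t b) {
    uint64_t x = a & ADDR_MASK, y = b & ADDR_MASK;
    return (x > y) - (x < y);
}

/* Byte distance a - b, again ignoring the bit-number fields. */
static int64_t bp_delta(uint64_t a, uint64_t b) {
    return (int64_t)((a & ADDR_MASK) - (b & ADDR_MASK));
}
```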

Any program manipulation that leaves a pointer referring within the same
object as it did before the manipulation is by definition valid.
However, the C++ language also defines that a pointer may validly refer
to one element beyond an array, outside of the addressed object, but
only by one element size, and the result must not be dereferenced. If
the object is of less than power-of-two size this language requirement
can be accommodated by having the allocator place the object one element
from the end of the chunk. Increment of the pointer to the last-plus-one
position will then still be within the chunk and no error will be reported.

However, this stratagem is unavailable if the object is of exactly
power-of-two size, because leaving space for the after-the-end position
would wastefully force a chunk of double the natural size. Instead, the
present invention provides for this C++ usage by defining a single
"valid" bit in the pointer representation; valid pointers may be
dereferenced normally, while attempts to dereference an invalid pointer
result in an exception. Program actions to create a pointer (by
conversion of a literal or an integer datum) produce a valid pointer. A
pointer increment of a valid pointer that causes the chunk field to
increment by one and the offset to become zero marks the pointer as
invalid without raising an exception.

Operations on invalid pointers are restricted. Invalid pointers may be
compared with valid or invalid pointers by relational operation without
error, and the distance between an invalid pointer and another pointer
may be determined, also without error. However address arithmetic (such
as increment, decrement, or computing an effective address) on an
invalid pointer must produce either the original invalid pointer
unchanged, or a pointer into the chunk immediately previous to the one
indicated by the invalid pointer, in which case the produced pointer is
marked valid. These rules permit incrementing a pointer to the
one-beyond-the-end position and then restoring it to refer within the
original object, while preventing access to the past-the-end-position or
to any wild address further on.
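These rules can be sketched as follows. The filing does not say where in the 64 bits the valid bit lives, so this model carries it out-of-band in a struct; the layout of the other fields is the same assumption as in the earlier sketches, and bp_step is a hypothetical name.

```c
#include <stdint.h>
#include <stdbool.h>

#define ADDR_BITS 58

typedef struct { uint64_t bits; bool valid; } bptr;

/* Checked pointer arithmetic with one-past-the-end support: stepping
   exactly to the byte after the chunk marks the pointer invalid rather
   than faulting; an invalid pointer may only stay put or step back into
   the previous chunk, which revalidates it. Anything else faults. */
static bool bp_step(bptr *p, int64_t delta) {
    uint64_t size     = 1ULL << (p->bits >> ADDR_BITS); /* chunk bytes */
    uint64_t off_mask = size - 1;
    uint64_t base     = p->bits & ~off_mask; /* chunk + bit-number bits */
    uint64_t q        = p->bits + (uint64_t)delta;

    if (p->valid) {
        if ((q & ~off_mask) == base) { p->bits = q; return true; }
        if (q == base + size) {            /* one past the end */
            p->bits = q; p->valid = false; return true;
        }
        return false;                      /* escaped the chunk: fault */
    }
    if (delta == 0) return true;           /* invalid pointer unchanged */
    if ((q & ~off_mask) == base - size) {  /* back inside the object */
        p->bits = q; p->valid = true; return true;
    }
    return false;
}
```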

It will be evident that CPU architectures using the present invention
will be able to detect many, but not all, program errors involving
pointers. The data requirement is the same as for unchecked pointers and
no extra code is required for the added safety. Latency of pointer
manipulating operations is not increased because the checking is simple
and may be performed in parallel with the manipulation.
Re: Bounded pointers Ryan Hitchman 10/15/13 12:35 PM
This is essentially hardware-supported Baggy Bounds Checking (Akritidis 2009 - http://research.microsoft.com/pubs/101450/baggy-usenix2009.pdf). See especially section 5.

It's a good low-overhead way to prevent many buffer overruns. Implementing it as completely zero-overhead could enhance usage at the cost of additional hardware complication, but most of the benefits could be gained by making the code sequence in Figure 18 as efficient as possible. Maybe some sort of "branch if upper N bits nonequal"?

In combination with compiler passes that hoist redundant bounds checks out of inner loops, overhead would be very low with a few accelerator instructions. Even on x86-64, the overhead on this kind of bounds checking is quite low, <10%.
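The kind of sequence being discussed might reduce to an xor-and-shift test. A hypothetical sketch, with the log2 allocation size 'e' standing in for the baggy bounds-table lookup (in the real scheme it is fetched from a table indexed by the pointer):

```c
#include <stdint.h>

/* After pointer arithmetic, branch if any bit above the log2 allocation
   size 'e' changed, i.e. "branch if upper N bits nonequal". Returning 0
   stands in for the trap / slow path. */
static uint64_t baggy_add(uint64_t p, int64_t delta, unsigned e) {
    uint64_t q = p + (uint64_t)delta;
    if ((p ^ q) >> e)
        return 0;
    return q;
}
```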
Re: Bounded pointers Andy (Super) Glew 10/15/13 7:16 PM
On 10/15/2013 6:09 AM, Ivan Godard wrote:
> A while back there were several threads here about bounds checking of
> access to memory via pointers. The proposed solutions required, if not
> full capabilities, at least a double-width data structure comprising an
> address and a length. At the time, I said we had an improvement on that
> and would describe it when the relevant filing had been submitted.
>
> That filing is now in, so here's the description. The basic idea is to
> represent the address in floating point, and to flag any
> address-manipulation operation that alters the exponent. The method is
> not as general as a full descriptor and has several drawbacks: checked
> regions must be of power-of-two sizes, and erroneous internal
> misaddressing (such as mis-alignment of an indexed element) are not
> caught. On the other hand, checked pointers are of the same size as
> unchecked pointers, and many off-by-one or wild address errors will be
> caught by the hardware.

Similar ideas have been widely discussed. I.e. much prior art.

I am pretty sure that comp.arch has discussed, years ago, using the
"extra bit" trick to describe power-of-2 sized and aligned regions.

When I worked on this at Intel we considered and rejected this approach.
Insufficient benefit.






--
The content of this message is my personal opinion only. Although I am
an employee (currently Imagination Technologies, which acquired MIPS; in
the past of companies such as Intellectual Ventures/QIPS, Intel, AMD,
Motorola, and Gould), I reveal this only so that the reader may account
for any possible bias I may have towards my employers' products. The
statements I make here in no way represent my employers' positions on
the issue, nor am I authorized to speak on behalf of my employers, past
or present.
Re: Bounded pointers Terje Mathisen 10/15/13 11:13 PM
Andy (Super) Glew wrote:
> Similar ideas have been widely discussed. I.e. much prior art.
>
> I am pretty sure that comp.arch has discussed, years ago, using the
> "extra bit" trick to describe power-of-2 sized and aligned regions.

_You_ did, less than 2 years ago. :-)

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Re: Bounded pointers Ivan Godard 10/16/13 12:58 PM
On 10/15/2013 7:16 PM, Andy (Super) Glew wrote:
> On 10/15/2013 6:09 AM, Ivan Godard wrote:
>> A while back there were several threads here about bounds checking of
>> access to memory via pointers. The proposed solutions required, if not
>> full capabilities, at least a double-width data structure comprising an
>> address and a length. At the time, I said we had an improvement on that
>> and would describe it when the relevant filing had been submitted.
>>
>> That filing is now in, so here's the description. The basic idea is to
>> represent the address in floating point, and to flag any
>> address-manipulation operation that alters the exponent. The method is
>> not as general as a full descriptor and has several drawbacks: checked
>> regions must be of power-of-two sizes, and erroneous internal
>> misaddressing (such as mis-alignment of an indexed element) are not
>> caught. On the other hand, checked pointers are of the same size as
>> unchecked pointers, and many off-by-one or wild address errors will be
>> caught by the hardware.
>
> Similar ideas have been widely discussed. I.e. much prior art.
>
> I am pretty sure that comp.arch has discussed, years ago, using the
> "extra bit" trick to describe power-of-2 sized and aligned regions.
>
> When I worked on this at Intel we considered and rejected this approach.
> Insufficient benefit.

Do you have a link? All I recall on your site was using an extra-bit as
tagging to trigger checks against out-of-line metadata.

About "insufficient benefit": in your opinion, would the same conclusion
be true for a new instruction set that does not have to deal with
legacy asm in an ISA lacking the feature?

Re: Bounded pointers Ivan Godard 10/16/13 1:15 PM
Again! Happens all the time - come up with a great idea only to discover
it is already well known to everybody - but you

The proposal is different from Baggies in that, while both use
power-of-two to achieve single-width pointer checking, the
implementation is for a hardware instruction set and not a software
facility on top of x86. The particular implementation may be novel
enough to survive the PTO, although Baggies will certainly need to be
cited as prior art.


p.s. Thank you.
Re: Bounded pointers Andy (Super) Glew 10/16/13 9:49 PM
I thought the Mondriaan hardware bounds checking proposal used the
power-of-2 scheme, but re-checking their papers seemed to say not.
Still, I have vague memories of some other MIT proposal.

So, although this certainly was something we looked at at Intel, and
something I have thought about since the 1980s, if it did not get posted
on comp.arch, it may not be otherwise public.

I would be surprised if I have not posted about it somewhere, since once
you know the extra bit trick, it's obvious.  But I am too lazy to go
searching - when I retire somebody can hire me to find the prior art to
break your patent. Or not. Hope I can be paid for time spent, success
or not.

By the way, although I consider the extra bit trick obvious, you have
got me thinking about a more completely FP-like implementation, with a
true mantissa and exponent. Possibly even with the hidden bit.  (OK,
fractional bits seem silly...)

I think the extra bit trick suffices to specify all 2^N sized regions,
with N bit address, with just N+1 bits.

But a representation with a K-bit exponent and an M-bit mantissa allows
you to specify many different object sizes in an address space that may
be much larger than 2^(K+M). We might not want to use such a
representation for the actual values in registers, but we might want to
use such a representation for addressing constants, e.g. embedded in
literals.  Perhaps constant pool compaction.  Trouble is, how to choose
K and M?
Re: Bounded pointers Andy (Super) Glew 10/16/13 10:04 PM
On 10/16/2013 12:58 PM, Ivan Godard wrote:
> On 10/15/2013 7:16 PM, Andy (Super) Glew wrote:
>> On 10/15/2013 6:09 AM, Ivan Godard wrote:
>>> A while back there were several threads here about bounds checking of
>>> access to memory via pointers. The proposed solutions required, if not
>>> full capabilities, at least a double-width data structure comprising an
>>> address and a length. At the time, I said we had an improvement on that
>>> and would describe it when the relevant filing had been submitted.
>>>
>>> That filing is now in, so here's the description. The basic idea is to
>>> represent the address in floating point, and to flag any
>>> address-manipulation operation that alters the exponent. The method is
>>> not as general as a full descriptor and has several drawbacks: checked
>>> regions must be of power-of-two sizes, and erroneous internal
>>> misaddressing (such as mis-alignment of an indexed element) are not
>>> caught. On the other hand, checked pointers are of the same size as
>>> unchecked pointers, and many off-by-one or wild address errors will be
>>> caught by the hardware.
>>
>> Similar ideas have been widely discussed. I.e. much prior art.
>>
>> I am pretty sure that comp.arch has discussed, years ago, using the
>> "extra bit" trick to describe power-of-2 sized and aligned regions.
>>
>> When I worked on this at Intel we considered and rejected this approach.
>> Insufficient benefit.
>
> Do you have a link? All I recall on your site was using an extra-bit as
> tagging to trigger checks against out-of-line metadata.

I have been posting to comp.arch much longer than I have been working on
the wiki.

But, no, I don't have a link - although Terje seems to remember that the
comp.arch discussion happened recently, in the last two years or so. (I
have lousy luck searching comp.arch.)


>
> About "insufficient benefit": in your opinion would the same conclusion
> be true for a new instructions set that does not have to deal with
> legacy asm in an ISA lacking the feature?


I don't think that it is so much the assembly as the data structures:

"It is easier to change instruction sets than it is to change the size
and layout of standard data types." E.g. sizeof(pointer).  E.g. offsets
in structs.

And, in particular, look at the security benchmarks on, what is it, the
NIST site. (This is a collection of C programs with various bugs typical
of security bugs, like buffer overflows.) Given a standard compiler,
like gcc/x86, with the ABI-standard data structure addresses, count how
many of them you can detect with 2^N sized and aligned regions. It is
certainly not all, and I don't think it is even most - the strings are
not all 2^N sized.

(You can do the same thing with various real-world buffer overflows
rather than the NIST microbenchmarks. More work dredging through open
source project bug reports.  Perhaps some academic has collected them
into a better benchmark suite.)  By the way, if you don't find several
bugs in the SPEC benchmarks when you have the bounds checking turned on,
you aren't trying hard enough.

Certainly, 2^N checking is better than nothing.

It is sufficient if you can force everyone to change data structure layout.

It will certainly catch some, but not all, buffer overflows in real
world code.

I think it is the usual difference between you and me, Ivan:

At Intel I had to work in a highly constrained environment:
* I could not assume that everyone would recompile
* I could not assume that the compiler could change data layout

Plus, software bounds checking tools are pretty good.  Hardware support
can only be justified if significantly faster and/or better coverage.
(And, IMHO, in the long run can only be justified if they can show a
path to higher levels of security, not just advisory bounds checking.)
Re: Bounded pointers Andy (Super) Glew 10/16/13 10:08 PM
On 10/16/2013 12:58 PM, Ivan Godard wrote:

> About "insufficient benefit": in your opinion would the same conclusion
> be true for a new instructions set that does not have to deal with
> legacy asm in an ISA lacking the feature?

Q: will the mill adhere to any established standards for data structure
layout?

I know, you don't have to.

But doing so increases the chance of people willing to try out a new ISA.

Like I said earlier, I think it is easier to change instruction set
completely than it is to change data structure layout, for a large body
of code that, e.g. saves data to disk, memory maps such files, and
exchanges such files between different ISAs.
Re: Bounded pointers Ivan Godard 10/16/13 10:52 PM
On 10/16/2013 10:08 PM, Andy (Super) Glew wrote:
> On 10/16/2013 12:58 PM, Ivan Godard wrote:
>
>> About "insufficient benefit": in your opinion would the same conclusion
>> be true for a new instructions set that does not have to deal with
>> legacy asm in an ISA lacking the feature?
>
> Q: will the mill adhere to any established standards for data structure
> layout?

I'm not sure where this question comes from. We follow language-standard
layouts throughout. In the places where the languages leave it up to the
implementation:
        1) we support quad
        2) we support both binary and decimal FP, both conforming to IEEE754
        3) we support only 64-bit pointers
        4) we support bit fields that cross boundaries
        5) we support both big- and little-endian memory organization, although
the core internally is little-endian

The floating-point pointers described in the filing are plain 64-bits
each, not 64+1 and not 128 (nor larger).  Why do you think that the Mill
has a non-standard memory layout?



> I know, you don't have to.
>
> But doing so increases the chance of people willing to try out a new ISA.

Yes, we know, which is why we haven't changed memory layouts.

> Like I said earlier, I think it is easier to change instruction set
> completely than it is to change data structure layout, for a large body
> of code that, e.g. saves data to disk, memory maps such files, and
> exchanges such files between different ISAs.

All data that it is possible to port will port in and out of a Mill as
easily as between any other architecture pair. Some data is inherently
not portable: pointers and file descriptors won't port to a different
process, never mind a different ISA. But everything else goes.

Why did you think otherwise? Perhaps you should reread the filing?

Ivan
Re: Bounded pointers Ryan Hitchman 10/17/13 11:39 AM
On Wednesday, October 16, 2013 10:52:59 PM UTC-7, Ivan Godard wrote:
>         4) we support bit fields that cross boundaries

Are unaligned loads that involve two cachelines (cacheline splits) penalized?
Re: Bounded pointers Ivan Godard 10/17/13 12:12 PM
Member dependent.

Individual members configure the number of physical pathways through the
hierarchy. While the two cache accesses involved in a cross-line load
are an atomic pair (nothing can come between them), they are not
necessarily concurrent, so if there are not two available pathways then
there will be a one-cycle penalty. However, due to the use of deferred
loads in the Mill (as described in yesterday's Stanford talk and the
video that will be up soon), Mill code generally does not see cache
latency anyway, so the extra cycle will often be invisible even on a
low-end member with only a single pathway.
Re: Bounded pointers Öö Tiib 10/17/13 2:34 PM
On Tuesday, 15 October 2013 16:09:23 UTC+3, Ivan Godard  wrote:
> A while back there were several threads here about bounds checking of
> access to memory via pointers. The proposed solutions required, if not
> full capabilities, at least a double-width data structure comprising an
> address and a length. At the time, I said we had an improvement on that
> and would describe it when the relevant filing had been submitted.

Your whole Mill seems interesting.

If I understand correctly, you want to split the address bits in a
pointer into mutable and immutable parts (for some "operations"). The
proportions are indicated by the part of the pointer that does not
encode an address (you call it the "bit number").

A suggestion (that I did not see addressed; sorry if it actually is).
Make sure that there is a state where the whole pointer is immutable for
those "operations". Like if the "bit number" is 0, then the pointer is
immutable. Make it easy for a program to set it. Rationale: sometimes
programmers accidentally navigate with pointers that are not meant
for that (a pointer to some member, not an array element).

Then that "invalid pointer". Do you use a 7th bit for "invalidness"?
Sorry, it may be obvious; I perhaps need to do some calculations
myself. Anyway, I see several ways that 'std::vector' in C++ (the most
popular "array" there) may benefit from such an approach.

About incrementing and decrementing pointers: that left me a bit
unclear. Usually it does not happen by one byte (or word or bit,
whatever the granularity of "address" in that architecture) but by the
length of an array element or by the offset of a member object of a
struct. So that "immediately previous to the one indicated by the
invalid pointer" is likely either an ambiguous description of the idea
or a somewhat useless location.

Typical memory managers provide memory in certain chunks. I don't know
any that give less than 16 bytes; it just is not rational in the current
world of cheap memory. That means there seem to be even more ways
to fine-tune things. It benefits both the architecture and the operating
system if each assumes a sane attitude from the other. Also, what do you
do with the unused "bit number" values (59, 60, 61, 62, 63)?

Feel free to use these thoughts for the benefit of your Mill, whether
it is an academic project or a real thing ... I like it. :)
Re: Bounded pointers Andy (Super) Glew 10/17/13 9:41 PM
On 10/16/2013 10:52 PM, Ivan Godard wrote:
> On 10/16/2013 10:08 PM, Andy (Super) Glew wrote:
>> On 10/16/2013 12:58 PM, Ivan Godard wrote:
>>
>>> About "insufficient benefit": in your opinion would the same conclusion
>>> be true for a new instructions set that does not have to deal with
>>> legacy asm in an ISA lacking the feature?
>>
>> Q: will the mill adhere to any established standards for data structure
>> layout?
>
> I'm not sure where this question comes from. We follow language-standard
> layouts throughout. In the places where the language leave it up to the
> implementation:
>      1) we support quad
>      2) we support both binary and decimal FP, both conforming to IEEE754
>      3) we support only 64-bit pointers
>      4) we support bit fields that cross boundaries
>      5) we support both big- and little-endian memory organization,
> although the core internally is little-endian
>
> The floating-point pointers described in the filing are plain 64-bits
> each, not 64+1 and not 128 (nor larger).  Why do you think that the Mill
> has a non-standard memory layout?

Our usual terminology mismatch.  What you list above are data element
SIZES, not data layout (or, only the most basic part of data layout).

As far as I know, no language standard (except Ada) defines data layout.
C/C++ are a slight exception: they say that structure elements must be
at increasing addresses, but otherwise you can add padding.

What I mean by data layout:

When you say

struct S {
      uint8_t a;
      uint64_t b;
      uint32_t c;
}

what are sizeof(S),
the alignment of S (aligned on an address that is a multiple of N),
and the offsets of a, b, and c?

packed 13/1/0,1,9
typical x86 16/4/0,4,12
     - many x86 systems only align uint64_t on a 32 bit boundary
fully aligned and padded to ensure alignment 24/20/0,8,16
fully aligned and padded to cache line bank 32/20/0,8,16

etc.
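The "packed" row can be pinned down concretely; a small check of those numbers, assuming a compiler that honors #pragma pack (the other rows depend on the ABI in use):

```c
#include <stddef.h>
#include <stdint.h>

/* Packed layout of the struct above: no padding, so
   sizeof = 13, alignment = 1, offsets a=0, b=1, c=9. */
#pragma pack(push, 1)
struct S_packed {
    uint8_t  a;   /* offset 0 */
    uint64_t b;   /* offset 1 */
    uint32_t c;   /* offset 9 */
};
#pragma pack(pop)
```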


Ada does a good job of allowing programmer specification of data layout,
and if the programmer doesn't specify, I think allows a lot of freedom
to the compiler.

Many languages make "packed" moderately well defined, and give compiler
freedom otherwise.

C/C++ only say that the offsets are monotonically increasing, but not
much else.

ABIs often define data layout in more detail.

I think Java allows much rearrangement of fields.


Reason I ask:  when you have 2^N sizes, it is *nice* to sort fields from
largest to smallest, to reduce fragmentation:

struct S {
      uint64_t b;
      uint32_t c;
      uint8_t a;
}



But you aren't allowed to do that in C/C++.
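The padding effect is easy to see by comparing the two declaration orders directly. The exact sizes below are what a typical LP64 ABI produces; only the direction of the inequality is portable:

```c
#include <stdint.h>

/* Declaration order as in the first struct: 1 byte of 'a', then (on a
   typical LP64 ABI) 7 bytes of padding before the 8-byte 'b', 4 bytes
   of 'c', and 4 bytes of tail padding: 24 bytes in all. */
struct Declared { uint8_t a; uint64_t b; uint32_t c; };

/* Sorted largest-to-smallest: 8 + 4 + 1 + 3 bytes of tail padding,
   16 bytes in all on the same ABI. */
struct Sorted   { uint64_t b; uint32_t c; uint8_t a; };
```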



Still, naturally aligning all fields to 2^ceil(log2(sizeof(field)))
will make 2^N bounds more secure. In fact, I think as secure as
arbitrary base,bounds.

But you aren't allowed to do that in many ABIs.

And it wastes a lot of memory.

see, for example
"An empirical investigation of OR indexing." ACM SIGMETRICS Performance
Evaluation Review 17.2 (1990): 41-49.





Now, I say again, 2^N bounds are better than nothing, even if you must
use a legacy data layout.

They will catch many more bugs if you can use a custom data layout.


So I was just asking if you were considering a custom data layout.



I was guessing that you were willing to use a more custom data layout
than I would be.






> All data that is possible to port will port in and out of a Mill as
> easily as between any other architecture pair. Some data is inherently not
> portable: pointers and file descriptors won't port to a different
> process, never mind a different ISA. But everything else goes.

Umm, I work with HSA, the Heterogeneous System Architecture.

This decidedly does try to make pointers port to different ISAs.


> Why did you think otherwise? Perhaps you should reread the filing?

I haven't read the filing at all, Ivan.

Working engineers at most companies, in my experience, are under
direction from corporate legal not to go and read patents.
Re: Bounded pointers Ivan Godard 10/17/13 10:31 PM
You have me mightily confused. The Mill is *hardware*. Hardware does not
define data organization above the data elements that are operands of
*hardware* operations. Thus in C terms it does not define structs,
unions, or arrays (although it might define "vectors", which are not the
same as arrays). Anything not eaten by a functional unit is the province
of languages, compilers, runtime systems and OSs, not of a hardware spec.

The Mill *does* define (for example) the "layout" of quad decimal FP
(and defines it to be as specified in IEEE754R). It does not define the
layout of structs, because the Mill has no operations that take structs
as input. The compilers can do whatever they want.

The proposal, had you read it, explains that it is possible to  check
pointer arithmetic 1) using 64 bit pointers with no metadata; 2) within
the bounds of a power-of-two granule; 3) but not catching misalignment
within a granule; 4) nor off-object-but-within-granule when the object
is smaller than its granule.

As all of memory is a power-of-two, the checking is satisfied if all
allocators use the maximal granule throughout; there will be no
unexpected exceptions, though of course there will be no checking either.


 >> All data that is possible to port will port in and out of a Mill as
 >> easily as between any other architecture pair. Some data is inherently not
 >> portable: pointers and file descriptors won't port to a different
 >> process, never mind a different ISA. But everything else goes.
 >
 > Umm, I work with HSA, the Heterogeneous System Architecture.
 >
 > This decidedly does try to make pointers port to different ISAs.

Not hardware value of pointer type you don't. You may reconstruct a
pointer at the destination that refers to the reconstructed object that
the original referenced, but you did not copy the bit pattern, which is
all the hardware knows.

 >> Why did you think otherwise? Perhaps you should reread the filing?
 >
 > I haven't read the filing at all, Ivan.
 >
 > Working engineers at most companies, in my experience, are under
 > direction from corporate legal not to go and read patents.


That explains a lot. Perhaps you should not comment on postings you
haven't read; it confused the rest of us.
Re: Bounded pointers robert...@yahoo.com 10/17/13 11:33 PM
On Thu, 17 Oct 2013 21:41:09 -0700, "Andy (Super) Glew"
<an...@SPAM.comp-arch.net> wrote:

>As far as I know, no language standard (except Ada) defines data layout.
>  C/C++ slight exception: they say that structure elements must be in
>increasing address, but otherwise you can add padding.


Cobol comes reasonably close.  While many of the important basic types
do have some implementation-defined aspects (including range and exact
representation*), the way a structure ("record") is laid out is
defined, and there is no padding unless you ask for it (by declaring
some or all of the structure "SYNCHRONIZED").


*For example, a "USAGE BINARY-SHORT SIGNED" item must be binary and
support a range of -32767 to +32767; like the equivalent C type it
could be bigger, and could be one's complement, big or little endian,
etc.
Re: Bounded pointers Ivan Godard 10/18/13 12:12 AM
Yeah, but that's COBOL talking, not hardware. Andy asked whether the
*Mill* restructured data layout. The Mill doesn't, although COBOL, on a
Mill or otherwise, might.
Re: Bounded pointers Terje Mathisen 10/18/13 1:52 AM
Ivan Godard wrote:
> On 10/17/2013 11:33 PM, Robert Wessel wrote:
>> *For example a "USAGE BINARY-SHORT SIGNED" item must be binary and
>> support a range of -32767 to +32767, it could, like the equivalent C
>> type be bigger, and could be one's complement, big or little endian,
>> etc.
>>
>
> Yeah, but that's COBOL talking, not hardware. Andy asked whether the
> *Mill* restructured data layout. The Mill doesn't, although COBOL, on a
> Mill or otherwise, might.

The Mill could require all data items to be "naturally aligned", i.e.
all long doubles have to start on a 16-byte boundary, or it can allow
arbitrary alignment (which is what I think it does, right?)

Once upon a time I used an interesting compiler on my Turbo Pascal code
(sorry, I can't remember the name just now...), it had C, ModulaII and
Pascal front ends and a common back end.

Using Pascal it would rearrange the order of local variables in a
procedure, so that they would be properly aligned and so that any pair
of small (i.e. char/byte) variables that were updated near each other
could be written with a single 16-bit store instead of two 8-bit operations.
Re: Bounded pointers Bill Findlay 10/18/13 5:04 AM
On 18/10/2013 05:41, in article 5260BBE5...@SPAM.comp-arch.net, "Andy
(Super) Glew" <an...@SPAM.comp-arch.net> wrote:
 
> As far as I know, no language standard (except Ada) defines data layout.

Just to be clear: It is not the language that defines data layout, it is the
programmer using standard Ada language features who does that; failing
which, the compiler chooses according to implementation-dependent criteria.

> What I mean by data layout:
>
> When you say
>
> struct S {
>       uint8_t a;
>       uint64_t b;
>       uint32_t c;
> }
>
> what are sizeof(S),
> the alignment of S (aligned on an address that is a multiple of N)
> and the offsets of a, b, and c?
>
> packed 13/1/0,1,9
> typical x86 16/4/0,4,12
>      - many x86 systems only align uint64_t on a 32 bit boundary
> fully aligned and padded to ensure alignment 24/20/0,8,16
> fully aligned and padded to cache line bank 32/20/0,8,16
> ...
> Ada does a good job of allowing programmer specification of data layout,
> and if the programmer doesn't specify, I think allows a lot of freedom
> to the compiler.

Yup.  
E.g. the first of your layouts (I have not thought about the others):
 
   type S is record
      a : Unsigned_8;
      b : Unsigned_64;
      c : Unsigned_32;
   end record with Alignment => 64, Size => 104;

   for S use record
      a at 0 range 0 ..  7;
      b at 1 range 0 .. 63;
      c at 9 range 0 .. 31;
   end record;

If you do not want to go into that level of detail, you can tell the
compiler to focus on space-saving, or time-saving; or just let it use its
defaults.  To be honest, you cannot expect the compiler to implement
arbitrary layouts. They have to be reasonably implementable, FSVO
reasonably; e.g., specifying an alignment of 17 and a size of 113 might well
evoke an error message.

--
Bill Findlay
with blueyonder.co.uk;
use  surname & forename;

Re: Bounded pointers Paul A. Clayton 10/18/13 8:33 AM
On Friday, October 18, 2013 1:31:27 AM UTC-4, Ivan Godard wrote:
[snip]
>>> On 10/16/2013 10:08 PM, Andy (Super) Glew wrote:
[snip]
>>>> Q: will the mill adhere to any established standards for data structure
>>>> layout?
[snip]
> You have me mightily confused. The Mill is *hardware*. Hardware does not
> define data organization above the data elements that are operands of
> *hardware* operations. Thus in C terms it does not define structs,
> unions, or arrays (although it might define "vectors", which are not the
> same as arrays). Anything not eaten by a functional unit is the province
> of languages, compilers, runtime systems and OSs, not of a hardware spec.

Architectures can provide ABIs which define not only
translation of C types to architectural types but
also provide more details of structure layout than C
provides. Providing such information in the ABI has
the obvious benefit that different compilers are
pushed to greater interoperability (avoiding the need
for competing compiler teams to come to an agreement).

As Andy Glew pointed out, there are cases where
compatibility of data layout is desired across ISAs.

------

I dislike the absolute constraint on data layout based
on structure layout. In my opinion, there is a
difference between a layout that is most friendly to
the programmer and a layout (or even layouts) that are
most friendly to execution. ISTR reading that Sun used
an array-of-structures to structure-of-arrays
transformation on the SPEC CPU art benchmark component;
such is presumably illegal in C but legal for SPEC CPU.

I have suggested the concept of implementation
overloading, where more than one expression with the
same basic meaning could be provided--similar to method
overloading except that instead of differing based on
different object/parameters, the difference would be
based on context such as hardware used or even dataset
characteristics.

Andy Glew suggested a similar concept with more
constrained use but with an emphasis on proven
equivalence. His concept would allow providing
assembly code versions of code written in a
higher level (more programmer-friendly) language:
http://blog.andy.glew.ca/2012/04/compiler-use-and-prove-asm.html
Re: Bounded pointers Ivan Godard 10/18/13 10:05 AM
On 10/18/2013 1:52 AM, Terje Mathisen wrote:
> Ivan Godard wrote:
>> On 10/17/2013 11:33 PM, Robert Wessel wrote:
>>> *For example a "USAGE BINARY-SHORT SIGNED" item must be binary and
>>> support a range of -32767 to +32767, it could, like the equivalent C
>>> type be bigger, and could be one's complement, big or little endian,
>>> etc.
>>>
>>
>> Yeah, but that's COBOL talking, not hardware. Andy asked whether the
>> *Mill* restructured data layout. The Mill doesn't, although COBOL, on a
>> Mill or otherwise, might.
>
> The Mill could require all data items to be "naturally aligned", i.e.
> all long doubles have to start on a 16-byte boundary, or it can allow
> arbitrary alignment (which is what I think it does, right?)

Yes, we support unaligned. But assuming that we did not, the restriction
would affect *access*, not structure. The hardware would not prevent the
compiler from mapping a struct any way it wanted, but would require that
accessing parts (or all) of the struct conform to hardware access rules,
possibly requiring multiple loads that the code then LEGOs together,
just like we do with bit fields now.

Hardware is *not* a high level language :-)

> Once upon a time I used an interesting compiler on my Turbo Pascal code
> (sorry, I can't remember the name just now...), it had C, ModulaII and
> Pascal front ends and a common back end.
>
> Using Pascal it would rearrange the order of local variables in a
> procedure, so that they would be properly aligned and so that any pair
> of small (i.e. char/byte) variables that were updated near each other
> could be written with a single 16-bit store instead of two 8-bit
> operations.

And that is legal for the *compiler* to do. The hardware has nothing to
say about it.

Re: Bounded pointers Chris Gray 10/18/13 12:44 PM
"Paul A. Clayton" <paaron...@gmail.com> writes:

> I dislike the absolute constraint on data layout based
> on structure layout. In my opinion, there is a
> difference between a layout that is most friendly to
> the programmer and a layout (or even layouts) that are
> most friendly to execution.

My small contribution here:

My language Zed has both 'struct' and 'record' types. 'struct' is pretty much
a C 'struct'. 'record' is like those in AlgolW - they are always referenced
via a pointer, and can only be created by allocation.

For 'struct' types, the compiler strictly follows the order of the fields as
declared, and only inserts padding as needed (8 byte alignment for 64 bit
values).  For 'record' types, the compiler is free to do whatever it wants,
and it does do that. For example, if you scatter a bunch of 'bool' fields
through a 'record', they will usually end up all together. Similar for larger
fields that are less than the full 8 byte alignment. The same offset
allocation code is re-used for variable allocation. Note that 'capsule's
(Zed's classes) do the same, including when adding fields to a capsule being
extended.

--
-Chris Gray
Re: Bounded pointers Ivan Godard 10/18/13 1:05 PM
A reasonable language approach. My Mary2 did it similarly but with
different syntax: by default the Mary equivalent rearranged fields the
way your record types do, but rather than another type constructor (your
"struct") we used a type attribute "sic" that could be applied to other
types as well. Thus "sic struct {...}".

Another use for "sic" was to define peculiar FP representations,
especially in sub-word FP where there is no standard representation
across machines and often no hardware and several different software FP
reps in the same program. Without "sic" you got the compiler's choice of
representation that was big enough to hold the exponent range and value
precision given in the type; with "sic" you had bit-level control over
the rep.
Re: Bounded pointers willia...@gmail.com 10/18/13 6:31 PM
Patches to add bounds checking into gcc http://williambader.com/bounds/example.html

Modules compiled with and without bounds checking may be linked together without special preparation.
Checked pointers may be passed to modules compiled without bounds checking.
Structures containing pointers may be passed between modules compiled with and without bounds checking.
Modules compiled without bounds checking still check some pointer accesses.
Checked pointers (pointers created by malloc or by modules compiled with bounds checking) are checked in calls to the str* and mem* library functions and in calls to modules compiled with bounds checking.
The checks are similar to a malloc debug library, except more powerful because they also check local and static variables created in modules compiled with checking.
The GCC_BOUNDS_OPTS environment variable passes options to the bounds checker.
The -never-fatal option continues after errors.
The -print-heap option shows memory leaks.
Re: Bounded pointers Andy (Super) Glew 10/21/13 2:01 PM
On 10/17/2013 10:31 PM, Ivan Godard wrote:
> On 10/17/2013 9:41 PM, Andy (Super) Glew wrote:

 >  > Working engineers at most companies, in my experience, are under
 >  > direction from corporate legal not to go and read patents.
 >
 > That explains a lot. Perhaps you should not comment on postings you
 > haven't read; it confused the rest of us.

Ivan: it appears that my assumptions about what you were proposing are
correct - assuming that what you say in the post I am replying to is
correct:

There's nothing particularly fancy about it.


>  > I was guessing that you were willing to use a more custom data layout
>  > than I would be.
>
> You have me mightily confused. The Mill is *hardware*. Hardware does not
> define data organization above the data elements that are operands of
> *hardware* operations. Thus in C terms it does not define structs,
> unions, or arrays (although it might define "vectors", which are not the
> same as arrays). Anything not eaten by a functional unit is the province
> of languages, compilers, runtime systems and OSs, not of a hardware spec.
>
> The Mill *does* define (for example) the "layout" of quad decimal FP
> (and defines it to be as specified in IEEE754R). It does not define the
> layout of structs, because the Mill has no operations that take structs
> as input. The compilers can do whatever they want.


> The proposal, had you read it, explains that it is possible to  check
> pointer arithmetic 1) using 64 bit pointers with no metadata; 2) within
> the bounds of a power-of-two granule; 3) but not catching misalignment
> within a granule; 4) nor off-object-but-within-granule when the object
> is smaller than its granule.

Which is what I assumed.


> As all of memory is a power-of-two, the checking is satisfied if all
> allocators use the maximal granule throughout; there will be no
> unexpected exceptions, though of course there will be no checking either.

And this is what I was commenting about.  You seem to be happy with
allocators using power-of-2 sizes.  I was not.

Which I think amounts to our usual disagreement: you seem to be happy
requiring software to change.  I am not.

Computer architects think about not just hardware, not just compilers,
but also the ecosystem.

If you think that your customers, the customers who might want to use
your checked memory, would be happy with power-of-2 sizes, power to you!


--
The content of this message is my personal opinion only. Although I am
an employee (currently Imagination Technologies, which acquired MIPS; in
the past of companies such as Intellectual Ventures/QIPS, Intel, AMD,
Motorola, and Gould), I reveal this only so that the reader may account
for any possible bias I may have towards my employers' products. The
statements I make here in no way represent my employers' positions on
the issue, nor am I authorized to speak on behalf of my employers, past
or present.
Re: Bounded pointers Andy (Super) Glew 10/21/13 2:13 PM
On 10/18/2013 12:12 AM, Ivan Godard wrote:
> On 10/17/2013 11:33 PM, Robert Wessel wrote:
>> On Thu, 17 Oct 2013 21:41:09 -0700, "Andy (Super) Glew"
>> <an...@SPAM.comp-arch.net> wrote:
>>
>>> As far as I know, no language standard (except Ada) defines data layout.
>>>   C/C++ slight exception: they say that structure elements must be in
>>> increasing address, but otherwise you can add padding.
>>
>>
>> Cobol comes reasonably close.  While many of the important basic types
>> do have some implementation defined aspects (include range and exact
>> representation*), the way a structure ("record") is laid out is
>> defined, and there is no padding unless you ask for it (by declaring
>> some or all of the structure "SYNCHRONIZED").
>>
>>
>> *For example a "USAGE BINARY-SHORT SIGNED" item must be binary and
>> support a range of -32767 to +32767, it could, like the equivalent C
>> type be bigger, and could be one's complement, big or little endian,
>> etc.
>>
>
> Yeah, but that's COBOL talking, not hardware. Andy asked whether the
> *Mill* restructured data layout. The Mill doesn't, although COBOL, on a
> Mill or otherwise, might.

OK, Ivan, how about

"Do you (Out of the Box Computing) or does the software ecosystem that
you have or hope to develop for the Mill, or for a hypothetical system,
which might include the Mill or some other CPU, which uses the
power-of-2 memory checking we have been discussing, expect to use some
standard data layout?   Or do you think that your customers would be
willing to live with more use of power-of-2 sized objects, whether
returned by dynamic allocators like malloc, or defined as fields of a
struct or elements of an array?"

Long way of saying it.  Have I got all the qualifications in the right
places?
Re: Bounded pointers Andy (Super) Glew 10/21/13 2:16 PM
On 10/18/2013 6:31 PM, willia...@gmail.com wrote:
> Patches to add bounds checking into gcc http://williambader.com/bounds/example.html
>
> Modules compiled with and without bounds checking may be linked together without special preparation.
> Checked pointers may be passed to modules compiled without bounds checking.
> Structures containing pointers may be passed between modules compiled with and without bounds checking.
> Modules compiled without bounds checking still check some pointer accesses.
> Checked pointers (pointers created by malloc or by modules compiled with bounds checking) are checked in calls to the str* and mem* library functions and in calls to modules compiled with bounds checking.
> The checks are similar to a malloc debug library, except more powerful because they also check local and static variables created in modules compiled with checking.
> The GCC_BOUNDS_OPTS environment variable passes options to the bounds checker.
> The -never-fatal option continues after errors.
> The -print-heap option shows memory leaks.
>

Cool.   It will be interesting to compare your patches to whatever Intel
makes available to use MPX.

It would be particularly embarrassing if your pure SW implementation
turned out to be faster than Intel's using the MPX instructions.
Re: Bounded pointers Ivan Godard 10/21/13 2:33 PM
On 10/21/2013 2:01 PM, Andy (Super) Glew wrote:
<snip>


>> As all of memory is a power-of-two, the checking is satisfied if all
>> allocators use the maximal granule throughout; there will be no
>> unexpected exceptions, though of course there will be no checking either.
>
> And this is what I was commenting about.  You seem to be happy with
> allocators using power-of-2 sizes.  I was not.
>
> Which I think amounts to our usual disagreement: you seem to be happy
> *requiring* software to change.  I am not.

> Computer architects think about not just hardware, not just compilers,
> but also the ecosystem.
>
> If you think that your customers, the customers who might want to use
> your checked memory, would be happy with power-of-2 sizes, power to you!
>
>

Again, if you read the writeup you would see that it does not *require*
software to do anything. The software/OS/RTS can treat all of process
memory as a monolith (maximal granule) and allocate from that exactly as
it does on a conventional machine. Hardware checks will always pass, because the
address arithmetic always results in a pointer that still points into
all-of-memory.

However, if the allocator *wants* finer granule checking without the
overhead and/or program rewrite of a software checker, it *can* bind its
allocations into the smallest encompassing power-of-two granule. The
allocation itself is not restricted to power-of-two, but address errors
will only be caught if they move the address outside the granule as well
as outside the object.

More, restriction may nest. mmap() may or may not return a bounded
pointer encompassing the mmap'd space, from which malloc may *if it
wants* return bounded pointers to malloc'd objects. If mmap returns
bounded, then arithmetic that moves a pointer outside the mmap will
fault. If malloc returns bounded, then arithmetic that moves outside the
object wrapper will fault regardless of what mmap did.

You are creating a straw man, arguing against what the proposal doesn't
do in the first place. Please don't treat me like an idiot - I know
perfectly well that a CPU design will fail in the market if it tries to
*force* changes to existing programs or HLLs, and *nothing* in the Mill
forces that.

Well, in my ire I am somewhat exaggerating: the Mill *does* force 64-bit
pointers and twos complement arithmetic. So what.

To repeat the question I asked here: use of this facility is not
required, and if allocators do use it then some "working" programs that
violate language standards will get faults. Will the user community 1)
not bother to use it, or 2) use it and complain that the CPU is broken
when their code faults?

In other words, would it be used, and if used, will the gain in error
checking be worth the nuisance?

I would appreciate a real, thoughtful answer based on what the proposal
*actually does*.



Re: Bounded pointers Öö Tiib 10/21/13 2:52 PM
On Tuesday, 22 October 2013 00:33:24 UTC+3, Ivan Godard  wrote:
> I would appreciate a real, thoughtful answer based on what the proposal
> *actually does*.

Most implementations of C++ 'std::vector' already double the
capacity of the underlying buffer when the vector runs out of room.
Doubling a buffer results in the next power of 2 anyway. Some people
classify that as "inefficiency", but in practice it is the opposite: a
well-written C++ program is very hard to beat in performance.

Your design fits perfectly with most popular container of a popular
well-performing language. Checked iterators of vector for free? That
alone sounds like good market so Andy is likely just jealous. ;)
Re: Bounded pointers Ivan Godard 10/21/13 3:05 PM
On 10/21/2013 2:13 PM, Andy (Super) Glew wrote:

>
> OK, Ivan, how about
>
> "Do you (Out of the Box Computing) or does the software ecosystem that
> you have or hope to develop for the Mill, or for a hypothetical system,
> which might include the Mill or some other CPU, which uses the
> power-of-2 memory checking we have been discussing, expect to use some
> standard data layout?   Or do you think that your customers would be
> willing to live with more use of power-of-2 sized objects, whether
> returned by dynamic allocators like malloc, or defined as fields of a
> struct or elements of an array?"
>
> Long way of saying it.  Have I got all the qualifications in the right
> places?
>

No, you haven't.

OOTBC does not propose to define or require data structure format beyond
that of the individual operand, at all. That is: we don't accept ones
complement numbers, but you can organize your structs any way you like.

We expect that existing ecosystems will migrate to the Mill using their
existing data-format ABIs unchanged. We have done our best to ensure
that the Mill hardware will not get in the way of the designers of
future ecosystems as yet undreamt of.

By not reading the proposal, you have confused power-of-two objects with
power-of-two checking granules. This proposal and the Mill generally
have no idea what constitutes a program object, and the proposal does
not and cannot check violations of object addressing.

The proposal permits pointer addresses to be bound within a checking
granule, using a single-width representation of checked pointers. It
then checks that address arithmetic remains within a *granule*. The
granule is power-of-two sized and aligned, regardless of the size and
alignment of the pointed-to object.

What granule size should be used for a given allocation is up to the
allocator. Two cases seem most useful. In the one case, the granule is
the smallest power-of-two convex hull of the object. In the other, the
granule is the largest possible one, encompassing the process's whole
address space. In the former, moves outside the object but within the
granule are not caught, but moves outside the granule will fault. In the
latter, no moves of any kind will fault.

There is no requirement that distinct objects have distinct granules.

There is no requirement that the layout of objects be changed.

There is no requirement that the algorithm used by or the addresses
returned from allocators be changed.

There is no requirement that the checking be used at all, and it won't
be unless code is added to the allocator to set a granule.

Relational, increment, decrement, assignment and delta operations on
pointers work on checked pointers as they do on any machine. There are
some additional operations for granule query and management; you don't
have to use them.

Certain programs that assume (in violation of language standards) that
pointers are integers may fail. This is true of Mill code in general;
even without this proposal Mill pointers are not simple integers. For
example, the program cannot assume that the upper byte of a pointer can
be used to store flags, as was assumed in early Apples.

The proposal is an aid to debugging only. It cannot be used to enforce
system or application security.
Re: Bounded pointers Andy (Super) Glew 10/23/13 10:30 PM
Blah, blah, blah.

Do you think that overflowing a non-power-of-2 object within a
power-of-2 granule, overflowing into another object within the same
granule, without detecting an overflow is acceptable?


> The proposal is an aid to debugging only. It cannot be used to enforce
> system or application security.

Right.
Re: Bounded pointers Ivan Godard 10/23/13 11:25 PM
On 10/23/2013 10:30 PM, Andy (Super) Glew wrote:
> On 10/21/2013 3:05 PM, Ivan Godard wrote:


<snip>

>> There is no requirement that the checking be used at all, and it won't
>> be unless code is added to the allocator to set a granule.
>
> Blah, blah, blah.
>
> Do you think that overflowing a non-power-of-2 object within a
> power-of-2 granule, overflowing into another object within the same
> granule, without detecting an overflow is acceptable?

Acceptable?  When the alternative is no checking at all?

The proposal is unfortunately minimal, I agree. But something is better
than nothing IMO, and I personally welcome all the checking I can get,
complete or not. However, I am not the target market. For the market, I'm
much less worried about what we don't catch than I am about user
reactions when we *do* catch something in their "working" programs.

The checking alternative that I know of I consider unacceptable to the
market:

1) wide pointers will break far too many programs that are otherwise
valid and do not in fact violate address restrictions

2) extra-bit schemes are too invasive of the OS to have hope of being
incorporated into commercial OSs

3) software checking by compiler modification is also unlikely to be
adopted by major tools, and in any case runs afoul of the
third-party-code problem. Such checking is useful when a bug is known to
exist and is being looked for; no software scheme seems suitable for
always-on production code usage.

The proposal is not invasive, and requires no app, compiler, or OS
changes and minimal (one function call) changes to allocators, so it has
some chance of adoption. On the other hand, while it will catch truly
wild addresses it will miss many of the common "off by one" bugs. The
hardware cost (mostly entropy from needing distinct pointer-arithmetic
operations) is trivial.


>
>> The proposal is an aid to debugging only. It cannot be used to enforce
>> system or application security.
>
> Right.
>
>

Exactly: no more has ever been claimed. The question remains: will the
occasional and incomplete aid be appreciated enough by the developer
community to outweigh those who don't want to hear about faults?

This is a user-psyche question that I put to the group here, not a
technical one. The proposal cannot be changed for tighter checking; it
is inherently limited, so complaining that it isn't tight enough is
pointless - it is what it is. As it stands, I would still welcome your
opinion on whether incorporating it in the Mill would be, on average, seen
as a bug or a feature by our potential market.

Ivan
Re: Bounded pointers Noob 10/24/13 1:24 AM
Ivan Godard wrote:

> We have (so far) not included this idea in the Mill definition; we are
> not sure that the game is worth the candle. It costs six bits out of the
> 64-bit address space, and as a SAS machine we are careful of address
> space. In addition, it will break some language-invalid but functionally
> correct code, such as code that moves a pointer off a target array but
> then moves it back before dereferencing.

Hmmm, UB is UB, nasal demons and all.

But I get your point. If the new kid on the block faults while
the other players don't flinch, people are going to blame the
new kid...

cf. the memcpy/memmove Linus (yes, "s") fiasco.

Or this one:
int foo(struct toto *p) {
  int n = p->u;
  if (p == NULL) <- optimized away

Or this one:
for (int i=1; i > 0; ++i) { <- optimized into an infinite loop

Re: Bounded pointers Terje Mathisen 10/24/13 1:45 AM
Noob wrote:
> cf. the memcpy/memmove Linus (yes, "s") fiasco.

Link?
>
> Or this one:
> int foo(struct toto *p) {
>    int n = p->u;
>    if (p == NULL) <- optimized away

That is simply broken code, it should fault on the "int n..." statement.
>
> Or this one:
> for (int i=1; i > 0; ++i) { <- optimized into an infinite loop
>

Is that legal?

With twos-complement the integer _will_ wrap around into negative values,
while a sign+mantissa implementation will wrap to zero at the same point.

In fact, I don't think you can come up with any (fixed/limited-storage)
integer representation which won't wrap around to either negative or zero!

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Re: Bounded pointers Noob 10/24/13 1:53 AM
Andy (Super) Glew wrote:

> But, no, I don't have a link - although Terje seems to remember that the
> comp.arch discussion happened recently, in the last two years or so. (I
> have lousy luck searching comp.arch.)

Google makes baby jesus cry. (So much pointless javascript...
Remember a few years back, when Google was light-weight, and
didn't sleep with the NSA?)
https://groups.google.com/advanced_search
https://groups.google.com/groups/search?as_epq=buffer+overflow&as_ugroup=comp.arch&as_uauthors=glew

Date: Sun, 02 Aug 2009 12:46:41 -0700
From: "Andy \"Krazy\" Glew"
Subject: HardBound and SoftBound  (was "The State of Software")
https://groups.google.com/forum/message/raw?msg=comp.arch/iQs02MvjHo4/7r_AxbqMiGMJ

Re: Bounded pointers Noob 10/24/13 2:03 AM
Andy (Super) Glew wrote:

> Working engineers at most companies, in my experience, are under
> direction from corporate legal not to go and read patents.

Evidence that the patent system is borked.

Re: Bounded pointers Noob 10/24/13 2:31 AM
Terje Mathisen wrote:

> Noob wrote:
>
>> cf. the memcpy/memmove Linus (yes, "s") fiasco.
>
> Link?

When glibc changed the memcpy implementation to one that failed
if src and dest overlapped (AS SPECIFIED) Linus argued that memcpy
should behave like memmove, because that's what most people expect.

https://lwn.net/Articles/414467/
https://bugzilla.redhat.com/show_bug.cgi?id=638477

>> Or this one:
>> int foo(struct toto *p) {
>>    int n = p->u;
>>    if (p == NULL) <- optimized away
>
> That is simply broken code, it should fault on the "int n..." statement.

No fault in kernel code => pwnage, CVE, etc
https://lwn.net/Articles/342330/
http://blog.cr0.org/2009/06/bypassing-linux-null-pointer.html

>> Or this one:
>> for (int i=1; i > 0; ++i) { <- optimized into an infinite loop
>
> Is that legal?

What do you mean? Is it a constraint violation? No.
The platform is allowed to assume "no overflow on signed integers
arithmetic", so the positive integer will remain positive.

> With twos-complement the integer _will_ wrap around into negative values,
> while a sign+mantissa implementation will wrap to zero at the same point.
>
> In fact, I don't think you can come up with any (fixed/limited-storage)
> integer representation which won't wrap around to either negative or zero!

They won't wrap because /The Scripture/ says so! ;-)

http://stackoverflow.com/questions/7682477/why-does-integer-overflow-on-x86-with-gcc-cause-an-infinite-loop
https://lwn.net/Articles/511259/

Regards.

Re: Bounded pointers Anton Ertl 10/24/13 3:10 AM
Terje Mathisen <"terje.mathisen at tmsw.no"> writes:
>Noob wrote:
>> for (int i=1; i > 0; ++i) { <- optimized into an infinite loop
>>
>
>Is that legal?
>
>With twos-complement the integer _will_ wrap around into negative values,
>while a sign+mantissa implementation will wrap to zero at the same point.
>
>In fact, I don't think you can come up with any (fixed/limited-storage)
>integer representation which won't wrap around to either negative or zero!

All correct.  But the current breed of C compilers (or C miscompilers)
works based on the assumption that signed overflow does not happen,
because it is undefined behaviour in the C standard.  And from that
assumption it "optimizes" "i>0" in the loop above into 1.

While the motivation for not defining signed-integer overflow in the C
standard was the existence of several different signed-integer
representations with different results on overflow, C miscompilers use
this non-definition as an excuse for miscompiling, even if they only
ever generate code for twos-complement machines.

Ironically signed-integer overflow is defined in Java, so Java may be
a better low-level language these days than C.

- anton
--
M. Anton Ertl                    Some things have to be seen to be believed
an...@mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html
Re: Bounded pointers MitchAlsup 10/24/13 10:11 AM
On Thursday, October 24, 2013 3:24:06 AM UTC-5, Noob wrote:
> Or this one:
>
> for (int i=1; i > 0; ++i) { <- optimized into an infinite loop

But it is NOT an infinite loop. It is a loop that runs for 2^31 or
2^63 passes. The 32-bit version will only take a few seconds on a
modern fast processor--hardly infinite.

Mitch
Re: Bounded pointers George Neuner 10/24/13 6:06 PM

On Thu, 24 Oct 2013 10:53:55 +0200, Noob <ro...@127.0.0.1> wrote:

>Andy (Super) Glew wrote:
>
>> (I have lousy luck searching comp.arch.)
>
>Remember a few years back, when Google was light-weight

Remember [back before Google took over] when the AltaVista engine
actually excelled at searching newsgroups?

George
Re: Bounded pointers George Neuner 10/24/13 6:32 PM
On Fri, 18 Oct 2013 08:33:28 -0700 (PDT), "Paul A. Clayton"
<paaron...@gmail.com> wrote:


>I dislike the absolute constraint on data layout based
>on structure layout. In my opinion, there is a
>difference between a layout that is most friendly to
>the programmer and a layout (or even layouts) that are
>most friendly to execution.

And yet another that's necessary for hardware interfacing ...
something C used to be really good at before application programmers
co-opted it  8-(


>ISTR reading that Sun used
>an array-of-structures to structure-of-arrays
>transformation on the SPEC CPU art benchmark component;
>such is presumably illegal in C but legal for SPEC CPU.

I don't think it's prohibited per se because, although the layout of
structures is defined, the results of comparing the addresses of
structure members technically is undefined behavior.

So long as the transformation preserves the relative ordering of the
structure addresses - which it does - I think it possibly could be
argued that the transformation either is legal or is acceptable as
implementation defined behavior.

Even if not, it is a good option to have because it can save a lot of
data swizzling for certain SIMD codes.

George
Re: Bounded pointers Andy (Super) Glew 10/24/13 6:52 PM
On 10/23/2013 11:25 PM, Ivan Godard wrote:
> On 10/23/2013 10:30 PM, Andy (Super) Glew wrote:
>> On 10/21/2013 3:05 PM, Ivan Godard wrote:
>
>
> <snip>
>
>>> There is no requirement that the checking be used at all, and it won't
>>> be unless code is added to the allocator to set a granule.
>>
>> Blah, blah, blah.

Sorry, I was grumpy last night. Annoyed at you explaining obvious things
to me, and not answering my question. Although perhaps you did:

I was thinking that you were more willing than I am to say "If you want
to get good checking, round your data structures up to power-of-2 sizes".

I think that instead you are saying "power-of-2 checking only: if you
don't round your data structures up, (1) some errors may not be detected
at all, and (2) some errors may be detected very late".

Although I suspect that you are going to tell me once again that I have
totally misunderstood you, and that I am not allowed to comment on your
proposal.

>>
>> Do you think that overflowing a non-power-of-2 object within a
>> power-of-2 granule, overflowing into another object within the same
>> granule, without detecting an overflow is acceptable?
>
> Acceptable?  When the alternative is no checking at all?

I suppose the example of guard pages shows that people are willing to
live with checking that catches errors potentially a very long time
after the actual overflow.


> The proposal is unfortunately minimal, I agree. But something is better
> than nothing IMO, and I personally welcome all the checking I can get,
complete or not. However, I am not the target market. For the market, I'm
> much less worried about what we don't catch than I am about user
> reactions when we *do* catch something in their "working" programs.
>
> The checking alternative that I know of I consider unacceptable to the
> market:

You forgot the MPX scheme. Which is pretty well proven not to break
programs or OSes.

Although I agree with you that Intel's chosen implementation requires a
bit too much compiler involvement for my taste.

Actually, my feeling bad about grumpy "blah, blah, blah" leads me to
suspect that there may be a possibility of tighter checking.

One of the design principles behind my work that led to MPX... (I'm
getting tired of that circumlocution - I called it "Cp/PL". I think that
there is no harm in letting that be known. "Cp" stood for "checked
pointers"; "PL" was short for a longer codename in the Intel geographic
pattern. I won't say that.)

One of the design principles behind my work that led to MPX or Cp/PL was
"no struct descriptors".

      Capability ISAs are not so bad, whether using power-of-2 bounded
pointers or wide pointers with [lwb,upb) (especially if the wide bits
are stored non-adjacent).

---+ Struct and array descriptors considered bad

But many capability ISAs eventually have array descriptors - not simple
[lwb,upb) descriptors, but
      [base_address,elem_type,ndim,dim1,dim2...].

Worse still, many capability ISAs have struct descriptors:
      [struct_type,nfields,field1_type, field2_type,...]

Variable length.

Not something I want to manage in hardware.

---+ Living without direct struct descriptors

Part of the thought process that led to Cp-PL / MPX involved
+ I don't want struct descriptors
+ okay, how do I protect structs - how do I prevent overflow of an
element with a struct

Reluctantly applying the adage "any problem in computer science can be
solved by a level of indirection":

Instead of struct descriptors, use pointers to struct elements.
Possibly in a separate struct or array:

struct Target { int a; char b; double c; ... };

struct Pointers_to_Target_Fields {
        int* a_ptr;
        char* b_ptr;
        double* c_ptr;
        // ...
        Pointers_to_Target_Fields(Target& t)
                : a_ptr(&t.a), b_ptr(&t.b), c_ptr(&t.c) {}
};

Basically, this amounts to saying that a compiler can create its own
struct_descriptors, if it really wants to, albeit using this level of
indirection.

Now, considering where Cp-PL ended up, this is largely irrelevant.  But
nevertheless, it was important to my thought process: I would not have
bothered to continue thinking about Cp-PL if I did not see how it could
ultimately be extended to security just as good as any other capability
system.

---+ What does this have to do with power-of-2 checked pointers?

Most primitive objects understood by the hardware have power-of-2 sizes:
8, 16, 32, 64, 128 bits.

You could probably get away with power-of-2 sized pointers in the thing
that looks like a struct descriptor:

struct Pointers_to_Target_Fields {
        Power_of_2_Checked_Pointer<int*> a_ptr;
        Power_of_2_Checked_Pointer<char*> b_ptr;
        Power_of_2_Checked_Pointer<double*> c_ptr;
        // ...
        Pointers_to_Target_Fields(Target& t)
                : a_ptr(&t.a), b_ptr(&t.b), c_ptr(&t.c) {}
};

noting that sizeof(Power_of_2_Checked_Pointer<T*>) == sizeof(T*).

Tight checking.

Unfortunately, this doesn't handle pointing to non-power-of-2 arrays as
elements in structs.  Nor does it handle the fact that the
Pointers_to_Target_Fields thing that looks like a struct descriptor is
non-power-of-2.



--

Oh, by the way this reminds me: one of the biggest reasons not to use
1-extra-bit power-of-two pointers was that such pointers are only
[ptr-base-address,size), i.e. the pointer is always to the bottom of the
memory region.  Instead of ptr,[lwb,upb) - where the pointer is separate
from the bounds.   C allows the pointers to be outside the bounds, in a
very limited way, but never to be dereferenced.  Many C programs use
negative offsets.   Disallowing negative offsets breaks far too many
programs.

      If I am correct in understanding, Ivan's proposal is not my
1-extra-bit trick. Instead it provides an N-bit field (at one time Ivan
described it as floating point, at another time as a 6-bit field).

      If my understanding is correct, Ivan's proposal indicates how many
low bits to ignore in the 2^N bounds comparison, but allows a pointer
into the middle of the region.  So it does not have the problem that my
1-extra-bit power-of-two pointers had. Although it costs 6 extra address
bits.

So, see: the 1-extra-bit approach is prior art for power-of-2-aligned
checking. (If it is publicly visible anywhere.)  But Ivan's N-bit field
approach is different.

(BTW^2: when Cp/PL - MPX started, it was still considered essential to
make it available on a 32-bit machine, so consuming 5 address bits would
not have been acceptable.  Even considering 1 was borderline. On
64-bits, taking away 6 address bits may be acceptable.  But then it
interferes with the various other software systems that do things like
placing tags in the upper or lower bits of the address. Like ARMv8
allowing the upper 8 bits of address to be ignored by the hardware
virtual address translation facility, reserved for tagged pointers.
Therefore for MPX/Cp-PL we just left the user's pointer bits alone, and
placed our metadata somewhere else.)
Re: Bounded pointers Andy (Super) Glew 10/24/13 7:07 PM
First, I must say: companies do not ignore prior art.  They direct
engineers and inventors not to go looking for prior art. The inventors
disclose any prior art they know about, but don't go searching for it
themselves. Instead the patent department hires professional patent
searchers to search for prior art.  And engineers are directed not to
read patents "just because they are interesting", whether at work or in
their own spare time.

I agree: this is a pity.  I used to learn a lot from reading patents,
early in my career, before the patent wars became so heated. I read them
far less often nowadays.  (Although, as you can imagine, I read
thousands and thousands of patents while I was at Intellectual Ventures.
  If I am ever accused of having violated a patent that I read while I
was at IV, my defense is that I read so many that I cannot remember
details of any of them.)

Quite regularly some independent inventor, somebody with a "startup"
that is him and a couple of friends working for free, asks me to look at
their patents.  I nearly always decline, since this is a bit of a red
flag - such startups often fail, and such patents often end up being
sold to patent trolls, ahem, Non-Practicing Entities (or even to
practicing entities that will not actually implement the patent, but
will just use it to sue other companies - often big companies suing the
next generation of startups, or at least frightening off their funding),
so having an email record of a conversation where you review a patent is
often likely to land you in 3X trouble.
Re: Bounded pointers Andy (Super) Glew 10/24/13 7:12 PM
Thanks.  Although I don't think that discloses using the 1-extra-bit
trick for power-of-2 sized pointers.
Re: Bounded pointers Ivan Godard 10/24/13 7:19 PM
On 10/24/2013 6:52 PM, Andy (Super) Glew wrote:
> On 10/23/2013 11:25 PM, Ivan Godard wrote:
>> On 10/23/2013 10:30 PM, Andy (Super) Glew wrote:
>>> On 10/21/2013 3:05 PM, Ivan Godard wrote:
>>
>>
>> <snip>
>>
>>>> There is no requirement that the checking be used at all, and it won't
>>>> be unless code is added to the allocator to set a granule.
>>>
>>> Blah, blah, blah.
>
> Sorry, I was grumpy last night. Annoyed at you explaining obvious things
> to me, and not answering my question. Although perhaps you did:
>
> I was thinking that you were more willing than I am to say "If you want
> to get good checking, round your data structures up to power-of-2 sizes".


Likely.

> I think that instead you are saying "power-of-2 checking only: if you
> don't round your data structures up, (1) some errors may not be detected
> at all, and (2) some errors may be detected very late".


Yep

> Although I suspect that you are going to tell me once again that I have
> totally misunderstood you, and that I am not allowed to comment on your
> proposal.

Nope

>>>
>>> Do you think that overflowing a non-power-of-2 object within a
>>> power-of-2 granule, overflowing into another object within the same
>>> granule, without detecting an overflow is acceptable?
>>
>> Acceptable?  When the alternative is no checking at all?
>
> I suppose the example of guard pages shows that people are willing to
> live with checking that catches errors potentially a very long time
> after the actual overflow.
>

People who want checking will take what they can get IMO. I'm one.


>> The proposal is unfortunately minimal, I agree. But something is better
>> than nothing IMO, and I personally welcome all the checking I can get,
> complete or not. However, I am not the target market. For the market, I'm
>> much less worried about what we don't catch than I am about user
>> reactions when we *do* catch something in their "working" programs.
>>

<snip>

>>
>>>
>>>> The proposal is an aid to debugging only. It cannot be used to enforce
>>>> system or application security.
>>>
>>> Right.
>>>
>>>
>>
>> Exactly: no more has ever been claimed. The question remains: will the
>> occasional and incomplete aid be appreciated enough by the developer
>> community to outweigh those who don't want to hear about faults?
>>
>> This is a user-psyche question that I put to the group here, not a
>> technical one. The proposal cannot be changed for tighter checking; it
>> is inherently limited, so complaining that it isn't tight enough is
>> pointless - it is what it is. As it stands, I would still welcome your
>> opinion whether incorporating it in the Mill would be, on average, seen
>> as a bug or a feature by our potential market.
>>
>> Ivan
>
> Actually, my feeling bad about grumpy "blah, blah, blah" leads me to
> suspect that there may be a possibility of tighter checking.

Go patent it then; I don't see one. MPX is much too much hardware IMO,
and the latencies work only on OOO.

Yes, if I understand you. Either you can go unchecked (or barely
checked, as in guard pages or my floating-pointers), or you go all the
way to a type (or at least structure) system in the hardware; there
seems no middle ground to me. I think that there is no technical reason
not to go all the way - ops are complex, but you need fewer of them and
we have the gates, so fully typed may actually be faster than purist
RISC, the 432 notwithstanding.

But I can't sell it, and neither could you :-(

> --
>
> Oh, by the way this reminds me: one of the biggest reasons not to use
> 1-extra-bit power-of-two pointers was that such pointers are only
> [ptr-base-address,size), i.e. the pointer is always to the bottom of the
> memory region.  Instead of ptr,[lwb,upb) - where the pointer is separate
> from the bounds.   C allows the pointers to be outside the bounds, in a
> very limited way, but never to be dereferenced.  Many C programs use
> negative offsets.   Disallowing negative offsets breaks far too many
> programs.
>
>       If I am correct in understanding, Ivan's proposal is not my
> 1-extra-bit trick. Instead it provides an N-bit field (at one time Ivan
> described it as floating point, at another time as a 6-bit field).


Six bit exponent in a floating-point representation.

>       If my understanding is correct, Ivan's proposal indicates how many
> low bits to ignore in the 2^N bounds comparison, but allows a pointer
> into the middle of the region.  So it does not have the problem that my
> 1-extra-bit power-of-two pointers had. Although it costs 6 extra address
> bits.

Yes :-(

> So, see: the 1-extra-bit approach is prior art for power-of-2-aligned
> checking. (If it is publicly visible anywhere.)  But Ivan's N-bit field
> approach is different.

So I had thought.

> (BTW^2: when Cp/PL - MPX started, it was still considered essential to
> make it available on a 32-bit machine, so consuming 5 address bits would
> not have been acceptable.  Even considering 1 was borderline. On
> 64-bits, taking away 6 address bits may be acceptable.  But then it
> interferes with the various other software systems that do things like
> placing tags in the upper or lower bits of the address. Like ARMv8
> allowing the upper 8 bits of address to be ignored by the hardware
> virtual address translation facility, reserved for tagged pointers.
> Therefore for MPX/Cp-PL we just left the user's pointer bits alone, and
> placed our metadata somewhere else.)

We are already using four bits for other reasons although only one of
those is truly essential to the architecture; using six more has to
offer real benefit.

Re: Bounded pointers Andy (Super) Glew 10/24/13 7:20 PM
On 10/23/2013 11:25 PM, Ivan Godard wrote:
> On 10/23/2013 10:30 PM, Andy (Super) Glew wrote:
>> On 10/21/2013 3:05 PM, Ivan Godard wrote:

> Exactly: no more has ever been claimed. The question remains: will the
> occasional and incomplete aid be appreciated enough by the developer
> community to outweigh those who don't want to hear about faults?
>
> This is a user-psyche question that I put to the group here, not a
> technical one. The proposal cannot be changed for tighter checking; it
> is inherently limited, so complaining that it isn't tight enough is
> pointless - it is what it is. As it stands, I would still welcome your
> opinion whether incorporating it in the Mill would be, on average, seen
> as a bug or a feature by our potential market.
>
> Ivan

What does "incorporation in the Mill" mean?

Implementing it in your simulator? FPGA? Sure - as long as it does not
displace something more important, or make it too big to fit.

If you are using your own $$$ to pay for a TSMC real chip - probably not.

Placing it in RTL - sure, modulo effort.  It can't hurt, except for
effort.  It may attract some customers. But I think that it would be
more useful to evaluate what fraction of security bugs - e.g. in your
favorite security benchmark suite - your technique finds.  That would be
data that customers might be interested in.

But be sure it can be disabled.   As I mentioned in my last post, it can
be very annoying to run into a program that does things like sticking
tags in upper bits, or shifting the pointer left, in a manner that
interferes with your "floating point address checked pointers".
Re: Bounded pointers Ivan Godard 10/24/13 7:38 PM
On 10/24/2013 7:20 PM, Andy (Super) Glew wrote:
> On 10/23/2013 11:25 PM, Ivan Godard wrote:
>> On 10/23/2013 10:30 PM, Andy (Super) Glew wrote:
>>> On 10/21/2013 3:05 PM, Ivan Godard wrote:
>
>> Exactly: no more has ever been claimed. The question remains: will the
>> occasional and incomplete aid be appreciated enough by the developer
>> community to outweigh those who don't want to hear about faults?
>>
>> This is a user-psyche question that I put to the group here, not a
>> technical one. The proposal cannot be changed for tighter checking; it
>> is inherently limited, so complaining that it isn't tight enough is
>> pointless - it is what it is. As it stands, I would still welcome your
>> opinion whether incorporating it in the Mill would be, on average, seen
>> as a bug or a feature by our potential market.
>>
>> Ivan
>
> What does "incorporation in the Mill" mean?

Define it as part of the architecture.

> Implementing it in your simulator? FPGA? Sure - as long as it does not
> displace something more important, or make it too big too fit.

Implementation, power and area costs are trivial.

> If you are using your own $$$ to pay for a TSMC real chip - probably not.
>
> Placing it in RTL - sure, modulo effort.  It can't hurt, except for
> effort.  It may attract some customers. But I think that it would be
> more useful to evaluate what fraction of security bugs - e.g. in your
> favorite security benchmark suite - your technique finds.  That would be
> data that customers might be interested in.
>
> But be sure it can be disabled.   As I mentioned in my last post, it can
> be very annoying to run into a program that does things like sticking
> tags in upper bits, or shifting the pointer left, in a manner that
> interferes with your "floating point address checked pointers".

We'll be facing that anyway. Mill pointers are not integers, whether
using this proposal or not.

What this will break, that wouldn't have broken anyway, is:

        int* p = new int[100];
        p += 200;
        p -= 150;
        int i = *p;

The above code is illegal in the language, but will "work" if not
checked. Should it be faulted?
Re: Bounded pointers Andy (Super) Glew 10/24/13 10:55 PM
For Cp-PL / MPX, we chose not to fault. Fault only at dereference.

You have to allow

       int* p = new int[100];
       p += 100;
       p -= 1;
       int i = *p;

I did not want Cp's checked pointers to have knowledge of the element type.
Re: Bounded pointers Ryan Hitchman 10/25/13 12:13 AM
On Thursday, October 24, 2013 7:20:37 PM UTC-7, Andy (Super) Glew wrote:
> Placing it in RTL - sure, modulo effort.  It can't hurt, except for
> effort.  It may attract some customers. But I think that it would be
> more useful to evaluate what fraction of security bugs - e.g. in your
> favorite security benchmark suite - your technique finds.  That would be
> data that customers might be interested in.

Agreed. I doubt the gains in adoption from such a security feature would outweigh the complication in the ISA/CPU, especially for a startup. I suspect extremely low-overhead equivalent implementations are possible by implementing general purpose instructions like "extract/insert bitfield" (Cf. ARM's UBFX/BFI/BFC) and "branch if bitfields not equal".

For other ISA-level security features, allowing --X permissions, where code can be executed but not loaded as data, would be beneficial for preventing on-the-fly ROP-chain assembly (http://cs.unc.edu/~fabian/papers/oakland2013.pdf), though I'm not sure how vulnerable the Mill is to ROP chains in the first place. IP-relative calls and loads are essential for PIC and ASLR.
Re: Bounded pointers Noob 10/25/13 3:05 AM
MitchAlsup wrote:

> Noob wrote:
>
>> for (int i=1; i > 0; ++i) { <- optimized into an infinite loop
>
> But it is NOT an infinite loop.

As far as the C abstract machine is concerned, it is the
responsibility of the programmer to guarantee that arithmetic
with signed integers does NOT overflow. Therefore, there is no
such thing as "int wrap-around" (contrary to unsigned int,
where wrap-around is fully specified).

Maybe you are arguing that this is a defect in C89/C99/C11,
but I'm telling it like it is, not like how it should be.

Relying on int wrap-around is relying on UB.

If you want wrap-around, use unsigned integers.

Re: Bounded pointers Öö Tiib 10/25/13 5:07 AM
On Friday, 25 October 2013 13:05:08 UTC+3, Noob  wrote:
> MitchAlsup wrote:
>
> > Noob wrote:
> >
> >> for (int i=1; i > 0; ++i) { <- optimized into an infinite loop
> >
> > But it is NOT an infinite loop.
>
> As far as the C abstract machine is concerned, it is the
> responsibility of the programmer to guarantee that arithmetic
> with signed integers does NOT overflow.

Nitpick: it is so for "standard C abstract machine".

> Therefore, there is no
> such thing as "int wrap-around" (contrary to unsigned int,
> where wrap-around is fully specified).

That may be made defined behaviour by the compiler; for example gcc and
clang (the most popular compilers, I believe) have the option -fwrapv.

> Maybe you are arguing that this is a defect in C89/C99/C11,
> but I'm telling it like it is, not like how it should be.
>
> Relying on int wrap-around is relying on UB.
>
> If you want wrap-around, use unsigned integers.

I generally share that attitude but it is attitude _not_ universal
truth. People do what they want/need/are told to do and so most
code uses various extensions in practice.
 
Re: Bounded pointers Ivan Godard 10/25/13 7:32 AM
As described, the proposal permits this - it's the "one past the end" of
the host languages, and requires one more reserved bit to support.

Refusing to read proposals does seem counterproductive, when you are
going to be exposed to the idea anyway if you participate in discussions
like this board. I realize that it's not your personal choice but
something that is pushed upon you, but even for your bosses' purpose the
rule seems a half-measure - they should either permit and encourage you
to read and discuss everything you can find, or force you into complete
hermitage.

> I did not want Cp's checked pointers to have knowledge of the element type.

The proposal has no knowledge of type.
Re: Bounded pointers Ivan Godard 10/25/13 8:04 AM
On 10/25/2013 12:13 AM, Ryan Hitchman wrote:
> On Thursday, October 24, 2013 7:20:37 PM UTC-7, Andy (Super) Glew
> wrote:
>> Placing it in RTL - sure, modulo effort.  It can't hurt, except
>> for effort.  It may attract some customers. But I think that it
>> would be more useful to evaluate what fraction of security bugs -
>> e.g. in your favorite security benchmark suite - your technique
>> finds.  That would be data that customers might be interested in.
>
> Agreed. I doubt the gains in adoption from such a security feature
> would outweigh the complication in the ISA/CPU, especially for a
> startup.

We judge implementation issues to be trivial: we distinguish ADDP from
ADDI, ADDS, ADDB, and ADDD anyway, so all this is a minor logic change
in ADDP, and equivalent on other operations.

> I suspect extremely low-overhead equivalent implementations
> are possible by implementing general purpose instructions like
> "extract/insert bitfield" (Cf. ARM's UBFX/BFI/BFC) and "branch if
> bitfields not equal".

Hard to beat one op/one cycle :-)

Implementing checking via extract/compare/branch is expensive - you have
three ops (internally the relational and the branch are separate ops),
plus the branch predictor slot. The only reason not to use a single checked
ADDP is a religious preference for RISC instruction sets.

> For other ISA-level security features, allowing --X permissions,
> where code can be executed but not loaded as data, would be
> beneficial for preventing on-the-fly ROP-chain assembly
> (http://cs.unc.edu/~fabian/papers/oakland2013.pdf), though I'm not
> sure how vulnerable the Mill is to ROP chains in the first place.

Not very. The requirement to decode in both directions from a branch
entry point means that you would be looking for a very far-fetched bit
pattern in the code. And our code isn't writeable.

> IP-relative calls and loads are essential for PIC and ASLR.
>

Static flow-of-control ops are all PIC.
Re: Bounded pointers Andy (Super) Glew 10/25/13 11:09 AM
On 10/25/2013 7:32 AM, Ivan Godard wrote:
> On 10/24/2013 10:55 PM, Andy (Super) Glew wrote:

>>> What this will break, that wouldn't have broken anyway, is:
>>>
>>>      int* p = new int[100];
>>>      p += 200;
>>>      p -= 150;
>>>      int i = *p;
>>>
>>> The above code is illegal in the language, but will "work" if not
>>> checked. Should it be faulted?
>>
>> For Cp-PL / MPX, we chose not to fault. Fault only at dereference.
>>
>> You have to allow
>>
>>        int* p = new int[100];
>>        p += 100;
>>        p -= 1;
>>        int i = *p;
>
> As described, the proposal permits this - it's the "one past the end" of
> the host languages, and requires one more reserved bit to support.

Perhaps I should have said

     You have to allow

         int* p = new int[100];
         int *p2 = p - 1;
         int *p3 = p2 + 100;
         int i = *p3;

This is "one element before the end".

But in terms of address, p2 = p - sizeof(int).


>> I did not want Cp's checked pointers to have knowledge of the element
>> type.
>
> The proposal has no knowledge of type.

Perhaps yours would not.

But mine, where pointers are just addresses with metadata, would have
required knowledge of type, or at least sizeof(type),

By the way, here's another advantage of the "metadata implied by the
user modifiable pointer" approach - either your "floating point
addresses", or my "1-extra-bit", both power-of-2 sized:

You can do computations like XORing pointers, and still get checking.
Advisory checking.

---

Another "By The Way": I wanted to have 1 single bit inside the pointer,
1 single user modifiable bit, to improve performance / reduce hardware -
1 bit that said "no need to go look up the non-adjacent metadata",
to improve performance for "insecure" pointers.  But this did not fly.
Descriptor lookaside buffers all the way.
Re: Bounded pointers Ivan Godard 10/25/13 12:53 PM
Yes, programs do this even though it is illegal in the host language,
and the proposal would fault the usage.

>>> I did not want Cp's checked pointers to have knowledge of the element
>>> type.
>>
>> The proposal has no knowledge of type.
>
> Perhaps yours would not.
>
> But mine, where pointers are just addresses with metadata, would have
> required knowledge of type, or at least sizeof(type),
>
> By the way, here's another advantage of the "metadata implied by the
> user modifiable pointer" approach - either your "floating point
> addresses", or my "1-extra-bit", both power-of-2 sized:
>
> You can do computations like XORing pointers, and still get checking.
> Advisory checking.

The result (and arguments) of XOR are integral, not pointers, so there
must be a reinterpret_cast<...>(...) before the result can be used as an
address. There's an op to do such casting (and another to cast in the
other direction), which forces the granule to maximum. The result can be
used happily as an address, but you've lost all checking unless the code
explicitly reasserts the granule.
Re: Bounded pointers Stefan Monnier 10/25/13 1:17 PM
> In other words, would it be used and if used will the gain in error checking
> be worth the nuisance?

It might get some use (after all, people have used "hardening"
C compilers and OS tricks, like the "canary" scheme, or the NX bit).

But this can only benefit weakly typed languages.  While C++ is still
commonly used, I don't think its use is still growing, C is on the
decline, and all the growth is in languages that are (or can be)
strongly typed, where the compiler can do array bounds checking
efficiently, and reliably, without any special hardware support.


        Stefan
Re: Bounded pointers Andy (Super) Glew 10/25/13 6:16 PM
Last stats I saw, C and C++ are still growing, in terms of number of
lines of code shipping.

But they are declining in relative importance, i.e. as a fraction of all
lines of code shipping.

But you raise another point, or more than one:

As you may have noticed, Ivan and I disagree on the importance of
advisory versus mandatory checking. Or perhaps not the importance: I
think Ivan likes capabilities much as I do, perhaps more given his
Burroughs background. Market viability.

Obviously I was willing to work on Cp-PL / MPX, when it is "only"
advisory. Only a debugging aid.  Similarly, Ivan's checked pointers are
"only" a debugging aid, he emphasizes.

However, I hope that we can provide more than a debugging aid. I tried
to design an instruction set extension that could eventually be used to
provide "full security", as strong and as hard to avoid as page table
based virtual memory.  One of the reasons I am a bit unhappy with MPX is
that they gave that up.

So, two points:

* I think that it would be good if any hardware extension for memory
checking (buffer overflow, dangling pointer, etc.) was fast enough that
it could be used by "languages that are (or can be) strongly typed,
where the compiler can do array bounds checking efficiently, and
reliably, without any special hardware support."

Power-of-2 sized objects are less useful for this.

* Bad experience has biased people against language or compiler or
assembler based security.  Languages/compilers/assemblers have bugs;
they are large programs, hard to prove correct, and they have bugs that
can be exploited.

Arbitrary bounds and 2^N-bounds can both be applied here.
Re: Bounded pointers Paul A. Clayton 10/25/13 6:28 PM
On Thursday, October 24, 2013 9:32:38 PM UTC-4, George Neuner wrote:
> On Fri, 18 Oct 2013 08:33:28 -0700 (PDT), "Paul A. Clayton"
> <paaron...@gmail.com> wrote:
>
>
>>I dislike the absolute constraint on data layout based
>>on structure layout. In my opinion, there is a
>>difference between a layout that is most friendly to
>>the programmer and a layout (or even layouts) that are
>>most friendly to execution.
>
> And yet another that's necessary for hardware interfacing ...
> something C used to be really good at before application programmers
> co-opted it  8-(

In a sense that is "friendly to execution" in that it
can guarantee that peculiar memory behaviors (typically
I/O addresses where accesses have side effects, I
suspect) are properly handled (more correct execution
rather than higher performance, more energy efficient,
or more predictable/less variable in performance/timing
[or energy efficiency]).

From some comments posted on comp.arch (and/or perhaps
elsewhere), it sounds like Ada might provide annotation
mechanisms that are friendlier than C to to-the-metal
programming. (I receive the impression that the
compilers for embedded systems are more reliable in
this area, of necessity.)
 
>>ISTR reading that Sun used
>>an array-of-structures to structure-of-arrays
>>transformation on the SPEC CPU art benchmark component;
>>such is presumably illegal in C but legal for SPEC CPU.
>
> I don't think it's prohibited per se because, although the layout of
> structures is defined, the results of comparing the addresses of
> structure members technically is undefined behavior.

I am not a language lawyer (I am not even a programmer), but
I thought I had read somewhere that it was not a legal C
optimization (though I may be misremembering the statement
and the writer might have been little more of a language
lawyer for C than I am).
 
> So long as the transformation preserves the relative ordering of the
> structure addresses - which it does - I think it possibly could be
> argued that the transformation either is legal or is acceptable as
> implementation defined behavior.
>
> Even if not, it is a good option to have because it can save a lot of
> data swizzling for certain SIMD codes.

I do like the optimization, even without SIMD it can
have significant caching (and bandwidth) benefits. In
art, the structure was, I think, unusually large and
loops only touched some members.

I am a bit uncomfortable with non-standard language
variants. Such seems to work against the benefits of
standardization. Having a bunch of almost C languages
whose compilers can be used as C compilers seems
suboptimal.
Re: Bounded pointers Noob 10/26/13 1:15 AM
Öö Tiib wrote:
> On Friday, 25 October 2013 13:05:08 UTC+3, Noob  wrote:
>> MitchAlsup wrote:
>>
>>> Noob wrote:
>>>
>>>> for (int i=1; i > 0; ++i) { <- optimized into an infinite loop
>>>
>>> But it is NOT an infinite loop.
>>
>> As far as the C abstract machine is concerned, it is the
>> responsibility of the programmer to guarantee that arithmetic
>> with signed integers NOT overflow.
>
> Nitpick: it is so for "standard C abstract machine".

I do not understand the distinction.

The C standard defines an abstract machine model, which one might
call the "C abstract machine" or "standard C abstract machine".

What's the difference to you?

>> Therefore, there is no
>> such thing as "int wrap-around" (contrary to unsigned int,
>> where wrap-around is fully specified).
>
> That may be defined behaviour by compiler; for example gcc and
> clang (most popular compilers i believe) have option -fwrapv.

And as such, it is an extension, thus not C.
It is "C with extensions", "gnuc", etc.

>> Maybe you are arguing that this is a defect in C89/C99/C11,
>> but I'm telling it like it is, not like how it should be.
>>
>> Relying on int wrap-around is relying on UB.
>>
>> If you want wrap-around, use unsigned integers.
>
> I generally share that attitude but it is attitude _not_ universal
> truth. People do what they want/need/are told to do and so most
> code uses various extensions in practice.

In the embedded space, people are often tied to one tool-chain,
and can thus depend on their tool's extensions. In the desktop
space, especially when one is aiming for portability, one
cannot rely on extensions (obviously).

Regards.

Re: Bounded pointers Anton Ertl 10/26/13 3:09 AM
Noob <root@localhost> writes:
>Öö Tiib wrote:
>> On Friday, 25 October 2013 13:05:08 UTC+3, Noob  wrote:
>>> [...] the C abstract machine [...]

>> Nitpick: it is so for "standard C abstract machine".
>
>I do not understand the distinction.

There is a difference between C and its standardized subset.

>In the embedded space, people are often tied to one tool-chain,
>and can thus depend on their tool's extensions. In the desktop
>space, especially when one is aiming for portability, one
>cannot rely on extensions (obviously).

It seems to me that most desktop programs are tied to the toolchain
just as much as embedded programs.  In particular, Windows programs
are tied to MSVC and/or the .net infrastructure.
Re: Bounded pointers Noob 10/26/13 8:44 AM
Anton Ertl wrote:

> There is a difference between C and its standardized subset.

Are you /really/ arguing that "C" is the union of
1) a standardized "subset"
2) all proprietary extensions under the sun?

Are you trying to make baby jesus cry?
(In fact, it is the "subset" that is C.)

> It seems to me that most desktop programs are tied to the toolchain
> just as much as embedded programs.  In particular, Windows programs
> are tied to MSVC and/or the .net infrastructure.

Take a look at projects such as lwip, a light-weight
network stack. It is intended to be used on scores
of different platforms (different architectures,
toolchains, OS). Which extensions would you make use
of in such a project?

I think it can be argued that, in that space, gcc is
dominant, but I'm not sure it's a good idea to program
in gnuc (and I despise proprietary compilers as much
as the next programmer).

Regards.

Re: Bounded pointers Noob 10/26/13 8:58 AM
Andy (Super) Glew wrote:
> On 10/24/2013 1:53 AM, Noob wrote:
>> Andy (Super) Glew wrote:
>>
>>> But, no, I don't have a link - although Terje seems to remember that the
>>> comp.arch discussion happened recently, in the last two years or so. (I
>>> have lousy luck searching comp.arch.)
>>
>> Google makes baby jesus cry. (So much pointless javascript...
>> Remember a few years back, when Google was light-weight, and
>> didn't sleep with the NSA?)
>> https://groups.google.com/advanced_search
>> https://groups.google.com/groups/search?as_epq=buffer+overflow&as_ugroup=comp.arch&as_uauthors=glew
>>
>> Date: Sun, 02 Aug 2009 12:46:41 -0700
>> From: "Andy \"Krazy\" Glew"
>> Subject: HardBound and SoftBound  (was "The State of Software")
>> https://groups.google.com/forum/message/raw?msg=comp.arch/iQs02MvjHo4/7r_AxbqMiGMJ
>
>
> Thanks.  Although I don't think that discloses using the 1-extra-bit
> trick for power-of-2 sized pointers.

Hmmm, this one perhaps?

https://groups.google.com/forum/message/raw?msg=comp.arch/sU-ymcogrMM/OCwKbi4qTUwJ

Date: Sat, 07 Jul 2012 08:18:13 -0700
From: "Andy (Super) Glew"
Subject: Re: COMA

I think I've lost my google-fu, which is to be expected
since I don't use google anymore.

Re: Bounded pointers Anton Ertl 10/27/13 6:14 AM
Noob <root@localhost> writes:
>Anton Ertl wrote:
>
>> There is a difference between C and its standardized subset.
>
>Are you /really/ arguing that "C" is the union of
>1) a standardized "subset"
>2) all proprietary extensions under the sun?

No.  That's a pretty lame straw man.

C is a low-level language that programs were written in and that C
compilers used to understand and compile correctly (if you turn off
optimization, they mostly still do).

A subset of C was standardized; it's often hard to specify what should
happen portably, so they copped out and made many things "undefined
behaviours".  E.g., if you want to support 2s-complement,
1s-complement, and sign-magnitude representations for signed integers,
how to specify what happens on overflow?  So they did not specify it.
As a result, there are a large number of "undefined behaviour"s in the
standard.

For a while, that was fine; so C had a standard with a lot of holes,
that hopefully would be filled in a future version of the standard,
but compilers and programs still worked fine.  Sure, maybe the loop
elsewhere in this thread would produce an overflow exception on a
sign-magnitude machine, but portability among 2s-complement machines
is good enough for nearly all current programs.

Then the C compiler writers got the idea to use the "undefined
behaviours" as entitlements for miscompiling perfectly fine programs.
They do it piecemeal, and if the cries of pain are too loud, they add
an option like -fwrapv (but only after publishing a version where the
miscompilation cannot be turned off), or they turn off the
miscompilation for a particularly high-profile application like a SPEC
benchmark.

But the long-term course they are heading is to turn C from a
successful low-level language to a variant of Pascal with Java syntax.

>> It seems to me that most desktop programs are tied to the toolchain
>> just as much as embedded programs.  In particular, Windows programs
>> are tied to MSVC and/or the .net infrastructure.
>
>Take a look at projects such as lwip, a light-weight
>network stack. It is intended to be used on scores
>of different platforms (different architectures,
>toolchains, OS). Which extensions would you make use
>of in such a project?

That's not the kind of program I was thinking of when you mentioned
"desktop".  Anyway, I don't know about lwip, but I know about Gforth,
which is also intended to be widely portable.  We use gcc, and we
actually use GNU C extensions (in particular labels-as-values, which
gives a factor of 2 speedup for threaded code and a factor of 4
speedup if we also consider dynamic superinstructions).  We have not
had problems with the availability of gcc on the platforms we build on.

Still, we have come to regret choosing gcc, because gcc has turned into
a miscompiler; but that seems to be a common attitude among C compiler
maintainers these days (the LLVM people have the same attitude:
<http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html>),
so getting rid of GNU C dependencies would not really help; the only
solution is to get rid of C.
Re: Bounded pointers Öö Tiib 10/27/13 7:19 AM
On Saturday, 26 October 2013 11:15:37 UTC+3, Noob  wrote:
> Öö Tiib wrote:
> > On Friday, 25 October 2013 13:05:08 UTC+3, Noob  wrote:
> >> Therefore, there is no
> >> such thing as "int wrap-around" (contrary to unsigned int,
> >> where wrap-around is fully specified).
> >
> > That may be defined behaviour by compiler; for example gcc and
> > clang (most popular compilers i believe) have option -fwrapv.
>
> And as such, it is an extension, thus not C.
> It is "C with extensions", "gnuc", etc.

So what? It is only in strictly standard C that 'for(int i=1;i>0;++i){foo();}'
may be optimized into nasal demons. We write programs with real compilers,
not with language standards, and so there are no nasal demons.

In practice, on a lot of compilers it is (or can be configured into)
defined behavior. '-ftrapv' on gcc and clang will also make it defined
behavior (and guarantee a crash).

...

> In the embedded space, people are often tied to one tool-chain,
> and can thus depend on their tool's extensions. In the desktop
> space, especially when one is aiming for portability, one
> cannot rely on extensions (obviously).

We often are tied to a tool chain. Switching tools (and even versions) is
always expensive. That is because platform-specific code can only be
written and tested on each platform we target. We cannot test against the
C standard, because it is ... a pile of paper.
Re: Bounded pointers Dombo 10/27/13 9:23 AM
Op 26-Oct-13 12:09, Anton Ertl schreef:
> Noob <root@localhost> writes:
>> Öö Tiib wrote:
>>> On Friday, 25 October 2013 13:05:08 UTC+3, Noob  wrote:
>>>> [...] the C abstract machine [...]
>
>>> Nitpick: it is so for "standard C abstract machine".
>>
>> I do not understand the distinction.
>
> There is a difference between C and its standardized subset.
>
>> In the embedded space, people are often tied to one tool-chain,
>> and can thus depend on their tool's extensions. In the desktop
>> space, especially when one is aiming for portability, one
>> cannot rely on extensions (obviously).
>
> It seems to me that most desktop programs are tied to the toolchain
> just as much as embedded programs.  In particular, Windows programs
> are tied to MSVC and/or the .net infrastructure.

Even though MSVC and .NET are popular development environments for
developing MS Windows software, there are alternatives like Intel,
MinGW, Borland/Embarcadero, etc. The fact that one can use MSVC and/or
the .NET infrastructure for Windows programs doesn't mean one HAS to.
In the embedded space the options are typically much more limited,
though often even there one can choose between more than one compiler
when targeting a popular processor core, such as ARM.


Re: Bounded pointers Andy (Super) Glew 10/27/13 5:05 PM
On 10/26/2013 8:58 AM, Noob wrote:
> Andy (Super) Glew wrote:
>> On 10/24/2013 1:53 AM, Noob wrote:
>>> Andy (Super) Glew wrote:
>>>
>>>> But, no, I don't have a link - although Terje seems to remember that the
>>>> comp.arch discussion happened recently, in the last two years or so. (I
>>>> have lousy luck searching comp.arch.)
>>>
>>> Google makes baby jesus cry. (So much pointless javascript...
>>> Remember a few years back, when Google was light-weight, and
>>> didn't sleep with the NSA?)
>>> https://groups.google.com/advanced_search
>>> https://groups.google.com/groups/search?as_epq=buffer+overflow&as_ugroup=comp.arch&as_uauthors=glew
>>>
>>> Date: Sun, 02 Aug 2009 12:46:41 -0700
>>> From: "Andy \"Krazy\" Glew"
>>> Subject: HardBound and SoftBound  (was "The State of Software")
>>> https://groups.google.com/forum/message/raw?msg=comp.arch/iQs02MvjHo4/7r_AxbqMiGMJ
>>
>>
>> Thanks.  Although I don't think that discloses using the 1-extra-bit
>> trick for power-of-2 sized pointers.
>
> Hmmm, this one perhaps?
>
> https://groups.google.com/forum/message/raw?msg=comp.arch/sU-ymcogrMM/

Close, but not quite.  The post itself is mainly about explicit
metadata, whether stored non-adjacent the way MPX does, or in space
freed up by compression (as I talked about at ASPLOS WACI One in 1998).

There's a passing reference to implicit metadata, "like requiring the
bounds to be power of 2 aligned".  But it doesn't talk in detail.
(Although given my love of the extra bit trick, which I have been using
since undergrad in the 1980s...)

 > But a "secure pointer" must have something like upper bound
 > (and, for C, lower bound) associated with it.   Many folks
 > have played games like requiring the bounds to be power
 > of 2 aligned.   Myself, I just see the extra stuff as
 > metadata. It must be stored somewhere, for a secure
 > pointer, but it should not be required for insecure pointers.


> I think I've lost my google-fu, which is to be expected
> since I don't use google anymore.

What do you use?
Alternatives to google search engine Noob 10/28/13 12:46 AM
Andy (Super) Glew wrote:

> Noob wrote:
>
>> I think I've lost my google-fu, which is to be expected
>> since I don't use google anymore.
>
> What do you use?

(Do you mean for searches in general, or usenet searches
in particular?)

I use startpage (which is a google meta-engine AFAIU, so
I should have said "I don't use google's /UI/ anymore".)

https://startpage.com/eng/aboutstartpage/
https://startpage.com/eng/privacy-policy.html
https://en.wikipedia.org/wiki/Ixquick

startpage has several selling points over google:
1) it is not google
2) it is not US-based
3) it promises not to collect any personal information

Regards.

Re: Bounded pointers Noob 10/28/13 9:45 AM
Noob wrote:

> As far as the C abstract machine is concerned, it is the
> responsibility of the programmer to guarantee that arithmetic
> with signed integers NOT overflow. Therefore, there is no
> such thing as "int wrap-around" (contrary to unsigned int,
> where wrap-around is fully specified).
>
> Maybe you are arguing that this is a defect in C89/C99/C11,
> but I'm telling it like it is, not like how it should be.
>
> Relying on int wrap-around is relying on UB.

On a related topic:

http://blog.llvm.org/2013/04/testing-libc-with-fsanitizeundefined.html
http://www.phoronix.com/scan.php?page=news_item&px=MTQ0OTc

Re: Bounded pointers Piotr Wyderski 10/30/13 6:00 AM
Ivan Godard wrote:

> We'll be facing that anyway. Mill pointers are not integers, whether
> using this proposal or not.

Then that's a problem: our company uses a lot of pointer bit stealing
tricks to achieve things not achievable using more standard techniques.

Say, you have a naturally aligned uint64_t pointer. It means
its 3 lowest bits are 0. It means we can grab them and encode
some additional information there. The zeros must be restored
just before dereferencing the pointer. Extremely useful.

        Best regards, Piotr

Re: Bounded pointers Ivan Godard 10/30/13 8:25 AM
During the time you have stolen the bits, the value is no longer a
pointer. You can see that in your own source: I bet there's a
reinterpret_cast<> in there. Steal away :-)

We steal the high bits, which (in pointer-land) means that the OS
reserved a portion of the address space. Quite a few other architectures
do too; the amount stolen varies.

If you want more than three bits, you should keep an index rather than a
pointer in your data structures. On the Mill an indexed load/store costs
no more than an unindexed one.


Re: Bounded pointers Piotr Wyderski 10/30/13 9:13 AM
Ivan Godard wrote:

> During the time you have stolen the bits, the value is no longer a
> pointer.

But we keep it within an instance of T*. If that is not
a problem for the Mill, it is not a problem for me either.

> If you want more than three bits, you should keep an index rather than a
> pointer in your data structures.

It depends. There are also page-aligned chunks, which yield
as many as 12+ bits, and that is barely short of infinity in
terms of typical stealing needs.

        Best regards, Piotr

Re: Bounded pointers Stefan Monnier 10/30/13 11:24 AM
> * I think that it would be good if any hardware extension for memory
> checking (buffer overflow, dangling pointer, etc.) was fast enough that it
> could be used by "languages that are (or can be) strongly typed, where the
> compiler can do array bounds checking efficiently, and reliably, without any
> special hardware support."

But, if "the compiler can do array bounds checking efficiently, and
reliably, without any special hardware support", why would you waste
resources on some hardware extension?

> * Bad experience has biased people against language or compiler or assembler
> based security.  Languages/compilers/assemblers have bugs; they are large
> programs, hard to prove correct, and they have bugs that can be exploited.

I have not noticed such a bias, to tell you the truth.  Maybe I'm just
deluded ;-)

E.g. have there been known problems with Java's array bounds checking,
where people were able to circumvent the checks?


        Stefan
Re: Bounded pointers Andy (Super) Glew 11/3/13 8:37 PM
On 10/30/2013 11:24 AM, Stefan Monnier wrote:
>> * I think that it would be good if any hardware extension for memory
>> checking (buffer overflow, dangling pointer, etc.) was fast enough that it
>> could be used by "languages that are (or can be) strongly typed, where the
>> compiler can do array bounds checking efficiently, and reliably, without any
>> special hardware support."
>
> But, if "the compiler can do array bounds checking efficiently, and
> reliably, without any special hardware support", why would you waste
> resources on some hardware extension?

Security.

OS-level partitioning.

I don't trust assembly language programmers.  And I suspect that I will
still want to support assembly language programmers.

I don't trust compiler writers or Javascript or Java implementors not to
have bugs.

I freely admit that I want something like capability level security.

And by the way: I don't stop at array bounds checking. I want to extend
similar techniques to handling other security problems like dangling
pointers, and possibly also taint propagation.

>
>> * Bad experience has biased people against language or compiler or assembler
>> based security.  Languages/compilers/assemblers have bugs; they are large
>> programs, hard to prove correct, and they have bugs that can be exploited.
>
> I have not noticed such a bias, to tell you the truth.  Maybe I'm just
> deluded ;-)
>
> E.g. have there been known problems with Java's array bounds checking,
> where people were able to circumvent the checks?

There have been quite a few months when Java was the most
security-bug-prone piece of software on the net.

Many security folk recommend uninstalling Java.
http://krebsonsecurity.com/2013/10/java-update-plugs-51-security-holes/

http://arstechnica.com/security/2013/09/security-of-java-takes-a-dangerous-turn-for-the-worse-experts-say/

Admittedly, most Java security bugs are not bugs in Java itself, but are
bugs in the native layer, C, C++, or assembly code. But this is much of
my point: very few software systems are entirely self contained. Nearly
all higher level language systems (from not so high level (Java, Python,
Perl, Javascript) to much higher level) depend on fairly large libraries
of foreign code, C, C++, assembly, some of which are developed by the
language maintainer, some of which come from the OS, and some from other
places.  These are often NOT small, well-debugged libraries. Oftentimes
the bugs occur not in the actual native libraries, but in the interface
code that converts the calling languages' concepts to the callee
language concepts - and the OS concepts, and ...








>
>
>          Stefan
Re: Bounded pointers Michael S 11/4/13 12:58 AM
On Monday, November 4, 2013 6:37:02 AM UTC+2, Andy (Super) Glew wrote:
> On 10/30/2013 11:24 AM, Stefan Monnier wrote:
> >> * I think that it would be good if any hardware extension for memory
> >> checking (buffer overflow, dangling pointer, etc.) was fast enough that it
> >> could be used by "languages that are (or can be) strongly typed, where the
> >> compiler can do array bounds checking efficiently, and reliably, without any
> >> special hardware support."
> >
> > But, if "the compiler can do array bounds checking efficiently, and
> > reliably, without any special hardware support", why would you waste
> > resources on some hardware extension?
>
> Security.
>
> OS-level partitioning.
>
> I don't trust assembly language programmers.  And I suspect that I will
> still want to support assembly language programmers.
>
> I don't trust compiler writers or Javascript or Java implementors not to
> have bugs.
>
> I freely admit that I want something like capability level security.
>
> And by the way: I don't stop at array bounds checking. I want to extend
> similar techniques to handling other security problems like dangling
> pointers, and possibly also taint propagation.

What are those "capabilities" that you are mentioning so often?
In particular, how do they help to augment the chain of trust for "safe"
languages? Lacking imagination, I can't see how the "Quis custodiet
ipsos custodes?" puzzle could possibly be solved.

Re: Bounded pointers Ivan Godard 11/4/13 1:57 AM
On 11/4/2013 12:58 AM, Michael S wrote:
<snip>

> What are those "capabilities" that you are mentioning so often?
> In particular, how they help to augment chain of trust for "safe" languages?
> Lacking imagination, I can't see how "Quis custodiet ipsos custodes?" puzzle could be possibly solved.
>

Start here:
https://en.wikipedia.org/wiki/Capability-based_security
https://en.wikipedia.org/wiki/Capability-based_addressing

After that: Google is your friend
Re: Bounded pointers Michael S 11/4/13 2:23 PM
The first article appears very loosely related to bounds checks.
The second is a bit closer, but not much. It still mostly discusses
methods of process isolation. But process isolation is simply not
interesting - it's a solved problem, solved by several different
solutions, all of which are good enough.
The article does not describe how capability-based addressing is going
to help a compiler/language run-time detect an out-of-bounds access to
an "automatic" (in the C sense) array that hits the belly of another,
unrelated, automatic array in the same process. I assume that the
language run-time is not going to ask the OS to allocate a separate
capability for each and every object, including the most short-lived?


>
>
> After that: Google is your friend
>

Except google can't index Andy's brain. Or, maybe, it can, but the results are reserved for google itself and a couple of TLAs.

Re: Bounded pointers Anne & Lynn Wheeler 11/4/13 3:38 PM

tymshare did gnosis in the late 70s & early 80s for mainframe 370
... which was spun-off as keykos when M/D bought tymshare (disclaimer: i
was brought in to evaluate gnosis as part of the spinoff).
http://www.cis.upenn.edu/~KeyKOS/

one of the objectives of gnosis was to provide ability to offer 3rd
party applications on commercial online service bureau platform ...
with use charges accounting traced back to each application use
(allowing prorated remittance to the 3rd parties). I estimated that
1/3rd of system pathlength was involved in that accounting.  in
transition to keykos ... all that accounting overhead was removed
significantly improving keykos throughput/performance.

and
http://www.cis.upenn.edu/~KeyKOS/KeyTXF/KeyTXF.html

from above:

2.1 Persistent Memory

Satish M. Thatte describes a design for a persistent virtual memory
system [9] with transactions for object-oriented garbage collected
programming systems and object-oriented databases. In contrast with the
approach to persistent memory used in KeyKOS, his design requires
unforgeable pointers, bounds checking, and automatic garbage
collection. His transaction manager keeps an external redo log outside
virtual memory, while KeyTXF keeps the log in virtual memory.

... snip ...

more recent systems based on gnosis/keykos
http://www.eros-os.org/
http://en.wikipedia.org/wiki/EROS_%28microkernel%29
http://www.coyotos.org/
http://en.wikipedia.org/wiki/Coyotos
http://www.capros.org/
http://en.wikipedia.org/wiki/CapROS


--
virtualization experience starting Jan1968, online at home since Mar1970
Re: Bounded pointers Ivan Godard 11/4/13 4:11 PM
On 11/4/2013 2:23 PM, Michael S wrote:
> On Monday, November 4, 2013 11:57:21 AM UTC+2, Ivan Godard wrote:
>> On 11/4/2013 12:58 AM, Michael S wrote:
>>
>> <snip>
>>
>>
>>
>>> What are those "capabilities" that you are mentioning so often?
>>> In particular, how they help to augment chain of trust for "safe"
>>> languages? Lacking imagination, I can't see how "Quis custodiet
>>> ipsos custodes?" puzzle could be possibly solved.
>>
>>>
>>
>>
>>
>> Start here:
>> https://en.wikipedia.org/wiki/Capability-based_security
>> https://en.wikipedia.org/wiki/Capability-based_addressing
>>
>
> The first article appears very loosely related to bound checks. The
> second is a bit closer, but not much so. It still mostly discusses
> methods of process isolation. But process isolation is simply not
> interesting - it's solved problem, solved by several different
> solutions, all of which are good enough. The article does not
> describe how capability-based addressing is going to help a
> compiler/language run-time to detect out-of-bound access to
> "automatic", in C sense, array that hits the belly of another,
> unrelated, automatic array in the same process. I assume that
> language run-time is not going to ask an OS to allocate separate
> capability for each and every object, including most short-lived?

You assume wrongly. In caps, any data that you wish to distinguish from
other data using hardware-enforced security will need a cap of its own.

If you don't care to make the distinction - for example, if you don't
care to have a hardware-enforced distinction between .re and .im in a
complex number - then you don't need caps for them.

Caps are a quite general secure-protection mechanism. How that mechanism
is mapped to host language, or to the individual program, is not
something the underlying hardware either knows or cares. In particular,
your program (or the compiler, or the RTS) can use caps to detect
out-of-bounds for something that your program (or the compiler or RTS)
thinks of as an "array". Hardware doesn't know "arrays", and caps don't
either. Yet a cap can protect an address range - hardware does know
about addresses.

Ivan
Re: Bounded pointers Ivan Godard 11/4/13 4:14 PM
On 11/4/2013 3:38 PM, Anne & Lynn Wheeler wrote:
>

<snip much info about KeyKos and its descendants>

Biographical note:

Norm Hardy, who did KeyKos, has been a valued member of the Mill team
for the last five years.
Re: Bounded pointers Bakul Shah 11/4/13 7:22 PM
Prof. Hank Levy's excellent but out of print book is
now available as a set of PDFs from his webpage:

http://homes.cs.washington.edu/~levy/capabook/

Definitely worth reading as it describes a bunch
of cap. systems in detail.
Re: Bounded pointers Andy (Super) Glew 11/5/13 12:07 AM
On 11/4/2013 2:23 PM, Michael S wrote:
> On Monday, November 4, 2013 11:57:21 AM UTC+2, Ivan Godard wrote:
>> On 11/4/2013 12:58 AM, Michael S wrote:
>>
>> <snip>
>>
>>
>>
>>> What are those "capabilities" that you are mentioning so often?
>>> In particular, how they help to augment chain of trust for "safe" languages?
>>> Lacking imagination, I can't see how "Quis custodiet ipsos custodes?" puzzle could be possibly solved.
>>
>>>
>>
>>
>>
>> Start here:
>> https://en.wikipedia.org/wiki/Capability-based_security
>> https://en.wikipedia.org/wiki/Capability-based_addressing
>>
>
> The first article appears very loosely related to bound checks.
> The second is a bit closer, but not much so. It still mostly discusses methods of process isolation. But process isolation is simply not interesting - it's solved problem, solved by several different solutions, all of which are good enough.
> The article does not describe how capability-based addressing is going to help a compiler/language run-time to detect out-of-bound access to "automatic", in C sense, array that hits the belly of another, unrelated, automatic array in the same process.



> I assume that language run-time is not going to ask an OS to allocate separate capability for each and every object, including most short-lived?

The trouble with wikipedia is that it is not allowed to have imagination.

Or not even imagination - I don't think that I am the first guy to
figure out how to do capabilities with OS-suitable security without OS
overhead.  But, no, I don't have a reference off the top of my head.

In the world I would like, and tried to build, you don't need to do a
system call to the OS to allocate a capability.

There are simple machine instructions to manipulate capabilities.  E.g.
an allocator might have a capability for a large contiguous chunk of
memory, and divides this up into smaller capabilities to return to users.

I went to the Intel MPX manuals for an example, but they don't really
have the right instruction support.

So briefly:

Imagine that you have bounds [lwb,upb) in a register. Probably adorned
with permissions and non-forgeability stuff.

You can always tighten the bounds to create a new capability

NewCapReg := NARROW( newbase, newsize, OldCapReg )
    if [newbase,newbase+newsize) not in [lwb,upb) error
    return [newbase,newbase+newsize)


That's pretty much all that is required to create a new capability in a
register.  A few checks and comparisons, basically a few adds and subtracts.

To store it in memory, use Intel MPX's BNDSTX instruction.

The fun part is making that last step secure - having it write to memory
locations that only BNDSTX and BNDLDX can access.
Re: Bounded pointers Michael S 11/5/13 7:00 AM
On Tuesday, November 5, 2013 5:22:46 AM UTC+2, Bakul Shah wrote:
>
>
> Prof. Hank Levy's excellent but out of print book is
> now available as a set of PDFs from his webpage:
>
> http://homes.cs.washington.edu/~levy/capabook/
>
> Definitely worth reading as it describes a bunch of cap. systems in detail.

Thank you.
Re: Bounded pointers Andy (Super) Glew 11/5/13 6:16 PM
But I warn you: most of the cap systems that Hank's book describes are
old. In particular, most are descriptor table based.

One of my design goals was never to have a descriptor table that is
required to be a contiguous region of virtual or, worse, physical
memory.  Finite regions always overflow.

I think that one of the problems with old cap systems was that their
objects were nearly always addressed by a tuple
(descriptor_number,offset).  One of which was always too small.  Worse,
if the physical memory, the largest contiguous array that you ever
wanted to access (which I assume is all of physical memory) is N bits,
then the pair (descriptor_number,offset) wants to be 2N bits long.  A
fat pointer.

In my opinion the best, and probably most successful, semi-recent cap
system was the IBM AS/400 family,
http://en.wikipedia.org/wiki/IBM_System_i.  These used fat pointers -
128 bits on a 64-bit PowerPC, with a 64 bit virtual address. Originally
they had an additional bit, stolen from ECC, that was used to make
non-forgeable pointers, that could be passed around and copied by user
code, but which could not

Rumor has it various PowerPC processors had 65-bit registers, with the
65th bit being the "security" bit that was stored in memory as ECC.

Some may argue that the AS-400 is not a "true" or "ideal" capability
system. I think that is theology. Indeed, wikipedia says that the AS/400
replaced the System/38, and removed capability-based addressing.
Nevertheless, the AS/400 family's secure pointers have properties
considerably tighter than ordinary C/C++ pointers = integers = addresses.

Besides: although I am a fan of capabilities, I am not a purist.  There
are problems with most capability security models that I want to go
beyond capabilities to solve.


By the way, cap systems often come with a number of other features, that
people often think are necessary for a cap system.  Not necessarily so:

Caps + SAS (Single Address Space)
             - not necessarily required.  Caps enable SAS, although you can
implement SAS with page granular protection.   But, also, you can build
caps for regular multiple virtual address space systems.

Caps + Fat Pointers
          - not necessarily required. In fact, eliminating the need for fat
pointers was one of the key insights.  An insight which, by the way, I
gained in a comp.arch discussion circa 2004.

Caps + Segments
        - not necessarily required.  IMHO caps have to be able to live within
an existing standard paged virtual memory system.
Re: Bounded pointers Ivan Godard 11/5/13 6:43 PM
I too support caps, though I differ with Andy on some of the details. We
agree that descriptor tables are anathema, but differ on whether fat
pointers are too. I agree that caps work well with SAS, but I support
SAS with or without caps (the Mill is SAS), so I see caps as irrelevant
to SAS.

Lastly, I see no reason to hold with pages in general, and am quite
happy with (properly supported) segments. The Mill is something of a mix
in that respect.

Re: Bounded pointers Michael S 11/6/13 5:37 AM
On Wednesday, November 6, 2013 4:16:46 AM UTC+2, Andy (Super) Glew wrote:
>
> But I warn you: most of the cap systems that Hank's book describes are
> old. In particular, most are descriptor table based.
>

I guessed that already. That was one of the reasons behind my original question - I am interested to know what sort of capability-based addressing *you* want to have, with emphasis on you.

>
>
> One of my design goals was never to have a descriptor table that is
> required to be a contiguous region of virtual or, worse, physical
> memory.  Finite regions always overflow.
>
>
> I think that one of the problems with old cap systems was that their
> objects were nearly always addressed by a tuple
> (descriptor_number,offset).  One of which was always too small.  Worse,
> if the physical memory, the largest contiguous array that you ever
> wanted to access (which I assume is all of physical memory) is N bits,
> then the pair (descriptor_number,offset) wants to be 2N bits long.  A
> fat pointer.

In the context of smartphones and content-consumption oriented tablets, a 36-bit physical address could be sufficient until 2025 or a little beyond. So, wouldn't a 28:36 split of a 64-bit register work fine? Or are you afraid that 28 bits are insufficient for the key?


>
>
>
> In my opinion the best, and probably most successful, semi-recent cap
> system was the IBM AS/400 family,

So the most successful capability-based system is built on a non-cap CPU. Doesn't that prove that capability support in hardware is counter-productive?

>
> http://en.wikipedia.org/wiki/IBM_System_i.  These used fat pointers -
> 128 bits on a 64-bit PowerPC, with a 64 bit virtual address. Originally
> they had an additional bit, stolen from ECC, that was used to make
> non-forgeable pointers, that could be passed around and copied by user
> code, but which could not

could not what?

>
> Rumor has it various PowerPC processors had 65-bit registers, with the
> 65th bit being the "security" bit that was stored in memory as ECC.
>
> Some may argue that the AS-400 is not a "true" or "ideal" capability
> system. I think that is theology. Indeed, wikipedia says that the AS/400
> replaced the System/38, and removed capability-based addressing.
> Nevertheless, the AS/400 family's secure pointers have properties
> considerably tighter than ordinary C/C++ pointers = integers = addresses.
>
>
>
> Besides: although I am a fan of capabilities, I am not a purist.  There
> are problems with most capability security models that I want to go
> beyond capabilities to solve.
>
> By the way, cap systems often come with a number of other features, that
> people often think are necessary for a cap system.  Not necessarily so:
>
> Caps + SAS (Single Address Space)
>              - not necessarily required.  Caps enable SAS, although you can
> implement SAS with page granular protection.   But, also, you can build
> caps for regular multiple virtual address space systems.
>
>
>
> Caps + Fat Pointers
>           - not necessarily required. In fact, eliminating the need for fat
> pointers was one of the key insights.  An insight which, by the way, I
> gained in a comp.arch discussion circa 2004.
>

Sharing that particular insight will be especially helpful.
Re: Bounded pointers Joe keane 11/6/13 3:09 PM
In article <jwvsivi1v0y.fsf-monnier+comp.arch@gnu.org>,
Stefan Monnier  <mon...@iro.umontreal.ca> wrote:
>But, if "the compiler can do array bounds checking efficiently, and
>reliably, without any special hardware support", why would you waste
>resources on some hardware extension?

In article <5277246E...@SPAM.comp-arch.net>,
Andy (Super) Glew <an...@SPAM.comp-arch.net> wrote:
>Security.
>
>OS-level partitioning.
>
>I don't trust assembly language programmers.  And I suspect that I will
>still want to support assembly language programmers.

You forgot one:

The designers *enjoy* banging their heads against a wall.
Re: Bounded pointers Andy (Super) Glew 11/6/13 10:37 PM
On 11/6/2013 5:37 AM, Michael S wrote:
> On Wednesday, November 6, 2013 4:16:46 AM UTC+2, Andy (Super) Glew wrote:
>>
>> But I warn you: most of the cap systems that Hank's book describes are
>> old. In particular, most are descriptor table based.
>>
>
> I guessed that already. That was one of the reasons behind my original question - I am interested to know what sort of capability-based addressing *you* want to have, with emphasis on you.
>
>>
>>
>> One of my design goals was never to have a descriptor table that is
>> required to be a contiguous region of virtual or, worse, physical
>> memory.  Finite regions always overflow.
>>
>>
>> I think that one of the problems with old cap systems was that their
>> objects were nearly always addressed by a tuple
>> (descriptor_number,offset).  One of which was always too small.  Worse,
>> if the physical memory, the largest contiguous array that you ever
>> wanted to access (which I assume is all of physical memory) is N bits,
>> then the pair (descriptor_number,offset) wants to be 2N bits long.  A
>> fat pointer.
>
> In context of smartphones and content-consumption oriented tablets, 36-bit physical address could be sufficient up until 2025 or a little more.

Wanna bet?

Physical maybe. Probably not virtual.  And I am not so certain about
physical.

Consider a SAS system with a 64GB flash or other NVRAM backing store.

>So, wouldn't 28:36 split of 64-bit register work fine? Or are you afraid that 28 bits are insufficient for key?

28 bits is insufficient for key. And 36 bits will soon be insufficient
for virtual address.
Re: Bounded pointers Ivan Godard 11/6/13 11:38 PM
On 11/6/2013 10:37 PM, Andy (Super) Glew wrote:
> On 11/6/2013 5:37 AM, Michael S wrote:

>> So, wouldn't 28:36 split of 64-bit register work fine? Or are you
>> afraid that 28 bits are insufficient for key?
>
> 28 bits is insufficient for key. And 36 bits will soon be insufficient
> for virtual address.
>

60 bits of virtual is enough of a problem to make me doubtful about
taking 7 of them for the floating-point pointer checking scheme.
Re: Bounded pointers Michael S 11/7/13 12:22 AM
First, you are SAS. Second, you aim way above my target of "smartphones and content-consumption oriented tablets".
The first is probably more important than the second.

Re: Bounded pointers Michael S 11/7/13 12:37 AM
On Thursday, November 7, 2013 8:37:39 AM UTC+2, Andy (Super) Glew wrote:
> On 11/6/2013 5:37 AM, Michael S wrote:
>
> > On Wednesday, November 6, 2013 4:16:46 AM UTC+2, Andy (Super) Glew wrote:
> >>
> >> But I warn you: most of the cap systems that Hank's book describes are
> >> old. In particular, most are descriptor table based.
> >>
> >
>
> > I guessed that already. That was one of the reasons behind my original question - I am interested to know what sort of capability-based addressing *you* want to have, with emphasis on you.
>
> >
> >>
> >>
> >> One of my design goals was never to have a descriptor table that is
> >> required to be a contiguous region of virtual or, worse, physical
> >> memory.  Finite regions always overflow.
> >>
> >>
>
> >> I think that one of the problems with old cap systems was that their
> >> objects were nearly always addressed by a tuple
> >> (descriptor_number,offset).  One of which was always too small.  Worse,
> >> if the physical memory, the largest contiguous array that you ever
> >> wanted to access (which I assume is all of physical memory) is N bits,
> >> then the pair (descriptor_number,offset) wants to be 2N bits long.  A
> >> fat pointer.
> >
>
> > In context of smartphones and content-consumption oriented tablets, 36-bit physical address could be sufficient up until 2025 or a little more.
>
>
> Wanna bet?
> Physical maybe. Probably not virtual.  And I am not so certain about physical.
>
> Consider a SAS system with a 64GB flash or other NVRAM backing store.
>

I don't want to consider SAS.
I prefer to consider a minimally modified Linux (Android) or Windows kernel and to think, first, about how cap-inspired addressing can make the Dalvik or .net engine simpler or faster or more robust. Second, I'd start to ask the same question about "native" apps.
 
>
> >So, wouldn't 28:36 split of 64-bit register work fine? Or are you afraid that 28 bits are insufficient for key?
>
>
> 28 bits is insufficient for key. And 36 bits will soon be insufficient
> for virtual address.
>

Now I understand that I still did not "got it". I am still thinking in terms (segment selector, segment offset) tuples, with 36 bits being an offset. Sounds like I should read more about capabilities.


Re: Bounded pointers Andy (Super) Glew 11/7/13 6:25 PM
On 11/7/2013 12:37 AM, Michael S wrote:
> On Thursday, November 7, 2013 8:37:39 AM UTC+2, Andy (Super) Glew wrote:
>> On 11/6/2013 5:37 AM, Michael S wrote:
>>
>>> On Wednesday, November 6, 2013 4:16:46 AM UTC+2, Andy (Super) Glew wrote:
>>>>
>>>> But I warn you: most of the cap systems that Hank's book describes are
>>>> old. In particular, most are descriptor table based.
>>>>
>>>
>>
>>> I guessed that already. That was one of the reasons behind my original question - I am interested to know what sort of capability-based addressing *you* want to have, with emphasis on you.
>>
>>>
>>>>
>>>>
>>>> One of my design goals was never to have a descriptor table that is
>>>> required to be a contiguous region of virtual or, worse, physical
>>>> memory.  Finite regions always overflow.
>>>>
>>>>
>>
>>>> I think that one of the problems with old cap systems was that their
>>>> objects were nearly always addressed by a tuple
>>>> (descriptor_number,offset).  One of which was always too small.  Worse,
>>>> if the physical memory, the largest contiguous array that you ever
>>>> wanted to access (which I assume is all of physical memory) is N bits,
>>>> then the pair (descriptor_number,offset) wants to be 2N bits long.  A
>>>> fat pointer.
>>>
>>
>>> In context of smartphones and content-consumption oriented tablets, 36-bit physical address could be sufficient up until 2025 or a little more.
>>
>>
>> Wanna bet?
>> Physical maybe. Probably not virtual.  And I am not so certain about physical.
>>
>> Consider a SAS system with a 64GB flash or other NVRAM backing store.
>>
>
> I don't want to consider SAS.

Fair enough. I also tend to ignore SAS, because it is not what we have
now. Plus I think that security and testing guys will always want to be
ABLE to put processes (and guest OSes) in completely separate address
spaces - if only to get reproducible behavior for testing.  (By the
way, SAS and isolated virtual machines???)

But... SAS has long been "of interest". Some of the early Java phones
were SAS hardware.  I think the Newton was SAS.  So, while not high on
my list, it is just something I always think about.

By the way, I realized that I had erred in conflating "SAS (Single
Address Space)" and "SLS (Single Level Store)" - SLS=>all files are
memory mapped at all times. They often go together, but they don't have to.

Anyway, something here is relevant because the on-march of NVRAM
(Non-Volatile Memory) such as flash makes it very, very, likely that all
storage on a device like a cellphone or tablet *could* be memory mapped
at all times. Which would probably have performance advantages.
Although potentially reliability disadvantages - the separation between
DRAM and disk means that you can usually clear DRAM and reboot.




> I prefer to consider minimally modified Linux (Android) or Windows kernel and to think,
> first, how cap-inspired addressing can make Dalvik or .net engine simpler or faster or more robust.
> Second, I'd start to ask the same question about "native" apps.

To me, the questions are inseparable:

Caps - at least, where I think caps can be taken - give a software
virtual machine like Dalvik or .net or Javascript the ability to pass
language objects to native code, and be confident that the native code
is not corrupting anything outside the minimal privileges of the
capabilities that the native code has been granted by the VM.

Basically, caps give people the ability to pass linked data structures
via shared memory, without having to marshall the linked data structure
into a serialized representation.

By the ways: present day capability privilege systems are not
necessarily as good in this regard as I think they should be.  Mostly
they can do read and write protection.  But I think that we may need to
have the ability to give native code the ability to call a method on an
object, and have that method read and write the object data, without
having code outside the method do so.

If we can get to this, then I think that we will have removed one of the
big barriers and impediments to creating secure systems, with different
subsystems having the least amount of privilege needed.  At the moment,
most of our systems have "privilege domains" that are far too broad -
because there is overhead in dealing with privilege domains.

The software VMs like Java make it easier to create separate privilege
domains.  But they do it at the cost of leaving large swatches of code,
the native code libraries and/or other software VMs, outside their
security system.

It is possible that hardware assists may improve the performance of
Dalvik/Java/.net/Javascript.  But I think that it is worth doing even if
the interpreter/JITter

By the way: I can imagine an on-the-fly binary rewriting tool that
rewrote the native libraries to enforce capability like protection
without hardware assists. In fact, that is something like how Cp/PL or
possibly MPX was simulated.  It wasn't fast enough, but I can imagine it
being made fast enough to make hardware assists unnecessary.  I am not
religious about hardware assists.  I am evangelical about ubiquity - we
want this fine grain security to apply to both Dalvik/Java/etc. and to
native code. For security, you want to have the security checks done in
as few places as possible.



>
>>
>>> So, wouldn't 28:36 split of 64-bit register work fine? Or are you afraid that 28 bits are insufficient for key?
>>
>>
>> 28 bits is insufficient for key. And 36 bits will soon be insufficient
>> for virtual address.
>>
>
> Now I understand that I still did not "got it".
> I am still thinking in terms (segment selector, segment offset) tuples,
> with 36 bits being an offset. Sounds like I should read more about capabilities.

No, I think you get it.

36 bits would be an offset.  Probably the largest object that you could
create in a 28/36 split would have 2^36 bytes, 64GB of memory (although some
would argue that it could be 64GB * sizeof(array_element_type)).

My rule of thumb is that I always want to be able to create an array of
bytes the size of ... DRAM?  Whatever my virtual address space is?  And
36 bits sounds like it will not last very long.  Tablets, remember, are
quickly coming to rival and surpass desktop PCs in capability.

28 bits would be the number of objects.  256M objects sounds very small.

The only static partition that I would be happy with TODAY would be
32/40 or thereabouts - 4 billion objects, 2^40 bytes for the largest
object.  And I do not think that 40 bits will last very long.

I am more happy with dynamic partitions - e.g. there is a 4-6 bit field
that says where the segnum/segoff partition is drawn.   So you can have
20/40 or 30/30 or ...

But the real killer is that software just plain doesn't like segments.
Lots of software wants to increment the offset blindly, and not worry
about carry propagating into the selector.

---

In the above, I say "object" when I really mean "object allocated in
contiguous memory". Software may of course create non-contiguous objects.

---

Also, in the above we have only discussed segnum/segoff.  You may want
to place other metadata in your capabilities - like permissions bits,
object IDs that must be matched, etc.  Everything that you add like this
inside a "pointer" gets subtracted from segnum/segoff.

Probably my big claim to fame[*] here is realizing that you can stick
much of that metadata outside of the pointer per se, so that you are not
limited by 64 or 128 or even 256 bits.  (Not sure if I have priority; I
don't think Intel applied for any patent on this, so probably somebody
else realized it first. Or maybe I did, way back in the 1980s. Whatever.)
Re: Bounded pointers Michael S 11/8/13 4:17 AM
> at all times. Which would probably have performance advantages.

I don't see why there would be a performance advantage.
If it were NOR flash then maybe, especially on a SoC with a very big LLC, it would make sense to copy to main memory only on write access; read-only pages (hopefully the overwhelming majority) could then be accessed directly from flash, which could be a performance advantage for accesses with low spatial or temporal locality.
But NOR flash tech has been stagnating for a decade - still [barely] good enough for many embedded systems, but in phones/tablets it's now relegated to the same role it plays in PCs: initial boot and nothing else. All the NVRAM that we have in phones/tablets today is NAND, and not naked NAND chips, but NAND shielded behind a pretty sophisticated controller. From a software perspective it's not very different from a rotating disk, except that access blocks are bigger than the traditional 512B.
I don't see why NAND disk memory mapped at all times would have any performance advantage. Even if you use memory mapping for storage access (which, by itself, is not a very good idea most of the time) you want to map files, not disk blocks, so the mapping is unlikely to remain constant over time.


> Although potentially reliability disadvantages - the separation between
> DRAM and disk means that you can usually clear DRAM and reboot.
>
>
> > I prefer to consider minimally modified Linux (Android) or Windows kernel and to think,
> > first, how cap-inspired addressing can make Dalvik or .net engine simpler or faster or more robust.
> > Second, I'd start to ask the same question about "native" apps.
>
>
>
> To me, the questions are inseparable:
> Caps - at least, where I think caps can be taken - give a software
> virtual machine like Dalvik or .net or Javascript the ability to pass
> language objects to native code, and be confident that the native code
> is not corrupting anything outside the minimal privileges of the
> capabilities that the native code has been granted by the VM.
>
> Basically, caps give people the ability to pass linked data structures
> via shared memory, without having to marshall the linked data structure
> into a serialized representation.
>
>

But here you greatly increase the complexity of the VM/RTS rather than reduce it, all in favor of a performance advantage that is, in most cases, modest. I can't say that I like the idea.

Several years ago you said that one of your pet ideas is improving memory copying. Somehow I think that (improved memory copying => cheap serialization) is a fundamentally better approach to the same problem than fine-grained access control.


>
> By the ways: present day capability privilege systems are not
> necessarily as good in this regard as I think they should be.  Mostly
> they can do read and write protection.  But I think that we may need to
> have the ability to give native code the ability to call a method on an
> object, and have that method read and write the object data, without
> having code outside the method do so.
>
> If we can get to this, then I think that we will have removed one of the
> big barriers and impediments to creating secure systems, with different
> subsystems having the least amount of privilege needed.  At the moment,
> most of our systems have "privilege domains" that are far too broad -
> because there is overhead in dealing with privilege domains.
>

Is what you propose similar to i286/386 call gates?
I always found the idea usable, but I am not in the OS/VM/RTS racket. Those who are "in" never seemed to like call gates.


>
>
> The software VMs like Java make it easier to create separate privilege
> domains.  But they do it at the cost of leaving large swatches of code,
> the native code libraries and/or other software VMs, outside their
> security system.
>
> It is possible that hardware assists may improve the performance of
> Dalvik/Java/.net/Javascript.  But I think that it is worth doing even if
> the interpreter/JITter
>
>
>
> By the way: I can imagine an on-the fly binary rewriting tool that
> rewrote the native libraries to enforce capability like protection
> without hardware assists. In fact, that is something like how Cp/PL or
> possibly MPX was simulated.  It wasn't fast enough, but I can imagine it
> being made fast enough to make hardware assists unnecessary.  I am not
> religious about hardware assists.  I am evangelical about ubiquity - we
> want this fine grain security to apply to both Dalvik/Java/etc. and to
> native code. For security, you want to have the security checks done in
> as few places as possible.
>

Practicing professionals claim the opposite. Security works best when you have multiple fences that only minimally trust each other.

Re: Bounded pointers Michael S 11/8/13 5:18 AM
On Friday, November 8, 2013 4:25:01 AM UTC+2, Andy (Super) Glew wrote:
> On 11/7/2013 12:37 AM, Michael S wrote:
> > On Thursday, November 7, 2013 8:37:39 AM UTC+2, Andy (Super) Glew wrote:
> >> On 11/6/2013 5:37 AM, Michael S wrote:
> >>
> >>> On Wednesday, November 6, 2013 4:16:46 AM UTC+2, Andy (Super) Glew wrote:
> >>>>
>
>
> >>> So, wouldn't 28:36 split of 64-bit register work fine? Or are you afraid that 28 bits are insufficient for key?
> >>
> >>
> >> 28 bits is insufficient for key. And 36 bits will soon be insufficient
> >> for virtual address.
> >>
> >
>
> > Now I understand that I still did not "got it".
> > I am still thinking in terms (segment selector, segment offset) tuples,
> > with 36 bits being an offset. Sounds like I should read more about capabilities.
>
>
> No, I think you get it.
>
> 36 bits would be an offset.  Probably the largest object that you could  
> create in a 28/36 split would have 2^36, 64GB of memory (although some
> would argue that it could be 64GB * sizeof(array_element_type)).
>
>
> My rule of thumb is that I always want to be able to create an array of
> bytes the size of ... DRAM?  Whatever my virtual address space is?  And
> 36 bits sounds like it will not last very long.  Tablets, remember, are
> quickly coming to rival and surpass desktop PCs in capability.
>

That's why I repeat, in each and every post "smartphones and content-consumption oriented tablets".
[Although it's not really a good definition, because these phablets/tablets are actively used for shooting videos/stills and for audio recording, all of which is content creation. Maybe even for basic photo editing. I feel the difference, but can't express it in sufficiently short words.]

Content-creation oriented tablets are a completely different kettle of fish. Those are essentially light PCs/laptops with a touch-oriented interface. They do not necessarily run the same OS (OSX/WinPro/Ubuntu/Mint vs iOS/WinRT/Android). More importantly, not today but in the future, they hopefully will be operated mostly by computer-literate users (as today's masses of illiterates quickly migrate to the first category of devices).
Computer-literate users are far less likely to be hit by malware, if for nothing else then because they are more aware of its existence. They are also more capable of manually managing sandboxes, including very coarse-grained sandboxes in the form of full virtual machines.

In short, in those PC-like tablets, smart automatic security is less necessary, and often less wanted.

Not so in the first type of devices. Here automatic security that is as good as possible is *very* desirable, especially on the Android side [less reliant on a trusted app store and closer to the traditional "wild" PC model].
And here, I say, a 36-bit index will be sufficient at least until 2025.
Maybe you haven't paid attention, but in that space we still have new devices with 256MB of RAM actively sold, although, frankly, they suck. And 512MB devices are still capable of fulfilling the majority of users' needs.

>
>
> 28 bits would be the number of objects.

The one thing that I don't understand at all in the whole non-OS-managed capabilities enterprise is how we revoke keys for short-lived objects. And it seems related to the question of how serious the 28-bit limitation is, through the problem of key recycling. Recycling the same key sounds like a serious security hole. And with 28 bits, recycling [while still in the same process] is probably inevitable.


> 256M objects sounds very small.
> The only static partition that I would be happy with TODAY would be
> 32/40 or thereabouts - 4 billion objects, 2^40 bytes for the largest
> object.  And I do not think that 40 bits will last very long.
>
> I am more happy with dynamic partitions - e.g. there is a 4-6 bit field
> that says where the segnum/segoff partition is drawn.   So you can have
> 20/40 or 30/30 or ...
>
> But the real killer is that software just plain doesn't like segments.
> Lots of software wants to increment the offset blindly, and not worry
> about carry propagating into the selector.
>

So what's your solution?
Do you think that sparse population of total key space with legal values is sufficient?
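One well-known software answer to the key-withdrawal question raised above is to pair each object slot with a generation counter, so revocation is a single counter bump rather than a hunt for every outstanding key. A minimal sketch, with invented names and a fixed-size table purely for illustration:

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical sketch: revocation via generation counters.  Each
 * object slot carries a generation; a key embeds (slot, generation).
 * Revoking the slot bumps the generation, so every outstanding key
 * for it fails validation, and slot reuse does not resurrect old
 * keys until the counter itself wraps. */

#define SLOTS 1024

struct slot { uint32_t generation; bool live; };
static struct slot table[SLOTS];

typedef struct { uint32_t slot; uint32_t generation; } objkey;

objkey grant(uint32_t slot) {
    table[slot].live = true;
    return (objkey){ slot, table[slot].generation };
}

void revoke(uint32_t slot) {
    table[slot].live = false;
    table[slot].generation++;   /* invalidates all outstanding keys */
}

bool validate(objkey k) {
    return table[k.slot].live &&
           table[k.slot].generation == k.generation;
}
```

Note this only pushes the recycling problem into the width of the generation field, which is exactly the 28-bit concern voiced above.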
Re: Bounded pointers Michael S 11/8/13 5:29 AM
On Friday, November 8, 2013 4:25:01 AM UTC+2, Andy (Super) Glew wrote:
> On 11/7/2013 12:37 AM, Michael S wrote:
> > On Thursday, November 7, 2013 8:37:39 AM UTC+2, Andy (Super) Glew wrote:
>
> Also, in the above we have only discussed segnum/segoff.  You may want
> to place other metadata in your capabilities - like permissions bits,
> object IDs that must be matched, etc.  Everything that you add like this
> inside a "pointer" gets subtracted from segnum/segoff.
>
> Probably my big claim to fame[*] here is realizing that you can stick
> much such metadata outside of a pointer per se, so that you are not
> limited by 64 or 128 or even 256 bits.  (Not sure if I have priority; I
> don't think Intel applied for any patent on this, so probably somebody
> else realized it first. Or maybe I did, way back in the 1980s. Whatever.
>

Maybe it's out of ignorance, but I never thought of the key part of the tuple as a descriptor. I always thought of it as a selector - an index into an [optionally associative] array of descriptors, with a hardware-managed, SW-invisible TLB taking care of [average] speed issues. Not unlike hierarchical page tables.
Re: Bounded pointers Ivan Godard 11/8/13 8:07 AM
On 11/8/2013 5:18 AM, Michael S wrote:
> On Friday, November 8, 2013 4:25:01 AM UTC+2, Andy (Super) Glew
<snip>

>> 28 bits would be the number of objects.
>
> The one thing that I don't understand at all in the whole non-OS-managed
> capabilities enterprise, is how we withdraw keys for short-lived objects.
> And it seems related to a question of seriousness of 28-bit limitation
> through the problem of key recycling. Recycling of the same key sounds
> like a serious security hole. And with 28 bits recycling [why still in
> the same process] is, probably, inevitable.
>

You have hit on two of the classical design issues in any caps system,
and in most security systems in general: how do you revoke, and how do
you allocate/recover ids. A range of solutions to both problems is
known, and probably many are yet to be invented, but both must be
addressed in any system.

>> 256M objects sounds very small.
>> The only static partition that I would be happy with TODAY would be
>> 32/40 or thereabouts - 4 billion objects, 2^40 bytes for the largest
>> object.  And I do not think that 40 bits will last very long.
>>
>> I am more happy with dynamic partitions - e.g. there is a 4-6 bit field
>> that says where the segnum/segoff partition is drawn.   So you can have
>> 20/40 or 30/30 or ...


That was my floating-point pointer proposal, floated here. It is a
debugging aid, but (as proposed) cannot be used for system security.


>> But the real killer is that software just plain doesn't like segments.
>> Lots of software wants to increment the offset blindly, and not worry
>> about carry propagating into the selector.

Lots of software violates the rules of the host language and indulges in
undefined behavior. So?


Re: Bounded pointers EricP 11/8/13 10:32 AM
Andy (Super) Glew wrote:
>
> Fair enough. I also tend to ignore SAS, because it is not what we have
> now. Plus I think that security and testing guys will always want to be
> ABLE to put processes (and guest OSes) in completely separate address
> spaces - if only to get reproduceable behavior for testing.  (By the
> way, SAS and isolated virtual machines???)

And since the kernel can always diddle the hardware to force
multiple address spaces, there is no such thing as SAS.
For example (and I haven't re-watched the video with this in mind),
on the Mill one might accomplish an address-space swap by
flushing the entire virtual cache back to physical DRAM,
then switching physical page table roots.

Probably not what the designers had in mind though.

Eric


Re: Bounded pointers Ivan Godard 11/8/13 11:22 AM
Well, flushing the entire Mill cache would realize all the backless
memory, so everything would at least possess a physical address.
Currently there's no way to switch the page table root - it persists in
the hardware - but I suppose we could add one.

But I don't think the idea would work. First, there's the problem of
existing inter-address-space pointers: you could wind up with two
different cache lines with the same physical address, and I don't see
any evident way to avoid that (other than using physically-addressed
cache per normal).

Second, the code and data that is doing all this switching has to be in
cache too, so it's not really possible to completely flush the cache -
sort of "who flushes the flusher" :-) So for that to work, at least the
flusher has to live, at the same virtual address, in every address
space, i.e. SAS. Many "overlay" systems in early boxes with tiny address
spaces worked that way; the overlay code was always present, while
overlays were switched in and out.

Me, I'd call that "SAS with overlays" (the boxes had no MMU), rather
than "multi-address-space"; YMMV.

As for whether security types want to have aliased address spaces, that
has not been true in our investigations. There have been a lot more bugs
(and exploits) found that arise from code forgetting to translate
addresses of data shared across address spaces (e.g. a read() pointer
argument, which must refer into the client's space, not the OSs) than
bugs arising from someone forgetting to set access permissions on a
grant in a SAS.

For that matter, I haven't heard much complaint about all the systems
that reserve half the address space for a SAS OS, using the other half
for aliased client spaces.
Re: Bounded pointers Stefan Monnier 11/8/13 11:36 AM
> And since the kernel can always diddle the hardware to force
> multiple address spaces, there is no such thing as SAS.

The fact that the Mill has a virtually-addressed cache is so far the only
thing that has struck me as a clear threat to its viability.  I know that
there is some support in some versions of the Linux kernel for embedded
systems without MMUs, but as for running the kind of OS you'd want for
a heavy server workload, I wonder how easy it is.

Sure, you can flush the caches when switching processes, but your
performance will be seriously affected (especially since it seems that
their caches are virtually indexed even in the level-2 caches, so you'd
need to "flush deep").


        Stefan
Re: Bounded pointers Ivan Godard 11/8/13 12:27 PM
It's SAS; you do not flush the caches at all. The lines stay there, in
the cache (until they LRU and get evicted of course). Protection is *in
front* of the cache, so other processes cannot address those lines, even
though they are present in the cache.
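The "protection in front of the cache" idea can be sketched as a table of granted regions consulted before any cache lookup; because grants, not address translation, separate the processes, nothing in the cache needs flushing on a switch. The structures below are invented for illustration and are not the Mill's actual PLB:

```c
#include <stdint.h>
#include <stdbool.h>

/* Simplified model of protection in front of a virtually-addressed
 * cache in a SAS design: every access is first checked against the
 * current protection domain's granted regions.  The cache behind the
 * check never needs flushing on a domain switch, because a foreign
 * domain simply has no grant covering the foreign lines. */

struct region { uint64_t lo, hi; unsigned domain; bool read, write; };

#define NREGIONS 64
static struct region plb[NREGIONS];
static unsigned nregions;

void grant_region(unsigned domain, uint64_t lo, uint64_t hi,
                  bool read, bool write) {
    plb[nregions++] = (struct region){ lo, hi, domain, read, write };
}

bool access_ok(unsigned domain, uint64_t addr, bool is_write) {
    for (unsigned i = 0; i < nregions; i++) {
        struct region *r = &plb[i];
        if (r->domain == domain && addr >= r->lo && addr < r->hi)
            return is_write ? r->write : r->read;
    }
    return false;   /* no grant covers this address: protection fault */
}
```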
Re: Bounded pointers Stephen Fuld 11/8/13 12:54 PM
On 11/8/2013 4:17 AM, Michael S wrote:
> On Friday, November 8, 2013 4:25:01 AM UTC+2, Andy (Super) Glew wrote:

snip

>> Anyway, something here is relevant because the on-march of NVRAM
>> (Non-Volatile Memory) such as flash makes it very, very, likely that all
>> storage on a device like a cellphone or tablet *could* be memory mapped
>> at all times. Which would probably have performance advantages.
>
> I don't see why there will be performance advantage.


I see potential for substantial performance improvements, especially as
with the type of systems we are discussing, you can change the hardware.

The improvement I see is by putting the flash memory on the memory bus
instead of treating it as a "fast disk" on the other side of some
peripheral bus and a controller.  This allows higher transfer rates due
to a wider bus and lower overhead due to the elimination of the device
driver code on the host side and the corresponding code in the
controller to convert the disk commands into memory accesses.  Yes,
there would have to be background code running in the host for block
relocation, wear leveling etc., but here again, the hardware could
assist by having a "page table", that is used to map the user requested
address to a possibly relocated hardware address.

The memory space would be divided at some point into the two
technologies and the hardware would know how to access each kind of
memory.  This also allows eliminating the need for 512 byte blocks (for
SATA disk compatibility), so you could use smaller blocks if that was
appropriate, which might reduce the amount of data transferred and thus
further improve performance.

If you still wanted a file system, as opposed to SAS that includes the
flash, it would be simplified over what we now have, a further potential
performance improvement.

In summary, we are in a "local minimum" due to the decision to ease
flash integration into desktop and server systems by making them emulate
disk drives.  Getting over the hill to reconsider that decision would
lead to improved performance.
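The hardware "page table" for block relocation described above amounts to a logical-to-physical block remap consulted on every flash access. A minimal sketch, with block size and table layout invented for illustration:

```c
#include <stdint.h>

/* Sketch of a hardware remap table for memory-mapped flash: the
 * user's flash address is split into (block, offset) and the block
 * number is run through a relocation table maintained by the
 * wear-leveling code, so relocation is invisible to software. */

#define BLOCK_BITS 12                 /* e.g. 4 KiB blocks */
#define NBLOCKS    (1u << 16)

static uint32_t remap[NBLOCKS];       /* logical block -> physical block */

uint64_t flash_translate(uint64_t user_addr) {
    uint32_t block  = (uint32_t)(user_addr >> BLOCK_BITS);
    uint64_t offset = user_addr & ((UINT64_C(1) << BLOCK_BITS) - 1);
    return ((uint64_t)remap[block] << BLOCK_BITS) | offset;
}

/* Wear leveling moves a logical block to a fresh physical block. */
void flash_relocate(uint32_t logical, uint32_t new_physical) {
    remap[logical] = new_physical;
}
```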


--
  - Stephen Fuld
(e-mail address disguised to prevent spam)
Re: Bounded pointers Stefan Monnier 11/8/13 1:07 PM
> It's SAS;

But all the OSes I know used to run heavy workload Internet servers are
not SAS.  You can make them run by flushing the cache upon process
switch, but that will kill your performance.

So, you're going to have to either make those OSes work in an SAS, or
convert users to another OS.  I hope you can convert those OSes to SAS,
because that would be interesting in its own right.


        Stefan
Re: Bounded pointers Andy (Super) Glew 11/8/13 5:00 PM
Why "greatly increase"?

The basic bounds checks of MPX just get created by the compiler, for
any standards-compliant code.

Fancier restrictions can be based on type.


> Several years ago you said that one of your pet ideas is improving memory copying. Somehow I think that (improved memory copying => cheap serialization) is fundamentally better approach to the same problem than fine-grained access control.

You are not the only person to point this out.

It's like - why bother passing around pointers?  Instead, marshall your
data structures into XML, and pass them around as strings.


> Is what you propose similar to i286/386 call gates?

I don't think so.


>
> Practicing professionals claim the opposite. Security works best when you have multiple fences that only minimally trust each other.

Sigh.

I have been a full-time practicing security professional at least twice
in my career. And part-time all of my career.

Plus, I have the say-so of some of the world's leading security
professionals - including the guy who may have invented, or at least
publicized, firewalls.

It is good to have many security checks, checking many different aspects
of security.  Multiple layers of security to break through.

But, for any given security check, you look for a bottleneck - as few
places as possible to place the check.

The bad thing is to implement the same logical check in many, many
different places - and miss some.  It is bad to say "all you need to do
is find every piece of code that might possibly increment an integer
that might overflow/wrap and thereby cause a buffer overflow, and
either prove to yourself that it can't wrap, or add an overflow check."
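The bottleneck idea above is easy to illustrate: route all size arithmetic through one checked helper, rather than auditing every increment site separately. A minimal sketch (helper name is invented):

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

/* "Check at a bottleneck": one overflow-checked helper that every
 * allocation/indexing path calls, with a single failure path to
 * audit, instead of N scattered ad-hoc wrap checks to find. */

bool checked_add_size(size_t a, size_t b, size_t *out) {
    if (a > SIZE_MAX - b)
        return false;       /* would wrap: caller must bail out */
    *out = a + b;
    return true;
}
```

A caller computing `header_len + payload_len` before `malloc` then has exactly one place where the wrap check lives.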
Re: Bounded pointers Andy (Super) Glew 11/8/13 5:03 PM
On 11/8/2013 5:18 AM, Michael S wrote:
> On Friday, November 8, 2013 4:25:01 AM UTC+2, Andy (Super) Glew wrote:

>> But the real killer is that software just plain doesn't like segments.
>> Lots of software wants to increment the offset blindly, and not worry
>> about carry propagating into the selector.
>>
>
> So what's your solution?
> Do you think that sparse population of total key space with legal values is sufficient?

My solution is not to do segments at all.

You can do capabilities without segments.

Nuff said.
Re: Bounded pointers Andy (Super) Glew 11/8/13 5:04 PM
On 11/8/2013 8:07 AM, Ivan Godard wrote:
> On 11/8/2013 5:18 AM, Michael S wrote:
>> On Friday, November 8, 2013 4:25:01 AM UTC+2, Andy (Super) Glew

>>> But the real killer is that software just plain doesn't like segments.
>>> Lots of software wants to increment the offset blindly, and not worry
>>> about carry propagating into the selector.
>
> Lots of software violates the rules of the host language and indulges in
> undefined behavior. So?

So fix the language.
Re: Bounded pointers robert...@yahoo.com 11/8/13 10:52 PM
On Thu, 07 Nov 2013 18:25:01 -0800, "Andy (Super) Glew"
<an...@SPAM.comp-arch.net> wrote:

>
>Fair enough. I also tend to ignore SAS, because it is not what we have
>now. Plus I think that security and testing guys will always want to be
>ABLE to put processes (and guest OSes) in completely separate address
>spaces - if only to get reproduceable behavior for testing.  (By the
>way, SAS and isolated virtual machines???)
>
>But... SAS has long been "of interest". Some of the early Java phones
>were SAS hardware.  I think the Newton was SAS.  So, while not high on
>my list, it is just something I always think about.
>
>By the way, I realized that I had erred in conflating "SAS (Single
>Address Space)" and "SLS (Single Level Store)" - SLS=>all files are
>memory mapped at all times. They often go together, but they don't have to.


FWIW, Windows Mobile originally was too, and it was an option in at
least the early WinCE stuff..
Re: Bounded pointers Michael S 11/9/13 8:48 AM
On Friday, November 8, 2013 10:54:55 PM UTC+2, Stephen Fuld wrote:
> On 11/8/2013 4:17 AM, Michael S wrote:
>
> > On Friday, November 8, 2013 4:25:01 AM UTC+2, Andy (Super) Glew wrote:
>
> snip
>
> >> Anyway, something here is relevant because the on-march of NVRAM
> >> (Non-Volatile Memory) such as flash makes it very, very, likely that all
> >> storage on a device like a cellphone or tablet *could* be memory mapped
> >> at all times. Which would probably have performance advantages.
> >
>
> > I don't see why there will be performance advantage.
>
> I see potential for substantial performance improvements, especially as
> with the type of systems we are discussing, you can change the hardware.
>
>

> snip


Sorry, Stephen, but your post sounds like you don't know enough about the strong and weak points of today's NAND flash technology. In particular, it seems that you don't realize that the block nature of read access is just as fundamental as block write access, and that the desirable block size is actually bigger than with rotating HDs. I recommend educating yourself a little bit.
Re: Bounded pointers EricP 11/9/13 9:38 AM
Ivan Godard wrote:
> On 11/8/2013 10:32 AM, EricP wrote:
>> Andy (Super) Glew wrote:
>>>
>>> Fair enough. I also tend to ignore SAS, because it is not what we have
>>> now. Plus I think that security and testing guys will always want to
>>> be ABLE to put processes (and guest OSes) in completely separate
>>> address spaces - if only to get reproduceable behavior for testing.
>>> (By the way, SAS and isolated virtual machines???)
>>
>> And since the kernel can always diddle the hardware to force
>> multiple address spaces, there is no such thing as SAS.
>> For example (and I haven't re-watched the video with this in mind),
>> on the Mill one might accomplish an address spaces swap by
>> flushing the entire virtual cache back to physical DRAM,
>> then then switching physical page table roots.
>>
>> Probably not what the designers had in mind though.
>>
>> Eric
>>
>>
>
> Well, flushing the entire Mill cache would realize all the backless
> memory, so everything would at least possess a physical address.

I didn't say it would be pretty!

> Currently there's no way to switch the page table root = it persists in
> the hardware - but I suppose we could add one.

By which I take you to mean that you hard wired the
root physical address of a hierarchical page table?
I can see lots of reasons to not do that:
that physical frame might be bad but the rest of memory is good.
A root pointer register would be better. Various other approaches
such as Block Address Translate or BAT tables have also been discussed.

By the way, you also need some protected-mode registers that the
OS can use to point to some important internal data structures.
In an SMP system those structures are different for each cpu.

In SMP each cpu needs its own Processor State Descriptor (PsdT)
struct to store things like that cpu's interrupt state, thread, etc.
There would be a vector of PsdT structs in virtual memory
but we need to know which element is _this_ cpu's PsdT.
Also need the current thread header active on _this_ cpu.

On x86/x64 FS and GS segment registers are used for this purpose.

It is possible for hardware to NOT have such handy registers.
In that case an OS port can set up the page table for each cpu
slightly differently - only one page needs to be different.
On each cpu the same virtual address X maps to a different
physical page, and that page contains one item: the cpu number
for _this_ cpu that indexes into the PsdT vector to its Psd.

In order to do mapping games like this each cpu must have
a different page table root.
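The per-cpu mapping trick described above can be sketched in C. Here the "page at virtual address X that maps differently on each cpu" is faked with an array indexed by a simulated cpu id, so the lookup logic is runnable; all names are invented:

```c
#include <stdint.h>

/* Sketch of the per-cpu page-mapping trick: the OS maps the same
 * virtual address X on every cpu, but to a different physical page
 * whose only content is that cpu's number; the number then indexes
 * the vector of Processor State Descriptors (PsdT). */

struct Psd { unsigned cpu; void *current_thread; };

#define NCPUS 4
static struct Psd psd_vector[NCPUS];

/* Stand-in for the per-cpu page at virtual address X; on real
 * hardware one load from X yields a different value per cpu. */
static unsigned per_cpu_page[NCPUS];

struct Psd *this_psd(unsigned simulated_cpu) {
    unsigned n = per_cpu_page[simulated_cpu];  /* real hw: *(unsigned *)X */
    return &psd_vector[n];
}

void init_cpus(void) {
    for (unsigned i = 0; i < NCPUS; i++) {
        per_cpu_page[i] = i;
        psd_vector[i].cpu = i;
    }
}
```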

> But I don't think the idea would work, First, there's the problem of
> existing inter-address-space pointers: you could wind up with two
> different cache lines with the same physical address, and I don't see
> any evident way to avoid that (other then using physically-addressed
> cache per normal).

There are no inter-address space pointers. See below.

Also everything I'm saying would likely fall apart if the
SMP coherency protocol was based on just virtual address.
However (just a guess) including an OS assigned Address Space ID (ASID)
with each cache virtual address might fix it all up.

> Second, the code and data that is doing all this switching has to be in
> cache too, so it's not really possible to completely flush the cache -
> sort of "who flushes the flusher" :-) So for that to work, at least the
> flusher has to live, at the same virtual address, in every address
> space, i.e. SAS. Many "overlay" systems in early boxes with tiny address
> spaces worked that way; the overlay code was always present, while
> overlays were switched in and out.

One popular way to do multi-processing is for the lower
half of the address space to be Process or P space, the part
that switches when the process switches, and the upper half to
be System or S space, the part that is common to all processes.

So yes, the flusher and switcher and the rest of the OS
would reside in S space that is common to all processes.
The OS would flush the lower P space virtual cache.
Then it could overwrite the lower half entries in the
root page table frame to effect the space switch.
And it updates the ASID at this point.

This is another reason to have a page table root pointer as
it allows each SMP cpu to manage its own space differently.

As with traditional OS's, if someone wants to send a message
between processes then it must be copied from P space
into S space, the process switched, then copied back.
Pointers in S space are valid in all process context,
pointers in P space are valid only in their own.
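The copy-through-S-space message path described above can be sketched directly. In a real OS the staging buffer would live in the shared upper half of the address space; here it is a plain static so the copy logic is runnable:

```c
#include <stddef.h>
#include <string.h>

/* Sketch of cross-process messaging in a P/S split: a buffer in
 * S space (common to all processes) stages the data, because a
 * P-space pointer is only valid in its own process context. */

static unsigned char s_buffer[256];   /* lives in S space */
static size_t s_len;

/* Runs in the sender's context, where the P-space source is valid. */
void msg_send(const void *p_src, size_t n) {
    if (n > sizeof s_buffer) n = sizeof s_buffer;
    memcpy(s_buffer, p_src, n);
    s_len = n;
}

/* Runs in the receiver's context, after the address-space switch. */
size_t msg_receive(void *p_dst, size_t cap) {
    size_t n = s_len < cap ? s_len : cap;
    memcpy(p_dst, s_buffer, n);
    return n;
}
```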

> Me, I'd call that "SAS with overlays" (the boxes had no MMU), rather
> than "multi-address-space"; YMMV.
>
> As for whether security types want to have aliased address spaces, that
> has not been true in our investigations. There have been a lot more bugs
> (and exploits) found that arise from code forgetting to translate
> addresses of data shared across address spaces (e.f. a read() pointer
> argument, which must refer into the client's space, not the OSs) than
> bugs arising from someone forgetting to set access permissions on a
> grant in a SAS.

Yes that does happen from time to time though I can't
speak to its frequency relative to other bugs.

> For that matter, I haven't heard much complaint about all the systems
> that reserve half the address space for a SAS OS, using the other half
> for aliased client spaces.

I'm not sure what an 'aliased client space' is but it
may be what I was describing above.

Eric


Re: Bounded pointers EricP 11/9/13 10:14 AM
EricP wrote:
>
> So yes, the flusher and switcher and the rest of the OS
> would reside in S space that is common to all processes.
> The OS would flush the lower P space virtual cache.
> Then it could overwrite the lower half entries in the
> root page table frame to effect the space switch.
> And it updates the ASID at this point.

Oops... if there is an ASID then it tags the virtual
cache lines so we don't need to flush the cache.
Just switch the PTEs, the ASID and go.

Hmmm... how is coherency maintained with DMA?

Eric
Re: Bounded pointers Bakul Shah 11/9/13 10:40 AM
On 11/8/13, 12:27 PM, Ivan Godard wrote:
> It's SAS; you do not flush the caches at all. The lines stay there, in the cache (until they LRU and
> get evicted of course). Protection is *in front* of the cache, so other processes cannot address
> those lines, even though they are present in the cache.

If you *had* to put in a MMU what would the Mill arch. lose?
And if you had to provide virtual machine support? These days
even smart phones sport multiple virtual address spaces!
Re: Bounded pointers Andy (Super) Glew 11/9/13 5:46 PM
On 11/8/2013 5:00 PM, Andy (Super) Glew wrote:
> On 11/8/2013 4:17 AM, Michael S wrote:

>> Several years ago you said that one of your pet ideas is improving
>> memory copying. Somehow I think that (improved memory copying => cheap
>> serialization) is fundamentally better approach to the same problem
>> than fine-grained access control.
>
> You are not the only person to point this out.
>
> It's like - why bother passing around pointers?  Instead, marshall your
> data structures into XML, and pass them around as strings.

By the way: I am only being slightly facetious when I say that passing
around serialized data structures may be the "way of the future". It
certainly has many advantages from a software engineering and security
point of view. You are mainly restricted to passing around relatively
compact linked data structures by value, not reference - but some
suggest that is a good idea anyway.

However, I am fairly certain that passing things by reference costs far
less power than marshalling/serializing megabytes of memory.

We are now in a situation where

a) passing things by reference as pointers is cheap - but insecure at
the hardware level, since all participants usually have to be in the
same privilege domain. (Exceptions for kernel accessing user, and for
communications windows that are shared although most memory is not shared.)

But of course, pointers are mainly limited to cache coherent shared
memory. They don't scale to the internet.

b) passing things by value, as serialized data - strings, XML, JSON,
etc. - is certainly the only thing that scales to internet. Can easily
cross security domain boundaries.

There may be a trend to do more and more such passing around of
serialized values even inside shared memory machines.

c) passing by reference, using URLs, is certainly done.

But I hazard that there is a circa 1000X cost increase between each.
Re: Bounded pointers Andy (Super) Glew 11/9/13 5:50 PM
On 11/8/2013 5:04 PM, Andy (Super) Glew wrote:
> On 11/8/2013 8:07 AM, Ivan Godard wrote:
>> On 11/8/2013 5:18 AM, Michael S wrote:
>>> On Friday, November 8, 2013 4:25:01 AM UTC+2, Andy (Super) Glew
>
>>>> But the real killer is that software just plain doesn't like segments.
>>>> Lots of software wants to increment the offset blindly, and not worry
>>>> about carry propagating into the selector.
>>
>> Lots of software violates the rules of the host language and indulges in
>> undefined behavior. So?
>
> So fix the language.

That's a bit tongue in cheek.

What I am trying to say is that many programs do things that are in the
borderland between "undefined by the language, but behave quite portably
on most implementations of the language".  And also that many such
things are useful, like cramming tag bits into pointers.  Ada did a much
better job of allowing control of such low level representations via a
high level language than C and C++ do.
Re: Bounded pointers Andy (Super) Glew 11/9/13 6:01 PM
??  are you saying that Windows Mobile and WinCE were originally SLS -
Single Level Store? Or SAS?  In either case, thanks, I did not know that.

I also recall reading that Jeff Hawkins said that one of the best things
that Palm had to differentiate it from its competitors, in the early
days of PDAs, was that it was execute-in-place. No distinction between
RAM and filesystem - it was all DRAM.  Q: was PalmOS 1.0 SLS or SAS?
Anyone know?

I wonder if this means that SLS and SAS are more of interest in the early
days of a form factor, when memory is constrained?
Re: Bounded pointers robert...@yahoo.com 11/10/13 12:49 AM
On Sat, 09 Nov 2013 18:01:56 -0800, "Andy (Super) Glew"
<an...@SPAM.comp-arch.net> wrote:

>On 11/8/2013 10:52 PM, Robert Wessel wrote:
>> On Thu, 07 Nov 2013 18:25:01 -0800, "Andy (Super) Glew"
>> <an...@SPAM.comp-arch.net> wrote:
>>
>>>
>>> Fair enough. I also tend to ignore SAS, because it is not what we have
>>> now. Plus I think that security and testing guys will always want to be
>>> ABLE to put processes (and guest OSes) in completely separate address
>>> spaces - if only to get reproduceable behavior for testing.  (By the
>>> way, SAS and isolated virtual machines???)
>>>
>>> But... SAS has long been "of interest". Some of the early Java phones
>>> were SAS hardware.  I think the Newton was SAS.  So, while not high on
>>> my list, it is just something I always think about.
>>>
>>> By the way, I realized that I had erred in conflating "SAS (Single
>>> Address Space)" and "SLS (Single Level Store)" - SLS=>all files are
>>> memory mapped at all times. They often go together, but they don't have to.
>>
>>
>> FWIW, Windows Mobile originally was too, and it was an option in at
>> least the early WinCE stuff..
>
>??  are you saying that Windows Mobile and WinCE were originally SLS -
>Single Level Store? Or SAS?  In either case, thanks, I did not know that.


SAS.  Pre-WinCE 6.0, you could have up to 32 processes, of up to 32MB,
all mapped into a single 4GB address space, which also contained the
kernel.  Various hardware could provide various levels of protection,
ranging from none at all to fairly solid (x86 versions used ~32
ordinary page tables with different pages being marked as accessible
in the different tables*).  The OS got away with mostly just checking
the high byte of all addresses you passed to it.

WinCE 6.0 changed that to a traditional multiple address space model.


*I never checked, but always assumed that it was actually just a
different page directory for each process.


>I also recall reading that Jeff Hawkins said that one of the best things
>that Palm had to differentiate it from its competitors, in the early
>days of PDAs, was that it was execute-in-place. No distinction between
>RAM and filesystem - it was all DRAM.  Q: was PalmOS 1.0 SLS or SAS?
>Anyone know?
>
>I wonder if this means that SLS and SAS are more of interest in early
>days of a form factor, when memory challenged?


Certainly the lack of a proper MMU will make a SAS a popular idea, and
that was true on S/360 as well as early mobile devices.

I've always had a soft spot for SAS, although you really need
something capability-like to go with it.  Inter-process communication
is just *so* clumsy with most multiple address space systems.
370/ESA's multiple address space handling solves some of the problems
(a subsystem can be called from, and access, a client process's address
space directly), but there's no restriction to only accessing a
portion of your address space.

SLS I'm less sure about - there's a nice symmetry to it, but long term
storage just doesn't really look or work like memory, and if you try
to treat it that way, you can easily have horrible performance
problems.  Nor does the integration of caps for memory and persistent
storage strike me as being all that useful - sure, you'd want some
connection between the two, but for the most part they serve fairly
different functions.  OTOH, the AS/400 folks have certainly made it
work.
Re: Bounded pointers Megol 11/10/13 2:21 PM
Block sizes larger than the ~2MiB that is the minimum for an HDD to reach its peak read/write rate? I find that hard to believe (though I haven't directly touched modern NAND flash).
Re: Bounded pointers Ivan Godard 11/10/13 3:13 PM
On 11/9/2013 5:46 PM, Andy (Super) Glew wrote:
> On 11/8/2013 5:00 PM, Andy (Super) Glew wrote:
>> On 11/8/2013 4:17 AM, Michael S wrote:
>
>>> Several years ago you said that one of your pet ideas is improving
>>> memory copying. Somehow I think that (improved memory copying => cheap
>>> serialization) is fundamentally better approach to the same problem
>>> than fine-grained access control.
>>
>> You are not the only person to point this out.
>>
>> It's like - why bother passing around pointers?  Instead, marshall your
>> data structures into XML, and pass them around as strings.
>
> By the way: I am only being slightly facetious when I say that passing
> around serialized data structures may be the "way of the future". It
> certainly has many advantages from a software engineering and security
> point of view. You are mainly restricted to passing around relatively
> compact linked data structures by value, not reference - but some
> suggest that is a good idea anyway.
>
> However, I am fairly certain that passing things by reference is less
> power, than marshalling/serializing megabytes of memory.
>
> We are now in a situation where
>
> a) passing things by reference as pointers is cheap - but insecure at
> the hardware level, since all participants usually have to be in the
> same privilege domain. (Exceptions for kernel accessing user, and for
> communications windows that are shared although most memory is not shared.)

And when running on a Mill. Security talk coming at some point.

:-)
Re: Bounded pointers Mark Thorson 11/10/13 3:25 PM
Ivan Godard wrote:
>
> And when running on a Mill. Security talk coming at some point.

I sure hope you use some other method than
the one for Tuesday's talk.  I can't register
because of eventbrite.com's broken software,
so I won't be able to attend that one.  I'd
like to be there, but it costs me a lot of time,
and I can't make that investment if there's
any possibility I won't be able to get in.
Re: Bounded pointers Ivan Godard 11/10/13 3:37 PM
On 11/9/2013 9:38 AM, EricP wrote:
> Ivan Godard wrote:
>> On 11/8/2013 10:32 AM, EricP wrote:
>>> Andy (Super) Glew wrote:
>>>>
>>>> Fair enough. I also tend to ignore SAS, because it is not what we have
>>>> now. Plus I think that security and testing guys will always want to
>>>> be ABLE to put processes (and guest OSes) in completely separate
>>>> address spaces - if only to get reproduceable behavior for testing.
>>>> (By the way, SAS and isolated virtual machines???)
>>>
>>> And since the kernel can always diddle the hardware to force
>>> multiple address spaces, there is no such thing as SAS.
>>> For example (and I haven't re-watched the video with this in mind),
>>> on the Mill one might accomplish an address spaces swap by
>>> flushing the entire virtual cache back to physical DRAM,
>>> then then switching physical page table roots.
>>>
>>> Probably not what the designers had in mind though.
>>>
>>> Eric
>>>
>>>
>>
>> Well, flushing the entire Mill cache would realize all the backless
>> memory, so everything would at least possess a physical address.
>
> I didn't say it would be pretty!
>
>> Currently there's no way to switch the page table root = it persists
>> in the hardware - but I suppose we could add one.
>
> By which I take you to mean that you hard wired the
> root physical address of a hierarchical page table?

Who said anything about physical?

> I can see lots of reasons to not do that:
> that physical frame might be bad but the rest of memory is good.
> A root pointer register would be better. Various other approaches
> such as Block Address Translate or BAT tables have also been dicussed.
>
> By the way, you also need some protected mode registers that the
> OS can use to pointer to some important internal data structures.
> In an SMP systems those structures are different for each cpu.

We haven't found any cases of this need, but then we haven't ported the
OS yet.  There are quite a few special registers (real time clock; base
of interrupt vector; CPUid; many others). However none seem to require a
"protected mode".

> In SMP each cpu needs its own Processor State Descriptor (PsdT)
> struct to store things like that cpu's interrupt state, thread, etc.
> There would be a vector of PsdT structs in virtual memory
> but we need to know which element is _this_ cpu's PsdT.
> Also need the current thread header active on _this_ cpu.

Yes, CPUid is a register, as is thread id and turf id - so what?

> On x86/x64 FS and GS segment registers are used for this purpose.

Did I mention that the Mill is not a member of the x86 family?

> It is possible for hardware to NOT have such handy registers.
> In that case an OS port can set up the page table for each cpu
> slightly differently - only one page needs to be different.
> On each cpu the same virtual address X maps to a different
> physical page, and that page contains one item: the cpu number
> for _this_ cpu that indexes into the PsdT vector to its Psd.

Beats me why you need a page table (and apparently aliasing) for this.

> In order to do mapping games like this each cpu must have
> a different page table root.

I'm pretty sure I said in some thread here that "Mill != x86".

>> But I don't think the idea would work, First, there's the problem of
>> existing inter-address-space pointers: you could wind up with two
>> different cache lines with the same physical address, and I don't see
>> any evident way to avoid that (other then using physically-addressed
>> cache per normal).
>
> There are no inter-address space pointers. See below.

Why not?

The context is: there are three one-byte values, and three processes.
Each byte is to be usable by two of the three processes, a different
pair each; attempted use by the third process of each pair should fault.
The bytes are in the middle of other stuff, usable by other mixes of
processes. Each process that can use a byte should use the same source
code and machine operations to do so that it uses for any other data access.

The problem: design an architecture that cheaply supports this use case.
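The use case above can be restated as a tiny access matrix. A toy model follows; the concrete labeling (byte b_i is the one byte process P_i may *not* touch) is my own illustrative assignment, not anything from the thread:

```python
# Hypothetical labeling of the puzzle: three bytes, three processes.
# Byte b_i is usable by every process except P_i, so each byte is
# shared by exactly two of the three processes; the third must fault.
procs = ["P0", "P1", "P2"]
byte_names = ["b0", "b1", "b2"]

def may_access(proc, byte):
    # the excluded process for byte b_i is P_i
    return procs.index(proc) != byte_names.index(byte)
```

Whatever mechanism the architecture provides has to express this matrix cheaply, per byte, with unchanged source code in every process.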

> Also everything I'm saying would likely fall apart if the
> SMP coherency protocol was based on just virtual address.
> However (just a guess) including an OS assigned Address Space ID (ASID)
> with each cache virtual address might fix it all up.

Why bother?

>> Second, the code and data that is doing all this switching has to be
>> in cache too, so it's not really possible to completely flush the
>> cache - sort of "who flushes the flusher" :-) So for that to work, at
>> least the flusher has to live, at the same virtual address, in every
>> address space, i.e. SAS. Many "overlay" systems in early boxes with
>> tiny address spaces worked that way; the overlay code was always
>> present, while overlays were switched in and out.
>
> One popular way to do multi-processing is for the lower
> half of the address space to be Process or P space, the part
> that switches when the process switches, and the upper half to
> be System or S space, the part that is common to all processes.

Yes, that goes back quite a ways.

> So yes, the flusher and switcher and the rest of the OS
> would reside in S space that is common to all processes.
> The OS would flush the lower P space virtual cache.
> Then it could overwrite the lower half entries in the
> root page table frame to effect the space switch.
> And it updates the ASID at this point.

Why bother? Why not just give each process all the address space it
wants, disjoint from that given to all the other processes, and call the
OS a process just like all the others?

> This is another reason to have a page table root pointer as
> it allows each SMP cpu to manage its own space differently.
>
> As with traditional OS's, if someone wants to send a message
> between processes then it must be copied from P space
> into S space, the process switched, then copied back.
> Pointers in S space are valid in all process context,
> pointers in P space are valid only in their own.

Yes, the scheme does require copying. I don't see why anyone would want
to add *more* overhead, though. Seems to me (and I might have overlooked
something here; please correct me) that NO copying is better than SOME
copying, for arbitrary values of "some".


Re: Bounded pointers Ivan Godard 11/10/13 3:38 PM
DMA with virtual addresses.

Re: Bounded pointers Ivan Godard 11/10/13 3:42 PM
Yes, there's a ton of code that needs to be ripped out as part of the
port. We are told that there is still a fair amount of code in support
of microkernel in most of the relevant code bases (-nix anyway; I have
no idea about Windows), going back to the black cube. The L4 guys didn't
seem to have much trouble porting Linux, anyway.
(https://en.wikipedia.org/wiki/L4_microkernel)
Re: Bounded pointers Ivan Godard 11/10/13 3:50 PM
The Mill has a MMU already. Well, like everything else its presence is
configurable at tape-out, but I can't imagine a Mill member being sold
without one.

The MMU (and TLB) sits between the cache and DRAM - see the third talk
(ootbcomp.com/docs/memory), which has several slides that show this
clearly. I encourage you to take the time to actually see the
architecture; simply scrolling through the slides will cause unwarranted
assumptions.

VM support is trivial on the Mill: an ASID register that is prepended to
the address in memory references, and suitably expanded entries in the
tag array in the caches. There are no privileged operations, so there's
nothing for the hypervisor to have to trap on.
Re: Bounded pointers Andy Valencia 11/10/13 5:15 PM
Ivan Godard <iv...@ootbcomp.com> writes:
> Yes, there's a ton of code that needs to be ripped out as part of the
> port. We are told that there is still a fair amount of code in support
> of microkernel in most of the relevant code bases (-nix anyway; I have
> no idea about Windows), going back to the black cube. The L4 guys didn't
> seem to have much trouble porting Linux, anyway.

Part of me is worried about how much you might be biting off.  The other part
wishes I was free to jump in and have a go at it! :-)

Andy Valencia
Home page: http://www.vsta.org/andy/
To contact me: http://www.vsta.org/contact/andy.html
Re: Bounded pointers Ivan Godard 11/10/13 5:19 PM
On 11/10/2013 5:15 PM, Andy Valencia wrote:
> Ivan Godard <iv...@ootbcomp.com> writes:
>> Yes, there's a ton of code that needs to be ripped out as part of the
>> port. We are told that there is still a fair amount of code in support
>> of microkernel in most of the relevant code bases (-nix anyway; I have
>> no idea about Windows), going back to the black cube. The L4 guys didn't
>> seem to have much trouble porting Linux, anyway.
>
> Part of me is worried about how much you might be biting off.  The other part
> wishes I was free to jump in and have a go at it! :-)

You are as free as you want to be.

Ivan


Re: Bounded pointers Stephen Fuld 11/10/13 6:10 PM
On 11/10/2013 3:37 PM, Ivan Godard wrote:
>
> Why bother? Why not just give each process all the address space it
> wants, disjoint from that given to all the other processes, and call the
> OS a process just like all the others?

This is related to the issue I brought up some time ago.  The direct
answer to your question is that someone has to be the "giver" of address
spaces, and there can be only one process that is allowed to do that.

I can see mechanisms like at hardware reset, everything is enabled and
the first process to set some register is the giver and the hardware
prevents that register from being overwritten.

Or perhaps there is a process-id that is hardwired to be allowed to
control address spaces allocation, and at boot time, the OS takes that
process-id.

There may be other mechanisms, but there has to be something to make the
address space allocation "protected".
Re: Bounded pointers Ivan Godard 11/10/13 8:08 PM
You know, there's a PhD for somebody in studying the political
assumptions of operating systems structure. There's long been a truism
(somebody here is sure to remember the source that I can't) that IT
projects tend to structurally mirror the organizations that create them.
Do OS models mimic the national governments similarly?

Monolithic OS models tend to be autocratic: there is one (repeat *one*)
entity that can hand out things. You have to make a pilgrimage to the
Authority (i.e. take an OS trap - something interesting in that choice
of words, too), and receive your blessing from the benignly smiling font
of all Good Things.

An OS of this model can be built in a Mill, but it's not required.
Natively (another interesting word) anyone can give address space to
anyone else; no problem. The hardware itself gives address space to the
first asker, as part of the boot sequence. In fact, it gives *all* the
address space to that first visitor. Kind of as if Yahveh handed out
apples as door prizes. Thereafter each may in turn give of that which he
has received. Rather a Quaker approach, really, and one hopes not too
Utopian.

I don't wish you, Stephen, to think that I'm calling you a Fascist; I'm
not. But it is interesting how the implicit assumptions of Fascist
operating systems have so pervaded our world view that they unthinkingly
seem to be the Only Right Way.

In a more technical form: address space allocation is an unprivileged
operation called "grant". Protection (and how we expect it to be used in
OSs) is the subject of a future talk.

Ivan
Re: Bounded pointers Bakul Shah 11/10/13 11:35 PM
On 11/10/13, 3:50 PM, Ivan Godard wrote:
> On 11/9/2013 10:40 AM, Bakul Shah wrote:
>> On 11/8/13, 12:27 PM, Ivan Godard wrote:
>>> It's SAS; you do not flush the caches at all. The lines stay there, in
>>> the cache (until they LRU and
>>> get evicted of course). Protection is *in front* of the cache, so
>>> other processes cannot address
>>> those lines, even though they are present in the cache.
>>
>> If you *had* to put in a MMU what would the Mill arch. lose?
>> And if you had to provide virtual machine support? These days
>> even smart phones sport multiple virtual address spaces!
>
> The Mill has a MMU already. Well, like everything else its presence is configurable at tape-out, but
> I can't imagine a Mill member being sold without one.

I am still confused. Let me ask once more, hopefully more
clearly this time. Can the present Mill architecture provide
multiple virtual address spaces *efficiently*?  If not, what
would be the cost of providing this? Thanks.

> The MMU (and TLB) sits between the cache and DRAM - see the third talk (ootbcomp.com/docs/memory),
> which has several slides that show this clearly. I encourage you to take the time to actually see
> the architecture; simply scrolling through the slides will cause unwarranted assumptions.

There is no mention of MMU in your slides. You mention
virtual addresses and physical addresses but you also say
Single Address Space, which confuses me.

You say "pointers can be passed to OS or other tasks without
translation". To me that says at least some tasks run in the
same address space (i.e. the current definition of threads).
But your SAS comment leads me to believe all of them run in
the same address space.

> VM support is trivial on the Mill: an ASID register that is prepended to the address in memory
> references, and suitably expanded entries in the tag array in the caches. There are no privileged
> operations, so there's nothing for the hypervisor to have to trap on.

If you implemented capabilities in some fashion wouldn't this
fall out naturally? In a Cap. arch. *everything* is privileged
:-)

Re: Bounded pointers Bakul Shah 11/11/13 12:08 AM
On 11/10/13, 11:35 PM, Bakul Shah wrote:
> On 11/10/13, 3:50 PM, Ivan Godard wrote:
>> On 11/9/2013 10:40 AM, Bakul Shah wrote:
>>> On 11/8/13, 12:27 PM, Ivan Godard wrote:
>>>> It's SAS; you do not flush the caches at all. The lines stay there, in
>>>> the cache (until they LRU and
>>>> get evicted of course). Protection is *in front* of the cache, so
>>>> other processes cannot address
>>>> those lines, even though they are present in the cache.
>>>
>>> If you *had* to put in a MMU what would the Mill arch. lose?
>>> And if you had to provide virtual machine support? These days
>>> even smart phones sport multiple virtual address spaces!
>>
>> The Mill has a MMU already. Well, like everything else its presence is configurable at tape-out, but
>> I can't imagine a Mill member being sold without one.
>
> I am still confused. Let me ask once more, hopefully more
> clearly this time. Can the present Mill architecture provide
> multiple virtual address spaces *efficiently*?  If not, what
> would be the cost of providing this? Thanks.
>
>> The MMU (and TLB) sits between the cache and DRAM - see the third talk (ootbcomp.com/docs/memory),
>> which has several slides that show this clearly. I encourage you to take the time to actually see
>> the architecture; simply scrolling through the slides will cause unwarranted assumptions.
>
> There is no mention of MMU in your slides. You mention
> virtual addresses and physical addresses but you also say
> Single Address Space, which confuses me.

On re-reading it appears what you're calling TLB is
what I would call a MMU. TLB == a cache for PTEs
(CAM or N way direct map). You still need some sort
of pagetable to organize these PTEs.

>
> You say "pointers can be passed to OS or other tasks without
> translation". To me that says at least some tasks run in the
> same address space (i.e. the current definition of threads).
> But your SAS comment leads me to believe all of them run in
> the same address space.
>
>> VM support is trivial on the Mill: an ASID register that is prepended to the address in memory
>> references, and suitably expanded entries in the tag array in the caches. There are no privileged
>> operations, so there's nothing for the hypervisor to have to trap on.

Since there is no privileged mode, how are you going to switch
ASID?  Or can anyone change ASID? I suppose you'll have to do
some sort of capabilities implementation.... then presumably
the very first process (after reboot) can distribute keys as
needed. Then splitting out protection checking from TLBs also
makes perfect sense.

Looking forward to your protection talk!
Re: Bounded pointers Ivan Godard 11/11/13 12:31 AM
On 11/10/2013 11:35 PM, Bakul Shah wrote:
> On 11/10/13, 3:50 PM, Ivan Godard wrote:
>> On 11/9/2013 10:40 AM, Bakul Shah wrote:
>>> On 11/8/13, 12:27 PM, Ivan Godard wrote:
>>>> It's SAS; you do not flush the caches at all. The lines stay there, in
>>>> the cache (until they LRU and
>>>> get evicted of course). Protection is *in front* of the cache, so
>>>> other processes cannot address
>>>> those lines, even though they are present in the cache.
>>>
>>> If you *had* to put in a MMU what would the Mill arch. lose?
>>> And if you had to provide virtual machine support? These days
>>> even smart phones sport multiple virtual address spaces!
>>
>> The Mill has a MMU already. Well, like everything else its presence is
>> configurable at tape-out, but
>> I can't imagine a Mill member being sold without one.
>
> I am still confused. Let me ask once more, hopefully more
> clearly this time. Can the present Mill architecture provide
> multiple virtual address spaces *efficiently*?  If not, what
> would be the cost of providing this? Thanks.

No, the Mill has only one virtual address space; that's the definition
of a Single Address Space architecture (SAS). It's not a matter of
efficiency; there is only one, period. One of the other posters
suggested flushing the cache so it would be possible to alias the
virtual space, and - maybe - that could be done, but that is an overlay
in a single space, not multiple spaces.

>> The MMU (and TLB) sits between the cache and DRAM - see the third talk
>> (ootbcomp.com/docs/memory),
>> which has several slides that show this clearly. I encourage you to
>> take the time to actually see
>> the architecture; simply scrolling through the slides will cause
>> unwarranted assumptions.
>
> There is no mention of MMU in your slides. You mention
> virtual addresses and physical addresses but you also say
> Single Address Space, which confuses me.

There is a single virtual space. There is a single physical space. The
virtual space (60 bits) is much bigger than the physical space (member
dependent, perhaps 48 bits). As always on any machine, the hardware and
mapping tables that translates from virtual to physical addresses are
called an MMU, of which the TLB (shown in the slides) is a cache. The
Mill has a quite normal MMU/TLB, other than three differences: where it
appears in the memory reference stream (after the cache rather than
before it); how it responds to an unmapped reference (return a zero on a
load; allocate a one-line-page on evict); and having one additional
supported page size (one line). Otherwise it maps virtual to physical
(or to disk pages) quite ordinarily.
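The three listed differences can be sketched as a toy model of the Mill's "backless" memory behavior. The class, the 64-byte line size, and the dict-as-page-table are my assumptions for illustration, not the actual implementation:

```python
LINE = 64  # assumed cache-line size; the one-line page size is member-dependent

class BacklessMemory:
    """Toy model: loads of unmapped addresses return zero without
    allocating; only an evict forces allocation of a one-line page."""

    def __init__(self):
        self.pages = {}  # line number -> backing bytes, allocated lazily

    def load(self, addr):
        # Unmapped reference on a load: the MMU returns a zero directly,
        # never allocating physical memory for the line.
        line = addr // LINE
        return self.pages.get(line, bytes(LINE))[addr % LINE]

    def evict(self, addr, data):
        # Evicting a dirty line with no backing allocates a one-line page.
        assert len(data) == LINE
        self.pages[addr // LINE] = data
```

The point of the sketch is the asymmetry: reads of never-written memory cost nothing in DRAM, and allocation happens only at evict time.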

> You say "pointers can be passed to OS or other tasks without
> translation". To me that says at least some tasks run in the
> same address space (i.e. the current definition of threads).
> But your SAS comment leads me to believe all of them run in
> the same address space.

All of them run in the same space; that's what SAS means. They do not
necessarily possess the same rights in that space. The bit pattern of an
address means the same memory byte to all, but only some can use that
bit pattern in a load or store to actually reach and use that byte.

>> VM support is trivial on the Mill: an ASID register that is prepended
>> to the address in memory
>> references, and suitably expanded entries in the tag array in the
>> caches. There are no privileged
>> operations, so there's nothing for the hypervisor to have to trap on.
>
> If you implemented capabilities in some fashion wouldn't this
> fall out naturally? In a Cap. arch. *everything* is privileged

The Mill uses grant-based security, not capabilities. I wish it were
otherwise, but I know how to sell grant security and I don't know how to
sell caps.

Mill grants have byte granularity. Grants are a limited subset of caps.
They do address protection, but necessarily on a per-turf basis rather
than per object; it is not possible to attach rights to an object, only
to the user of that object. Grants and caps have different propagation
rules. Caps tend to have a much richer notion of rights than grants do.
Other differences.
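A minimal sketch of what byte-granularity, per-turf grant checking might look like. The record shape, field names, and rights encoding here are my assumptions for illustration; the actual Mill grant format had not been published at this point:

```python
from collections import namedtuple

# Hypothetical grant record: a turf receives rights over a byte range.
# Rights are attached to the user (turf), not to the object, per the post.
Grant = namedtuple("Grant", "turf base length rights")  # rights e.g. {"r", "w"}

def check_access(grants, turf, addr, right):
    """True iff some grant gives this turf the given right at this byte."""
    return any(g.turf == turf
               and g.base <= addr < g.base + g.length
               and right in g.rights
               for g in grants)
```

Note that because the check is keyed by turf, two turfs holding grants over the same bytes need two grant records; there is no per-object rights record to share, which is one of the differences from capabilities mentioned above.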

Ivan
Re: Bounded pointers Michael S 11/11/13 12:57 AM
On Monday, November 11, 2013 12:21:05 AM UTC+2, Megol wrote:
> On Saturday, November 9, 2013 5:48:48 PM UTC+1, Michael S wrote:
>
>
> > Sorry, Stephen, but your post sounds like you don't know enough about strong and weak points of today's NAND flash technology. In particular, it seems that you don't realize that block nature of the read access is just as fundamental as as block write access, and that desirable block size is actually bigger than with rotating HDs. I recommend to educate yourself a little bit.
>
>
>
> Larger block sizes than ~2MiB that's the minimum for a HDD to reach peak read/write rate? I find that hard to believe (haven't directly touched modern NAND Flash though).

No, by "block" I didn't mean "smallest unit accessible at peak rate", i.e. in the case of rotating HDs what people used to call a "track". Rather, I meant "smallest individually addressable unit", what people used to call a "sector" and which was traditionally set to 512B.


Re: Bounded pointers Ivan Godard 11/11/13 1:05 AM
In the terminology I am familiar with (terms often vary in usage) the
TLB is a cache in the MMU. That is, the TLB is a hardware CAM that acts
like a hash table, each of whose entries is keyed by virtual address
and carries a physical address, flags, and other info. The TLB is a cache
over a much larger memory table, typically with a tree-like organization.

The MMU is the computational hardware that takes a virtual address and a
table entry from the TLB and tries to produce a physical address.
The MMU is tasked with walking the table and updating the TLB if the
keyed entry is not found in the TLB; alternatively it may trigger a trap
to software. On the Mill it has additional duties: returning data values
(zeros) directly on backless loads, and allocating pages on backless
evicts. It will also detect that an address has been paged out and
signal software to bring the page in from disk or other backing store,
and may be tasked with detecting access to invalid addresses if that is
not done by a protection unit.
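That division of labor (TLB as a cache, MMU as the table walker that fills it) can be modeled roughly as follows. This is a sketch under my own simplifications: the eviction policy is crude and the trap-to-software and backless paths are omitted:

```python
class MMU:
    """Toy model: the TLB is a small cache over a larger page table;
    a miss triggers a table walk and a TLB fill."""

    def __init__(self, page_table, tlb_size=4):
        self.page_table = page_table  # virtual page -> physical page
        self.tlb = {}                 # the CAM, modeled as a dict
        self.tlb_size = tlb_size
        self.walks = 0                # counts table walks (TLB misses)

    def translate(self, vpage):
        if vpage in self.tlb:         # TLB hit: no walk needed
            return self.tlb[vpage]
        self.walks += 1               # miss: walk the table, fill the TLB
        ppage = self.page_table[vpage]  # a real walk may instead trap to software
        if len(self.tlb) >= self.tlb_size:
            self.tlb.pop(next(iter(self.tlb)))  # crude eviction
        self.tlb[vpage] = ppage
        return ppage
```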

>> You say "pointers can be passed to OS or other tasks without
>> translation". To me that says at least some tasks run in the
>> same address space (i.e. the current definition of threads).
>> But your SAS comment leads me to believe all of them run in
>> the same address space.
>>
>>> VM support is trivial on the Mill: an ASID register that is prepended
>>> to the address in memory
>>> references, and suitably expanded entries in the tag array in the
>>> caches. There are no privileged
>>> operations, so there's nothing for the hypervisor to have to trap on.
>
> Since there is no privileged mode, how are you going to switch
> ASID?  Or can anyone change ASID? I suppose you'll have to do
> some sort of capabilities implementation.... then presumably
> the very first process (after reboot) can distribute keys as
> needed. Then splitting out protection checking from TLBs also
> makes perfect sense.

If the machine is not virtualized then there is no ASID because there is
only one address space, 60 bits in size.

If the machine is virtualized then the hypervisor has a single address
space which is larger than the single address space of each client. For
ease of implementation, the clients' address spaces together are laid out
to tile the hypervisor space, so each client space can be represented by
a small index into the array of client spaces. The index may be thought
of as an ASID, but it is an index with numeric meaning, not an arbitrary
id. The client address space is 60 bits; with 256 possible concurrent
clients the hypervisor space would be 68 bits.

The hardware maintains a register holding the current index, and the
index is prepended to any address computed by load and store operations.
The concatenated index and load/store address form an address in the
hypervisor space. Caches and MMU/TLB key with the hypervisor address,
not the smaller client load/store address.
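The address formation described here is a plain concatenation. As a sketch (the 60-bit client space and 8-bit index are from the text; the function name is mine):

```python
CLIENT_BITS = 60   # each client sees a 60-bit single address space
INDEX_BITS = 8     # 256 concurrent clients -> 68-bit hypervisor space

def hypervisor_address(client_index: int, client_addr: int) -> int:
    """Prepend the current client index to a 60-bit load/store address,
    yielding the hypervisor-space address used to key caches and TLB."""
    assert 0 <= client_index < (1 << INDEX_BITS)
    assert 0 <= client_addr < (1 << CLIENT_BITS)
    return (client_index << CLIENT_BITS) | client_addr
```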

Hence, the changes from an unvirtualized member to a virtualized one
require 1) deciding the number of supported clients (and hence the size
of the client index); 2) extending the key array of the caches, PLB, and
TLB to support the wider hypervisor addresses; 3) minor changes to the
IPC machinery (not filed yet) to permit communication and data sharing
between clients; 4) porting hypervisor software.

Currently there's no component defined to deliver a software interrupt,
but we will have to add a HEYU when we start worrying about multicore
and the hypervisor can use that for interrupt relaying. Need to get
single-core working first :-)


Re: Bounded pointers Stephen Fuld 11/11/13 1:09 AM
There it is.  As I stated above, there has to be some mechanism for this
and you just said what it is.  Once you have that, the rest of what you
indicated can follow.


> In fact, it gives *all* the
> address space to that first visitor.

Sure.


> Kind of as if Yahveh handed out
> apples as door rises. Thereafter each may in turn give of that which he
> has received. Rather a Quaker approach, really, and one hopes not too
> Utopian.

I understand.  My post was just to indicate that there must be someone
first.

> I don't wish you, Stephen, to think that I'm calling you a Fascist; I'm
> not. But it  is interesting how the implicit assumptions of Fascist
> operating systems have so pervaded our world view that they unthinkingly
> seem the be the Only Right Way.

I didn't think you were calling me a fascist.  :-)  And I am familiar
with several OSs that do things differently, including the Burroughs
large scale MCP (which also has no hardware privilege).


> In a more technical form: address space allocation is an unprivileged
> operation called "grant". Protection (and how we expect it to be use in
> OSs) is the subject of a future talk.

Fair enough.  Once you get to the first step, all the rest can follow.




>
> Ivan
Re: Bounded pointers Paul A. Clayton 11/11/13 8:50 AM
On Monday, November 11, 2013 4:05:44 AM UTC-5, Ivan Godard wrote:
[snip]
> If the machine is not virtualized then there is no ASID because there is
> only one address space, 60 bits in size.

Is the virtual address size defined as always 60 bits
for all family members?

If not, it might be interesting to use something like
the mechanism used for Stanford MIPS where upper bits
of the virtual address were masked out (I think it was
a simple mask, but there might have been a zero check.)
to provide bits for an ASID. Using such a mechanism
would allow more small virtual machines.

> If the machine  is virtualized then the hypervisor has a single address
> space which is larger than the single address spaces of each client. For
> ease of implementation, the clients address spaces together are laid out
> to tile the hypervisor space, so each client space can be represented by
> a small index into the array of client spaces. The index may be thought
> of as an ASID, but it is an index with numeric meaning, not an arbitrary
> id. The client address space is 60 bits; with 256 possible concurrent
> clients the hypervisor space would be 68 bits.

With a virtually tagged cache, virtual machines would not
be able to exploit caching of deduplicated memory. (From
what I understand, deduplication is mainly of benefit for
reducing *memory* footprint, but I suspect that for some
workloads sharing caching can provide a measurable benefit.)

(The Mill removes the easy deduplication opportunity of
zero pages, but some code pages and perhaps some data pages
could also be deduplicated.)

On the positive side, it seems that it might be possible
for a virtualization-capable implementation to avoid
nested (or the worse shadowed) page tables. Defining
virtualization properly at the start could avoid compatibility
issues (and virtualization holes) later. If all virtual
machine systems use paravirtualization (i.e., at least a
minimal awareness of the possible existence and manipulations
of a hypervisor layer)--as would seem practical given the
lack of legacy OS support needed--, then some issues might
be avoided. (On the other hand, such could make porting
existing OSes more involved.)
Re: Bounded pointers Ivan Godard 11/11/13 9:25 AM
On 11/11/2013 8:50 AM, Paul A. Clayton wrote:
> On Monday, November 11, 2013 4:05:44 AM UTC-5, Ivan Godard wrote:
> [snip]
>> If the machine is not virtualized then there is no ASID because there is
>> only one address space, 60 bits in size.
>
> Is the virtual address size defined as always 60 bits
> for all family members?
>
> If not, it might be interesting to use something like
> the mechanism used for Stanford MIPS where upper bits
> of the virtual address were masked out (I think it was
> a simple mask, but there might have been a zero check.)
> to provide bits for an ASID. Using such a mechanism
> would allow more small virtual machines.

Data representation is uniform across all family members, so 60 bits for
all. This requirement is for support of heterogeneous multi-core.

>> If the machine  is virtualized then the hypervisor has a single address
>> space which is larger than the single address spaces of each client. For
>> ease of implementation, the clients address spaces together are laid out
>> to tile the hypervisor space, so each client space can be represented by
>> a small index into the array of client spaces. The index may be thought
>> of as an ASID, but it is an index with numeric meaning, not an arbitrary
>> id. The client address space is 60 bits; with 256 possible concurrent
>> clients the hypervisor space would be 68 bits.
>
> With a virtually tagged cache, virtual machines would not
> be able to exploit caching of deduplicated memory. (From
> what I understand, deduplication is mainly of benefit for
> reducing *memory* footprint, but I suspect that for some
> workloads sharing caching can provide a measurable benefit.)
>
> (The Mill removes the easy deduplication opportunity of
> zero pages, but some code pages and perhaps some data pages
> could also be deduplicated.)

Deduplication of the file system and of physical pages works the same way
on a Mill as on any other architecture. Deduplication of virtual memory
is possible if the memory is known to be read-only - four copies of a
guest OS code page can be deduplicated to one shared copy, so long as
the page was placed at the same virtual address in each guest. However,
to take advantage of this the guest would need to know that it is a
guest. That might be complicated for the VM, so more likely the caches
would not be deduplicated. We have put almost no thought into the issue,
and so this answer may be obviated by future Mill development.

> On the positive side, it seems that it might be possible
> for a virtualization-capable implementation to avoid
> nested (or the worse shadowed) page tables. Defining
> virtualization properly at the start could avoid compatibility
> issues (and virtualization holes) later. If all virtual
> machine systems use paravirtualization (i.e., at least a
> minimal awareness of the possible existence and manipulations
> of a hypervisor layer)--as would seem practical given the
> lack of legacy OS support needed--, then some issues might
> be avoided. (On the other hand, such could make porting
> existing OSes more involved.)
>

Exactly. We have done some thinking about this part of virtualization,
and have concluded that the tables under the PLB can be shared
throughout, with each guest thinking he owns his subtree and there is no
hypervisor. The tables under the TLB are more difficult to make clean,
and paravirtualization might be the way to go. We have also discussed
using a two-step translation in hardware; that is not very painful on a
Mill given the location of translation.

Mostly what we need is to extend our sim for virtualization, port an OS
kernel, and play with it; one hopes with someone on the team who knows
where the VM bodies are buried. I suspect we can come up with something
clever, given that we have no legacy constraints.
Re: Bounded pointers Megol 11/13/13 5:22 AM
Then I don't understand your point. Current HDDs (at least as subset of them) have 4kiB sectors with the option to emulate 512 byte sectors by splitting + RMW in the background.
I had problems finding NAND datasheets but my impression is that a standard page is 4kiB+ECC which would make NAND superior to HDD in all ways except write/erase cycles.

Could still be missing something here.
Re: Bounded pointers Michael S 11/13/13 1:35 PM
I think you're missing nothing except context.

My point is not so much about NAND-based mass storage having *bigger* sectors than rotating HDs as about NAND-based mass storage not having *smaller* sectors (although I do think that high-throughput NAND SSDs have bigger pages internally than the 4KB you mentioned; IMHO they have to, because their way to achieve high throughput is by striping accesses across multiple individual flash devices that are relatively slow throughput-wise).

So, I claim that mapping NAND flashes directly into address space of app. processor (a.k.a. CPU) would be at least as counterproductive as direct mapping of sectors of rotating HD.

I claim that direct mapping of this sort becomes potentially profitable (although not necessarily so, there are other, equally important, prerequisites) only when individual "random" access at the physical level is not much bigger than a single cache line.
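To make the granularity argument above concrete, here is a minimal sketch (my arithmetic, not from the post) of the read amplification when a CPU cache-line fetch has to pull in a whole NAND page. The 64-byte line and 4 KiB page are assumed typical values:

```python
# Illustrative arithmetic: read amplification when a cache-line-granular
# CPU access must transfer a whole NAND flash page.

CACHE_LINE = 64    # bytes per CPU cache line (typical, assumed)
NAND_PAGE = 4096   # bytes per NAND page, the smallest readable unit (assumed)

def read_amplification(line_bytes, page_bytes):
    """Bytes transferred from flash per byte the CPU actually wanted."""
    return page_bytes / line_bytes

amp = read_amplification(CACHE_LINE, NAND_PAGE)
print(amp)  # 64.0: a random cache-line access wastes 63/64 of the transfer
```

With a 64x amplification per random access, treating raw NAND as ordinary RAM behind the cache hierarchy is hard to justify, which is the point being made.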
Re: Bounded pointers Stephen Fuld 11/14/13 10:07 AM
On 11/9/2013 8:48 AM, Michael S wrote:
> On Friday, November 8, 2013 10:54:55 PM UTC+2, Stephen Fuld wrote:
>> On 11/8/2013 4:17 AM, Michael S wrote:
>>
>>> On Friday, November 8, 2013 4:25:01 AM UTC+2, Andy (Super) Glew wrote:
>>
>> snip
>>
>>>> Anyway, something here is relevant because the on-march of NVRAM
>>>> (Non-Volatile Memory) such as flash makes it very, very, likely that all
>>>> storage on a device like a cellphone or tablet *could* be memory mapped
>>>> at all times. Which would probably have performance advantages.
>>>
>>
>>> I don't see why there will be performance advantage.
>>
>> I see potential for substantial performance improvements, especially as
>> with the type of systems we are discussing, you can change the hardware.
>>
>>
>
>> snip
>
>
> Sorry, Stephen, but your post sounds like you don't know enough about strong and weak points of today's
> NAND flash technology. In particular, it seems that you don't realize that the block nature of the read
> access is just as fundamental as block write access, and that the desirable block size is actually
> bigger than with rotating HDs. I recommend to educate yourself a little bit.


Thanks for the feedback. I did take the time to read a modern flash data
sheet.  I did realize that they were block read devices, but I hadn't
realized that the blocks were as large as they are in current
technology.  While that modifies what I wrote earlier, I think it is
still fundamentally sound.  However, based on your post below, I seem to
have left the wrong impression.

Putting the flash memory on the memory bus as opposed to some peripheral
bus is an independent decision from how the software (primarily the OS)
treats it.  Specifically, you could still have the flash directly
attached to the CPU via the memory bus, but still have the hardware and
software treat it as a block device.  This is pretty much what IBM did
with their extended memory on S/370, where they treated what was
physically RAM as a 4Kbyte block addressable storage.

The performance (and perhaps power) advantage of the memory bus is the
elimination of the overhead of the translation to say SATA commands,
then translation back to memory commands for the flash, and elimination
of the serialization/deserialization overhead and potentially faster
transfers through the wider bus.

You would still treat the flash logically as a block memory, not a RAM,
though I do note that there is software available to treat it sort of
like RAM, hiding the implementation of a log structured file system
beneath standard interfaces such as malloc.
Re: Bounded pointers Anton Ertl 11/15/13 5:35 AM
Michael S <already...@yahoo.com> writes:
>Sorry, Stephen, but your post sounds like you don't know enough about strong
>and weak points of today's NAND flash technology. In particular, it seems
>that you don't realize that the block nature of the read access is just as
>fundamental as block write access, and that the desirable block size is
>actually bigger than with rotating HDs.

How big is the desirable block size for flash?  For hard disks it's
about 1-2MB these days (based on the assumption that you want to get
half of the maximum bandwidth).
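The "half of the maximum bandwidth" figure can be checked with back-of-envelope arithmetic: effective throughput reaches half the streaming rate when transfer time equals the average access (seek + rotation) time, i.e. block_size = bandwidth x access_time. A minimal sketch, with assumed typical-of-the-era figures rather than numbers from the post:

```python
# Back-of-envelope check: at what block size does a disk's effective
# throughput reach half of its sustained streaming bandwidth?
# effective = B / (t_access + B / bw) equals bw / 2 exactly when
# B / bw == t_access, i.e. B = bw * t_access.

def half_bandwidth_block(bandwidth_bytes_per_s, access_time_s):
    """Block size at which effective throughput is bandwidth / 2."""
    return bandwidth_bytes_per_s * access_time_s

bw = 150e6        # ~150 MB/s sustained rate (assumed typical HDD figure)
t_access = 0.010  # ~10 ms average seek + rotational latency (assumed)

block = half_bandwidth_block(bw, t_access)
print(block / 1e6)  # 1.5 (MB), consistent with the 1-2 MB range cited
```

The same formula applied to flash, with microsecond-scale access times, gives a far smaller break-even block size, which is why the comparison matters here.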

- anton
--
M. Anton Ertl                    Some things have to be seen to be believed
an...@mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html
Re: Bounded pointers William Clodius 11/15/13 8:55 PM
Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:

> Michael S <already...@yahoo.com> writes:
> >Sorry, Stephen, but your post sounds like you don't know enough about stron=
> >g and weak points of today's NAND flash technology. In particular, it seems=
> > that you don't realize that block nature of the read access is just as fun=
> >damental as as block write access, and that desirable block size is actuall=
> >y bigger than with rotating HDs.
>
> How big is the desirable block size for flash?  For hard disks it's
> about 1-2MB these days (based on the assumption that you want to get
> half of the maximum bandwidth).
>
> - anton
I'm not a hardware guy, but I would have thought that several (related)
blocks could be requested within one seek time, and that blocks that are
related would be written on the same or neighboring tracks with
appropriate heuristics for data placement. In that case block size
would not be the limiting factor in throughput, and smaller block
sizes could be used to minimize disk wastage for smaller files, e.g.,
text and email messages.