On 7/31/2013 9:32 PM, Ivan Godard wrote:
> On 7/28/2013 4:54 PM, Paul A. Clayton wrote:
>> Some links to information on the Intel website are
>> available here:
>>
http://software.intel.com/en-us/intel-isa-extensions#pid-16080-1495
>>
>> I have not looked at this yet, but the description I looked over
>> (
http://www.realworldtech.com/forum/?threadid=135174&curpostid=135274 )
>> seemed to indicate that the base/bound values can either be
>> directly loaded or be loaded from a general table (allowing
>> one to avoid the need for fat pointers).
>>
>> From my extremely vague impression, this might be similar
>> to a technique referenced by Andy Glew called HardBound
>> (Devietti et al., 2008, "HardBound: Architectural Support
>> for Spatial Safety of the C Programming Language"--I have not
>> read this paper either!).
>>
>> Sorry not to provide any real content on this, but since
>> no one else has posted this news here I thought that it
>> would be better to submit a low-content post quickly.
>>
>
By the way, the quick reference to the detailed description is
http://download-software.intel.com/sites/default/files/319433-015.pdf
I suspect - and am happy, but also embarrassed and concerned, to say -
that MPX is probably the emasculated remnant of my last major project at
Intel.
By the way, I must emphasize that I am not trying to claim credit for
MPX. I left what may have been a related project circa 2009, and left
Intel completely a few months later for the second and final time. The
guys who brought MPX to the point where Intel can announce it certainly
deserve credit for years of hard work. Thomas Edison said invention is
1% inspiration and 99% perspiration: I worked on the 1% inspiration
across many years, dating back to when I was at Gould in the late 1980s;
although I think that the 3-4 full time years I spent on it at Intel
counts as perspiration, my time was undoubtedly only a small fraction of
the work spent on MPX. Plus, my work may not even have fed into MPX at all.
Suspect: the people blogging about MPX are people I worked with, who
were co-inventors, e.g. on US patent application 20110078389 MANAGING
AND IMPLEMENTING METADATA IN CENTRAL PROCESSING UNIT USING REGISTER
EXTENSIONS.
(Example of an Intel blog about MPX:
http://software.intel.com/en-us/blogs/2013/07/23/intel-memory-protection-extensions-intel-mpx-design-considerations.)
Happy: It's nice to see some part of that work be made publicly visible.
It's also nice to see Intel providing some support for security, even
if it is quite limited.
Embarrassed and concerned: MPX, as announced, runs the risk of being
like the old Intel BOUND instruction - not very useful. It can be
emulated with other instructions, so it is only a performance tweak, and
I would be surprised if it is a sizable performance tweak.
Embarrassed: having separate instructions to check lower bounds and
upper bounds looks very much like the famous example of the VAX INDEX
instruction:
http://people.cs.umass.edu/~emery/classes/cmpsci691st/readings/Arch/RISC-patterson.pdf.
Instead of doing A <= B <= C, do
unsigned B-A <= C-A. Since C-A can be pre-calculated, I can only
imagine that Intel did not want to create another instruction that used
a 3-input adder, as would be necessary to calculate
precalc(C-A) + -A + B
and do the proper checks.
Emasculated: (a) MPX is not an ISA that could ever be extended into
capability security, even though it is not being sold as that right now.
Basically, it doesn't do the checks as part of the memory access. (b)
Excessive overhead, e.g. separate lower and upper bound checks.
Concerned: my main concern is that MPX might give bounds checking such
a bad name that this avenue of ISA evolution may be cut off.
I hope that MPX will be good enough to demonstrate customer demand that
will justify further improvements. I am worried that it will not be good
enough.
> I'm somewhat surprised that Intel found this to be worth doing, but the
> customers must want it.
I am also surprised that Intel found the present form of MPX worth
doing. I would have expected the instruction overhead to be too high.
However, it seems that Intel may have made a software-only
implementation of this available
http://software.intel.com/sites/products/parallelmag/singlearticles/issue11/7080_2_IN_ParallelMag_Issue11_Pointer_Checker.pdf
I can only conjecture that this relatively minor instruction set
extension in MPX provides enough incremental performance improvement to
be justified. And this is why I have hope: perhaps a succession of such
small evolutionary steps will lead to an MPX 2.0 that would not be
embarrassing.
Another possibility: as you know, Intel recently changed CEOs, replacing
Paul Otellini. Andy Grove's favorite, Renee James, is now #2 in a
two-in-a-box arrangement at the top, as president, alongside CEO Brian
Krzanich.
James may be in search of a success to justify, prove, establish herself
as worthy of being #1. In particular, most of James' experience is in
managing Intel's SW group, including the acquisition of McAfee for 7.68
billion dollars. She probably needs to stamp her mark on a chip,
hardware rather than software, to establish credibility. Getting an
instruction set extension like MPX into a chip would be one such way -
even if it is really not much hardware.
However, this worries me: we may have seen something similar happen and
not turn out well. Pat Gelsinger IMHO pushed Larrabee prematurely from a
prototype into what was supposed to be a product. Larrabee and Pat both
crashed and burned. I worry that James may be pushing MPX into a
product - perhaps not prematurely (since I happen to know that stuff
like MPX had had several billion times more instructions simulated than
Larrabee), but with insufficient hardware support. I.e. MPX may have
been forced to make excessive compromises in order to fit into whatever
hardware budget it has been given for first ship.
> It is only usable with compiler co-operation, so
> it's not a cap but a debugging aid.
Any capability system requires memory allocator cooperation, e.g. to
create the bounds that are to be checked as part of the capability.
Compiler cooperation is required if you are to create new, tighter
bounds every time you take an address, e.g. of a field in a structure.
But, yes, MPX is a debugging aid only because the compiler may choose to
generate bounds checking code or not. I.e. bounds checking is at the
discretion of the compiler, code generator, or programmer choosing
compiler options. The fact that the bounds check instruction is
decoupled from, and independent of, the memory reference makes it hard
to do mandatory security.
This is the thing that I find most embarrassing. MPX, as announced,
cannot be used as the basis for a real capability security architecture.
On the other hand, Ivan, we members of the cult of capability security
are a very small minority. In many, perhaps most, subcultures of
computer architecture, not being a capability architecture would be a
good thing.
The really embarrassing and concerning thing is that there is a lot of
instruction overhead to use MPX. Possibly less overhead than if similar
bounds checking were generated with existing instructions - but I would
be interested to see how much.
> Because the table is keyed by pointer location rather than by pointer
value it requires the compiler
> to track where a given pointer came from, which seems buggy to me.
Here I must correct you. (Actually, this whole post is mainly to correct
this single wrong statement.)
Think about it:
Any fat pointer capability system, like that of the IBM AS/400, really
has the bounds associated with the address of the memory location
containing the pointer, not the address that the pointer points to.
MPX is just like that - except that, instead of using a fat pointer,
the bounds - which in the AS/400 would be adjacent to the actual
pointer - are placed in an outlying table. You might call it a
non-adjacent fat pointer.
Actually, keying by pointer value is what would constitute breaking the
capability security model - most definitely if the table were global. A
global table indexed by pointer value is ambient security, whereas a
table indexed by pointer location is capability security, but only if
the path to any such pointer location itself is protected by
capabilities. (It is this latter that MPX does not make mandatory.)
You might argue that a table keyed by pointer value might be made into a
capability system by manipulating it constantly, e.g. by removing
pointers a particular function or method is not allowed to access.
I.e. by creating a local descriptor table, not a global descriptor
table. However, as far as I know the only really commercially
successful almost-capability systems, like the IBM AS/400, have been
fat-pointer-based systems. (I know, Norm Hardy and KeyKOS - but I am afraid
that I don't consider that commercially successful. And I don't know
the details of its table manipulations.)
> And it's a big hunk of hardware.
Actually, I doubt that it is a big chunk of hardware: four 128-bit
registers - which might even be obtained by reusing the existing x86
segment registers, which *may* not be heavily used in 64-bit code.
(Which originally were not supposed to be used in 64-bit code at all,
although more and more their use has been revived.)
Something like a page table walker for lookup. Possibly microcode, and
hence cheap to build, but slow to use.
The bounds check instructions themselves are quite simple. IMHO
excessively simple, as mentioned above.
>
> Several years ago we explored a sort of caps-lite approach that would
> provide similar debugging assistance, but set it aside because we felt
> that it didn't provide real (caps like) security and so wasn't worth the
> trouble. Intel does listen to the customers, so I guess there's demand,
> so I suppose we should dust ours off.
>
> Comparing the two, both use natural-sized pointers, which is a must to
> avoid breaking every data structure. Both permit pointer construction,
> comparison and delta (address difference) operations, and conversion
> to/from integer. Intel's is optional; ours is not although you can
> program around it by treating the pointer as an integer, for example for
> xor-linked lists.
I agree that making such checks mandatory is desirable.
But I think that requiring such mandatory security in Day 1 would be the
kiss of death for such a product. Too much legacy code would not work.
The theory with which I started the project that may have led to MPX is
that mandatory checking must be optional, at least at first. It must be
possible for a compiler or assembly language programmer to bypass all
bounds checks and capability checks, at least at first. I would like
to be able to create systems where security is mandatory, OS level -
e.g. by marking particular pages so that full checking is required for
any reference to them, e.g. potentially for all memory. But not at first.
The other part of my theory is that there must be secure pointers and
insecure pointers. But that the distinction should be transparent, as
much as possible, to compiler or assembly language programmer. Secure
pointers will always have some overhead, in power if not performance;
but if the same instructions manipulate both secure and insecure
pointers, then compilers will always generate code sequences and library
functions to which secure pointers, possibly requiring mandatory
checking, can be passed.
I.e. my theory is that the most important aspect of such a security ISA
extension is making (possibly) secure code run as fast as possible on
insecure, unchecked, pointers. The speed with which secure code runs on
secure, checked pointers is secondary.
Ideally, secure code running insecure pointers runs as fast as insecure
code that does not have the possibility of checking.
MPX does not meet this goal.
However, I will admit: the customer demand for security was high enough
that perhaps this secure code/insecure data theory is not needed.
By the way: much of my thinking on this issue started out in comp.arch,
e.g. in a thread entitled "Segments, capabilities, buffer overrun
attacks", 5/14/03.
In that posting thread I also described how and why I thought that it
was important to be able to do an instruction set extension without OS
intervention, and how to do it. (Whether security related, or
otherwise.) (A security extension without OS intervention is probably
always going to be advisory security, not mandatory.)
Probably most important, in that posting thread I figured out how to do
2X fat pointers - pointers that contained the address of the object, and
the address of a descriptor - slightly less overhead than a full fat
pointer, which typically requires 4X the word size (address of object,
lower bound, upper bounds, stuff). And I figured out how to make 2X fat
pointers secure and non-forgeable, by allocating the full 4X descriptor
in a protected memory page, with a pointer back to the pointer in user
memory. I stopped posting as soon as I realized that I was onto
something. But my employer at the time, AMD, was not interested. (Let
me emphasize: I only started that thread because I thought it was
something that was totally bluesky. I did not think it was going to lead
to something useful. I stopped as soon as I realized that I might be
onto something, but I don't think I was ever able to do any AMD
proprietary work on it.)
A few years later, at Intel, I picked up the work from where it had been
left off, in public, on comp.arch. At Intel I finally realized that I
could eliminate the fat pointer in user data space completely, so that
there would be no incentive for a compiler to ever generate a
datastructure that could not hold a secure pointer - since
sizeof(secure-pointer) was the same as sizeof(insecure-pointer): 4 bytes
on a 32-bit machine, 8 bytes on a 64-bit machine - rather than the 32 or
more bytes (256+ bits) a full fat pointer would need.
The fat pointer in user data space can be eliminated completely by
mapping the address of the pointer (stored in memory) to a descriptor.
Milo Martin and his students (and others) did this by a linear address
translation: Address_of_Descriptor = Address_of_ptr * Scale + Offset.
Typically this requires 4X the address space to be used for descriptors
as is used for the actual user data. Much of the descriptor space would
be empty. Because such a linear transformation is 1:1, it inherently
supports non-forgeability (if the descriptor memory is protected).
A more flexible table, like a tree, like the tree used for page tables /
virtual address translation, might require less memory overhead to store
descriptors. But it requires the back pointer to make the secure
pointer non-forgeable.
Intel MPX's BNDLDX/BNDSTX instructions, with the simple BD (Bounds
Directory) and BT (Bounds Table) data structure is intermediate. It
requires less address space than Martin et al's linear transformation,
but it still has a glass jaw if sparse - e.g. if a single pointer is
allocated in a quarter page, an entire 4K page of memory is required,
with only a single pointer descriptor (Bound Table Entry) stored in it.
INTERESTING TECHNICAL NOTE (at least interesting to me): from
http://download-software.intel.com/sites/default/files/319433-015.pdf
I notice that MPX's BTE (Bound Table Entry) contains
* valid bit (of course)
* lower bound (of course)
* upper bound (of course)
* pointer value - ??
* reserved
The bounds are obvious.
The "pointer value" sounds like the backpointer, the address of the
pointer, that I first figured out the need for in the old comp.arch
Usenet thread "Segments, capabilities, buffer overrun attacks", 5/14/03.
But you only need the address of the pointer / backpointer if you have a
fairly arbitrary datastructure mapping pointers stored in memory to
their bounds. I needed it for the 2003 post, with semi-fat pointers.
You need it for fairly arbitrary datastructures, like classic binary
trees. But you don't need it if you are doing address scaling/linear
transformations, like Martin et al. And you don't need it for the
BD/BT/BTE bounds table that the MPX docs seem to describe - which can be
described as semi-linear, or linear in the BD and BT. You might need it
for the MPX tables if two BDs can point to the same BT, e.g. as an
optimization to permit reducing memory by sharing BTs, using the address
of the pointer to disambiguate. But usually I am the only guy krazy
enough to think of doing something like that to reduce memory
consumption. Barring that, the MPX tables do not need to have the
"address of the pointer".
Notice also that they call it the "pointer value", whereas I have talked
about "address of pointer" or "backpointer". "Pointer value" sounds
like the address of the object that the pointer is pointing to, rather
than the address of the pointer itself.
Later, on page 9-10 of the reference, you can see examples of the
compact notation I use to discuss this: "Bounds in memory are associated
with the memory address where the pointer is stored, i.e., Ap. Linear
address LAp ..."
I.e. the pointer lives in user memory at address Ap, and contains the
address of the object, Ao. The descriptor lives somewhere else,
non-adjacent - I call it address Ad, for obvious reasons - and contains
Ao_lwb, and Ao_upb. My "backpointer" in the descriptor contains the
original Ap. It is not clear to me if the MPX descriptor / bound table
entry's "pointer value" is Ap or Ao.
The MPX BTE doesn't need the Ap for lookup. Not even for full security.
But for a large array, many possible Ao's could be used.
Q: what is this? What is this used for?
Prefetching?
Here's a thought that might be hopeful for those of us who have drunk
the Koolaid of capabilities: if there is circuitry to check that the
"pointer value" in the BTE matches the pointer value, Ao, used by the
checking instructions BNDCL / BNDCU, and also the pointer used in an
actual memory reference, then this COULD be used for full, mandatory,
capability security.
But that seems extremely complicated. I would never propose that for an
efficient implementation. I am sure that Intel is not currently doing
it. But it is the sort of thing that I might keep in an unpublished
"future directions for the architecture" spec, if I felt that full
mandatory security was desirable even for the present, limited, MPX ISA.
So, it's an open question: exactly what is this "pointer value" in the BTE
used for?
> Intel's requires extra code at point of use; ours does not.
I consider it unfortunate that Intel's requires extra code to (a) load
(and store) the bounds, and (b) check them.
The latter is especially important, since it prevents a proper
capability security architecture from ever being achieved with the
instructions in the present MPX.
But, like I said, most people do not think that full capability security
should be a goal.
> Intel's
> potentially catches all invalid memory references; ours catches only
> most of them. Intel's permits arbitrary (wild) address arithmetic; ours
> treats that as an error whether you dereference the result or not (i.e.
> we follow the C/C++ standard), although we do handle the C++
> one-past-the-end correctly.
> Intel's has a range cache and memory tables
> and associated hardware; ours does not.
Does Intel's have a range cache? I didn't see that.
> Intel's has a set of additional
> registers to be managed; ours does not.
My initial direction was to not have any additional SW visible
registers: hence US patent application 20110078389 MANAGING AND
IMPLEMENTING METADATA IN CENTRAL PROCESSING UNIT USING REGISTER
EXTENSIONS, where each general purpose register used for addressing had
an extension.
I would still prefer that. It meets my theoretical goal of having
secure code run insecure pointers with no unnecessary instructions.
However, as it became obvious that the ISA extension would only be given
a minimal hardware budget (it was competing with Larrabee), I eventually
acquiesced to having separate bounds/capability registers. I.e. I was
willing to compromise my design principles. I was only willing to do
this after wrestling with myself, eventually making myself comfortable
that it was still capability security, even though it might not reach my
minimal instruction overhead goals.
I.e. I was willing to compromise performance and compatibility, but not
security. Interestingly, some of the other members of the team were
vociferously opposed to this compromise, using words like "They would be
embarrassed to have their names associated with it." This was
essentially the last straw that resulted in me leaving - being pushed
off - the instruction set extension project that was based on my ideas
and inventions. And yet MPX is now announced, both with the compromise
that I was willing to make (separate bounds) and the compromise that I
was not willing to make (no possibility of being a capability ISA).
>
> We will file ours, although it is not clear whether it will be included
> in production Mills.
--
The content of this message is my personal opinion only. Although I am
an employee (currently Imagination Technologies, which acquired MIPS; in
the past of companies such as Intellectual Ventures/QIPS, Intel, AMD,
Motorola, and Gould), I reveal this only so that the reader may account
for any possible bias I may have towards my employers' products. The
statements I make here in no way represent my employers' positions on
the issue, nor am I authorized to speak on behalf of my employers, past
or present.