I will lead with the caveat that I am not a hardware person. However, I am
interested in computer architecture, and am curious whether any of the
following architectural features have been proposed or studied at all in the
past. If somebody could comment on them, or point to existing resources, that
would be great. Thanks!

An optional branch is one that, just like a regular branch, must be taken if
its condition is true. However, if the condition turns out to be false, the
hardware may, at its discretion, either take the branch or not take the
branch. The advantage is that, if the branch was predicted taken, but the
condition turned out to be false, it is not necessary to roll back any state.
The idea is that software can use these in cases where it has a fast path and
a slow path, and the fast path works for only a subset of inputs, where the
slow path works for all inputs. It's a performance win when the fast path is
fast enough to be worth having, but the gap in cost between the fast and slow
paths is smaller than the cost of a mispredict.
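To make the semantics concrete, here is a minimal software model of an optional branch, written in Python purely as a sketch. The names `optional_branch_taken`, `popcount_fast`, `popcount_slow`, and `optional_branch_wins` are hypothetical, as is the choice of popcount as the workload; the cost function just restates the inequality above under the assumption that a mispredicted regular branch pays the mispredict penalty plus the fast path.

```python
def optional_branch_taken(condition: bool, predicted_taken: bool) -> bool:
    # Must be taken when the condition holds; otherwise the hardware is free
    # to follow its prediction, so a "wrong" prediction needs no rollback.
    return condition or predicted_taken


# Lookup table backing the hypothetical fast path.
FAST_TABLE = [bin(i).count("1") for i in range(256)]


def popcount_fast(x: int) -> int:
    # Fast path: valid only for 0 <= x < 256.
    return FAST_TABLE[x]


def popcount_slow(x: int) -> int:
    # Slow path: valid for every non-negative integer.
    return bin(x).count("1")


def popcount(x: int, predicted_taken: bool) -> int:
    # The branch target is the slow path, which is correct for all inputs,
    # so taking the branch "unnecessarily" still gives the right answer.
    needs_slow_path = x >= 256
    if optional_branch_taken(needs_slow_path, predicted_taken):
        return popcount_slow(x)
    return popcount_fast(x)


def optional_branch_wins(fast_cost: float, slow_cost: float,
                         mispredict_cost: float) -> bool:
    # Predicted taken, condition false: a regular branch pays
    # mispredict_cost + fast_cost, an optional branch pays slow_cost.
    return slow_cost - fast_cost < mispredict_cost
```

Whatever the predictor guesses, `popcount` returns the same value; only the path taken differs, which is the property that would let hardware skip the rollback.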
Optional branches actually already exist on GPUs, though obviously branches
work very differently there.

Instructions that operate on memory locations can specify a tag of a few bits;
the behaviour is undefined if two accesses with different tags refer to the
same memory location. This simplifies memory disambiguation, removing the need
to spend so many resources tracking and predicting it. Compilers can take
advantage of the tags transparently, with no source-level changes (though
source-level changes may be helpful).
Lightweight fences can be provided, possibly implicitly at subroutine
boundaries, such that two operations on opposite sides of a fence are _always_
allowed to alias.
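The tag and fence rules can be sketched as a small model of what the hardware would be allowed to assume. This is only an illustration: `MemOp`, `annotate`, and `may_alias` are hypothetical names, and representing fences as an "epoch" counter is my assumption about one way to model them, not a proposed encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemOp:
    kind: str   # "load" or "store"
    tag: int    # small alias tag carried by the instruction
    epoch: int  # number of fences that precede this operation


def annotate(program):
    # program items: ("load", tag), ("store", tag) or ("fence",)
    ops, epoch = [], 0
    for item in program:
        if item[0] == "fence":
            epoch += 1
        else:
            ops.append(MemOp(item[0], item[1], epoch))
    return ops


def may_alias(a: MemOp, b: MemOp) -> bool:
    # Operations on opposite sides of a fence are always allowed to alias.
    if a.epoch != b.epoch:
        return True
    # Within one fence-free region, different tags promise no aliasing,
    # so the hardware could reorder such accesses without comparing addresses.
    return a.tag == b.tag
```

Only pairs for which `may_alias` returns true would need the usual address comparison or prediction; every other pair could be reordered freely.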
This one I'm least sure of. The idea is to do less bookkeeping for things
like GC barriers and bounds checks, where the slow path is very slow and can
afford to do some of the state reconstruction in software. A coalesceable
branch is a special type of branch with an associated tag. If the
condition of a coalesceable branch is true, you can either take that branch,
_or_ take any previous coalesceable branch with the same tag. There can again
be lightweight fences to enable local reasoning in code generation.
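As a sketch of the rule above, the function below lists the targets the hardware would be allowed to jump to when a coalesceable branch's condition turns out true. The name `legal_targets` and the trace format are hypothetical, and I am assuming that a fence stops coalescing with branches before it, which is one reading of how the fences enable local reasoning.

```python
def legal_targets(trace, i):
    # trace items: ("branch", tag, target) or ("fence",)
    # i must index a coalesceable branch whose condition is true.
    entry = trace[i]
    if entry[0] != "branch":
        raise ValueError("index does not refer to a branch")
    _, tag, target = entry
    targets = {target}
    # Walk backwards; any earlier branch with the same tag is also a legal
    # destination, until a fence is reached (assumed to end the region).
    for earlier in reversed(trace[:i]):
        if earlier[0] == "fence":
            break
        if earlier[0] == "branch" and earlier[1] == tag:
            targets.add(earlier[2])
    return targets
```

For a run of bounds checks sharing one tag, this would let the hardware defer the checks and jump to the first check's handler, leaving the slow path to work out in software which check actually failed.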
There's a common theme here: restrictions are added on the software side, and
freedom added to the hardware side. The hardware could choose to not make use
of that freedom, and continue operating as it always has. I'm not quite sure
what to make of that.