On Mon, 20 Aug 2012 21:48:11 -0700 (PDT), Hugh Aguilar
<hughag...@nospicedham.yahoo.com> wrote:
>On Monday, August 20, 2012 12:45:52 PM UTC-7, George Neuner wrote:
>> On Thu, 16 Aug 2012 20:40:38 -0700 (PDT), Hugh Aguilar
>> <hughag...@nospicedham.yahoo.com> wrote:
>
>> Any conditional or indirect branch will introduce a bubble into the
>> prefetch pipeline stage ... but on modern chips whether the decode
>> stage will stall depends on whether the address resolution must wait
>> on the register (e.g., for a load to complete) and on whether the
>> branch target location is somewhere in cache. Typically a code fetch
>> from L2 will be fast enough to avoid a stall.
>
>> Now the x86 also has a memory-indirect branch where the target is
>> fetched from a location provided as an argument to the instruction.
>> This can't be scheduled as effectively as a separate register load +
>> branch, so a memory-indirect branch is pretty much guaranteed to
>> stall unless the address location happens to be in L1 (and the
>> branch target is in cache, as above).
>
>In ITC the cfa is in a register (it is not a literal encoded in the operand),
>and the cfa contains the address of the code to execute. This is
>definitely going to stall.
Typo? I assume you meant the target address is in *memory* (as in a
jump table). This is *not* guaranteed to stall but only to introduce
bubbles into the pipeline. Technically to "stall" means the execution
pipeline drains completely - i.e. the CPU runs out of work to do -
before the code fetch mechanism can provide new work.
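For concreteness, here's a minimal C sketch of what ITC dispatch has
to do (a function-pointer rendering of my own, not code from any real
Forth): the thread holds code-field addresses, so two dependent loads
from memory must complete before the indirect branch can resolve.

  /* Minimal ITC-style dispatch.  NEXT loads the cfa from the
     thread, then loads the code address through the cfa, then
     branches indirectly. */
  #include <stdio.h>

  typedef struct word word_t;
  typedef void (*code_t)(word_t *self);

  struct word { code_t code; };   /* the code field; a real Forth
                                     word adds a parameter field */
  static int acc;
  static void do_inc(word_t *w)   { (void)w; acc++; }
  static void do_print(word_t *w) { (void)w; printf("%d\n", acc); }

  static word_t w_inc   = { do_inc };
  static word_t w_print = { do_print };

  int main(void)
  {
      word_t *thread[] = { &w_inc, &w_inc, &w_print, NULL };
      for (word_t **ip = thread; *ip; ip++) {
          word_t *w = *ip;   /* load #1: fetch cfa from the thread */
          w->code(w);        /* load #2 + indirect call through it */
      }
      return 0;              /* prints 2 */
  }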
Remember that modern x86 processors (since the Pentium Pro, and
excepting Atom) are OoO designs. The CPU can look ahead in the
instruction stream and begin working on an upcoming branch before
executing unrelated instructions that precede the branch.
I'm not currently aware of any OoO ARM, but the architecture doesn't
constrain execution order. ARM itself doesn't make chips - it
licenses the design - and so the details of implementation are up to
the manufacturer.
If both the jump table location and the target of the branch are in
cache, you take a hit but not necessarily a stall. The degree of the
hit depends on exactly where in the cache hierarchy the requested
items lie.
The x86 has had separate L1 caches for data and code since the i486,
and since the Pentium Pro (P6) there has been a unified code/data L2
cache in the processor package. In the Pentium 4 the L1 code cache
held not x86 instructions but already-decoded instruction traces in
the CPU's internal wide format; code fetched from the L1 trace cache
skips decoding and is injected directly into the execution stage.
Still, the case of a subroutine return followed closely by a
memory-indirect branch is likely to cause a pipeline stall ... but
this is because the subroutine return itself *is* a memory-indirect
branch (with the address pulled from the stack) and the CPU can't
handle two memory-indirect branches back-to-back unless all the data
and code involved can be served from the L1 caches.
However, there are ways to tail-thread an interpreter so that each
"instruction" subroutine ends with a single indirect branch directly
to the next subroutine; it isn't necessary to return to a dispatcher
after each one.
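A sketch of what that looks like (my own example, using the GCC/Clang
"labels as values" extension, so it isn't standard C): each handler
ends with its own copy of NEXT and control never returns to a central
dispatch loop.

  /* Tail-threaded dispatch via computed goto (GNU C extension). */
  #include <stdio.h>

  int main(void)
  {
      static void *program[] = { &&op_inc, &&op_inc,
                                 &&op_print, &&op_halt };
      void **ip = program;      /* interpreter instruction pointer */
      int acc = 0;

      goto **ip++;              /* initial NEXT */

  op_inc:
      acc++;
      goto **ip++;              /* NEXT inlined at end of handler */
  op_print:
      printf("%d\n", acc);
      goto **ip++;
  op_halt:
      return 0;                 /* prints 2 */
  }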
>In DTC the cfa is in a register (it is not a literal encoded in the operand),
>and the cfa is the address of the code to execute. My understanding
>of what you're saying, is that this won't stall --- although I had assumed
>that it would.
Right. Assuming the address value is in the register, the cost to
redirect the prefetch is just an extra cycle to access it. Again,
this depends on the register load having completed before the branch
instruction begins to execute, and on the target of the branch being
somewhere in cache (at least in L2).
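In C terms (again a sketch of my own, not anything from a real
Forth), DTC dispatch is a single load followed by a register-indirect
call, and keeping the load separate from the branch gives an OoO core
the chance to resolve the target early:

  /* DTC-style dispatch: the thread holds code addresses directly.
     (A loop-based rendering for clarity; compare the tail-threaded
     version above.) */
  #include <stdio.h>

  typedef void (*prim_t)(void);

  static int acc;
  static void inc_op(void)   { acc++; }
  static void print_op(void) { printf("%d\n", acc); }

  static void run(prim_t *ip)
  {
      prim_t target;
      while ((target = *ip++) != NULL)  /* the single load ...     */
          target();                     /* ... then the register-  */
  }                                     /* indirect call           */

  int main(void)
  {
      prim_t thread[] = { inc_op, inc_op, print_op, NULL };
      run(thread);                      /* prints 2 */
      return 0;
  }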
>> It's the need to define code blocks ahead of time which is the
>> problem. Despite also borrowing from Lisp, AFAICT, there is no way in
>> Ruby for a code block (or even a regular function) to close over
>> variables, so references to non-locals within the block will use the
>> current values of those variables at execution time. IMO this creates
>> a lot of potential for programmer confusion between the definition
>> environment and the execution environment.
>
>It seems intuitive that the variables should be the current values at
>run-time --- why would the programmer expect the values at
>compile-time to be used?
I don't have a good example handy, and it's a bit difficult to explain
my objection, but it's a matter of context and of where the
programmer's attention is focused when writing the code. The case that
concerns me is the all-too-often encountered "oops!" where the
programmer's train of thought gets derailed.
The typical use of a code block is _supposed_ (in the sense of
"thought to be") to be a little helper function defined inline at the
point of use. There are two problems with this:
Ruby actively discourages explicit looping code (though it's possible)
in favor of iterator functions. A Ruby function can accept a code
block argument - but only one. Ruby's built-in iterator functions are
all 1-dimensional and pass only a single argument to the block when it
executes. It is possible to write your own iterator functions, but
that is an advanced control concept that a lot of programmers likely
will never explore. It is far more likely that a programmer will
contort the code to fit what Ruby provides.
So if you find you need to pass more than one parameter to a code
block, it's very likely that you will pass the extra parameters
indirectly by referencing non-locals. And if you find you need to
pass more than one code block into a function, you have to cut and
paste your inline definitions into explicit forward definitions
(requiring a slightly different syntax) bound to named variables and
then go back and rewrite the call site.
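Ruby aside, the non-local trap just mentioned shows up in any language
with one-argument callbacks. Here it is in a small C analogy (the
one-argument iterator and all the names are hypothetical, mine, not
Ruby's): the extra input rides in as a non-local, so the block
silently depends on whatever value that variable holds when the
iterator finally runs.

  /* The "oops": the callback receives one argument, so the second
     input is smuggled in through a non-local. */
  #include <stdio.h>

  static void each(const int *xs, int n, void (*block)(int))
  {
      for (int i = 0; i < n; i++) block(xs[i]);
  }

  static int scale;                      /* the hidden dependency */
  static void print_scaled(int x) { printf("%d\n", x * scale); }

  int main(void)
  {
      int xs[] = { 1, 2, 3 };
      scale = 10;
      each(xs, 3, print_scaled);         /* 10 20 30 */
      scale = 2;                         /* reused elsewhere ...  */
      each(xs, 3, print_scaled);         /* 2 4 6 - the two uses  */
      return 0;                          /* share mutable state   */
  }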
Assuming you got the cut-n-paste correct, you suddenly have a
*reusable* function that wasn't written with reusability in mind. But
you'll tend to forget that and, somewhere along the line, you are
bound to try to reuse one where its external dependencies aren't
satisfied. Worse, it might even appear to work.
And once you discover that your little function really isn't reusable
and want to make it so ... you can't ... at least not without
redesigning and refactoring (some part of) your program. So instead
you are likely to copy/tweak it and from then on have to maintain
redundancies.
All of which might have been avoided had you designed a stateful
"function" object to use instead of the code block in the first place
... but you can't always pass a regular object to an iterator function
(it depends on how it was written), so the program design would have
to have taken a different course from the beginning.
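Here is roughly what I mean, again in C for neutrality (the
context-pointer idiom is my stand-in for a stateful function object;
nothing here is Ruby-specific): the state travels with the "function",
so each use is explicit and independently reusable.

  /* A stateful "function" object: per-instance state rides along
     with the code, and the (hypothetical) iterator threads a
     context pointer through to the callback. */
  #include <stdio.h>

  typedef struct { int scale; } scaler_t;

  static void each_ctx(const int *xs, int n,
                       void (*block)(int, void *), void *ctx)
  {
      for (int i = 0; i < n; i++) block(xs[i], ctx);
  }

  static void print_scaled(int x, void *ctx)
  {
      printf("%d\n", x * ((scaler_t *)ctx)->scale);
  }

  int main(void)
  {
      int xs[] = { 1, 2, 3 };
      scaler_t by10 = { 10 }, by2 = { 2 };
      each_ctx(xs, 3, print_scaled, &by10);  /* 10 20 30 */
      each_ctx(xs, 3, print_scaled, &by2);   /* 2 4 6    */
      return 0;
  }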
To belabor my point: if Ruby allowed functions to accept multiple code
blocks (as does Smalltalk, which was the inspiration), and if its code
blocks could - at least optionally, and with compiler warnings - close
over their external dependencies, the whole issue would go away and
the language would be both more powerful and easier to use.
I'm not in any way claiming these to be fatal flaws in Ruby ... but
the cut-n-paste is just annoying busy work under the best of
circumstances and it can result in a lot of head scratching if/when
the code block reuse bug bites. The worst of it is that I believe it
could easily have been avoided.
> ... I really don't know very much about computer science
FWIW, I have a CS Masters in database/data modeling, accumulated
course work (sans dissertation) for a second Masters in programming
language design/implementation, and 20+ years of professional
experience in various areas including medical and HRT industrial
imaging, high-performance server apps, and some bare-metal embedded
work; along the way I've actually managed to use my education a few
times to design databases and create DSL compilers/runtimes. I've
forgotten how many languages I've learned over the years.
For fun I experiment with languages, compilers and runtime systems ...
like so many others I'm searching for a really good way for the
*average* programmer to write programs that *safely* take advantage of
close coupled (i.e. not message passing) parallelism, without having
to explicitly deal with synchronization and without having to learn
strange new languages or weird development paradigms.
But (much more than) enough about me :-)
It certainly isn't necessary to understand CS theory to apply it. I
think a formal CS education is (or ought to be) necessary for certain
application areas: operating systems, databases, developer tools, etc.
Other important areas like engineering design and support tools have
their own math/science/engineering requirements.
But for run of the mill productivity / education / entertainment
applications, I don't believe a formal CS or CE education is necessary
... maybe desirable, but not necessary.
Program development is as much art as it is science ... really good
developers are akin to Renaissance philosophers. IMO, the most
important attributes for a developer to have are attention to detail,
the ability to visualize problems at many different levels of
abstraction, and the ability to jump quickly from one abstraction
level to another without getting confused.
As is typical, we're way off topic now and YMMV 8-)
George