Account Options

  1. Sign in
The old Google Groups will be going away soon.
Switch to the new Google Groups.
Google Groups Home
« Groups Home
Can someone explain step by step how one avoid many conditional in forth as described in Moore Fourth essay?
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  12 messages - Collapse all  -  Translate all to Translated (View all originals)
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
Rod Pemberton  
View profile  
 More options Jan 11, 7:21 am
Newsgroups: comp.lang.forth, comp.sys.intel, comp.arch
From: "Rod Pemberton" <do_not_h...@noavailemail.cmm>
Date: Wed, 11 Jan 2012 07:21:53 -0500
Local: Wed, Jan 11 2012 7:21 am
Subject: Re: Can someone explain step by step how one avoid many conditional in forth as described in Moore Fourth essay?
"Arnold Doray" <inva...@invalid.com> wrote in message

news:jehfpu$9de$1@dont-email.me...
...

> > [Forth code]

> CPU pipelining is improved by reducing conditionals. Modern CPUs have
> branch prediction, but these aren't always successful, in which case the
> pipline needs to be flushed, lowering the CPU's throughput.

Is that still true for multiple cores?

I.e., I would think the following is entirely possible and plausible, but I
haven't studied a CPU design in decades.  The CPU's designers could execute
the process in parallel with one core taking one direction for the branch
and the another core taking the other branch direction.  Once the correct
branch is determined, the bad execution path is discarded.  If the primary
core had the good execution path, it just continues execution.  If the
alternate core had the good execution path, it's internal state could be
"pushed" to the primary core.  If they used static ram for the internal
state, then it could be "pushed" asynchronously, i.e., between clocks or
sub-clocks.  It would require reserving a core for the branch path
execution, at least temporarily.

Rod Pemberton


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Alex McDonald  
View profile  
 More options Jan 11, 7:29 am
Newsgroups: comp.lang.forth, comp.sys.intel, comp.arch
From: Alex McDonald <b...@rivadpm.com>
Date: Wed, 11 Jan 2012 04:29:23 -0800 (PST)
Local: Wed, Jan 11 2012 7:29 am
Subject: Re: Can someone explain step by step how one avoid many conditional in forth as described in Moore Fourth essay?
On Jan 11, 12:21 pm, "Rod Pemberton" <do_not_h...@noavailemail.cmm>
wrote:

Search on "speculative execution". It can also be done at the compiler
level; that doesn't require processor support. Intel's P6 was their
first chip to support it iirc.

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
n...@cam.ac.uk  
View profile  
 More options Jan 11, 7:48 am
Newsgroups: comp.lang.forth, comp.sys.intel, comp.arch
From: n...@cam.ac.uk
Date: Wed, 11 Jan 2012 12:48:36 +0000 (GMT)
Local: Wed, Jan 11 2012 7:48 am
Subject: Re: Can someone explain step by step how one avoid many conditional in forth as described in Moore Fourth essay?
In article <e222444f-1181-4388-bdbd-7824b05c8...@dp8g2000vbb.googlegroups.com>,
Alex McDonald  <b...@rivadpm.com> wrote:

>Search on "speculative execution". It can also be done at the compiler
>level; that doesn't require processor support. Intel's P6 was their
>first chip to support it iirc.

But don't believe any page that says that it is a solution to the
problem of memory latency (on cache misses).  It isn't, and won't
ever be, for any 'normal' language and application.  That's a MUCH
harder task :-(

Regards,
Nick Maclaren.


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Terje Mathisen  
View profile  
 More options Jan 11, 8:33 am
Newsgroups: comp.lang.forth, comp.sys.intel, comp.arch
From: Terje Mathisen <"terje.mathisen at tmsw.no">
Date: Wed, 11 Jan 2012 14:33:15 +0100
Local: Wed, Jan 11 2012 8:33 am
Subject: Re: Can someone explain step by step how one avoid many conditional in forth as described in Moore Fourth essay?

That's "Eager Execution" (EE), something which good branch prediction
makes even less of a win: You can never go through more than 2-3
branches like this before the power loss becomes totally horrendous
(running 4-8 threads in parallel, 75-88% wasted resources), while lots
of current code can branch predict past 6-8 branches before the
probability of a branch miss passes 50%, and all those branches could
actually be inside the pipeline at once.

EE should only be considered as a last resource for branches where the
runtime branch history shows very low predictability, i.e. closer to 50%
branch misses.

BTW, back in March 1995 I visited Intel in Santa Clara, in order to meet
some people from the P6 team and to get an engineering sample PentiumPro
machine.

I had to sign an NDA up front, so before doing that I insisted that I'd
be allowed to write down on a whiteboard everything I then knew or had
been able to guess about the architecture, based on nothing but public
sources. I.e. this was to properly delineate what the NDA could cover.

It turned out that I was correct in every particular, except for one
thing: I expected EE because I didn't realize how much they had been
able to improve the branch predictor.

The engineers on the other side of the table were very happy, while some
of the management and legal people seemed quite worried. :-)

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Arnold Doray  
View profile  
 More options Jan 11, 10:03 am
Newsgroups: comp.lang.forth, comp.sys.intel, comp.arch
From: Arnold Doray <inva...@invalid.com>
Date: Wed, 11 Jan 2012 15:03:46 +0000 (UTC)
Local: Wed, Jan 11 2012 10:03 am
Subject: Re: Can someone explain step by step how one avoid many conditional in forth as described in Moore Fourth essay?

I don't think this is how it's done on modern CPUs. They use branch
prediction, which can be as good as 70% - 80% right,if I recall
correctly. When the wrong branch is taken, the entire pipeline needs to
be flushed. That's the reason to avoid conditionals.

But let's say for argument's sake that you're right. There are a few
problems with your scheme:

1) It's slow: because you're tying up another core. So you effectively
have one core less than previously.

2) Also, the switching overhead may be large if it has to be done often.

3) What happens with code containing multiple branches? You can't follow
them all. What to do?

4) Let's suppose you have lots of cores so that (3) is not a big issue.
Still, the number of instructions before the next branching could be less
than your pipeline's depth. So you're wasting the advantage offered by a
deep pipeline.

Thinking about Gavino's questio again, it occurs to me that with
predictive branching, a long loop will almost never flush the pipeline
(because the prediction is done by keeping statistics of the historical
branch choices). So this reason alone (pipeline flushing) is not a good
reason avoid using loops. Perhaps CM's comments were made when the
prediction was not as sophisticated as it is now.

Of course, avoiding conditionals entirely will give a faster solution.
How much faster is the question. Anyone has comments/experience with this?

Cheers,
Arnold


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Andy (Super) Glew  
View profile  
 More options Jan 11, 12:25 pm
Newsgroups: comp.lang.forth, comp.sys.intel, comp.arch
From: "Andy (Super) Glew" <a...@SPAM.comp-arch.net>
Date: Wed, 11 Jan 2012 09:25:29 -0800
Local: Wed, Jan 11 2012 12:25 pm
Subject: Re: Can someone explain step by step how one avoid many conditional in forth as described in Moore Fourth essay?
On 1/11/2012 4:29 AM, Alex McDonald wrote:

Taking both paths on a branch is currently called "eager execution".

Branch prediction is one form of "speculative execution".

P6 did speculative, but not eager.  As, for that matter, did P5 (Pentium).

I don't know of anyone doing full eager execution, although it has been
studied out the bejeezus.  Branch prediction usually beats it.  I
believe an IBM chip did eager ifetch - fetching both sides of a branch -
but did not actually execute, stalled at decoder or therabouts.


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Joe keane  
View profile  
 More options Jan 13, 12:47 pm
Newsgroups: comp.lang.forth, comp.sys.intel, comp.arch
From: j...@panix.com (Joe keane)
Date: Fri, 13 Jan 2012 17:47:16 +0000 (UTC)
Local: Fri, Jan 13 2012 12:47 pm
Subject: Re: Can someone explain step by step how one avoid many conditional in forth as described in Moore Fourth essay?
In article <4F0DC609.9020...@SPAM.comp-arch.net>,
Andy (Super) Glew <a...@SPAM.comp-arch.net> wrote:

>I believe an IBM chip did eager ifetch - fetching both sides of a
>branch - but did not actually execute, stalled at decoder or
>therabouts.

Doing some 'early' loads seems like a good idea, assuming that you have
some IF units free.

But that is a *lot* different from the other suggestion, copying the
machine state to a different core.


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Andy (Super) Glew  
View profile  
 More options Jan 13, 1:09 pm
Newsgroups: comp.lang.forth, comp.sys.intel, comp.arch
From: "Andy (Super) Glew" <a...@SPAM.comp-arch.net>
Date: Fri, 13 Jan 2012 10:09:03 -0800
Local: Fri, Jan 13 2012 1:09 pm
Subject: Re: Can someone explain step by step how one avoid many conditional in forth as described in Moore Fourth essay?
On 1/13/2012 9:47 AM, Joe keane wrote:

> In article<4F0DC609.9020...@SPAM.comp-arch.net>,
> Andy (Super) Glew<a...@SPAM.comp-arch.net>  wrote:
>> I believe an IBM chip did eager ifetch - fetching both sides of a
>> branch - but did not actually execute, stalled at decoder or
>> therabouts.

> Doing some 'early' loads seems like a good idea, assuming that you have
> some IF units free.

> But that is a *lot* different from the other suggestion, copying the
> machine state to a different core.

Microarchitectures proposed for eager execution:

* using the same pipeline(s) as ordinary non-eager
** e.g. if stalled on the main path, execute on the alternative path
** DEE: e.g. if have gone too far down one path, execute on an
alternative path that has higher relative probability of success than
predicting yet another branch on the current path.

Actually, the DEE technique applies to all forms of eager execution.

* using eager threads on the same CPU core
** although not necessarily the same pipeline(s) - eager threading was
one of the techniques I wanted to investigate on top of my multicluster
multithreading (MCMT) substrate -- although in my experience eager
nearly always loses out to non-eager forms of speculation.

Note that this is not necessarily the same pipeline resources, since in
my terminology an MCMT machine like Bulldozer is really a single CPU
with multiple execution clusters.

* using eager threads of execution on different CPU cores.

Problem is that eager threads seldom last long, so it is hard for eager
threads to overcome the cost of migrating to another core.


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Joe keane  
View profile  
 More options Jan 13, 11:16 pm
Newsgroups: comp.lang.forth, comp.sys.intel, comp.arch
From: j...@panix.com (Joe keane)
Date: Sat, 14 Jan 2012 04:16:35 +0000 (UTC)
Local: Fri, Jan 13 2012 11:16 pm
Subject: Re: Can someone explain step by step how one avoid many conditional in forth as described in Moore Fourth essay?
In article <4F10733F.4040...@SPAM.comp-arch.net>,
Andy (Super) Glew <a...@SPAM.comp-arch.net> wrote:

>Problem is that eager threads seldom last long, so it is hard for eager
>threads to overcome the cost of migrating to another core.

Essentially creating a new thread, for every -branch-, falls into the
'that don't sound right'.

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Terje Mathisen  
View profile  
 More options Jan 14, 4:23 am
Newsgroups: comp.lang.forth, comp.sys.intel, comp.arch
From: Terje Mathisen <"terje.mathisen at tmsw.no">
Date: Sat, 14 Jan 2012 10:23:45 +0100
Local: Sat, Jan 14 2012 4:23 am
Subject: Re: Can someone explain step by step how one avoid many conditional in forth as described in Moore Fourth essay?

Joe keane wrote:
> In article<4F10733F.4040...@SPAM.comp-arch.net>,
> Andy (Super) Glew<a...@SPAM.comp-arch.net>  wrote:
>> Problem is that eager threads seldom last long, so it is hard for eager
>> threads to overcome the cost of migrating to another core.

> Essentially creating a new thread, for every -branch-, falls into the
> 'that don't sound right'.

I've thought about this in the context "execute alternating instructions
from both sides of the branch, adding implicit predicates to them", so
that when the branch retires the predicates become known and half the
instructions are cancelled.

The problem is of course that you need to fetch from two paths at the
same time, i.e. you need a pair of virtualized IP registers as well.

The first criterium must be that you only ever do this for code that has
been executed a number of times, and where the branch history has turned
out to be very unpredictable.

Another idea could be to start with the predicted branch, then switch to
the other on the first stall/cache miss or new branch.

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Paul A. Clayton  
View profile  
 More options Jan 14, 1:10 pm
Newsgroups: comp.lang.forth, comp.sys.intel, comp.arch
From: "Paul A. Clayton" <paaronclay...@gmail.com>
Date: Sat, 14 Jan 2012 10:10:19 -0800 (PST)
Local: Sat, Jan 14 2012 1:10 pm
Subject: Re: Can someone explain step by step how one avoid many conditional in forth as described in Moore Fourth essay?
On Jan 14, 4:23 am, Terje Mathisen <"terje.mathisen at tmsw.no">
wrote:

> Joe keane wrote:
> > In article<4F10733F.4040...@SPAM.comp-arch.net>,
> > Andy (Super) Glew<a...@SPAM.comp-arch.net>  wrote:
> >> Problem is that eager threads seldom last long, so it is hard for eager
> >> threads to overcome the cost of migrating to another core.

> > Essentially creating a new thread, for every -branch-, falls into the
> > 'that don't sound right'.

> I've thought about this in the context "execute alternating instructions
> from both sides of the branch, adding implicit predicates to them", so
> that when the branch retires the predicates become known and half the
> instructions are cancelled.

This reminds me of a (not so useful) idea I posted here
21 Oct 2010, "Move on 'commit' predication?"  Message-ID:
<f3ea2ca0-c6c3-4014-b55b-66880f07f...@e14g2000yqe.googlegroups.com>
http://groups.google.com/group/comp.arch/browse_thread/thread/f9af23c...

It was basically just a (wrong-headed?) microarchitecture
for supporting dynamic predication (of hammock branches)
where one path was less likely but uncertain enough to
justify predication.

(One of the issues of eager execution is recognizing joining.)


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Andy (Super) Glew  
View profile  
 More options Jan 14, 8:15 pm
Newsgroups: comp.lang.forth, comp.sys.intel, comp.arch
From: "Andy (Super) Glew" <a...@SPAM.comp-arch.net>
Date: Sat, 14 Jan 2012 17:15:35 -0800
Local: Sat, Jan 14 2012 8:15 pm
Subject: Re: Can someone explain step by step how one avoid many conditional in forth as described in Moore Fourth essay?
On 1/13/2012 8:16 PM, Joe keane wrote:

> In article<4F10733F.4040...@SPAM.comp-arch.net>,
> Andy (Super) Glew<a...@SPAM.comp-arch.net>  wrote:
>> Problem is that eager threads seldom last long, so it is hard for eager
>> threads to overcome the cost of migrating to another core.

> Essentially creating a new thread, for every -branch-, falls into the
> 'that don't sound right'.

Nobody would seriously propose a new thread for every branch.

You would only consider eager threading for flakey branches, where you
don't know which way they are going to go.  And won't know for a while.

Myself, I think non-threading approaches to eager execution are better.
  E.g. the discussion of short forward branches - emit colde that
computes both sides of such a hammock, with operations like Phi nodes to
bring back rto a single dataflow.


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
End of messages
« Back to Discussions « Newer topic     Older topic »