Function pointer vs conditional branch

Edward Rutherford

unread,

Jan 4, 2012, 5:20:18 PM1/4/12

to

Hello

As a general rule-of-thumb: Are function pointers more efficient than if
statements, if the same path will be taken throughout the course of the
program?

E.g. If I was going to repeatedly render an image in one of three ways,
and the method of rendering it gets selected at the beginning of
execution: Would it be more efficient to use a function pointer instead
of an if statement?

Thanks.

Kaz Kylheku

unread,

Jan 4, 2012, 5:46:24 PM1/4/12

to

An if statement selects a body of code which is still within the same lexical
scope, whereas a function call branches into a different scope. So the program
organization perspective is the most important. Obviously, if you have to
select from among multiple execution branches, which all require access to the
same set of local variables, you cannot use a function call (unless you go out
of your way to emulate lexical closures).

Are you asking which of these is better:

if (condition1) {
func1(...);
} else if (condition2) {
func2(...);
} else {
func3(...);
}

versus:

if (condition1) {
fptr = func1;
} else if (condition2) {
fptr = func2;
} else {
fptr = func3;
}

(*fptr)(...);

Stated in this way, the rewrite is unlikely to be fruitful in obtaining a
performance improvement, and may even be worse.

To obtain the benefit of using the function pointer, you have to turn your
attention into how that pointer is obtained.

Perhaps the conditions are numeric and can be put into a static dispatch
table, function pointers.

However, note that compilers do these kinds of optimizations. Code like:

switch (x) {
case 0:
func0(...);
break;
case 1:
func1(...);
break;
...
case N:
func1(...);
break;
default:
foo();
}

may well be compiled into an indirect jump through a hidden table indexed by N:

if (x >= 0 && x <= N)
hidden_table[x](...)
else
foo();

so unless you verify that the compiler isn't doing it, it wouldn't be worth it
to do it manually (other than for improving the code).

Such a table dispatch is almost certainly going to be faster than a naive
compile of the switch, because on modern CPU's it's faster to fetch some data
from a table and branch once than to go through two branches (as a rule of
thumb). Branches disrupt pipelines, requiring techniques like scheduling
instructions in branch delay slots, and branch prediction.

Another question is, how often does the three-way decision have to be
re-evaluated? Is it always different for each rendering job?

If the same choice is applied to multiple rendering jobs, it may help to
represent that by a context structure which has the function pointer in it,
already initialized to point to the correct function whenever the variables
are updated which influence the decision. There is no decision to make at
render time: just retrieve the pointer and call it:

render_context->render(render_context, ...);

How complicated are the conditions being evaluated to make the three-way
switch? Does it boil down to dispatch based on a numeric code, or something
else?

What is more frequently occuring: rendering jobs, or changes to the conditions
that influence the three-way choice?

Ian Collins

unread,

Jan 4, 2012, 5:47:39 PM1/4/12

to

There isn't a general rule-of-thumb answer, there ate too many variables
to consider.

--
Ian Collins

Eric Sosman

unread,

Jan 4, 2012, 8:15:07 PM1/4/12

to

On 1/4/2012 5:20 PM, Edward Rutherford wrote:
> Hello
>
> As a general rule-of-thumb: Are function pointers more efficient than if
> statements, if the same path will be taken throughout the course of the
> program?

"I have it on the best authority." -- Arthur

> E.g. If I was going to repeatedly render an image in one of three ways,
> and the method of rendering it gets selected at the beginning of
> execution: Would it be more efficient to use a function pointer instead
> of an if statement?

Mu.

Here's a counter-question: Does it matter? Estimate, pray,
the amount of time spent dispatching to one rendering function or
another versus the amount of time spent in that function actually
doing the rendering. Let's see: A smallish 640x480 image has 50K
pixels, depending on image characteristics that's anywhere from
50K to 200K values to compute, 50K pixel-to-pixel navigations to
figure out ("Have I hit the end of a row?"), ... And you're
worried about the once-per-rendering cost of deciding which
function to call?

I've mentioned an old friend's metaphor more than once, but
I think it applies rather strongly here: You're cleaning the bottle
caps and cigarette butts off the beach, to make the sand nice and
neat around the whale carcases.

--
Eric Sosman
eso...@ieee-dot-org.invalid

Kaz Kylheku

unread,

Jan 4, 2012, 8:25:15 PM1/4/12

to

On 2012-01-05, Eric Sosman <eso...@ieee-dot-org.invalid> wrote:
> Here's a counter-question: Does it matter? Estimate, pray,
> the amount of time spent dispatching to one rendering function or
> another versus the amount of time spent in that function actually
> doing the rendering. Let's see: A smallish 640x480 image has 50K
> pixels, depending on image characteristics that's anywhere from
> 50K to 200K values to compute, 50K pixel-to-pixel navigations to
> figure out ("Have I hit the end of a row?"), ... And you're
> worried about the once-per-rendering cost of deciding which
> function to call?

My reaction (hitherto unwritten) was that he must be just handing the images
off to some very fast hardware to care about the dispatch speed.

> I've mentioned an old friend's metaphor more than once, but
> I think it applies rather strongly here: You're cleaning the bottle
> caps and cigarette butts off the beach, to make the sand nice and
> neat around the whale carcases.

That little beauty doesn't apply that well; if you keep using it, you will soon
wear a hole right through!

Here, we may have a case of borrowing a motorcycle for the last 50 meters
of a marathon.

(Well, that might be attractive, depending on what shape you're in, but
it won't make a big difference to your time.)

BartC

unread,

Jan 4, 2012, 8:25:23 PM1/4/12

to

"Edward Rutherford" <edward.p.r...@REMOVETHIS.gmail.com> wrote in
message news:je2jb2$886$1...@speranza.aioe.org...

How many times will that if statement or function dispatch get executed? For
example, once per image, once per row, or once per pixel?

--
Bartc

Eric Sosman

unread,

Jan 4, 2012, 8:36:27 PM1/4/12

to

On 1/4/2012 8:25 PM, Kaz Kylheku wrote:
> On 2012-01-05, Eric Sosman<eso...@ieee-dot-org.invalid> wrote:

>> [...]

>> I've mentioned an old friend's metaphor more than once, but
>> I think it applies rather strongly here: You're cleaning the bottle
>> caps and cigarette butts off the beach, to make the sand nice and
>> neat around the whale carcases.
>
> That little beauty doesn't apply that well; if you keep using it, you will soon
> wear a hole right through!

I've also used one about waxing your car to improve fuel economy:
Does it help by making the car slipperier, or hurt by increasing the
weight, and how do these effects compare to those of the underinflated
tires, the empty boat trailer dragging along behind, and the driver's
lamentable fondness for jackrabbit starts?

> Here, we may have a case of borrowing a motorcycle for the last 50 meters
> of a marathon.

Nice! I'll add it to the repertoire, if you don't mind/

--
Eric Sosman
eso...@ieee-dot-org.invalid

Gene

unread,

Jan 4, 2012, 8:57:32 PM1/4/12

to

On Jan 4, 5:20 pm, Edward Rutherford

Rendering an image will involve billions of instructions. Branching
to the start of the rendering algorithm is one or two instructions.
The choice doesn't matter.

Now if the branch is in the inner loop being executed once per pixel,
it _might_ make a small, very processor-dependent difference. Modern
processors often have some kind of branch prediction or split
execution path to optimize pipeline behavior for conditional branch
instructions. These mechanisms don't normally work for function
pointers. So for a 2- or 3-way branch, you'd probably want to first
try the natural if...then...else. As always, profile and tweak only
if you're sure the simplest coding isn't fast enough.

Gordon Burditt

unread,

Jan 5, 2012, 12:54:40 AM1/5/12

to

> As a general rule-of-thumb: Are function pointers more efficient than if
> statements, if the same path will be taken throughout the course of the
> program?

Calling via a function pointer involves argument passing, setting
up a new stack frame (yes, we know C doesn't require a hardware
stack), and a call instruction (which on some platforms gets
complicated).

An if statement involves a conditional branch.

I'd say the if statement usually wins, *ESPECIALLY* if you have to
pass a dozen or two local variables (or pointers to them) to get
them in scope of the function you called.

> E.g. If I was going to repeatedly render an image in one of three ways,
> and the method of rendering it gets selected at the beginning of
> execution: Would it be more efficient to use a function pointer instead
> of an if statement?

If you are going to render an image, who *CARES* how efficient the
calling sequence is? Worry about the image renderer, which probably
takes a billion clock cycles rather than a few dozen.

If you are trying to travel overseas as fast as possible, do you
worry about polishing your car key to reduce wind resistance as you
insert the key into the lock? Or do you worry about selecting the
best airline schedule, and going through security in a skimpy bikini
to reduce search time?

Kaz Kylheku

unread,

Jan 5, 2012, 1:02:42 AM1/5/12

to

On 2012-01-05, Eric Sosman <eso...@ieee-dot-org.invalid> wrote:
> On 1/4/2012 8:25 PM, Kaz Kylheku wrote:
>> On 2012-01-05, Eric Sosman<eso...@ieee-dot-org.invalid> wrote:
>>> [...]
>>> I've mentioned an old friend's metaphor more than once, but
>>> I think it applies rather strongly here: You're cleaning the bottle
>>> caps and cigarette butts off the beach, to make the sand nice and
>>> neat around the whale carcases.
>>
>> That little beauty doesn't apply that well; if you keep using it, you will soon
>> wear a hole right through!
>
> I've also used one about waxing your car to improve fuel economy:
> Does it help by making the car slipperier, or hurt by increasing the
> weight, and how do these effects compare to those of the underinflated
> tires, the empty boat trailer dragging along behind, and the driver's
> lamentable fondness for jackrabbit starts?

Washing and waxing the car smoothes out the engine idle, lowers the exhaust
rattle by 15 dB and causes the clutch to engage like it's leather-padded.

88888 Dihedral

unread,

Jan 5, 2012, 2:25:04 AM1/5/12

to

Edward Rutherford於 2012年1月5日星期四UTC+8上午6時20分18秒寫道：

I'll give an example to illustrate my understanding of the questions in your
post in pseudo code.

//Filter all gray levels equual or above 170 to be 255 and those bellows to 0.

For each pixel in the image:
if gray>=170 store 255 ;
else store 0;

For each pixel in the image:
store map[gray];

Which will work better?

88888 Dihedral

unread,

Jan 5, 2012, 2:38:53 AM1/5/12

to

Edward Rutherford於 2012年1月5日星期四UTC+8上午6時20分18秒寫道：

I'll give an example in pseudo code here for problems in your post.

How to turn a gray level image into a binary one?

The rule: given a Th=170, if the gray level>=Th just make it 255 else 0

//Method1:
for each pixel:
if gray>=Th store 255; else store 0;

//Method2:
for(i=0;i<Th;i++) map[i]=0;
for(; i<256;i++) map[i]=255;
for each pixel in the image:
store map[gay];

Both methods will work.

Joe Pfeiffer

unread,

Jan 5, 2012, 3:47:06 PM1/5/12

to

I can't imagine it would make any noticeable difference. The work in
rendering the image will overwhelm any difference in choosing which
function to call so massively that the choice won't even show up in the
noise.

Choose the one that better fits the overall structure of the program and
your own coding style.

Nobody

unread,

Jan 5, 2012, 4:29:55 PM1/5/12

to

On Wed, 04 Jan 2012 22:20:18 +0000, Edward Rutherford wrote:

> As a general rule-of-thumb: Are function pointers more efficient than if
> statements, if the same path will be taken throughout the course of the
> program?

I suspect that the conditionals will be more efficient in most cases.

A function pointer tends to act as a barrier to optimisation, as the
compiler will normally have to assume that:

a) the pointer can point to any function of the correct type, and
b) the function itself can be called from anywhere.

Even if everything (the functions and the pointer) is limited to a single
translation unit (i.e. none of them have external linkage), the compiler
is unlikely to take advantage of this (at least, I've never seen a
compiler do this).

OTOH, when you have code in a particular context, the compiler can
optimise the code for that context.

Dr Nick

unread,

Jan 6, 2012, 1:44:49 AM1/6/12

to

Nobody <nob...@nowhere.com> writes:

> On Wed, 04 Jan 2012 22:20:18 +0000, Edward Rutherford wrote:
>
>> As a general rule-of-thumb: Are function pointers more efficient than if
>> statements, if the same path will be taken throughout the course of the
>> program?
>
> I suspect that the conditionals will be more efficient in most cases.

I suspect we don't know enough. I'm sure this is the case if it's a
choice between doing a simple "if x" do this within a 20 iterations
loop, or do that on one hand and doing the same thing once before the
loop to set a function pointer, the if will win.

If it's a nest of 50 if statements to pick one of 17 different
operations, and some of the ifs involve calculating the same thing each
time, and there are billions of iterations of the loop, the function
pointer might win.

Then, of course, you pull the nest of ifs out of the loop, case it
inside the loop and let the compiler build a table if it thinks it's the
best thing.

The if is almost certainly easier to read, and that's a form of
efficiency.
--
Online waterways route planner | http://canalplan.eu
Plan trips, see photos, check facilities | http://canalplan.org.uk

BGB

unread,

Jan 6, 2012, 12:41:56 PM1/6/12

to

On 1/4/2012 4:20 PM, Edward Rutherford wrote:
> Hello
>
> As a general rule-of-thumb: Are function pointers more efficient than if
> statements, if the same path will be taken throughout the course of the
> program?
>

if placed head-to-head, the 'if' will likely be significantly faster.
this is because the 'if' is a simple conditional jump, and the CPU can
typically branch-predict it.

an indirect function call involves call/return magic
(creating/destroying stack frames, passing any arguments, ...), and also
can't generally be branch-predicted (so, one gets a pipeline spill every
time the call is made).

the function pointer can win though if it can save going through a bunch
of winding logic (say, the "choosing what to do" part is fairly
expensive, and the function pointer can get more directly to the goal).

an example of the above:
an x86 interpreter, where decoding instructions is fairly expensive;
if one caches decoded instructions with a function pointer pointing to
the logic for the operation, it can be faster than a small mountain of
'if' and 'switch' statements.

another case I had found was when dealing with dynamically-typed method
dispatch, where:
the arguments could be passed to the called function in one of several
ways (as ordinary function arguments, as a packed array, as an array of
dynamically-typed references, as a list, ...);
multiple layers of abstraction were involved (the logic wound through
about 4 different libraries);
there were multiple types of ways the method could be dispatched (native
vs passed off to an interpreter vs ...);
also, types were based on strings, so this would involve looking up the
type for an object a number of times, and a bunch of calls to "strcmp()";
...

caching a function pointer to the specific dispatch logic in the vtable
(only for certain common cases) was able to cut a 500ns operation down
to more about 15ns (with a 2.8GHz AMD CPU).

typically it only really matters if this sort of thing is off in an
inner-loop somewhere.

a big expensive operation, if done rarely, will often cost hardly anything.

> E.g. If I was going to repeatedly render an image in one of three ways,
> and the method of rendering it gets selected at the beginning of
> execution: Would it be more efficient to use a function pointer instead
> of an if statement?
>

as others have noted, in such a scenario is shouldn't likely matter.

if it is a long inner loop, and something like:
for(...)
{
if(X) { A... }
else if(Y) { B... }
else { C... }
}

a slight speedup may be gained from something like:

if(X)
{
for(...) { A... }
}
else if(Y)
{
for(...) { B... }
}
else
{
for(...) { C... }
}

dropping a function pointer call in the middle of a loop could very well
cost more, and it is very well possible that the later form could be
slower (say, if the checks are very simple and the CPU's
branch-predictor does its thing).

Kaz Kylheku

unread,

Jan 6, 2012, 1:33:55 PM1/6/12

to

On 2012-01-06, BGB <cr8...@hotmail.com> wrote:
> an indirect function call involves call/return magic
> (creating/destroying stack frames, passing any arguments, ...), and also
> can't generally be branch-predicted

You can safely predict that an indirect function call is always taken,
since it is unconditional.

BartC

unread,

Jan 6, 2012, 4:29:43 PM1/6/12

to

"BGB" <cr8...@hotmail.com> wrote in message
news:je7bph$g44$1...@news.albasani.net...

> On 1/4/2012 4:20 PM, Edward Rutherford wrote:

>> E.g. If I was going to repeatedly render an image in one of three ways,
>> and the method of rendering it gets selected at the beginning of
>> execution: Would it be more efficient to use a function pointer instead
>> of an if statement?
>
> as others have noted, in such a scenario is shouldn't likely matter.

The OP hasn't clarified exactly what he means. Perhaps selecting the render
mode simply means it is known at the start of the process.

That mode might still need checking on a per-line, per-pixel, or per-byte
basis, or maybe it's per primitive.

In fact I think it is likely the OP knows perfectly well that a single
if-statement or indirect call per render (even if it's per frame, and there
are many frames per second), has no significance, but it is not practical to
have several independent lots of render subsystems, which only differ in one
small rendering detail. But I might be wrong...

--
Bartc

Willem

unread,

Jan 6, 2012, 4:41:42 PM1/6/12

to

Kaz Kylheku wrote:
) On 2012-01-06, BGB <cr8...@hotmail.com> wrote:
)> an indirect function call involves call/return magic
)> (creating/destroying stack frames, passing any arguments, ...), and also
)> can't generally be branch-predicted
)
) You can safely predict that an indirect function call is always taken,
) since it is unconditional.

But you can't predict *where* it will jump, which is the point.

The purpose of branch prediction is to be able to pipeline the instructions
after the branch.

SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

Stephen Sprunk

unread,

Jan 6, 2012, 5:08:07 PM1/6/12

to

On 06-Jan-12 15:41, Willem wrote:
> Kaz Kylheku wrote:
> ) On 2012-01-06, BGB <cr8...@hotmail.com> wrote:
> )> an indirect function call involves call/return magic
> )> (creating/destroying stack frames, passing any arguments, ...), and also
> )> can't generally be branch-predicted
> )
> ) You can safely predict that an indirect function call is always taken,
> ) since it is unconditional.
>
> But you can't predict *where* it will jump, which is the point.
>
> The purpose of branch prediction is to be able to pipeline the instructions
> after the branch.

That's the point of the BTB in modern CPUs: to predict the target of
indirect branches. How successful they are, of course, depends on the
program; for many common scenarios (eg. virtual method calls), they work
very well, though there are some where they're known to be completely
ineffective.

S

--
Stephen Sprunk "God does not play dice." --Albert Einstein
CCIE #3723 "God is an inveterate gambler, and He throws the
K5SSS dice at every possible opportunity." --Stephen Hawking

Kaz Kylheku

unread,

Jan 6, 2012, 5:35:56 PM1/6/12

to

On 2012-01-06, Willem <wil...@toad.stack.nl> wrote:
> Kaz Kylheku wrote:
> ) On 2012-01-06, BGB <cr8...@hotmail.com> wrote:
> )> an indirect function call involves call/return magic
> )> (creating/destroying stack frames, passing any arguments, ...), and also
> )> can't generally be branch-predicted
> )
> ) You can safely predict that an indirect function call is always taken,
> ) since it is unconditional.
>
> But you can't predict *where* it will jump, which is the point.

really?

load r1, <whatever>
add r2, r3
jmp [r1]

We already know where the jmp [r1] will go as soon as r1 is loaded, and can start
prefetching instructions as soon as the value of r1 is stable.

> The purpose of branch prediction is to be able to pipeline the instructions
> after the branch.

That is called pipeline prefetch. Branch prediction is an educated guess
whether or not a conditional jump will be taken, or fall through.

BGB

unread,

Jan 6, 2012, 6:00:42 PM1/6/12

to

On 1/6/2012 4:35 PM, Kaz Kylheku wrote:
> On 2012-01-06, Willem<wil...@toad.stack.nl> wrote:
>> Kaz Kylheku wrote:
>> ) On 2012-01-06, BGB<cr8...@hotmail.com> wrote:
>> )> an indirect function call involves call/return magic
>> )> (creating/destroying stack frames, passing any arguments, ...), and also
>> )> can't generally be branch-predicted
>> )
>> ) You can safely predict that an indirect function call is always taken,
>> ) since it is unconditional.
>>
>> But you can't predict *where* it will jump, which is the point.
>
> really?
>
> load r1,<whatever>
> add r2, r3
> jmp [r1]
>
> We already know where the jmp [r1] will go as soon as r1 is loaded, and can start
> prefetching instructions as soon as the value of r1 is stable.
>

in a use, such as, say:
...
mov eax, [ebp-...]
mov ecx, [eax]
call dword [ecx+0x44]

the CPU may not know the address until the memory loads get done (in the
call instruction), so the pipeline may not get filled.

a lot though depends on how likely it is that one will endlessly jump to
the same address as before, which the CPU can optimize for somewhat.

>> The purpose of branch prediction is to be able to pipeline the instructions
>> after the branch.
>
> That is called pipeline prefetch. Branch prediction is an educated guess
> whether or not a conditional jump will be taken, or fall through.

never-mind any exactness of terminology or not...

I was meaning in a more general sense, where the branch target is
predicted and pipelined (generally) rather than the specific case of
predicting a branch-taken vs not-taken.

Stephen Sprunk

unread,

Jan 6, 2012, 6:08:50 PM1/6/12

to

On 06-Jan-12 16:35, Kaz Kylheku wrote:
> On 2012-01-06, Willem <wil...@toad.stack.nl> wrote:
>> Kaz Kylheku wrote:
>> ) On 2012-01-06, BGB <cr8...@hotmail.com> wrote:
>> )> an indirect function call involves call/return magic
>> )> (creating/destroying stack frames, passing any arguments, ...), and also
>> )> can't generally be branch-predicted
>> )
>> ) You can safely predict that an indirect function call is always taken,
>> ) since it is unconditional.
>>
>> But you can't predict *where* it will jump, which is the point.
>
> really?
>
> load r1, <whatever>
> add r2, r3
> jmp [r1]
>
> We already know where the jmp [r1] will go as soon as r1 is loaded, and can start
> prefetching instructions as soon as the value of r1 is stable.

True, but we can't know to start fetching instructions at [r1] until the
jump has at least been decoded. And are decoders smart enough to try to
resolve the jump immediately and send it to the prefetcher rather than
send it to an execution unit? And does that matter, given it may be
tens to hundreds of cycles until the load completes and the target is
known--and then there may be tens to hundreds more cycles until the
first instruction at the target is fetched and decoded?

BGB

unread,

Jan 6, 2012, 6:15:13 PM1/6/12

to

On 1/6/2012 3:29 PM, BartC wrote:
> "BGB" <cr8...@hotmail.com> wrote in message
> news:je7bph$g44$1...@news.albasani.net...
>> On 1/4/2012 4:20 PM, Edward Rutherford wrote:
>
>>> E.g. If I was going to repeatedly render an image in one of three ways,
>>> and the method of rendering it gets selected at the beginning of
>>> execution: Would it be more efficient to use a function pointer instead
>>> of an if statement?
>>
>> as others have noted, in such a scenario is shouldn't likely matter.
>
> The OP hasn't clarified exactly what he means. Perhaps selecting the
> render mode simply means it is known at the start of the process.
>

this is why most of the rest of the post tried to address general tradeoffs.

> That mode might still need checking on a per-line, per-pixel, or
> per-byte basis, or maybe it's per primitive.
>

or per-scene...

> In fact I think it is likely the OP knows perfectly well that a single
> if-statement or indirect call per render (even if it's per frame, and
> there are many frames per second), has no significance, but it is not
> practical to have several independent lots of render subsystems, which
> only differ in one small rendering detail. But I might be wrong...
>

dunno...

I think for whatever reason many people think "the video card is very
fast, yet the CPU very slow", and could very well actually think that an
if statement here or there can actually adversely effect performance.

from personal experience though, the GPU tends to get bogged down by
things like fill-rate and similar well before one has to start worrying
about the CPU getting overloaded by the occasional if-statement.

much complexity in a 3D engine still ends up being needed mostly to
figure out to (reasonably efficiently) cull the scene prior to passing
the geometry to the video card, as fill-rate can still be a bit of a killer.

(yes, yes, WRT numerical processing, the video card is much faster than
the CPU, but it spends a good deal more time drawing individual pixels
and running shader programs for each pixel and stuff, so it mostly
averages out, and games/... still tend to be much more video-card bound
than CPU bound).

so, maybe the OP was thinking that the time to execute an if statement
was larger than the time for the video card to draw a bunch of geometry
or similar?...

or such...

Kaz Kylheku

unread,

Jan 6, 2012, 6:44:40 PM1/6/12

to

You mean that the if there is inadequate prefetch in one instruction,
it may interfere with prefetch in a later instruction?

Earth-shattering!

Kaz Kylheku

unread,

Jan 6, 2012, 6:49:12 PM1/6/12

to

On 2012-01-06, Stephen Sprunk <ste...@sprunk.org> wrote:
> On 06-Jan-12 16:35, Kaz Kylheku wrote:
>> On 2012-01-06, Willem <wil...@toad.stack.nl> wrote:
>>> Kaz Kylheku wrote:
>>> ) On 2012-01-06, BGB <cr8...@hotmail.com> wrote:
>>> )> an indirect function call involves call/return magic
>>> )> (creating/destroying stack frames, passing any arguments, ...), and also
>>> )> can't generally be branch-predicted
>>> )
>>> ) You can safely predict that an indirect function call is always taken,
>>> ) since it is unconditional.
>>>
>>> But you can't predict *where* it will jump, which is the point.
>>
>> really?
>>
>> load r1, <whatever>
>> add r2, r3
>> jmp [r1]
>>
>> We already know where the jmp [r1] will go as soon as r1 is loaded, and can start
>> prefetching instructions as soon as the value of r1 is stable.
>
> True, but we can't know to start fetching instructions at [r1] until the
> jump has at least been decoded.

It is statically fixed in the code that it this instruction fetches through r1.
The identity of r1 won't change.

All branch-prediction/prefetch related optimizations in the CPU have to know
/something/ about an instruction ahead of time, before that instruction is
fetched and decoded. Special state info is maintained for that purpose,
right?

jacob navia

unread,

Jan 6, 2012, 6:55:55 PM1/6/12

to

Le 07/01/12 00:49, Kaz Kylheku a écrit :

>
> It is statically fixed in the code that it this instruction fetches through r1.
> The identity of r1 won't change.
>

Only in statically fixed calls, i.e. when you write
call someFunction
or, in C
someFunction();

But we are speaking about dynamically loaded target address that is
*NOT* statically fixed as when you write in C

FnPtr fn

fn = GetFnAddress();
fn(34);

or

FnPtr fn[10];

(*fn[idx])(34);

Kaz Kylheku

unread,

Jan 6, 2012, 7:07:25 PM1/6/12

to

On 2012-01-06, jacob navia <ja...@spamsink.net> wrote:
> Le 07/01/12 00:49, Kaz Kylheku a écrit :
>>
>> It is statically fixed in the code that it this instruction fetches through r1.
>> The identity of r1 won't change.
>>
>
> Only in statically fixed calls, i.e. when you write
> call someFunction
> or, in C
> someFunction();
>
> But we are speaking about dynamically loaded target address that is
> *NOT* statically fixed as when you write in C

No, I mean hat the identity of the register r1 will not change;
of course r1 has different contents in different invocations
of the code. But we can know that "this instruction branches
to the address of r1" without having to decode the
instruction every time. That decoded information can be cached in some
prefetch-related data structure in the processor.

Michael Angelo Ravera

unread,

Jan 7, 2012, 2:00:11 AM1/7/12

to

On Wednesday, January 4, 2012 2:20:18 PM UTC-8, Edward Rutherford wrote:
> Hello
>
> As a general rule-of-thumb: Are function pointers more efficient than if
> statements, if the same path will be taken throughout the course of the
> program?
>

> E.g. If I was going to repeatedly render an image in one of three ways,
> and the method of rendering it gets selected at the beginning of
> execution: Would it be more efficient to use a function pointer instead
> of an if statement?

The main improvement may be to readability. It may be cleaner to work out which function to call once and just call it by pointer. The cost of an instruction to make an indirect call to a function or to make a conditional branch will be approximately equal. Either could be more effecient.

christian.bau

unread,

Jan 7, 2012, 9:14:49 AM1/7/12

to

On Jan 6, 9:41 pm, Willem <wil...@toad.stack.nl> wrote:

> But you can't predict *where* it will jump, which is the point.

Of course you can. The simplest prediction is "the indirect jump
instruction will jump to the same place as it did the last time". And
just as with conditional branch prediction, the processor can start
executing instructions, and if the prediction turns out to be wrong,
the processor throws the results away and starts from the correct
place.

BartC

unread,

Jan 7, 2012, 9:44:49 AM1/7/12

to

"christian.bau" <christ...@cbau.wanadoo.co.uk> wrote in message
news:9f1b15e9-cc29-4524...@m4g2000vbc.googlegroups.com...

> On Jan 6, 9:41 pm, Willem <wil...@toad.stack.nl> wrote:
>
>> But you can't predict *where* it will jump, which is the point.
>
> Of course you can. The simplest prediction is "the indirect jump
> instruction will jump to the same place as it did the last time". And
> just as with conditional branch prediction,

A conditional branch has a choice of two destinations.

The indirect call (it's not even a jump) has a choice of millions.

Even it is actually the same as last time, with the instructions after the
call possibly depending on new values for stack and frame pointers, what can
be done might be limited compared with a branch prediction.

> the processor can start
> executing instructions, and if the prediction turns out to be wrong,
> the processor throws the results away and starts from the correct
> place.

Hopefully it won't go too far in executing those instructions (such as
overwriting some essential global..).

--
Bartc

Robert Wessel

unread,

Jan 7, 2012, 3:37:56 PM1/7/12

to

The point is that branch target prediction *is* implemented on many
processors, and is highly successful for many workload, particularly
the type where you have a loop around an indirect call, and the
computation of that target can be hoisted above the loop. Things
like needing to wait for parameters to be computed is taken care of by
the ordinary OoO mechanisms. Calls to (C++) virtual functions, which
are quite similar to the case being discussed, are a major reason for
branch target prediction.

Kaz Kylheku

unread,

Jan 7, 2012, 3:42:27 PM1/7/12

to

On 2012-01-07, Robert Wessel <robert...@yahoo.com> wrote:
> the ordinary OoO mechanisms. Calls to (C++) virtual functions, which
> are quite similar to the case being discussed, are a major reason for
> branch target prediction.

Also, shared library calls, which are another example of late binding.

Joe keane

unread,

Jan 7, 2012, 4:04:58 PM1/7/12

to

In article <je2jb2$886$1...@speranza.aioe.org>,

Edward Rutherford <edward.p.r...@REMOVETHIS.gmail.com> wrote:
>E.g. If I was going to repeatedly render an image in one of three ways,
>and the method of rendering it gets selected at the beginning of
>execution: Would it be more efficient to use a function pointer instead
>of an if statement?

Take a look at optimizing switches, it's rather similar.

e.g.

if (ind == 0)
do_ind_0();
else if (ind < 3)
{
if (ind != 1)
do_ind_2();
else
do_ind_1();
}
else
{
...
}

versus:

if (ind < TOOBIG)
_asm('jmp _hdntbl[$ind]');
else
goto does_not_compute;

Stephen Sprunk

unread,

Jan 7, 2012, 5:47:41 PM1/7/12

to

I thought that case was handled with (load-time or lazy) relocation
and/or position-independent code.

Stephen Sprunk

unread,

Jan 7, 2012, 6:02:11 PM1/7/12

to

On 07-Jan-12 08:44, BartC wrote:
> "christian.bau" <christ...@cbau.wanadoo.co.uk> wrote in message
> news:9f1b15e9-cc29-4524...@m4g2000vbc.googlegroups.com...
>> On Jan 6, 9:41 pm, Willem <wil...@toad.stack.nl> wrote:
>>> But you can't predict *where* it will jump, which is the point.
>>
>> Of course you can. The simplest prediction is "the indirect jump
>> instruction will jump to the same place as it did the last time". And
>> just as with conditional branch prediction,
>
> A conditional branch has a choice of two destinations.
>
> The indirect call (it's not even a jump) has a choice of millions.

... and that's why modern processors have a Branch Target Buffer, which
tracks the last few targets of each indirect branch. When the decoder
finds an indirect branch, it looks up that branch in the BTB, predicts
the target and continues on from there.

Note that failing to predict the target would stall the CPU anyway until
the branch was resolved; an incorrect prediction resulting in a pipeline
flush is no worse, but there's always the possibility the prediction
will be right and that stall/flush can be averted.

> Even it is actually the same as last time, with the instructions after
> the call possibly depending on new values for stack and frame pointers,
> what can be done might be limited compared with a branch prediction.

... just as if the branch didn't exist. That's standard OoO stuff,
nothing special for indirect branches.

>> the processor can start executing instructions, and if the prediction
>> turns out to be wrong, the processor throws the results away and
>> starts from the correct place.
>
> Hopefully it won't go too far in executing those instructions (such as
> overwriting some essential global..).

Of course not; nothing is committed until an instruction reaches the end
of the pipeline. If a branch misprediction is detected (or various
other OoO hazards), the pipeline is flushed--and the processor stalls
while the pipeline is refilled.

Robert Wessel

unread,

Jan 8, 2012, 2:49:17 AM1/8/12

to

On Sat, 07 Jan 2012 16:47:41 -0600, Stephen Sprunk
<ste...@sprunk.org> wrote:

>On 07-Jan-12 14:42, Kaz Kylheku wrote:
>> On 2012-01-07, Robert Wessel <robert...@yahoo.com> wrote:
>>> the ordinary OoO mechanisms. Calls to (C++) virtual functions, which
>>> are quite similar to the case being discussed, are a major reason for
>>> branch target prediction.
>>
>> Also, shared library calls, which are another example of late binding.
>
>I thought that case was handled with (load-time or lazy) relocation
>and/or position-independent code.

Usually the calls to a shared library compile into "call [dword ptr
address]" (x86 - IOW an indirect call via a pointer in memory) or
"load R1,LocalWordWithAddress / call r2,r1" (RISC) or similar. The
word containing the address of the shared library entry point is
patched by the loader, but it's still usually an indirect call.

As a counterexample, at least one x86 environment generates a long
JMP, which is the target of the shared library CALLs in the
application, and that JMP is what's patched by the loader to jump to
the actual entry point.

It would be possible, at least on some architectures, to patch the
original call directly, but for whatever reason, that's not
particularly popular.

Kaz Kylheku

unread,

Jan 8, 2012, 3:35:05 AM1/8/12

to

On 2012-01-08, Robert Wessel <robert...@yahoo.com> wrote:
> It would be possible, at least on some architectures, to patch the
> original call directly, but for whatever reason, that's not
> particularly popular.

That reason would be that if you treat the code as immutable, you can
map the same pages into multiple address spaces (a big part of the "shared" in
"shared object").