
Pointing to a variable slower than a variable?


JiiPee
Sep 12, 2015, 12:05:49 PM
I was coding and started to think about this:

1)
struct A
{
    int a;
    void change() { a = 5; }
};

2)
struct X
{
    int a;
};

struct A
{
    X x;
    void change() { x.a = 5; }
};

Are the change() calls in 1) and 2) equally fast (100%)? In 2) the a variable
is accessed through the object x; does that slow things down (a bit)?

JiiPee
Sep 12, 2015, 12:16:48 PM
struct X
{
    int a;
};

struct A
{
    X x;
    void change() { x.a = 5; }
};

OK, the reason I am asking this is that I would like class A not to be able
to modify the a variable directly (it would be private in X). So I would
rather put a into another class as a private member and create a function
to modify it.

Is there another way to do the same thing, i.e. to hide a from class A?

Anyway, because speed is important here (doing millions of calculations
per second), I want to think about the speed as well.

JiiPee
Sep 12, 2015, 12:29:40 PM
2)
struct X
{
    int a;
};

struct A
{
    X x;
    void use_a_alot();
};

Also, if x.a is slower than having the a member directly in A, is it faster
to copy x.a into a temporary variable when it is used a lot, like this:

void A::use_a_alot()
{
    int temp = x.a;
    // ... and then use temp like 10 times in the code
    // ... using temp ...
}

or is using x.a just as fast?


Öö Tiib
Sep 12, 2015, 12:39:40 PM
This is a question about quality of implementation; the C++ standard has no
requirements about it. In my experience with real C++ compilers there can be
a difference when the code is compiled for debugging, but there is no
difference when the code is compiled and optimized for release, so a struct
that contains one int is processed as fast as the int itself.

Note that the debug/release distinction is rather important: always profile
the release version, since the debug version can be orders of magnitude slower.

Öö Tiib
Sep 12, 2015, 12:57:59 PM
In practice an array of a single element is also as quick as the plain
element, so the following code is, in practice, as fast as using a bare
'int a' member of 'A':

3)

struct X
{
    int a[1];
};

struct A
{
    X x[1];
    void use_a0_of_x0_alot();
};

However, none of this is guaranteed by the standard.

Richard Damon
Sep 12, 2015, 3:02:25 PM
If you look at the assembly code that would typically be generated for the
two expressions (a being a member of A, or a being a member of X which is in
turn a member of A), they would be the same, as a is just a chunk of memory
at a fixed offset from the base address of the object.

Now, if you change a to be private in X, then A can't write x.a = 5, so
you will need a function in X to do the operation.

Maybe something like:

struct X
{
    void change() { a = 5; }
private:
    int a;
};

struct A
{
    X x;
    void change() { x.change(); }
};


Now, by the description of the abstract machine, we have some additional
operations; for example, A::change() needs to compute the address of x in
order to pass it to X::change(). Since everything is inline, it is likely
that this will still get converted (at least when optimization is enabled)
to the same code as before.
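
To illustrate the fixed-offset point, here is a small hypothetical check (the
A1/A2 names are made up, using the public variants from the first message):

#include <cstddef>

struct X  { int a; };
struct A1 { int a; };   // variant 1: a directly in A
struct A2 { X x; };     // variant 2: a wrapped inside X

// In both layouts the int lives at offset 0 from the start of the object,
// so change() ends up storing to the same address either way.
static_assert(offsetof(A1, a) == 0, "direct member sits at offset 0");
static_assert(offsetof(A2, x) + offsetof(X, a) == 0, "nested member sits at offset 0");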

JiiPee
Sep 12, 2015, 3:27:33 PM
On 12/09/2015 20:01, Richard Damon wrote:
>
>
> Maybe something like:
>
> struct X
> {
> void change() { a = 5; }
> private:
> int a;
> }
>
> struct A
> {
> X x;
> void change() { x.change(); }
> }

yes

>
>
> Now, but the description of the abstract machine, we have some
> additional operations, for example, A:change() needs to compute the
> address of x for instance, to pass to X:change(). Since everything is
> inline, it is likely that this will still get converted (at least when
> optimization is enabled) to the same code as before.

Yes, this is the situation. So you would expect them to have the same speed
in a release build. OK, I guess I can go with the private version; it is a
much better structure as well. But I guess I can measure the time to be sure
about it.

mark
Sep 13, 2015, 8:37:45 AM
Using the temporary variable can sometimes improve performance. The
compiler doesn't always know if there are other pointers to x.a and this
can sometimes prevent optimizations that you would expect to occur.

If you use the temporary variable, the compiler has a much better chance
of being certain that there are no aliases that can potentially modify x.a.

That being said, the performance difference is usually very, very small.
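
Roughly the idea, as a hypothetical sketch (the fill() function and its
parameters are made up, not code from the thread):

struct X { int a; };

struct A
{
    X x;

    // Writes through 'out' could in principle alias x.a, so without the
    // local copy the compiler may have to reload x.a on every iteration.
    void fill(int* out, int n)
    {
        const int temp = x.a;        // snapshot once; no reloads needed below
        for (int i = 0; i < n; ++i)
            out[i] = temp * i;
    }
};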

JiiPee
Sep 13, 2015, 8:56:19 AM
On 13/09/2015 13:37, mark wrote:
> On 2015-09-12 18:29, JiiPee wrote:
>> 2)
>> struct X
>> {
>> int a;
>> };
>>
>> struct A
>> {
>> X x;
>> void use_a_alot()
>> };
>>
>> Also, if x.a is slower than having a-member in A, is it faster to copy
>> the x.a to a temporary variable if using it a lot... like this:
>> void A::use_a_alot()
>> {
>> int temp = x.a;
>> //... and then use temp like 10 times in the code
>> // ,....using temp...
>> }
>>
>> or is as fast to use x.a?
>
> Using the temporary variable can sometimes improve performance. The
> compiler doesn't always know if there are other pointers to x.a and
> this can sometimes prevent optimizations that you would expect to occur.

ok, this is what I was guessing as well

>
> If you use the temporary variable, the compiler has a much better
> chance of being certain that there are no aliases that can potentially
> modify x.a.
>
> That being said, the performance difference is usually very, very small.

OK, but if you run a loop with hundreds of millions of calculations/calls on
that variable, then even a small difference can count? Of course, if you run
it only a thousand times or so it does not matter.

Öö Tiib
Sep 13, 2015, 1:47:03 PM
Compilers are written by humans who may have overlooked some opportunity to
optimize, and that case may happen to be exactly in your code. Concentrate
on getting your program to do something useful correctly. If it later turns
out to be slow, then profile and optimize. Software is "soft" in the sense
that it is easy to change later.

Paavo Helde
Sep 13, 2015, 2:28:46 PM
JiiPee <n...@notvalid.com> wrote in news:HneJx.43217$xw.4...@fx22.am4:

>
> ok but if you run a loop with hundreds of millions calculations /calls
> to that variable then even small difference can count? of course if you
> run it only like a thousand times it does not matter
>

If the speed is so important, then there is no other way than to profile
your code and analyze the results. The optimizing compilers are pretty good
nowadays and will easily optimize away simple abstraction layers as in your
example. The real bottlenecks are often in places you don't expect, so
there is not much point to pre-guess what kind of code is more optimizable
by the compiler. Try to write the code in a clear and maintainable way,
then if you are not satisfied with the performance, profile the code and
try to fix the actual bottlenecks, not the imagined ones.
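
If it comes to that, one could also just time the two variants directly. A
rough sketch (not from the thread; change() is tweaked to take a value so the
work cannot be hoisted out of the loop, and a serious measurement would also
inspect the generated code):

#include <chrono>
#include <cstdio>

struct X  { int a; };
struct A1 { int a; void change(int v) { a = v; }   int get() const { return a; } };
struct A2 { X x;   void change(int v) { x.a = v; } int get() const { return x.a; } };

// Very rough harness: times one variant over many calls.
template <class T>
long long run()
{
    T obj{};
    long long sum = 0;
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < 100000000; ++i) {
        obj.change(i);      // the operation under test
        sum += obj.get();   // use the result so the loop is not optimized away
    }
    auto stop = std::chrono::steady_clock::now();
    std::printf("(checksum %lld) ", sum);
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}

int main()
{
    std::printf("direct member: %lld ms\n", run<A1>());
    std::printf("nested member: %lld ms\n", run<A2>());
}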

HTH
Paavo

red floyd
Sep 13, 2015, 2:38:12 PM
What both Öö Tiib and Paavo have said is really all you need to know.

Knuth's Law applies as well. "Premature optimization is the root of all
evil."

You may spend hours or days on micro-optimizations while the real
bottleneck is elsewhere in your code.

Code that is fast but incorrect is useless. Here is an example of fast
but incorrect code that is supposed to calculate all primes less than
1 million:

int main()
{
}

It runs very fast, but it's incorrect and therefore useless. Get your
code WORKING first. Then AND ONLY THEN figure out where your bottlenecks
are (that's why profilers were invented), and work on optimizing that...
Not with micro-optimizations, but looking to see "Is there a better way
to do this?"


mark
Sep 13, 2015, 2:47:51 PM
It's very rare that the performance difference will be more than 10%.
This sort of problem also readily shows up if you look at the
disassembled code - which you should be doing for the performance
critical parts. The compiler very often unexpectedly has its hands tied
behind its back and can't do certain optimizations due to restrictions
you didn't intend to put into place. Unless you look at the generated
code, you won't really know.

In any case, there are usually much, much higher gains to be had from
better data layout, better algorithms or better containers (the standard
library is quite limited).

You seem to be going down the route of an object oriented design. IME,
that usually leads to a bad memory layout (suboptimal for performance)
unless you are extremely careful. Modern CPUs highly depend on locality
for good performance. If your data is scattered all over the memory,
performance will drop like a rock.

If you organize your data structures right, the compiler may be able to
auto-vectorize some stuff - that can be a huge performance gain.

Changing your data layout later on will be quite difficult, so that's
something you should consider from the very beginning.
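
To make the layout point concrete, here is a toy sketch (hypothetical
Particle types, not from the thread). Iterating over one field of an
array-of-structs drags the unused fields through the cache, while a
struct-of-arrays keeps that field contiguous and easier to auto-vectorize:

#include <vector>

// Array-of-structs: each particle's fields sit next to each other, so a loop
// over just 'x' loads 16 bytes per 4 useful bytes and strides through memory.
struct ParticleAoS { float x, y, z, mass; };

float sum_x_aos(const std::vector<ParticleAoS>& ps)
{
    float sum = 0.0f;
    for (const auto& p : ps) sum += p.x;
    return sum;
}

// Struct-of-arrays: all 'x' values are contiguous, so the same loop streams
// through whole cache lines and is a much better candidate for vectorization.
struct ParticlesSoA { std::vector<float> x, y, z, mass; };

float sum_x_soa(const ParticlesSoA& ps)
{
    float sum = 0.0f;
    for (float v : ps.x) sum += v;
    return sum;
}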

Scott Lurndal
Sep 14, 2015, 9:23:23 AM
JiiPee <n...@notvalid.com> writes:
>Was coding and started to think about this:
>
>1)
>struct A
>{
> int a;
> void change() { a = 5; }
>};
>

$ cat /tmp/a.c

struct A { int a; void change(void) { a = 5; } };

int main(int argc, const char **argv, const char **envp)
{
A a;

a.change();

return 0;
}

00000000004005d6 <A::change()>:
4005d6: 55 push %rbp
4005d7: 48 89 e5 mov %rsp,%rbp
4005da: 48 89 7d f8 mov %rdi,-0x8(%rbp)
4005de: 48 8b 45 f8 mov -0x8(%rbp),%rax
4005e2: c7 00 05 00 00 00 movl $0x5,(%rax)
4005e8: 5d pop %rbp
4005e9: c3 retq
4005ea: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)

>2)
>struct X
>{
> int a;
>};
>
>struct A
>{
> X x;
> void change() { x.a = 5; }
>};
>
$ cat /tmp/a.c

struct X { int a; };

struct A { X x; void change(void) { x.a = 5; } };

int main(int argc, const char **argv, const char **envp)
{
A a;

a.change();

return 0;
}

00000000004005d6 <A::change()>:
4005d6: 55 push %rbp
4005d7: 48 89 e5 mov %rsp,%rbp
4005da: 48 89 7d f8 mov %rdi,-0x8(%rbp)
4005de: 48 8b 45 f8 mov -0x8(%rbp),%rax
4005e2: c7 00 05 00 00 00 movl $0x5,(%rax)
4005e8: 5d pop %rbp
4005e9: c3 retq
4005ea: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)


Both varieties generate identical code.

JiiPee
Sep 14, 2015, 1:25:17 PM
On 14/09/2015 14:23, Scott Lurndal wrote:
> JiiPee <n...@notvalid.com> writes:
> Both varieties generate identical code.

Thanks very much. I hope I learn to do this myself one day :). This is what
I wanted: the actual facts, and there they are. So I guess we can continue
coding with good structure without fear of losing speed.
