[Comparative performance] Argument passing (built-in types)

Alex Vinokur

Sep 11, 2003, 6:29:09 AM

Here are the results of comparative performance tests carried out
using the same compiler (gcc 3.2)
in different environments (CYGWIN, MINGW, DJGPP)
on Windows 2000 Professional.

These tests were prompted by the discussion
in the thread titled "C++ Performance Overheads"
(news:comp.lang.c++.moderated, 1999)
* http://groups.google.com/groups?th=914f5cc720cd9fe8,
especially by Bjarne Stroustrup's and Siemel Naran's articles:
* http://groups.google.com/groups?selm=FArs05.Ftq%40research.att.com
* http://groups.google.com/groups?selm=FAwIzL.3Hp%40research.att.com
* http://groups.google.com/groups?selm=FB7qDn.2q0%40research.att.com
* http://groups.google.com/groups?selm=slrn7ibk9b.u6j.sbnaran%40fermi.ceg.uiuc.edu

========= Argument passing (built-in types) : BEGIN =========

------ Program fragments ------
static int s = 0;

void foo_no_args () { ++s; }

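void foo_bool (bool ) { ++s; }
void foo_bool_ref (bool& ) { ++s; }
void foo_bool_ptr (bool* ) { ++s; }

void foo_char (char ) { ++s; }
void foo_char_ref (char& ) { ++s; }
void foo_char_ptr (char* ) { ++s; }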

void foo_short (short ) { ++s; }
void foo_short_ref (short& ) { ++s; }
void foo_short_ptr (short* ) { ++s; }

void foo_int (int ) { ++s; }
void foo_int_ref (int& ) { ++s; }
void foo_int_ptr (int* ) { ++s; }

void foo_long (long ) { ++s; }
void foo_long_ref (long& ) { ++s; }
void foo_long_ptr (long* ) { ++s; }

void foo_long_long (long long ) { ++s; }
void foo_long_long_ref (long long& ) { ++s; }
void foo_long_long_ptr (long long* ) { ++s; }

bool bool_value = true;
char char_value = 'a';
short short_value = 'a';
int int_value = 'a';
long long_value = 'a';
long long long_long_value = 'a';

bool* ptr_bool = &bool_value;
char* ptr_char = &char_value;
short* ptr_short = &short_value;
int* ptr_int = &int_value;
long* ptr_long = &long_value;
long long* ptr_long_long = &long_long_value;

// k and NO_OF_REPETITIONS are defined by the test harness (not shown)
#define DO_IT for (k = 0; k < NO_OF_REPETITIONS; k++)

DO_IT /* Do nothing */;
DO_IT foo_no_args ();

DO_IT foo_bool (bool_value) ;
DO_IT foo_bool_ref (bool_value) ;
DO_IT foo_bool_ptr (&bool_value) ;
DO_IT foo_bool_ptr (ptr_bool) ;

DO_IT foo_char (char_value) ;
DO_IT foo_char_ref (char_value) ;
DO_IT foo_char_ptr (&char_value) ;
DO_IT foo_char_ptr (ptr_char) ;

DO_IT foo_short (short_value) ;
DO_IT foo_short_ref (short_value) ;
DO_IT foo_short_ptr (&short_value) ;
DO_IT foo_short_ptr (ptr_short) ;

DO_IT foo_int (int_value) ;
DO_IT foo_int_ref (int_value) ;
DO_IT foo_int_ptr (&int_value) ;
DO_IT foo_int_ptr (ptr_int) ;

DO_IT foo_long (long_value) ;
DO_IT foo_long_ref (long_value) ;
DO_IT foo_long_ptr (&long_value) ;
DO_IT foo_long_ptr (ptr_long) ;

DO_IT foo_long_long (long_long_value) ;
DO_IT foo_long_long_ref (long_long_value) ;
DO_IT foo_long_long_ptr (&long_long_value) ;
DO_IT foo_long_long_ptr (ptr_long_long) ;

------ How it was measured ------
Very schematically, measurements are carried out as follows (pseudocode):

time_type get_elapsed_time () // specific function foo_xxx()
{
time_type measured_time[NO_OF_TESTS];

for (i = 0; i < NO_OF_TESTS; i++)
{
start_time = ...
for (k = 0; k < NO_OF_REPETITIONS; k++)
{
/* calling specific function foo_xxx() */
}
end_time = ...
measured_time[i] = end_time - start_time;
}

sort (measured_time);

sum = 0;
for (i = THRESHOLD; i < (NO_OF_TESTS - THRESHOLD); i++)
{
sum += measured_time[i];
}

elapsed_time = sum/(NO_OF_TESTS - 2*THRESHOLD);

return elapsed_time;
}
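The schematic procedure above can be sketched as compilable C++. This is a minimal illustration, not the Perfometer's actual code: it assumes std::chrono::steady_clock (C++11, wall-clock time) instead of the rusage user time the Perfometer reports, and the constants are hypothetical stand-ins for the Perfometer's command-line parameters.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the Perfometer's command-line parameters.
const std::size_t NO_OF_TESTS       = 5;
const std::size_t THRESHOLD         = 1;         // samples dropped at each end
const long        NO_OF_REPETITIONS = 10000000;

static int s = 0;
void foo_int (int) { ++s; }

// Runs NO_OF_TESTS timed loops of NO_OF_REPETITIONS calls to f, sorts the
// per-loop times, and returns the mean of the samples left after discarding
// THRESHOLD values at each end (result in milliseconds).
template <typename F>
double get_elapsed_time (F f)
{
    std::vector<double> measured_time;
    for (std::size_t i = 0; i < NO_OF_TESTS; i++)
    {
        const auto start_time = std::chrono::steady_clock::now();
        for (long k = 0; k < NO_OF_REPETITIONS; k++)
        {
            f();  // calling the specific function foo_xxx()
        }
        const auto end_time = std::chrono::steady_clock::now();
        measured_time.push_back(
            std::chrono::duration<double, std::milli>(end_time - start_time).count());
    }

    std::sort(measured_time.begin(), measured_time.end());

    double sum = 0;
    for (std::size_t i = THRESHOLD; i < NO_OF_TESTS - THRESHOLD; i++)
    {
        sum += measured_time[i];
    }
    return sum / (NO_OF_TESTS - 2 * THRESHOLD);
}
```

A call such as get_elapsed_time([] { foo_int('a'); }) times the int-by-value case; wrapping the call in a lambda lets the same harness time every foo_xxx() variant.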

------ What was measured ------
Performance tests were performed for the following optimization levels :
* No optimization
* Optimization O1
* Optimization O2
* Optimization O3

The get_elapsed_time () procedure was executed five times for each optimization level.

------ Measurement results ------
Measurement results (for each optimization level) are as follows:
1. Five time-cost summary tables (one table per invocation of get_elapsed_time());
see: http://groups.google.com/groups?th=c8c189cba67e05a0
2. One _normalized_ time-cost summary table (computed from the five time-cost summary tables);
see below in this article.
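For reference, the normalization can be expressed as a one-line computation. This sketch assumes each raw time-cost is divided by the int-by-value time-cost from the same environment and optimization level, scaled to 100 and rounded to the nearest integer; the post does not say exactly how the five tables were combined, so the helper name and the rounding are assumptions.

```cpp
#include <cmath>

// Hypothetical helper: time-cost relative to int-by-value, where the
// int-by-value cost is defined as 100. Rounds to the nearest integer.
long normalized_cost (double raw_cost, double int_by_value_cost)
{
    return std::lround(100.0 * raw_cost / int_by_value_cost);
}
```

For instance, a raw cost of 120 against an int-by-value cost of 100 gives 120, and 50 against 150 gives 33.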

------ Measurement environment ------
* CYGWIN, g++ 3.2
* MINGW, g++ 3.2
* DJGPP, g++ 3.2

------ Measurement tool ------
C/C++ Program Perfometer
http://sourceforge.net/projects/cpp-perfometer
http://alexvn.freeservers.com/s1/perfometer.html

========= Argument passing (built-in types) : END ===========

========= Results for CYGWIN : BEGIN =========

#================================================
# Windows 2000 (1.70 GHz), CYGWIN_NT-5.0, g++ 3.2
#================================================

#===========================================================
# Comparative Performance : argument passing (built-in types)
# ------- Summary -------
#===========================================================

--------------------------------------------
Tested functions
--------------------------------------------
static int s = 0;
void foon () { ++s; } // no arguments
void foov (T t) { ++s; } // by value
void foor (T& t) { ++s; } // by reference
void foop (T* t) { ++s; } // by pointer
--------------------------------------------

No optimization : Normalized time-cost (cost of int-by-value = 100)
------------------------------------------------------------------------
|                     |  by value |   by ref  |         by ptr         |
| Type of an argument |-----------|-----------|------------------------|
|                     | foov(var) | foor(var) | foop(&var) | foop(ptr) |
|----------------------------------------------------------------------|
| 1 | do nothing      |     33                                         |
| 2 | no arguments    |     99                                         |
|- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - |
| 3 | bool            |    120    |    101    |    101     |    104    |
| 4 | char            |    116    |    110    |    100     |     98    |
| 5 | short           |    114    |     99    |    101     |    101    |
| 6 | int             |    100    |     99    |    100     |    101    |
| 7 | long            |    100    |    102    |     95     |     98    |
| 8 | long long       |    128    |    102    |     99     |    102    |
------------------------------------------------------------------------

Optimization O1 : Normalized time-cost (cost of int-by-value = 100)
------------------------------------------------------------------------
|                     |  by value |   by ref  |         by ptr         |
| Type of an argument |-----------|-----------|------------------------|
|                     | foov(var) | foor(var) | foop(&var) | foop(ptr) |
|----------------------------------------------------------------------|
| 1 | do nothing      |     13                                         |
| 2 | no arguments    |    100                                         |
|- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - |
| 3 | bool            |    101    |    106    |    100     |    103    |
| 4 | char            |    113    |    119    |    105     |    104    |
| 5 | short           |    124    |    108    |     95     |     99    |
| 6 | int             |    100    |    100    |    106     |    102    |
| 7 | long            |    100    |    104    |     99     |    105    |
| 8 | long long       |    118    |     97    |    104     |    103    |
------------------------------------------------------------------------

Optimization O2 : Normalized time-cost (cost of int-by-value = 100)
------------------------------------------------------------------------
|                     |  by value |   by ref  |         by ptr         |
| Type of an argument |-----------|-----------|------------------------|
|                     | foov(var) | foor(var) | foop(&var) | foop(ptr) |
|----------------------------------------------------------------------|
| 1 | do nothing      |     21                                         |
| 2 | no arguments    |     84                                         |
|- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - |
| 3 | bool            |    114    |     96    |    101     |    100    |
| 4 | char            |    100    |     98    |     97     |     99    |
| 5 | short           |    101    |    100    |    102     |    102    |
| 6 | int             |    100    |    103    |    104     |    111    |
| 7 | long            |     99    |    100    |    102     |    102    |
| 8 | long long       |    116    |    101    |    104     |    107    |
------------------------------------------------------------------------

Optimization O3 : Normalized time-cost (cost of int-by-value = 100)
------------------------------------------------------------------------
|                     |  by value |   by ref  |         by ptr         |
| Type of an argument |-----------|-----------|------------------------|
|                     | foov(var) | foor(var) | foop(&var) | foop(ptr) |
|----------------------------------------------------------------------|
| 1 | do nothing      |     51                                         |
| 2 | no arguments    |    102                                         |
|- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - |
| 3 | bool            |     98    |    101    |    102     |     97    |
| 4 | char            |     94    |     96    |     99     |    101    |
| 5 | short           |     98    |     99    |     95     |     99    |
| 6 | int             |    100    |    101    |    102     |    103    |
| 7 | long            |    102    |    100    |    102     |    105    |
| 8 | long long       |    105    |    104    |     98     |    100    |
------------------------------------------------------------------------

========= Results for CYGWIN : END ===========

========= Results for MINGW : BEGIN =========

#================================================
# Windows 2000 (1.70 GHz), MINGW 2.0.0-2, g++ 3.2
#================================================

No optimization : Normalized time-cost (cost of int-by-value = 100)
------------------------------------------------------------------------
|                     |  by value |   by ref  |         by ptr         |
| Type of an argument |-----------|-----------|------------------------|
|                     | foov(var) | foor(var) | foop(&var) | foop(ptr) |
|----------------------------------------------------------------------|
| 1 | do nothing      |     37                                         |
| 2 | no arguments    |    101                                         |
|- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - |
| 3 | bool            |    130    |    104    |    101     |    103    |
| 4 | char            |    131    |    131    |    104     |     98    |
| 5 | short           |    131    |    104    |    103     |    102    |
| 6 | int             |    100    |    100    |    103     |    102    |
| 7 | long            |     99    |    115    |    105     |    105    |
| 8 | long long       |    132    |    107    |    103     |    105    |
------------------------------------------------------------------------

Optimization O1 : Normalized time-cost (cost of int-by-value = 100)
------------------------------------------------------------------------
|                     |  by value |   by ref  |         by ptr         |
| Type of an argument |-----------|-----------|------------------------|
|                     | foov(var) | foor(var) | foop(&var) | foop(ptr) |
|----------------------------------------------------------------------|
| 1 | do nothing      |     12                                         |
| 2 | no arguments    |     83                                         |
|- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - |
| 3 | bool            |    100    |    102    |     97     |     99    |
| 4 | char            |     98    |     98    |     96     |    101    |
| 5 | short           |     98    |     97    |     99     |    100    |
| 6 | int             |    100    |    102    |     99     |    112    |
| 7 | long            |    105    |    104    |    105     |    107    |
| 8 | long long       |    110    |    106    |    104     |    106    |
------------------------------------------------------------------------

Optimization O2 : Normalized time-cost (cost of int-by-value = 100)
------------------------------------------------------------------------
|                     |  by value |   by ref  |         by ptr         |
| Type of an argument |-----------|-----------|------------------------|
|                     | foov(var) | foor(var) | foop(&var) | foop(ptr) |
|----------------------------------------------------------------------|
| 1 | do nothing      |     18                                         |
| 2 | no arguments    |     86                                         |
|- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - |
| 3 | bool            |    114    |    101    |    110     |     99    |
| 4 | char            |    100    |    105    |    100     |     99    |
| 5 | short           |     98    |    108    |     99     |    100    |
| 6 | int             |    100    |     99    |    100     |    100    |
| 7 | long            |     99    |     99    |     98     |     98    |
| 8 | long long       |    114    |     99    |    108     |    107    |
------------------------------------------------------------------------

Optimization O3 : Normalized time-cost (cost of int-by-value = 100)
------------------------------------------------------------------------
|                     |  by value |   by ref  |         by ptr         |
| Type of an argument |-----------|-----------|------------------------|
|                     | foov(var) | foor(var) | foop(&var) | foop(ptr) |
|----------------------------------------------------------------------|
| 1 | do nothing      |     56                                         |
| 2 | no arguments    |     97                                         |
|- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - |
| 3 | bool            |    100    |     98    |    100     |    100    |
| 4 | char            |    112    |    104    |    100     |    100    |
| 5 | short           |    104    |    101    |    100     |     96    |
| 6 | int             |    100    |     97    |    101     |    101    |
| 7 | long            |    103    |     98    |    100     |     99    |
| 8 | long long       |     98    |     96    |    103     |     97    |
------------------------------------------------------------------------

========= Results for MINGW : END ===========

========= Results for DJGPP : BEGIN =========

#===============================================
# Windows 2000 (1.70 GHz), DJGPP 2.03, gpp 3.2.1
#===============================================

No optimization : Normalized time-cost (cost of int-by-value = 100)
------------------------------------------------------------------------
|                     |  by value |   by ref  |         by ptr         |
| Type of an argument |-----------|-----------|------------------------|
|                     | foov(var) | foor(var) | foop(&var) | foop(ptr) |
|----------------------------------------------------------------------|
| 1 | do nothing      |     37                                         |
| 2 | no arguments    |     88                                         |
|- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - |
| 3 | bool            |    122    |    109    |    105     |     91    |
| 4 | char            |    128    |    122    |    103     |     94    |
| 5 | short           |    130    |    100    |    107     |     89    |
| 6 | int             |    100    |    111    |    103     |     96    |
| 7 | long            |     99    |     98    |    105     |     96    |
| 8 | long long       |    122    |     99    |    103     |     91    |
------------------------------------------------------------------------

Optimization O1 : Normalized time-cost (cost of int-by-value = 100)
------------------------------------------------------------------------
|                     |  by value |   by ref  |         by ptr         |
| Type of an argument |-----------|-----------|------------------------|
|                     | foov(var) | foor(var) | foop(&var) | foop(ptr) |
|----------------------------------------------------------------------|
| 1 | do nothing      |      6                                         |
| 2 | no arguments    |     89                                         |
|- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - |
| 3 | bool            |     96    |     94    |     97     |     92    |
| 4 | char            |    115    |     99    |    106     |     98    |
| 5 | short           |    103    |    108    |    101     |    102    |
| 6 | int             |    100    |    103    |    102     |    106    |
| 7 | long            |    108    |    100    |     99     |    102    |
| 8 | long long       |    103    |    100    |    104     |    101    |
------------------------------------------------------------------------

Optimization O2 : Normalized time-cost (cost of int-by-value = 100)
------------------------------------------------------------------------
|                     |  by value |   by ref  |         by ptr         |
| Type of an argument |-----------|-----------|------------------------|
|                     | foov(var) | foor(var) | foop(&var) | foop(ptr) |
|----------------------------------------------------------------------|
| 1 | do nothing      |      6                                         |
| 2 | no arguments    |     96                                         |
|- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - |
| 3 | bool            |    123    |    100    |    102     |    102    |
| 4 | char            |    100    |    106    |    108     |    108    |
| 5 | short           |     98    |    108    |     98     |    100    |
| 6 | int             |    100    |    102    |    104     |    100    |
| 7 | long            |     98    |    102    |    109     |    106    |
| 8 | long long       |    121    |    100    |    102     |    102    |
------------------------------------------------------------------------

Optimization O3 : Normalized time-cost (cost of int-by-value = 100)
------------------------------------------------------------------------
|                     |  by value |   by ref  |         by ptr         |
| Type of an argument |-----------|-----------|------------------------|
|                     | foov(var) | foor(var) | foop(&var) | foop(ptr) |
|----------------------------------------------------------------------|
| 1 | do nothing      |     40                                         |
| 2 | no arguments    |    100                                         |
|- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - |
| 3 | bool            |     80    |     80    |     80     |    100    |
| 4 | char            |    100    |    100    |    100     |    100    |
| 5 | short           |    100    |    100    |     80     |    100    |
| 6 | int             |    100    |    100    |    100     |    100    |
| 7 | long            |    100    |    100    |    100     |    100    |
| 8 | long long       |    100    |    100    |     80     |    100    |
------------------------------------------------------------------------

========= Results for DJGPP : END ===========

=====================================
Alex Vinokur
mailto:ale...@connect.to
http://mathforum.org/library/view/10978.html
=====================================

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Michael Jørgensen

Sep 11, 2003, 5:37:28 PM

I would like to contest this result.

First of all, some basic information is lacking in your post:
- What values were chosen for NO_OF_REPETITIONS and NO_OF_TESTS.
- What was the total duration (in seconds) of each single measurement, i.e.
each time "end_time-start_time" was evaluated.

I have a suspicion that the duration was too small, leading to large
sampling and/or quantization error in the measurements.

The results lack error estimates. To be specific: The measurements in
http://groups.google.com/groups?th=c8c189cba67e05a0 were repeated five
times. The values for [CYGWIN_NT-5.0, No optimization, bool, by value] were:
100, 107, 100, 106, and 103 respectively. This particular value appears to
have a standard deviation of around 3%. Furthermore, during normalization
two such values are divided, and hence the relative errors are added.

Such analysis should be detailed for each value presented in the summary.
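Michael's estimate can be checked with a short calculation. The sketch below (helper names are hypothetical) computes the mean and the sample standard deviation of the five values quoted for [CYGWIN_NT-5.0, No optimization, bool, by value], plus the first-order worst-case relative error of a ratio, which is the sum of the relative errors of numerator and denominator.

```cpp
#include <cmath>
#include <vector>

// Arithmetic mean of the measurements.
double mean (const std::vector<double>& v)
{
    double sum = 0;
    for (double x : v) sum += x;
    return sum / v.size();
}

// Sample standard deviation (n - 1 in the denominator).
double sample_stddev (const std::vector<double>& v)
{
    const double m = mean(v);
    double sq = 0;
    for (double x : v) sq += (x - m) * (x - m);
    return std::sqrt(sq / (v.size() - 1));
}

// First-order worst-case relative error of a ratio a / b:
// the relative errors of a and b add.
double ratio_relative_error (double rel_a, double rel_b)
{
    return rel_a + rel_b;
}
```

With the values 100, 107, 100, 106 and 103 the mean is 103.2 and the sample standard deviation is about 3.27, i.e. roughly 3.2% of the mean; dividing by a baseline with a similar spread could put a normalized value off by about 6%.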

Another objection: The argument value in the functions is never actually
used. It is quite possible that the compiler inlines the function call and
completely eliminates any occurrences of the argument value. This would in
turn lead to results that are *independent* of the particular argument passing
technique used.

You should therefore compare the assembler output (even if you don't
understand assembler) of the various functions. I would not be surprised if
all the different function calls lead to the same assembler code, at least
when optimizing at level -O3.
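One way to keep g++ from inlining the calls away, so that the argument passing itself stays in the generated code, is GCC's noinline function attribute combined with a body that actually uses the argument. This is a minimal sketch (the function names are hypothetical and the attribute is a GCC extension); the generated code can then be inspected, for example, with g++ -O3 -S or objdump -d.

```cpp
// GCC extension: __attribute__((noinline)) keeps the function from being
// inlined into its callers, so each call still passes its argument.
static int s = 0;

__attribute__((noinline)) void use_by_value     (int t)        { s += t; }
__attribute__((noinline)) void use_by_reference (const int& t) { s += t; }
__attribute__((noinline)) void use_by_pointer   (const int* t) { s += *t; }

// Calls each variant `repetitions` times with the same value.
void run (int value, long repetitions)
{
    for (long k = 0; k < repetitions; k++)
    {
        use_by_value(value);
        use_by_reference(value);
        use_by_pointer(&value);
    }
}
```

GCC's documentation notes that calls to a noinline function can still be optimized away when the function has no side effects, which is another reason for the bodies above to update s with the argument.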

Finally, I failed to find any conclusion in your post. I, too, am not able
to conclude anything meaningful from your post, considering the objections
given above.

One more point: You repeat your test using three different environments. Do
you have any prior suspicion that the test results should depend on the
execution environment? I would indeed be surprised if this was the case. I
mean, it is the *compiler* that determines the assembler code to be
executed, and you explicitly state that the compiler is the same in all
cases. The environment could perhaps influence the speed of any library
and/or system functions, but that is not part of your test.

-Michael.

P.S. If possible, please upload your entire test program to a web site for
me to download. I would like to try to repeat your test on a Linux platform.

Alex Vinokur

Sep 13, 2003, 4:16:07 AM

"Michael Jørgensen" <in...@ukendt.dk> wrote in message news:3f6077dc$0$97171$edfa...@dread12.news.tele.dk...

> I would like to contest this result.
>
> First of all, some basic information is lacking in your post:
> - What values were chosen for NO_OF_REPETITIONS and NO_OF_TESTS.

NO_OF_REPETITIONS, NO_OF_TESTS and other invocation parameters
can be set via the command line of
the C/C++ Program Perfometer executable.
Note. The C/C++ Program Perfometer download sites are listed below.

The values used in this test are as follows:

NO_OF_REPETITIONS = 10000000,
NO_OF_TESTS = 5,
Note. NO_OF_TESTS is unrelated to the number of measurement runs,
which also happens to be 5;
THRESHOLD_FACTOR = 0.2,
so THRESHOLD = NO_OF_TESTS * THRESHOLD_FACTOR = 1,
i.e. the results of 2 tests (1 minimal + 1 maximal) were discarded.

Some raw logs of the measurement can be seen at
* http://groups.google.com/groups?selm=bjri6t%24m6n8a%241%40ID-79865.news.uni-berlin.de
* http://groups.google.com/groups?selm=bjricv%24lrr6s%241%40ID-79865.news.uni-berlin.de

Here is a fragment of the raw log.

%%%%%%%%% Raw log : Fragment-1 : BEGIN %%%%%%%%%

No optimization
#############################################
# --------
# Resource
# --------
# Resource Name : user time used (via rusage)
# Resource Cost Unit : milliseconds (unsigned long long)
# Resource State Unit : timeval
# ===========================================
# Function-7 : of 26
# PRETTY FUNCTION : void pass_action_no_size()
# FUNCTION : pass_action_no_size
# FILE : t_pass.cpp, line#225
# DESCRIPTION : char via char-value (size = No)
# PER-CALLS : 10000000
# TOTAL TESTS : 5
# ---------------------
# Specific Performance is Runcost expended per 10000000 iterations
# ---------------------
# Detailed measurement report
#############################################

---------------------------------------------
--- Results sorted according to a test number
---------------------------------------------
Test[ 1] : 100 [10000000 iterations]
Test[ 2] : 90 [10000000 iterations]
Test[ 3] : 100 [10000000 iterations]
Test[ 4] : 101 [10000000 iterations]
Test[ 5] : 99 [10000000 iterations]

----------------------------------------
--- Results sorted according to run cost
----------------------------------------
-Test[ 2] : 90 [10000000 iterations]
+Test[ 5] : 99 [10000000 iterations]
+Test[ 1] : 100 [10000000 iterations]
+Test[ 3] : 100 [10000000 iterations]
-Test[ 4] : 101 [10000000 iterations]

No optimization
-------------------------------------
--- Summary report : Average run cost
-------------------------------------
Average value = 99 per 10000000 iterations (299/3 )
*** All tests ***
Total tests = 5
Min abs value = 90 --> in Test#2
Max abs value = 101 --> in Test#4
*** Selected tests ***
Discarding threshold = 0.2
Discarded tests = 2 (1 minimal and 1 maximal)
Total selected tests = 3
Min selected value = 99 --> in Test#5
Max selected value = 100 --> in Test#3

%%%%%%%%% Raw log : Fragment-1 : END %%%%%%%%%%%
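The "Summary report" arithmetic in the fragment can be reproduced as follows. This is a sketch that assumes, as the "(299/3)" entry and the unsigned long long cost unit suggest, that the Perfometer averages whole milliseconds with integer division; the function name is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical reconstruction of the log's summary arithmetic:
// THRESHOLD = NO_OF_TESTS * THRESHOLD_FACTOR values are discarded at each
// end of the sorted costs, and the rest are averaged with integer division.
unsigned long long trimmed_average (std::vector<unsigned long long> costs,
                                    double threshold_factor)
{
    const std::size_t threshold =
        static_cast<std::size_t>(costs.size() * threshold_factor);  // 5 * 0.2 -> 1
    std::sort(costs.begin(), costs.end());

    unsigned long long sum = 0;
    for (std::size_t i = threshold; i < costs.size() - threshold; i++)
    {
        sum += costs[i];
    }
    return sum / (costs.size() - 2 * threshold);  // truncates: 299 / 3 == 99
}
```

Applied to the five costs above (100, 90, 100, 101, 99) with a threshold factor of 0.2, it discards 90 and 101 and returns 299/3 = 99, matching the log.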

> - What was the total duration (in seconds) of each single measurement, i.e.
> each time "end_time-start_time" was evaluated.

From the fragment (of the raw log) above we can see that
for [CYGWIN_NT-5.0, No optimization, char, by value]
the duration was from 90 to 101 milliseconds (per 10000000 repetitions).

>
> I have a suspicion that the duration was too small, leading to large
> sampling and/or quantization error in the measurements.

It seems that the time-cost dispersion (90-101) in the fragment above
is quite acceptable.

>
> The results lack error estimates. To be specific: The measurements in
> http://groups.google.com/groups?th=c8c189cba67e05a0 were repeated five
> times. The values for [CYGWIN_NT-5.0, No optimization, bool, by value] were:
> 100, 107, 100, 106, and 103 respectively. This particular value appears to
> have a standard deviation of around 3%. Furthermore, during normalization
> two such values are divided, and hence the relative errors are added.

As noted above, the minimal and maximal measured time-costs were discarded.

>
> Such analysis should be detailed for each value presented in the summary.
>
> Another objection: The argument value in the functions is never actually used.

That is one of the possible testsuites.

> It is quite possible that the compiler inlines the function call and
> completely eliminates any occurences of the argument value. This would in
> turn lead results that are *independent* of the particular argument passing
> technique used.
>
> You should therefore compare the assembler output (even if you don't
> understand assembler) of the various functions.

It might be of interest to the performance tester.
Sorry, it _must_ be of interest to the performance tester.
However, the testsuites considered here are part of
a C/C++ Program Perfometer compilation unit,
so that requires some thinking through.

> I would not be surprised, if all the different function calls lead to the same
> assembler code, at least when optimizing at level -O3.

I think you are right regarding 'Optimization O3'.
It seems (according to the measurement results) that
at 'No optimization', 'Optimization O1' and 'Optimization O2'
the occurrences of the argument value are not eliminated.
In any case, one can carry out another performance test
with functions that actually use the argument value.
However, that would be a quite different performance test.

>
> Finally, I failed to find any conclusion in your post.

I wanted my original post to contain only a (brief) description
and the results of the performance tests.

> I, too, am not able to conclude anything meaningful from your post,
> considering the objections given above.
>
> One more point: You repeat your test using three different environments. Do
> you have any prior suspicion that the test results should depend on the
> execution environment.

Yes, I do.

Example-1 (Related to performance topic).
Different performance of the same compiler (gcc 3.2) in CYGWIN and MINGW.
a) http://groups.google.com/groups?th=120864de94019802
b) http://article.gmane.org/gmane.comp.gnu.mingw.user/8614
http://article.gmane.org/gmane.comp.gnu.mingw.user/8615

Example-2 (Not related to performance topic).
The program behavior is different on CYGWIN and MINGW.
a) http://groups.google.com/groups?th=a6219f74d54ea916
b) http://article.gmane.org/gmane.comp.gnu.mingw.user/8854
http://article.gmane.org/gmane.comp.gnu.mingw.user/8855
http://article.gmane.org/gmane.comp.gnu.mingw.user/8856

> I would indeed be surprised if this was the case. I
> mean, it is the *compiler* that determines the assembler code to be
> executed, and you explicitly state that the compiler is the same in all
> cases. The environment could perhaps influence the speed of any library
> and/or system functions, but that is not part of your test.
>
> -Michael.
>
> P.S. If possible, please upload your entire test program to a web site for
> me to download. I would like to try to repeat your test on a Linux platform.
>

It would be interesting to see your test results.

My original post contained the download sites :
C/C++ Program Perfometer (Version 2.4.2-1.13 and later versions)
* http://sourceforge.net/projects/cpp-perfometer
* http://alexvn.freeservers.com/s1/perfometer.html

Any questions might be sent to :
Mailing list : http://sourceforge.net/forum/forum.php?forum_id=272195
mailto:ale...@connect.to

Regards,

=====================================
Alex Vinokur
mailto:ale...@connect.to
http://mathforum.org/library/view/10978.html
=====================================

Michael Jørgensen

Sep 15, 2003, 5:57:49 AM

"Alex Vinokur" <ale...@bigfoot.com> wrote in message
news:bjt25a$mmesq$1...@ID-79865.news.uni-berlin.de...

> The values used in the considered test are as following :
>
> NO_OF_REPETITIONS = 10000000,
> NO_OF_TESTS = 5,

[snip]

> > - What was the total duration (in seconds) of each single measurement, i.e.
> > each time "end_time-start_time" was evaluated.
>
> From the fragment (of the raw log) above we can see that
> for [CYGWIN_NT-5.0, No optimization, char, by value]
> the duration was from 90 to 101 milliseconds (per 10000000 repetitions).

This is my point. I would recommend increasing the number of repetitions by
a factor of 10, at least. The point is that measuring sub-second duration on
a multi-tasking system is inherently inaccurate. Most operating systems
allocate time slices for each running process. It is not uncommon for these
time slices to be around 10 milliseconds. Indeed, the measured duration
values in your test do in fact appear grouped around multiples of 10
milliseconds. This is what I mean about quantization error.
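The quantization argument can be made concrete: if the timer only advances in whole ticks, each reading is rounded down to a multiple of the tick, and a measured interval can be off by up to about one tick. The sketch below assumes the 10 ms tick suggested in the post; the helper names are hypothetical.

```cpp
#include <cmath>

// Reading of a timer that only advances in whole ticks (rounded down).
double quantized_reading (double true_ms, double tick_ms)
{
    return std::floor(true_ms / tick_ms) * tick_ms;
}

// Approximate worst-case relative error of an interval measured with such
// a timer: about one tick relative to the interval length.
double relative_quantization_error (double tick_ms, double duration_ms)
{
    return tick_ms / duration_ms;
}
```

With a 10 ms tick, a 100 ms measurement carries up to about 10% error, while a tenfold increase in repetitions (to about 1000 ms) reduces it to about 1%.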

[snip]

> > I would not be surprised, if all the different function calls lead to the same
> > assembler code, at least when optimizing at level -O3.
>
> I think you are right relative to 'Optimization O3'.
> It seems (according to the measurement results) that
> in 'No optimization', 'Optimization O1', 'Optimization 02'
> the occurrences of the argument value don't eliminate.

> In any case one can carry out another performance test
> with functions which actually use the argument value.
> However, that is another performance test, that is quite different.

A different test indeed, but a very interesting one I believe. This would
show the "true" difference between passing by value and passing by
reference.

One more thought: There might be a difference between passing by
const-reference and passing by non-const-reference: The compiler might be
able to apply certain optimizations in the const case.

-Michael.

Alex Vinokur

Sep 30, 2003, 4:56:56 PM

"Michael Jørgensen" <in...@ukendt.dk> wrote in message news:3f65575d$0$97261$edfa...@dread12.news.tele.dk...

> "Alex Vinokur" <ale...@bigfoot.com> wrote in message
> news:bjt25a$mmesq$1...@ID-79865.news.uni-berlin.de...
>
> > The values used in the considered test are as following :
> >
> > NO_OF_REPETITIONS = 10000000,
> > NO_OF_TESTS = 5,
> [snip]
> > > - What was the total duration (in seconds) of each single measurement, i.e.
> > > each time "end_time-start_time" was evaluated.
> >
> > From the fragment (of the raw log) above we can see that
> > for [CYGWIN_NT-5.0, No optimization, char, by value]
> > the duration was from 90 to 101 milliseconds (per 10000000 repetitions).
>
> This is my point. I would recommend increasing the number of repetitions by
> a factor of 10, at least.

Because of a problem on my Windows 2000 system (see http://groups.google.com/groups?th=15ada0b5b53731ef),
I can't perform long, intensive computations.
So, on my particular computer, the duration of each single measurement
is sometimes limited when using the C/C++ Program Perfometer.

A longer duration can be reached when shorter programs are run
(see the Simple Perfometer program:
* http://groups.google.com/groups?selm=bl66ej%248eraf%241%40ID-79865.news.uni-berlin.de,
test report:
* http://groups.google.com/groups?th=7e791dcf1778099c
).
Note. By the way, using the Simple Perfometer, we could try to detect
how the measurement duration affects the measurement results and conclusions.

> The point is that measuring sub-second duration on
> a multi-tasking system is inherently inaccurate. Most operating systems
> allocate time slices for each running process. It is not uncommon for these
> time slices to be around 10 milliseconds. Indeed, the measured duration
> values in your test do in fact appear grouped around multiples of 10
> milliseconds. This is what I mean about quantization error.
>
> [snip]
>
> > > I would not be surprised, if all the different function calls lead to the same
> > > assembler code, at least when optimizing at level -O3.
> >
> > I think you are right relative to 'Optimization O3'.
> > It seems (according to the measurement results) that
> > in 'No optimization', 'Optimization O1', 'Optimization 02'
> > the occurrences of the argument value don't eliminate.

The functions are actually called (i.e., not inlined) at levels 'No optimization', 'Optimization O1', 'Optimization O2'.
The functions are inlined at level 'Optimization O3'.
Details can be seen below.

>
> > In any case one can carry out another performance test
> > with functions which actually use the argument value.
> > However, that is another performance test, that is quite different.
>
> A different test indeed, but a very interesting one I believe. This would
> show the "true" difference between passing by value and passing by
> reference.

Done.

>
> One more thought: There might be a difference between passing by
> const-reference and passing by non-const-reference: The compiler might be
> able to apply certain optimizations in the const case.
>

Done.
The compiler does not seem to apply any additional optimizations in the const case.

Comparative performance : measurement and analysis
--------------------------------------------------
Testsuite : Argument passing (built-in types)
Tools :
* the C/C++ Program Perfometer
http://sourceforge.net/projects/cpp-perfometer/
http://alexvn.freeservers.com/s1/perfometer.html
* the objdump utility
http://www.bigbiz.com/cgi-bin/manpage?objdump

Contents:
----------
1. Testsuite
2. Environment
3. Summary
3.1. Perfometer
3.1.0. No optimization
3.1.1. Optimization O1
3.1.2. Optimization O2
3.1.3. Optimization O3
3.2. objdump
3.2.0. No optimization
3.2.1. Optimization O1
3.2.2. Optimization O2
3.2.3. Optimization O3
4. The same argument type & different optimization levels
4.1. char
4.1.1. Perfometer
4.1.2. objdump
4.2. short
4.2.1. Perfometer
4.2.2. objdump
4.3. int
4.3.1. Perfometer
4.3.2. objdump
5. The same optimization level & different argument types
5.0. No optimization
5.0.1. objdump
5.1. Optimization O1
5.1.1. objdump
5.2. Optimization O2
5.2.1. objdump
6. Non-const & const arguments; pointers and references
7. Inlining and non-inlining (actual calling functions)

1. Testsuite
============

----------------
Tested functions
----------------
int x = rand();

static int s = 0;

--- fno-function : no arguments ---
void fno () { s += x; } // no arguments

--- f1-functions : an argument is never actually used ---
void f1_val (T) { s += x; } // by value

void f1_ref (T&) { s += x; } // by reference
void f1_cref (const T&) { s += x; } // by const reference

void f1_ptr (T*) { s += x; } // by pointer
void f1_cptr (const T*) { s += x; } // by pointer to const

--- f2-functions : an argument is actually used ---
void f2_val (T t) { s += t; } // by value

void f2_ref (T& t) { s += t; } // by reference
void f2_cref (const T& t) { s += t; } // by const reference

void f2_ptr (T* t) { s += *t; } // by pointer
void f2_cptr (const T* t) { s += *t; } // by pointer to const
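The tested functions above can be written once as templates over T instead of one hand-written copy per type. This is a hypothetical reconstruction, not the Perfometer testsuite itself. The NOINLINE macro (GCC's `__attribute__((noinline))`) is an added assumption: it keeps every call a real call, which is the situation observed at -O0, -O1 and -O2, even when compiling with -O3, where the functions were inlined.

```cpp
#include <cstdlib>

// Hypothetical reconstruction of the tested functions as templates over T.
// NOINLINE is an assumption (GCC/Clang extension): it keeps each call a real
// call, as observed at -O0/-O1/-O2, even when compiling with -O3.
#define NOINLINE __attribute__((noinline))

int x = std::rand();  // value added by fno and by the f1 family
static int s = 0;     // accumulator updated by every tested function

NOINLINE void fno() { s += x; }  // no arguments

// f1 family: the argument is never actually used
template <typename T> NOINLINE void f1_val(T) { s += x; }            // by value
template <typename T> NOINLINE void f1_ref(T&) { s += x; }           // by reference
template <typename T> NOINLINE void f1_cref(const T&) { s += x; }    // by const reference
template <typename T> NOINLINE void f1_ptr(T*) { s += x; }           // by pointer
template <typename T> NOINLINE void f1_cptr(const T*) { s += x; }    // by pointer to const

// f2 family: the argument is actually used
template <typename T> NOINLINE void f2_val(T t) { s += t; }          // by value
template <typename T> NOINLINE void f2_ref(T& t) { s += t; }         // by reference
template <typename T> NOINLINE void f2_cref(const T& t) { s += t; }  // by const reference
template <typename T> NOINLINE void f2_ptr(T* t) { s += *t; }        // by pointer
template <typename T> NOINLINE void f2_cptr(const T* t) { s += *t; } // by pointer to const
```

Instantiating, for example, `f2_ref<short>` and `f2_ptr<short>` and disassembling the object file should allow a comparison like the one in Section 4. The exact instructions will depend on the compiler version and flags.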

-------------------------
Tested built-in types T :
-------------------------
* bool
* char
* short
* int
* long
* long long
----------------------

2. Environment
==============

Windows 2000 Professional
Intel(R) Celeron(R) CPU 1.70 GHz
CYGWIN_NT-5.0 1.5.4(0.94/3/2)
GNU g++ version 3.3.1 (cygming special)
GNU objdump 2.14.90 20030901

3. Summary
==========

3.1. Perfometer
---------------

#==================================================================
# Comparative Performance : argument passing (built-in types)
# ------- Summary -------
#------------------------------------------------------------------

# Resource Name : user time used (via rusage)

# Resource Cost Unit : milliseconds per 5000000 repetitions

# Resource State Unit : timeval

#------------------------------------------------------------------
# The total duration of each single measurement was about
# 50-100 milliseconds per 5000000 repetitions
#- - - - - - - - - -- - - - - - - - - - - - - - - - - - -
# Note-1. Because of a problem on my Windows 2000 machine,
# long intensive computations can't be performed
# (see http://groups.google.com/groups?th=15ada0b5b53731ef).
# So here the duration is only about 0.05-0.1 sec.
# Note-2. A longer duration of each single measurement
# can be achieved when shorter programs are run
# (see http://groups.google.com/groups?selm=bl66ej%248eraf%241%40ID-79865.news.uni-berlin.de,
# http://groups.google.com/groups?th=7e791dcf1778099c).
#==================================================================
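The resource measured above is user CPU time obtained via rusage. The following is a minimal sketch of such a measurement, assuming a POSIX `getrusage` (available under Cygwin). The helper names `user_time_ms` and `time_user_ms` are hypothetical. They are not the Perfometer API.

```cpp
#include <sys/resource.h>
#include <sys/time.h>

// User CPU time consumed so far by this process, in milliseconds
// (the resource reported in the tables: "user time used (via rusage)").
double user_time_ms() {
    rusage ru{};
    getrusage(RUSAGE_SELF, &ru);
    return ru.ru_utime.tv_sec * 1000.0 + ru.ru_utime.tv_usec / 1000.0;
}

// Hypothetical helper: user time, in milliseconds, spent on `reps` calls
// of `fn`. The default of 5000000 matches the repetitions used above.
template <typename F>
double time_user_ms(F fn, long reps = 5000000) {
    const double start = user_time_ms();
    for (long i = 0; i < reps; ++i) fn();
    return user_time_ms() - start;
}
```

With only 50-100 ms per measurement, the timer resolution and scheduling noise are a significant fraction of each result. This may account for some of the scatter in the tables.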

Only the _normalized_ time-cost summary tables
(computed from the _actual_ time-cost summary tables)
are presented here.
-----------------------------------------------------
_Actual_ time-cost summary tables can be seen at
* http://groups.google.com/groups?th=230a038d78a9e416
-----------------------------------------------------
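Each normalized value rescales a measured cost so that the reference test (int passed by value, argument unused) equals 100. The following is a sketch of that computation. The rounding to the nearest integer is an assumption, and the example timings below are made up rather than taken from the actual tables.

```cpp
#include <cmath>

// Normalized time-cost: the measured cost of a test, scaled so that the
// reference test (int by value, argument unused) maps to 100.
// Rounding to the nearest integer is an assumption.
long normalize(double actual_ms, double reference_ms) {
    return std::lround(100.0 * actual_ms / reference_ms);
}
```

For example, with a hypothetical reference of 50 ms, a test that took 71 ms normalizes to 142.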

3.1.0. No optimization : Normalized time-cost (cost of int-by-value-arg-unused = 100)
---------------------------------------------------------------------------------------------------------------------------

|   |                     |   value   |       reference        |                      pointer                      |
|   |                     |           | non-const |   const    |       non-const        |          const           |
| # | Type of an argument | fval(var) | fref(var) | fcref(var) | fptr(&var) | fptr(ptr) | fcptr(&var) | fcptr(ptr) |
|-------------------------------------------------------------------------------------------------------------------------|
| 1 | do nothing | 26 |
| 2 | no arguments | 105 |
| 3 | bool f1:unused | 112 | 101 | 99 | 101 | 96 | 103 | 98 |
| 4 | char f1:unused | 111 | 108 | 98 | 102 | 100 | 99 | 103 |
| 5 | short f1:unused | 118 | 100 | 97 | 97 | 97 | 103 | 101 |
| 6 | int f1:unused | 100 | 98 | 101 | 94 | 100 | 96 | 97 |
| 7 | long f1:unused | 106 | 102 | 98 | 95 | 99 | 98 | 101 |
| 8 | long long f1:unused | 142 | 100 | 97 | 107 | 97 | 99 | 96 |
| 9 | bool f2:used | 114 | 100 | 98 | 98 | 99 | 108 | 99 |
| 10 | char f2:used | 114 | 96 | 98 | 102 | 100 | 100 | 98 |
| 11 | short f2:used | 115 | 107 | 108 | 108 | 110 | 103 | 101 |
| 12 | int f2:used | 98 | 105 | 101 | 103 | 100 | 100 | 101 |
| 13 | long f2:used | 118 | 100 | 104 | 105 | 107 | 102 | 105 |
| 14 | long long f2:used | 131 | 109 | 101 | 102 | 100 | 102 | 102 |
---------------------------------------------------------------------------------------------------------------------------
Perfometer 'No optimization' comments.
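Comment-P01. Functions
* passing by value
* for bool, char, short and 'long long' arguments
apparently take more time (111-142) than the normalizer, passing-by-value-int-unused (100).
Comment-P02. The result for
* passing-by-value-long-used (118)
seems weird: according to Paragraph 3.2.0, its code has the same
instruction count (6/3) as passing-by-value-int-used (98).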

3.1.1. Optimization O1 : Normalized time-cost (cost of int-by-value-arg-unused = 100)
---------------------------------------------------------------------------------------------------------------------------

|   |                     |   value   |       reference        |                      pointer                      |
|   |                     |           | non-const |   const    |       non-const        |          const           |
| # | Type of an argument | fval(var) | fref(var) | fcref(var) | fptr(&var) | fptr(ptr) | fcptr(&var) | fcptr(ptr) |
|-------------------------------------------------------------------------------------------------------------------------|
| 1 | do nothing | 13 |
| 2 | no arguments | 88 |
| 3 | bool f1:unused | 100 | 103 | 101 | 100 | 105 | 100 | 102 |
| 4 | char f1:unused | 280 | 100 | 101 | 100 | 100 | 102 | 103 |
| 5 | short f1:unused | 101 | 99 | 102 | 100 | 100 | 100 | 99 |
| 6 | int f1:unused | 100 | 100 | 101 | 102 | 100 | 106 | 100 |
| 7 | long f1:unused | 100 | 102 | 100 | 100 | 100 | 100 | 100 |
| 8 | long long f1:unused | 119 | 100 | 102 | 101 | 101 | 100 | 99 |
| 9 | bool f2:used | 100 | 98 | 103 | 99 | 102 | 101 | 100 |
| 10 | char f2:used | 107 | 100 | 101 | 100 | 100 | 102 | 103 |
| 11 | short f2:used | 104 | 102 | 100 | 101 | 100 | 100 | 101 |
| 12 | int f2:used | 101 | 99 | 101 | 99 | 100 | 100 | 100 |
| 13 | long f2:used | 100 | 100 | 102 | 105 | 99 | 99 | 100 |
| 14 | long long f2:used | 120 | 100 | 100 | 107 | 101 | 102 | 102 |
---------------------------------------------------------------------------------------------------------------------------
Perfometer 'Optimization O1' comments.
Comment-P11. Functions
* passing by value
* for 'long long' arguments
apparently take more time (119-120) than the normalizer, passing-by-value-int-unused (100).
Comment-P12. The result for
* passing-by-value-char-unused (280)
is very odd, especially since the assembler code
for a 'char' argument doesn't differ from that for 'char&', 'int', 'int&', etc.
(see Paragraphs 4.1.2 and 5.1.1 of this article).

3.1.2. Optimization O2 : Normalized time-cost (cost of int-by-value-arg-unused = 100)
---------------------------------------------------------------------------------------------------------------------------

|   |                     |   value   |       reference        |                      pointer                      |
|   |                     |           | non-const |   const    |       non-const        |          const           |
| # | Type of an argument | fval(var) | fref(var) | fcref(var) | fptr(&var) | fptr(ptr) | fcptr(&var) | fcptr(ptr) |
|-------------------------------------------------------------------------------------------------------------------------|
| 1 | do nothing | 12 |
| 2 | no arguments | 86 |
| 3 | bool f1:unused | 99 | 99 | 100 | 99 | 101 | 101 | 103 |
| 4 | char f1:unused | 100 | 100 | 101 | 99 | 103 | 99 | 100 |
| 5 | short f1:unused | 99 | 99 | 101 | 98 | 99 | 100 | 101 |
| 6 | int f1:unused | 100 | 99 | 99 | 101 | 99 | 101 | 99 |
| 7 | long f1:unused | 99 | 99 | 100 | 99 | 99 | 99 | 100 |
| 8 | long long f1:unused | 112 | 99 | 100 | 99 | 99 | 100 | 99 |
| 9 | bool f2:used | 100 | 101 | 101 | 100 | 108 | 104 | 99 |
| 10 | char f2:used | 98 | 103 | 100 | 99 | 102 | 106 | 100 |
| 11 | short f2:used | 101 | 100 | 99 | 100 | 102 | 99 | 99 |
| 12 | int f2:used | 99 | 100 | 100 | 99 | 99 | 99 | 99 |
| 13 | long f2:used | 99 | 99 | 99 | 100 | 100 | 101 | 103 |
| 14 | long long f2:used | 111 | 99 | 99 | 99 | 99 | 99 | 99 |
---------------------------------------------------------------------------------------------------------------------------
Perfometer 'Optimization O2' comments.
Comment-P21. Functions
* passing by value
* for 'long long' arguments
apparently take more time (111-112) than the normalizer, passing-by-value-int-unused (100).

3.1.3. Optimization O3 : The tested functions are inlined (not actually _called_).

3.2. objdump
------------

#====================================================================
# Information from object file : argument passing (built-in types)
# ------- Summary -------
#--------------------------------------------------------------------
# Each cell of the table contains the number of assembler commands
# generated for the corresponding function.
# Each cell contains two values - v1/v2 :
# v1 is the number of _all_ assembler commands in the function
# v2 is the number of "arithmetic" assembler commands (add, mov, etc.)
#====================================================================

--------------------------------------------------------------
More detailed information from the object files can be seen at
* http://groups.google.com/groups?th=c1e82100af8e0c01
--------------------------------------------------------------
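The v1/v2 counts can be reproduced from `objdump -d` text. The following is a hypothetical sketch. Its rule for "arithmetic" commands (mnemonics starting with mov, add or sub) is an assumption, chosen because it matches the tabulated counts for the listings in Section 4 (e.g. 10/5 for foo2_int_val at O2 and 7/4 for foo2_int_ref).

```cpp
#include <cctype>
#include <sstream>
#include <string>
#include <utility>

// True if s is non-empty and every character is a hex digit.
static bool all_hex(const std::string& s) {
    if (s.empty()) return false;
    for (unsigned char c : s)
        if (!std::isxdigit(c)) return false;
    return true;
}

// Hypothetical helper: v1/v2 for one function's `objdump -d` listing.
// v1 = every instruction line, including padding after ret (nop, lea ...).
// v2 = "arithmetic" instructions, ASSUMED to be mnemonics starting with
//      mov, add or sub (this matches the counts for the listings above).
std::pair<int, int> count_instructions(const std::string& listing) {
    std::istringstream lines(listing);
    std::string line;
    int all = 0, arith = 0;
    while (std::getline(lines, line)) {
        std::istringstream toks(line);
        std::string tok;
        // Instruction lines start with a hex address followed by ':'.
        if (!(toks >> tok) || tok.back() != ':' ||
            !all_hex(tok.substr(0, tok.size() - 1)))
            continue;
        // Skip the two-digit opcode bytes; the next token is the mnemonic.
        std::string mnemonic;
        while (toks >> tok) {
            if (!(tok.size() == 2 && all_hex(tok))) { mnemonic = tok; break; }
        }
        if (mnemonic.empty()) continue;  // line holding only opcode bytes
        ++all;
        if (mnemonic.rfind("mov", 0) == 0 || mnemonic.rfind("add", 0) == 0 ||
            mnemonic.rfind("sub", 0) == 0)
            ++arith;
    }
    return {all, arith};
}
```

Because v1 counts every line, the padding that follows `ret` is included even though it is never executed.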

3.2.0. No optimization : Number of assembler commands
-------------------------------------------------------------------------------------------

|   |                     |   value   |       reference        |        pointer         |
|   |                     |           | non-const |   const    | non-const |   const    |
| # | Type of an argument | fval(var) | fref(var) | fcref(var) | fptr(ptr) | fcptr(ptr) |
|-----------------------------------------------------------------------------------------|
| 1 | do nothing | --- |
| 2 | no arguments | 6/3 |
| 3 | bool f1:unused | 10/6 | 6/3 | 6/3 | 6/3 | 6/3 |
| 4 | char f1:unused | 10/6 | 6/3 | 6/3 | 6/3 | 6/3 |
| 5 | short f1:unused | 9/3 | 6/3 | 6/3 | 6/3 | 6/3 |
| 6 | int f1:unused | 6/3 | 6/3 | 6/3 | 6/3 | 6/3 |
| 7 | long f1:unused | 6/3 | 6/3 | 6/3 | 6/3 | 6/3 |
| 8 | long long f1:unused | 12/8 | 6/3 | 6/3 | 6/3 | 6/3 |
| 9 | bool f2:used | 9/6 | 8/4 | 8/4 | 8/4 | 8/4 |
| 10 | char f2:used | 9/6 | 8/4 | 8/4 | 8/4 | 8/4 |
| 11 | short f2:used | 10/6 | 8/4 | 8/4 | 8/4 | 8/4 |
| 12 | int f2:used | 6/3 | 7/4 | 7/4 | 7/4 | 7/4 |
| 13 | long f2:used | 6/3 | 7/4 | 7/4 | 7/4 | 7/4 |
| 14 | long long f2:used | 12/8 | 7/4 | 7/4 | 7/4 | 7/4 |
-------------------------------------------------------------------------------------------
objdump 'No optimization' comments.
Comment-D01. The tested functions are actually _called_ (not inlined).
The object file contains call instructions such as the following:

----------------------------------------------------------------------------------------
1a94: e8 67 e5 ff ff call 0 <foo1_no_vars()>
256c: e8 9f da ff ff call 10 <foo1_bool_val(bool)>
[--omitted--]
39e5d: e8 c8 65 fc ff call 42a <foo2_long_long_cptr(long long const*)>
3eda5: e8 80 16 fc ff call 42a <foo2_long_long_cptr(long long const*)>
----------------------------------------------------------------------------------------

3.2.1. Optimization O1 : Number of assembler commands
-------------------------------------------------------------------------------------------

|   |                     |   value   |       reference        |        pointer         |
|   |                     |           | non-const |   const    | non-const |   const    |
| # | Type of an argument | fval(var) | fref(var) | fcref(var) | fptr(ptr) | fcptr(ptr) |
|-----------------------------------------------------------------------------------------|
| 1 | do nothing | --- |
| 2 | no arguments | 8/4 |
| 3 | bool f1:unused | 8/4 | 8/4 | 8/4 | 8/4 | 8/4 |
| 4 | char f1:unused | 8/4 | 8/4 | 8/4 | 8/4 | 8/4 |
| 5 | short f1:unused | 8/4 | 8/4 | 8/4 | 8/4 | 8/4 |
| 6 | int f1:unused | 8/4 | 8/4 | 8/4 | 8/4 | 8/4 |
| 7 | long f1:unused | 8/4 | 8/4 | 8/4 | 8/4 | 8/4 |
| 8 | long long f1:unused | 8/4 | 8/4 | 8/4 | 8/4 | 8/4 |
| 9 | bool f2:used | 7/3 | 8/4 | 8/4 | 8/4 | 8/4 |
| 10 | char f2:used | 7/3 | 8/4 | 8/4 | 8/4 | 8/4 |
| 11 | short f2:used | 7/3 | 8/4 | 8/4 | 8/4 | 8/4 |
| 12 | int f2:used | 6/3 | 8/5 | 8/5 | 8/5 | 8/5 |
| 13 | long f2:used | 6/3 | 8/5 | 8/5 | 8/5 | 8/5 |
| 14 | long long f2:used | 7/4 | 8/5 | 8/5 | 8/5 | 8/5 |
-------------------------------------------------------------------------------------------
objdump 'Optimization O1' comments.
Comment-D11. The tested functions are actually called (not inlined).

3.2.2. Optimization O2 : Number of assembler commands
-------------------------------------------------------------------------------------------

|   |                     |   value   |       reference        |        pointer         |
|   |                     |           | non-const |   const    | non-const |   const    |
| # | Type of an argument | fval(var) | fref(var) | fcref(var) | fptr(ptr) | fcptr(ptr) |
|-----------------------------------------------------------------------------------------|
| 1 | do nothing | --- |
| 2 | no arguments | 6/3 |
| 3 | bool f1:unused | 6/3 | 6/3 | 6/3 | 6/3 | 6/3 |
| 4 | char f1:unused | 6/3 | 6/3 | 6/3 | 6/3 | 6/3 |
| 5 | short f1:unused | 6/3 | 6/3 | 6/3 | 6/3 | 6/3 |
| 6 | int f1:unused | 6/3 | 6/3 | 6/3 | 6/3 | 6/3 |
| 7 | long f1:unused | 6/3 | 6/3 | 6/3 | 6/3 | 6/3 |
| 8 | long long f1:unused | 6/3 | 6/3 | 6/3 | 6/3 | 6/3 |
| 9 | bool f2:used | 7/3 | 10/4 | 10/4 | 10/4 | 10/4 |
| 10 | char f2:used | 7/3 | 10/4 | 10/4 | 10/4 | 10/4 |
| 11 | short f2:used | 7/3 | 10/4 | 10/4 | 10/4 | 10/4 |
| 12 | int f2:used | 10/5 | 7/4 | 7/4 | 7/4 | 7/4 |
| 13 | long f2:used | 10/5 | 7/4 | 7/4 | 7/4 | 7/4 |
| 14 | long long f2:used | 10/5 | 7/4 | 7/4 | 7/4 | 7/4 |
-------------------------------------------------------------------------------------------
objdump 'Optimization O2' comments.
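Comment-D21. The tested functions are actually called (not inlined).
Comment-D22. v1 also counts the alignment padding that follows ret
(nop, lea 0x0(%esi),%esi), which is never executed. This is why int by value (used)
shows 10/5 although only 8 of its instructions are executed (see Paragraph 4.3.2).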

3.2.3. Optimization O3. The tested functions are inlined (not actually _called_).
The object file contains no call instructions for the tested functions.
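You can check mechanically whether a tested function was inlined by looking for call lines like those quoted in Comment-D01. The following is a hypothetical helper. Its name and approach are not part of objdump or Perfometer. It assumes demangled output as shown above (for example from `objdump -d -C`), and it also accepts the "calll" spelling that some disassemblers print.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: number of call instructions to `name` in
// `objdump -d` text, e.g. "1a94: e8 67 e5 ff ff call 0 <foo1_no_vars()>".
// Zero calls for a tested function suggests it was inlined (as at -O3).
int count_calls(const std::string& listing, const std::string& name) {
    const std::string target = "<" + name + "(";  // demangled symbol prefix
    std::istringstream lines(listing);
    std::string line;
    int n = 0;
    while (std::getline(lines, line)) {
        std::istringstream toks(line);
        std::string tok;
        bool is_call = false;
        while (toks >> tok) {
            if (tok == "call" || tok == "calll") { is_call = true; break; }
        }
        if (is_call && line.find(target) != std::string::npos) ++n;
    }
    return n;
}
```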

4. The same argument type & different optimization levels
=========================================================

Conclusion (based on the information obtained from the object files, see below):
at each optimization level, the assembler code for
* foo (T&)
* foo (const T&)
* foo (T*)
* foo (const T*)
is the same.

4.1. char
---------
4.1.1. Perfometer
-----------------

Normalized time-cost for char
---------------------------------------------------------------------------------------------------------------------------

4.1.2. objdump
--------------
char-argument

No optimization. Assembler code for functions that take a char argument
---------------

t_pass1-0.o: file format pe-i386

Disassembly of section .text:

------ argument is never actually used ------
0000006a <foo1_char_val(char)>:
6a: 55 push %ebp
6b: 89 e5 mov %esp,%ebp
6d: 83 ec 04 sub $0x4,%esp
70: 8b 45 08 mov 0x8(%ebp),%eax
73: 88 45 ff mov %al,0xffffffff(%ebp)
76: a1 00 00 00 00 mov 0x0,%eax
7b: 01 05 28 00 00 00 add %eax,0x28
81: c9 leave
82: c3 ret
83: 90 nop

00000084 <foo1_char_ref(char&)>:
84: 55 push %ebp
85: 89 e5 mov %esp,%ebp
87: a1 00 00 00 00 mov 0x0,%eax
8c: 01 05 28 00 00 00 add %eax,0x28
92: 5d pop %ebp
93: c3 ret

00000094 <foo1_char_cref(char const&)>:
94: 55 push %ebp
95: 89 e5 mov %esp,%ebp
97: a1 00 00 00 00 mov 0x0,%eax
9c: 01 05 28 00 00 00 add %eax,0x28
a2: 5d pop %ebp
a3: c3 ret

000000a4 <foo1_char_ptr(char*)>:
a4: 55 push %ebp
a5: 89 e5 mov %esp,%ebp
a7: a1 00 00 00 00 mov 0x0,%eax
ac: 01 05 28 00 00 00 add %eax,0x28
b2: 5d pop %ebp
b3: c3 ret

000000b4 <foo1_char_cptr(char const*)>:
b4: 55 push %ebp
b5: 89 e5 mov %esp,%ebp
b7: a1 00 00 00 00 mov 0x0,%eax
bc: 01 05 28 00 00 00 add %eax,0x28
c2: 5d pop %ebp
c3: c3 ret

------ argument is actually used ------
0000027e <foo2_char_val(char)>:
27e: 55 push %ebp
27f: 89 e5 mov %esp,%ebp
281: 83 ec 04 sub $0x4,%esp
284: 8b 45 08 mov 0x8(%ebp),%eax
287: 88 45 ff mov %al,0xffffffff(%ebp)
28a: 0f be 45 ff movsbl 0xffffffff(%ebp),%eax
28e: 01 05 28 00 00 00 add %eax,0x28
294: c9 leave
295: c3 ret

00000296 <foo2_char_ref(char&)>:
296: 55 push %ebp
297: 89 e5 mov %esp,%ebp
299: 8b 45 08 mov 0x8(%ebp),%eax
29c: 0f be 00 movsbl (%eax),%eax
29f: 01 05 28 00 00 00 add %eax,0x28
2a5: 5d pop %ebp
2a6: c3 ret
2a7: 90 nop

000002a8 <foo2_char_cref(char const&)>:
2a8: 55 push %ebp
2a9: 89 e5 mov %esp,%ebp
2ab: 8b 45 08 mov 0x8(%ebp),%eax
2ae: 0f be 00 movsbl (%eax),%eax
2b1: 01 05 28 00 00 00 add %eax,0x28
2b7: 5d pop %ebp
2b8: c3 ret
2b9: 90 nop

000002ba <foo2_char_ptr(char*)>:
2ba: 55 push %ebp
2bb: 89 e5 mov %esp,%ebp
2bd: 8b 45 08 mov 0x8(%ebp),%eax
2c0: 0f be 00 movsbl (%eax),%eax
2c3: 01 05 28 00 00 00 add %eax,0x28
2c9: 5d pop %ebp
2ca: c3 ret
2cb: 90 nop

000002cc <foo2_char_cptr(char const*)>:
2cc: 55 push %ebp
2cd: 89 e5 mov %esp,%ebp
2cf: 8b 45 08 mov 0x8(%ebp),%eax
2d2: 0f be 00 movsbl (%eax),%eax
2d5: 01 05 28 00 00 00 add %eax,0x28
2db: 5d pop %ebp
2dc: c3 ret
2dd: 90 nop

Optimization O1. Assembler code for functions that take a char argument
---------------

t_pass1-1.o: file format pe-i386

Disassembly of section .text:

------ argument is never actually used ------
00000084 <foo1_char_val(char)>:
84: 55 push %ebp
85: 89 e5 mov %esp,%ebp
87: a1 28 00 00 00 mov 0x28,%eax
8c: 03 05 00 00 00 00 add 0x0,%eax
92: a3 28 00 00 00 mov %eax,0x28
97: 5d pop %ebp
98: c3 ret
99: 90 nop

0000009a <foo1_char_ref(char&)>:
9a: 55 push %ebp
9b: 89 e5 mov %esp,%ebp
9d: a1 28 00 00 00 mov 0x28,%eax
a2: 03 05 00 00 00 00 add 0x0,%eax
a8: a3 28 00 00 00 mov %eax,0x28
ad: 5d pop %ebp
ae: c3 ret
af: 90 nop

------ argument is actually used ------
00000302 <foo2_char_val(char)>:
302: 55 push %ebp
303: 89 e5 mov %esp,%ebp
305: 0f be 45 08 movsbl 0x8(%ebp),%eax
309: 01 05 28 00 00 00 add %eax,0x28
30f: 5d pop %ebp
310: c3 ret
311: 90 nop

00000312 <foo2_char_ref(char&)>:
312: 55 push %ebp
313: 89 e5 mov %esp,%ebp
315: 8b 45 08 mov 0x8(%ebp),%eax
318: 0f be 00 movsbl (%eax),%eax
31b: 01 05 28 00 00 00 add %eax,0x28
321: 5d pop %ebp
322: c3 ret
323: 90 nop

Optimization O2. Assembler code for functions that take a char argument
---------------

t_pass1-2.o: file format pe-i386

Disassembly of section .text:

------ argument is never actually used ------
00000060 <foo1_char_val(char)>:
60: 55 push %ebp
61: a1 00 00 00 00 mov 0x0,%eax
66: 89 e5 mov %esp,%ebp
68: 01 05 28 00 00 00 add %eax,0x28
6e: 5d pop %ebp
6f: c3 ret

00000070 <foo1_char_ref(char&)>:
70: 55 push %ebp
71: a1 00 00 00 00 mov 0x0,%eax
76: 89 e5 mov %esp,%ebp
78: 01 05 28 00 00 00 add %eax,0x28
7e: 5d pop %ebp
7f: c3 ret

------ argument is actually used ------
00000280 <foo2_char_val(char)>:
280: 55 push %ebp
281: 89 e5 mov %esp,%ebp
283: 0f be 45 08 movsbl 0x8(%ebp),%eax
287: 5d pop %ebp
288: 01 05 28 00 00 00 add %eax,0x28
28e: c3 ret
28f: 90 nop

00000290 <foo2_char_ref(char&)>:
290: 55 push %ebp
291: 89 e5 mov %esp,%ebp
293: 8b 45 08 mov 0x8(%ebp),%eax
296: 5d pop %ebp
297: 0f be 00 movsbl (%eax),%eax
29a: 01 05 28 00 00 00 add %eax,0x28
2a0: c3 ret
2a1: 90 nop
2a2: 8d b4 26 00 00 00 00 lea 0x0(%esi,1),%esi
2a9: 8d bc 27 00 00 00 00 lea 0x0(%edi,1),%edi
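In the char listings above, the argument is loaded with `movsbl`, which sign-extends the 8-bit value to 32 bits before the add. This indicates that char is signed with this gcc on i386. The int listings use a plain `mov` instead. The following sketch shows what `movsbl` computes. The function name is hypothetical, and the code avoids relying on implementation-defined conversions.

```cpp
#include <cstdint>

// What `movsbl` computes: take an 8-bit value and sign-extend it to a
// 32-bit int, i.e. bit 7 is copied into bits 8..31.
std::int32_t sign_extend_byte(std::uint8_t byte) {
    return byte < 0x80 ? static_cast<std::int32_t>(byte)
                       : static_cast<std::int32_t>(byte) - 0x100;
}
```

This is the same conversion C++ performs when a signed char is promoted to int in `s += t`.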

4.2. short
---------
4.2.1. Perfometer
-----------------

Normalized time-cost for short
---------------------------------------------------------------------------------------------------------------------------

4.2.2. objdump
--------------
short-argument

The assembler code for a short argument is similar to that for a char argument.

4.3. int
---------
4.3.1. Perfometer
-----------------

Normalized time-cost for int
---------------------------------------------------------------------------------------------------------------------------

4.3.2. objdump
--------------
int-argument

No optimization. Assembler code for functions that take an int argument
---------------

t_pass1-0.o: file format pe-i386

Disassembly of section .text:

------ argument is never actually used ------
0000011e <foo1_int_val(int)>:
11e: 55 push %ebp
11f: 89 e5 mov %esp,%ebp
121: a1 00 00 00 00 mov 0x0,%eax
126: 01 05 28 00 00 00 add %eax,0x28
12c: 5d pop %ebp
12d: c3 ret

0000012e <foo1_int_ref(int&)>:
12e: 55 push %ebp
12f: 89 e5 mov %esp,%ebp
131: a1 00 00 00 00 mov 0x0,%eax
136: 01 05 28 00 00 00 add %eax,0x28
13c: 5d pop %ebp
13d: c3 ret

------ argument is actually used ------
00000340 <foo2_int_val(int)>:
340: 55 push %ebp
341: 89 e5 mov %esp,%ebp
343: 8b 45 08 mov 0x8(%ebp),%eax
346: 01 05 28 00 00 00 add %eax,0x28
34c: 5d pop %ebp
34d: c3 ret

0000034e <foo2_int_ref(int&)>:
34e: 55 push %ebp
34f: 89 e5 mov %esp,%ebp
351: 8b 45 08 mov 0x8(%ebp),%eax
354: 8b 00 mov (%eax),%eax
356: 01 05 28 00 00 00 add %eax,0x28
35c: 5d pop %ebp
35d: c3 ret

Optimization O1. Assembler code for functions that take an int argument
---------------

t_pass1-1.o: file format pe-i386

Disassembly of section .text:

------ argument is never actually used ------
00000160 <foo1_int_val(int)>:
160: 55 push %ebp
161: 89 e5 mov %esp,%ebp
163: a1 28 00 00 00 mov 0x28,%eax
168: 03 05 00 00 00 00 add 0x0,%eax
16e: a3 28 00 00 00 mov %eax,0x28
173: 5d pop %ebp
174: c3 ret
175: 90 nop

00000176 <foo1_int_ref(int&)>:
176: 55 push %ebp
177: 89 e5 mov %esp,%ebp
179: a1 28 00 00 00 mov 0x28,%eax
17e: 03 05 00 00 00 00 add 0x0,%eax
184: a3 28 00 00 00 mov %eax,0x28
189: 5d pop %ebp
18a: c3 ret
18b: 90 nop

------ argument is actually used ------
000003b2 <foo2_int_val(int)>:
3b2: 55 push %ebp
3b3: 89 e5 mov %esp,%ebp
3b5: 8b 45 08 mov 0x8(%ebp),%eax
3b8: 01 05 28 00 00 00 add %eax,0x28
3be: 5d pop %ebp
3bf: c3 ret

000003c0 <foo2_int_ref(int&)>:
3c0: 55 push %ebp
3c1: 89 e5 mov %esp,%ebp
3c3: 8b 55 08 mov 0x8(%ebp),%edx
3c6: a1 28 00 00 00 mov 0x28,%eax
3cb: 03 02 add (%edx),%eax
3cd: a3 28 00 00 00 mov %eax,0x28
3d2: 5d pop %ebp
3d3: c3 ret

Optimization O2. Assembler code for functions that take an int argument
---------------

t_pass1-2.o: file format pe-i386

Disassembly of section .text:

------ argument is never actually used ------
00000100 <foo1_int_val(int)>:
100: 55 push %ebp
101: a1 00 00 00 00 mov 0x0,%eax
106: 89 e5 mov %esp,%ebp
108: 01 05 28 00 00 00 add %eax,0x28
10e: 5d pop %ebp
10f: c3 ret

00000110 <foo1_int_ref(int&)>:
110: 55 push %ebp
111: a1 00 00 00 00 mov 0x0,%eax
116: 89 e5 mov %esp,%ebp
118: 01 05 28 00 00 00 add %eax,0x28
11e: 5d pop %ebp
11f: c3 ret

------ argument is actually used ------
000003a0 <foo2_int_val(int)>:
3a0: 55 push %ebp
3a1: a1 28 00 00 00 mov 0x28,%eax
3a6: 89 e5 mov %esp,%ebp
3a8: 8b 55 08 mov 0x8(%ebp),%edx
3ab: 5d pop %ebp
3ac: 01 d0 add %edx,%eax
3ae: a3 28 00 00 00 mov %eax,0x28
3b3: c3 ret
3b4: 8d b6 00 00 00 00 lea 0x0(%esi),%esi
3ba: 8d bf 00 00 00 00 lea 0x0(%edi),%edi

000003c0 <foo2_int_ref(int&)>:
3c0: 55 push %ebp
3c1: 89 e5 mov %esp,%ebp
3c3: 8b 45 08 mov 0x8(%ebp),%eax
3c6: 5d pop %ebp
3c7: 8b 00 mov (%eax),%eax
3c9: 01 05 28 00 00 00 add %eax,0x28
3cf: c3 ret

5. The same optimization level & different argument types
=========================================================
char vs. int

5.0. No optimization
--------------------
5.0.1. objdump
--------------

t_pass1-0.o: file format pe-i386

Disassembly of section .text:

00000084 <foo1_char_ref(char&)>:
84: 55 push %ebp
85: 89 e5 mov %esp,%ebp
87: a1 00 00 00 00 mov 0x0,%eax
8c: 01 05 28 00 00 00 add %eax,0x28
92: 5d pop %ebp
93: c3 ret

0000011e <foo1_int_val(int)>:
11e: 55 push %ebp
11f: 89 e5 mov %esp,%ebp
121: a1 00 00 00 00 mov 0x0,%eax
126: 01 05 28 00 00 00 add %eax,0x28
12c: 5d pop %ebp
12d: c3 ret

0000012e <foo1_int_ref(int&)>:
12e: 55 push %ebp
12f: 89 e5 mov %esp,%ebp
131: a1 00 00 00 00 mov 0x0,%eax
136: 01 05 28 00 00 00 add %eax,0x28
13c: 5d pop %ebp
13d: c3 ret

00000340 <foo2_int_val(int)>:
340: 55 push %ebp
341: 89 e5 mov %esp,%ebp
343: 8b 45 08 mov 0x8(%ebp),%eax
346: 01 05 28 00 00 00 add %eax,0x28
34c: 5d pop %ebp
34d: c3 ret

0000034e <foo2_int_ref(int&)>:
34e: 55 push %ebp
34f: 89 e5 mov %esp,%ebp
351: 8b 45 08 mov 0x8(%ebp),%eax
354: 8b 00 mov (%eax),%eax
356: 01 05 28 00 00 00 add %eax,0x28
35c: 5d pop %ebp
35d: c3 ret

5.1. Optimization O1
--------------------
5.1.1. objdump
--------------

t_pass1-1.o: file format pe-i386

Disassembly of section .text:

00000160 <foo1_int_val(int)>:
160: 55 push %ebp
161: 89 e5 mov %esp,%ebp
163: a1 28 00 00 00 mov 0x28,%eax
168: 03 05 00 00 00 00 add 0x0,%eax
16e: a3 28 00 00 00 mov %eax,0x28
173: 5d pop %ebp
174: c3 ret
175: 90 nop

000003b2 <foo2_int_val(int)>:
3b2: 55 push %ebp
3b3: 89 e5 mov %esp,%ebp
3b5: 8b 45 08 mov 0x8(%ebp),%eax
3b8: 01 05 28 00 00 00 add %eax,0x28
3be: 5d pop %ebp
3bf: c3 ret

5.2. Optimization O2
--------------------
5.2.1. objdump
--------------

t_pass1-2.o: file format pe-i386

Disassembly of section .text:

00000070 <foo1_char_ref(char&)>:
70: 55 push %ebp
71: a1 00 00 00 00 mov 0x0,%eax
76: 89 e5 mov %esp,%ebp
78: 01 05 28 00 00 00 add %eax,0x28
7e: 5d pop %ebp
7f: c3 ret

00000100 <foo1_int_val(int)>:
100: 55 push %ebp
101: a1 00 00 00 00 mov 0x0,%eax
106: 89 e5 mov %esp,%ebp
108: 01 05 28 00 00 00 add %eax,0x28
10e: 5d pop %ebp
10f: c3 ret

00000110 <foo1_int_ref(int&)>:
110: 55 push %ebp
111: a1 00 00 00 00 mov 0x0,%eax
116: 89 e5 mov %esp,%ebp
118: 01 05 28 00 00 00 add %eax,0x28
11e: 5d pop %ebp
11f: c3 ret

000003a0 <foo2_int_val(int)>:
3a0: 55 push %ebp
3a1: a1 28 00 00 00 mov 0x28,%eax
3a6: 89 e5 mov %esp,%ebp
3a8: 8b 55 08 mov 0x8(%ebp),%edx
3ab: 5d pop %ebp
3ac: 01 d0 add %edx,%eax
3ae: a3 28 00 00 00 mov %eax,0x28
3b3: c3 ret
3b4: 8d b6 00 00 00 00 lea 0x0(%esi),%esi
3ba: 8d bf 00 00 00 00 lea 0x0(%edi),%edi

000003c0 <foo2_int_ref(int&)>:
3c0: 55 push %ebp
3c1: 89 e5 mov %esp,%ebp
3c3: 8b 45 08 mov 0x8(%ebp),%eax
3c6: 5d pop %ebp
3c7: 8b 00 mov (%eax),%eax
3c9: 01 05 28 00 00 00 add %eax,0x28
3cf: c3 ret

6. Non-const & const arguments; pointers and references
=======================================================
There is no difference between the assembler code for
* foo (T&)
* foo (const T&)
* foo (T*)
* foo (const T*)
for the same built-in type T.
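One plausible reason why const gives the compiler nothing extra is that a const reference (or pointer to const) does not make the referenced object immutable. The same object may still be modified through another alias, so the compiler cannot assume the value is unchanged. This is an explanation rather than something measured in these tests. The following is a sketch with a hypothetical function.

```cpp
// `r` is const only from this function's point of view: the object it
// refers to may still be modified through another path, here `p`.
int read_twice(const int& r, int* p) {
    int first = r;     // first read through the const reference
    *p += 1;           // may write to the very object r refers to
    return first + r;  // so r must be re-read here, not reused from `first`
}
```

Calling `read_twice(v, &v)` with `v == 10` returns 21, not 20. The compiler therefore cannot cache the value read through a `const int&` across the write.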

7. Inlining and non-inlining (actual calling functions)
=======================================================
At 'No optimization', 'Optimization O1' and 'Optimization O2'
the tested functions are actually called (not inlined);
at 'Optimization O3' they are inlined.

==========================
Alex Vinokur
mailto:ale...@connect.to
------------------------
http://lists.sourceforge.net/lists/listinfo/cpp-perfometer-users
news://news.gmane.org/gmane.comp.lang.c++.perfometer
==========================
