Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

What practice can get speed improvement in C++?

262 views
Skip to first unread message

fl

unread,
Jul 14, 2015, 7:53:43 PM7/14/15
to
Hi,

I am given the following question:



Experience in C/C++ code optimization for speed improvement.



I can only get the following answers:

1. Use reference to replace a direct parameter in function call;
2. Use register for some frequent variables;
3. Use assembly code to replace C code
4. Unroll a loop(?)


What is your opinion to the question?

Thanks,

Barry Schwarz

unread,
Jul 14, 2015, 8:24:11 PM7/14/15
to
On Tue, 14 Jul 2015 16:53:30 -0700 (PDT), fl <rxj...@gmail.com>
wrote:

>Hi,
>
>I am given the following question:
>
>
>
>Experience in C/C++ code optimization for speed improvement.
>
>
>
>I can only get the following answers:
>
>1. Use reference to replace a direct parameter in function call;

Benefit is realized only when argument is a very large aggregate or
function is called very frequently.

>2. Use register for some frequent variables;

Compiler is free to ignore suggested storage class.

>3. Use assembly code to replace C code

No longer C or C++ and a maintenance nightmare.

>4. Unroll a loop(?)

Is this practical if the iteration count is not constant.

>What is your opinion to the question?

The best way of improving speed is to select or design a speed
efficient algorithm. For a trivial example, computing factorials will
be faster in a loop than using recursion.

--
Remove del for email

Victor Bazarov

unread,
Jul 14, 2015, 8:29:51 PM7/14/15
to
On 7/14/2015 7:53 PM, fl wrote:
> I am given the following question:
>
>
>
> Experience in C/C++ code optimization for speed improvement.

That's not a question. AFAIK, a question starts with an auxiliary verb
or with a special word like "where" or "who" or "what" or "how", and
ends with a question mark ("?"). Here you have a directive or an
incomplete sentence.

> I can only get the following answers:
>
> 1. Use reference to replace a direct parameter in function call;
> 2. Use register for some frequent variables;
> 3. Use assembly code to replace C code
> 4. Unroll a loop(?)
>
>
> What is your opinion to the question?

My opinion is that the questioner doesn't know the subject.

0. There is no such thing as C/C++.

As for the choices for optimizing the speed, 1 and 4 are viable. Using
'register' is but a hint to the compiler. Choice 3 seems nothing to do
with C++ (the topic of this newsgroup).

V
--
I do not respond to top-posted replies, please don't ask

woodb...@gmail.com

unread,
Jul 14, 2015, 10:50:43 PM7/14/15
to
There's an asm keyword, but I don't think it gets used
much.

The OP may wish to consider some containers in Boost
that outperform those in the C++ standard.

Brian
Ebenezer Enterprises - So far G-d has helped us.
http://webEbenezer.net

Alf P. Steinbach

unread,
Jul 14, 2015, 11:07:25 PM7/14/15
to
On 15-Jul-15 1:53 AM, fl wrote:
>
> I am given the following question:
>
> Experience in C/C++ code optimization for speed improvement.

I read that as, "list your experience with speed optimization for C and
C++ code".


> I can only get the following answers:
>
> 1. Use reference to replace a direct parameter in function call;
> 2. Use register for some frequent variables;
> 3. Use assembly code to replace C code
> 4. Unroll a loop(?)
>
> What is your opinion to the question?

First, it's about particular programming languages, so presumably if you
have experience with MEASURING, which is the most important thing, then
you are expected to relate something of how you did that differently for
the two mentioned languages, than for some other possible languages. I
think you would be expected to mention some common tools and approaches
for this, such as profiler, Unix-land `time` command, instrumenting code
with timers and logging, and just dead reckoning (which can tell you if
other ways are necessary). Regarding timers it would be nice if you know
about the very limited resolution of C library time in Windows, and
knowing about higher resolution C++11 timers, and about Boost timers.

Second, after measuring comes algorithmic considerations. Replacing an
O(n^2) algorithm with an O(n) algorithm is common. E.g. I think it would
count in a positive way if you mentioned how you once optimized some
repeated string concatenation, perhaps in Java. Knowing about support
for this in C++11 would be extra point.

Third, simple mechanical optimizations like you mention above. But your
point (2), "Use register for some frequent variables", would count
NEGATIVELY if I were to evaluate the answer, unless this is about an
application to become part of a compiler development team (for ordinary
app code you should leave that to the compiler: forcing it you're likely
to do worse). On the other hand, knowing that the "register" keyword was
deprecated in C++11 would be positive.


Cheers & hth.,

- Alf

--
Using Thunderbird as Usenet client, Eternal September as NNTP server.

Lőrinczy Zsigmond

unread,
Jul 15, 2015, 12:38:36 AM7/15/15
to
Forget it. 'Smart tricks' won't give you more than a few percent.
Plus kindly remember:
Premature optimization is the root of all evil

Paavo Helde

unread,
Jul 15, 2015, 1:22:54 AM7/15/15
to
fl <rxj...@gmail.com> wrote in news:aaa4556e-8239-412c-be9d-7308a1a2cbb6
@googlegroups.com:

> Hi,
>
> I am given the following question:
>
>
>
> Experience in C/C++ code optimization for speed improvement.
>

This is not a question. If this is a requirement for a new hire, then I
would think this means the ability to run a profiler like
valgrind+callgrind, ability to interpret results and ability to modify the
code to get rid of bottlenecks found this way.

hth
Paavo

bartekltg

unread,
Jul 15, 2015, 9:10:48 AM7/15/15
to
On 15.07.2015 02:23, Barry Schwarz wrote:
> On Tue, 14 Jul 2015 16:53:30 -0700 (PDT), fl <rxj...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I am given the following question:
>>
>>
>>
>> Experience in C/C++ code optimization for speed improvement.
>>
>>
>>
>> I can only get the following answers:
>>
>> 1. Use reference to replace a direct parameter in function call;
>
> Benefit is realized only when argument is a very large aggregate or
> function is called very frequently.


T& and const T& is quite good habit. In most cases it will
be used with containers. And if we use it for small types
(for example because we write const T & in template, and then
use if with 'int') compiler regardless will copy value.

>> 2. Use register for some frequent variables;
>
> Compiler is free to ignore suggested storage class.

Even worse.
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4296.pdf
D.2 register keyword [depr.register]
1 The use of the register keyword as a storage-class-specifier (7.1.1)
is deprecated.



>> 3. Use assembly code to replace C code
>
> No longer C or C++ and a maintenance nightmare.

>> 4. Unroll a loop(?)
>
> Is this practical if the iteration count is not constant.
>
>> What is your opinion to the question?
>
> The best way of improving speed is to select or design a speed
> efficient algorithm. For a trivial example, computing factorials will
> be faster in a loop than using recursion.

IMHO it shoud be threated as the same algorithm, just different
execution. I think changing recursion to loops, aside from
oblivious cases like factorial, is premature optimalization.
And in simple cases it change nothing, compiler will change
it for us:

c++ code:
int factorial (int x){
if (x>1) return x*factorial(x-1);
else return x;
}

asm generated by gcc

.type _Z9factoriali, @function
_Z9factoriali:
.LFB1879:
.cfi_startproc
cmpl $1, %edi
jle .L50
movl $1, %edx
.p2align 4,,10
.p2align 3
.L49:
leal -1(%rdi), %eax
imull %edi, %edx
cmpl $1, %eax
movl %eax, %edi
jne .L49
.L48:
imull %edx, %eax
ret
.L50:
movl %edi, %eax
movl $1, %edx
jmp .L48


Just a simple loop;-)

bartekltg





JiiPee

unread,
Jul 16, 2015, 12:01:03 AM7/16/15
to
On 15/07/2015 14:10, bartekltg wrote:
> On 15.07.2015 02:23, Barry Schwarz wrote:
>
>
>
>>
>>> What is your opinion to the question?
>>
>> The best way of improving speed is to select or design a speed
>> efficient algorithm. For a trivial example, computing factorials will
>> be faster in a loop than using recursion.
>
> IMHO it shoud be threated as the same algorithm, just different
> execution. I think changing recursion to loops, aside from
> oblivious cases like factorial, is premature optimalization.
> And in simple cases it change nothing, compiler will change
> it for us:
>
> c++ code:
> int factorial (int x){
> if (x>1) return x*factorial(x-1);
> else return x;
> }

factorial maybe not the best example as I would do it using constexpr:
constexpr long long fac(int k)
{
return (k == 0) ? 1 : k * fac(k - 1);
}
giving an instant answer (calculated compile time).

Öö Tiib

unread,
Jul 16, 2015, 4:04:26 AM7/16/15
to
Yes, factorial is always terrible example. Even see your own code.
No checks of limits? You prefer to document that 'k > 20' and
'k < 0' are undefined behavior? People usually prefer (possibly
compile time) diagnostics to such documentation.

Even IEEE double precision can't contain 23! accurately and will
overflow at 171! so usefulness of factorial is rather narrowly
limited.

bartekltg

unread,
Jul 16, 2015, 7:23:56 AM7/16/15
to
Sure...

main(){
big_int x;
cin>>c;
cout<<fac(x);
}

;-)

Also, instead int we have for example arthmetic modulo 1 178 973 853,
so x can be too bit to put every possible answer in a array.

bartekltg




fl

unread,
Jul 16, 2015, 7:34:57 AM7/16/15
to
On Tuesday, July 14, 2015 at 10:22:54 PM UTC-7, Paavo Helde wrote:
> fl <rx*****8...@gmail.com> wrote in news:aaa4556e-8239-412c-be9d-7308a1a2cbb6
Thanks for the replies. The question is from a new hire. The position is about
programming for embedded RTOS system, although the code may be for the test
bench (less embedded in some sense). I code a lot at the lower level. It is
the first time I know valgrind+callgrind.

Öö Tiib

unread,
Jul 16, 2015, 9:54:34 AM7/16/15
to
My psychic abilities fail me ... what is here 'big_int' and what is 'c'
and is the 'fac' here called with uninitialized argument?


> Also, instead int we have for example arthmetic modulo 1 178 973 853,
> so x can be too bit to put every possible answer in a array.

Who suggested to put answers to array? In both functions
all checks are missing but neither puts anything to array.

bartekltg

unread,
Jul 16, 2015, 11:30:30 AM7/16/15
to
A type. It can be uint64_t, uint1024_t from boost or relay
big integer from GMP. What do you like and need. It doesn't matter.

> and is the 'fac' here called with uninitialized argument?

Sorry, a typo, should be
cin>>x;

The point is, constexpr will help us when... the argument is constant;-)
In the other case, it is just a function.


>
>> Also, instead int we have for example arthmetic modulo 1 178 973 853,
>> so x can be too bit to put every possible answer in a array.
>
> Who suggested to put answers to array?In both functions


I just skip two post:

I: constexpr cant help with nonconst.
somebody: you can put answers in a small array, small because
21! is outside range of uint64_t.
I: for factorial yes, but not always. Look at modified problem
(factorial modulo).

> all checks are missing but neither puts anything to array.

It doesn't matter, computing "factorial of integer munbers in modulo
arithmetic" is quite artificial problem. This is only a simple example.

The point is: a simple recursion will be optimized to loop. For more
complex case it is hard. For example you can write quicksort as a loop
(using small stack for division points), it isn't even that hard, but
still most implementations (including standard library) use recursion.
Can you write extended euclidean algorithm (simple algorithm from
school:) as loop? I think of course you can, but it won't be pretty
and wou would waste too much time.

Of course, if you see something like factorial, you shouldn't event
think and write it as a loop. But in other case, wasting time on
rewriting algorithm as a loop I consider premature optimalization.

Of course, there is always an exception. For GPU you probably start
from removing all recursions ;-)

bartekltg





JiiPee

unread,
Jul 16, 2015, 11:48:32 AM7/16/15
to
I did not mean it to be perfectly done (just the idea), but you can
easily change it to something like:

constexpr long long unsafeFac(int k)
{
return (k == 0) ? 1 : k * fac(k - 1);
}

constexpr long long fac(int k)
{
return (k < 0 || k > 15) ?|||throw| |exception()| : unsafeFac(k);
}

Now it complains *compile* time the error (which is definitely better than a runtime version) if k is too big or too small.




>

JiiPee

unread,
Jul 16, 2015, 11:53:01 AM7/16/15
to
On 16/07/2015 16:48, JiiPee wrote:
> On 16/07/2015 09:04, Öö Tiib wrote:
>> On Thursday, 16 July 2015 07:01:03 UTC+3, JiiPee wrote:
>>> c++ code:
>>> int factorial (int x){
>>> if (x>1) return x*factorial(x-1);
>>> else return x;
>>> }
>>> factorial maybe not the best example as I would do it using constexpr:
>>> constexpr long long fac(int k)
>>> {
>>> return (k == 0) ? 1 : k * fac(k - 1);
>>> }
>>> giving an instant answer (calculated compile time).
>> Yes, factorial is always terrible example. Even see your own code.
>> No checks of limits? You prefer to document that 'k > 20' and
>> 'k < 0' are undefined behavior? People usually prefer (possibly
>> compile time) diagnostics to such documentation.
>>
>> Even IEEE double precision can't contain 23! accurately and will
>> overflow at 171! so usefulness of factorial is rather narrowly
>> limited.
>
> I did not mean it to be perfectly done (just the idea), but you can
> easily change it to something like:
>
> constexpr long long unsafeFac(int k)
> {
> return (k == 0) ? 1 : k * fac(k - 1);
> }
>
> constexpr long long fac(int k)
> {
> return (k < 0 || k > 15) ?|||throw| |exception()| : unsafeFac(k);
> }

ups, letters messed up, should be:

Öö Tiib

unread,
Jul 16, 2015, 2:16:40 PM7/16/15
to
Factorial algorithms for arbitrary precision integer are different beast
entirely. Those can take some time.

> > and is the 'fac' here called with uninitialized argument?
>
> Sorry, a typo, should be
> cin>>x;
>
> The point is, constexpr will help us when... the argument is constant;-)
> In the other case, it is just a function.

Yes, that is how 'constexpr' functions work.

> >> Also, instead int we have for example arthmetic modulo 1 178 973 853,
> >> so x can be too bit to put every possible answer in a array.
> >
> > Who suggested to put answers to array?In both functions
>
>
> I just skip two post:
>
> I: constexpr cant help with nonconst.
> somebody: you can put answers in a small array, small because
> 21! is outside range of uint64_t.
> I: for factorial yes, but not always. Look at modified problem
> (factorial modulo).

Ok, indeed there are no silver bullets.

> > all checks are missing but neither puts anything to array.
>
> It doesn't matter, computing "factorial of integer munbers in modulo
> arithmetic" is quite artificial problem. This is only a simple example.

Ok.

> The point is: a simple recursion will be optimized to loop. For more
> complex case it is hard. For example you can write quicksort as a loop
> (using small stack for division points), it isn't even that hard, but
> still most implementations (including standard library) use recursion.
> Can you write extended euclidean algorithm (simple algorithm from
> school:) as loop?

It has been usually described as simple loop? Pseudo-code from wikipedia:

function extended_gcd(a, b)
s := 0; old_s := 1
t := 1; old_t := 0
r := b; old_r := a
while r ≠ 0
quotient := old_r div r
(old_r, r) := (r, old_r - quotient * r)
(old_s, s) := (s, old_s - quotient * s)
(old_t, t) := (t, old_t - quotient * t)
output "Bézout coefficients:", (old_s, old_t)
output "greatest common divisor:", old_r
output "quotients by the gcd:", (t, s)


> I think of course you can, but it won't be pretty
> and wou would waste too much time.

Might be you mix something up here ... it is simple algorithm.

> Of course, if you see something like factorial, you shouldn't event
> think and write it as a loop. But in other case, wasting time on
> rewriting algorithm as a loop I consider premature optimalization.

Depends how large factorials and how often are really need.
Calculating 10,000,000! naively will take minutes and there are
ways to optimize that:
http://www.cs.berkeley.edu/~fateman/papers/factorial.pdf
Sorry it is lisp, but the major number crunchers tend to love lisp.

bartekltg

unread,
Jul 17, 2015, 11:06:47 AM7/17/15
to
On 16.07.2015 20:16, Öö Tiib wrote:
>> A type. It can be uint64_t, uint1024_t from boost or relay
>> big integer from GMP. What do you like and need. It doesn't matter.
>
> Factorial algorithms for arbitrary precision integer are different beast
> entirely. Those can take some time.

I have never saw one.



>> The point is: a simple recursion will be optimized to loop. For more
>> complex case it is hard. For example you can write quicksort as a loop
>> (using small stack for division points), it isn't even that hard, but
>> still most implementations (including standard library) use recursion.
>> Can you write extended euclidean algorithm (simple algorithm from
>> school:) as loop?
>
> It has been usually described as simple loop? Pseudo-code from wikipedia:
>
> function extended_gcd(a, b)
> s := 0; old_s := 1
> t := 1; old_t := 0
> r := b; old_r := a
> while r ≠ 0
> quotient := old_r div r
> (old_r, r) := (r, old_r - quotient * r)
> (old_s, s) := (s, old_s - quotient * s)
> (old_t, t) := (t, old_t - quotient * t)
> output "Bézout coefficients:", (old_s, old_t)
> output "greatest common divisor:", old_r
> output "quotients by the gcd:", (t, s)

Heh, dementia;-] I was sure wiki have recursive version,
what other source could I have used to write that (as an
argument for usefulness of tuple) ;-)

tuple<int, int, int> egcd(int a,int b)
{ //return {gcd, x, y}; a*x+b*y = gcd;

if (b==0)
return make_tuple(a,1,0);
else
{
int g,x,y;
tie(g,y,x) = egcd(b,a%b);
return make_tuple(g,x, y - ((a / b)) * x);
}
}

Getting the first version from the second isn't as hard a I thought,
but still not trivial.


>> I think of course you can, but it won't be pretty
>> and wou would waste too much time.
>
> Might be you mix something up here ... it is simple algorithm.

No, its my fault, I didn't check and in my small math 'forfun'
library I used recursion.

>
>> Of course, if you see something like factorial, you shouldn't event
>> think and write it as a loop. But in other case, wasting time on
>> rewriting algorithm as a loop I consider premature optimalization.
>
> Depends how large factorials and how often are really need.
> Calculating 10,000,000! naively will take minutes and there are
> ways to optimize that:
> http://www.cs.berkeley.edu/~fateman/papers/factorial.pdf

Interesting sport. I like it ;-)


> Sorry it is lisp, but the major number crunchers tend to love lisp.

)))))))))))))))

If you work with number theory you have to have enough
"RAM" in brain to track all brackets without effort :)

bartekltg

Martijn van Buul

unread,
Jul 21, 2015, 5:28:26 AM7/21/15
to
* fl:
> On Tuesday, July 14, 2015 at 10:22:54 PM UTC-7, Paavo Helde wrote:
>> fl <rx*****8...@gmail.com> wrote in news:aaa4556e-8239-412c-be9d-7308a1a2cbb6
>> @googlegroups.com:
>> >
>> > Experience in C/C++ code optimization for speed improvement.
>> >
>>
>> This is not a question. If this is a requirement for a new hire, then I
>> would think this means the ability to run a profiler like
>> valgrind+callgrind, ability to interpret results and ability to modify the
>> code to get rid of bottlenecks found this way.
>
> Thanks for the replies. The question is from a new hire. The position is
> about programming for embedded RTOS system, although the code may be for the
> test bench (less embedded in some sense). I code a lot at the lower level. It
> is the first time I know valgrind+callgrind.

Programming for an RTOS requires a different mindset every now and then,
depending on the real-time restrictions and the platform. For example,
the *maximum* runtime requirements of an algorithm is often more important
than the *average* requirement, because the first is a limiting factor on
response time. Available memory and processor resources often don't scale as
easily, and some C++ features (or parts of the standard library) just don't
combine well with a hard real-time behaviour.

Often, the mere fact that you use an RTOS has implications beyond the
obvious. On modern desktop and server operating systems, a stack overflow
is almost unheard of, but on RTOS systems, stack space is usually *very*
limited, so you need to be aware of deeply recursive routines, or large objects
placed on the stack. Deadlocks are often more fatal; if your browser stalls for
a couple of seconds it is annoying; if the ECU of a car stalls it can lead to a
deadly accident.

Granted, this is all about "how to write your algorithm" and "what pitfalls
to avoid" rather than "what tricks to use to get a perceived boost", and
personally, I don't think "speed improvement" covers anything of the required
skills to sucessfully handle real-time programming.

--
Martijn van Buul - pi...@dohd.org

Juha Nieminen

unread,
Aug 3, 2015, 4:16:37 AM8/3/15
to
fl <rxj...@gmail.com> wrote:
> 1. Use reference to replace a direct parameter in function call;

Depends on the parameter type.

> 2. Use register for some frequent variables;
> 3. Use assembly code to replace C code
> 4. Unroll a loop(?)

Those are completely useless and detrimental.

The major design aspect that affects speed is algorithm and data
container choice and implementation. You can "register" and "unroll"
all you like, but if your algorithm or data container is improper for
the task, they won't help you a bit.

If you then want to go to "hacker optimization", then you have to
minimize the amount of dynamic allocations made in the program
(especially those made in a tight inner loop). If instead of a million
individual allocations you can perform just one, your program will
become faster.

C I/O functions, while less safe and less versatile, tend to be faster
than the C++ equivalents. If you can use, for instance, std::fread()
to read a large file, it will be faster than using C++ streams.

If you are number-crunching, then cache locality can have an enormous
effect on speed, but that's a rather advanced topic.

--- news://freenews.netfront.net/ - complaints: ne...@netfront.net ---

pip010

unread,
Aug 3, 2015, 7:22:59 AM8/3/15
to
most comments are right on!
just my addition:
1) mostly useless, c++11 compiler will move so even pointless for BigInt
2) as rightfully some people already pointed out, forget about it.
3) hardly a portable solution and hard times getting it right left alone better than the compiler
4) it is done by the optimization step of compiler, I think at least O2 but see compiler docs; most modern compilers will unroll and/or vectorize for loops

Always think of improving your design first to jump on low-level optimization techniques.
Profiling can help alot but only in debug :(
to truly profile you need special build release+pdb and things progressively become more complicated.
You can always measure though :) DO SO!

woodb...@gmail.com

unread,
Aug 3, 2015, 4:48:10 PM8/3/15
to
On Monday, August 3, 2015 at 3:16:37 AM UTC-5, Juha Nieminen wrote:

>
> Those are completely useless and detrimental.
>
> The major design aspect that affects speed is algorithm and data
> container choice and implementation. You can "register" and "unroll"
> all you like, but if your algorithm or data container is improper for
> the task, they won't help you a bit.
>
> If you then want to go to "hacker optimization", then you have to
> minimize the amount of dynamic allocations made in the program
> (especially those made in a tight inner loop). If instead of a million
> individual allocations you can perform just one, your program will
> become faster.

I think there's a place for std::string, but I use it less
than previously. I guess it's back to the future with arrays
of char.

Juha Nieminen

unread,
Aug 4, 2015, 4:21:29 AM8/4/15
to
woodb...@gmail.com wrote:
> I think there's a place for std::string, but I use it less
> than previously. I guess it's back to the future with arrays
> of char.

How would an array of chars be faster than std::string? Unless you are
using a static array (or somehow reusing the same allocated array,
which becomes a bit complicated)?

(Also note that the major compiler implementations are moving to
short string optimization, which means that if the majority of your
strings are short, then std::string will probably be faster than
using raw C strings.)

Öö Tiib

unread,
Aug 4, 2015, 11:07:24 AM8/4/15
to
On Tuesday, 4 August 2015 11:21:29 UTC+3, Juha Nieminen wrote:
> woodb...@gmail.com wrote:
> > I think there's a place for std::string, but I use it less
> > than previously. I guess it's back to the future with arrays
> > of char.
>
> How would an array of chars be faster than std::string?

Array can't but sometimes we can win a lot by using subranges of array.

Essentially what you wrote two posts ago: "If instead of a million individual
allocations you can perform just one, your program will become faster." ;-)

There are proposals about adding helpers to C++:
'string_view' from Łukasz Mendakiewicz and Herb Sutter https://isocpp.org/files/papers/N3762.html
'array_view' From Jeffrey Yasskin http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3851.pdf

> Unless you are
> using a static array (or somehow reusing the same allocated array,
> which becomes a bit complicated)?

Lot of text processing that uses 'string' will actually win same done
using 'char*'. It is because the most simple operations like usage of
'std::string::append' versus 'std::strcat' tend to be somewhat faster.

> (Also note that the major compiler implementations are moving to
> short string optimization, which means that if the majority of your
> strings are short, then std::string will probably be faster than
> using raw C strings.)

64 bit implementation can optimize 22 chars plus trailing null
as "short". If something like 'string_view' will be added to language
as well then the last little excuse left for raw 'char*' usage in C++
is 'argv' of 'main'. ;-)

JiiPee

unread,
Aug 4, 2015, 11:14:41 AM8/4/15
to
#include <iostream>
#include <exception>

constexpr long long unsafeFac(int k)
{
return (k == 0) ? 1 : k * unsafeFac(k - 1);
}

constexpr long long fac(int k)
{
return (k < 0 || k > 15) ? throw std::exception() : unsafeFac(k);
}

int main()
{
uint64_t x;
std::cin >> x;
try
{
std::cout << fac(x);
}
catch (std::exception& e)
{
std::cerr << "factorial must be [0-15] " << e.what() << '\n';
}

return 0;
}

woodb...@gmail.com

unread,
Aug 4, 2015, 1:15:17 PM8/4/15
to
On Tuesday, August 4, 2015 at 3:21:29 AM UTC-5, Juha Nieminen wrote:
> woodb...@gmail.com wrote:
> > I think there's a place for std::string, but I use it less
> > than previously. I guess it's back to the future with arrays
> > of char.
>
> How would an array of chars be faster than std::string? Unless you are
> using a static array (or somehow reusing the same allocated array,
> which becomes a bit complicated)?
>

This is interesting

http://stackoverflow.com/questions/21946447/how-much-performance-difference-when-using-string-vs-char-array

> (Also note that the major compiler implementations are moving to
> short string optimization, which means that if the majority of your
> strings are short, then std::string will probably be faster than
> using raw C strings.)
>

I think that's limited to about 20 characters.

Brian
Ebenezer Enterprises - "America didn't create religious liberty;
religious liberty created America." Bobby Jindal

http://webEbenezer.net

Bo Persson

unread,
Aug 4, 2015, 2:10:21 PM8/4/15
to
On 2015-08-04 19:15, woodb...@gmail.com wrote:
> On Tuesday, August 4, 2015 at 3:21:29 AM UTC-5, Juha Nieminen wrote:
>> woodb...@gmail.com wrote:
>>> I think there's a place for std::string, but I use it less
>>> than previously. I guess it's back to the future with arrays
>>> of char.
>>
>> How would an array of chars be faster than std::string? Unless you are
>> using a static array (or somehow reusing the same allocated array,
>> which becomes a bit complicated)?
>>
>
> This is interesting
>
> http://stackoverflow.com/questions/21946447/how-much-performance-difference-when-using-string-vs-char-array
>

Yes, like one of the replies notices - using sprintf to build filenames
will let you start opening the file 0.12 microseconds earlier.

Wanna bet if someone will notice the speedup?


Bo Persson


Christopher Pisz

unread,
Aug 4, 2015, 3:09:19 PM8/4/15
to
Like all the flawed comparisons, notice they never reserve even though
the length is known beforehand.


--
I have chosen to troll filter/ignore all subthreads containing the
words: "Rick C. Hodgins", "Flibble", and "Islam"
So, I won't be able to see or respond to any such messages
---

woodb...@gmail.com

unread,
Aug 4, 2015, 3:18:47 PM8/4/15
to
On Tuesday, August 4, 2015 at 2:09:19 PM UTC-5, Christopher Pisz wrote:
> On 8/4/2015 12:15 PM, woodb...@gmail.com wrote:
> > On Tuesday, August 4, 2015 at 3:21:29 AM UTC-5, Juha Nieminen wrote:
> >> woodb...@gmail.com wrote:
> >>> I think there's a place for std::string, but I use it less
> >>> than previously. I guess it's back to the future with arrays
> >>> of char.
> >>
> >> How would an array of chars be faster than std::string? Unless you are
> >> using a static array (or somehow reusing the same allocated array,
> >> which becomes a bit complicated)?
> >>
> >
> > This is interesting
> >
> > http://stackoverflow.com/questions/21946447/how-much-performance-difference-when-using-string-vs-char-array
> >
> >> (Also note that the major compiler implementations are moving to
> >> short string optimization, which means that if the majority of your
> >> strings are short, then std::string will probably be faster than
> >> using raw C strings.)
> >>
> >
> > I think that's limited to about 20 characters.
> >
> > Brian
> > Ebenezer Enterprises - "America didn't create religious liberty;
> > religious liberty created America." Bobby Jindal
> >
> > http://webEbenezer.net
> >
>
>
> Like all the flawed comparisons, notice they never reserve even though
> the length is known beforehand.
>
>

He makes a comment about that on one of the answers:

"I even tried to pre-declare and allocate space in the string in my answer, but that actually just caused things to slow down."

Brian
Ebenezer Enterprises - In G-d we trust.
http://webEbenezer.net

woodb...@gmail.com

unread,
Aug 4, 2015, 3:38:59 PM8/4/15
to
I think it matters in aggregate. Programmers generally work on
processes that run periodically/frequently. And maybe you change
a number of std::string objects to arrays of char in your application.


Brian
Ebenezer Enterprises - If G-d has you on a short leash, every
little bit helps.

http://webEbenezer.net

Christopher Pisz

unread,
Aug 4, 2015, 3:51:11 PM8/4/15
to
I had no such trouble:


// Shared Library
#include "PerformanceTimer.h"

// Standard Library
#include <cstdio>
#include <iostream>
#include <string>
#include <vector>
#include <cmath>

using namespace std;
#define TRIALS 10000000


int main()
{

vector<double> secondsElapsedPerRun;

//const char * baseLocation = "baseLocation";
const string baseLocation = "baseLocation";

// Let's declare them both before hand rather than just the C version
// Let's also use std::string::reserve just like we supply the
length of the c-array
char fname[255] = {};
//string fname;
//fname.reserve(255);

Shared::PerformanceTimer timer;
for (int i = 0; i < TRIALS; ++i)
{
timer.Start();

_snprintf_s(fname, 255, "%s_test_no.%d.txt",
baseLocation.c_str(), i);
//fname = baseLocation + "_test_no." + std::to_string(i) + ".txt";

secondsElapsedPerRun.push_back(timer.Stop());
}

// Calculate sum
double sum = 0.0;

for(vector<double>::iterator vecIter =
secondsElapsedPerRun.begin(); vecIter != secondsElapsedPerRun.end();
++vecIter)
{
sum+= *vecIter;
}

double average = sum / static_cast<double>(TRIALS);
cout << "Average: " << average << " seconds" << endl;

// Calculate variance
double variance = 0;

for(vector<double>::iterator vecIter =
secondsElapsedPerRun.begin(); vecIter != secondsElapsedPerRun.end();
++vecIter)
{
variance += (*vecIter - average) * (*vecIter - average);
}

variance /= static_cast<double>(TRIALS);

cout << "Variance: " << variance << " seconds" << endl;
cout << "Std. deviation: " << sqrt(variance) << " seconds" << endl;

// Windows specific
system("pause");
}




C++ string
Average: 1.21634e-006 seconds
Variance: 5.56171e-013 seconds
Std. deviation: 7.45769e-007 seconds

C string
Average: 6.3834e-007 seconds
Variance: 4.31959e-012 seconds
Std. deviation: 2.07836e-006 seconds

Seems a bit more fair.

Juha Nieminen

unread,
Aug 5, 2015, 4:12:27 AM8/5/15
to
They are not doing the same thing. Formatted printing into a static buffer
is not the same thing as building a dynamic string by appending stuff.

If std::sprintf() is sufficient for the task at hand, then definitely
use that. (After all, I did mention in my post that C I/O functions
tend to be more efficient than the C++ equivalents.)

Now, if you were doing the *same* thing in both cases, however, things
may be different. (For one, appending to a C string ought to be slower
than appending to a std::string, if for nothing else, then because
in the former case knowing were to append is an O(n) operation. And
of course dynamically growing the string as needed is a PitA using

Öö Tiib

unread,
Aug 5, 2015, 6:55:32 AM8/5/15
to
Someone might notice something other but speedup however.
The code in question ignores return value of 'snprintf'. It does not check if
it is less than 255. It does not allocate bigger buffer if it isn't. So most
important customer who has the application installed in a way that path
to file exceeds the limits of 'fname' will notice that the junk we sold
him does strange things and crashes. ;-)


Öö Tiib

unread,
Aug 5, 2015, 7:11:26 AM8/5/15
to
It is because the 'string::string(string&&)' called in benchmark does move
and if to replace it with 'string::operator=(string&&)' then it will move
anyway (and so release the pointlessly 'reserve'd buffer). 'string::reserve'
has point if you 'append' or '+=' to it.

The whole benchmark is a joke, the outcome of it is passed
nowhere and the work in 'for' cycle is pointless 'nop' in essence.
No wonder since real software rarely needs sliced to 254
characters pieces of texts.

Martin Shobe

unread,
Aug 5, 2015, 9:54:03 AM8/5/15
to
On 8/5/2015 3:12 AM, Juha Nieminen wrote:
> woodb...@gmail.com wrote:
>> This is interesting
>>
>> http://stackoverflow.com/questions/21946447/how-much-performance-difference-when-using-string-vs-char-array
>
> They are not doing the same thing. Formatted printing into a static buffer
> is not the same thing as building a dynamic string by appending stuff.
>
> If std::sprintf() is sufficient for the task at hand, then definitely
> use that. (After all, I did mention in my post that C I/O functions
> tend to be more efficient than the C++ equivalents.)
>

Just for fun, I run the tests above using Visual Studio 2013 (I used the
default release options and _snprintf instead of snprintf. I also added
a test for stringstream performance.) and got the following results.

Length is the length of the string in baseLocation. stream is the
average time to build the filename using a stringstream. reserve is the
average time using a string with a reserved buffer. string is using a
string without using a reserved buffer. Array is the c-style array.

length stream reserve string array
12 8.6122 3.7423 4.4735 4.6979 microseconds
78 7.8117 3.7714 4.4523 6.3554 microseconds

Which I think provides evidence for the maxim about measuring.

Martin Shobe

Bo Persson

unread,
Aug 5, 2015, 11:03:59 AM8/5/15
to
On 2015-08-04 21:38, woodb...@gmail.com wrote:
> On Tuesday, August 4, 2015 at 1:10:21 PM UTC-5, Bo Persson wrote:
>> On 2015-08-04 19:15, woodb...@gmail.com wrote:
>>>>
>>>
>>> This is interesting
>>>
>>> http://stackoverflow.com/questions/21946447/how-much-performance-difference-when-using-string-vs-char-array
>>>
>>
>> Yes, like one of the replies notices - using sprintf to build filenames
>> will let you start opening the file 0.12 microseconds earlier.
>>
>> Wanna bet if someone will notice the speedup?
>>
>
> I think it matters in aggregate. Programmers generally work on
> processes that run periodically/frequently. And maybe you change
> a number of std::string objects to arrays of char in your application.
>

And you don't think that actually opening the file will hide the
difference (being fractions of a microsecond)?


Bo Persson


woodb...@gmail.com

unread,
Aug 6, 2015, 1:24:36 AM8/6/15
to
On Wednesday, August 5, 2015 at 3:12:27 AM UTC-5, Juha Nieminen wrote:
> woodb...@gmail.com wrote:
> > This is interesting
> >
> > http://stackoverflow.com/questions/21946447/how-much-performance-difference-when-using-string-vs-char-array
>
> They are not doing the same thing. Formatted printing into a static buffer
> is not the same thing as building a dynamic string by appending stuff.
>

How about comparing these two programs:

#include <string.h>

int main ()
{
for (int i = 0; i < 4000000; ++i){
char s[200];
::strncpy(s, "We few; we happy few; we band of brothers.", sizeof(s));
}
}

and

#include <string>

int main ()
{
for (int i = 0; i < 4000000; ++i){
::std::string s = "We few; we happy few; we band of brothers.";
}
}

uname -a
Linux localhost.localdomain 4.0.4-301.fc22.x86_64 #1 SMP Thu May 21 13:10:33 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/5.1.1/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,objc,obj-c++,fortran,ada,go,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --disable-libgcj --with-default-libstdcxx-abi=c++98 --with-isl --enable-libmpx --enable-gnu-indirect-function --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 5.1.1 20150422 (Red Hat 5.1.1-1) (GCC)


size oldschool stdstr
text data bss dec hex filename
1504 596 4 2104 838 oldschool
1875 604 40 2519 9d7 stdstr


The std::string version is over 4 times slower than the
old school version on this machine and even slower on a
FreeBSD machine.

time ./oldschool

real 0m0.085s
user 0m0.084s
sys 0m0.002s

time ./stdstr

real 0m0.365s
user 0m0.363s
sys 0m0.003s

Ian Collins

unread,
Aug 6, 2015, 1:42:31 AM8/6/15
to
woodb...@gmail.com wrote:
>
> How about comparing these two programs:
>
> #include <string.h>
>
> int main ()
> {
> for (int i = 0; i < 4000000; ++i){
> char s[200];
> ::strncpy(s, "We few; we happy few; we band of brothers.", sizeof(s));
> }
> }
>
> and
>
> #include <string>
>
> int main ()
> {
> for (int i = 0; i < 4000000; ++i){
> ::std::string s = "We few; we happy few; we band of brothers.";
> }
> }

A fairer comparison would be

int main ()
{
std::string s;
for (int i = 0; i < 4000000; ++i){
s = "We few; we happy few; we band of brothers.";
}
}

--
Ian Collins

woodb...@gmail.com

unread,
Aug 6, 2015, 1:53:58 AM8/6/15
to
Here's the time for that version.

time ./stdstr

real 0m0.120s
user 0m0.118s
sys 0m0.001s

In most cases I can't lift a std::string out of the loop like that
though.

woodb...@gmail.com

unread,
Aug 6, 2015, 2:33:11 AM8/6/15
to
I'm doing what I can to make things run fast. This is trading
some space for some speed. That seems like a good trade off to me.
I can't do much to make opening files faster.

If you don't have a lot of money for hardware, software optimizations
can help.

Brian
Ebenezer Enterprises - Bless G-d, America.
http://webEbenezer.net

Öö Tiib

unread,
Aug 6, 2015, 3:20:59 AM8/6/15
to
We run profiler on full application. We do not write code that has no
input nor output in tight loops. Therefore such tests are deceiving.
On all cases of composing multi-megabyte XML, JSON or the like (why else
you need to copy 4 millions of texts in tight loop?) that 'strncat' and
the like lose to 'string::append' and the like.

We can win both by hand optimizing usage of bytes (in vector of bytes)
for concrete case but that does give minor returns because the real
bottle-neck is in transfer of data. So if we instead use the effort of
that raw byte level tinkering to find a way to reduce the data transferred
by half then the program is twice faster.

Christopher Pisz

unread,
Aug 6, 2015, 11:10:38 AM8/6/15
to
s.reserve(200);
for (int i = 0; i < 4000000; ++i)
{
s = "We few; we happy few; we band of brothers.";
}
}



Bo Persson

unread,
Aug 6, 2015, 11:29:38 AM8/6/15
to
Or try doing this with C strings:

int main ()
{
std::string s;
for (int i = 0; i < 4000000; ++i){
s += "We few; we happy few; we band of brothers.";
}
}



Bo Persson

woodb...@gmail.com

unread,
Aug 6, 2015, 12:39:08 PM8/6/15
to
No thanks. I don't want to totally stop using std::string, but
I have some cases where I'm not appending anything to the initial
value.

Previously I was more comfortable using std::unique_ptr than
I am now. I still use unique_ptr, but not as much as before.
This is similar with std::string.


Brian
Ebenezer Enterprises
http://webEbenezer.net

Juha Nieminen

unread,
Aug 10, 2015, 4:25:54 AM8/10/15
to
woodb...@gmail.com wrote:
> for (int i = 0; i < 4000000; ++i){
> char s[200];
> ::strncpy(s, "We few; we happy few; we band of brothers.", sizeof(s));

You are copying data from a string literal to a stack-allocated static array.
Of course it's going to be faster than allocating a string dynamically.
(And of course it should be done like that, if it suffices and if it indeed
requires the speed.)

Btw, are you sure the compiler isn't optimizing the loop away? After all,
it can see that 's' isn't doing anything. It can also see that the body
of the loop isn't using the loop variable nor has any side effects.
I wouldn't be surprised if the complier optimized the whole thing away.

For a more reliable (and fairer) comparison, change it so that the string
being built is of a length unknown at compile time, and the result is
used for something...
0 new messages