Explanation for smart pointer benchmark results?

Noah Sung

Jul 27, 2020, 5:13:05 PM
to benchmark-discuss
Hi,

I'm fairly new to Google Benchmark, and I was running some simple benchmarks comparing the construction cost of unique_ptr, shared_ptr, and raw pointers to an object of MyClass. I'm wondering why the times for raw pointers are slightly higher than those for unique_ptr; I would expect the opposite. Is there anything wrong with my implementation, or is there something else I'm missing?

--------------------------------------------------------------------
Benchmark                          Time             CPU   Iterations
--------------------------------------------------------------------
BM_UniquePtr_MyClass/1          13.2 ns         13.2 ns     52478142
BM_UniquePtr_MyClass/10          140 ns          140 ns      4999401
BM_UniquePtr_MyClass/100        1300 ns         1300 ns       532017
BM_UniquePtr_MyClass/1000      12893 ns        12893 ns        53916
BM_SharedPtr_MyClass/1          36.2 ns         36.2 ns     19350467
BM_SharedPtr_MyClass/10          353 ns          353 ns      1983883
BM_SharedPtr_MyClass/100        3548 ns         3548 ns       197663
BM_SharedPtr_MyClass/1000      35447 ns        35447 ns        19803
BM_RawPtr_MyClass/1             13.9 ns         13.9 ns     50482698
BM_RawPtr_MyClass/10             144 ns          144 ns      4846621
BM_RawPtr_MyClass/100           1336 ns         1336 ns       521874
BM_RawPtr_MyClass/1000         13323 ns        13323 ns        52649


class MyClass {
    int i = 0;
    double d = 1.1;
    char c = 'c';
    bool b = false;
};

static void BM_UniquePtr_MyClass(benchmark::State &state) {
    for (auto _ : state) {
        for (auto i = 0; i < state.range(0); i++) {
            std::unique_ptr<MyClass> m(new MyClass);
            benchmark::DoNotOptimize(m);
            m.reset();
        }
    }
}

static void BM_SharedPtr_MyClass(benchmark::State &state) {
    for (auto _ : state) {
        for (auto i = 0; i < state.range(0); i++) {
            std::shared_ptr<MyClass> m(new MyClass);
            benchmark::DoNotOptimize(m);
            m.reset();
        }
    }
}

static void BM_RawPtr_MyClass(benchmark::State &state) {
    for (auto _ : state) {
        for (auto i = 0; i < state.range(0); i++) {
            MyClass* m(new MyClass);
            benchmark::DoNotOptimize(m);
            delete m;
        }
    }
}

BENCHMARK(BM_UniquePtr_MyClass)->RangeMultiplier(10)->Range(1, 1000);
BENCHMARK(BM_SharedPtr_MyClass)->RangeMultiplier(10)->Range(1, 1000);
BENCHMARK(BM_RawPtr_MyClass)->RangeMultiplier(10)->Range(1, 1000);

Dominic Hamon

Jul 29, 2020, 9:31:48 AM
to Noah Sung, benchmark-discuss
Hi Noah

I don't see anything obviously wrong with the benchmarks, but I do wonder why you have an inner loop over `state.range(0)`. With that loop, each benchmark iteration times how long it takes to create and destroy `state.range(0)` pointers rather than just one.

I'd also add `()` after `new MyClass`: https://abseil.io/tips/146
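
For context on the `()` suggestion: `new T` default-initializes the object, while `new T()` value-initializes it. The sketch below illustrates the difference with two hypothetical types, `Plain` and `WithDefaults`, which are not from the thread. For a class like MyClass, whose members all have default member initializers, both forms should end up with the same member values, so the change is about habit and safety rather than about the timings.

```cpp
#include <cassert>
#include <memory>

// Hypothetical type with no default member initializers.
struct Plain {
    int i;
    double d;
};

// Hypothetical type that mirrors MyClass: every member has an initializer.
struct WithDefaults {
    int i = 0;
    double d = 1.1;
};

// `new Plain()` value-initializes, so the members are zeroed.
// (`new Plain` would leave them indeterminate; reading them is undefined.)
inline std::unique_ptr<Plain> make_value_initialized() {
    return std::unique_ptr<Plain>(new Plain());
}

// `new WithDefaults` still runs the default member initializers.
inline std::unique_ptr<WithDefaults> make_default_initialized() {
    return std::unique_ptr<WithDefaults>(new WithDefaults);
}

// Checks the two claims made in the comments above.
inline void check_initialization() {
    auto p = make_value_initialized();
    assert(p->i == 0 && p->d == 0.0);
    auto w = make_default_initialized();
    assert(w->i == 0 && w->d == 1.1);
}
```

Calling `check_initialization()` from a small `main` is enough to try it out.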




Noah Sung

Jul 29, 2020, 10:48:00 AM
to benchmark-discuss
I added the for loop just to see whether the time changes at all when creating multiple pointers. After removing it, the times show the raw pointer being faster than unique_ptr, as expected:
---------------------------------------------------------------
Benchmark                     Time             CPU   Iterations
---------------------------------------------------------------
BM_UniquePtr_MyClass       13.2 ns         13.2 ns     52616815
BM_SharedPtr_MyClass       35.4 ns         35.4 ns     19731845
BM_RawPtr_MyClass          13.0 ns         13.0 ns     53262794


What is also strange is that when I modify some of the members of MyClass, the raw pointer shows up as slower again. For example, when I remove the bool member from MyClass I get this:
---------------------------------------------------------------
Benchmark                     Time             CPU   Iterations
---------------------------------------------------------------
BM_UniquePtr_MyClass       12.8 ns         12.8 ns     53955283
BM_SharedPtr_MyClass       35.3 ns         35.3 ns     19789780
BM_RawPtr_MyClass          13.0 ns         13.0 ns     54141948

class MyClass {
    int i = 0;
    double d = 1.1;
    char c = 'c';
};

static void BM_UniquePtr_MyClass(benchmark::State &state) {
    for (auto _ : state) {
        std::unique_ptr<MyClass> m(new MyClass());
        benchmark::DoNotOptimize(m);
        m.reset();
    }
}

static void BM_SharedPtr_MyClass(benchmark::State &state) {
    for (auto _ : state) {
        std::shared_ptr<MyClass> m(new MyClass());
        benchmark::DoNotOptimize(m);
        m.reset();
    }
}

static void BM_RawPtr_MyClass(benchmark::State &state) {
    for (auto _ : state) {
        MyClass* m(new MyClass());
        benchmark::DoNotOptimize(m);
        delete m; 
    }
}


BENCHMARK(BM_UniquePtr_MyClass);
BENCHMARK(BM_SharedPtr_MyClass);
BENCHMARK(BM_RawPtr_MyClass);

Nathaniel Doromal

Jul 29, 2020, 12:44:55 PM
to Dominic Hamon, Noah Sung, benchmark-discuss
You might be measuring statistical noise related to vagaries in caching and warm-up. 

My expectation is that the generated assembly code should be very similar with optimization.
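
One way to sanity-check that expectation without reading assembly is to compare object sizes. The sketch below uses a hypothetical `Payload` type and assumes a mainstream standard library (libstdc++, libc++, or MSVC); the C++ standard itself does not guarantee these sizes. With the default deleter, `std::unique_ptr` is typically a single pointer, which is consistent with it compiling down to the same `new`/`delete` pair as the raw version. `std::shared_ptr` typically stores two pointers, and constructing it from `new T` also allocates a separate control block, which is consistent with its higher times in the tables.

```cpp
#include <cassert>
#include <memory>

// Hypothetical payload type, similar in spirit to MyClass.
struct Payload {
    int i = 0;
    double d = 1.1;
};

// True on mainstream implementations, not guaranteed by the standard.
constexpr bool kUniquePtrIsPointerSized =
    sizeof(std::unique_ptr<Payload>) == sizeof(Payload*);

// shared_ptr usually holds an object pointer plus a control-block pointer.
constexpr bool kSharedPtrIsLarger =
    sizeof(std::shared_ptr<Payload>) > sizeof(Payload*);

// Runtime check of both layout assumptions.
inline void check_smart_pointer_layout() {
    assert(kUniquePtrIsPointerSized);
    assert(kSharedPtrIsLarger);
}
```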

Noah Sung

Jul 30, 2020, 5:26:42 PM
to benchmark-discuss
I just wanted to post a final update to say that the benchmarks now show the expected results. I'm really not sure why I was getting the results I mentioned before. I played around with modifying MyClass, and even after reverting my code to how it was in the previous post, the benchmark results changed for whatever reason:

---------------------------------------------------------------
Benchmark                     Time             CPU   Iterations
---------------------------------------------------------------
BM_UniquePtr_MyClass       13.2 ns         13.2 ns     52542963
BM_SharedPtr_MyClass       35.4 ns         35.4 ns     19764520
BM_RawPtr_MyClass          12.8 ns         12.8 ns     53881927
 
Anyway, thanks for the help. The benchmarks seem to perform as expected now.
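
A rough way to judge whether the raw-pointer versus unique_ptr gap in this thread is real is to compare the average difference against its run-to-run spread. The sketch below is only an illustration: the helper names are hypothetical, and the four differences are taken from the single-object rows of the four result tables posted above, which were not all produced by identical code. The average difference comes out at 0.075 ns with a spread of roughly 0.49 ns, and the sign flips between runs, which is what noise would look like. Google Benchmark can also collect repeated runs itself via `->Repetitions(n)`, which reports mean, median, and stddev aggregates.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Arithmetic mean of a non-empty sample.
inline double sample_mean(const std::vector<double>& xs) {
    assert(!xs.empty());
    double sum = 0.0;
    for (double x : xs) sum += x;
    return sum / static_cast<double>(xs.size());
}

// Sample standard deviation (n - 1 denominator); needs at least two values.
inline double sample_stddev(const std::vector<double>& xs) {
    assert(xs.size() >= 2);
    const double m = sample_mean(xs);
    double sq = 0.0;
    for (double x : xs) sq += (x - m) * (x - m);
    return std::sqrt(sq / static_cast<double>(xs.size() - 1));
}

// Raw-pointer time minus unique_ptr time, in ns, for the four posted runs.
inline std::vector<double> raw_minus_unique_ns() {
    return {13.9 - 13.2, 13.0 - 13.2, 13.0 - 12.8, 12.8 - 13.2};
}

// The gap is within noise if its mean is smaller than its spread.
inline bool difference_is_within_noise() {
    const std::vector<double> d = raw_minus_unique_ns();
    return std::fabs(sample_mean(d)) < sample_stddev(d);
}
```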
