Explanation for smart pointer benchmark results?

Noah Sung

Jul 27, 2020, 5:13:05 PM

to benchmark-discuss

Hi,

I'm fairly new to Google Benchmark, but I was trying to run some simple benchmarks comparing the cost of constructing a unique_ptr, a shared_ptr, and a raw pointer to a MyClass object. I'm wondering why the times for raw pointers are slightly greater than those for unique_ptr; I would expect the opposite. Is there anything wrong with my implementation, or is there something else I'm missing?

--------------------------------------------------------------------
Benchmark                      Time          CPU   Iterations
--------------------------------------------------------------------
BM_UniquePtr_MyClass/1      13.2 ns      13.2 ns     52478142
BM_UniquePtr_MyClass/10      140 ns       140 ns      4999401
BM_UniquePtr_MyClass/100    1300 ns      1300 ns       532017
BM_UniquePtr_MyClass/1000  12893 ns     12893 ns        53916
BM_SharedPtr_MyClass/1      36.2 ns      36.2 ns     19350467
BM_SharedPtr_MyClass/10      353 ns       353 ns      1983883
BM_SharedPtr_MyClass/100    3548 ns      3548 ns       197663
BM_SharedPtr_MyClass/1000  35447 ns     35447 ns        19803
BM_RawPtr_MyClass/1         13.9 ns      13.9 ns     50482698
BM_RawPtr_MyClass/10         144 ns       144 ns      4846621
BM_RawPtr_MyClass/100       1336 ns      1336 ns       521874
BM_RawPtr_MyClass/1000     13323 ns     13323 ns        52649

#include <memory>

#include <benchmark/benchmark.h>

class MyClass {
  int i = 0;
  double d = 1.1;
  char c = 'c';
  bool b = false;
};

// Each benchmark iteration creates and destroys state.range(0) unique_ptrs.
static void BM_UniquePtr_MyClass(benchmark::State &state) {
  for (auto _ : state) {
    for (auto i = 0; i < state.range(0); i++) {
      std::unique_ptr<MyClass> m(new MyClass);
      benchmark::DoNotOptimize(m);
      m.reset();
    }
  }
}

// Same pattern with std::shared_ptr.
static void BM_SharedPtr_MyClass(benchmark::State &state) {
  for (auto _ : state) {
    for (auto i = 0; i < state.range(0); i++) {
      std::shared_ptr<MyClass> m(new MyClass);
      benchmark::DoNotOptimize(m);
      m.reset();
    }
  }
}

// Same pattern with a raw pointer and an explicit delete.
static void BM_RawPtr_MyClass(benchmark::State &state) {
  for (auto _ : state) {
    for (auto i = 0; i < state.range(0); i++) {
      MyClass* m(new MyClass);
      benchmark::DoNotOptimize(m);
      delete m;
    }
  }
}

// Runs each benchmark with range(0) = 1, 10, 100, 1000.
BENCHMARK(BM_UniquePtr_MyClass)->RangeMultiplier(10)->Range(1, 1000);
BENCHMARK(BM_SharedPtr_MyClass)->RangeMultiplier(10)->Range(1, 1000);
BENCHMARK(BM_RawPtr_MyClass)->RangeMultiplier(10)->Range(1, 1000);

BENCHMARK_MAIN();

Dominic Hamon

Jul 29, 2020, 9:31:48 AM

to Noah Sung, benchmark-discuss

Hi Noah

I don't see anything obviously wrong with the benchmarks, but I do wonder why you have an inner loop over `state.range(0)`. Essentially, what you're timing with that loop is how long it takes to create `state.range(0)` pointers rather than a single pointer.
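To make this concrete, dividing the first results table by the inner-loop count gives the cost per pointer. The sketch below uses a hypothetical helper, `per_pointer_ns` (not part of Google Benchmark), on the numbers posted above; it suggests the /N variants scale roughly linearly, so the inner loop mostly multiplies the same per-pointer cost. Inside Google Benchmark itself, the usual way to report a per-item figure is `state.SetItemsProcessed(state.iterations() * state.range(0))`, which adds an items_per_second counter to the output.

```cpp
#include <array>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: converts the time reported for one benchmark
// iteration (which runs `n` create/destroy cycles) into ns per pointer.
constexpr double per_pointer_ns(double iteration_ns, int n) {
  return iteration_ns / n;
}

// Inner-loop counts and "Time" values copied from the first results table.
constexpr std::array<int, 4> kCounts = {1, 10, 100, 1000};
constexpr std::array<double, 4> kUniqueNs = {13.2, 140, 1300, 12893};
constexpr std::array<double, 4> kRawNs = {13.9, 144, 1336, 13323};

// Prints the per-pointer cost for each range value,
// e.g. 12893 ns / 1000 = 12.893 ns for unique_ptr at N=1000.
inline void print_per_pointer() {
  for (std::size_t k = 0; k < kCounts.size(); ++k) {
    std::printf("N=%-5d unique_ptr %.3f ns  raw %.3f ns\n", kCounts[k],
                per_pointer_ns(kUniqueNs[k], kCounts[k]),
                per_pointer_ns(kRawNs[k], kCounts[k]));
  }
}
```

For the posted numbers this works out to roughly 12.9 to 14.0 ns per unique_ptr and 13.3 to 14.4 ns per raw pointer, i.e. the same small single-pointer gap repeated N times.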

I'd also add `()` after `new MyClass`: https://abseil.io/tips/146
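For context on the tip: `new T` default-initializes the object, while `new T()` value-initializes it, and the difference only shows up for members that have no initializer. A minimal sketch with two hypothetical structs (not from the thread): for `Plain`, `new Plain()` zeroes the members while `new Plain` leaves them indeterminate; for `WithInits`, which mirrors MyClass by giving every member a default member initializer, both forms yield the same values. `std::make_unique<T>()` with no arguments value-initializes as well.

```cpp
#include <memory>

// Hypothetical type with no default member initializers.
struct Plain {
  int i;
  double d;
};

// Hypothetical type that, like MyClass, initializes every member in-class.
struct WithInits {
  int i = 0;
  double d = 1.1;
};

// Value-initialization: the members of Plain are zeroed.
inline Plain* new_value_initialized() { return new Plain(); }

// Default-initialization: the members of Plain are indeterminate here, and
// reading them before assigning to them is undefined behavior.
inline Plain* new_default_initialized() { return new Plain; }

// std::make_unique<T>() with no arguments value-initializes, like new T().
inline std::unique_ptr<Plain> make_unique_plain() {
  return std::make_unique<Plain>();
}
```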

--
You received this message because you are subscribed to the Google Groups "benchmark-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to benchmark-disc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/benchmark-discuss/127e2375-6b64-402a-8708-dde651e41430n%40googlegroups.com.

Noah Sung

Jul 29, 2020, 10:48:00 AM

to benchmark-discuss

I ran the inner for loop just to see whether the time changes at all when creating multiple pointers. But I got rid of the loop, and now the times show that the raw pointer is faster than unique_ptr, as expected:

---------------------------------------------------------------
Benchmark                  Time          CPU   Iterations
---------------------------------------------------------------
BM_UniquePtr_MyClass    13.2 ns      13.2 ns     52616815
BM_SharedPtr_MyClass    35.4 ns      35.4 ns     19731845
BM_RawPtr_MyClass       13.0 ns      13.0 ns     53262794

But what is also strange is that when I modify some of the members of MyClass, the raw pointer can still come out slower. For example, when I remove the bool member from MyClass I get this:

---------------------------------------------------------------
Benchmark                  Time          CPU   Iterations
---------------------------------------------------------------
BM_UniquePtr_MyClass    12.8 ns      12.8 ns     53955283
BM_SharedPtr_MyClass    35.3 ns      35.3 ns     19789780
BM_RawPtr_MyClass       13.0 ns      13.0 ns     54141948

#include <memory>

#include <benchmark/benchmark.h>

class MyClass {
  int i = 0;
  double d = 1.1;
  char c = 'c';
};

// One unique_ptr created and destroyed per benchmark iteration.
static void BM_UniquePtr_MyClass(benchmark::State &state) {
  for (auto _ : state) {
    std::unique_ptr<MyClass> m(new MyClass());
    benchmark::DoNotOptimize(m);
    m.reset();
  }
}

// One shared_ptr created and destroyed per benchmark iteration.
static void BM_SharedPtr_MyClass(benchmark::State &state) {
  for (auto _ : state) {
    std::shared_ptr<MyClass> m(new MyClass());
    benchmark::DoNotOptimize(m);
    m.reset();
  }
}

// One raw pointer allocated and deleted per benchmark iteration.
static void BM_RawPtr_MyClass(benchmark::State &state) {
  for (auto _ : state) {
    MyClass* m(new MyClass());
    benchmark::DoNotOptimize(m);
    delete m;
  }
}

BENCHMARK(BM_UniquePtr_MyClass);
BENCHMARK(BM_SharedPtr_MyClass);
BENCHMARK(BM_RawPtr_MyClass);

BENCHMARK_MAIN();
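One possible reason removing the bool changes nothing measurable: on common ABIs the bool sits in bytes that would otherwise be tail padding, so both versions of MyClass request the same allocation size. A sketch with two hypothetical copies of the class (the names `MyClassWithBool` and `MyClassWithoutBool` are made up here). On x86-64 both are typically 24 bytes because the double forces 8-byte alignment, but the exact sizes are ABI-dependent and not guaranteed by the standard.

```cpp
#include <cstddef>

// Copy of the first MyClass (int, double, char, bool).
class MyClassWithBool {
  int i = 0;
  double d = 1.1;
  char c = 'c';
  bool b = false;
};

// Copy of the second MyClass, with the bool removed.
class MyClassWithoutBool {
  int i = 0;
  double d = 1.1;
  char c = 'c';
};

// On typical ABIs the bool occupies a byte of the padding that follows the
// char (the size is rounded up to a multiple of the class alignment), so
// both sizes match.
constexpr std::size_t kSizeWithBool = sizeof(MyClassWithBool);
constexpr std::size_t kSizeWithoutBool = sizeof(MyClassWithoutBool);
```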

Nathaniel Doromal

Jul 29, 2020, 12:44:55 PM

to Dominic Hamon, Noah Sung, benchmark-discuss

You might be measuring statistical noise related to vagaries in caching and warm-up.

My expectation is that the generated assembly code should be very similar with optimization.
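One way to check whether a gap of a few tenths of a nanosecond is noise is to repeat each benchmark and compare the spread with the difference. Google Benchmark can do the repetitions itself via `--benchmark_repetitions=N` (or `->Repetitions(N)` on a registration), which adds mean, median and stddev rows per benchmark. The sketch below is only a hypothetical post-processing helper in plain C++: it computes the mean and sample standard deviation of repeated timings and applies an assumed rule of thumb (means differing by less than the sum of the two standard deviations) to flag a gap as indistinguishable from noise. The threshold is an illustrative choice, not a statistical test.

```cpp
#include <cmath>
#include <numeric>
#include <vector>

struct Summary {
  double mean;
  double stddev;  // sample standard deviation (n - 1 denominator)
};

// Summarizes repeated per-iteration timings of one benchmark.
// Requires at least two samples.
inline Summary summarize(const std::vector<double>& ns) {
  const double n = static_cast<double>(ns.size());
  const double mean = std::accumulate(ns.begin(), ns.end(), 0.0) / n;
  double sq = 0.0;
  for (double x : ns) sq += (x - mean) * (x - mean);
  return {mean, std::sqrt(sq / (n - 1.0))};
}

// Assumed rule of thumb: treat the gap as noise when the means differ by
// less than the combined spread of the two sets of repetitions.
inline bool within_noise(const Summary& a, const Summary& b) {
  return std::fabs(a.mean - b.mean) < a.stddev + b.stddev;
}
```

Separately, Google Benchmark prints a warning when CPU frequency scaling is enabled, since that makes real-time measurements noisier; it is worth checking for that warning before comparing results this close.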

Noah Sung

Jul 30, 2020, 5:26:42 PM

to benchmark-discuss

I just wanted to post a final update: the benchmarks now show the expected results. I'm really not sure why I was getting the results I mentioned before. I played around with modifying MyClass, then reverted my code to how it was in my previous post, and the benchmark results changed anyway:

---------------------------------------------------------------
Benchmark                  Time          CPU   Iterations
---------------------------------------------------------------
BM_UniquePtr_MyClass    13.2 ns      13.2 ns     52542963
BM_SharedPtr_MyClass    35.4 ns      35.4 ns     19764520
BM_RawPtr_MyClass       12.8 ns      12.8 ns     53881927

Anyway, thanks for the help. The benchmarks seem to behave as expected now.
