
collection of pointers vs collection of objects


Christopher J. Pisz
Feb 9, 2017, 7:03:23 PM

During one interview I was asked about things I would do to optimize
some code snippet.

I mentioned that

std::vector<Employee> was probably not as good as
std::vector<Employee *> or std::vector<std::shared_ptr<Employee>>

Because of the copy construction on insert.
The fellow did not seem to like that answer at all.

I mentioned that depending on the size of the object a collection of
pointers might be better.

He asked about cache misses.

I really don't know much about how this would interact with the cache. I
explained that, as I understand it, the employee would have to be a
certain size to fit them on the cache anyway and if it was a really
large object, you probably aren't going to get the whole collection in
there anyway.

Now that I think about it, we have emplace and move, so maybe my point
was silly.

What differences do you guys know about the two ways of containing
objects and why might you choose one over the other?

Christopher J. Pisz
Feb 9, 2017, 9:25:10 PM

On 2/9/2017 7:25 PM, Stefan Ram wrote:
> "Christopher J. Pisz" <cp...@austin.rr.com> writes:
>> collection of pointers vs collection of objects
>
> Collections of pointers /are/ collections of objects,
> because pointers /are/ objects!
>
>> std::vector<Employee> was probably not as good as
>> std::vector<Employee *> or std::vector<std::shared_ptr<Employee>>
>
> Which results do you get when you do a microbenchmark
> for both cases (sum the salaries of all employees) for
> some large number of employees.

It's nearly the same. But it might not be an accurate test, because
everything is nearly contiguous since nothing else in the process is
being allocated.


> Which byte width and access times do the L1 cache, the L2
> cache and the main memory have on your system?
>
> What does the processor have to do in both cases (you might
> compile the benchmark and look at the assembler output).

I have no idea. I don't know assembly.

I am not sure how byte width comes into play. Obviously the entire
vector of Employees does not fit, and I don't know whether even one
Employee fits. But assuming two fit, I would say that iteration over
the employees will be faster in the collection that does not use
pointers, while insertion will be slower, because for each entry a
register somewhere has to load the pointer, the address has to be
read, and the actual value has to be obtained.

I know the processor has to load from a hierarchy of fast-to-slow
locations: L1 cache, then L2 cache, L3, physical memory, page file.



Paavo Helde
Feb 10, 2017, 1:19:20 AM

On 10.02.2017 2:03, Christopher J. Pisz wrote:
> During one interview I was asked about things I would do to optimize
> some code snippet.
>
> I mentioned that
>
> std::vector<Employee> was probably not as good as
> std::vector<Employee *> or std::vector<std::shared_ptr<Employee>>

This might or might not be faster, depending on how the data is
constructed and accessed. Anyway, I guess the interviewer expected an
answer which would change the big-O complexity instead.




Gareth Owen
Feb 10, 2017, 3:47:29 AM

"Christopher J. Pisz" <cp...@austin.rr.com> writes:

> He asked about cache misses.

vector<Employee> guarantees contiguous memory so, with sane access
patterns, hardware prefetch will help you out.
With vector<Employee*> you're jumping all over the heap, so it won't.

> I really don't know much about how this would interact with the
> cache. I explained that, as I understand it, the employee would have
> to be a certain size to fit them on the cache anyway and if it was a
> really large object, you probably aren't going to get the whole
> collection in there anyway.

A Core2Duo has ~2MB of L2 cache. Unless you're the NSA, you're probably
not storing that much info per Employee.

Gareth Owen
Feb 10, 2017, 3:48:42 AM

And, of course, prefetch isn't about storing the whole thing in cache.
It's about having the next one ready when you need it.

Bo Persson
Feb 10, 2017, 4:03:17 AM

You seem to focus on the cost of populating the vector, while the other
guy probably thinks about how to use the vector.

Having all data in contiguous memory is one advantage a vector has over
a linked list, for example. When traversing the list, every access will
be a random access that the processor can't easily predict, with a very
high risk of a cache miss or a cache collision.

On the other hand, when traversing a vector it is very easy to predict
your next memory access and the cache might even be able to prefetch the
data. And if it isn't, the risk of a cache collision with the previous
element is about zero.

A vector of pointers is about the same as a linked list, just that the
links are collected in one place. The access pattern for the data will
be similar.


Bo Persson

Chris Vine
Feb 10, 2017, 5:52:44 AM

On Thu, 9 Feb 2017 18:03:15 -0600
"Christopher J. Pisz" <cp...@austin.rr.com> wrote:
> During one interview I was asked about things I would do to optimize
> some code snippet.
>
> I mentioned that
>
> std::vector<Employee> was probably not as good as
> std::vector<Employee *> or std::vector<std::shared_ptr<Employee>>
>
> Because of the copy construction on insert.
> The fellow did not seem to like that answer at all.
>
> I mentioned that depending on the size of the object a collection of
> pointers might be better.
>
> He asked about cache misses.
>
> I really don't know much about how this would interact with the
> cache. I explained that, as I understand it, the employee would have
> to be a certain size to fit them on the cache anyway and if it was a
> really large object, you probably aren't going to get the whole
> collection in there anyway.
>
> Now that I think about it, we have emplace and move, so maybe my
> point was silly.

If the Employee type is mainly composed of fundamental types (integers,
floating points and so forth, say comprising an employee id number, the
rate of pay, a location code and the like) or moderately sized arrays of
these, then holding Employee objects in the vector by value is going to
be considerably faster because of the excellent locality. You will have
both data in cache and pre-fetchability, and also copying will be fast.

If however the Employee type has member data which is allocated on the
heap (which will degrade locality as between both different members of
an Employee object and different Employee objects in the vector) and is
expensive to copy, and it does not have a move constructor and move
assignment operator, then you will probably find yourself better off
holding objects of that type in the vector by pointer, unique_ptr or
shared_ptr. If the Employee type does have a move constructor and move
assignment operator then much of its implementation is probably
allocated on the heap anyway to support movability, and if the use case
mainly involves moving Employee objects into the vector rather than
copying them there, it would normally be better to hold them by value.

In the second case (the Employee type is not mainly composed of
fundamental types or moderately sized arrays of these) you are in a
grey area. You need to know more about the details of the
implementation of the type and how objects of the type are used. Mainly
though, you would need to measure.

Chris

Gareth Owen
Feb 10, 2017, 5:59:18 AM

Chris Vine <chris@cvine--nospam--.freeserve.co.uk> writes:

> You need to know more about the details of the implementation of the
> type and how objects of the type are used. Mainly though, you would
> need to measure.

These two sentences can be used at the end to improve your answer to
almost any programming interview question. :)

Öö Tiib
Feb 10, 2017, 6:26:13 AM

On Friday, 10 February 2017 02:03:23 UTC+2, Christopher J. Pisz wrote:
> During one interview I was asked about things I would do to optimize
> some code snippet.
>
> I mentioned that
>
> std::vector<Employee> was probably not as good as
> std::vector<Employee *> or std::vector<std::shared_ptr<Employee>>
>
> Because of the copy construction on insert.
> The fellow did not seem to like that answer at all.

Yes, since the case where a large sequence gets performance-affecting,
frequent inserts and erases in the middle of it is quite rare.
When it is That Case, then it is very likely that a (possibly
intrusive) list or set will outperform all of the given contenders by
a large margin. When it is not That Case, then 'std::vector<Employee>'
usually wins.

>
> I mentioned that depending on the size of the object a collection of
> pointers might be better.
>
> He asked about cache misses.
>
> I really don't know much about how this would interact with the cache. I
> explained that, as I understand it, the employee would have to be a
> certain size to fit them on the cache anyway and if it was a really
> large object, you probably aren't going to get the whole collection in
> there anyway.
>
> Now that I think about it, we have emplace and move, so maybe my point
> was silly.
>
> What differences do you guys know about the two ways of containing
> objects and why might you choose one over the other?

:) What *two* ways? I will list a few of the ways I imagine a "vector of
employees". Note that it is limited to vector, but we have a *lot* of other
containers. The assumption is that a vector in general is what we need, and
now there are the constraints:

1) A 'std::vector<std::unique_ptr<Employee>>' I would pick when there
are meant to be dynamic polymorphic classes derived from Employee
and the vector owns the Employees.
2) A 'std::vector<std::shared_ptr<Employee>>' I would pick when
ownership of Employee is shared, for example when same Employee
may be simultaneously in several of those vectors and none of those
is then owner of it.
3) A 'std::vector<std::reference_wrapper<Employee>>' I would pick
when the Employees are owned by something else than that vector
and life-time of those is known to last longer than that of vector.
4) A 'std::vector<std::weak_ptr<Employee>>' I would pick when the
Employees are owned by something else than that vector and the
life-times may last less than life-time of vector.
5) A 'std::vector<Employee*>' I can't imagine a situation where I
   would pick it (but never say never).
6) A 'std::vector<Employee>' I would pick when the vector owns those
Employees and there are no dynamic polymorphism of Employees.

Like you see *none* are about performance. ;)

Chris Vine
Feb 10, 2017, 7:05:27 AM

On Fri, 10 Feb 2017 03:26:05 -0800 (PST)
Öö Tiib <oot...@hot.ee> wrote:
> On Friday, 10 February 2017 02:03:23 UTC+2, Christopher J. Pisz
> wrote:
> > During one interview I was asked about things I would do to
> > optimize some code snippet.
> >
> > I mentioned that
> >
> > std::vector<Employee> was probably not as good as
> > std::vector<Employee *> or std::vector<std::shared_ptr<Employee>>
> >
> > Because of the copy construction on insert.
> > The fellow did not seem to like that answer at all.
[snip]
All very reasonable, but if the interviewer was unimpressed with
Christopher's answer about how to "optimize some code snippet" and
raised the issue of "cache misses", he wanted to know Christopher's
views on designing for locality, and that is about performance. You
get the job by dealing with the issues put to you.

The "two" ways referred to by Christopher I read as meaning (i) hold by
value, and (ii) hold by reference. Once you have decided you are going
to hold by reference then all the issues you mention also become
relevant of course. I agree that if the Employee type is polymorphic
that forces your hand, but given that the code snippet Christopher was
given to optimize seemed (if I read his posting correctly) to involve a
std::vector<Employee> object in the context of optimization, I think one
can take it that it wasn't.

Chris

Scott Lurndal
Feb 10, 2017, 9:48:14 AM

"Christopher J. Pisz" <cp...@austin.rr.com> writes:

>I mentioned that depending on the size of the object a collection of
>pointers might be better.
>
>He asked about cache misses.

The effectiveness of caches (and TLBs) is dependent upon
the degree of locality of references. A cache line on Intel
is 64 bytes (on some ARM64 processors it is 128 bytes).

Most processors have some internal HW prefetching algorithm in
the cache subsystem to anticipate future requests, and most of
those algorithms are stride-based. So, random pointer chasing is
likely to defeat the prefetching algorithms (software can alleviate
this somewhat by using prefetching instructions which are available on
both intel and arm64 processors to hint to the cache that some
line will be needed in the near future and the cache can load it
in anticipation asynchronously to the execution of the thread).

Walking through a vector or array of values will have a high cache
hit rate (close to 100% if the prefetcher is working well), while
chasing a pointer through memory will generally have a very low
hit rate because the processor cannot anticipate the next memory
reference.

If an instance of Employee required 256 bytes in memory, and it had
a 64-bit employee id field, for example, to walk through an array
of Employee objects looking for a specific employee id,
you'd load one cache line for the 64-byte region
containing the employee id. A stride-based prefetcher can anticipate
this and have the next cache line ready by the time you're looking at
the employee id field in the next sequential element. It can't
do this if you're walking a vector of pointers.

A cache hit (L1) is about 3 cycles (1 ns) on Skylake, a hit to L2 is
11 cycles (4 ns at 3 GHz) and a hit to L3 is about 40 cycles. A
miss at all three levels costs about 60 to 100 ns to go to DRAM.

So a miss is about 33x slower than a hit at L1.

Scott Lurndal
Feb 10, 2017, 9:52:04 AM

That's L3 cache. L1 is about 32 KB, L2 is generally 256 KB per core, and
L3 is shared by all cores.

It doesn't matter unless you only have one Employee. If you have 10,000
employees, and each employee object is 256 bytes, that won't fit in L3
(even leaving aside code footprint and the other core(s)).

And the vast majority of real-world datasets don't fit in cache, not even
close. ThunderX has 24 MB of cache shared by 48 cores, for example, and
72 KB split I/D at level 1. Any interesting dataset will be much larger.

Öö Tiib
Feb 10, 2017, 11:19:38 AM

I have been sort of self-employed for the past quarter of a century, so
my advice is likely questionable there. What one has to do in an
interview is perhaps to find out what the interviewer wants and to
deliver that.

I generally ignore all such performance considerations until I have a
working thing that really does the actual work that was needed, wholly,
AND I also have sets of actual data that it works on.
Typically only about 5% of product code matters at all to
performance, so grinding on the other 95% prematurely is a waste of
effort. Was he really offered such circumstances? I doubt it. Therefore,
answer what the interviewer wants to hear, however nonsensical it sounds.
Later, sort out whether you want to work for that company or not.


>
> The "two" ways referred to by Christopher I read as meaning (i) hold by
> value, and (ii) hold by reference. Once you have decided you are going
> to hold by reference then all the issues you mention also become
> relevant of course. I agree that if the Employee type is polymorphic
> that forces your hand, but given that the code snippet Christopher was
> given to optimize seemed (if I read his posting correctly) to involve a
> std::vector<Employee> object in the context of optimization, I think one
> can take it that is wasn't.

These were not issues. It is just the nature of the data that I think
about when I pick one or another style of "vector of employees". It can
be that I don't know the nature so clearly yet. Then I take
'std::vector<Employee>' until I find out. Code can be edited and will be
edited, and if I don't yet know the nature of the data clearly, then
there is no way I can ensure that it performs optimally.

Manfred
Feb 10, 2017, 1:07:50 PM

On 02/10/2017 05:19 PM, Öö Tiib wrote:
> On Friday, 10 February 2017 14:05:27 UTC+2, Chris Vine wrote:
>> On Fri, 10 Feb 2017 03:26:05 -0800 (PST)
>> Öö Tiib <oot...@hot.ee> wrote:
<snip>
>>> 1) A 'std::vector<std::unique_ptr<Employee>>' I would pick when there
>>> are meant to be dynamic polymorphic classes derived from Employee
>>> and the vector owns the Employees.
>>> 2) A 'std::vector<std::shared_ptr<Employee>>' I would pick when
>>> ownership of Employee is shared, for example when same Employee
>>> may be simultaneously in several of those vectors and none of those
>>> is then owner of it.
>>> 3) A 'std::vector<std::reference_wrapper<Employee>>' I would pick
>>> when the Employees are owned by something else than that vector
>>> and life-time of those is known to last longer than that of vector.
>>> 4) A 'std::vector<std::weak_ptr<Employee>>' I would pick when the
>>> Employees are owned by something else than that vector and the
>>> life-times may last less than life-time of vector.
>>> 5) A 'std::vector<Employee*>' I can't imagine situation where I would
>>> be picking it (, but never say never).
>>> 6) A 'std::vector<Employee>' I would pick when the vector owns those
>>> Employees and there are no dynamic polymorphism of Employees.
>>>
>>> Like you see *none* are about performance. ;)
>>
>> All very reasonable, but if the interviewer was unimpressed with
>> Christopher's answer about how to "optimize some code snippet" and
>> raised the issue of "cache misses", he wanted to know Christopher's
>> views on designing for locality, and that is about performance. You
>> get the job by dealing with the issues put to you.
I think you also get the job by giving exhaustive answers. The points
above seem pretty much on topic to me.

<snip>
>>
>> The "two" ways referred to by Christopher I read as meaning (i) hold by
>> value, and (ii) hold by reference. Once you have decided you are going
>> to hold by reference then all the issues you mention also become
>> relevant of course. I agree that if the Employee type is polymorphic
>> that forces your hand, but given that the code snippet Christopher was
>> given to optimize seemed (if I read his posting correctly) to involve a
>> std::vector<Employee> object in the context of optimization, I think one
>> can take it that is wasn't.
>
> These were not issues. There is just the nature of data about what I think
> when I pick one or other style of "vector of employees". It can be that I
> don't know the nature so clearly yet. Then I take 'std::vector<Employee>'
> until I find out. Code can be edited and will be edited and if I don't yet
> know nature of data clearly then there is no way how I can ensure that it
> performs optimally.
>
I agree that the first rationale for the choice is the nature of the
data, as you correctly indicated.
I may add that it follows from your list that you would pick case 6)
unless there is polymorphism or the vector does not own the Employees,
and that happens to be the best-performing solution (not surprisingly,
since in C++ the simplest solution is usually the best-performing one
as well).

Juha Nieminen
Feb 13, 2017, 2:29:20 AM

Christopher J. Pisz <cp...@austin.rr.com> wrote:
> I mentioned that
>
> std::vector<Employee> was probably not as good as
> std::vector<Employee *> or std::vector<std::shared_ptr<Employee>>
>
> Because of the copy construction on insert.
> The fellow did not seem to like that answer at all.

I would say that in most cases the former is better, and even more efficient.

Memory allocation in languages that use the C runtime allocator is quite
heavy and inefficient. By using pointers you are introducing an additional
memory allocation (and deallocation) for every single element. This can
increase the inefficiency of the container by quite a lot (especially
if Employee doesn't itself allocate any memory dynamically).

Using a vector of values is also simpler and safer, as it's harder to
accidentally e.g. leak memory or access freed memory.

If Employee is a very heavy class to copy, then the optimization should
be done inside it, not in the container that's using it.