He was talking about traversing the set, not searching it.
In other words, for(auto& element: theSet).
(This, of course, assumes that the data set is so large that it
won't fit entirely even in the L3 cache, or that we are traversing
the set for the first time since all of its contents were flushed
from the caches.)
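To make the comparison concrete, here's a minimal sketch (the function names are mine, just for illustration) contrasting the two traversals. The point is that std::set iteration chases pointers between separately allocated tree nodes, while std::vector iteration walks contiguous memory:

```cpp
#include <set>
#include <vector>

// Traversing a node-based container: each element lives in its own
// heap-allocated tree node, so consecutive elements are generally not
// adjacent in memory, and each step of the iterator can be a cache miss.
long long sumSet(const std::set<int>& theSet) {
    long long sum = 0;
    for (auto& element : theSet)   // pointer-chasing traversal
        sum += element;
    return sum;
}

// Traversing contiguous storage: elements are packed back to back, so
// one cache line fetch brings in several upcoming elements at once.
long long sumVector(const std::vector<int>& theVector) {
    long long sum = 0;
    for (auto& element : theVector) // linear, prefetch-friendly traversal
        sum += element;
    return sum;
}
```

Both loops look identical at the source level; the difference is entirely in the memory layout underneath.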
Of course even with std::vector it depends on the size of the
element. Traversing a (very large) std::vector linearly from
beginning to end isn't magically going to be very fast either,
if each element is large enough. And "large enough" is actually
quite small. If I remember correctly, cache line sizes are typically
64 bytes or so. This means that if the vector element type is an
object of size 64 bytes or more, and you are accessing just one member
variable of each object, then you'll get no benefit from linear
traversal compared to random access (in the case that the contents of
the vector are not already in the caches).
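Here's a sketch of that worst case (the struct and its 64-byte size are an illustrative assumption): each element fills an entire cache line, so reading one member per element loads a fresh line on every iteration, and linear traversal gains little over random access on a cold cache.

```cpp
#include <cstdint>
#include <vector>

// A 64-byte element: with typical 64-byte cache lines, each element
// occupies a whole line of its own, so touching one member per element
// means a new line must be fetched on every iteration.
struct BigElement {
    std::int32_t key;   // the one member we actually read
    char padding[60];   // the rest of the 64-byte element
};
static_assert(sizeof(BigElement) == 64, "one element per cache line");

long long sumKeys(const std::vector<BigElement>& v) {
    long long sum = 0;
    for (auto& e : v)
        sum += e.key;   // on a cold cache, roughly one miss per element
    return sum;
}
```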
You only get a speed advantage for (very large) vectors whose element
size is very small, like 4 or 8 bytes. For example, if the vector
represents a bitmap image, with each "pixel" element taking e.g. 4 bytes,
then a linear traversal will be quite efficient (assuming none of the
vector contents were in any cache to begin with, you'll incur a
cache miss only once every 16 pixels.)
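A sketch of that favorable case (the Pixel type and invert operation are my own illustration): with 4-byte pixels packed contiguously, a 64-byte cache line holds 64 / 4 = 16 pixels, so a linear sweep misses at most once per 16 pixels even starting from a cold cache.

```cpp
#include <cstdint>
#include <vector>

// 4-byte pixels, e.g. 8-bit RGBA packed into one 32-bit integer.
using Pixel = std::uint32_t;

// Invert every pixel of the bitmap in one linear sweep. Since a
// 64-byte cache line covers 16 consecutive pixels, roughly 15 of
// every 16 accesses hit a line that was already fetched.
void invert(std::vector<Pixel>& bitmap) {
    for (Pixel& p : bitmap)
        p = ~p;
}
```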
Of course almost none of this applies if the vector or set is
small enough to fit in the L1 cache and has already been loaded in
there in its entirety; then hardly any of this matters. It starts
mattering a bit more if the vector is too large for L1 but small
enough for the L2 cache, and more still if it's too large for L2
but small enough for the L3 cache.
Modern CPUs tend to have a quite large L3 cache, which mitigates
the problems of cache misses in many instances. For example, my CPU
has an L3 cache of 12 MB. Thus if I need to, say, repeatedly perform
operations on an image that fits comfortably within those 12 MB,
it will be very fast.
It's only when the dataset is much larger than L3 that cache locality
really starts having a very pronounced effect (when performing
operations repeatedly on the entire dataset).