Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Two vectors of ints or one vector of std::pair<int, int> ?

37 views
Skip to first unread message

Paul

unread,
Jan 17, 2019, 1:31:31 PM1/17/19
to
Suppose two vectors of ints are the same size -- say one million entries each.
Comparisons need to be made between elements in the vectors -- for example
it needs to be queried where A[780] < B[781].

The same information that can be stored in two vectors can also be stored
as a single vector of std::pair<int, int>.

Does a vector of pair<int, int> take up less space than two vectors?
Is there any difference in performance between an enquiry to determine
whether A[780].first < A[781].second compared to the above inquiry
with separate vectors.

Naively, I would think a std::pair approach is preferred. My guess is that
it takes less space. Also, is there an architectural point to do with caches
(my knowledge about this is hopelessly vague) that says that comparing
A[k] with B[m] is slow because these two vectors reside in unrelated
areas of memory whereas comparing A[k].first with A[m].second is faster
because the A entries are in the same area of memory?

Any thoughts?

Thanks,

Paul

Alf P. Steinbach

unread,
Jan 17, 2019, 2:46:32 PM1/17/19
to
On 17.01.2019 19:31, Paul wrote:
> Suppose two vectors of ints are the same size -- say one million entries each.
> Comparisons need to be made between elements in the vectors -- for example
> it needs to be queried where A[780] < B[781].
>
> The same information that can be stored in two vectors can also be stored
> as a single vector of std::pair<int, int>.

Or better, a vector of

struct Good_name
{
int a;
int b;
};


> Does a vector of pair<int, int> take up less space than two vectors?

Necessarily but it's a constant of a few bytes.


> Is there any difference in performance between an enquiry to determine
> whether A[780].first < A[781].second compared to the above inquiry
> with separate vectors.

Intuition often fails, so, try it.


> Naively, I would think a std::pair approach is preferred. My guess is that
> it takes less space. Also, is there an architectural point to do with caches
> (my knowledge about this is hopelessly vague) that says that comparing
> A[k] with B[m] is slow because these two vectors reside in unrelated
> areas of memory whereas comparing A[k].first with A[m].second is faster
> because the A entries are in the same area of memory?
>
> Any thoughts?

Go for most clear code.

And IMO that's neither separate vectors nor vector of `std::pair`.


Cheers!,

- Alf

Jorgen Grahn

unread,
Jan 17, 2019, 3:58:25 PM1/17/19
to
On Thu, 2019-01-17, Paul wrote:
> Suppose two vectors of ints are the same size -- say one million entries each.
> Comparisons need to be made between elements in the vectors -- for example
> it needs to be queried where A[780] < B[781].

Did you mean A[780] < B[780]?

> The same information that can be stored in two vectors can also be stored
> as a single vector of std::pair<int, int>.

> Does a vector of pair<int, int> take up less space than two vectors?
> Is there any difference in performance between an enquiry to determine
> whether A[780].first < A[781].second compared to the above inquiry
> with separate vectors.
>
> Naively, I would think a std::pair approach is preferred. My guess is that
> it takes less space. Also, is there an architectural point to do with caches
> (my knowledge about this is hopelessly vague) that says that comparing
> A[k] with B[m] is slow because these two vectors reside in unrelated
> areas of memory whereas comparing A[k].first with A[m].second is faster
> because the A entries are in the same area of memory?
>
> Any thoughts?

A vector of pairs takes up roughly as much space, and lookup in that
one is nicer to the cache than two lookups in two vectors[0].

What bothers me is: is this really the main use case for these
vectors? Are they created once, never updated, and only used to
answer the question a[N] < b[N]? If not, you're going to make some
other code uglier and slower. If yes, then a vector<bool> would be
nicer to the cache.

/Jorgen

[0] Although I cannot say how much of a difference that would make.
It depends on the overall access pattern.

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

Vir Campestris

unread,
Jan 17, 2019, 4:57:22 PM1/17/19
to
On 17/01/2019 19:46, Alf P. Steinbach wrote:
> Go for most clear code.
>
> And IMO that's neither separate vectors nor vector of `std::pair`.

This.^^^

I doubt there would be much performance difference running through them
serially. The only advantage could be if you kept picking out two
entries at random - in this case it's probable the second one will be
loaded in cache with the first with the pairs (or better structure) but
if they aren't near each other you'll get two cache misses.

And even then it's probably a very small effect.

Premature optimisation (google it) is normally a bad thing.

Andy

Paul

unread,
Jan 17, 2019, 6:54:17 PM1/17/19
to
Good responses by all three (so far).
Thanks a lot!

Paul

Juha Nieminen

unread,
Jan 20, 2019, 6:13:10 AM1/20/19
to
Vir Campestris <vir.cam...@invalid.invalid> wrote:
> I doubt there would be much performance difference running through them
> serially. The only advantage could be if you kept picking out two
> entries at random - in this case it's probable the second one will be
> loaded in cache with the first with the pairs (or better structure) but
> if they aren't near each other you'll get two cache misses.
>
> And even then it's probably a very small effect.
>
> Premature optimisation (google it) is normally a bad thing.

It depends on how many (consecutive) operations you need to do with
the data. If you need to do tons and tons of operations at once on
the same data, cache optimization can make it several times faster.

In fact, most game engines have been moving towards that direction
in later years. So-called data-oriented design (put all data that
needs to be handled at once as tightly packed in memory as possible,
in in arrays, and minimize the amount of branching as much as possible.
Code cache misses also can have a significantly detrimental effect in
terms of speed.)

--- news://freenews.netfront.net/ - complaints: ne...@netfront.net ---
0 new messages