I need to store a number of integer values which I will then search on
later to see if they exist in my container. Can someone tell me which
container would be quickest for finding these values? I can't use a
plain C array (unless I make it 2^32 in size!) since I don't know the
max integer value.
Thanks for any help
B2003
If you store first, then sort, then search (using 'std::binary_search'),
you could just use 'std::vector'. If you expect searching and updating
the container to be interleaved, 'std::set' is probably better; its
insertions are quite fast. What book on the Standard Library are you
reading that does not compare the standard containers in terms of
performance?
V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask
std::set
Lars
Sorted vector. See Effective STL, Item 23.
For the record, you wouldn't need 2^32 integers, just 2^32 bits = 512
MiB. That's actually not that much RAM, depending on your target system,
and it would let you check for integers with O(1) complexity (rather
than O(log N)).
For that case, I think std::list is a better option, since the sorting
will be faster.
Do you have any proof of that?
Lists are implemented using pointers to the previous and the next
elements, so list::sort() works more efficiently by changing pointer
values, while sorting a vector involves copying objects.
The original poster talked about storing integer values. I highly
doubt sorting a list of integers will be faster than sorting an array of
integers. In fact, I'm pretty sure of the contrary.
If memory usage is not an issue, std::set is by far the easiest solution.
(OTOH if the amount of integers can be counted in the millions, then
it may be better to use a sorted vector or whatever.)
And you really think that doing two pointer exchanges is faster than one
integer exchange?
However, it can still be slower, since it's more or less the worst thing you
can do to the cache.
Still, you should only get one cache miss when looking for a value; if
you use a set or vector you will probably get more.
--
Erik Wikström
In other words, no. <g> One could also point out that quicksort can be
used to sort a vector but can't be used on a list, so obviously sorting
a vector is faster. The problem with both arguments is exactly what
Victor implied: they're handwaving. Proof requires much more detailed
analysis, or if you're making a decision for a particular platform,
measurement.
--
Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The
Standard C++ Library Extensions: a Tutorial and Reference"
(www.petebecker.com/tr1book)
How do you get O(1) from a sorted vector<int>? A binary search costs
O(log N).
Lars
Jeff meant one would get O(1) from looking up in the full array of bits.
Every lookup means a shift to form the word index and a mask to get at
the bit value: one shift, one pointer addition, one dereference, one
bitwise AND per lookup, i.e. O(1).
I see, something like vector<bool>. Ok, that would be fast.
Lars
If you only do a single look-up, yes. But in that case, building up a
bit array of 512 megabytes will take a quite significant amount of time. ;-)
Some systems won't allow you to allocate that much as a local
variable, so you might as well use std::vector. If the
reallocations are an issue, one call up front to reserve() should
solve that.
--
James Kanze (GABI Software) email:james...@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
One thing I don't understand here: both a C style array and
std::vector use a single block of contiguous memory. How could
cache performance be any different for them?
(An earlier suggestion to use std::list does suffer from this,
of course.)
> >>>> Store first, then sort, then search (using
> >>>> 'std::binary_search'), you could just use 'std::vector'.
> >>> For that case, I think std::list is a better option, since
> >>> the sorting will be faster,
> >> Do you have any proof of that?
> > Lists are implemented using pointers to point to the
> > previous and to the next elements, so list::sort(), is more
> > efficient by changing pointer values, while sorting a vector
> > involves copying objects.
> In other words, no. <g> One could also point out that
> quicksort can be used to sort a vector but can't be used on a
> list, so obviously sorting a vector is faster. The problem
> with both arguments is exactly what Victor implied: they're
> handwaving. Proof requires much more detailed analysis, or if
> you're making a decision for a particular platform,
> measurement.
Two other considerations: you can't use binary_search on
std::list---with any sort of size, that's going to make a
significant difference in favor of vector. And of course, an
std::list has horrible locality, which will translate into a lot
of cache misses (or even virtual page faults). So even if
sorting it were faster (which I somehow doubt)...
> On Oct 1, 7:31 pm, Erik Wikström <Erik-wikst...@telia.com> wrote:
>> On 2008-10-01 18:57, Rolf Magnus wrote:
>> > Jeff Schwab wrote:
>> >> Boltar wrote:
>> >>> I need to store a number of integer values which I will
>> >>> then search on later to see if they exist in my container.
>> >>> Can someone tell me which container would be quickest for
>> >>> finding these values? I can't use a plain C array (unless
>> >>> I make it 2^32 in size!) since I don't know the max
>> >>> integer value.
>
>> >> Sorted vector. See Effective STL, Item 23.
>
>> >> For the record, you wouldn't need 2^32 integers, just 2^32 bits
>> >> = 512 MiB. It's actually not that much RAM, depending on
>> >> your target system, and would let you check for integers
>> >> with O(1) complexity (rather than O(log N)).
>
>> > However, it can still be slower, since it's more or less the
>> > worst thing you can do to the cache.
>
>> Still, you should only get one cache-miss when looking for a
>> value, if you use a set or vector you will probably get more.
>
> One thing I don't understand here: both a C style array and
> std::vector use a single block of contiguous memory. How could
> cache performance be any different for them?
We're not talking about C-style arrays vs. vector, but about different
algorithms. One is storing the values in a sorted array/vector, then
doing a binary search for the value. The other is like a trivial kind of
hash table, where each possible integer value has a boolean entry, so you
can simply use the value as the index. That saves the search, but needs
an amount of memory much larger than the cache.
Right, but the whole thing doesn't have to be in cache, just the parts
that are used. The rest may as well be swapped out of memory
altogether. Of course, that's the point of cache. :)
We must think generally. In general, sorting a list is faster than
sorting a vector, because sorting the list does not involve the
construction or destruction of any object.
Regarding ints, I think sorting a vector of ints and a list of ints
have about the same efficiency.
If the programmer decides to replace ints with other objects, he will
not have to change much in the code, if he uses a list.
list::sort is faster than any sort on a vector, because it does not
involve the construction or destruction of any object.
Of course you can:
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <list>

int main()
{
    using namespace std;

    typedef list<int> intlist;

    intlist ilist;

    for (size_t i = 0; i < 100; ++i)
        ilist.push_back(100 - i);

    ilist.sort();

    for (intlist::const_iterator p = ilist.begin(); p != ilist.end(); ++p)
        cout << *p << endl;

    binary_search(ilist.begin(), ilist.end(), 50);
}
Why don't you just measure before you doubt the statements
of those who already went and did this?
On my platform, this
#include <iostream>
#include <vector>
#include <list>
#include <algorithm>

const unsigned int kLimit = 1000000;

template< class Cont >
void fill(Cont& cont)
{
    for( unsigned int u = 0; u < kLimit; ++u ) {
        cont.push_back(u);
    }
}

template< class Cont >
void test(Cont& cont);

int main()
{
    std::vector<unsigned int> v; v.reserve(kLimit);
    std::list<unsigned int> l;

    std::cout << "filling a vector..." << std::endl;
    fill(v);
    std::cout << "filling a list..." << std::endl;
    fill(l);
    std::cout << "...done.\n";

    std::cout << "sorting a vector..." << std::endl;
    test(v);
    std::cout << "sorting a list..." << std::endl;
    test(l);

    return 0;
}

template< typename T, class Al >
inline void sort(std::vector<T,Al>& v) {std::sort(v.begin(),v.end());}
template< typename T, class Al >
inline void sort(std::list<T,Al>& l) {l.sort();}

#include <windows.h> // for GetTickCount()

template< class Cont >
void test(Cont& cont) {
    const DWORD start = GetTickCount();
    sort(cont);
    std::cout << "...took " << GetTickCount()-start << "msecs." << std::endl;
}
outputs
filling a vector...
filling a list...
...done.
sorting a vector...
...took 47msecs.
sorting a list...
...took 1562msecs.
and thus agrees with everyone who disagreed with you.
> If the programmer decides to replace ints with other objects, he will
> not have to change much in the code, if he uses a list.
Right. It wouldn't do any good anyway as this
#include <iostream>
#include <vector>
#include <list>
#include <algorithm>
#include <string>

const unsigned int kLimit = 1000000;

class Test {
public:
    Test() : instance_(++id_), str_("test it!") {}
    bool operator<(const Test& rhs) const {return instance_>rhs.instance_;}
private:
    unsigned int instance_;
    static unsigned int id_;
    std::string str_;
};
unsigned int Test::id_ = 0;

template< class Cont >
void fill(Cont& cont)
{
    for( unsigned int u = 0; u < kLimit; ++u ) {
        cont.push_back(Test());
    }
}

template< class Cont >
void test(Cont& cont);

int main()
{
    std::vector<Test> v; v.reserve(kLimit);
    std::list<Test> l;

    std::cout << "filling a vector..." << std::endl;
    fill(v);
    std::cout << "filling a list..." << std::endl;
    fill(l);
    std::cout << "...done.\n";

    std::cout << "sorting a vector..." << std::endl;
    test(v);
    std::cout << "sorting a list..." << std::endl;
    test(l);

    return 0;
}

template< typename T, class Al >
inline void sort(std::vector<T,Al>& v) {std::sort(v.begin(),v.end());}
template< typename T, class Al >
inline void sort(std::list<T,Al>& l) {l.sort();}

#include <windows.h> // for GetTickCount()

template< class Cont >
void test(Cont& cont) {
    const DWORD start = GetTickCount();
    sort(cont);
    std::cout << "...took " << GetTickCount()-start << "msecs." << std::endl;
}
outputs
filling a vector...
filling a list...
...done.
sorting a vector...
...took 829msecs.
sorting a list...
...took 2437msecs.
and thus again disagrees with you.
Eagerly awaiting your counter example,
Scho-you-can-take-your-foot-out-of-your-mouth-now-bi
Not always so.
You are correct that, since lists do not provide random-access iterators,
quicksort cannot be used. Instead, lists are normally sorted using
merge sort. Merge sort has better worst-case performance than
quicksort. The drawback of merge sort is that it normally requires
extra memory, whereas quicksort does not.
http://en.wikipedia.org/wiki/Merge_sort
<quote>
In the worst case, merge sort does about 39% fewer comparisons than
quicksort does in the average case; merge sort always makes fewer
comparisons than quicksort, except in extremely rare cases, when they
tie, where merge sort's worst case is found simultaneously with
quicksort's best case. In terms of moves, merge sort's worst case
complexity is O(n log n)—the same complexity as quicksort's best case,
and merge sort's best case takes about half as many iterations as the
worst case.
</quote>
--
Max
[ Non-portable code...]
>
> and thus again disagrees with you.
>
> Eagerly awaiting your counter example,
#include <iostream>
#include <ctime>
#include <vector>
#include <list>
#include <cstddef>
#include <algorithm>

class SomeClass
{
    typedef std::vector<int> TypeVector;
    TypeVector vec;
    enum { VectorSize= 1000 };
public:
    SomeClass();
    bool operator<(const SomeClass &argSomeClass) const
    {
        return vec[0]< argSomeClass.vec[0];
    }
};

int main()
{
    using namespace std;

    srand(time(0));

    const size_t SIZE=10000;

    typedef vector<SomeClass> Vector;
    typedef list<SomeClass> List;

    cout<< "\nCreating vector with "<< SIZE<< " elements..."<< flush;
    Vector vec(SIZE);
    cout<<" Done!\n\n"<< flush;

    List lis(vec.size());

    cout<< "Filling list with vector elements..."<< flush;
    for(Vector::size_type i= 0; i< vec.size(); ++i)
        lis.push_back(vec[i]);
    cout<< " Done!\n\n"<< flush;

    clock_t timeBeginVector, timeEndVector, timeBeginList, timeEndList;

    cout<< "Timing the sorting of the vector..."<< flush;
    timeBeginVector= clock();
    sort(vec.begin(), vec.end());
    timeEndVector= clock();
    cout<< " Done!\n\n"<< flush;

    cout<< "Timing the sorting of the list..."<< flush;
    timeBeginList= clock();
    lis.sort();
    timeEndList= clock();
    cout<< " Done!\n\n"<< flush;

    cout<< "The sorting of the vector took "
        << static_cast<double>((timeEndVector- timeBeginVector))/ CLOCKS_PER_SEC
        << " seconds\n\n";

    cout<< "The sorting of the list took "
        << static_cast<double>((timeEndList- timeBeginList))/ CLOCKS_PER_SEC
        << " seconds\n\n";
}

SomeClass::SomeClass():vec(VectorSize)
{
    using namespace std;

    for(TypeVector::size_type i= 0; i< vec.size(); ++i)
        vec[i]= rand();

    sort(vec.begin(), vec.end());
}
I had to add another zero in order to get sensible results:
Creating vector with 100000 elements... Done!
Filling list with vector elements... Done!
Timing the sorting of the vector... Done!
Timing the sorting of the list... Done!
The sorting of the vector took 0.016 seconds
The sorting of the list took 0.468 seconds
HTH,
Schobi
Your computer is faster than mine and probably with more RAM.
In my computer with SIZE 10000:
john@john-desktop:~/Projects/src$ ./foobar_cpp
Creating vector with 10000 elements... Done!
Filling list with vector elements... Done!
Timing the sorting of the vector... Done!
Timing the sorting of the list... Done!
The sorting of the vector took 2.55 seconds
The sorting of the list took 0.02 seconds
john@john-desktop:~/Projects/src$
In my computer with SIZE 100000 (uses swap):
john@john-desktop:~/Projects/src$ ./foobar_cpp
Creating vector with 100000 elements... Done!
Filling list with vector elements... Done!
Timing the sorting of the vector... Done!
Timing the sorting of the list... Done!
The sorting of the vector took 27.46 seconds
The sorting of the list took 4.58 seconds
john@john-desktop:~/Projects/src$
==> Timings of less than 1 second are inconclusive. Add zeros until you
exceed 1-2 seconds.
corrected:
TypeVector vec;
enum { VectorSize= 1000 };
public:
SomeClass();
srand(time(0));
const size_t SIZE=10000;
===> List lis;
If I add one more zero, the program runs out of memory.
Schobi
Creating vector with 100000 elements... Done!
Filling list with vector elements... Done!
Timing the sorting of the vector... Done!
Timing the sorting of the list... Done!
The sorting of the vector took 0.015 seconds
The sorting of the list took 0.437 seconds
Schobi
And my results:
1. const size_t SIZE= 90000;
john@john-desktop:~/Projects/src$ ./foobar_cpp
Creating vector with 90000 elements... Done!
Filling list with vector elements... Done!
Timing the sorting of the vector... Done!
Timing the sorting of the list... Done!
The sorting of the vector took 24.79 seconds
The sorting of the list took 0.1 seconds
john@john-desktop:~/Projects/src$
2. const size_t SIZE= 100000;
john@john-desktop:~/Projects/src$ ./foobar_cpp
Creating vector with 100000 elements... Done!
Filling list with vector elements... Done!
Timing the sorting of the vector... Done!
Timing the sorting of the list... Done!
The sorting of the vector took 26.8 seconds
The sorting of the list took 0.1 seconds
john@john-desktop:~/Projects/src$
Again, increase const size_t SIZE so that the sorting of one container
exceeds 1-2 seconds, and then post your results.
For example try
const size_t SIZE= 400000;
This exercise is pointless. I made 200K elements (no swap, but the
computer uses up to 2.5G of RAM, probably mostly due to the need to keep
all the pointers in the list elements) and I still consistently get
Creating vector with 200000 elements... Done!
Filling list with vector elements... Done!
Timing the sorting of the vector... Done!
Timing the sorting of the list... Done!
The sorting of the vector took 0.015 seconds
The sorting of the list took 0.156 seconds
What is it you're trying to prove? That your computer and your
implementation are dilapidated? Get a better compiler.
The specific code had a serious bug (the list had twice as many elements
as the vector). Also, differences of milliseconds prove nothing (they
are inconclusive).
Just for the record, my system is:
OS: Ubuntu 8.04 x86.
Compiler: gcc version 4.2.3 (Ubuntu 4.2.3-2ubuntu7)
Hardware: Pentium III 1 GHz, 1 GB RAM.
-------------- a debug build:
Creating vector with 200000 elements... Done!
Filling list with vector elements... Done!
Timing the sorting of the vector... Done!
Timing the sorting of the list... Done!
The sorting of the vector took 0.203 seconds
The sorting of the list took 4.602 seconds
-------------- a release build:
Creating vector with 200000 elements... Done!
Filling list with vector elements... Done!
Timing the sorting of the vector... Done!
Timing the sorting of the list... Done!
The sorting of the vector took 0.016 seconds
The sorting of the list took 0.062 seconds
-------------- another release build:
Creating vector with 300000 elements... Done!
Filling list with vector elements... Done!
Timing the sorting of the vector... Done!
Timing the sorting of the list... Done!
The sorting of the vector took 0.016 seconds
The sorting of the list took 0.125 seconds
---------------------------------------------------------
Tell me again, what does your example prove? And, again, get a better
compiler and a better library. Whatever version you're using is *crap*,
obviously.
Apart from the bug, sorting a list does not involve the construction or
destruction of any object, while in the case of a vector, the element
objects are copied.
Do you agree that sorting a list does not involve object creation, while
sorting a vector involves copying the element objects?
or assignment,
The system I tested on is DELL T7400 (dual Intel Xeon quad core), Vista
Ultimate 64, Visual C++ 2008 sp1, 4 gigs of RAM. Since no parallelizing
is done in your program, the number of cores doesn't matter, but the CPU
is faster, and while I would love to try it on a 16-gigabyte machine or
better, I don't have access to one. My machine is what our clients are
getting, so the results stand. You're free to speculate what makes it
so *bad* on your machine, but I'd seriously think about switching to a
real OS and a real compiler, if I got such bad results as you do.
Whatever it involves (and I'd think it would actually be *swapping* and
not assignment or copying), apparently *your library* does it in a very
unacceptable manner.
You can theorize as much as you want, but in the end the results only
prove that one needs to measure before coming to any conclusions. Logic
does not always work well...
Doesn't a swap sort of imply a copy of "something"?
struct object {
int x;
};
object objs[2] = { { 0 }, { 1 } };
How do you swap `obj[0]' with `obj[1]' without copying "something"?
> apparently *your library* does it in a very unacceptable manner.
> You can theorize as much as you want, but in the end the results only
> prove that one needs to measure before coming to any conclusions. Logic
> does not always work well...
;^)
So how many constructions and destructions does std::sort involve?
>
>Regarding ints, I think sorting a vector of ints and as list of ints,
>both have about the same efficiency.
How many integer assignments to swap two ints in a vector?
How many pointer assignments to swap two elements in a list?
>
>If the programmer decides to replace ints with other objects, he will
>not have to change much in the code, if he uses a list.
What would he have to change with a vector?
--
Richard Herring
This program doesn't sort at all, because the vec is initialized with
one constant! Change the list filling code to
for(Vector::size_type i= 0; i< vec.size(); ++i) {
vec[i]= SomeClass();
lis.push_back(vec[i]);
}
and you have something to sort.
My result on OpenSuse 11 64Bit with Phenom 9850 and 4GB RAM, gcc 4.3.1,
const size_t SIZE=100000, compiled with g++ -O3 :
Creating vector with 100000 elements... Done!
Filling list with vector elements... Done!
Timing the sorting of the vector... Done!
Timing the sorting of the list... Done!
The sorting of the vector took 1.12 seconds
The sorting of the list took 0.12 seconds
Lars
[ ... ]
> You are correct that as lists do not provide random-access iterators
> quicksort can not be used.
Actually, that's not true. It's entirely possible to implement a
quicksort on a linked list. You can't (efficiently) use a
median-of-three (or whatever) pivot selection, but the quicksort itself
has no need for random-access iteration.
> Instead, lists are normally sorted using merge sort.
This, however, is true. Even though you _can_ use quicksort on a linked
list, the potential gains from doing so are much smaller than the
potential losses.
> Merge sort has better worst-case performance than
> quicksort. The drawback of merge sort is that it normally requires
> extra memory, whereas quicksort does not.
When applied to arrays/vectors, that's true. For a linked list, they
both require essentially the same extra memory.
--
Later,
Jerry.
The universe is a figment of its own imagination.
This is true even if the elements being sorted are ints?
What do you think is faster, copying and assigning individual ints, or
updating pairs of pointers at each stage of the sorting?
Not true.
Granted, you can't do it for the *first* partition, but there's no
problem in getting the median-of-three pivot in subsequent partitions.
A vector::swap() can be a shallow swap. For example, if a vector is
implemented as a pointer to an array on the free store, swap can just
exchange the pointer values, without touching the contents.
? Yes, it is initialised with its number of elements, which are SomeClass
objects created with their default constructor.
> Change the list filling code to
>
> for(Vector::size_type i= 0; i< vec.size(); ++i) {
> vec[i]= SomeClass();
> lis.push_back(vec[i]);
> }
Why?
On my system all values are equal. The constructor is only called once!
The other elements are initialized with a copy of this SomeClass object!
Don't know what the standard says about this ...
>> Change the list filling code to
>>
>> for(Vector::size_type i= 0; i< vec.size(); ++i) {
>> vec[i]= SomeClass();
>> lis.push_back(vec[i]);
>> }
>
>
> Why?
To get distinct elements. See above.
Lars
There are two vecs in the code. Please be more specific, which are you
referring to?
Vector vec(SIZE); in main.
Lars
Already spotted the strange behaviour and posted a new message with the
subject "Strange behaviour".
What does your example have to do with the OP's question? The OP has one int
as member, while in your code, each element contains a vector of 1000 ints
instead. If this proves anything, then that you can construct a situation
where a list is faster than a vector. Unfortunately, that situation has
nothing to do with the OP's question.
My point was that "something" is copied; in this case a pointer value.
However, in this case, that statement would be a ridiculous nit-pick!
;^)
Anyway, I thought this was about sorting vectors... AFAICT, this implies
some swapping of the contents of slots in the array, and swapping implies
"something" is copied. If it's a pointer value, so be it. If not, well,
then things can start to get expensive. The sort and swap implementations
need to be very highly optimized indeed.
You can use quick-sort on a skip-list where one of the skip-links points to
the middle of the list, or some other pivot.
I managed to duplicate your results once, but only once. In numerous
other tries, your code consistently showed the vector being at least an
order of magnitude faster. Make sure you are not running other CPU-
intensive applications while running the test.
> The system I tested on is DELL T7400 (dual Intel Xeon quad core), Vista
> Ultimate 64, Visual C++ 2008 sp1, 4 gigs of RAM. Since no parallelizing
> is done in your program, the number of cores doesn't matter, but the CPU
> is faster, and while I would love to try it on a 16-gigabyte machine or
> better, I don't have access to one. My machine is what our clients are
> getting, so the results stand. You're free to speculate what makes it
> so *bad* on your machine, but I'd seriously think about switching to a
> real OS and a real compiler, if I got such bad results as you do.
Long since I last heard anyone calling Linux and gcc toy products. :-)
--
Erik Wikström
OK, as it was determined, your compiler has a bug (it invokes the
default c-tor for all elements of a vector constructed with only the
size given). I've added a copy constructor that does exactly the same
thing as the default constructor and finally got your results. A vector
of large objects with random values takes longer to sort than a list of
large objects with similarly random values. Significantly longer. Your
theory is confirmed, I suppose.
Now, it has really nothing to do with the OP's problem, but still...
I didn't call them toy products, either.
Actually from the responses I got in the "Strange behaviour (corrected)"
thread, it isn't a bug.
> I've added a copy constructor that does exactly the same
> thing as the default constructor and finally got your results. A vector
> of large objects with random values takes longer to sort than a list of
> large objects with similarly random values. Significantly longer. Your
> theory is confirmed, I suppose.
>
> Now, it has really nothing to do with the OP's problem, but still...
Strangely enough there should be no difference. With the default copy
constructor I get the differences I mentioned, while you get them only
when you define the copy constructor explicitly.
On Oct 2, 4:08 am, Ioannis Vranos replied:
> If the programmer decides to replace ints with other objects, he will
> not have to change much in the code, if he uses a list.
Even if he uses a std::vector, he still won't have to change much
in the code.
In my experience, it is usually not a good thing to write code
optimized for something that "might" happen in the future. If it does
happen, let the coder change it then, and not earlier.
I've seen code where data was stored in a list (or vector), but
nothing whatsoever was being done with the list after that. When I
asked the original coder why it was there, the reply was, "So that if
a future coder needs a list of the data, it's there to use."
Seriously, though, if you saw a list or vector that did nothing
other than being populated, wouldn't you think it was a bug (or at
least vestigial code that someone forgot to remove)?
I mean, if I had to add code that required me to use a list/vector
of that data, I would probably declare and populate my own list (who
wouldn't?). Even if I did see the first list (the one written for
future use), I would probably pass it over thinking it was needed for
some other piece of code -- fearing that if I manipulated it in any
way it might break current functional code.
To be honest, I think the STL sort algorithms are efficient enough
that the type of container they sort doesn't really matter in the long
run. In other words, they scale well enough that their difference in
speed can be pretty much considered negligible.
That being said, I think the original poster should use the
container that best suits the problem. If the data is meant to
be used as a linked list, then he/she should use a std::list. If the
data is meant to be used as a vector, then he/she should use a
std::vector.
Personally, I would suggest using a std::set, since it seems like
all he/she needs the container for is for checking to see if a number
is among the set of numbers he/she has already encountered -- unless
he/she also needs to use the set of numbers somewhere else as a list
or a vector.
Take care,
-- Jean-Luc
Disregard this message of mine.
#include <iostream>
#include <ctime>
#include <vector>
#include <list>
#include <cstddef>
#include <algorithm>

class SomeClass
{
    typedef std::vector<int> TypeVector;
    TypeVector vec;
    enum { VectorSize= 1000 };
public:
    SomeClass();
    SomeClass(const SomeClass &);
    bool operator<(const SomeClass &argSomeClass) const
    {
        return vec[0]< argSomeClass.vec[0];
    }
};

int main()
{
    using namespace std;

    srand(time(0));

    const size_t SIZE=1000;

    typedef vector<SomeClass> Vector;
    typedef list<SomeClass> List;

    cout<< "\nCreating vector with "<< SIZE<< " elements..."<< flush;
    Vector vec(SIZE);
    cout<<" Done!\n\n"<< flush;

    List lis;

    timeBeginVector= clock();
    sort(vec.begin(), vec.end());
    timeEndVector= clock();

    timeBeginList= clock();
    lis.sort();
    timeEndList= clock();

    sort(vec.begin(), vec.end());
}
==> SomeClass::SomeClass(const SomeClass &):vec(VectorSize)
I also think std::set is the answer for the OP.
Just like std::list, std::set wastes too much memory per element to be
efficient for storing large numbers of elements, especially if each
datum is of a fundamental type. There is no need to waste all that
memory if the collection stabilises before the lookups. If lookups and
changes to the collection were intermingled, so that re-sorting a vector
(or reallocating it when inserting in the middle) would cost too much,
then a set would probably be more efficient. Again, nothing can be said
with certainty without *measuring*.
Quite true -- my statement was sufficiently incomplete that it wasn't
really accurate as stated. Thanks for the correction.
You will never get a sorted vector. Every time you copy an element you
will change its value!
Lars
I think during sort(), the default assignment operator is used, and not
the copy constructor.
You have to make a copy during the exchange of two elements:
SomeClass tmp = vec[i]; // new value here
vec[i] = vec[j];
vec[j] = tmp;
Lars
You are right, I will post a corrected version.
Thanks.
#include <iostream>
#include <ctime>
#include <vector>
#include <list>
#include <cstddef>
#include <algorithm>

class SomeClass
{
public:
    typedef std::vector<int> TypeVector;
    TypeVector vec;
    enum { VectorSize= 1000 };
public:
    void fillWithSortedRandomNumbers();
    bool operator<(const SomeClass &) const;
    SomeClass();
};

SomeClass::SomeClass():vec(VectorSize){}

inline void SomeClass::fillWithSortedRandomNumbers()
{
    using namespace std;

    for(TypeVector::size_type i= 0; i< vec.size(); ++i)
        vec[i]= rand();

    sort(vec.begin(), vec.end());
}

inline bool SomeClass::operator<(const SomeClass &argSomeClass) const
{
    return vec[0]< argSomeClass.vec[0];
}

int main()
{
    using namespace std;

    srand(time(0));

    const size_t SIZE=10000;

    typedef vector<SomeClass> Vector;
    typedef list<SomeClass> List;

    cout<< "\nCreating vector with "<< SIZE<< " elements..."<< flush;
    Vector vec(SIZE);
    for(Vector::size_type i= 0; i< vec.size(); ++i)
        vec[i].fillWithSortedRandomNumbers();
> Erik Wikström wrote:
>> On 2008-10-02 16:11, Victor Bazarov wrote:
>>> [..] You're free to speculate what makes it
>>> so *bad* on your machine, but I'd seriously think about switching to a
>>> real OS and a real compiler, if I got such bad results as you do.
>>
>> Long since I last heard anyone calling Linux and gcc toy products. :-)
>
> I didn't call them toy products, either.
No, you just claimed they were not "a real OS and a real compiler".
No, I didn't *claim* that either. I *implied* it, maybe. And still,
it's up to the reader to [try to] deduce what I *implied*. Don't put
words into my mouth, please.
You don't need to know the maximum value, only approximately the
maximum number of integers that will be stored.
If you are a good guesser, this hash class will suit your
needs. It has O(1) lookup; if there are more
numbers than table entries it falls back to a linear search
within a bucket, but that shouldn't happen much.
#include <algorithm>
#include <vector>

class HashT{
public:
    HashT(unsigned max = 100):table_(max){}
    bool find(unsigned num)const
    {
        unsigned index = num%table_.size();
        if(table_[index].size()>0)
        {
            std::vector<unsigned>::const_iterator
                i = std::find(table_[index].begin(),table_[index].end(),num);
            return i!=table_[index].end();
        }
        return false;
    }
    void insert(unsigned num)
    {
        unsigned index = num%table_.size();
        if(table_[index].empty())
            table_[index].push_back(num);
        else
        {
            std::vector<unsigned>::const_iterator
                i = std::find(table_[index].begin(),table_[index].end(),num);
            if(i == table_[index].end())
                table_[index].push_back(num);
        }
    }
private:
    std::vector<std::vector<unsigned> > table_;
};
Greetings, Branimir.
[ ... ]
> A vector::swap() can be a shallow swap.
Not only can be, but basically _has_ to be. In particular, the _only_
situation in which vector::swap is allowed to throw is if the
container's allocator throws. This means that swap can't use _any_ of
the operators built into the type being contained. That leaves a shallow
swap as essentially the only option.
The memory isn't contiguous on most common contemporary systems; it is
heavily fragmented and just appears contiguous (paged).
AFAIK that's completely inconsequential from the point of view of the
cache. The size of a cache block is much smaller than the size of a
memory page (which is probably a multiple of it).
So it took you three tries, and code that's about as far from the
original requirement as possible while still dealing with lists and
vectors, to come up with an example where a list is actually faster
than a vector.
Do you realize that you just proved everyone else's point?
(Start with a vector, and measure whether a list is faster;
it might happen, although not in general.)
Schobi
It takes some effort to maximize use of the cache. I don't think one
could even use a `std::vector' in a fully cache-friendly manner. First of
all, the memory for the vector would need to start on a cache-line
boundary; how can this be ensured? I think it would be impossible (well,
perhaps with a custom allocator...). Therefore, if you want to create highly
cache-friendly arrays and algorithms, you need to "roll your own". I
don't quite see how one could use a `std::vector' and know that all the
caveats, such as false sharing and proper data alignment, have been
addressed...
Actually, the knowledge that 'std::vector' is more cache-friendly
than 'std::list' was obtained empirically: first people discovered
the fact, and then found out (or rather speculated; I suppose
nobody ever went and really researched it) that processor caching
is the reason.
Schobi