On Tuesday, 11 December 2018 12:40:59 UTC+2, Paul wrote:
> On Tuesday, December 11, 2018 at 9:45:48 AM UTC, Öö Tiib wrote:
> > On Tuesday, 11 December 2018 01:02:43 UTC+2, Paul wrote:
> > >
> > > I'm not sure how to proceed in that case. I'm doing the Boyer-Moore majority
> > > problem as follows. If a vector of ints has an element that occurs
> > > more than half the time, output that number, otherwise output 0.5.
> >
> > You return int as double and use 0.5 as indicator of "not found"?
> > That is obscure. Return something that makes sense like
> > std::optional<vote> or std::pair<bool,vote>.
> >
> > > There's a standard algorithm to do this by incrementing and decrementing
> > > a counter. But I'm pretending I don't know that, and trying to do
> > > the problem time-efficiently via data structures instead.
> >
> > The standard algorithm does it with two passes on unsorted
> > sequence so O(2*N). Sorting the sequence and then one-pass
> > counting is O(N*log(N)+N). Copying to another, sorted or hashed
> > container and then one-pass counting is about as time-complex
> > as with sorting plus it takes double memory too. Your algorithm
> > below loses to all of those.
It is pity that you did ignore it.
> >
> > > My code is below but I don't want the multiset to do so much unnecessary
> > > counting. Basically, I don't know how to code an at_least function for
> > > a multiset. Any ideas gratefully received. Thanks. Paul
> > >
> > > double DataStructuresMajority(const std::vector<int>& vec)
> > > {
> > > const double noMajority = 0.5;
> > > std::unordered_multiset<int> duplicates(vec.begin(), vec.end());
> > > for(std::unordered_set<int>::const_iterator iter = duplicates.begin(); iter!= duplicates.end(); ++iter)
> > > if(duplicates.count(*iter) > vec.size()/2)
> > > return *iter;
> > >
> > > return noMajority;
> > > }
> >
> > That is mixing up iterators of unordered_multiset and unordered_set.
> > Does it compile? Also you seemingly call count for every element in
> > equal range so it looks something like O(N*log(N) + N*N). Sad.
>
> Thanks for your reply. A number of points. In similar exercises, it is
> common to have a problem where solutions are assigned a non-negative integer
> and -1 is a sentinel to represent the no-solution case.
> Here, -1 is valid in the case where a solution is present. So I tried to
> mimic this basic technique and chose 0.5 as a sentinel.
> However, the technique doesn't work as well because it necessitates changing
> the return type which isn't ideal. I hadn't heard of std::optional -- many
> thanks for introducing this to me.
The whole point posting was to help you.
> The time complexity of my algorithm is O(N)
It is clearly not.
> Thanks for pointing out the mixing of the two types of iterators. I hadn't
> noticed that. It does compile with gcc. I avoid posting non-compilable code
> by copy-pasting of a verified program (although you couldn't have known
> that). I would rather it didn't compile so that I could have uncovered this
> mistake myself. I should have just used a range-based for loop.
Yes, but more importantly it can be made one pass algorithm.
> Because of the hashing, my algorithm is O(N). Unordered_multiset uses hashing.
> You might be correct that "it looks something like O(N*log(N) + N*N)" but
> it is O(N) regardless of what it looks like.
> See Alf's post for more info.
Can you try to explain it a bit better? Telling instead that hashtable is
using hashing clarifies nothing of your line of thought.
Learn to explain things with your own words; I can't find support to
your O(N) in any of Alf's posts.
I can explain why your complexity "looks" what I said:
A) Complexity of constructor
https://en.cppreference.com/w/cpp/container/unordered_multiset/unordered_multiset
type 2) so linear to quadratic. Starting O formula:
O(N*something + ...
B) Complexity of for loop over all elements
That is linear. Adding to O forrmula:
O(N*something + N * ...
C) Complexity of calling count in for loop
https://en.cppreference.com/w/cpp/container/unordered_multiset/count
That is also linear to the number of equal elements.
O(N*something + N * M)
Where N is number of elements and M is number of equal
elements. That can't be reduced down to O(N) .... and looks expensive
so go ahead and explain why it is not that.