Re: Compute Unique Numbers in a Set

302 views
Skip to first unread message

Bonita Montero

unread,
Jan 8, 2023, 12:01:08 AMJan 8
to
Now it's perfect:

#include <iostream>
#include <unordered_set>
#include <charconv>
#include <random>

using namespace std;

int main( int argc, char **argv )
{
try
{
if( argc < 4 )
return
cout << argv[0] << " n from to" << endl,
EXIT_FAILURE;
auto parse = []( char const *str, char const *err )
{
size_t value;
if( from_chars_result fcr = from_chars( str, str + strlen( str ),
value ); (bool)fcr.ec || *fcr.ptr )
throw invalid_argument( err );
return value;
};
size_t
n = parse( argv[1], "wrong number of values" ),
from = parse( argv[2], "wrong from-value" ),
to = parse( argv[3], "wrong to-value" );
if( from > to )
swap( from, to );
if( !n || n - 1 > to - from )
return
cout << "n is too small" << endl,
EXIT_FAILURE;
unordered_set<size_t> already;
already.reserve( n );
mt19937_64 mt;
uniform_int_distribution<size_t> uid( from, to );
while( already.size() != n )
{
size_t value;
do
value = uid( mt );
while( already.contains( value ) );
already.emplace( value );
cout << value << endl;
}
}
catch( exception const &exc )
{
return
cout << exc.what() << endl,
EXIT_FAILURE;
}
}

Bart

unread,
Jan 8, 2023, 9:49:14 AMJan 8
to
That's some torturous-looking code. My attempt for the same spec is give
below, in scripting code. Despite that, I'd argue that it would simpler
to use that as a starting point to create a C version rather than try
and adapt your C++ code, because it is basically pseudo-code.

Half of this is just reading and checking command line inputs which can
be in any manner at all. In any case it's of little relevance; of much
more use would be a function that returned a set of N unique random
numbers, that can be embedded into another program.

Note that your error message is wrong: N would be too large, not too small.

Neither version seeds the PRNG so they're both useless on repeated runs.

----------------------------------------------------------------

if ncmdparams<>3 then
println "Usage: N From To"
stop 1
fi

read n:"i", lower:"i", upper:"i"

if lower>upper then swap(lower,upper) fi

if n > upper-lower+1 then
abort("N is too large")
fi

a::=()

to n do
repeat
x:=random(lower..upper)
until x not in a
a &:= x
println x
od

Bonita Montero

unread,
Jan 8, 2023, 12:21:55 PMJan 8
to
Am 08.01.2023 um 15:48 schrieb Bart:

> That's some torturous-looking code. My attempt for the same spec is give
> below, in scripting code. Despite that, I'd argue that it would simpler
> to use that as a starting point to create a C version rather than try
> and adapt your C++ code, because it is basically pseudo-code.

C++ is used when things are usually not that simple and you have a
performance requirement. If you strip the screen output and run the
code with a large number of values any the given C++ code is magni-
tudes faster than any scripting code.


Bart

unread,
Jan 8, 2023, 12:34:55 PMJan 8
to
On 08/01/2023 14:48, Bart wrote:
> On 08/01/2023 05:01, Bonita Montero wrote:
>> Now it's perfect:
>>
>> <snip complicated C++ code>>

> That's some torturous-looking code.

I tested your [BM] code for speed, using N=1000000, and limits of 1..N+1.

The 'cout' in the loop was replaced with a count of all the values, so
that at the end the total was displayed to be able to compare results.

The C++ with gcc-O3 took 1.25 seconds.

My script initially took much longer due to using an unordered list
(storing up to 1M values). But I tweaked it use a bit-set.

Then it took 0.67 seconds.


(Revised script:)
>     a:=new(set,lower..upper)
> sum:=0

>     to n do
>         repeat
>             x:=random(lower..upper)
>         until x not in a
>         a[x]:=1
> sum+:=x
>     od

Bart

unread,
Jan 8, 2023, 12:46:39 PMJan 8
to
I posted my last reply before I even saw this comment. I explained how I
did exactly that! My script was faster.

I've now tried N=10M and results are:

C++ 79 seconds (g++ -O0)
C++ 16 seconds (g++ -O3)
Script (1) 13 seconds (HLL only, unoptimised)
Script (2) 9.5 seconds (ASM-accelerated unoptimised)
Script (3) 7.3 seconds (HLL only, optimised via C/gcc/-O3)

The 'optimisation' refers to the interpreter; the program is still bytecode.

The interesting thing is that my unoptimised interpeter, executing
byte-code, is 6 times as fast as unoptimised C++ executing native code.

(However I don't know what's going on inside your C++ libraries; this is
where C has the advantage of transparency.)

Paavo Helde

unread,
Jan 8, 2023, 2:20:06 PMJan 8
to
I see two issues with this perfect solution:

- repeated values are excluded, so the randomness is less than
perfect. But this seems to be by design.

- two lookups in 'already' instead of the optimal one. What's wrong
with insert() and checking its return value?






Ben Bacarisse

unread,
Jan 8, 2023, 2:20:26 PMJan 8
to
Bonita Montero <Bonita....@gmail.com> writes:

> Now it's perfect:

Does not even compile for me (g++ -std=c++20 required).

--
Ben.

Ike Naar

unread,
Jan 8, 2023, 4:46:14 PMJan 8
to
On 2023-01-08, Bart <b...@freeuk.com> wrote:
> On 08/01/2023 14:48, Bart wrote:
>> On 08/01/2023 05:01, Bonita Montero wrote:
>>> Now it's perfect:
>>>
>>> <snip complicated C++ code>>
>
>> That's some torturous-looking code.
>
> I tested your [BM] code for speed, using N=1000000, and limits of 1..N+1.
>
> The 'cout' in the loop was replaced with a count of all the values, so
> that at the end the total was displayed to be able to compare results.
>
> The C++ with gcc-O3 took 1.25 seconds.
>
> My script initially took much longer due to using an unordered list
> (storing up to 1M values). But I tweaked it use a bit-set.
>
> Then it took 0.67 seconds.
>
>
> (Revised script:)
>> ??? a:=new(set,lower..upper)
>> sum:=0
>
>> ??? to n do
>> ??????? repeat
>> ??????????? x:=random(lower..upper)
>> ??????? until x not in a
>> ??????? a[x]:=1
> > sum+:=x
>> ??? od

In the program below, each random number requires only constant time and
retries are not necessary.
For N=1000000 it runs in 0.01 second.


#include <stdlib.h>
#include <stdio.h>

enum {N = 1000000};
static int used;
static int numbers[N];
/*
numbers[0 .. N) contain a permutation of [0 .. N)
0 <= used <= N
numbers[0 .. used) are available
numbers[used .. N) are already used
*/

static void reset(void)
{
/* mark all numbers[0..N) as available */
used = N;
}

static void init(void)
{
/* fill numbers[0..N) with a permutation of [0..N) */
for (int i = 0; i != N; ++i)
{
numbers[i] = i;
}
}

static int getrandom(void)
{
/* pick a random value from the available numbers, and update the numbers array */
if (used == 0)
{
puts("ran out of available numbers");
abort();
}
int index = rand() % used;
--used;
int value = numbers[index];
numbers[index] = numbers[used];
numbers[used] = value;
return value;
}

static void run(int length)
/*
generate a run of 'length' unique numbers from the range [0..N)
*/
{
reset();
long long sum = 0;
for (int i = 0; i != length; ++i)
{
sum += getrandom();
}
printf("%lld\n", sum);
}

int main(void)
{
/* if necessary, seed the PRNG (not done here) */
init();
run(N);
return 0;
}

Bart

unread,
Jan 8, 2023, 6:13:51 PMJan 8
to
So, this is more of a shuffle? Since the set of numbers is not really
random, it's a permutation of a consecutive range of values.

I tried your code on 10M numbers (to be easier to measure), and with
gcc-O3 it took 0.22 seconds.

However, on Windows RAND_MAX is only about 32K, and the resulting
numbers were not properly distributed. Doubling-up the rand calls was
better, but the timing went up to 1.2 seconds for some reason.

I then imported my own prng, now timings were 0.3 seconds. Trying a
similar shuffle on scripting code was about 3 seconds.

Chris M. Thomasson

unread,
Jan 8, 2023, 6:18:49 PMJan 8
to
[...]

It looks like shuffle of a set of unique numbers.

Bonita Montero

unread,
Jan 8, 2023, 10:57:56 PMJan 8
to
Am 08.01.2023 um 18:46 schrieb Bart:

> I've now tried N=10M and results are:
>
> C++                    79   seconds  (g++ -O0)
> C++                    16   seconds  (g++ -O3)
> Script (1)             13   seconds  (HLL only, unoptimised)
> Script (2)              9.5 seconds  (ASM-accelerated unoptimised)
> Script (3)              7.3 seconds  (HLL only, optimised via C/gcc/-O3)

Whatever you tested: you dindn't test my code.


Alf P. Steinbach

unread,
Jan 9, 2023, 6:04:13 AMJan 9
to
On 2023-01-08 6:01 AM, Bonita Montero wrote:
>         if( !n || n - 1 > to - from )
>             return
>                 cout << "n is too small" << endl,
>                 EXIT_FAILURE;

Presumably you meant "n is too large".

Tip: you can get vastly more clear code by avoiding side effects in
expressions rather than trying to leverage such side effects.


>         unordered_set<size_t> already;
>         already.reserve( n );
>         mt19937_64 mt;
>         uniform_int_distribution<size_t> uid( from, to );
>         while( already.size() != n )
>         {
>             size_t value;
>             do
>                 value = uid( mt );
>             while( already.contains( value ) );
>             already.emplace( value );
>             cout << value << endl;
>         }

When n is large and is near or equal to the number of possible values,
this loop becomes inefficient: for the last number you can expect on
average n iterations to find that number, because each call of uid has a
1/n chance of finding it.

Not sure of the exact overall complexity but for that case I guess O(n^2).

However this can be a good way to generate distinct random numbers when
n is very small compared to the range size. But when n is close to the
range size, if the range size is small compared to available memory then
just store the sequence of numbers in the range and shuffle it, and use
the first n values in the shuffled sequence. When neither of these
conditions hold I don't know how to approach the problem; I'd probably
then start by looking it up in Wikipedia or general googling.

- Alf

Bart

unread,
Jan 9, 2023, 6:27:11 AMJan 9
to
It wasn't /my/ code because I don't know C++. It was just tweaked from
yours:

#include <iostream>
#include <unordered_set>
#include <charconv>
#include <random>
#include <string.h>

using namespace std;

int main( int argc, char **argv )
{
try
{
int n=10000000, from=1, to=n+1;
unordered_set<size_t> already;
already.reserve( n );
mt19937_64 mt;
uniform_int_distribution<size_t> uid( from, to );
long long int sum=0;
int i=0;
while( already.size() != n )
{
size_t value;
do
value = uid( mt );
while( already.contains( value ) );
already.emplace( value );
sum+=value;
// cout << value << " " << sum << " " << ++i << endl;
}
cout << sum << endl;

}
catch( exception const &exc )
{
return
cout << exc.what() << endl,
EXIT_FAILURE;
}
}


(The calculated 'sum' is not the sum of all the final set of values; I
realised I had no idea how to access those in a separate loop.)

Bonita Montero

unread,
Jan 9, 2023, 9:57:03 AMJan 9
to
You must have made sth. wrong.

Bonita Montero

unread,
Jan 9, 2023, 11:42:32 AMJan 9
to
Am 09.01.2023 um 12:03 schrieb Alf P. Steinbach:

> When n is large and is near or equal to the number of possible values,
> this loop becomes inefficient: for the last number you can expect on
> average n iterations to find that number, because each call of uid has a
> 1/n chance of finding it.

I know that the loop becomes inefficient then, but tell me a better
alternative algorithm. You'd have a list of eligible numbers and
randomly chose one of them. That would also take a lot of time to
remove the number from the list.

Ben Bacarisse

unread,
Jan 9, 2023, 6:22:26 PMJan 9
to
Bonita Montero <Bonita....@gmail.com> writes:

> Am 09.01.2023 um 12:03 schrieb Alf P. Steinbach:
>
>> When n is large and is near or equal to the number of possible
>> values, this loop becomes inefficient: for the last number you can
>> expect on average n iterations to find that number, because each call
>> of uid has a 1/n chance of finding it.
>
> I know that the loop becomes inefficient then, but tell me a better
> alternative algorithm.

An alternative that works well in those cases (and in some others) has
already been described where each candidate is chosen with the
appropriate probability. I originally set it out as an exercises since
I thought the OP was doing homework, but the gist of it was explained.

> You'd have a list of eligible numbers and
> randomly chose one of them. That would also take a lot of time to
> remove the number from the list.

No need:

void choose(int choose_n, int from_m, int *output)
{
for (int i = 0, j = 0; j < choose_n; i++)
if (true_with_probability(choose_n - j, from_m - i))
output[j++] = i + 1;
}

Obviously using one would not use this to choose 3 numbers from a set of
millions!

--
Ben.

Malcolm McLean

unread,
Jan 10, 2023, 8:11:27 AMJan 10
to
Here's my go.
#include <random>
#include <vector>
#include <cstdio>

/*
Draw a enadom number form the hypergeomtric distribution (sampling
from a bi-valued population without replacement)
rndengine - a random number generator engine
Ngood - number of positives in the population
Nbad - number of negatives in the population
Nsample - number of samples to take (less than or equal to Ngood + Nbad)
Returns: the number of "good" elements drawn, randomly.
*/
template<class rndengine>
int rand_hypergeometric(int Ngood, int Nbad, int Nsample, rndengine &eng)
{
int answer = 0;
std::uniform_real_distribution<double> distr(0, 1);
for (int i =0; i < Nsample; i++)
{
double p = distr(eng);
if (p < ((double)Ngood)/(Ngood + Nbad))
{
answer++;
Ngood--;
}
else
{
Nbad--;
}
}

return answer;
}

/*
Recursive randunique function
out - vector to collect results
Nsamples - number of samples to take
Npopulation - size of population
indexbase - fiddle to have populations not starting from 0, pass 0 at top level.
eng - the random number engine

*/
template<class rndengine>
void randunique_r(std::vector<int> &out, int Nsamples, int Npopulation,
int indexbase, rndengine &eng)
{
if (Nsamples == 0)
return;
std::uniform_int_distribution<int> distr(0, Npopulation -1);
int i = distr(eng);
int Nsampleslow = rand_hypergeometric(i, Npopulation - i -1, Nsamples-1, eng);
randunique_r(out, Nsampleslow, i, indexbase, eng);
out.push_back(i + indexbase);
randunique_r(out, Nsamples - Nsampleslow -1, Npopulation - i -1, indexbase + i, eng);
}

/*
Draw a set of unique random samples from a population
Nsamples - the number of elemnts to choose.
Npopulation - the population size
eng - the random number engine
Returns: vector with indices of chosen members, sorted ascending.
*/
template<class rndengine>
std::vector<int> randunique(int Nsamples, int Npopulation, rndengine &eng)
{
std::vector<int> answer;
randunique_r(answer, Nsamples, Npopulation, 0, eng);

return answer;
}

int main(void)
{
std::mt19937 engine(123);
std::vector<int> result = randunique(10, 20, engine);
for (int i =0; i <result.size(); i++)
printf("%d, ", result[i]);
printf("\n");

return 0;
}

Not very efficient. But the rand_hypergeometric functon is written naively.
It can be rewritten to run a lot faster, though this isn't easy.

Malcolm McLean

unread,
Jan 10, 2023, 8:19:48 AMJan 10
to
On Tuesday, 10 January 2023 at 13:11:27 UTC, Malcolm McLean wrote:
>
> randunique_r(out, Nsamples - Nsampleslow -1, Npopulation - i -1, indexbase + i, eng);
> }
Bug:: should be
indexbase + i + 1.

Tim Woodall

unread,
Jan 13, 2023, 1:40:21 AMJan 13
to
On 2023-01-08, Chris M. Thomasson <chris.m.t...@gmail.com> wrote:
>
> It looks like shuffle of a set of unique numbers.
>

The original spec wasn't very clear but looked to me to be a choose m
from n - which can be implemented as shuffle then choose first m (with
further optimization that you only need to shuffle the first m slots)

Tim Woodall

unread,
Jan 13, 2023, 1:40:21 AMJan 13
to
further optimization that you only need to shuffle *into* the first m slots)
>

Tim Woodall

unread,
Jan 13, 2023, 1:40:21 AMJan 13
to
On 2023-01-08, Bonita Montero <Bonita....@gmail.com> wrote:
> Now it's perfect:
>

Hmmm, about 20 years ago I was told a story of someone who needed a
'shuffle' function - random permutation. I don't know or don't recall
the context but I'd guess it was part of a test harness.

They wrote it, tested it, deployed it, and things ground to a
standstill.

Instead of permuting, they'd randomly picked numbers, checked for dupes,
and then added to their output, quick for very small sets, 'never'
terminated for large ones.

Bonita Montero

unread,
Jan 13, 2023, 2:26:26 AMJan 13
to
Now I make a list of eligible numbers, randomly chode one of them
and remove the item from the list:

#include <iostream>
#include <charconv>
#include <random>
#include <concepts>
#include <vector>

using namespace std;

int main( int argc, char **argv )
{
try
{
if( argc < 4 )
return
cout << argv[0] << " n from to" << endl,
EXIT_FAILURE;
auto parse = []( char const *str, char const *err )
{
size_t value;
if( from_chars_result fcr = from_chars( str, str + strlen( str ),
value ); (bool)fcr.ec || *fcr.ptr )
throw invalid_argument( err );
return value;
};
size_t n = parse( argv[1], "wrong number of values" );
if( !n )
return EXIT_SUCCESS;
size_t
from = parse( argv[2], "wrong from-value" ),
to = parse( argv[3], "wrong to-value" );
if( from > to )
swap( from, to );
if( n - 1 > to - from )
return
cout << "n is too small" << endl,
EXIT_FAILURE;
vector<size_t> free;
free.resize( to - from + 1 );
for( size_t i = from; size_t &v : free )
v = i++;
mt19937_64 mt;
size_t above = to;
while( free.size() )
{
auto pick = free.begin() + (uniform_int_distribution<size_t>( 0,
free.size() - 1 ))( mt );
//cout << *pick << endl;
free.erase( pick );
}
}
catch( exception const &exc )
{
return
cout << exc.what() << endl,
EXIT_FAILURE;
}
}

But that's _much_ slower.


Öö Tiib

unread,
Jan 13, 2023, 6:27:01 AMJan 13
to
On Friday, 13 January 2023 at 09:26:26 UTC+2, Bonita Montero wrote:
> Am 09.01.2023 um 17:42 schrieb Bonita Montero:
> > Am 09.01.2023 um 12:03 schrieb Alf P. Steinbach:
> >
> >> When n is large and is near or equal to the number of possible values,
> >> this loop becomes inefficient: for the last number you can expect on
> >> average n iterations to find that number, because each call of uid has
> >> a 1/n chance of finding it.
> >
> > I know that the loop becomes inefficient then, but tell me a better
> > alternative algorithm. You'd have a list of eligible numbers and
> > randomly chose one of them. That would also take a lot of time to
> > remove the number from the list.
> Now I make a list of eligible numbers, randomly chode one of them
> and remove the item from the list:
>
Some kind of weird algorithm ... what it supposedly does?
Why you require n from user if the algorithm does nothing with it?
Variable above is also ignored, perhaps the algorithm is meant as
half-made complete-yourself pseudocode?

> But that's _much_ slower.
>
Very likely as it does do something odd.

Kenny McCormack

unread,
Jan 13, 2023, 7:22:05 AMJan 13
to
In article <tpr12i$1i7r6$1...@dont-email.me>,
some lunatic <Bonita....@gmail.com> wrote:
...
>Now I make a list of eligible numbers, randomly chode one of them
>and remove the item from the list:
>
>#include <iostream>
>#include <charconv>
>#include <random>
>#include <concepts>
>#include <vector>

You still just don't seem to get this whole topicality thing, do ya?

--
Reading any post by Fred Hodgin, you're always faced with the choice of:
lunatic, moron, or troll.

I always try to be generous and give benefit of the doubt, by assuming troll.

Öö Tiib

unread,
Jan 13, 2023, 8:04:51 AMJan 13
to
On Friday, 13 January 2023 at 08:40:21 UTC+2, Tim Woodall wrote:
> On 2023-01-08, Bonita Montero <Bonita....@gmail.com> wrote:
> > Now it's perfect:
> >
> Hmmm, about 20 years ago I was told a story of someone who needed a
> 'shuffle' function - random permutation. I don't know or don't recall
> the context but I'd guess it was part of a test harness.
>
> They wrote it, tested it, deployed it, and things ground to a
> standstill.
>
> Instead of permuting, they'd randomly picked numbers, checked for dupes,
> and then added to their output, quick for very small sets, 'never'
> terminated for large ones.
>
It's bit frustrating to read those stories.
The O(n) algorithm for when the range to pick from and set to be picked
are of close size is not that large or complicated, how they manage
to take tons of time?

size_t n, to, from;
// ...
// assign count to pick to "n" and range to "to" and "from"
// ...
size_t size = to - from + 1;
std::vector<size_t> result(size);

std::mt19937_64 gen(time(nullptr)); // or whatever generator

for(size_t i = 0; i < n; ++i) {
size_t choice = (std::uniform_int_distribution<size_t>( i, size - 1))( gen );
size_t& current = result[i];
size_t& picked = result[choice];
if (current == 0) current = i + 1;
if (picked == 0) picked = choice + 1;
if (current != picked) std::swap(current, picked);
current += from - 1;
// std::cout << current << '\n';
}
result.resize(n);

Bonita Montero

unread,
Jan 13, 2023, 8:10:50 AMJan 13
to
Am 13.01.2023 um 13:21 schrieb Kenny McCormack:
> In article <tpr12i$1i7r6$1...@dont-email.me>,
> some lunatic <Bonita....@gmail.com> wrote:
> ...
>> Now I make a list of eligible numbers, randomly chode one of them
>> and remove the item from the list:
>>
>> #include <iostream>
>> #include <charconv>
>> #include <random>
>> #include <concepts>
>> #include <vector>
>
> You still just don't seem to get this whole topicality thing, do ya?

My code is beyond that topic. You may elect only a small number
of values like in the original code, but you might also chose
million of numbers.


Kenny McCormack

unread,
Jan 13, 2023, 8:56:18 AMJan 13
to
In article <tprl89$1k839$1...@dont-email.me>,
some lunatic <Bonita....@gmail.com> wrote:
>Am 13.01.2023 um 13:21 schrieb Kenny McCormack:
>> In article <tpr12i$1i7r6$1...@dont-email.me>,
>> some lunatic <Bonita....@gmail.com> wrote:
>> ...
>>> Now I make a list of eligible numbers, randomly chode one of them
>>> and remove the item from the list:
>>>
>>> #include <iostream>
>>> #include <charconv>
>>> #include <random>
>>> #include <concepts>
>>> #include <vector>
>>
>> You still just don't seem to get this whole topicality thing, do ya?
>
>My code is beyond that topic. You may elect only a small number
>of values like in the original code, but you might also chose
>million of numbers.

Whoooooosh!!!

--
Many people in the American South think that DJT is, and will be remembered
as, one of the best presidents in US history. They are absolutely correct.

He is currently at number 46 on the list. High praise, indeed!

Bonita Montero

unread,
Jan 13, 2023, 9:01:44 AMJan 13
to
Am 13.01.2023 um 14:55 schrieb Kenny McCormack:
> In article <tprl89$1k839$1...@dont-email.me>,
> some lunatic <Bonita....@gmail.com> wrote:
>> Am 13.01.2023 um 13:21 schrieb Kenny McCormack:
>>> In article <tpr12i$1i7r6$1...@dont-email.me>,
>>> some lunatic <Bonita....@gmail.com> wrote:
>>> ...
>>>> Now I make a list of eligible numbers, randomly chode one of them
>>>> and remove the item from the list:
>>>>
>>>> #include <iostream>
>>>> #include <charconv>
>>>> #include <random>
>>>> #include <concepts>
>>>> #include <vector>
>>>
>>> You still just don't seem to get this whole topicality thing, do ya?
>>
>> My code is beyond that topic. You may elect only a small number
>> of values like in the original code, but you might also chose
>> million of numbers.
>
> Whoooooosh!!!

It's not the absolute number of values which can be chosen
but the flexibility or finding the most efficient way to
have this flexibility.


Kenny McCormack

unread,
Jan 13, 2023, 9:17:59 AMJan 13
to
In article <tpro7o$1ki63$1...@dont-email.me>,
Due anziani che convivono da una vita:
Lei: "Caro, ormai ci potremmo anche sposare...".
Lui: "Ma cara, alla nostra eta', chi ci prenderebbe?".

--
"Women should not be enlightened or educated in any way. They should be
segregated because they are the cause of unholy erections in holy men.

-- Saint Augustine (354-430) --

Bonita Montero

unread,
Jan 15, 2023, 10:05:29 AMJan 15
to
Now it's perfect:

#include <iostream>
#include <unordered_set>
#include <charconv>
#include <random>
#include <concepts>

unordered_set<size_t> values;
mt19937_64 mt;
uniform_int_distribution<size_t> uid( from, to );
if( to - from != (size_t)-1 && to - from + 1 < n / 2 )
{
values.reserve( n );
while( values.size() < n )
{
size_t value;
do
value = uid( mt );
while( values.contains( value ) );
values.emplace( value );
}
}
else
{
if( to - from == (size_t)-1 )
throw bad_alloc();
values.reserve( to - from + 1 );
size_t i = from - 1;
do
values.emplace( ++i );
while( i != to );
while( values.size() > n )
for( ; ; )
{
auto itRemove = values.find( uid( mt ) );
if( itRemove == values.end() )
continue;
values.erase( itRemove );
break;
}
}
for( size_t value : values )
cout << value << endl;
}
catch( exception const &exc )
{
return
cout << exc.what() << endl,
EXIT_FAILURE;
}
}

If less than half of the range of numbers is selected (n < (to - from
+ 1) / 2) the unordered map is filled up to n elements and if an element
is found that's already there another value is chosen. If more than half
of the range is selected (n >= (to - from + 1) / 2) the set is filled up
to contain the whole range and the elements are incrementally removed at
a random value.

Ralf Goertz

unread,
Jan 15, 2023, 10:48:01 AMJan 15
to
Am Sun, 15 Jan 2023 16:06:00 +0100
schrieb Bonita Montero <Bonita....@gmail.com>:

> Now it's perfect:

[snip]

> If less than half of the range of numbers is selected (n < (to - from
> + 1) / 2) the unordered map is filled up to n elements and if an
> element is found that's already there another value is chosen. If
> more than half of the range is selected (n >= (to - from + 1) / 2)
> the set is filled up to contain the whole range and the elements are
> incrementally removed at a random value.

What's wrong with a clean two line solution:


#include <algorithm>
#include <iostream>
#include <random>
#include <charconv>
#include <vector>
#include <cstring>
mt19937_64 mt;
vector<int> values( to - from );
// ------
//only two lines to find random subset of fixed size:
fill(values.begin(), values.begin() + n, 1);
shuffle(values.begin(), values.end(), mt);
// ------
for( size_t i=0; i < values.size(); ++i )
if( values[i]) cout << i+from << endl;

Bonita Montero

unread,
Jan 15, 2023, 11:13:01 AMJan 15
to
Am 15.01.2023 um 16:47 schrieb Ralf Goertz:
> Am Sun, 15 Jan 2023 16:06:00 +0100
> schrieb Bonita Montero <Bonita....@gmail.com>:
>
>> Now it's perfect:
>
> [snip]
>
>> If less than half of the range of numbers is selected (n < (to - from
>> + 1) / 2) the unordered map is filled up to n elements and if an
>> element is found that's already there another value is chosen. If
>> more than half of the range is selected (n >= (to - from + 1) / 2)
>> the set is filled up to contain the whole range and the elements are
>> incrementally removed at a random value.
>
> What's wrong with a clean two line solution:

You might have less numbers than to - from + 1 (you omitted the 1 on
allocation of the vector).

Bart

unread,
Jan 15, 2023, 11:28:02 AMJan 15
to
On 15/01/2023 15:06, Bonita Montero wrote:
> Now it's perfect:

You said that last time too:

On 08/01/2023 05:01, Bonita Montero wrote:
> Now it's perfect:
>


I guess it's more perfect?