Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Strong birthday problem simulated -- mismatch with known value of 3064

82 views
Skip to first unread message

Paul

unread,
Feb 12, 2019, 5:42:15 PM2/12/19
to
The strong birthday problem asks for the minimum positive number of people so that the probability that no person in the set has a unique birthday >= 50%. The literature gives this number as 3064. Obvious assumptions: By "unique birthday", I mean the month/day. Birthdays assumed to be uniformly distributed throughout the year. Assume no one born in a leap year. I've done many simulations and my answers (output to the console) are consistently in the 3019-3021 range. I get 3019 with C++'s std::rand() and also with the later C++11 random number generators. I'd be very grateful to understand what I'm doing wrong? As a check, I used the same approach for the standard birthday problem, and I do indeed get the textbook answer of 23 for that one.
Code is below.
Thank you very much for your help.

#include <iostream>
#include <vector>
#include <algorithm>
#include <random>

int main()
{
constexpr int num = 1000001;
constexpr int daysInYear = 365;
std::vector<int> numTrials(num);
std::default_random_engine generator;
std::mt19937 gen(generator());
std::uniform_int_distribution<int> distribution(0,364);
for(int i = 0; i < num; ++i)
{
int counter = 0;
std::vector<int> calendar(daysInYear);
int numOdd = 0;
do
{
const int next = distribution(gen);
++calendar[next];
if(calendar[next] == 1)
++numOdd;
else if(calendar[next] == 2)
--numOdd;
++counter;
}
while(numOdd);

numTrials[i] = counter;
}
std::sort(numTrials.begin(), numTrials.end());
std::cout << numTrials[num/2];

}

red floyd

unread,
Feb 12, 2019, 7:06:01 PM2/12/19
to
Asking the obvious question. Have you examined the results
for daysInYear = 366? (Leapyear)?

Paul

unread,
Feb 12, 2019, 7:19:28 PM2/12/19
to
The idea is to compare simulated results with the mathematical literature.
Since the maths problem doesn't handle leap years, the simulation shouldn't
handle leap years either.
The mathematical reference is https://rss.onlinelibrary.wiley.com/doi/pdf/10.1111/j.1740-9713.2013.00705.x

It's not a question of simulating reality -- it's a question of simulating
the mathematics.

Paul Epstein

Robert Wessel

unread,
Feb 12, 2019, 8:04:45 PM2/12/19
to
On Tue, 12 Feb 2019 14:42:03 -0800 (PST), Paul <peps...@gmail.com>
wrote:
OK, I've been staring at this a bit, and I'm not really sure how
you're approaching the problem. It looks more like your (trying) to
keep adding people to the mix until there are no more unique
birthdays. I'm having trouble seeing how that's the same as the
stated problem. As a parallel, is the problem how many times must I
roll a die until I have a 50% chance of hitting a six, or how many
rolls must I take to have a 50% chance of having a six in the set?

Anyway, I did the following, and although it could use a better PRNG,
it produces the expected results.


#include <stdio.h>
#include <stdlib.h>

#define ITERS 25000
#define BEGIN 3000
#define END 3100

int testbday(int n)
{
int days[365] = {0};
int i;

for (i=0; i<n; i++)
days[rand()%365]++;

for (i=0; i<365; i++)
if (days[i] == 1)
return 1;

return 0;
}

int main()
{
int i, j, unique;

for (i=BEGIN; i<=END; i++)
{
unique = 0;
for (j=0; j<ITERS; j++)
unique += testbday(i);

printf("N=%i, Unique=%i/%i, %9f\n",
i, unique, ITERS, ((unique+1.0) / ITERS) );
}

return 0;
}



N=3000, Unique=13741/25000, 0.549680
N=3001, Unique=13807/25000, 0.552320
N=3002, Unique=13898/25000, 0.555960
N=3003, Unique=13812/25000, 0.552520
N=3004, Unique=13666/25000, 0.546680
N=3005, Unique=13666/25000, 0.546680
N=3006, Unique=13703/25000, 0.548160
N=3007, Unique=13653/25000, 0.546160
N=3008, Unique=13534/25000, 0.541400
N=3009, Unique=13648/25000, 0.545960
N=3010, Unique=13664/25000, 0.546600
N=3011, Unique=13604/25000, 0.544200
N=3012, Unique=13637/25000, 0.545520
N=3013, Unique=13561/25000, 0.542480
N=3014, Unique=13574/25000, 0.543000
N=3015, Unique=13588/25000, 0.543560
N=3016, Unique=13452/25000, 0.538120
N=3017, Unique=13550/25000, 0.542040
N=3018, Unique=13463/25000, 0.538560
N=3019, Unique=13353/25000, 0.534160
N=3020, Unique=13460/25000, 0.538440
N=3021, Unique=13347/25000, 0.533920
N=3022, Unique=13284/25000, 0.531400
N=3023, Unique=13466/25000, 0.538680
N=3024, Unique=13334/25000, 0.533400
N=3025, Unique=13390/25000, 0.535640
N=3026, Unique=13297/25000, 0.531920
N=3027, Unique=13184/25000, 0.527400
N=3028, Unique=13226/25000, 0.529080
N=3029, Unique=13179/25000, 0.527200
N=3030, Unique=13078/25000, 0.523160
N=3031, Unique=13170/25000, 0.526840
N=3032, Unique=13020/25000, 0.520840
N=3033, Unique=13136/25000, 0.525480
N=3034, Unique=13137/25000, 0.525520
N=3035, Unique=13045/25000, 0.521840
N=3036, Unique=12962/25000, 0.518520
N=3037, Unique=12812/25000, 0.512520
N=3038, Unique=13134/25000, 0.525400
N=3039, Unique=13038/25000, 0.521560
N=3040, Unique=13117/25000, 0.524720
N=3041, Unique=12918/25000, 0.516760
N=3042, Unique=12983/25000, 0.519360
N=3043, Unique=12991/25000, 0.519680
N=3044, Unique=13007/25000, 0.520320
N=3045, Unique=12890/25000, 0.515640
N=3046, Unique=13000/25000, 0.520040
N=3047, Unique=12841/25000, 0.513680
N=3048, Unique=12748/25000, 0.509960
N=3049, Unique=12811/25000, 0.512480
N=3050, Unique=12865/25000, 0.514640
N=3051, Unique=12796/25000, 0.511880
N=3052, Unique=12762/25000, 0.510520
N=3053, Unique=12741/25000, 0.509680
N=3054, Unique=12795/25000, 0.511840
N=3055, Unique=12613/25000, 0.504560
N=3056, Unique=12677/25000, 0.507120
N=3057, Unique=12587/25000, 0.503520
N=3058, Unique=12515/25000, 0.500640
N=3059, Unique=12581/25000, 0.503280
N=3060, Unique=12695/25000, 0.507840
N=3061, Unique=12576/25000, 0.503080
N=3062, Unique=12525/25000, 0.501040
N=3063, Unique=12635/25000, 0.505440
N=3064, Unique=12512/25000, 0.500520
N=3065, Unique=12316/25000, 0.492680
N=3066, Unique=12464/25000, 0.498600
N=3067, Unique=12377/25000, 0.495120
N=3068, Unique=12383/25000, 0.495360
N=3069, Unique=12334/25000, 0.493400
N=3070, Unique=12397/25000, 0.495920
N=3071, Unique=12222/25000, 0.488920
N=3072, Unique=12363/25000, 0.494560
N=3073, Unique=12091/25000, 0.483680
N=3074, Unique=12322/25000, 0.492920
N=3075, Unique=12174/25000, 0.487000
N=3076, Unique=12307/25000, 0.492320
N=3077, Unique=12237/25000, 0.489520
N=3078, Unique=12193/25000, 0.487760
N=3079, Unique=12141/25000, 0.485680
N=3080, Unique=12199/25000, 0.488000
N=3081, Unique=11990/25000, 0.479640
N=3082, Unique=12064/25000, 0.482600
N=3083, Unique=11995/25000, 0.479840
N=3084, Unique=12108/25000, 0.484360
N=3085, Unique=11866/25000, 0.474680
N=3086, Unique=12009/25000, 0.480400
N=3087, Unique=12162/25000, 0.486520
N=3088, Unique=12083/25000, 0.483360
N=3089, Unique=11915/25000, 0.476640
N=3090, Unique=12022/25000, 0.480920
N=3091, Unique=11756/25000, 0.470280
N=3092, Unique=11843/25000, 0.473760
N=3093, Unique=11892/25000, 0.475720
N=3094, Unique=11979/25000, 0.479200
N=3095, Unique=11872/25000, 0.474920
N=3096, Unique=11794/25000, 0.471800
N=3097, Unique=11823/25000, 0.472960
N=3098, Unique=11795/25000, 0.471840
N=3099, Unique=11661/25000, 0.466480
N=3100, Unique=11795/25000, 0.471840

Paul

unread,
Feb 12, 2019, 8:46:12 PM2/12/19
to
Ok, thanks.
I see the problem now.

Paul

Ralf Goertz

unread,
Feb 13, 2019, 5:09:49 AM2/13/19
to
Am Tue, 12 Feb 2019 14:42:03 -0800 (PST)
schrieb Paul <peps...@gmail.com>:

> The strong birthday problem asks for the minimum positive number of
> people so that the probability that no person in the set has a unique
> birthday >= 50%. The literature gives this number as 3064. Obvious
> assumptions: By "unique birthday", I mean the month/day. Birthdays
> assumed to be uniformly distributed throughout the year. Assume no
> one born in a leap year. I've done many simulations and my answers
> (output to the console) are consistently in the 3019-3021 range. I
> get 3019 with C++'s std::rand() and also with the later C++11 random
> number generators. I'd be very grateful to understand what I'm doing
> wrong? As a check, I used the same approach for the standard birthday
> problem, and I do indeed get the textbook answer of 23 for that one.
> Code is below. Thank you very much for your help.
>

>
> numTrials[i] = counter;
> }
> std::sort(numTrials.begin(), numTrials.end());
> std::cout << numTrials[num/2];
>
> }

You seem to add people to your sample until exactly 50% of them have a
unique birthday. Assuming numTrials is correctly taken, sorting it and
taking the middle value is in general not the correct way to estimate
the expected value (the mean of the distribution). What you print is the
median (which doesn't generally converge to the mean of a distribution).

However, the approach is probably flawed anyway since when I use the
mean instead of the median in your program I get values of around 3085.
The problem might be the following. Imagine you are at person 3063 and
1531 have a non-unique birthday. There are still some dates left with no
birthday at all. Now person 3064 has their birthday on such a date (or
on a date on which already two or more people have their birthdays)
leaving the number of non-uniqueness as it was. So you have to go on and
add another person. On the other hand if the birthday is on an exactly
ones taken date, you get you 50% and stop. So this distribution is right
tailed explaining the bigger mean compared to the median. In your
approach you ask for the sample size until 50% but what you should ask
is the distribution of unique and non-unique birthdays for each number
of people (as Robert has done).

Öö Tiib

unread,
Feb 13, 2019, 5:43:00 AM2/13/19
to
Closeness of 3058 and 3064 indicates that sample size is too little.
The rand()%365 is likely "loaded" dice, since RAND_MAX being
multiple of 365 is possible but quite unlikely.

Robert Wessel

unread,
Feb 13, 2019, 11:27:13 AM2/13/19
to
On Wed, 13 Feb 2019 02:42:47 -0800 (PST), 嘱 Tiib <oot...@hot.ee>
wrote:

>On Wednesday, 13 February 2019 03:04:45 UTC+2, robert...@yahoo.com wrote:
>
>>
(...)
>> N=3056, Unique=12677/25000, 0.507120
>> N=3057, Unique=12587/25000, 0.503520
>> N=3058, Unique=12515/25000, 0.500640
>> N=3059, Unique=12581/25000, 0.503280
>> N=3060, Unique=12695/25000, 0.507840
>> N=3061, Unique=12576/25000, 0.503080
>> N=3062, Unique=12525/25000, 0.501040
>> N=3063, Unique=12635/25000, 0.505440
>> N=3064, Unique=12512/25000, 0.500520
>> N=3065, Unique=12316/25000, 0.492680
>> N=3066, Unique=12464/25000, 0.498600
>> N=3067, Unique=12377/25000, 0.495120
(...)
>
>Closeness of 3058 and 3064 indicates that sample size is too little.
>The rand()%365 is likely "loaded" dice, since RAND_MAX being
>multiple of 365 is possible but quite unlikely.


I did say it needs a better PRNG. The standard generator is crap as
we all know. And RAND_MAX on the system (Windows) is 32767, so at the
very least after the mod 365, the first 282 values have a bias.

At least removing the bias is straight-forward, should someone care to
make the effort (discard any values returned from rand() that exceed
(89*385-1) - the "89" being the floor(RAND_MAX/365) for this system).

Paul

unread,
Feb 14, 2019, 4:14:14 PM2/14/19
to
On Wednesday, February 13, 2019 at 1:04:45 AM UTC, robert...@yahoo.com wrote:
I don't understand the logic of unique + 1.0.
For example, 11795/25000 is different to 0.47184 which is 11796/25000
My guess is that this is a typo and that you meant to say unique + 0.0
Is this correct? I understand that you are casting to double here.

Paul

Robert Wessel

unread,
Feb 14, 2019, 6:29:45 PM2/14/19
to
On Thu, 14 Feb 2019 13:14:05 -0800 (PST), Paul <peps...@gmail.com>
(...)
>
>I don't understand the logic of unique + 1.0.
>For example, 11795/25000 is different to 0.47184 which is 11796/25000
>My guess is that this is a typo and that you meant to say unique + 0.0
>Is this correct? I understand that you are casting to double here.


Yes, a typo. I was actually intending to multiply by 1.0 to get the
conversion.

Ralf Fassel

unread,
Feb 15, 2019, 5:13:16 AM2/15/19
to
* Robert Wessel <robert...@yahoo.com>
| >My guess is that this is a typo and that you meant to say unique + 0.0
| >Is this correct? I understand that you are casting to double here.
>
| Yes, a typo. I was actually intending to multiply by 1.0 to get the
| conversion.

...wouldn't double(unique) do the trick?

R'

Juha Nieminen

unread,
Feb 15, 2019, 5:15:57 AM2/15/19
to
Paul <peps...@gmail.com> wrote:
> I don't understand the logic of unique + 1.0.
> For example, 11795/25000 is different to 0.47184 which is 11796/25000
> My guess is that this is a typo and that you meant to say unique + 0.0
> Is this correct? I understand that you are casting to double here.

You needed to quote 200 lines of the original post to write this?

Editing the quote is allowed, you know?

--- news://freenews.netfront.net/ - complaints: ne...@netfront.net ---

Robert Wessel

unread,
Feb 15, 2019, 10:27:07 AM2/15/19
to
On Fri, 15 Feb 2019 11:13:03 +0100, Ralf Fassel <ral...@gmx.de>
wrote:
It would, although my example happened to be C, not C++. A C-style
cast would have worked, but I try to avoid those when not necessary.

james...@alumni.caltech.edu

unread,
Feb 15, 2019, 10:56:29 AM2/15/19
to
On Friday, February 15, 2019 at 10:27:07 AM UTC-5, robert...@yahoo.com wrote:
> On Fri, 15 Feb 2019 11:13:03 +0100, Ralf Fassel <ral...@gmx.de>
> wrote:
>
> >* Robert Wessel <robert...@yahoo.com>
...
> >| Yes, a typo. I was actually intending to multiply by 1.0 to get the
> >| conversion.
> >
> >...wouldn't double(unique) do the trick?
>
>
> It would, although my example happened to be C, not C++. A C-style
> cast would have worked, but I try to avoid those when not necessary.

Avoiding a cast when the conversion will happen automatically is a very
good idea - the conversions that occur implicitly are the safest ones.
Changing the code by multiplying or dividing by 1, or by adding or
subtracting 0, for the sole purpose of causing a conversion without a
cast, makes your code more confusing. If you want to cause a conversion,
use the construct whose sole purpose is to cause a conversion. The basic
arithmetic operations are not such constructs.

Geoff

unread,
Feb 15, 2019, 1:41:44 PM2/15/19
to
On Fri, 15 Feb 2019 10:15:47 +0000 (UTC), Juha Nieminen
<nos...@thanks.invalid> wrote:

>You needed to quote 200 lines of the original post to write this?
>
>Editing the quote is allowed, you know?

He used Goggle Groups. You expected better?

james...@alumni.caltech.edu

unread,
Feb 15, 2019, 1:56:04 PM2/15/19
to
I'm often use Google Groups, and I routinely edit quoted material to
trim out anything not needed to provide adequate context for my
response. So, while I don't expect better, it's certainly feasible to do
better.

Tim Rentsch

unread,
Mar 1, 2019, 8:57:22 AM3/1/19
to
Paul <peps...@gmail.com> writes:

> The strong birthday problem asks for the minimum positive number of
> people so that the probability that no person in the set has a unique
> birthday >= 50%. The literature gives this number as 3064. Obvious
> assumptions: By "unique birthday", I mean the month/day. Birthdays
> assumed to be uniformly distributed throughout the year. Assume no
> one born in a leap year. I've done many simulations and my answers
> (output to the console) are consistently in the 3019-3021 range. I
> get 3019 with C++'s std::rand() and also with the later C++11 random
> number generators. I'd be very grateful to understand what I'm doing
> wrong? As a check, I used the same approach for the standard
> birthday problem, and I do indeed get the textbook answer of 23 for
> that one.

[code snipped]

An interesting problem.

Your approach uses a Monte Carlo technique. This isn't
necessary. The problem can be solved nicely using a dynamic
programming approach. To start off:

With zero people in the room, the probability of zero
shared birthdays and zero unshared birthdays is 100%.

With one person in the room, the probability of zero
shared birthdays and one unshared birthday is 100%

With two people in the room, the probabilty of one
shared birthday and no unshared birthdays is 1/365,
and the probability of no shared birthdays and two
unshared birthdays is 364/365.

... and so forth ...

The idea is to add one more person at each step, calculating the
probabilities of all the possible configurations for that number
of people, having calculated the probabilities for all possible
configurations for smaller numbers of people previously. (The
step for N+1 people needs only the set of probabilities for
configurations involving N people.)

There is a separate probability for each pair of numbers (s,u),
where s is the number of shared birthdays, and u is the number
of unshared birthdays. Clearly s+u <= 365 (although coding it
I used an array dimensioned [366][366], not trying to be too
clever about saving space).

Give it a try! After you get that working, you might think
about extending the program to account for leap-days, which
yields some (to me) unexpected results.
0 new messages