
Stats problem


Break

May 17, 1999, 3:00:00 AM
A while ago I posted a little puzzle under the subject "?????". It
was the first time I've ever posted something to rec.puzzles and not
gotten at least some response. I then attempted the puzzle myself
(I originally posted it with the intention of someone else finding
it and solving it, since I thought the puzzle was too tough for me
to attempt).

It's quite interesting, but it involves a summation that (although I
suspect it is laughably simple to simplify) I cannot seem to see my
way through.

Here's the summation (it's an expected value):

(Sum x=2 to N+1) x * ((x-1)/N) * ( (Prod i=0 to x-2) ((N-i)/N) )

First term is
2*(1/N)*(N/N)

Last Term is
(N+1)*(N/N)*(N!/N^N)

Any ideas on simplifying?
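For what it's worth, the sum can at least be evaluated exactly for small N. A minimal sketch in exact rational arithmetic (the function name `expected_stop` is mine):

```python
from fractions import Fraction

def expected_stop(N):
    """Evaluate the sum above exactly:
    E = sum_{x=2}^{N+1} x * ((x-1)/N) * prod_{i=0}^{x-2} ((N-i)/N)."""
    total = Fraction(0)
    for x in range(2, N + 2):
        prod = Fraction(1)
        for i in range(x - 1):              # i = 0 .. x-2
            prod *= Fraction(N - i, N)
        total += x * Fraction(x - 1, N) * prod
    return total

print(expected_stop(2))           # 5/2
print(float(expected_stop(10)))   # roughly 4.66
```

No closed form drops out, but exact values make it easy to spot-check any conjectured simplification.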

In case you missed it, the original problem was to estimate the total
number of pictures in a set if a picture was randomly selected, noted,
then replaced. The procedure would repeat itself until the picture
selected had already been selected.

e.g. number the pictures drawn in ascending order.

A typical sample might be :

1 2 3 4 2

So we would stop after the 5th selection as 2 had already been selected.

Based on the information gathered in this trial (i.e. the number 5),
it is required to estimate the total number of pictures in the set
(call it N).

You may assume each picture is equally likely to be drawn (i.e. the
probability of being drawn is 1/N) and each selection is independent
of the others.
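The procedure described here is easy to simulate. A rough sketch (the function names and the choice N = 10 are mine):

```python
import random

def draws_until_repeat(N, rng=random):
    """Draw pictures 0..N-1 uniformly with replacement until one repeats;
    return the number of the draw on which the first repeat occurs."""
    seen = set()
    while True:
        pick = rng.randrange(N)
        if pick in seen:
            return len(seen) + 1   # distinct draws so far, plus this repeat
        seen.add(pick)

def estimate_mean(N, trials=100_000):
    """Monte Carlo estimate of the expected stopping draw for a set of size N."""
    return sum(draws_until_repeat(N) for _ in range(trials)) / trials

print(estimate_mean(10))   # roughly 4.66 for N = 10
```

Such a simulation gives a quick sanity check on any candidate formula for the expected stopping draw.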


Matt McLelland

May 17, 1999, 3:00:00 AM
Break wrote:

> Based on the information gathered in this trial (ie the number 5), it is
> required to
> estimate the total number of pictures in the set (Call it N).

Hmm. This seems to clash with your description up to this point. At first,
it seemed that you were interested in finding the expected number of pulls
before a repetition [given N, the deck size]. Now you seem to say that you
are interested in estimating N from a single trial. The first of these is
well defined. The second is underspecified: you would need to identify what
you mean by estimate [maximum likelihood estimate? confidence interval?].
You may also need to give an a priori probability distribution over possible
deck sizes.

bu...@pac2.berkeley.edu

May 18, 1999, 3:00:00 AM
In article <37416AB6...@flash.net>,
Matt McLelland <mat...@flash.net> wrote:

> (sum x=0 to N) N!/(N-x)! * 1/N^x

> I let V(M) be the expected number of draws before repeat, in a deck of
> N, if there are only M pictures you haven't seen. Then V(M) =
> M/N*(V(M-1) + 1), with V(0)=0. It is pretty easy to see that the
> summation above is V(N) + 1 (the extra draw being the repeat itself).
> But, I can't solve this summation either, so good luck!

I worked out the same quantity (expected value of M given N) a
different way. I got an expression that looks different but is
actually equivalent. I don't think it's any easier to evaluate in
closed form, but here it is anyway in case anyone's curious.

The probability that the first repeat will occur on the (j+1)st card
(i.e., that M = j+1) is

P( M=j+1 | N ) = (j N! / (N-j)!) / N^(j+1).

[There are N!/(N-j)! ways to pick the first j cards with no repeats,
and j ways to pick the (j+1)st card so it will be a repeat.] The
expected (i.e., mean) value of M is just the sum of

(j+1) P( M=j+1 | N )

over all possible values of j:

E[M] = Sum[ j(j+1) N! / ((N-j)! N^(j+1)) ]
     = (N-1)! Sum[ j(j+1) / ((N-j)! N^j) ]

The sums range from j=0 to N.

Matt's expression certainly looks easier to work with than mine, but
spot-checking with Mathematica reveals that they're equivalent.
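Ted's spot-check can be reproduced without Mathematica; here is a sketch in exact rational arithmetic (function names are mine):

```python
from fractions import Fraction
from math import factorial

def matt_sum(N):
    """Matt's expression: sum_{x=0}^{N} N!/(N-x)! * 1/N^x."""
    return sum(Fraction(factorial(N), factorial(N - x) * N**x)
               for x in range(N + 1))

def ted_sum(N):
    """Ted's expression: sum_{j=0}^{N} j(j+1) N! / ((N-j)! N^(j+1))."""
    return sum(Fraction(j * (j + 1) * factorial(N), factorial(N - j) * N**(j + 1))
               for j in range(N + 1))

for N in range(1, 13):
    assert matt_sum(N) == ted_sum(N)   # exact agreement, not just numerical
```

Both equal E[M]. Incidentally, the classical birthday-problem asymptotic gives E[M] approximately sqrt(pi*N/2) + 2/3 for large N, which suggests why no elementary closed form turns up.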

Can I just betray my ignorance by asking what "method of moments"
means in this context? Does it just mean that you estimate N,
given M, by picking the value of N such that the expected value of M
is equal to the observed value? In other words, you set the above
expression for E[M] equal to the observed M and solve for N?

(I'd call this the "method of moment," not "moments" -- the only
moment involved is the first one!)

-Ted

Break

May 18, 1999, 3:00:00 AM

Matt McLelland wrote:

The possible deck size is unknown but always greater than 4. There is NO
a priori distribution of deck size. A point estimate will do nicely - I did
not specify what method may be used, as I will accept any unbiased estimate.
I have not looked at maximum likelihood, as I don't think it will lead to a
closed form. I originally wanted the expected value so that I could use the
method of moments estimator.

Hope this clears things up.


Matt McLelland

May 18, 1999, 3:00:00 AM
Break wrote:

> The possible deck size is unknown but always greater than 4. There is NO
> a priori distribution of deck size. A point estimate will do nicely - I
> did not specify what method may be used, as I will accept any unbiased
> estimate. I have not looked at maximum likelihood, as I don't think it
> will lead to a closed form. I originally wanted the expected value so
> that I could use the method of moments estimator.
>
> Hope this clears things up.

Clear as mud. =) Does the following sound like a fair characterization of the problem
you are attempting to solve?

Given a random deck (of unknown size), you pull cards until you find a repeat on
your Mth pull. Estimate the deck size, N, in terms of M.

If so, then what is this expected value stuff? You can't compute an expected value
without putting a probability distribution on your deck size. Saying that it is a
"random integer" isn't well defined.


Break

May 18, 1999, 3:00:00 AM

Matt McLelland wrote:

Uh huh. . .

Well - it's almost a fair characterization - except that sampling is with
replacement. Thus the "population" you are drawing from is always constant -
being N.

What I wanted was the expected value of the distribution in terms of this
constant N (since this is the only parameter in the distribution), and then
to equate this to the observed M, thus finding a method of moments estimate
of N.

Now if that didn't clear it up I'm afraid I'm alone on this. . .


Matt McLelland

May 18, 1999, 3:00:00 AM
Break wrote:

> Well - it's almost a fair characterization - except that sampling is with
> replacement. Thus the "population" you are drawing from is always
> constant - being N.
>
> What I wanted was the expected value of the distribution in terms of this
> constant N (since this is the only parameter in the distribution), and
> then to equate this to the observed M, thus finding a method of moments
> estimate of N.
>
> Now if that didn't clear it up I'm afraid I'm alone on this. . .

Ok, I'm with you now... when you said method of moments last time, it just didn't click for
some reason. I can't solve your summation. Here is the summation I came up with:

(sum x=0 to N) N!/(N-x)! * 1/N^x

I let V(M) be the expected number of draws before the repeat, in a deck of N, if there are only
M pictures you haven't seen. Then V(M) = M/N*(V(M-1) + 1), with V(0)=0. It is pretty easy to
see that the summation above is V(N) + 1 (the extra draw being the repeat itself). But, I can't
solve this summation either, so good luck!

[You might have better luck in sci.math]
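The recursion is easy to check numerically against the sum (names are mine; exact rational arithmetic). In small cases the sum works out to V(N) + 1, consistent with V counting only the draws before the repeat:

```python
from fractions import Fraction
from math import factorial

def V(M, N):
    """Matt's recursion V(M) = M/N * (V(M-1) + 1), with V(0) = 0:
    expected draws before the repeat, with M pictures still unseen."""
    v = Fraction(0)
    for m in range(1, M + 1):
        v = Fraction(m, N) * (v + 1)
    return v

def direct_sum(N):
    """sum_{x=0}^{N} N!/(N-x)! * 1/N^x."""
    return sum(Fraction(factorial(N), factorial(N - x) * N**x)
               for x in range(N + 1))

for N in range(1, 12):
    assert direct_sum(N) == V(N, N) + 1   # the +1 is the repeated draw
```

For example, with N = 2 the recursion gives V(2) = 3/2 while the sum gives 5/2.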


bu...@pac2.berkeley.edu

May 18, 1999, 3:00:00 AM
In article <3741540B...@memo2.liberty.co.za>,
Break <shuaib...@memo2.liberty.co.za> wrote:

Here's my attempt to summarize: So you have a deck of N cards. You
pick cards, with replacement, one at a time until you get a repeat.
Say the repeat occurs on the Mth card.

> Well - its almost a fair characterization - except that sampling is
> with replacement. Thus the "population" you are drawing from is always
> constant - being N. What I wanted was the expected value of the
> distribution in terms of this constant N (Since this is the only
> parameter in the distribution). And then equate this to the observed
> M. Thus finding a method of moments estimate of N.

[Lines reformatted to < 80 characters.]

Given N, you can certainly calculate the probability distribution for
M. The last line above is what confuses me, though. I suspect it's
what's confusing Matt McLelland too, but I don't know for sure so I'll
just speak for myself.

I don't know what sort of estimate of N you're talking about. Given
P(M | N), the probability distribution of M given N, you can certainly
come up with a maximum-likelihood estimate of N, but that's not what
you say you want. You said you wanted to know the "expected value" of
N. To me, that phrase means the mean value of the probability
distribution P(N | M). The problem is that you can't turn P(M | N)
into P(N | M) without assuming a prior distribution on N.

-Ted

Matt McLelland

May 19, 1999, 3:00:00 AM
bu...@pac2.berkeley.edu wrote:

> Can I just betray my ignorance by asking what "method of moments"
> means in this context? Does it just mean that that you estimate N,
> given M, by picking the value of N such that the expected value of M
> is equal to the observed value? In other words, you set the above
> expression for E[M] equal to the observed M and solve for N?

Yes.

> (I'd call this the "method of moment," not "moments" -- the only
> moment involved is the first one!)

In general, you equate the first several moments as necessary to give a
unique solution. In this case, we only had one parameter, N, and so we only
needed to equate the first moment of the sample. Using the method of moments
to estimate a normal distribution, e.g., you would simply take the mean to
be the sample mean and the variance to be the sample variance.
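A concrete sketch of that normal-distribution example, with synthetic data (the sample size and true parameters are arbitrary choices of mine):

```python
import random

random.seed(0)
# Hypothetical sample from a normal distribution with mean 10 and s.d. 2.
sample = [random.gauss(10.0, 2.0) for _ in range(10_000)]

# Equate the first two population moments, E[X] = mu and
# E[X^2] = mu^2 + sigma^2, to the sample moments and solve.
m1 = sum(sample) / len(sample)
m2 = sum(x * x for x in sample) / len(sample)
mu_hat = m1
var_hat = m2 - m1 * m1   # note: this is the biased (divide-by-n) variance

print(mu_hat, var_hat)   # close to the true values 10 and 4
```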

hoch...@rocketmail.com

May 19, 1999, 3:00:00 AM

> Can I just betray my ignorance by asking what "method of moments"
> means in this context? Does it just mean that that you estimate N,
> given M, by picking the value of N such that the expected value of M
> is equal to the observed value? In other words, you set the above
> expression for E[M] equal to the observed M and solve for N?
>
> (I'd call this the "method of moment," not "moments" -- the only
> moment involved is the first one!)
>
> -Ted
>

Matt McLelland already answered this, but here is a slightly
more detailed explanation. Suppose you are interested in estimating
some parameters a_1, a_2, ..., a_j from data X_1, ..., X_n.
Define the kth population moment u_k = E(X^k), and the kth sample
moment m_k = (X_1^k + X_2^k + ... + X_n^k)/n.

If you can write a_i = f_i(u_1, ..., u_r), you get an estimate
of a_i by substituting m_k for u_k. This procedure does not
necessarily yield a unique estimator (as there may be more
than one way to express the a_i in terms of population moments),
and the resulting estimator is not necessarily any good
(although if the f_i are continuous it will converge to the
right answer when n is large, since the sample moments do).
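To illustrate the non-uniqueness: for a Uniform(0, a) sample, E(X) = a/2 and E(X^2) = a^2/3 give two different method-of-moments estimators of the same parameter a. A sketch with synthetic data (true a = 6 is my choice):

```python
import random
from math import sqrt

random.seed(1)
xs = [random.uniform(0, 6) for _ in range(10_000)]   # hypothetical Uniform(0, 6) data

m1 = sum(xs) / len(xs)                  # first sample moment
m2 = sum(x * x for x in xs) / len(xs)   # second sample moment

a_from_m1 = 2 * m1            # from a = 2 * u_1
a_from_m2 = sqrt(3 * m2)      # from a = sqrt(3 * u_2)
print(a_from_m1, a_from_m2)   # two (generally different) estimates, both near 6
```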

Mike



r.e.s.

May 19, 1999, 3:00:00 AM
Break <shuaib...@memo2.liberty.co.za> wrote ...

> Matt McLelland wrote:
> > Break wrote:
> > > [...]
> > > the original problem was to estimate the total number of pictures
> > > in a set if a picture was randomly selected, noted, then replaced.
> > > The procedure would repeat itself until the picture selected had
> > > already been selected. e.g. number the pictures drawn in ascending
> > > order.
> > > A typical sample might be: 1 2 3 4 2
> > > So we would stop after the 5th selection as 2 had already been
> > > selected.
> > > Based on the information gathered in this trial (ie the number 5),
> > > it is required to estimate the total number of pictures in the set
> > > (Call it N).
> > > You may assume each picture is equally likely to be drawn (ie
> > > probability 1/N) and each selection is independent of any others.
> > [...]

> > you would need to identify what you mean by estimate
> > [maximum likelihood estimate? confidence interval?].
> > You may also need to give an a priori probability
> > distribution of possible deck size.
>
> The possible deck size is unknown but always greater than 4.
> There is NO a priori distribution of deck size. A point
> estimate will do nicely - I did not specify what method may
> be used as I will accept any unbiased estimate. I have not
> looked at maximum likelihood as I don't think it will lead to
> a closed form.
^^^^^^^^^^^
I find the maximum likelihood estimator for N, say N*(k),
to be exactly:

N*(2) = 1
N*(k) = floor((1/2)*(k-1)^2 + (1/6)*(k-1)), k=3,4,5,...

which is essentially quadratic in k.

(The probability that the first "duplicate" appears on the k-th
trial, is

p(k|N) = (1-1/N)*(1-2/N)*...*(1-(k-2)/N)*(k-1)/N, k=2,3,...,N+1

and for fixed k this is maximized by N=N*(k) above. Of course
N* is not supposed to be an "unbiased" estimator, as from the
point of view of the likelihood principle, unbiasedness is an
irrelevant and often-misleading criterion.)
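The closed form is easy to check against a brute-force maximization of p(k|N) (function names and the search cap of 100 are mine):

```python
from fractions import Fraction

def p(k, N):
    """p(k|N) = (1 - 1/N)(1 - 2/N)...(1 - (k-2)/N) * (k-1)/N."""
    prob = Fraction(k - 1, N)
    for i in range(1, k - 1):          # factors (1 - i/N), i = 1 .. k-2
        prob *= Fraction(N - i, N)
    return prob

def mle_bruteforce(k, N_max=100):
    """Argmax over N = 1..N_max of p(k|N)."""
    return max(range(1, N_max + 1), key=lambda N: p(k, N))

def mle_formula(k):
    """r.e.s.'s estimator: N*(2) = 1, N*(k) = floor((k-1)^2/2 + (k-1)/6)."""
    return 1 if k == 2 else (3 * (k - 1) ** 2 + (k - 1)) // 6

for k in range(2, 8):
    assert mle_bruteforce(k) == mle_formula(k)
```

For example, a repeat on the 4th draw gives the maximum-likelihood estimate N* = 5.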

--
r.e.s. (Spam-block=XX)

