# calculating the distribution, the percentage chance for a number of streaks of a certain length in a sample

### game...@yahoo.co.uk

Dec 9, 2006, 7:04:02 AM
Hi,
winning/losing streaks, as I am a trader.

I wondered how I would calculate the distribution, i.e. the percentage
chance of a given number of streaks of a certain length in a trade sample,
e.g. a 50% chance of 6 streaks of 10 consecutive losing trades in a
1000-trade sample with a win rate of 50%, or a 30% chance of 10 streaks
of 10 consecutive losing trades in a 1000-trade sample with a win rate
of 50%.

The page only deals with the chance of one streak in a sample, which I
calculated below; however, I'm interested in the distribution of streaks.

I calculated that in a 1000-trade sample with a 50% win rate, the chance
of at least one streak of 10 losing trades is 48.4% (with p = q = 0.5 the
same figure applies to winning streaks). The chance of at least one
streak of 15 trades is about 1.5%.

The formula is:
probability of at least one streak in the sample = (1 + (n - r)*p) * q^r

where n is the sample size, r is the number of losses in a row, p is the
winning probability and q = 1 - p is the losing probability.

So
(1 + (1000 - 10)*0.5) * 0.5^10 = 496/1024 ≈ 0.484, i.e. a 48.4% chance of
at least one streak. Taleb cautions that there could be more than one
streak in that sample.
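A quick way to check this arithmetic is to code the approximation directly. This is a minimal Python sketch of the formula as quoted above (independent trades, constant win probability p); the function name `approx_one_streak` is just an illustrative choice.

```python
def approx_one_streak(n: int, r: int, p: float) -> float:
    """Approximate P(at least one run of r losses in n trades).

    Implements the quoted approximation (1 + (n - r)*p) * q^r,
    where p is the win probability and q = 1 - p the loss probability.
    """
    q = 1.0 - p
    return (1 + (n - r) * p) * q ** r


for r in (10, 15):
    # r = 10 gives 496/1024 = 48.4375%; r = 15 gives about 1.5060%
    print(f"r = {r}: {approx_one_streak(1000, r, 0.5):.4%}")
```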

I hope I read the formulas correctly, as I'm NOT a mathematician.

regards
ant

### game...@yahoo.co.uk

Dec 9, 2006, 7:15:54 AM
Oh, I realise that market returns aren't normally distributed.

### game...@yahoo.co.uk

Dec 11, 2006, 7:02:00 PM
Oh, I realise that market returns aren't normally distributed; I don't
mind if I get the formula for the normal distribution, though.
regards

### matt271...@yahoo.co.uk

Dec 12, 2006, 10:05:42 AM

The formula you quote is an approximation which, as they say, is good
if q^r is small. In your r = 10 example, q^r is apparently not small
enough. I make the actual probability equal to 38.54% (to two d.p.),
which is nowhere near the 48.44% that the formula gives. As expected,
for smaller q and/or larger r it gets better. For r = 15 the formula
gives 1.5060% (to 4 d.p.) as against the actual 1.4951%.
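Matt doesn't say how he gets the exact figure. One standard way (not necessarily his) is to track, trade by trade, the probability of each possible length of the current losing run, treating "a run of r has occurred" as an absorbing state. Below is a minimal Python sketch under the same assumptions as the formula (independent trades, each lost with the same probability q); the function name is hypothetical.

```python
def exact_at_least_one_run(n: int, r: int, q: float) -> float:
    """Exact P(at least one run of >= r consecutive losses in n trades).

    Assumes independent trades, each lost with probability q.
    state[j] = P(no run of length r yet AND the current losing run has
    length j), for j = 0 .. r-1.
    """
    p = 1.0 - q
    state = [0.0] * r
    state[0] = 1.0
    hit = 0.0  # probability that a run of r losses has already occurred
    for _ in range(n):
        new = [0.0] * r
        new[0] = p * sum(state)          # a win resets the losing run to 0
        for j in range(r - 1):
            new[j + 1] = q * state[j]    # a loss extends the run by one
        hit += q * state[r - 1]          # a loss that completes a run of r
        state = new
    return hit


for r in (10, 15):
    print(f"r = {r}: {exact_at_least_one_run(1000, r, 0.5):.4%}")
```

The cost is about n*r operations, so n = 1000 is instant; the two printed values can be compared with the 38.54% and 1.4951% quoted above.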

The approximation formula looks to have an obvious extension to more
than one run, but I'd need to check my logic and just now I've run out
of time - so I'll post back later...

### matt271...@yahoo.co.uk

Dec 12, 2006, 6:41:21 PM

Well, the "obvious" generalisation that I mentioned seems to come out
as

Prob = (C(n - k*r, k - 1) + p*C(n - k*r, k))*p^(k - 1)*q^(k*r)

where C(a, b) = a!/(b!*(a - b)!) is a binomial coefficient, k is the
number of runs, and n, r, p, q are as above. To be clear, this is
supposed to be the probability that *at least* k distinct runs, each of
*at least* r consecutive losses, will occur.
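The generalised approximation is easy to evaluate with exact binomial coefficients. This is a minimal Python sketch of the formula exactly as written above (Python 3.8+ for `math.comb`); the function name `approx_k_runs` is illustrative.

```python
from math import comb


def approx_k_runs(n: int, r: int, k: int, p: float) -> float:
    """Approximate P(at least k distinct runs, each of >= r losses).

    (C(n - k*r, k - 1) + p*C(n - k*r, k)) * p^(k - 1) * q^(k*r), q = 1 - p.
    """
    q = 1.0 - p
    m = n - k * r
    return (comb(m, k - 1) + p * comb(m, k)) * p ** (k - 1) * q ** (k * r)


for k in range(1, 5):
    print(k, f"{approx_k_runs(1000, 15, k, 0.5):.12f}")
```

For k = 1 this reduces to the single-run formula (1 + (n - r)*p) * q^r, e.g. 0.015060424805 for n = 1000, r = 15, p = 0.5.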

As with the original formula, the approximation seems very poor with r =
10 (and the other values as stated above). I also did some tests with n
= 1000, q = 0.5, p = 0.5, r = 15 and varying k (view in a fixed-width
font):

k   Approx prob. from formula   Actual prob.
-   -------------------------   --------------
1   0.015060424805              0.014951066441
2   0.000109873945              0.000108844540
3   0.000000517344              0.000000512070
4   0.000000001767              0.000000001749

(BTW, the method that I'm using to calculate the exact value is not a
simple formula that you could calculate by hand - else I'd post it!
It's an iterative procedure that you need to run on a computer.)
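For anyone who wants to try it, here is one possible iterative procedure (a sketch, not necessarily Matt's method) that gives the whole distribution ant asked about: the probability of exactly c separate losing runs of at least r trades. It tracks the current losing-run length, capped at r because a run is counted once when it reaches r, together with the number of runs counted so far. It assumes independent trades with a constant loss probability q; the function name and the `max_runs` cap are illustrative choices.

```python
def run_count_distribution(n: int, r: int, q: float, max_runs: int) -> list[float]:
    """dist[c] = P(exactly c separate runs of >= r losses) for c < max_runs;
    dist[max_runs] = P(at least max_runs such runs).

    Assumes independent trades, each lost with probability q.
    """
    p = 1.0 - q
    # prob[j][c]: current losing run has length j (j == r means ">= r, already
    # counted") and c runs have been counted so far (c capped at max_runs).
    prob = [[0.0] * (max_runs + 1) for _ in range(r + 1)]
    prob[0][0] = 1.0
    for _ in range(n):
        new = [[0.0] * (max_runs + 1) for _ in range(r + 1)]
        for j in range(r + 1):
            for c in range(max_runs + 1):
                x = prob[j][c]
                if x == 0.0:
                    continue
                new[0][c] += p * x  # a win ends any losing run
                if j < r - 1:
                    new[j + 1][c] += q * x  # loss: run grows, not yet long enough
                elif j == r - 1:
                    new[r][min(c + 1, max_runs)] += q * x  # run reaches r: count it once
                else:
                    new[r][c] += q * x  # loss: this run was already counted
        prob = new
    return [sum(prob[j][c] for j in range(r + 1)) for c in range(max_runs + 1)]


dist = run_count_distribution(1000, 15, 0.5, max_runs=4)
for k in range(1, 5):
    print(k, f"{sum(dist[k:]):.12f}")  # P(at least k runs)
```

The tail sums `sum(dist[k:])` give "at least k runs", which can be compared with the exact figures in the table; the individual entries `dist[c]` give the "exactly c runs" distribution.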