I wondered how I would calculate the distribution, i.e. the percentage
chance of a given number of streaks of a certain length in a trade sample.
For example: a 50% chance of 6 streaks of 10 consecutive losing trades in
a 1000-trade sample with a win rate of 50%, or a 30% chance of 10 streaks
of 10 consecutive losing trades in a 1000-trade sample with a win rate of …
The page only deals with the chance of one streak in a sample, which I
calculate below; however, I'm interested in the distribution of streaks
in a trade sample.
I calculated that in a 1000-trade sample with a 50% win rate, the chance
of at least one streak of 10 losing trades is 48.4% (with a 50% win rate,
the same figure applies to streaks of 10 winning trades). The chance of at
least one streak of 15 trades is about 1.5%.
The formula is:

probability of at least one streak in the sample = (1 + (n - r)*p) * q^r

where n is the sample size, r is the number of losses in a row, p is the
winning probability and q is the losing probability.
(1 + (1000 - 10)*0.5) * 0.5^10 = 0.484, i.e. a 48.4% chance of at least
one streak of 10 trades in a sample of 1000 trades. Of course, as Taleb
cautions, there could be more than one streak in that sample.
I hope I read the formulas correctly, as I'm NOT a mathematician.
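The arithmetic above can be checked with a few lines of Python. This is a minimal sketch of the quoted approximation only; the function name is made up for illustration, and it assumes independent trades with a constant win probability p (so q = 1 - p).

```python
def approx_at_least_one_streak(n: int, r: int, p: float) -> float:
    """Approximate P(at least one run of r consecutive losses in n trades).

    Uses (1 + (n - r) * p) * q**r, which is only accurate when q**r is small.
    Assumes independent trades with a constant win probability p.
    """
    q = 1.0 - p
    return (1 + (n - r) * p) * q ** r


print(approx_at_least_one_streak(1000, 10, 0.5))  # 0.484375
print(approx_at_least_one_streak(1000, 15, 0.5))  # 0.0150604248046875
```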
The formula you quote is an approximation which, as they say, is good
if q^r is small. In your r = 10 example, q^r is apparently not small
enough. I make the actual probability equal to 38.54% (to two d.p.),
which is nowhere near the 48.44% that the formula gives. As expected, the
approximation gets better for smaller q and/or larger r: for r = 15 the
formula gives 1.5060% (to 4 d.p.) as against the actual 1.4951%.
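For readers who want exact values, one standard approach (not necessarily the procedure used in this thread) is a short iteration that tracks, trade by trade, the probability of each possible length of the current losing run. A minimal Python sketch, assuming independent trades that are each lost with probability q; the function name is hypothetical. Under those assumptions its results should agree with the 38.54% (r = 10) and 1.4951% (r = 15) quoted above.

```python
def exact_at_least_one_streak(n: int, r: int, q: float) -> float:
    """Exact P(at least one run of >= r consecutive losses in n trades).

    Assumes independent trades, each lost with probability q, and r >= 1.
    """
    p = 1.0 - q
    # state[j] = P(no run of r losses so far, and the last j trades were losses)
    state = [0.0] * r
    state[0] = 1.0
    hit = 0.0  # probability that a run of r losses has already occurred
    for _ in range(n):
        new = [0.0] * r
        new[0] = sum(state) * p        # a win resets the losing run to 0
        for j in range(r - 1):
            new[j + 1] = state[j] * q  # a loss extends a run still shorter than r - 1
        hit += state[r - 1] * q        # the run reaches length r for the first time
        state = new
    return hit


print(exact_at_least_one_streak(1000, 10, 0.5))
print(exact_at_least_one_streak(1000, 15, 0.5))
```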
The approximation formula looks to have an obvious extension to more
than one run, but I'd need to check my logic and just now I've run out
of time - so I'll post back later...
Well, the "obvious" generalisation that I mentioned seems to come out as:

Prob = (C(n - k*r, k - 1) + p*C(n - k*r, k)) * p^(k - 1) * q^(k*r)

where C(a, b) = a!/(b!*(a - b)!) is a binomial coefficient, k is the
number of runs, and n, r, p, q are as above. To be clear, this is
supposed to be the probability that *at least* k distinct runs, each of
*at least* r consecutive losses, will occur.
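A direct transcription of the generalised approximation into Python may help with checking numbers; the function name is hypothetical, and `math.comb` needs Python 3.8 or later. With k = 1 it reduces to the single-streak formula (1 + (n - r)*p) * q^r, and for n = 1000, p = q = 0.5, r = 15 it should reproduce the formula values in the table of tests (0.015060424805 for k = 1, and so on).

```python
from math import comb


def approx_at_least_k_runs(n: int, k: int, r: int, p: float) -> float:
    """Approximate P(at least k distinct runs, each of >= r consecutive losses).

    Implements (C(n - k*r, k - 1) + p*C(n - k*r, k)) * p**(k - 1) * q**(k*r).
    Requires n >= k*r, since math.comb rejects negative arguments.
    """
    q = 1.0 - p
    m = n - k * r
    return (comb(m, k - 1) + p * comb(m, k)) * p ** (k - 1) * q ** (k * r)


for k in range(1, 5):
    print(k, approx_at_least_k_runs(1000, k, 15, 0.5))
```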
As with the original formula, the approximation seems very poor with
r = 10 (and the other values as stated above). I also did some tests with
n = 1000, q = 0.5, p = 0.5, r = 15 and varying k (view in a fixed-width
font):

k   Actual prob.      Approx. prob. from formula
-   --------------    --------------------------
1   0.014951066441    0.015060424805
2   0.000108844540    0.000109873945
3   0.000000512070    0.000000517344
4   0.000000001749    0.000000001767
(BTW, the method that I'm using to calculate the exact value is not a
simple formula that you could calculate by hand - else I'd post it!
It's an iterative procedure that you need to run on a computer.)
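To address the original question (the whole distribution of the number of streaks, not just the chance of at least one), an iteration of this kind can be extended to count completed runs as well. This is an illustrative sketch under stated assumptions, not the procedure referred to above: it assumes independent trades each lost with probability q, and counts maximal runs of at least r consecutive losses (so runs are separated by at least one win), which matches the "distinct runs" definition used for the generalised formula. The function name is hypothetical.

```python
from typing import List


def run_count_distribution(n: int, r: int, q: float) -> List[float]:
    """Return dist with dist[k] = P(exactly k runs of >= r consecutive losses).

    Assumes n independent trades, each lost with probability q, and r >= 1.
    A run is a maximal block of consecutive losses, so 2*r losses in a row
    still count as a single run.
    """
    p = 1.0 - q
    # k runs need k*r losses plus k - 1 separating wins, so k <= (n + 1) // (r + 1)
    max_runs = (n + 1) // (r + 1)
    # prob[c][j]: c runs counted so far, current losing streak j (capped at r)
    prob = [[0.0] * (r + 1) for _ in range(max_runs + 1)]
    prob[0][0] = 1.0
    for _ in range(n):
        new = [[0.0] * (r + 1) for _ in range(max_runs + 1)]
        for c in range(max_runs + 1):
            for j, mass in enumerate(prob[c]):
                if mass == 0.0:
                    continue  # zero-probability state; also keeps c + 1 in range
                new[c][0] += mass * p          # a win ends any losing streak
                if j < r - 1:
                    new[c][j + 1] += mass * q  # streak grows, still shorter than r
                elif j == r - 1:
                    new[c + 1][r] += mass * q  # streak reaches r: count a new run
                else:
                    new[c][r] += mass * q      # run already counted, keeps going
        prob = new
    return [sum(row) for row in prob]


dist = run_count_distribution(1000, 15, 0.5)
for k in range(1, 5):
    print(k, sum(dist[k:]))  # P(at least k runs of >= 15 losses)
```

Here dist[k] is the probability of exactly k runs, so sum(dist[k:]) is the probability of at least k runs; under the stated assumptions those sums should reproduce the exact values in the table (0.014951066441 for k = 1, and so on). Calling it with n = 1000, r = 10, q = 0.5 gives the full distribution asked about at the start of the thread. There are only about n states, so the loop does on the order of n^2 updates, which is manageable for n = 1000 in plain Python.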