What is the average length of the line?
Assume that x is 1, i.e., possible x values range from 0 to 1. If you
can find the average of the difference of two uniformly randomly chosen
values of x, which we'll call a, then the solution to your problem will
be x*y*a^2, where x and y are the actual width and height of your
canvas.
The part I don't know how to do is the determination of a. But we can
make a good stab at it.
The probability that your first chosen x (which we'll call x1) will be
between 0 and 0.1 is 0.1. The probability that it will be between 0.1
and 0.2 is also 0.1, etc. And, of course, your second choice for x (x2)
will be distributed the same way.
We've got two different cases to think of: the case where both x1 and
x2 are chosen in the same interval, and the case where they come from
different intervals. Let's start with the first of these.
We're going to assume that both x1 and x2 come from the interval 0-0.1,
and in order to find the average distance between them, we'll break this
interval into 10 pieces of 0.01 each, and we're going to assume a
discrete distribution--i.e., possible values are only 0.005, 0.015, ...,
0.095.
p(x1=0.005; x2=0.005) = 1/100 --> average distance =0
p(x1=0.005; x2=0.015) = 1/100 --> average distance = 0.01
...
p(x1=0.005; x2=0.095) = 1/100 --> average distance = 0.09
sum(avg dist when x1=0.005) = 0+0.01+0.02+...+0.09 = 0.45
p(x1=0.015; x2=0.005) = 1/100 --> average distance = 0.01
p(x1=0.015; x2=0.015) = 1/100 --> average distance = 0
p(x1=0.015; x2=0.025) = 1/100 --> average distance = 0.01
...
p(x1=0.015; x2=0.095) = 1/100 --> average distance = 0.08
sum(avg dist when x1=0.015) = (1+0+1+2+3+4+5+6+7+8)*0.01 = 0.37
sum(avg dist when x1=0.025) = (2+1+0+1+2+3+4+5+6+7)*0.01 = 0.31
sum(avg dist when x1=0.035) = (3+2+1+0+1+2+3+4+5+6)*0.01 = 0.27
sum(avg dist when x1=0.045) = (4+3+2+1+0+1+2+3+4+5)*0.01 = 0.25
sum(avg dist when x1=0.055) = sum(avg dist when x1=0.045) = 0.25
etc.
Add up all these sums: 2*(0.45+0.37+0.31+0.27+0.25) = 3.3
Multiply by the probability of getting any one of these, which is 1/100,
and we find that the average distance when x1 and x2 are in the same bin
of length 0.1 is 0.033, which probably means the right answer is 1/30.
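The bin arithmetic above can be checked mechanically. The following is a minimal Python sketch (the function name discrete_mean_abs_diff and its parameters are made up for this illustration and are not part of the original post) that averages |x1 - x2| over all pairs of bin midpoints. With 10 midpoints on an interval of length 0.1 it reproduces the 0.033 figure, and refining the grid moves the value toward 1/30.

```python
def discrete_mean_abs_diff(length, n):
    """Mean |x1 - x2| when x1 and x2 are drawn independently and uniformly
    from the n midpoints of n equal sub-bins of an interval of the given length."""
    step = length / n
    mids = [(i + 0.5) * step for i in range(n)]
    total = sum(abs(a - b) for a in mids for b in mids)
    return total / (n * n)  # every ordered pair has probability 1/n^2


if __name__ == "__main__":
    print(discrete_mean_abs_diff(0.1, 10))    # the hand computation: about 0.033
    print(discrete_mean_abs_diff(0.1, 1000))  # finer grid: close to 1/30
```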
When x1 and x2 come from different intervals, we'll get the same as the
above, but with the difference of the intervals added to the average
distance; i.e., the average distance between x1 and x2 when 0<x1<0.1 and
0.1<x2<0.2 will be 0.0333.. + 0.1 = 0.13333....
So now we just need to add all these together to average them:
p(0<x1<0.1; 0<x2<0.1) = 1/100 --> average distance = 0.0333...
p(0<x1<0.1; 0.1<x2<0.2) = 1/100 --> average distance = 0.1333...
p(0<x1<0.1; 0.2<x2<0.3) = 1/100 --> average distance = 0.2333...
...
p(0<x1<0.1; 0.9<x2<1.0) = 1/100 --> average distance = 0.9333...
p(0.1<x1<0.2; 0<x2<0.1) = 1/100 --> average distance = 0.1333...
p(0.1<x1<0.2; 0.1<x2<0.2) = 1/100 --> average distance = 0.0333...
...
p(0.1<x1<0.2; 0.9<x2<1.0) = 1/100 --> average distance = 0.8333...
etc.
Let's ignore the 0.03333... and we'll tack that on at the end. So now
what we're averaging is the same thing we were doing earlier, i.e.:
sum (avg dist when 0<x1<0.1) = 0 + 0.1+0.2+...+0.9 = 4.5
sum(avg dist when 0.1<x1<0.2) = 0.1 + 0 + 0.1 + 0.2 + ... + 0.8 = 3.7
etc.
which means we're going to find an average of 0.3333.... Now either the
answer is that the average distance is 1/3, or the answer is that the
average distance is 1/3+1/30+1/300+... = 10/27.
So your answer is either x*y/9 or 100*x*y/729. Seems to me the first of
these is the more likely. Someone who knows some statistics should be
able to solve this in a couple of lines.
-Doug Magnoli
[Delete the 2 and the 3 from my email address.]
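For reference, the one-dimensional average a that the post leaves open can be obtained directly. The short derivation below is standard calculus rather than anything from the original post; it uses the symmetry of the unit square in (x1, x2) to integrate over the half where x2 < x1. The result, a = 1/3, agrees with the first of the two candidate values reached by the binning estimate.

```latex
a = E|x_1 - x_2|
  = \int_0^1 \int_0^1 |x_1 - x_2| \, dx_2 \, dx_1
  = 2 \int_0^1 \int_0^{x_1} (x_1 - x_2) \, dx_2 \, dx_1
  = 2 \int_0^1 \frac{x_1^2}{2} \, dx_1
  = \frac{1}{3}
```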
The average length of the line will be the vector sum of the average of its
two components: sqrt( x^2 + y^2 )/3
--gary
"Charles R. Bond" <cb...@ix.netcom.com> wrote in message
news:3C163197...@ix.netcom.com...
I think you're right, but could you provide more detail? I recall
proving once that the average stroke in a multitrack disk drive
was 1/3 of the full stroke, and I see now that the problem of
determining the average stroke is the same as the problem of
finding an average coordinate change, when the coordinate is
uniformly distributed over an interval -- interesting.
So the result you have is that the average line length is 1/3 the
diagonal of the canvas...
S_0^1 S_0^1 S_0^1 S_0^1 sqrt((x1-x2)^2 + (y1-y2)^2) dx1 dx2 dy1 dy2
for which the result is
(1/15) * (sqrt(2) + 2 + (5* arcsinh(1)) ) ; about 12/23 or 0.521..
So for the hull on the rectangle, one would need ... assuming I have the
thing correct ... to replace the unit values in the integration with the
width and height.
--
Mark R. Diamond
Send email to server called psy dot uwa dot edu dot au and address to markd
I'm not much good with theory. That's why I'm an experimentalist. :-)
I wrote a quick UBASIC program and ran 30 million simulations 5 times. Each
time the average line length in a one-dimensional unit space converged very
quickly to 0.33333.... (to about 12 or 14 decimal places).
I then assumed that the vector sum would be the average line length. That
assumption looks like it might be wrong.
A further simulation picking two random points in a unit square yielded an
average line length that converged to around 0.521192... after 1.2 million
iterations. I suspected my random number generator, but using a different
one produced very similar results. I'm stumped. As to why it is not the
expected sqrt( 2 ) / 3 = 0.4714... I haven't a clue!!!!! But my sqrt( x^2 +
y^2 )/3 is clearly wrong, even though the X and Y components taken
separately DO have the expected average length. Strange! It must have
something to do with the way the distributions of line lengths overlap each
other so that the probability of one of them being longer than average is
better than 50%, or some such high strangeness as that.
--gary
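Gary's UBASIC program is not shown in the thread, so the following Python sketch is only a stand-in for that kind of experiment, not his code. The function names, trial counts, and seed are arbitrary choices made for this illustration. It estimates the average separation of two uniform points on the unit interval and in the unit square.

```python
import math
import random


def mean_distance_1d(trials, rng):
    """Average |x1 - x2| for two uniform points on the unit interval."""
    return sum(abs(rng.random() - rng.random()) for _ in range(trials)) / trials


def mean_distance_2d(trials, rng):
    """Average Euclidean distance between two uniform points in the unit square."""
    total = 0.0
    for _ in range(trials):
        dx = rng.random() - rng.random()
        dy = rng.random() - rng.random()
        total += math.hypot(dx, dy)
    return total / trials


if __name__ == "__main__":
    rng = random.Random(12345)  # fixed seed, chosen arbitrarily, for repeatability
    print(mean_distance_1d(200_000, rng))  # near 1/3
    print(mean_distance_2d(200_000, rng))  # near 0.5214, not sqrt(2)/3 = 0.4714
```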
I'm at a bit of a loss as to why you would expect your "linearity logic" to
prevail. Ask yourself about the distribution of x^2 + y^2.
The problem as originally stated is a very interesting one.
Clearly, if the rectangle is HUGE by 1 you'll get a distribution close to
HUGE alone.
Best wishes, Jim
(1/15) * (sqrt(2) + 2 + 5*arcsinh(1)) = 0.521405433...
I don't know what I was thinking when I said that the expected value of the x
distance and the expected value of the y distance should be multiplied
together--that's clearly wrong. (What I was thinking, actually, was the same
as Gary's sqrt(xavg^2+yavg^2), where xavg=x/3 and yavg=y/3. Why I wrote what I
did is a mystery to me. This usually indicates it's time to go to
bed...although this time I've only been up about 32 hours.)
I didn't do this analytically, like Mark did, or experimentally, like Gary
did...I did it with a numerical approximation to the analytical approach and got
that xavg=x/3 and yavg=y/3. Like Gary, I don't understand at all why this isn't
right.
I took Mark's expression for the average difference--the quadruple
integral--and had Mathematica evaluate it for a few cases. Here's what I get:
x=1, y=1, avg dist = 0.521405
x=2, y=2, avg dist = 16.685 = 0.521405 * 32
x=3, y=3, avg dist = 126.702 = 0.521405 * 243
So it looks like the integral gives (1/15)*(sqrt(2)+2+5*arcsinh(1))*dim^5 when
length=width=dim. But I have a bit of a problem with this. If the canvas is
2x2, we certainly don't expect the average distance between two points to be
16.685. What I'd expect for the case where the dimensions are w*h is that the
average distance would be 0.521405... * w*h. But that's not what one gets from
these integrals, unless I'm applying them wrong. I integrated x1 and x2 from
0 to w and y1 and y2 from 0 to h:
Int[x1=0,w, Int[x2=0,w, Int[y1=0,h, Int[y2=0,h, sqrt[(x1-x2)^2+(y1-y2)^2]]]]]
Mathematica is reluctant to do this for all four integrals (or maybe I was just
too impatient), but it can evaluate the first three of these analytically, and
I then integrate over y2 numerically.
Now, given the above, note the following:
x=2, y=3, avg dist = 47.4144 = 0.521405 * 90.9358
x=3, y=2, avg dist = 47.4144 = as above
90.9358 =/= (2*3)^(2.5), which is what I was expecting. Instead, 90.9358 =
6^2.51716. So I don't understand at all what's going on here.
x=2, y=4, avg dist = 103.011 = 0.521405 * 197.564
where 197.564 = (2*4)^2.54206.
This all leaves me far more puzzled than I was before I started this post.
Maybe Gary can plug these sizes into his experiment program and find out what
he gets for the average distance when the canvas isn't square? (Or, for that
matter, even when it is.) (I suppose I could write my own, but I think it's
bedtime.)
-Doug Magnoli
The scaling that should work here is that for a canvas that's 3x3, the average
distance between two points should be 3 times as big as the average distance for
the canvas that's 1x1. Seems to me that the average distance for a canvas that's
2x3 should be sqrt(2*3) times as big as the average distance for the 1x1 canvas.
I'm beyond the ability to think at this point, so it's off to bed and this can be
worried about on the morrow.
-Doug Magnoli
I'd think one also has to divide by
S_0^h S_0^h S_0^w S_0^w 1 dx1 dx2 dy1 dy2 = (hw)^2
(the rectangle being h by w).
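Dividing the quadruple integral by (hw)^2 amounts to taking an ordinary sample mean over uniformly chosen pairs of points. A minimal Python sketch of that idea follows, assuming a w-by-h canvas; the function name, trial count, and seed are illustrative choices, not anything given in the thread.

```python
import math
import random


def mean_distance_rect(w, h, trials=200_000, seed=0):
    """Monte Carlo estimate of the normalized integral: the mean distance
    between two points chosen uniformly in a w-by-h rectangle."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        dx = rng.uniform(0, w) - rng.uniform(0, w)
        dy = rng.uniform(0, h) - rng.uniform(0, h)
        total += math.hypot(dx, dy)
    return total / trials


if __name__ == "__main__":
    one = mean_distance_rect(1, 1)
    two = mean_distance_rect(2, 2)
    print(one, two, two / one)  # the ratio should be near 2, not 32
```

With the normalization in place, the 2x2 canvas comes out at about twice the 1x1 value, in units of length.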
The original problem is analytically very difficult. However, a simple guideline
can be derived. The mean square length of the line joining two random points is
simply (x^2+y^2)/6.
The root mean square distance is a first-order approximation to the average
length of the line. For the case where one dimension, say the y dimension, is
much shorter than the other, the mean approaches sqrt(2/3)*RMS, or ~0.816*RMS.
When x and y are comparable, the mean is about 0.9*RMS. The latter point was
determined by simulation.
Earl
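One way to arrive at the mean-square figure and the sqrt(2/3) factor is sketched below. It is offered as a plausible route, not as Earl's own derivation. It assumes x1 and x2 are independent and uniform on [0, x], so that E[x1] = x/2 and E[x1^2] = x^2/3, and likewise for y1 and y2 on [0, y].

```latex
E\left[(x_1 - x_2)^2\right] = E[x_1^2] - 2\,E[x_1]\,E[x_2] + E[x_2^2]
  = \frac{x^2}{3} - 2 \cdot \frac{x}{2} \cdot \frac{x}{2} + \frac{x^2}{3}
  = \frac{x^2}{6}

E[d^2] = E\left[(x_1 - x_2)^2\right] + E\left[(y_1 - y_2)^2\right]
  = \frac{x^2 + y^2}{6},
\qquad \mathrm{RMS} = \sqrt{\frac{x^2 + y^2}{6}}

% thin-strip limit y \to 0: the mean length tends to x/3 and the RMS to x/\sqrt{6}
\frac{x/3}{x/\sqrt{6}} = \frac{\sqrt{6}}{3} = \sqrt{\frac{2}{3}} \approx 0.816
```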
S_0^h S_0^h S_0^w S_0^w sqrt((x1-x2)^2+(y1-y2)^2) dx1 dx2 dy1 dy2
-----------------------------------------------------------------
             S_0^h S_0^h S_0^w S_0^w dx1 dx2 dy1 dy2
(where I'm using a notation for integration that I don't really like, but it's a lot clearer than other newsgroup methods of writing integrals when you're going to stack up four of them).
My results and your results are shown in the table.
h     w     my result     rms(h,w)     factor      factor*rms(h,w)
1     1     0.521405      0.57735      0.9         0.519615
a     a     0.521405*a    0.57735*a    0.9         0.519615*a
2     3     1.31707       1.47196      0.9         1.32476
2     4     1.60954       1.82574      0.9         1.64317
100   1     33.3423       40.8269      sqrt(2/3)   33.335
500   1     166.669       204.125      sqrt(2/3)   166.667
Notice that in every case the two methods give very similar results, especially considering that you said that when w and h are comparable, the mean is _about_ 0.9*rms (a result you said you found empirically). For the cases where h>>w, the two methods are in excellent agreement.
How did you get your expressions for mean square length? And from there, how did you come to the sqrt(2/3)*rms conclusion for the case where one dimension is much larger than the other?
tia,
-Doug Magnoli
Earl Paddon wrote:
[snip]
Int[x1=0,w, Int[x2=0,w, Int[y1=0,h, Int[y2=0,h, dy2] dy1] dx2] dx1] = w^2*h^2,
where w=width and h=height of canvas. If you do that to my calculated cases below
where h=w (height=width), you get:
h=1, w=1, avg dist = 0.521405
h=k, w=k, avg dist = k*0.521405
which makes sense. The units now make sense as well: average distance now comes
out with units of length, rather than length^5.
However, for the cases where w=/=h, I still have a problem:
w=2, h=3, avg dist = 47.4144
= 0.521405 * 90.9358
= 0.521405 * 2^2 * 3^2 * 2.52599
= 2^2 * 3^2 * 1.31707
w=2, h=4, avg dist = 103.011
= 0.521405 * 197.564
= 0.521405 * 2^2 * 4^2 * 3.08694
= 2^2 * 4^2 * 1.60955
I don't recognize any of these numbers...they aren't the square roots of anything
meaningful, for example.
So it looks like the h=w case now makes sense (if you accept that the average
distance between two points on a 1x1 square is 0.521405..., when it seems to me, as
Gary originally indicated, that it ought to be sqrt(2)/3 ~ 0.471405...). And no
matter what that is, it seems to me that it ought to scale linearly with the
dimensions of the canvas, so that whatever the result is for the 1x1 case, the hxw
case should give h*w*(1x1 result). So I still can't make any sense at all of the
w=/=h case.
Look at the case where you have a one inch by one inch region. You get an
expected result. Now consider the case where you have a one inch by one
light year region. Clearly you can't work with different units, so you
really have a 1 by 5,880,000,000,000*5280*12 region (I believe a light year
is about 5,880,000,000,000 miles; my memory could be letting me down).
So now your analysis should reduce to a single random length (for the HUGE
direction). Does this thought help at all?
Best wishes, Jim
"Doug Magnoli" <dmagn...@attbi.com> wrote in message
news:3C178953...@attbi.com...
Following through on this (with a slightly different formulation, and with
extensive help from Wolfram's Integrator), I get
(2h^5 + 2w^5
+ 5h w^4 arcsinh(h/w) + 5h^4 w arcsinh(w/h)
- 2(h^4 - 3h^2 w^2 + w^4)sqrt(h^2 + w^2)
)/( 30 h^2 w^2 ).
Anyone care to do a sanity check on this?
BTW, I saw the complaints about the notation (which I thoroughly agree with). My
inclination is to write things out in TeX format, but I'm never sure how
many people have TeX or, more importantly, are just familiar with the
notation. Any comments?
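As one possible sanity check, the expression above can be transcribed into a few lines of Python and compared with the unit-square value (1/15)*(sqrt(2) + 2 + 5*arcsinh(1)) and with the numerically integrated figures quoted elsewhere in this thread. The function names below are invented for this sketch.

```python
import math


def mean_distance_closed_form(h, w):
    """Mean distance between two uniform random points in an h-by-w rectangle,
    transcribed from the closed-form expression quoted in the post above."""
    root = math.sqrt(h * h + w * w)
    numerator = (
        2 * h**5 + 2 * w**5
        + 5 * h * w**4 * math.asinh(h / w)
        + 5 * h**4 * w * math.asinh(w / h)
        - 2 * (h**4 - 3 * h**2 * w**2 + w**4) * root
    )
    return numerator / (30 * h**2 * w**2)


def unit_square_value():
    """The unit-square value: (sqrt(2) + 2 + 5*arcsinh(1)) / 15."""
    return (math.sqrt(2) + 2 + 5 * math.asinh(1)) / 15


if __name__ == "__main__":
    print(mean_distance_closed_form(1, 1), unit_square_value())
    for h, w in [(2, 3), (3, 2), (2, 4), (100, 1)]:
        print(h, w, mean_distance_closed_form(h, w))
```

The printed values can be compared against 1.31707 (2 by 3), 1.60954 (2 by 4), and 33.3423 (100 by 1) from the other posts; swapping h and w should not change the result.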
Trying several numerical examples, I get exactly the same results from your
function as I get from analytically integrating the first three integrals and
doing the last numerically. Notice that for h=w=1, your expression
simplifies to that given by Mark in his original post.
How did you get Mathematica's integrator to do this? I'm using Mathematica,
and it does the first 3 integrals fine, but when I ask it for the fourth, it
takes forever (I gave it an hour or so last night and it hadn't come up with
an answer yet), which is why I do the last one numerically.
-Doug Magnoli
It strikes me (finally) that the average distance should scale linearly with sqrt(w^2+h^2) / sqrt(2), where h is the height and w the width, so that a kxk canvas gives an average distance between two points that's k times the distance for a 1x1 canvas. But this still doesn't explain my results for the case where h=/=w. I calculated the average distance according to:
S_0^w S_0^w S_0^h S_0^h sqrt[(x1-x2)^2+(y1-y2)^2] dx1 dx2 dy1 dy2
-----------------------------------------------------------------
             S_0^w S_0^w S_0^h S_0^h dx1 dx2 dy1 dy2
w=1, h=1, avg dist = 0.521405 (for reference)
w=2, h=3, avg dist = 1.31707
= 0.521405 * 2.52599
sqrt(2^2+3^2) / sqrt(2) = 2.54951
2.52599 / 2.54951 = 0.990775
w=2, h=4, avg dist = 1.60954
= 0.521405 * 3.08693
sqrt(2^2+4^2) / sqrt(2) = 3.16228
3.08693 / 3.16228 = 0.976172
Still looking for how to scale this for the case where h=/=w.
Here's a peculiar observation: Using a one-dimensional space pick two
random lines and compare them. They will both average 0.3333... in length,
but the difference in their lengths will average 0.2667...(approximate by
simulation)
How peculiar! If they both average the same length why doesn't the
difference in their lengths average to zero?
But look at this: When two lines are picked at random the average length is
0.3333... BUT the average length of the SHORTER of the two is 0.2099
(approximate by simulation) and the vector sum of the shortest one and the
longest one (which averages 0.26667 longer) is:
sqrt( 0.2099^2 + (0.2099+0.2667)^2) = 0.521...
hmmmm.
So it turns out that the vector sum of the average component lengths DOES
give the average resultant length, but only if the average length of the
shorter vector and the average length of the longer vector is used in place
of the single average length.
--gary
I integrated over the differences in coordinates rather than over the
coordinates themselves. That is, let the two points be x units apart
horizontally and y units apart vertically, and compute
int_0^h int_0^w (1-x)(1-y)sqrt(x^2+y^2) dx dy
/ int_0^h int_0^w (1-x)(1-y) dx dy.
The 1-x (and mutatis mutandis 1-y) represents the probability of the
horizontal difference having the value x; graph x1 against x2 to see that
|x1-x2| = x on a pair of line segments the sum of whose lengths is
proportional to 1-x.
> Notice that for h=w=1, your expression
> simplifies to that given by Mark in his original post.
Indeed. It also has other expected properties, such as being symmetrical in h
and w, and spitting out the right units. I suspect it's correct, but it would
be nice if it were verified by someone more experienced with such things than
I.
>I'm trying to get my brain around why the vector sum of the two average
>components doesn't give the correct answer.
>
>Here's a peculiar observation: Using a one-dimensional space pick two
>random lines and compare them. They will both average 0.3333... in length,
>but the difference in their lengths will average 0.2667...(approximate by
>simulation)
>
>How peculiar! If they both average the same length why doesn't the
>difference in their lengths average to zero?
Why do you expect E(|x1-x2|) = |E(x1 - x2)|?
In general, your problem seems to be that you expect E(f(x)) = f(E(x))
for any f.
>But look at this: When two lines are picked at random the average length is
>0.3333... BUT the average length of the SHORTER of the two is 0.2099
>(approximate by simulation) and the vector sum of the shortest one and the
>longest one (which averages 0.26667 longer) is:
>
> sqrt( 0.2099^2 + (0.2099+0.2667)^2) = 0.521...
This is sqrt( E(min(x1,x2))^2 + (E(min(x1,x2)) + E(|x1 - x2|))^2 ).
Perhaps you can get something useful out of this, using min(x1,x2) =
(x1 + x2 - |x1 - x2|)/2.
--
Nis Jorgensen
Amsterdam
Please include only relevant quotes, and reply below the quoted text. Thanks
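The point about E(f(x)) versus f(E(x)) can be made concrete with quantities already discussed in this thread. The Python sketch below is an illustration only; the function name, sample size, and seed are arbitrary assumptions. It compares |E(x1 - x2)| with E(|x1 - x2|), and the "vector sum of the averages" with the average of the vector lengths, for points in the unit square.

```python
import math
import random


def averages(n=200_000, seed=2001):
    """Return (|E(dx)|, E(|dx|), hypot(E|dx|, E|dy|), E(hypot(dx, dy)))
    for coordinate differences dx, dy of two uniform points in the unit square."""
    rng = random.Random(seed)
    sum_dx = sum_abs_dx = sum_abs_dy = sum_len = 0.0
    for _ in range(n):
        dx = rng.random() - rng.random()
        dy = rng.random() - rng.random()
        sum_dx += dx
        sum_abs_dx += abs(dx)
        sum_abs_dy += abs(dy)
        sum_len += math.hypot(dx, dy)
    abs_of_mean = abs(sum_dx / n)                           # f applied to the mean
    mean_of_abs = sum_abs_dx / n                            # mean of f
    hypot_of_means = math.hypot(sum_abs_dx / n, sum_abs_dy / n)
    mean_of_hypot = sum_len / n
    return abs_of_mean, mean_of_abs, hypot_of_means, mean_of_hypot


if __name__ == "__main__":
    a, b, c, d = averages()
    print(a, b)  # near 0 versus near 1/3
    print(c, d)  # near 0.471 versus near 0.521
```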
Urk! That should, of course, be (w-x) and (h-y) throughout. So:
int_0^h int_0^w (w-x)(h-y)sqrt(x^2+y^2) dx dy
/ int_0^h int_0^w (w-x)(h-y) dx dy,
and w-x represents the probability that the horizontal difference is x.
Sorry about that.
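The corrected ratio of integrals can also be evaluated numerically without a computer algebra system. The following Python sketch uses a plain midpoint rule on an n-by-n grid; the function name and the grid size are assumptions made for this illustration, and the accuracy depends on n.

```python
import math


def mean_distance_by_differences(w, h, n=400):
    """Midpoint-rule estimate of
        int_0^h int_0^w (w-x)(h-y) sqrt(x^2+y^2) dx dy
      / int_0^h int_0^w (w-x)(h-y) dx dy
    using an n-by-n grid of cell midpoints."""
    dx, dy = w / n, h / n
    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        x = (i + 0.5) * dx
        for j in range(n):
            y = (j + 0.5) * dy
            weight = (w - x) * (h - y)  # proportional to the density of (x, y)
            numerator += weight * math.hypot(x, y)
            denominator += weight
    return numerator / denominator  # the cell area dx*dy cancels in the ratio


if __name__ == "__main__":
    print(mean_distance_by_differences(1, 1))  # compare with 0.521405
    print(mean_distance_by_differences(2, 3))  # compare with 1.31707
```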
Charles, besides a theoretical solution, I would test this in practice as
follows (and I do; see the Line Drawing thread):
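{ Pass 1: random endpoints and length accumulation only }
s := 0;
time1 := time;
For i := 1 to imax Do
Begin
  x1 := Random(xmax);
  x2 := Random(xmax);
  y1 := Random(ymax);
  y2 := Random(ymax);
  { Sqr(x2-x1) must be evaluated as Real }
  s := s + Sqrt(Sqr(x2-x1) + Sqr(y2-y1));
End;
time1 := time - time1;

{ Pass 2: the same loop plus the line drawing itself }
s := 0;
time2 := time;
For i := 1 to imax Do
Begin
  x1 := Random(xmax);
  x2 := Random(xmax);
  y1 := Random(ymax);
  y2 := Random(ymax);
  s := s + Sqrt(Sqr(x2-x1) + Sqr(y2-y1));
  MakeLine(x1, y1, x2, y2, col);  { e.g. the drawing routine under test }
End;
time2 := time - time2;

{ time2 - time1 is the time spent in drawing alone }
WriteLn('consumed time = ', time2 - time1);
WriteLn('number of pixels = ', Round(s));  { possibly +imax for zero-length vectors }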
Any objections, besides the slow speed?
Best regards ---Gernot Hoffmann
Charles: by some intuition and tests I found this formula:
k = ymax/xmax
length/xmax = (1/3) Sqrt( 1 + k*k + (4/9)*k )
Note:
In one direction: length=(1/3)*xmax
as mentioned in another letter.
Best regards --Gernot Hoffmann
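To see how the empirical formula above fares, it can be set against the numerically integrated values quoted earlier in the thread. The Python sketch below does only that comparison; the function name and the small table of reference values are assembled for this illustration from figures already given in other posts.

```python
import math

# Reference values quoted earlier in the thread (numerical integration results).
THREAD_VALUES = {(1, 1): 0.521405, (2, 3): 1.31707, (2, 4): 1.60954, (100, 1): 33.3423}


def empirical_length(xmax, ymax):
    """The empirical formula from the post: (xmax/3) * sqrt(1 + k^2 + (4/9) k),
    with k = ymax/xmax."""
    k = ymax / xmax
    return (xmax / 3) * math.sqrt(1 + k * k + (4 / 9) * k)


if __name__ == "__main__":
    for (xmax, ymax), reference in THREAD_VALUES.items():
        estimate = empirical_length(xmax, ymax)
        rel_err = abs(estimate - reference) / reference
        print(xmax, ymax, round(estimate, 5), reference, f"{100 * rel_err:.2f}%")
```

On these four cases the formula lands within about one percent of the quoted values, and with ymax = 0 it reduces to xmax/3, as the note says.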