4 views

Skip to first unread message

Sep 30, 2006, 5:37:27 AM9/30/06

to

Out of curiosity, I tested these two approaches on a number of data sets

for which I make frequent use. For some reason Jens code is running

slower! I've been testing it out some lists of reals of size

n=500,1000,2500, and 5000. Is it possible that the time for the

conditional compares is exceeding the computational time of redundant

calculations? Could someone try this out?

(note: I'm working on some code for identifying outliers in large data

sets. The efficient calculation of L-1 and L-2 distance matrices are

important.)

Thanks

Mark

-----Original Message-----

From: Murray Eisenberg [mailto:mur...@math.umass.edu]

Subject: Re: distance function

Yes, I KNOW that I'm computing the distances twice in my solution:

that's why I said it's an "extravagant" solution!

Jens-Peer Kuska wrote:

> Hi Murray,

>

> at least you should compute the distances not twice because the matrix

> is symmetric with zero diagonal ...

>

> d[{p_,p_}]:=0.0

> d[{q_,p_}]/; OrderedQ[{q,p}]:=d[{q,p}]= Norm[p - q]

> d[{q_,p_}]:=d[{p,q}]

>

> Regards

> Jens

>

>

> Murray Eisenberg wrote:

>> If you don't mind an "extravagant" solution -- one that is

>> conceptually simple and short but is probably inefficient due to

>> redundant calculations -- then this works, I believe:

>>

>> d[{p_, q_}] := Norm[p - q]

>> allDistances[pts_] := Union[Flatten[Outer[d, pts, pts]]]

>>

>>

>>

>> dimm...@yahoo.com wrote:

>>> In the book of Gaylord et al. (1996) there is one exercise which

>>> asks (see page 113)

>>>

>>> "Given a list of points in the plane, write a function that finds

>>> the set of all distances between the points."

>>>

>>> Although there is one solution, that solution makes use of the Table

>>> and Length commands.

>>>

>>> Is it a way to define the same function using Higher-Order functions

>>> like Outer, MapThread etc?

>>>

>>> Thanks in advance for any help.

>>>

>>>

>

>

--

Murray Eisenberg mur...@math.umass.edu

Mathematics & Statistics Dept.

Lederle Graduate Research Tower phone 413 549-1020 (H)

University of Massachusetts 413 545-2859 (W)

710 North Pleasant Street fax 413 545-1801

Amherst, MA 01003-9305

Oct 1, 2006, 4:32:10 AM10/1/06

to

Coleman, Mark schrieb:

Hi Mark,

I think the most efficient proposal came from Bob Hanlon. I tested

some alternatives and the fastest is d @@@S ubsets[pts, {2}] (2.98s for

1000 points) followed by d @@@ Flatten[MapIndexed[Drop[#1, #2[[1]]-1] &,

Outer[List, Most@pts, Rest@pts, 1]], 1] with 3.31 s.

Peter

Oct 1, 2006, 4:43:32 AM10/1/06

to

Why not use caching? Also, Maxim said Norm is slow.

d[{p_,p_}]:=0

d[{p_,q_}]:=d[{p,q}]=d[{q,p}]=Sqrt[#.#]&[p-q]

If memory usage becomes a problem, you could always Clear d.

Alternatively, you could make a Module function that operates on an

entire matrix, using d as one of the temporary "variables" inside of

the Module. As long as you code it correctly (so that d doesn't appear

in the result expression), then d will automatically be Removed.

> --

> Murray Eisenberg mur...@math.umass.edu

> Mathematics & Statistics Dept.

> Lederle Graduate Research Tower phone 413 549-1020 (H)

> University of Massachusetts 413 545-2859 (W)

> 710 North Pleasant Street fax 413 545-1801

> Amherst, MA 01003-9305

>

>

Oct 2, 2006, 12:44:21 AM10/2/06

to

Could someone please explain why Mathematica seems to be expecting the

output of cc to be a vector?

output of cc to be a vector?

pts[1]=Table[{Random[],Random[]},{3}];

cc[p_,p_]=0;

cc[p_,q_]:=cc[q,p]=Sqrt[#.#]&[p-q];

ccC=Compile[{{p,_Real,1},{q,_Real,1}},cc[p,q]];

Outer[ccC,pts[1],pts[1],1]

Oct 2, 2006, 12:57:36 AM10/2/06

to

Coleman, Mark wrote:

>

>Out of curiosity, I tested these two approaches on a number of data sets

>for which I make frequent use. For some reason Jens code is running

>slower! I've been testing it out some lists of reals of size

>n=500,1000,2500, and 5000. Is it possible that the time for the

>conditional compares is exceeding the computational time of redundant

>calculations? Could someone try this out?

>

>(note: I'm working on some code for identifying outliers in large data

>sets. The efficient calculation of L-1 and L-2 distance matrices are

>important.)

>

>Thanks

>

>Mark

>

>

>

The package Statistics`ClusterAnalysis` has a very quick DistanceMatrix

function:

data=Table[Random[],{1000},{3}];

<<Statistics`ClusterAnalysis`

Sqrt[DistanceMatrix[data]];//Timing

{0.187 Second,Null}

The default distance function for DistanceMatrix is

SquaredEuclideanDistance, hence the use of Sqrt. It is possible to use

the option DistanceFunction->EuclideanDistance to obviate the need for Sqrt:

DistanceMatrix[data, DistanceFunction->EuclideanDistance];

However, this option seems to be about 25% slower. This is probably

because it is faster to take the square root of a matrix and rely on

vector operations rather than taking the square root of each matrix element.

Carl Woll

Wolfram Research

Oct 3, 2006, 6:27:49 AM10/3/06

to

Chris Chiasson wrote:

> Could someone please explain why Mathematica seems to be expecting the

> output of cc to be a vector?

>

> pts[1]=Table[{Random[],Random[]},{3}];

> cc[p_,p_]=0;

> cc[p_,q_]:=cc[q,p]=Sqrt[#.#]&[p-q];

> ccC=Compile[{{p,_Real,1},{q,_Real,1}},cc[p,q]];

> Outer[ccC,pts[1],pts[1],1]

> Could someone please explain why Mathematica seems to be expecting the

> output of cc to be a vector?

>

> pts[1]=Table[{Random[],Random[]},{3}];

> cc[p_,p_]=0;

> cc[p_,q_]:=cc[q,p]=Sqrt[#.#]&[p-q];

> ccC=Compile[{{p,_Real,1},{q,_Real,1}},cc[p,q]];

> Outer[ccC,pts[1],pts[1],1]

Hi Chris,

Written in the following way, Compile does not complain anymore:

In[1]:=

pts[1] = Table[{Random[], Random[]}, {3}];

ccC = Compile[{{p, _Real, 1}, {q, _Real, 1}},

If[p == q, 0, (Sqrt[#1 . #1] & )[p - q]]];

Outer[ccC, pts[1], pts[1], 1]

Out[3]=

{{0.,1.00434,0.161348},{1.00434,0.,0.845952},{0.161348,0.845952,0.}}

This, of course, does not explain why in first instance Compile was

expecting a rank 1 vector. Using the optional argument of Compile to

tell Mathematica what is the return type of an external function does

not help either:

ccC = Compile[{{p, _Real, 1}, {q, _Real, 1}}, cc[p, q], {{cc, _Real, 0}}];

HTH,

Jean-Marc

Oct 4, 2006, 6:24:11 AM10/4/06

to

Carl Woll wrote:

> [...]

> The package Statistics`ClusterAnalysis` has a very quick

> DistanceMatrix function:

> [...]

> [...]

> The package Statistics`ClusterAnalysis` has a very quick

> DistanceMatrix function:

The ClusterAnalysis.m file contains some usage messages

for DistanceMatrix, but not the code. Where is the code?

Oct 5, 2006, 3:55:13 AM10/5/06

to

Ray Koopman wrote:

>Carl Woll wrote:

>

>

>>[...]

>>The package Statistics`ClusterAnalysis` has a very quick

>>DistanceMatrix function:

>>[...]

>>

>>

>

>The ClusterAnalysis.m file contains some usage messages

>for DistanceMatrix, but not the code. Where is the code?

>

>

Sometimes a package developer will code up a function in the kernel, and

then include just the usage message in the .m file. This is what occurs

here. The DistanceMatrix function can be accessed without loading the

package, as it already exists in the kernel. Try the following from a

fresh kernel (without loading Statistics`ClusterAnalysis`):

Quit

data = Table[ Random[], {3}, {2}];

Statistics`ClusterAnalysis`DistanceMatrix[data]

and you will get the distance matrix. If you load the package, then you

won't need to fully qualify the function name to use it. So, if you were

looking at the package to discover how someone could write Mathematica

code to create a distance matrix so quickly you will discover that they

"cheated", i.e., they coded the function within the kernel.

Carl Woll

Wolfram Research

Oct 5, 2006, 4:00:26 AM10/5/06

to

Chris Chiasson schrieb:

> Could someone please explain why Mathematica seems to be expecting the

> output of cc to be a vector?

>

> pts[1]=Table[{Random[],Random[]},{3}];

> cc[p_,p_]=0;

> cc[p_,q_]:=cc[q,p]=Sqrt[#.#]&[p-q];

> ccC=Compile[{{p,_Real,1},{q,_Real,1}},cc[p,q]];

> Outer[ccC,pts[1],pts[1],1]

>

> Could someone please explain why Mathematica seems to be expecting the

> output of cc to be a vector?

>

> pts[1]=Table[{Random[],Random[]},{3}];

> cc[p_,p_]=0;

> cc[p_,q_]:=cc[q,p]=Sqrt[#.#]&[p-q];

> ccC=Compile[{{p,_Real,1},{q,_Real,1}},cc[p,q]];

> Outer[ccC,pts[1],pts[1],1]

>

I guess it's because p and q are vectors and compile doesn't analyze cc.

b.t.w. It doesn't make much sense to compile only the call to a function

but not the function itself:

pts[1] = Table[Random[], {1000}, {2}];

ccC = Compile[{{p, _Real, 1}, {q, _Real, 1}}, cc[p, q], {{_cc, _Real}}];

Timing[Outer[ccC, pts[1], pts[1], 1];]

lasts 17.5 seconds but if you compile cc inline:

pts[1] = Table[Random[], {1000}, {2}];

ccC = Compile[{{p, _Real, 1}, {q, _Real, 1}}, Evaluate@cc[p, q]];

Timing[Outer[ccC, pts[1], pts[1], 1];]

only 1.8 seconds are necessary.

Peter

Oct 5, 2006, 4:08:37 AM10/5/06

to

thanks!

I thought compile could go "inside" the function.

Of course, evaluating the code before compilation leaves out one of my

pattern definitions :-[

On 10/4/06, Peter Pein <pet...@dordos.net> wrote:

> Chris Chiasson schrieb:

> > Could someone please explain why Mathematica seems to be expecting the

> > output of cc to be a vector?

> >

> > pts[1]=Table[{Random[],Random[]},{3}];

> > cc[p_,p_]=0;

> > cc[p_,q_]:=cc[q,p]=Sqrt[#.#]&[p-q];

> > ccC=Compile[{{p,_Real,1},{q,_Real,1}},cc[p,q]];

> > Outer[ccC,pts[1],pts[1],1]

> >

>

> I guess it's because p and q are vectors and compile doesn't analyze cc.

>

> b.t.w. It doesn't make much sense to compile only the call to a function

> but not the function itself:

>

> pts[1] = Table[Random[], {1000}, {2}];

> ccC = Compile[{{p, _Real, 1}, {q, _Real, 1}}, cc[p, q], {{_cc, _Real}}];

> Timing[Outer[ccC, pts[1], pts[1], 1];]

>

> lasts 17.5 seconds but if you compile cc inline:

>

> pts[1] = Table[Random[], {1000}, {2}];

> ccC = Compile[{{p, _Real, 1}, {q, _Real, 1}}, Evaluate@cc[p, q]];

> Timing[Outer[ccC, pts[1], pts[1], 1];]

>

> only 1.8 seconds are necessary.

>

> Peter

>

Oct 6, 2006, 2:15:00 AM10/6/06

to

Was it ever not in the kernel, say in an older version?

If it wasn't in the kernel and it was in a human readable package, a

customer with a backlog of older versions may be able to tell us what

the code was.

If it wasn't in the kernel and it was in a human readable package, a

customer with a backlog of older versions may be able to tell us what

the code was.

On 10/5/06, Carl Woll <ca...@wolfram.com> wrote:

> Ray Koopman wrote:

>

> >Carl Woll wrote:

> >

> >

> >>[...]

> >>The package Statistics`ClusterAnalysis` has a very quick

> >>DistanceMatrix function:

> >>[...]

> >>

> >>

> >

> >The ClusterAnalysis.m file contains some usage messages

> >for DistanceMatrix, but not the code. Where is the code?

> >

> >

> Sometimes a package developer will code up a function in the kernel, and

> then include just the usage message in the .m file. This is what occurs

> here. The DistanceMatrix function can be accessed without loading the

> package, as it already exists in the kernel. Try the following from a

> fresh kernel (without loading Statistics`ClusterAnalysis`):

>

> Quit

>

> data = Table[ Random[], {3}, {2}];

>

> Statistics`ClusterAnalysis`DistanceMatrix[data]

>

> and you will get the distance matrix. If you load the package, then you

> won't need to fully qualify the function name to use it. So, if you were

> looking at the package to discover how someone could write Mathematica

> code to create a distance matrix so quickly you will discover that they

> "cheated", i.e., they coded the function within the kernel.

>

> Carl Woll

> Wolfram Research

>

>

Reply all

Reply to author

Forward

0 new messages

Search

Clear search

Close search

Google apps

Main menu