I would assign "dataset.shape[0]" to a local (size_t) variable, to make it
visible to the C compiler that the value does not change while the loops run.
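Here is a minimal sketch of the allocation part written that way. DistanceStore, its get() method and the zero-fill loop are hypothetical, chosen only to show n being read once and then reused for malloc(), the pointer-to-memoryview cast and both loop bounds:

```cython
# cython: language_level=3
# Sketch only: DistanceStore and get() are hypothetical stand-ins for the
# class in the quoted code.
from libc.stdlib cimport malloc, free

cdef class DistanceStore:
    cdef double* buf              # C buffer owned by this object
    cdef double[:, :] distances   # typed view onto buf (does not own it)

    def __cinit__(self, int[:, :] dataset):
        # Read the row count once into a C local; everything below uses n.
        cdef size_t n = dataset.shape[0]
        cdef size_t i, j
        if n == 0:
            # Cython's view array rejects zero-length axes, and malloc(0)
            # may return NULL, so refuse an empty dataset up front.
            raise ValueError("empty dataset")
        self.buf = <double*> malloc(n * n * sizeof(double))
        if self.buf == NULL:
            raise MemoryError("memory allocation error")
        self.distances = <double[:n, :n]> self.buf
        for i in range(n):
            for j in range(n):
                self.distances[i, j] = 0.0  # malloc() does not zero memory

    def __dealloc__(self):
        free(self.buf)  # buf starts as NULL, and free(NULL) is a no-op

    def get(self, Py_ssize_t i, Py_ssize_t j):
        return self.distances[i, j]
```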
> cdef unsigned int i, j
> if m == NULL:
>     raise TreeError("memory allocation error")
> self.distances = <double[:dataset.shape[0], :dataset.shape[0]]>m
> for i in xrange(dataset.shape[0]):
What happens if you replace this xrange() with Cython's prange()?
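A sketch of what that could look like, assuming the distance computation is written inline so that the whole loop body can run without the GIL. pairwise_distances_parallel and its out argument are made-up names. Real parallelism needs OpenMP compiler and linker flags (e.g. -fopenmp for GCC); without them, prange still compiles but runs serially:

```cython
# cython: language_level=3
# Sketch: pairwise_distances_parallel is a made-up name. Without OpenMP
# flags at build time, prange compiles but runs serially.
cimport cython
from cython.parallel import prange
from libc.math cimport sqrt

@cython.boundscheck(False)
@cython.wraparound(False)
def pairwise_distances_parallel(int[:, :] dataset, double[:, :] out):
    cdef Py_ssize_t n = dataset.shape[0]
    cdef Py_ssize_t dim = dataset.shape[1]
    cdef Py_ssize_t i, j, k
    cdef double d, diff
    if out.shape[0] < n or out.shape[1] < n:
        raise ValueError("out must be at least n x n")
    # Rows are split between threads; each thread writes only out[i, :].
    for i in prange(n, nogil=True, schedule='static'):
        for j in range(n):
            d = 0.0
            for k in range(dim):
                diff = dataset[i, k] - dataset[j, k]
                # "d = d + ..." on purpose: "d += ..." inside a prange
                # body would make d a reduction variable.
                d = d + diff * diff
            out[i, j] = sqrt(d)
```

Since every thread owns complete rows of out, no locking is needed.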
>     for j in xrange(dataset.shape[0]):
>         if i == j:
>             self.distances[i,j] = .0
This looks like a good place for an "else:".
>         self.distances[i,j] = Graph.get_euklid_distance(dataset[i],
>                                                         dataset[j])
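Slicing the memory views (dataset[i], dataset[j]) for every single pair, and
then calling a function on the slices, probably costs more than you'd want to
pay here. If you inline the method manually and use direct indexing
(dataset[i, k]) instead of slicing plus indexing, it should become noticeably
faster.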
Basically, when nesting loops over non-trivial data, avoid any overhead
whatsoever in the inner(most) loops.
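Put together, a serial sketch with the distance computation inlined, direct 2-D indexing and the else branch. The function name and the separate out matrix are assumptions, not taken from your code:

```cython
# cython: language_level=3
# Sketch: pairwise_distances and the "out" argument are hypothetical; the
# point is the loop structure, not the API.
cimport cython
from libc.math cimport sqrt

@cython.boundscheck(False)
@cython.wraparound(False)
def pairwise_distances(int[:, :] dataset, double[:, :] out):
    cdef Py_ssize_t n = dataset.shape[0]    # loop bounds read once
    cdef Py_ssize_t dim = dataset.shape[1]
    cdef Py_ssize_t i, j, k
    cdef double d, diff
    if out.shape[0] < n or out.shape[1] < n:
        raise ValueError("out must be at least n x n")
    for i in range(n):
        for j in range(n):
            if i == j:
                out[i, j] = 0.0
            else:
                # No slicing and no function call: the innermost loop is
                # plain C arithmetic on indexed memoryview elements.
                d = 0.0
                for k in range(dim):
                    diff = dataset[i, k] - dataset[j, k]
                    d += diff * diff
                out[i, j] = sqrt(d)
```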
> @cython.boundscheck(False)
> @cython.wraparound(False)
> @cython.nonecheck(False)
> @staticmethod
> cdef inline double get_euklid_distance(int[:] l, int[:] r):
>     cdef double d = 0
>     cdef int i
>     for i in xrange(l.size):
>         d += pow(r[i]-l[i], 2)
>     return sqrt(d)
Stefan