The trick here is to turn this into matrix-multiplication.
Let X and Y be the input matrices, respectively of size d x m and d x n. Each column is a vector of length d, and you want to compute pairwise squared Euclidean distances between these columns and generate an m x n matrix R.
Then R can be expressed as
R[i, j]
= sum_k (X(k, i) - Y(k, j))^2
= sum_k X(k, i))^2 + sum_k Y(k, j)^2 - 2 * X(k, i) * Y(k, j)
Let M[i, j] be a matrix such that M[i, j] = sum_k X(k, i) * Y(k, j), then you can express the matrix M as X' * Y -- this is a matrix multiplication, which can be computed very fast using BLAS.
According to the analysis above, the computation can be re-organized into three steps
1 -- compute the sum of squares of each column in X and Y, resulting in sx and sy, this is column-wise sums
2 -- compute the matrix M, using matrix multiplication. Julia provides the At_mul_B function, so that you don't even need to store the transposed version of X.
3 -- combine the terms into R, as R[i, j] = sx[i] + sy[j] - M[i, j]
For squared Euclidean, R is just the result. If you want the Euclidean, just take the square roots.
The same decomposition can be applied to a variety of distances whose core part is a quadratic form. In Distance.jl, I used this strategy for many distances.