How silly of me, of course Matlab is using Tim Davis' code here, but it's the more advanced SSMULT code (it's GPL, you can see it under SuiteSparse/MATLAB_Tools/SSMULT if the license isn't a concern for you), rather than the simplistic version in CSparse and his book.
According to "On the Representation and Multiplication of Hypersparse Matrices" by Buluc and Gilbert, the Sulatycke-Ghose algorithm "examines all possible (i, j) positions of the input matrix A in the outermost loop and tests whether they are nonzero. Therefore, their algorithm has O(flops + n^2) complexity, performing unnecessary operations when flops < n^2."
One thing you can play with is trying
x = Base.LinAlg.CHOLMOD.CholmodSparse(1.0*x)
which cuts another 5 seconds off your test case, but be warned that it appears to leak memory (CholmodSparse apparently needs a finalizer?) if you don't run gc() every few iterations. Changing
y = x*transpose(x)
to
y = x*x'
cuts another few seconds, very comparable to Matlab. CholmodSparse does have A_mul_Bt methods defined, unlike SparseMatrixCSC.
I can get the pure-Julia version down to about 16 seconds by splitting the operation up into two passes, one just to determine the number of nonzeros in each column of the product then a separate pass to do the row indices and nonzeros. This avoids having to dynamically reallocate memory multiple times:
https://gist.github.com/tkelman/9175190
-Tony