I posted a C version of the canonicalization algorithm for tensors without free indices at
http://code.google.com/p/tensorcan/ .
It is roughly 60x faster than `double_coset_can_rep` in `tensor_can.py`.
For tensor computations with many contracted indices, `double_coset_can_rep` takes most of the time,
e.g. 95% of the time in `test_riemann_invariants1` in
`test_tensor_can.py` (the same holds when using `tensor.py` in PR 1700).
If there are few index contractions, `double_coset_can_rep` takes little time; e.g. in the gamma matrix computations
in PR 1699 it takes 14% of the time.
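
For anyone who wants to check percentages like these on their own workload, here is a minimal sketch using the standard `cProfile` and `pstats` modules. The helper name `time_fraction` and the toy workload are made up for illustration; it is not the script I used for the numbers above. It matches functions by name only, and the cumulative time can be overcounted for recursive functions, so treat the result as a rough estimate.

```python
import cProfile
import pstats


def time_fraction(workload, func_name):
    """Estimate the fraction of profiled time spent in `func_name`.

    `workload` is any zero-argument callable. The cumulative time of
    every profiled function called `func_name` is summed and divided
    by the total profiled time.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    workload()
    profiler.disable()

    stats = pstats.Stats(profiler)
    total = stats.total_tt  # total time over all profiled functions
    cumulative = 0.0
    # Keys are (filename, lineno, name); values are
    # (primitive calls, total calls, own time, cumulative time, callers).
    for (_filename, _lineno, name), (_cc, _nc, _tt, ct, _callers) in stats.stats.items():
        if name == func_name:
            cumulative += ct
    return cumulative / total if total else 0.0


# Toy stand-ins for an expensive routine and the computation calling it.
def expensive_step():
    acc = 0
    for i in range(200000):
        acc += i * i
    return acc


def toy_workload():
    expensive_step()
    return sum(range(10))


if __name__ == "__main__":
    print("fraction in expensive_step:", time_fraction(toy_workload, "expensive_step"))
```

With SymPy installed, the same helper could be pointed at a tensor computation by passing that computation as `workload` and `"double_coset_can_rep"` as `func_name`.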
If there is interest, I can write a wrapper for the C implementation of `double_coset_can_rep`.
When the wrapper is installed, it would speed up SymPy tensor computations with many contracted indices.
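
To make "if the wrapper is installed" concrete, here is a rough sketch of the optional-import pattern I have in mind. The wrapper does not exist yet, so the module name `tensorcan_c` is purely hypothetical, and I am assuming the pure-Python function is importable from `sympy.combinatorics.tensor_can`. The helper `first_available` is also just an illustrative name.

```python
import importlib


def first_available(candidates):
    """Return (module_name, attribute) for the first usable candidate.

    `candidates` is a list of (module_name, attribute_name) pairs,
    ordered from most to least preferred.
    """
    for module_name, attr_name in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # module not installed, try the next one
        if hasattr(module, attr_name):
            return module_name, getattr(module, attr_name)
    raise ImportError("no implementation available")


# Preferred order: the C wrapper first, the pure-Python version as fallback.
CANDIDATES = [
    ("tensorcan_c", "double_coset_can_rep"),  # hypothetical wrapper module
    ("sympy.combinatorics.tensor_can", "double_coset_can_rep"),  # assumed location
]

# Usage (needs at least one of the modules above to be installed):
#   source, double_coset_can_rep = first_available(CANDIDATES)
```

The point of this design is that nothing changes for users without the wrapper: the pure-Python function is used as before, and the C version is picked up only when it can be imported.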