The modular division is posing a bit of a problem. Existing curve25519
implementations that I've seen do modular inverses as a^(p-2) mod p, which
takes O(n^3) time and can't be parallelized easily.
So, I'm thinking about implementing a potentially faster constant-time (i.e.
resistant to side-channel analysis)
algorithm for modular inverse, based on the algorithm and worst-case
analysis in "An Analysis of Lehmer's Euclidean GCD Algorithm"
Does anyone know of an existing constant-time modular inverse
implementation, or other ideas for doing this? Converting the
left-shift binary extended GCD algorithm to constant time seems much easier,
but I'd like something a bit faster, if possible. The only research or
discussion of this topic I've been able to find is this Usenet thread, which
didn't get very far: