hi all,
I recently did a toy implementation of Falcon [1] and was encouraged to share my notes on implementation complexities with the group. I have no intention of deploying (or supporting the deployment of) Falcon at Google at this moment.
First, on the mathematics: Falcon is mathematically a very elegant scheme, that gave me a greater appreciation for cyclotomic number fields. (I can recommend Washington's "Introduction to Cyclotomic Fields" [2] on that topic, but the book is not required to implement Falcon)
My notes on the implementation:
I have to stress that this is not production quality code, and makes no overly eager attempt at being secure, performant, or even correctly following the spec. I usually have to implement an algorithm at least once in order to properly understand it, and this is that implementation, not one written to be actually useful. The main two observations I had when implementing where:
- This is the first time that I had to use a statistical z-test in order to debug a problem in my code. One of the samplers had a relatively simple mistake (I missed a bracket), but, since the sampling produced numbers that at least when eyeballing were in the correct size range, only statistical tests could narrow down which part of the sampler failed. Granted, I didn't use the test vectors the spec has for some of the samplers, so it might be possible to implement and debug Falcon without having to learn basic statistics, but it still was a unique experience.
- My system had an existing floating point implementation for arbitrary precision floating point calculations, using Java BigIntegers for the mantissa and ints for the exponent. While that, and the accompanying numeric analysis helper functions certainly were very far from optimized (Have I mentioned that this is not production quality code yet?), implementing and debugging Falcon got me to add special cased logic to support doubles for small precisions, as it took too long to step through the algorithm with the custom floating point logic. This floating point logic was good enough to compute all the ideal class groups I needed for my personal happiness (which requires a lot of real valued matrix operations), but Falcon needed something faster. While I want to be somewhat hesitant extrapolating from a toy implementation to a production quality one, this seems to make implementing Falcon in constant time a very difficult undertaking, having to balance implementation complexity, side channel protections, and performance very delicately (the arbitrary floating point logic was not even constant time, but using integers for mantissa and exponent is a prerequisite for a constant time implementation).
I've implemented several of the PQC algorithms from scratch (most are in the same library), so far Falcon was by far the most difficult to implement and debug one.
Sophie
--