Hi Anjan,
I do like your implementations: they are pretty readable, the code is well-organized, and I think you've joined a short list of people that have actually implemented Falcon. However, I am afraid that I need to encourage you to clearly label your implementations, at the top of your READMEs, as experimental and likely insecure. There are a lot of extremely subtle things in cryptographic implementations, and it is extremely easy to make devastating mistakes.
People often say "don't roll your own cryptography", and you stated that you don't have a cryptography background. While "don't roll your own" might thus be used against your work, I do not agree with the sentiment: this is instead just one of your steps to becoming a cryptographer, so I do appreciate your work and hope you continue on this path. However, at this point in time, your implementation is too easy to use and too shiny (proper README, docs, Python bindings...), so there is a risk that people will take your code and integrate it into their products.
Regarding constant-time behaviour of your code, I have concerns. For example, in the `decapsulate` operation of KEMs using the Fujisaki-Okamoto transform, it is critical that the timing does not reveal if the decapsulation actually failed when the KEM returns the random value instead of the decapsulated shared secret. In your implementation
for (size_t i = 0; i < ctlen; i++) {
flg |= static_cast<bool>(cipher[i] ^ c_prime[i]);
}
std::memcpy(kdf_in, g_out, 32 * !flg);
std::memcpy(kdf_in, z, 32 * flg);
flg is thus a secret. The way the code is written, however, invites compilers to generate a conditional jump, which for example seems to be happening with MSVC here:
https://godbolt.org/z/hY919xfd1
Another example is your xgcd implementation, which is clearly not constant-time. Falcon's reference implementation contains an example of a constant-time implementation with lots of documentation. GMP also probably does not provide, on all platforms, the constant-time 64-bit floating point arithmetic that Falcon requires.
I am afraid I don't have enough crypto implementation expertise to properly evaluate your code. I suggest clarifying the status of the constant-time-ness of your implementation in your READMEs, along with the warning.
Regarding RNG: It's probably better to explicitly rely on the system random number generator than the C++ stdlib. As I read your implementation, while the random_device you use to seed the PRNG *may* be the system RNG, you then extract the used random bytes from a non-cryptographically secure RNG, the Mersenne Twister. You may draw some inspiration from / take the implementation from Sprenkels at
https://github.com/dsprenkels/randombytes/. Putting a seed into SHAKE, which is then used as a XOF for all other randomness needs is pretty common across many implementations, and fine. Linux, for example, does something similar under the hood, using ChaCha20.
I hope this helps, and again, I do like many things about your implementations. Good luck with the work.
Cheers,
Thom