Hi Everyone,
Poly1305 is now available. The check-in of interest is
https://github.com/weidai11/cryptopp/commit/62e99837e8277290e90f93ee615aa5d711645822.
The first attempt used the library's Integer class. It performed poorly, clocking around 330 cpb. Tweaks got it down to 90 cpb, but it was still under-performing. We tried Bernstein's recommended floating point implementation, but it was too specialized and required a fair amount of ASM. We settled on Andy Polyakov's 32-bit implementation using scalar multiplication (
http://www.openssl.org/blog/blog/2016/02/15/poly1305-revised/).
Polyakov's OpenSSL implementation was a very good tradeoff. It will run well on the platforms we support without the need for ASM, including i686, x86_64, 32-bit ARM and 64-bit ARM. Benchmarks on reasonably modern Intel hardware (for example, a 2.5 GHz Core i5) show it clocking between 2.2 and 3.0 cpb. For completeness, Andy also provided specialized ASM implementations for OpenSSL, and they run around 1 cpb.
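For anyone who wants to translate the cpb figures into throughput, here is a minimal sketch of the arithmetic. The function names are made up for illustration and are not part of the library, and the math assumes the clock stays fixed at its nominal frequency (no turbo or throttling).

```cpp
#include <cassert>

// Hypothetical helper, not a Crypto++ API: convert a cycles-per-byte figure
// into throughput in MiB/s. Assumes a constant clock frequency.
double CpbToMiBPerSecond(double cpb, double clockHz)
{
    assert(cpb > 0.0 && clockHz > 0.0);
    const double bytesPerSecond = clockHz / cpb;
    return bytesPerSecond / (1024.0 * 1024.0);
}

// Hypothetical helper, not a Crypto++ API: derive cycles per byte from a
// timed run, given elapsed seconds, bytes processed and the clock frequency.
double CyclesPerByte(double seconds, double bytes, double clockHz)
{
    assert(seconds > 0.0 && bytes > 0.0 && clockHz > 0.0);
    return (seconds * clockHz) / bytes;
}
```

On the 2.5 GHz machine, CpbToMiBPerSecond(2.2, 2.5e9) works out to roughly 1084 MiB/s and CpbToMiBPerSecond(3.0, 2.5e9) to roughly 795 MiB/s. The original 330 cpb Integer-based attempt comes to only about 7 MiB/s.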
Documentation is available in the Crypto++ manual, but it's kind of light:
https://www.cryptopp.com/docs/ref/class_poly1305.html. A wiki page should be forthcoming. The important thing to take away is, do not reuse a security context. For each message, either the key or the nonce (or both) must be unique.
Jeff