Buffer overflows in HAETAE / On crypto vs implementation errors.

Markku-Juhani O. Saarinen

Jul 27, 2023, 5:10:55 PM
to pqc-forum
Hi,

When reviewing the new NIST candidates, I wanted to refrain from reporting plain implementation bugs in Round 1. Still, the constant struggle just to keep some reference implementations from crashing is demoralizing to a cryptography engineer who knows about exploits and implementation security. One frequently needs to use the code to clarify the spec or perform experiments. Sometimes the line between an implementation error and an algorithm/specification error is blurred.

Let's look at Section 3.5 of the HAETAE specification and how that matches with the implementation.
"The rANS encoded values h and high bits of z1 lead to a varying signature size. In our current implementation we opted for a fixed signature size as reported in Table 3. [..] A field of two bytes indicates the length of the encoded values, the padding can be done with arbitrary data."

* The implementation indeed pads with arbitrary data -- whatever happens to be in memory when the signing function is called. This makes the function non-deterministic: I had trouble matching the KAT files because memory on my target was initialized differently than on the machine where the KAT files were generated. However, putting my security engineer hat on, this leaks potentially sensitive information into the padding and would normally be considered a serious security vulnerability.
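
A minimal fix sketch, assuming the signature is assembled into a fixed-size buffer (the names `sig` and `used` are mine, not the actual HAETAE identifiers): zero the unused tail so the padding is deterministic and leaks nothing.
```
#include <stdint.h>
#include <string.h>

/* Hypothetical fix: overwrite the unused tail of the fixed-size
   signature buffer instead of leaving stale stack contents in it. */
static void pad_signature(uint8_t sig[CRYPTO_BYTES], size_t used)
{
    memset(sig + used, 0, CRYPTO_BYTES - used);
}
```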

As a cryptographer, I know that since the padding in the fixed-size signature is not checked in any way, the signatures are not "strongly unforgeable" as I can trivially turn a valid fixed-length signature into another valid fixed-length signature for the same message. This is not a big problem, but the spec actually talks about SUF-CMA (Section 5).
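
To make the SUF-CMA point concrete, here is the trivial "second signature" in terms of the standard NIST KAT API; `pad_off` is a hypothetical offset assumed to land inside the unchecked padding region of the signature.
```
/* Illustrative only: flip one unchecked padding byte and the signed
   message still verifies, yielding a distinct valid signature. */
sm[pad_off] ^= 0x01;
assert(crypto_sign_open(m2, &m2len, sm, smlen, pk) == 0);
```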

Let's look at the signature decoding implementation in `Reference_Implementation/src/packing.c`:
```
243:    size_enc_hb_z1 = ((uint16_t)sig[1]) << 8 | sig[0];
244:    sig += 2;
245:    size_enc_h = CRYPTO_BYTES - SEEDBYTES - L * N - 2 - size_enc_hb_z1;
246:
247:    memcpy(encoded_hb_z1, sig, size_enc_hb_z1);
248:    decode_hb_z1(&highbits_z1->vec[0].coeffs[0], encoded_hb_z1);
249:    sig += size_enc_hb_z1;
250:
251:    memcpy(encoded_h, sig, size_enc_h);
252:    decode_h(&h->vec[0].coeffs[0], encoded_h);
253:
254:    return 0;
```
* We can see that the signature unpacking function has no error checking and always returns 0 -- no signature is malformed according to the reference implementation. Well, the spec didn't explicitly ask for any checks. However, one doesn't need to be a cryptographer to see that HAETAE signatures can be malformed in some incredibly dangerous ways, and the reference implementations are vulnerable to all of them.

* The two-byte "z1" length field is decoded on line 243 and passed unchecked as the length argument to `memcpy()` on line 247 (and similarly in the optimized implementation). For those of us who remember "Heartbleed" from years ago: this is the type of bug that caused it.

Here the destination buffer is on the stack rather than the heap, and in real-life production code this would automatically be classed as critical: a potential remote code execution vulnerability. The decoded, attacker-controlled length variable `size_enc_hb_z1` is also used further down in pointer arithmetic, leading to all kinds of exploitable behavior. (Unless it wasn't previously clear: under no circumstances should the HAETAE code be used in an internet-facing production environment.)
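
A sketch of the missing validation, reusing the constants and buffer names from the reference code above (and assuming `encoded_hb_z1` and `encoded_h` are fixed-size local arrays, as they are in `packing.c`):
```
size_enc_hb_z1 = ((uint16_t)sig[1]) << 8 | sig[0];
sig += 2;
/* Reject the signature if the decoded length cannot fit the
   fixed-size layout or either destination buffer. */
if (size_enc_hb_z1 > sizeof(encoded_hb_z1) ||
    size_enc_hb_z1 > CRYPTO_BYTES - SEEDBYTES - L * N - 2)
    return -1;
size_enc_h = CRYPTO_BYTES - SEEDBYTES - L * N - 2 - size_enc_hb_z1;
if (size_enc_h > sizeof(encoded_h))
    return -1;
```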

* Looking at the `z1` and `h` decoding functions (lines 248 and 252) reveals that they also happily read past their input buffers without alarms. Meanwhile, examining the enormous, 26099-line (2.0 MB!) `encoding.c` raises further questions about how embedded and hardware engineers can possibly implement these tables.
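
The structural fix for the decoders is to stop handing them bare pointers. A minimal sketch of a bounded input stream that an rANS (or tANS) decoder could pull bytes from, failing cleanly on exhaustion; the names here are mine, not HAETAE's:
```
#include <stdint.h>

typedef struct { const uint8_t *pos, *end; } bytes_in;

/* Returns -1 instead of reading past the end of the input buffer. */
static int read_byte(bytes_in *in, uint8_t *out)
{
    if (in->pos >= in->end)
        return -1;   /* input exhausted: treat signature as malformed */
    *out = *in->pos++;
    return 0;
}
```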

Now, this is not cryptanalysis -- any professional programmer can make the observations discussed above, even the one about strong unforgeability (which the spec allows for some reason).

The thing that distinguishes "cryptanalytically exploitable input validation issues" (such as those discussed in relation to ALTEQ, MEDS, LESS, and DME) from these plain old bugs is that they are much less likely to be flagged in code reviews or KAT testing: the behavior of the code matches the spec, and one needs to know some cryptanalysis to even realize that they're exploitable. After all, I'm pretty sure the design teams would have "added checks" against these conditions if they had known that they were exploitable. I certainly would have pressed LESS to add such checks had I realized two months ago that the issue was exploitable.

While I'm sure these particular "cryptanalytic input validation" issues in ALTEQ & Co are innocent mistakes, I also know that such input validation issues in signature verification make absolutely perfect "nobus" cryptographic authentication backdoors! Much better than the issues described above for HAETAE. A cryptographic module with vulnerabilities of this type could even pass CAVP (which is just randomized test vectors) and get FIPS certified. Hence one should be extra careful about documenting them and addressing them in specifications.

Cheers,
- markku

Julien Devevey

Jul 29, 2023, 4:16:47 AM
to Markku-Juhani O. Saarinen, pqc-forum
Dear Markku, dear all,

Thanks for pointing out the bug in our decoding routine, which we will fix in an updated version soon. We are currently working towards implementing the more efficient tANS instead of rANS (which will also vastly decrease the size of the encoding tables) and had planned to change the encoding slightly anyway. In addition to the length field, which is unnecessarily long, the signature currently does not include the challenge polynomial itself but rather its hash (similar to Dilithium). However, as our challenge has binary coefficients, we can also directly send the polynomial without increasing the signature size.
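
For illustration, a binary challenge c in {0,1}^N packs into N/8 bytes, along these lines (a sketch with illustrative names, not the actual HAETAE code):
```
#include <stdint.h>

/* Pack a binary challenge polynomial one bit per coefficient, so
   transmitting c itself costs only N/8 bytes. */
void pack_challenge(uint8_t out[N / 8], const int32_t c[N])
{
    for (int i = 0; i < N / 8; i++) {
        out[i] = 0;
        for (int j = 0; j < 8; j++)
            out[i] |= (uint8_t)((c[8 * i + j] & 1) << j);
    }
}
```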

Additionally, we have found a bug in polyfix.c, where the variable sqsum is not reset to zero in each loop iteration, but only at the beginning of the function. If a hyperball sample is rejected, the next candidate is automatically very small, which may be exploitable cryptanalytically. Notably, this does not affect the signature rejection loop. Since rejection of a hyperball sample does not happen very often, we expect no significant increase in runtime for a fixed version.
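
The shape of the bug, for illustration (not the verbatim polyfix.c code; `sample_gauss` and `in_hyperball` are placeholders):
```
/* sqsum was reset only once, before the loop, so a rejected sample
   left its accumulated squared norm behind for the next attempt. */
double sqsum;
do {
    sqsum = 0.0;   /* the fix: reset the accumulator on every retry */
    for (int i = 0; i < N; i++) {
        y[i] = sample_gauss();
        sqsum += y[i] * y[i];
    }
} while (!in_hyperball(sqsum));
```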

On behalf of the Haetae team,
Julien