Hi all,
I'd like to inform you that recent versions of Clang (15-18) can produce a secret-dependent branch in the Kyber / ML-KEM reference implementation. The vulnerable function is poly_frommsg.
Please find a vulnerability report below this message. Key facts:
The resulting timing leak is small but exploitable (a crude PoC on my laptop needs <10 minutes for ML-KEM 512 key recovery)
One example of a vulnerable instance is clang -Os, but there are more.
I also wrote a more educational piece about this issue, which is available here.
I’ve been in touch with Peter Schwabe from the Kyber team to get this patched upstream in the pqcrystals/kyber reference. The updated version is already available.
We’ve notified several libraries that integrate the reference implementation, as they are directly affected. Note that this does not rule out the possibility that other libraries, which have only slightly modified the relevant function, are vulnerable - either now or in the future.
I would like to thank Peter for the prompt and collaborative communication surrounding this issue.
Kind regards,
Antoon Purnal (PQShield)
===
Subject: control-flow timing leak in Kyber reference implementation when compiled with Clang 15-18 for -Os, -O1 and other options
Details: the function poly_frommsg produces a polynomial based on the bits of m during both encapsulation and decapsulation. Despite the source-level mitigations in poly_frommsg, the latest generations of Clang recognize that the code essentially performs a bit test and produces a secret-dependent branch for several compiler options.
Here are a few compiler options (Clang 15-16-17-18 on x86) which produce a branch (Godbolt link):
-Os
-O1
-O2 -fno-vectorize
-O3 -fno-vectorize
Note that plain -O2 and -O3 (i.e., without fno-vectorize) produce branch and vectorized code side by side in the binary. In practice, the vectorized instructions appear to be selected for execution at runtime.
Exploitable?: The timing leak can be used to implement a plaintext-checking oracle. On my laptop, a PoC local attack on the reference implementation leaks the ML-KEM 512 secret key in ~10 minutes using end-to-end decapsulation timing measurements.
I came up with this solution for our Libgcrypt-based implementation (not the official one, though):
void
_gcry_mlkem_poly_frommsg (gcry_mlkem_poly *r,
const unsigned char
msg[GCRY_MLKEM_INDCPA_MSGBYTES])
{
unsigned int i, j;
s16 mask;
s16 local_opt_blocker = _gcry_u32_opt_blocker_mask_zero;
for (i = 0; i < GCRY_MLKEM_N / 8; i++)
{
for (j = 0; j < 8; j++)
{
mask = -(s16)((msg[i] >> j)
& 1);
r->coeffs[8 * i + j] = (mask ^ local_opt_blocker)
& ((GCRY_MLKEM_Q + 1) / 2);
}
}
}
where
extern volatile u32 _gcry_u32_opt_blocker_mask_zero;
and has value zero, just as in the solution Daniel pointed to.
I think it should do the job both reliably and efficiently: (mask ^ local_opt_blocker) can be any value at run-time to the compiler, making it impossible to skip the computation (except the compiler would care to create a branch for local_opt_blocker to be either 0 or -1, which would be extremely unlikely). Yet the volatile variable has to be loaded only once at the beginning of the function call.
- Falko
MTG
AG
Dr. Falko Strenzke
Phone: +49
6151 8000 24
E-Mail: falko.s...@mtg.de
Web: mtg.de
MTG AG - Dolivostr. 11 - 64293 Darmstadt, Germany
Commercial register: HRB 8901
Register Court: Amtsgericht Darmstadt
Management Board: Jürgen Ruf (CEO), Tamer Kemeröz
Chairman of the Supervisory Board: Dr. Thomas Milde
This email may contain confidential and/or privileged
information. If you are not the correct recipient or have
received this email in error,
please inform the sender immediately and delete this
email.Unauthorised copying or distribution of this email is
not permitted.
Data protection information: Privacy policy
I'm not sure whether it's good to also force the optblocker variable to have default visibility. With default visibility, the possibility of interposition means that---assuming dynamic linking---it's always possible for post-linking code to touch the global variable, so a compiler claiming at link time to know that the variable is 0 is definitely wrong. On the other hand, I don't know how most of the people writing optimizations would learn this; much more blatant optimizer bugs happen all the time. Meanwhile interposition also adds annoying side conditions to formal verification.
I personally don't think it is necessary to make the variable visible. One well established use case for volatile variables are memory mapped registers. That means in order to make any assumption about the constancy such a variable the compiler would have to know that it is not modified by the underlying platform. So even when the linker is assigning the address to variable, on modern "large" systems this will be a virtual address and the operating system could still decide where to map it physically at load time. Generally I think such a feature of a compiler or linker, namely to be able to exclude that a volatile variable is mapped to a special address, would require a lot of cooperation between different parts of the compiler and linker, very detailed knowledge about the hardware platform, and it still be highly error prone. Maybe most importantly it will hardly be ever motivated to implement anything like this, since the when the developer declares a variable as volatile, it should be assumed they know what they want to achieve by this and consciously accept the cost.
- Falko
--
You received this message because you are subscribed to the Google Groups "pqc-forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pqc-forum+...@list.nist.gov.
To view this discussion on the web visit https://groups.google.com/a/list.nist.gov/d/msgid/pqc-forum/20240622095827.325626.qmail%40cr.yp.to.
--
You received this message because you are subscribed to the Google Groups "pqc-forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pqc-forum+...@list.nist.gov.
To view this discussion on the web visit https://groups.google.com/a/list.nist.gov/d/msgid/pqc-forum/20240623133031.78502.qmail%40cr.yp.to.
--
You received this message because you are subscribed to the Google Groups "pqc-forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pqc-forum+...@list.nist.gov.
To view this discussion on the web visit https://groups.google.com/a/list.nist.gov/d/msgid/pqc-forum/20240622095827.325626.qmail%40cr.yp.to.
To view this discussion on the web visit https://groups.google.com/a/list.nist.gov/d/msgid/pqc-forum/CAEseHRqmd%2Brm6o3dDAiyjkctwgQcXLnU72tv%2BiRrLKsJ4-k-CA%40mail.gmail.com.
Hello all,
I just found a slightly different implementation for the case of
the Kyber poly_frommsg() function that I want to propose for your
evaluation.
The difference is that the volatile opt_blocker applies to the
entire 1-bit mask which causes the boolean evaluation.
The volatile variable can optionally be made global.