[llvm-dev] Side-channel resistant values


Daan Sprenkels via llvm-dev

Sep 12, 2019, 5:30:40 AM
to llvm...@lists.llvm.org
Hello all,

Many of us are dealing with data that should remain secret on a daily
basis. In my case, that's key material in cryptographic implementations.

To protect against side-channel attacks, we write our code in such a way
that the execution of the program does not depend on the key and other
secret values. However, compiler optimizations often make this very hard
for us, and frequently our only resort is to fall back to writing plain
assembly.

Let me give you an example: https://godbolt.org/z/b1-0_J

In this code, the programmer selects a value from the lookup table using a
scanning approach: every table entry is read, and only the one at the secret
index is kept, in an attempt to hide the secret lookup index stored in
`secret_lookup_idx`.
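
Roughly, the pattern looks like the following sketch (the exact code behind
the Godbolt link may differ; the table size and names here are illustrative
assumptions):

#include <stddef.h>
#include <stdint.h>

/* Scan the whole table and keep the entry at the secret index, so the
 * memory access pattern does not depend on secret_lookup_idx. */
uint32_t ct_table_lookup(const uint32_t table[8], size_t secret_lookup_idx)
{
    uint32_t result = 0;
    for (size_t i = 0; i < 8; i++) {
        /* mask is all-ones when i == secret_lookup_idx, all-zeros otherwise */
        uint32_t mask = (uint32_t)0 - (uint32_t)(i == secret_lookup_idx);
        result |= table[i] & mask;
    }
    return result;
}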

However, LLVM is smart enough to see through this: it skips the memory loads
whenever i != secret_lookup_idx, effectively reducing the loop to a single
secret-dependent load and exposing the function to cache side-channel attacks.

Now, how do we prevent this? Most tricks, for example using empty inline
assembly directives[1], are just ugly hacks.
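
For reference, the empty-inline-assembly trick from [1] boils down to an
optimization barrier along these lines (GCC/Clang extended asm; a sketch, not
a guarantee of constant-time code generation):

/* Pretend that p escapes and that memory may be modified, so the compiler
 * cannot optimize away the computation that produced *p. */
static inline void value_barrier(void *p)
{
    __asm__ volatile("" : : "g"(p) : "memory");
}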

So I am wondering: is there any support for preventing these kinds of
optimizations? Or is there otherwise a "mostly recommended" way to work
around them?

Thanks for your advice.

All the best,
Daan Sprenkels


PS. Would there perhaps be interest in adding such a feature to LLVM?
I found a repository on GitHub[2] that adds a `__builtin_ct_choose`
intrinsic to clang, but as far as I know, it has never been upstreamed.

[1]: Chandler Carruth described this trick at CppCon15:
<https://youtu.be/nXaxk27zwlk?t=2472>. See it in practice:
<https://godbolt.org/z/UMPeku>
[2]: <https://github.com/lmrs2/ct_choose>,
<https://github.com/lmrs2/llvm/commit/8f9a4d952100ae03d06f10aee237bf8b3331da89>.
Later published at S&P18.
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

David Zarzycki via llvm-dev

Sep 12, 2019, 6:37:45 AM
to Daan Sprenkels, llvm...@lists.llvm.org
I think adding a builtin to force CMOV or similar instructions on other architectures is long overdue. It’s generally useful, even if one isn’t mitigating speculative execution.

Finkel, Hal J. via llvm-dev

Sep 12, 2019, 12:18:28 PM
to David Zarzycki, Daan Sprenkels, llvm...@lists.llvm.org
On 9/12/19 5:06 AM, David Zarzycki via llvm-dev wrote:
I think adding a builtin to force CMOV or similar instructions on other architectures is long overdue. It’s generally useful, even if one isn’t mitigating speculative execution.


I believe that you can currently get this effect using __builtin_unpredictable in Clang. __builtin_unpredictable wasn't added for this purpose, and it's a hint, not a forced behavior, but I believe that it causes the backend to prefer cmov to branches during lowering.

 -Hal



On Sep 12, 2019, at 12:30 PM, Daan Sprenkels via llvm-dev <llvm...@lists.llvm.org> wrote:

PS. Would there perhaps be interest in adding such a feature to LLVM?
I found a repository on GitHub[2] that adds a `__builtin_ct_choose`
intrinsic to clang, but as far as I know, it has never been upstreamed.

[1]: Chandler Carruth described this trick at CppCon15:
<https://youtu.be/nXaxk27zwlk?t=2472>. See it in practice:
<https://godbolt.org/z/UMPeku>
[2]: <https://github.com/lmrs2/ct_choose>,
<https://github.com/lmrs2/llvm/commit/8f9a4d952100ae03d06f10aee237bf8b3331da89>.
Later published at S&P18.


-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

Chandler Carruth via llvm-dev

Sep 12, 2019, 3:22:13 PM
to Finkel, Hal J., Matthew Riley, David Zarzycki, llvm...@lists.llvm.org
On Thu, Sep 12, 2019 at 9:18 AM Finkel, Hal J. via llvm-dev <llvm...@lists.llvm.org> wrote:
On 9/12/19 5:06 AM, David Zarzycki via llvm-dev wrote:
I think adding a builtin to force CMOV or similar instructions on other architectures is long overdue. It’s generally useful, even if one isn’t mitigating speculative execution.


I believe that you can currently get this effect using __builtin_unpredictable in Clang. __builtin_unpredictable wasn't added for this purpose, and it's a hint, not a forced behavior, but I believe that it causes the backend to prefer cmov to branches during lowering.

I want to strongly advise against relying on this for anything to do with cryptography. There are a lot of optimizations that I think will undo this....

Sadly, I don't think we have any builtins that are reliable in this way. I agree this is a critically important thing, but it isn't as simple as exposing cmov, IMO.

+Matthew Riley on my team is actually hoping to start working on getting a real data-invariant programming model moving for C++, and part of that will involve adding support to LLVM as well, so I suspect he'd be interested in this topic. Not sure what the timelines on any of our plans are at this point though, so I can't really promise much.

For now, I'd really suggest using the techniques used by BoringSSL and OpenSSL. Sadly, these predominantly rely on assembly. They do have some constructs for writing C/C++ code and ensuring it remains data-invariant, but not because the constructs themselves are reliable. Instead, they have testing infrastructure that they run continually and that checks the specific instruction stream produced by the compiler. Given the current state, that's about the only reliable approach I know of.
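
For a flavor of what those C constructs look like, here is a sketch in the
spirit of BoringSSL's constant_time_* helpers (names and types here are
illustrative, not the exact API from the library):

#include <stdint.h>

typedef uint64_t crypto_word_t;

/* Expand a 0/1 condition into an all-zeros/all-ones mask without branching. */
static inline crypto_word_t ct_mask(crypto_word_t bit)
{
    return (crypto_word_t)0 - bit;  /* bit must be 0 or 1 */
}

/* Return a when mask is all-ones, b when mask is all-zeros. */
static inline crypto_word_t ct_select(crypto_word_t mask, crypto_word_t a,
                                      crypto_word_t b)
{
    return (mask & a) | (~mask & b);
}

The point of the surrounding test infrastructure is to verify that the
compiler did not turn such code back into branches; the constructs alone do
not guarantee that.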

-Chandler

David Zarzycki via llvm-dev

Sep 13, 2019, 3:33:27 AM
to Chandler Carruth, llvm...@lists.llvm.org, Matthew Riley
Hi Chandler,

The data-invariant feature sounds great but what about the general case? When performance tuning code, people sometimes need the ability to reliably generate CMOV, and right now the best advice is either “use inline assembly” or “keep refactoring until CMOV is emitted” (and hope that future compilers continue to generate CMOV).

Given that a patch already exists to reliably generate CMOV, are there any good arguments against adding the feature?

Dave

Chandler Carruth via llvm-dev

Sep 13, 2019, 3:45:59 AM
to David Zarzycki, llvm...@lists.llvm.org, Matthew Riley
On Fri, Sep 13, 2019 at 1:33 AM David Zarzycki via llvm-dev <llvm...@lists.llvm.org> wrote:
Hi Chandler,

The data-invariant feature sounds great but what about the general case? When performance tuning code, people sometimes need the ability to reliably generate CMOV, and right now the best advice is either “use inline assembly” or “keep refactoring until CMOV is emitted” (and hope that future compilers continue to generate CMOV).

Given that a patch already exists to reliably generate CMOV, are there any good arguments against adding the feature?

For *performance* tuning, the builtin that Hal mentioned is IMO the correct design.

Is there some reason why it doesn't work? 

David Zarzycki via llvm-dev

Sep 13, 2019, 4:03:02 AM
to Chandler Carruth, llvm...@lists.llvm.org, Matthew Riley


On Sep 13, 2019, at 10:45 AM, Chandler Carruth <chan...@gmail.com> wrote:

On Fri, Sep 13, 2019 at 1:33 AM David Zarzycki via llvm-dev <llvm...@lists.llvm.org> wrote:
Hi Chandler,

The data-invariant feature sounds great but what about the general case? When performance tuning code, people sometimes need the ability to reliably generate CMOV, and right now the best advice is either “use inline assembly” or “keep refactoring until CMOV is emitted” (and hope that future compilers continue to generate CMOV).

Given that a patch already exists to reliably generate CMOV, are there any good arguments against adding the feature?

For *performance* tuning, the builtin that Hal mentioned is IMO the correct design.

Is there some reason why it doesn't work?

I wasn’t aware of __builtin_unpredictable() until now and I haven’t debugged why it doesn’t work, but here are a couple examples, one using the ternary operator and one using a switch statement:


Dave

Sanjay Patel via llvm-dev

Sep 13, 2019, 5:18:39 PM
to David Zarzycki, llvm...@lists.llvm.org, Matthew Riley
I'm not sure if this is the entire problem, but SimplifyCFG loses the 'unpredictable' metadata when it converts a set of cmp/br into a switch:
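
For illustration, a pattern along these lines (a hypothetical reproducer, not
necessarily the one from the report) hits that transform: SimplifyCFG
collapses the chain of equality compares into a switch, and the
!unpredictable metadata attached to the branches is dropped on the way:

int classify(int secret)
{
    /* Each branch carries the unpredictable hint, but it does not survive
     * the rewrite of this chain into a switch. */
    if (__builtin_unpredictable(secret == 1)) return 10;
    if (__builtin_unpredictable(secret == 2)) return 20;
    if (__builtin_unpredictable(secret == 3)) return 30;
    return 0;
}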

Filed here:

Craig Topper via llvm-dev

Sep 13, 2019, 5:41:55 PM
to Sanjay Patel, David Zarzycki, llvm...@lists.llvm.org, Matthew Riley
I don't think the X86 cmov converter pass knows about unpredictable? Do we even preserve that metadata into Machine IR?

There's also this frontend bug with the builtin getting translated to IR https://bugs.llvm.org/show_bug.cgi?id=40031

~Craig

Finkel, Hal J. via llvm-dev

Sep 13, 2019, 9:05:07 PM
to Craig Topper, Sanjay Patel, David Zarzycki, llvm...@lists.llvm.org, Matthew Riley


On 9/13/19 4:41 PM, Craig Topper via llvm-dev wrote:
I don't think the X86 cmov converter pass knows about unpredictable? Do we even preserve that metadata into Machine IR?


AFAIK, no. It doesn't make it past SDAGBuilder (mostly, it is used in CGP in order to prevent the select -> branch conversion). It also, in both SDAGBuilder and CGP, prevents splitting of logical operations feeding conditional branches (splitting by forming more branches).

 -Hal

David Zarzycki via llvm-dev

Sep 14, 2019, 1:19:38 AM
to Sanjay Patel, Chandler Carruth, llvm...@lists.llvm.org, Matthew Riley
I’m struggling to find cases where __builtin_unpredictable() works at all. Even if we ignore cmp/br into switch conversion, it still doesn’t work:

int test_cmov(int left, int right, int *alt) {
    return __builtin_unpredictable(left < right) ? *alt : 999;
}

Should generate:

test_cmov:
        movl    $999, %eax
        cmpl    %esi, %edi
        cmovll  (%rdx), %eax
        retq

But currently generates:

test_cmov:
        movl    $999, %eax
        cmpl    %esi, %edi
        jge     .LBB0_2
        movl    (%rdx), %eax
.LBB0_2:
        retq

Chandler Carruth via llvm-dev

Sep 14, 2019, 1:35:54 AM
to David Zarzycki, llvm...@lists.llvm.org, Matthew Riley
The x86 backend is extremely aggressive in turning cmov with memory operands into branches because that is often faster even for poorly predicted branches due to the forced stall in the cmov.
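
A hedged workaround in the meantime (my assumption, not a guarantee): hoist
the load yourself so the select only sees register operands, which the
cmov-conversion pass is less inclined to turn back into a branch. Verify the
emitted assembly before relying on it:

int test_cmov_reg(int left, int right, int *alt)
{
    /* Unconditional load: only valid if alt is always dereferenceable. */
    int alt_val = *alt;
    return __builtin_unpredictable(left < right) ? alt_val : 999;
}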

David Zarzycki via llvm-dev

Sep 14, 2019, 2:12:39 AM
to Chandler Carruth, llvm...@lists.llvm.org, Matthew Riley
Hi Chandler,

I feel like this conversation has come full circle. So to ask again: how does one force CMOV to be emitted? You suggested “__builtin_unpredictable()” but that gets lost in various optimization passes. Given other architectures have CMOV-like instructions, and given the usefulness of the instruction for performance tuning, it seems like a direct intrinsic would be best. What am I missing?

Dave

On Sep 14, 2019, at 8:35 AM, Chandler Carruth <chan...@gmail.com> wrote:



Chandler Carruth via llvm-dev

Sep 14, 2019, 2:19:01 AM
to David Zarzycki, llvm...@lists.llvm.org, Matthew Riley
On Sat, Sep 14, 2019 at 12:12 AM David Zarzycki <da...@znu.io> wrote:
Hi Chandler,

I feel like this conversation has come full circle. So to ask again: how does one force CMOV to be emitted? You suggested “__builtin_unpredictable()” but that gets lost in various optimization passes. Given other architectures have CMOV-like instructions, and given the usefulness of the instruction for performance tuning, it seems like a direct intrinsic would be best. What am I missing?

LLVM operates at a higher level of abstraction IMO, so I don't really feel like there is something missing here. LLVM is just choosing a lowering that is expected to be superior even in the face of an unpredictable branch.

If there are real-world benchmarks that show this lowering strategy is a problem, file bugs with those benchmarks? We can always change the heuristics based on new information.

I think if you want to force a particular instruction to be used, there is already a pretty reasonable approach: inline assembly.
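
For completeness, forcing the instruction with inline assembly might look
roughly like this on x86-64 (GCC/Clang extended asm; a sketch with made-up
names, and the constraints are worth double-checking):

/* Return a when cond is nonzero, b otherwise, via an explicit CMOV. */
static inline int select_cmov(int cond, int a, int b)
{
    int result = b;
    __asm__("testl   %1, %1\n\t"
            "cmovnel %2, %0"
            : "+r"(result)
            : "r"(cond), "r"(a)
            : "cc");
    return result;
}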

Sanjay Patel via llvm-dev

Sep 15, 2019, 11:10:24 AM
to Chandler Carruth, David Zarzycki, llvm...@lists.llvm.org, Matthew Riley
To confirm with Dave's minimal example, the metadata does survive the IR optimizer in that case:
$ clang -O2 unpred.c -S -o - -emit-llvm | grep unpredictable
  br i1 %cmp, label %cond.true, label %cond.end, !unpredictable !3

So yes, it's the backend (the x86 cmov conversion pass) that doesn't have access to the metadata and transforms the select into a branch.

This was discussed here with some compelling perf claims:

And similarly/originally filed including different perf harm:

And there are side-channel/crypto/constant-time comments here:

I'm not sure if this changes/adds anything for the above bugs and the examples in this thread, but we recently made an x86 change to favor cmov for perf in:
based on perf numbers in: