__ bind(&no_stack_change);This label could move into the branch above?
struct __attribute__((packed)) EffectDimensions {Why is this necessary?
void* function_desc[3];Is there no way to achieve this without handwritten assembly?
__thread bool TrapHandlerGuard::is_active_ = 0;Why is this needed?
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
struct __attribute__((packed)) EffectDimensions {Why is this necessary?
On AIX, Clang (like XLC) promotes bit fields to unsigned int unless packed. This breaks the assertions that the size of these structs are 1 byte large.
void* function_desc[3];Is there no way to achieve this without handwritten assembly?
We tried multiple ways to do this without, but Clang kept optimizing the descriptor setup away since casting to it is UB. Doing it directly in assembly assures that we get the behavior we need without the optimizer getting in our way.
struct __attribute__((packed)) EffectDimensions {Korinne AdlerWhy is this necessary?
On AIX, Clang (like XLC) promotes bit fields to unsigned int unless packed. This breaks the assertions that the size of these structs are 1 byte large.
I suppose we could use bitfields or maybe compile v8 with `-fpack-struct` but I guess this is less invasive and more likely to just work.
void* function_desc[3];Korinne AdlerIs there no way to achieve this without handwritten assembly?
We tried multiple ways to do this without, but Clang kept optimizing the descriptor setup away since casting to it is UB. Doing it directly in assembly assures that we get the behavior we need without the optimizer getting in our way.
__thread bool TrapHandlerGuard::is_active_ = 0;AbdirahimWhy is this needed?
This is needed because clang complains that is_active is not initialized. on AIX is_active is a __thread.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
__ bind(&no_stack_change);This label could move into the branch above?
I think we might be able to. @mfar...@ibm.com do you think this would be ok?
__ bind(&no_stack_change);AbdirahimThis label could move into the branch above?
I think we might be able to. @mfar...@ibm.com do you think this would be ok?
"li 0, 0\n\t"I don't 100% understand what this code does; but gemini told me that this clobbers r0 which isn't on the clobber list?
__thread bool TrapHandlerGuard::is_active_ = 0;AbdirahimWhy is this needed?
This is needed because clang complains that is_active is not initialized. on AIX is_active is a __thread.
Mmh ok weird. Sounds like this might be a linker issue; while `thread_local` should actually be supported by clang?
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
"li 0, 0\n\t"I don't 100% understand what this code does; but gemini told me that this clobbers r0 which isn't on the clobber list?
It does indeed clobber r0, which is why it _is_ in the clobber list (along with memory).
The code is equivalent to the C++ code it replaces, initializing `function_desc` and assigning the address of the descriptor to the `Signature* fn`.
Line by line explanation:
```
std %1, 0(%2) // function_desc[0] = fn_ptr_
li 0, 0 // set r0 = 0 for use in the next two instructions
std 0, 8(%2) // function_desc[1] = 0
std 0, 16(%2) // function_desc[2] = 0
mr %0, %2 // fn = function_desc
```
__thread bool TrapHandlerGuard::is_active_ = 0;AbdirahimWhy is this needed?
Toon VerwaestThis is needed because clang complains that is_active is not initialized. on AIX is_active is a __thread.
Mmh ok weird. Sounds like this might be a linker issue; while `thread_local` should actually be supported by clang?
In trap-handler-internal.h, `is_active_` is declared as a `__thread` on AIX and links to a GCC bug report to explain why. However, the definition of the variable here still used `thread_local`, which while apparently acceptable on GCC gave a compile error on Clang.
We reached out to the AIX Clang devs to see the recommended way to fix this along with some other TLS-related questions. Hubert Tong (who commented in the GCC bug mentioned previously) told us that using `thread_local` here should be fine, but that it incurs extra overhead over using `__thread`. Since we still need to support GCC for now, it's easier to leave it consistent for AIX between GCC and Clang. Once we drop GCC we should be able to remove the ifdefs.
__thread bool TrapHandlerGuard::is_active_ = 0;AbdirahimWhy is this needed?
Toon VerwaestThis is needed because clang complains that is_active is not initialized. on AIX is_active is a __thread.
Mmh ok weird. Sounds like this might be a linker issue; while `thread_local` should actually be supported by clang?
In trap-handler-internal.h, is_active_ is defined as a __thread on AIX and links to a GCC bug report to explain why. However, the definition of the variable here still uses thread_local, which while apparently acceptable on GCC gave a compile error on Clang.
We reached out to the AIX Clang devs to see the recommended way to fix this along with some other TLS-related questions. Hubert Tong (who commented in the GCC bug mentioned previously) told us that using thread_local here should be fine, but that it incurs extra overhead over using __thread. Since we still need to support GCC for now, it's easier to leave it consistent for AIX between GCC and Clang. Once we drop GCC we should be able to remove the ifdefs.
"li 0, 0\n\t"Korinne AdlerI don't 100% understand what this code does; but gemini told me that this clobbers r0 which isn't on the clobber list?
It does indeed clobber r0, which is why it _is_ in the clobber list (along with memory).
The code is equivalent to the C++ code it replaces, initializing `function_desc` and assigning the address of the descriptor to the `Signature* fn`.
Line by line explanation:
```
std %1, 0(%2) // function_desc[0] = fn_ptr_
li 0, 0 // set r0 = 0 for use in the next two instructions
std 0, 8(%2) // function_desc[1] = 0
std 0, 16(%2) // function_desc[2] = 0
mr %0, %2 // fn = function_desc
```
Acknowledged, @verw...@chromium.org any other questions on this?
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
__thread bool TrapHandlerGuard::is_active_ = 0;AbdirahimWhy is this needed?
Toon VerwaestThis is needed because clang complains that is_active is not initialized. on AIX is_active is a __thread.
Korinne AdlerMmh ok weird. Sounds like this might be a linker issue; while `thread_local` should actually be supported by clang?
In trap-handler-internal.h, `is_active_` is declared as a `__thread` on AIX and links to a GCC bug report to explain why. However, the definition of the variable here still used `thread_local`, which while apparently acceptable on GCC gave a compile error on Clang.
We reached out to the AIX Clang devs to see the recommended way to fix this along with some other TLS-related questions. Hubert Tong (who commented in the GCC bug mentioned previously) told us that using `thread_local` here should be fine, but that it incurs extra overhead over using `__thread`. Since we still need to support GCC for now, it's easier to leave it consistent for AIX between GCC and Clang. Once we drop GCC we should be able to remove the ifdefs.
__ bind(&no_stack_change);AbdirahimThis label could move into the branch above?
Milad FarazmandI think we might be able to. @mfar...@ibm.com do you think this would be ok?
Yes this should be fine.
I'm working on this change now and will reply here once its uploaded.
__ bind(&no_stack_change);AbdirahimThis label could move into the branch above?
Milad FarazmandI think we might be able to. @mfar...@ibm.com do you think this would be ok?
AbdirahimYes this should be fine.
I'm working on this change now and will reply here once its uploaded.
I don't quite understand the change I need to resolve the orignal question.
@verw...@chromium.org Can you please provide code suggestion on how this would look?
I could do something like this but I don't see the benefit:
```
if (needs_return_buffer) {
Label no_stack_change, done;
if (switch_to_central_stack) {
__ CmpU64(kOldSPRegister, Operand(0), r0);
__ beq(&no_stack_change);__ addi(r3, kOldSPRegister,
Operand((kStackFrameExtraParamSlot + 1) * kSystemPointerSize));
__ b(&done);
// Moved inside this block,
__ bind(&no_stack_change);
__ addi(r3, sp,
Operand((kStackFrameExtraParamSlot + 1) * kSystemPointerSize));
} else {
__ bind(&no_stack_change);
__ addi(r3, sp,
Operand((kStackFrameExtraParamSlot + 1) * kSystemPointerSize));
}
__ bind(&done);
__ LoadU64(r4, MemOperand(r3, kSystemPointerSize));
__ LoadU64(r3, MemOperand(r3));
}
```
@verw...@chromium.org polite ping to request apply on the above comment.
*reply
Instead of
```
if (needs_return_buffer) {
Label no_stack_change, done;
if (switch_to_central_stack) {
__ CmpU64(kOldSPRegister, Operand(0), r0);
__ beq(&no_stack_change);
__ addi(r3, kOldSPRegister,
Operand((kStackFrameExtraParamSlot + 1) * kSystemPointerSize));
__ b(&done);
}
__ bind(&no_stack_change);
```
just do
```
if (needs_return_buffer) {
Label done;
if (switch_to_central_stack) {
Label no_stack_change;
__ CmpU64(kOldSPRegister, Operand(0), r0);
__ beq(&no_stack_change);
__ addi(r3, kOldSPRegister,
Operand((kStackFrameExtraParamSlot + 1) * kSystemPointerSize));
__ b(&done);
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Code-Review | +1 |
LGTM for trap-handler. You might want to split that into a separate CL to land sooner.
void* function_desc[3];Korinne AdlerIs there no way to achieve this without handwritten assembly?
Toon VerwaestWe tried multiple ways to do this without, but Clang kept optimizing the descriptor setup away since casting to it is UB. Doing it directly in assembly assures that we get the behavior we need without the optimizer getting in our way.
Acknowledged
Have you tried declaring fn itself as volatile? e.g.:
```
Signature* volatile fn = reinterpret_cast<Signature*>(function_desc);
```
Or have you tried putting something like DoNotOptimize(&fn) just before the call to fn, where DoNotOptimize() is defined like this:
```
template <typename Type>
inline void DoNotOptimize(const Type& value) {
// The "memory" constraint tells the compiler that the inline assembly
// must be assumed to access memory that |value| points to.
asm volatile("" : : "g"(value) : "memory");
}
```
I suspect that might get the same effect as your assembly code without having to use actual assembly!
Sorry if those are obvious and you've tried them! Feel free to resolve this comment if those suggestions don't work.
I would also recommend filing an issue against Clang and linking to it here. I am not familiar with AIX function descriptors at all but it seems like there should be ways to stop Clang from optimising things away as undefined behaviour.
__thread bool TrapHandlerGuard::is_active_ = 0;AbdirahimWhy is this needed?
Toon VerwaestThis is needed because clang complains that is_active is not initialized. on AIX is_active is a __thread.
Korinne AdlerMmh ok weird. Sounds like this might be a linker issue; while `thread_local` should actually be supported by clang?
AbdirahimIn trap-handler-internal.h, `is_active_` is declared as a `__thread` on AIX and links to a GCC bug report to explain why. However, the definition of the variable here still used `thread_local`, which while apparently acceptable on GCC gave a compile error on Clang.
We reached out to the AIX Clang devs to see the recommended way to fix this along with some other TLS-related questions. Hubert Tong (who commented in the GCC bug mentioned previously) told us that using `thread_local` here should be fine, but that it incurs extra overhead over using `__thread`. Since we still need to support GCC for now, it's easier to leave it consistent for AIX between GCC and Clang. Once we drop GCC we should be able to remove the ifdefs.
Acknowledged, @verw...@chromium.org any other questions on this?
In trap-handler-internal.h, is_active_ is defined as a __thread on AIX and links to a GCC bug report to explain why.
Makes sense, LGTM on this.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Ok I Think I better understand you suggestion now. I was a bit confused as well but I can see that you were suggesting moving the no_stack_change label to the branch where it gets bound to.
So is this a thing you'll still address? I'm confused about the state.
void* function_desc[3];Korinne AdlerIs there no way to achieve this without handwritten assembly?
Toon VerwaestWe tried multiple ways to do this without, but Clang kept optimizing the descriptor setup away since casting to it is UB. Doing it directly in assembly assures that we get the behavior we need without the optimizer getting in our way.
Mark SeabornAcknowledged
Have you tried declaring fn itself as volatile? e.g.:
```
Signature* volatile fn = reinterpret_cast<Signature*>(function_desc);
```Or have you tried putting something like DoNotOptimize(&fn) just before the call to fn, where DoNotOptimize() is defined like this:
```
template <typename Type>
inline void DoNotOptimize(const Type& value) {
// The "memory" constraint tells the compiler that the inline assembly
// must be assumed to access memory that |value| points to.
asm volatile("" : : "g"(value) : "memory");
}
```I suspect that might get the same effect as your assembly code without having to use actual assembly!
Sorry if those are obvious and you've tried them! Feel free to resolve this comment if those suggestions don't work.
I would also recommend filing an issue against Clang and linking to it here. I am not familiar with AIX function descriptors at all but it seems like there should be ways to stop Clang from optimising things away as undefined behaviour.
If any of Mark's suggestions work that would absolutely look better.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Acknowledged, I've made the change and tested it and things are still working. Thanks!
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
We've tried all sorts of combinations of using volatile before we went down the path of using inline asm. Sometimes things would not get optimized away and other times tests failed due to optimizations happening. We've come to the conclusion that using the inline asm is the only fool proof solution.
Korinne AdlerIs there no way to achieve this without handwritten assembly?
Toon VerwaestWe tried multiple ways to do this without, but Clang kept optimizing the descriptor setup away since casting to it is UB. Doing it directly in assembly assures that we get the behavior we need without the optimizer getting in our way.
Mark SeabornAcknowledged
Toon VerwaestHave you tried declaring fn itself as volatile? e.g.:
```
Signature* volatile fn = reinterpret_cast<Signature*>(function_desc);
```Or have you tried putting something like DoNotOptimize(&fn) just before the call to fn, where DoNotOptimize() is defined like this:
```
template <typename Type>
inline void DoNotOptimize(const Type& value) {
// The "memory" constraint tells the compiler that the inline assembly
// must be assumed to access memory that |value| points to.
asm volatile("" : : "g"(value) : "memory");
}
```I suspect that might get the same effect as your assembly code without having to use actual assembly!
Sorry if those are obvious and you've tried them! Feel free to resolve this comment if those suggestions don't work.
I would also recommend filing an issue against Clang and linking to it here. I am not familiar with AIX function descriptors at all but it seems like there should be ways to stop Clang from optimising things away as undefined behaviour.
If any of Mark's suggestions work that would absolutely look better.
Acknowledged
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
@msea...@chromium.org @verw...@chromium.org
Is there anything else we need to do to get this merged?
| Code-Review | +1 |
struct __attribute__((packed)) EffectDimensions {I'm not happy about these "packed" definitions leaking outside of AIX-define guarded code, but I can't think of an alternative which stills allows you to proceed with this CL. Please file a bug on Clang to make this configurable at a more granular level than `__attribute__((packed))`
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
struct __attribute__((packed)) EffectDimensions {I'm not happy about these "packed" definitions leaking outside of AIX-define guarded code, but I can't think of an alternative which stills allows you to proceed with this CL. Please file a bug on Clang to make this configurable at a more granular level than `__attribute__((packed))`
OK I will open a bug report with Clang to see if what the options are to make this configurable.
struct __attribute__((packed)) EffectDimensions {AbdirahimI'm not happy about these "packed" definitions leaking outside of AIX-define guarded code, but I can't think of an alternative which stills allows you to proceed with this CL. Please file a bug on Clang to make this configurable at a more granular level than `__attribute__((packed))`
OK I will open a bug report with Clang to see if what the options are to make this configurable.
Acknowledged