But can't a number of other hardware traps be used instead? See https://wiki.osdev.org/Exceptions
Not sure if a conditional jump to a trapping debug instruction would be slow or not.
Also, why not read from a pointer to a page instead of reading the page directly? That way only an atomic write to a pointer is needed instead of a whole memory protection syscall.
Also, an atomically written boolean flag is only one of many possible types of branches.
You could have an indirect call, or possibly an indirect jump, and just overwrite a function pointer to install your handler, for example.
The standard way of doing things seems pretty sensible; it's just that I've never actually seen a concrete comparison.
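Basically, in pseudocode, why
void poll() {
    page[0];            // blind load; faults once the page is protected
}
void trap() {
    mprotect(page, 0);  // revoke all access to the page
}
over
void poll() {
    page[0];            // same load, but through a pointer
}
void trap() {
    page = 0;           // atomic pointer write; the next load faults
}
or
void poll() {
    1 / bit;            // divide fault once bit is 0 (on hardware that traps, e.g. x86)
}
void trap() {
    bit = 0;
}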
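For concreteness, here is a minimal C sketch of the first variant (blind load plus mprotect). It assumes Linux or a similar POSIX system; the names safepoint_init, safepoint_arm, safepoint_disarm and safepoint_poll are made up for illustration, and the sigsetjmp in the poll is only there so the demo can recover from the fault. A real runtime would resume from the signal handler in some other way and keep the fast path to the single load.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* for MAP_ANONYMOUS under strict -std= modes */
#endif
#include <setjmp.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* All names below are hypothetical, chosen for this sketch only. */
static volatile unsigned char *poll_page;
static size_t poll_page_size;
static sigjmp_buf poll_resume;

/* Fault handler: treat a fault on the polling page as "safepoint requested". */
static void on_fault(int sig, siginfo_t *info, void *ucontext) {
    (void)sig;
    (void)ucontext;
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)poll_page;
    if (addr < base || addr >= base + poll_page_size)
        _exit(70);               /* a genuine crash, not our poll */
    siglongjmp(poll_resume, 1);  /* demo-only way to leave the handler */
}

/* Map one readable page and install the handler. Returns 0 on success. */
int safepoint_init(void) {
    poll_page_size = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, poll_page_size, PROT_READ,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;
    poll_page = p;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0) return -1;
    return sigaction(SIGBUS, &sa, NULL);  /* some systems report SIGBUS */
}

/* trap(): one syscall that makes every later poll fault. */
int safepoint_arm(void) {
    return mprotect((void *)poll_page, poll_page_size, PROT_NONE);
}

int safepoint_disarm(void) {
    return mprotect((void *)poll_page, poll_page_size, PROT_READ);
}

/* poll(): returns 1 if the load faulted (safepoint requested), else 0. */
int safepoint_poll(void) {
    if (sigsetjmp(poll_resume, 1) != 0) return 1;
    (void)poll_page[0];          /* the blind load */
    return 0;
}
```

Calling safepoint_init() once, then safepoint_poll() in a loop, should return 0 until some other code calls safepoint_arm(). The cost of arming is the mprotect syscall; the cost of an un-triggered poll is one load with no branch.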
And why
void poll() {
    if (flag) dostuff();
}
over
void poll() {
    f();                // indirect call through a function pointer
}
void trap() {
    f = handler;
}
or
void poll() {
    if (flag) __builtin_trap();
}
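The flag and function-pointer variants can be written down with C11 atomics roughly as below. This is only a sketch: poll_flag, trap_flag, poll_call, trap_call and handler_run_count are invented names, the handler just counts invocations, and __builtin_expect is a GCC/Clang extension used here as a "not taken" hint.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names throughout; the handler only counts its invocations. */
typedef void (*poll_target)(void);

static int handler_runs;
static void do_nothing(void) {}
static void handler(void) { handler_runs++; }

static atomic_bool flag;                      /* zero-initialized: false */
static _Atomic(poll_target) f = do_nothing;   /* starts as a no-op */

/* Variant 1: conditional branch on an atomically written flag. */
void poll_flag(void) {
    if (__builtin_expect(atomic_load_explicit(&flag, memory_order_acquire), 0))
        handler();
}

void trap_flag(void) {
    atomic_store_explicit(&flag, true, memory_order_release);
}

/* Variant 2: unconditional indirect call; trap() swaps the target. */
void poll_call(void) {
    poll_target target = atomic_load_explicit(&f, memory_order_acquire);
    target();
}

void trap_call(void) {
    atomic_store_explicit(&f, handler, memory_order_release);
}

int handler_run_count(void) { return handler_runs; }
```

Both variants arm with a single atomic store and no syscall, but each poll now involves either a conditional branch or an indirect call, which is exactly the part being compared against the blind load.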
Probably makes sense to do safepoint polls the standard way but I've never seen a detailed comparison exactly why.
CPUs will never predict that a fault will be taken (that would be silly), so no branch prediction state is needed for this. In contrast, conditional branches use the branch prediction logic and state, and both contaminate it and are sensitive to it...
This is an evolving and ever-explored field... The "current" versions of OpenJDK and HotSpot (and the 8 and 11 versions typically used in production) perform a safepoint as an all-or-nothing, Stop-The-World (STW) event. Since the frequency of STW pauses will generally tend to be low (for obvious reasons: if it were high, you'd be facing much bigger complaints), the likelihood of a safepoint poll actually triggering will be VERY low (think "1 in a billion" or less for a practical order-of-magnitude feel). As such, code that accepts STW pauses tends to be optimized for the NOT-triggering case, and a "blind load" from a potentially protected location has been (empirically) chosen as the fastest way to perform the poll.
A way to look at a read from a potentially protected page is as:
1) An implicit "predicted not taken" form of a check. CPUs will never predict that a fault will be taken (that would be silly), so no branch prediction state is needed for this.
To view this discussion on the web, visit https://groups.google.com/d/msgid/mechanical-sympathy/8cd7f05d-def0-4f9c-8fff-e54ca39312f2%40googlegroups.com.
On 29 May 2020, at 17:46, Steven Stewart-Gallus <stevensele...@gmail.com> wrote:
Okay I have an idea.
I can't shake the idea you could do fun tricks with thread local executable pages.
The theoretically fastest way of safepoint polling is inserting a trap instruction.
But icache overheads dominate. If the icache is based on physical addresses and not virtual ones then it should be possible to remap the page without doing icache synchronization.
You should be able to have very fast safepoints by remapping the page.
But I'm not sure there's a fast way to do a call/return from thread local storage. And a call/return from a constant page still might not be faster than just a load.
TLDR:
Limited self-modifying code without icache synchronization could be possible with memory management tricks, as long as the icache and other stuff is based on physical addresses.
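To make the "remap the page" half of the idea concrete, here is a small C sketch that swaps which physical page sits behind a fixed virtual address using mmap with MAP_FIXED. It deliberately uses data bytes rather than instructions, since executing from the window would need architecture-specific machine code, and it says nothing about whether any real icache tolerates this. The names remap_init, remap_select and remap_peek are hypothetical, and a temporary file stands in for the two backing pages.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* for pwrite/ftruncate under strict -std= modes */
#endif
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical names; a temp file provides two backing pages. */
static int backing_fd = -1;
static long page_size;
static volatile unsigned char *window;

/* Create two pages (page 0 starts with byte 0, page 1 with byte 1)
 * and map page 0 at some address chosen by the kernel. */
int remap_init(void) {
    page_size = sysconf(_SC_PAGESIZE);
    FILE *tmp = tmpfile();          /* intentionally left open */
    if (tmp == NULL) return -1;
    backing_fd = fileno(tmp);
    if (ftruncate(backing_fd, (off_t)(2 * page_size)) != 0) return -1;

    unsigned char marker = 1;
    if (pwrite(backing_fd, &marker, 1, (off_t)page_size) != 1) return -1;

    void *p = mmap(NULL, (size_t)page_size, PROT_READ, MAP_SHARED,
                   backing_fd, 0);
    if (p == MAP_FAILED) return -1;
    window = p;
    return 0;
}

/* Point the same virtual address at backing page 0 or 1. */
int remap_select(int which) {
    void *p = mmap((void *)window, (size_t)page_size, PROT_READ,
                   MAP_SHARED | MAP_FIXED, backing_fd,
                   (off_t)which * (off_t)page_size);
    return p == (void *)window ? 0 : -1;
}

/* Read the first byte visible through the window. */
int remap_peek(void) {
    return window[0];
}
```

After remap_init(), remap_peek() reads from page 0; after remap_select(1), the same address reads from page 1. Note that the remap is itself a system call, just as mprotect is, so this sketch only shows the mechanism and not that it would be cheaper.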