Re: purpose of an LFENCE


Martin Thompson

Oct 4, 2019, 10:45:07 AM
to mechanica...@googlegroups.com

See 3-529. 

On Fri, 4 Oct 2019 at 15:10, Peter Veentjer <alarm...@gmail.com> wrote:
I have been checking out the new fence APIs in Java (Unsafe/VarHandle).

I understand how the higher-level APIs are translated to the logical fences, e.g. release fence -> LoadStore+StoreStore. There are some great posts, including:

A great explanation of how a release fence needs to be combined with a StoreLoad to preserve sequential consistency

Also this post is great on the topic:

When I zoom into hardware things are a bit more blurry.

X86 provides the following guarantees:
Loads won't be reordered with older loads   [LoadLoad]
Stores won't be reordered with older stores (TSO) [StoreStore]
Stores won't be reordered with older loads [LoadStore]
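
To make the release fence -> LoadStore+StoreStore mapping concrete, here is a minimal Java sketch of message passing with explicit fences. The class, field and method names (ReleaseAcquireSketch, payload, ready, publish, consume) are made up for illustration and do not come from the thread. The flag is accessed in opaque mode so that the spin loop is guaranteed to observe the write eventually; that is a choice of this sketch, not something the posts above prescribe.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class ReleaseAcquireSketch {
    static int payload;   // plain field, published through the flag below
    static int ready;     // flag, only accessed through the READY VarHandle

    static final VarHandle READY;
    static {
        try {
            READY = MethodHandles.lookup()
                    .findStaticVarHandle(ReleaseAcquireSketch.class, "ready", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static void publish(int value) {
        payload = value;
        // Release fence: earlier loads and stores may not be reordered
        // with later stores (LoadStore + StoreStore).
        VarHandle.releaseFence();
        READY.setOpaque(1);
    }

    static int consume() {
        while ((int) READY.getOpaque() == 0) {
            Thread.onSpinWait();
        }
        // Acquire fence: earlier loads may not be reordered
        // with later loads and stores (LoadLoad + LoadStore).
        VarHandle.acquireFence();
        return payload;
    }

    public static void main(String[] args) throws InterruptedException {
        int[] seen = new int[1];
        Thread reader = new Thread(() -> seen[0] = consume());
        reader.start();
        publish(42);
        reader.join();
        System.out.println(seen[0]);
    }
}
```

On X86 neither fence should need a hardware instruction for ordinary loads and stores, given the three guarantees listed above; the fences still constrain what the JIT compiler may reorder.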

One fundamental fence is the MFENCE because it provides StoreLoad semantics. On X86, Unsafe.fullFence can be compiled to an MFENCE (in practice it uses the lock addl ... but that isn't relevant for this discussion). This will prevent stores to be reordered with older stores and will make sure the memory is visible to other CPUs (by waiting for the store buffer to be drained).

The SFENCE was a bit more obscure to me because X86 provides TSO; so what is the point of adding a [StoreStore] fence if the platform provides it out of the box (i.e. prevents stores from being reordered with older stores)? Apparently there are certain instructions, like those of SSE, that are weakly ordered, and these need the SFENCE. OK, I can live with that.

But the LFENCE I can't place. Initially I thought it would provide a similar fix as the SFENCE, i.e. prevent LoadLoad reordering for weakly ordered instructions like those of SSE. But apparently the LFENCE is a very different beast.

Could someone shed some light on the purpose of the LFENCE?

Francesco Nigro

Oct 4, 2019, 11:01:01 AM
to mechanical-sympathy
One of the rare cases where it makes sense to read stack-overflow (just joking :P): see https://stackoverflow.com/questions/37452772/x86-64-usage-of-lfence?rq=1

Vitaly Davidovich

Oct 8, 2019, 10:12:53 PM
to mechanica...@googlegroups.com
FWIW, I’ve only seen lfence used precisely in the 2 cases mentioned in this thread:
1) use of non-temporal loads (ie weak ordering, normal x86 guarantees go out the window)
2) controlling execution of non-serializing instructions like rdtsc

I’d be curious myself to hear of other cases.

On Fri, Oct 4, 2019 at 10:10 AM Peter Veentjer <alarm...@gmail.com> wrote:
One fundamental fence is the MFENCE because it provides StoreLoad semantics. On X86, Unsafe.fullFence can be compiled to an MFENCE (in practice it uses the lock addl ... but that isn't relevant for this discussion). This will prevent stores to be reordered with older stores and will make sure the memory is visible to other CPUs (by waiting for the store buffer to be drained).
I think you meant “prevent stores to be reordered with *later loads*”.  In fact, awaiting store buffer drain is how it prevents the later load from reordering with an earlier store - the load can’t retire (maybe not even issue) while the store is sitting in the buffer (which would cause the load-before-store reordering to be observed).
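
The StoreLoad case described here is the classic store-buffer litmus test: each thread stores to one variable and then loads the other. Below is a minimal Java sketch, assuming VarHandle.fullFence() as the StoreLoad barrier; the class, field and method names (StoreLoadSketch, x, y, bothSawZero) are invented for illustration. Without the fences, both threads could read 0 because each load may be satisfied while the other thread's store is still sitting in its store buffer.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class StoreLoadSketch {
    static int x, y;

    static final VarHandle X, Y;
    static {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            X = lookup.findStaticVarHandle(StoreLoadSketch.class, "x", int.class);
            Y = lookup.findStaticVarHandle(StoreLoadSketch.class, "y", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Runs one trial and reports whether both threads read 0,
    // the outcome the full fences are meant to rule out.
    static boolean bothSawZero() {
        X.setOpaque(0);
        Y.setOpaque(0);
        int[] r = new int[2];
        Thread t1 = new Thread(() -> {
            X.setOpaque(1);
            VarHandle.fullFence();        // store to x ordered before load of y
            r[0] = (int) Y.getOpaque();
        });
        Thread t2 = new Thread(() -> {
            Y.setOpaque(1);
            VarHandle.fullFence();        // store to y ordered before load of x
            r[1] = (int) X.getOpaque();
        });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return r[0] == 0 && r[1] == 0;
    }

    public static void main(String[] args) {
        int violations = 0;
        for (int i = 0; i < 1000; i++) {
            if (bothSawZero()) {
                violations++;
            }
        }
        System.out.println("both-zero outcomes: " + violations);
    }
}
```

With the fences in place the count is expected to stay at zero. Starting fresh threads for each trial makes the race window small, so this harness is not a reliable way to observe the reordering once the fences are removed; a dedicated tool such as jcstress is better suited for that.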

Peter Veentjer

Oct 10, 2019, 5:18:15 AM
to mechanical-sympathy


On Wednesday, October 9, 2019 at 5:12:53 AM UTC+3, Vitaly Davidovich wrote:
FWIW, I’ve only seen lfence used precisely in the 2 cases mentioned in this thread:
1) use of non-temporal loads (ie weak ordering, normal x86 guarantees go out the window)
2) controlling execution of non-serializing instructions like rdtsc

I’d be curious myself to hear of other cases.

Same here. It is the reason I'm asking the question: what is the purpose of the LFENCE (apart from #1)?

I checked the Intel manual, but I could not make sense of the conditions under which #2 would be needed.

I think you meant “prevent stores to be reordered with *later loads*”. 

You are completely right. I should have checked my message more carefully.

Francesco Nigro

Oct 10, 2019, 5:38:01 AM
to mechanica...@googlegroups.com
RDTSC measurements that surround the instructions being measured need to be serialized together with them, regardless of whether those instructions are temporal or not. They don't need to wait for the store buffer to be drained, so no expensive mfence is needed just to ensure locally that they are not reordered with the measured instructions.
That's what I've understood at least; hope it helps.

Avi Kivity

Oct 10, 2019, 9:49:10 AM
to mechanica...@googlegroups.com, Vitaly Davidovich