With main memory becoming the slowest link, do you think we will one day have more control over prefetch instructions on the CPU? Being able to stop (or lock) the CPU cache from being polluted would also be nice... So far all of this feels like playing Russian roulette!
--
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-symp...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Sometimes more control is not necessarily a good thing. Below is an interesting article about how misuse of prefetch hurt performance in the kernel.
On 21 April 2015 at 10:20, Vitaly Davidovich <vit...@gmail.com> wrote:
There're existing facilities to do both of the things you mention. Some processors have prefetch instructions that software can emit, and there are also non-temporal load/store instructions to bypass cache. Both of these come with a sizable YMMV label. Or did you mean something else?
On Mon, Apr 20, 2015 at 6:07 PM, ymo <ymol...@gmail.com> wrote:
With the advent of main memory becoming the slowest link do you think one day we will have more control on prefetch instructions on the CPU ? Also stop (or lock) the CPU from being cache polluted would be nice ... So far all of this feels like playing Russian roulette !
prefetcht0, prefetcht1, prefetcht2, prefetchnta, prefetchw: these are the prefetch hint instructions.
The movntXXX family of non-temporal move instructions covers writes that bypass the cache.
On Apr 20, 2015 10:25 PM, "ymo" <ymol...@gmail.com> wrote:
OK, I have to admit I am only interested in Intel for the near (and maybe far) future. Can you elaborate on which CPU instructions (if any on Intel) you are referring to?
Yeah, I've mostly seen people use compiler intrinsics to issue prefetches (makes sense from a portability perspective). The HotSpot JVM actually had prefetch intrinsics in its native Unsafe class, but never exposed them in Java. In fact, I think they took them out of the native Unsafe recently, so Java isn't going to get the same love here.
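For anyone who has not seen the intrinsic route, here is a minimal sketch of what it can look like in C with GCC or Clang, using `__builtin_prefetch`. The function name `sum_indirect` and the `PREFETCH_DISTANCE` value are made up for illustration; the distance in particular is an assumption that has to be tuned by measurement on the target CPU. On x86 the compiler typically lowers locality 3 to prefetcht0 and locality 0 to prefetchnta, but that mapping is the compiler's choice, not a guarantee.

```c
#include <stddef.h>

/* Hypothetical tuning knob: how many elements ahead to prefetch.
 * The right value depends on the CPU and workload; measure it. */
#define PREFETCH_DISTANCE 8

/* Sum table[idx[i]] for i in [0, n). The indirect access pattern is
 * hard for a hardware prefetcher to predict, so we hint the line that
 * will be needed PREFETCH_DISTANCE iterations from now. */
long sum_indirect(const long *table, const size_t *idx, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* Second argument 0 = prefetch for read,
             * third argument 3 = high temporal locality. */
            __builtin_prefetch(&table[idx[i + PREFETCH_DISTANCE]], 0, 3);
        }
        total += table[idx[i]];
    }
    return total;
}
```

The prefetch is only a hint: the result is the same with or without it, which makes it easy to benchmark both variants and keep the hint only if it pays for itself.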
Ulrich Drepper's paper (https://people.freebsd.org/~lstewart/articles/cpumemory.pdf) has sections on using prefetch as well as non-temporal store/load instructions. He uses intrinsics instead of assembly, if I remember correctly. Here is the section on bypassing the cache: http://lwn.net/Articles/255364/
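To make the cache-bypassing idea concrete, here is a rough sketch of a non-temporal fill in C. It is not taken from the paper; the function name `fill_streaming` is hypothetical, and the code assumes an x86 compiler that defines `__SSE2__`, falling back to ordinary stores everywhere else. `_mm_stream_si32` is the SSE2 intrinsic for the MOVNTI store.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_stream_si32, _mm_sfence */
#endif

/* Fill dst[0..n) with value. On SSE2 targets the stores are
 * non-temporal, i.e. they are hinted to bypass the cache so a large
 * write-only buffer does not evict data we still care about. */
void fill_streaming(int32_t *dst, size_t n, int32_t value)
{
#if defined(__SSE2__)
    for (size_t i = 0; i < n; i++) {
        _mm_stream_si32((int *)&dst[i], value);
    }
    /* Non-temporal stores are weakly ordered; fence before other
     * code (or other threads) relies on the data being visible. */
    _mm_sfence();
#else
    /* Portable fallback: plain cached stores. */
    for (size_t i = 0; i < n; i++) {
        dst[i] = value;
    }
#endif
}
```

Whether this helps depends heavily on buffer size and on whether the data is read back soon afterwards, so the usual YMMV label applies.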
If you are only interested in Intel processors, then this talk on programming a Xeon processor is fascinating: https://www.youtube.com/watch?v=m9dRPnfKTxs
It showcases the latest technologies in Haswell and how you can use them to reduce latency and increase throughput.
This is not news to you, but the software prefetch instructions have major problems, so much so that everyone says don't use them. The major one is that prefetch works only for reads, not writes, on anything before the Phi or Broadwell. Hardware prefetch is like a lottery: you win or you lose depending on your use case. I asked here before and was told the same )))

Now my dream is Intel coming up with a way for me to "instruct" the CPU ahead of time about which loops I will be running, and then have it optimize them for me... I was looking at the recent SPIR-V specification for GPUs and could not stop wondering why the same thing cannot be done at the CPU level. All the loops and data structures are known before the program is loaded. Why can't the CPU and cache work at a higher level like that?

I am sure it is easier said than done, but one cannot help but wonder!
I think the other trick here is that prefetch effectiveness will also vary across CPU generations, so it's something that would need to be reevaluated. This is definitely one of those things you measure and only use if it yields gains significant enough to warrant the maintenance headache.

What do you mean by the CPU optimizing your loops for you?
If you know a priori exactly the code path and memory you're going to touch, which is what I think you're hinting at, how does the dynamic "planning" that the CPU does fail? After all, it does plan execution via speculation and OoO execution, including loop detection.
Hi,
The army of minions approach works well in the GPU case, where the expectation is that tasks will be mainly maths-oriented and you are looking to exploit data parallelism.
It's not so good for the category of applications where you are focused on performing small amounts of computational work and doing a lot of messaging, or where your main bottleneck is a single-threaded network IO dispatcher.
regards,
Richard
You are right for now because, frankly speaking, we have not figured out how (extreme) data parallelism can be applied to all parts of the application stack. Said another way, at the application level we are still working with an array of structures. I think it is only a matter of time before we figure out how to convert the whole application's data into a structure of arrays that these little minions can feed on. Here is a quote from Mike Acton on data parallelism that I really like: "Rule of thumb: Where there is one, there are many. Try looking on the time axis."
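For readers who have not met the array-of-structures versus structure-of-arrays distinction, a small C sketch may help. Everything here (the particle types, `MAX_PARTICLES`, the function names) is hypothetical and only meant to show the layout change, not any particular application.

```c
#include <stddef.h>

enum { MAX_PARTICLES = 1024 };  /* hypothetical fixed capacity */

/* Array of structures (AoS): the fields of one element sit together,
 * so a loop that only needs `mass` still drags x, y, z through cache. */
struct particle_aos {
    float x, y, z, mass;
};

/* Structure of arrays (SoA): each field is contiguous, so a loop over
 * one field touches only the cache lines it needs and vectorizes well. */
struct particles_soa {
    size_t count;
    float x[MAX_PARTICLES];
    float y[MAX_PARTICLES];
    float z[MAX_PARTICLES];
    float mass[MAX_PARTICLES];
};

/* Copy up to MAX_PARTICLES elements from the AoS layout into SoA. */
void aos_to_soa(const struct particle_aos *in, size_t n,
                struct particles_soa *out)
{
    if (n > MAX_PARTICLES) {
        n = MAX_PARTICLES;
    }
    out->count = n;
    for (size_t i = 0; i < n; i++) {
        out->x[i] = in[i].x;
        out->y[i] = in[i].y;
        out->z[i] = in[i].z;
        out->mass[i] = in[i].mass;
    }
}

/* A field-wise reduction: reads one dense array and nothing else. */
float total_mass_soa(const struct particles_soa *p)
{
    float total = 0.0f;
    for (size_t i = 0; i < p->count; i++) {
        total += p->mass[i];
    }
    return total;
}
```

With the SoA layout, `total_mass_soa` streams through a single dense array, which is the kind of access pattern both SIMD units and hardware prefetchers handle well.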
It will never be the case that everything can be converted into an embarrassingly parallel problem.
Many things can be converted into a somewhat parallel problem.
Some things can be made embarrassingly parallel.
Some things can never be made even partially parallel.
Also, when it comes to copying and zeroing memory there is likely more to do. Does the compiler generate a "REP MOVSD" in all possible cases on x86, or use the latest XMM features if available? It would also be great to have an assembly instruction that provides a zeroed cache line without fetching its existing contents from main memory. Gil has pointed out how useful that can be, having done it on Vega: it greatly saves on memory traffic and object allocation latency.
Even on single threads we should be able to extract more ILP given the common things we do like copying, searching, sorting, pattern matching. SIMD can provide the parallelism here.
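As one illustration of SIMD applied to a searching-style task, here is a sketch that counts occurrences of a byte 16 at a time with SSE2 intrinsics. The function name `count_byte` is made up; the code assumes GCC or Clang (for `__builtin_popcount`) and uses the SSE2 path only when the compiler defines `__SSE2__`, with a scalar loop handling the tail and non-x86 targets.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Count how many bytes in data[0..n) equal needle. */
size_t count_byte(const uint8_t *data, size_t n, uint8_t needle)
{
    size_t count = 0;
    size_t i = 0;
#if defined(__SSE2__)
    /* Broadcast the needle into all 16 byte lanes. */
    const __m128i pattern = _mm_set1_epi8((char)needle);
    for (; i + 16 <= n; i += 16) {
        /* Unaligned load of 16 bytes, then a lane-wise compare. */
        __m128i chunk = _mm_loadu_si128((const __m128i *)(data + i));
        __m128i eq = _mm_cmpeq_epi8(chunk, pattern);
        /* One bit per lane: set where the byte matched. */
        unsigned mask = (unsigned)_mm_movemask_epi8(eq);
        count += (size_t)__builtin_popcount(mask);
    }
#endif
    /* Scalar tail (and the whole loop on non-SSE2 targets). */
    for (; i < n; i++) {
        count += (data[i] == needle);
    }
    return count;
}
```

Modern compilers will often auto-vectorize the plain scalar loop on their own, so it is worth comparing against that before committing to hand-written intrinsics.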
As for speeding up allocations, I believe the C2 compiler issues prefetchnta hints as part of an allocation to prefetch further out in the allocation buffer so that memory is in cache once the *next* allocation is made, which I think is what you're sort of looking for? But zeroing is an interesting topic, which was discussed in this paper: http://users.cecs.anu.edu.au/~steveb/downloads/pdf/zero-oopsla-2011.pdf. There are a couple of styles to choose from, each with its own pros/cons. There's also an optimization in Hotspot that avoids zeroing arrays when the array is filled by user code post-allocation. In addition, I believe you can request a zero'd TLAB (-XX:ZeroTLAB, disabled by default). There're also attempts to minimize field zeroing (-XX:ReduceFieldZeroing, on by default).
Just thinking that other things can come into play these days. Sandy Bridge introduced support for "zeroing idioms", i.e. XOR'ing a value with itself, and since Ivy Bridge those can be zero latency by pre-allocating from the register file.
I like how he says he is only covering the simple or really basic stuff. In my experience only a tiny minority of the development community knows any of this.