Anton Ertl wrote:
> Robert Wessel <robert...@yahoo.com> writes:
>>> If the program knows it, it can use a load instruction, and there
>>> would not have been a need for a prefetch instruction. The prefetch
>>> instructions are there for cases where the program does not know, but
>>> is pretty sure about needing the data. That's why prefetch
>>> instructions don't produce exceptions.
>>
>> Well, only if I have an (architectural) register free to use as the
>> target of the load. In ideal* cases you can issue prefetches hundreds
>> of cycles in advance (if, for example, you can predict misses to main
>> memory), you'd need a register free for those hundreds of cycles.
>
> The compiler can choose any free architectural register as the target
> of the intended-to-be-a-known-good prefetch load, and then continue to
> treat it as a free register. If there is no free register available
> right now, this load can be inserted at the next place when there is a
> free register (which is typically a few instructions later).
>
> So that's not a reason for introducing prefetch instructions in the
> architecture.
>
> - anton

There are several reasons to have a separate prefetch instruction:
1) Intel has now added the PREFETCHW (prefetch-for-write) instruction,
which already existed on AMD but was a no-op on Intel. PREFETCHW cannot
be simulated with a dummy store because the store would change memory.
Prefetching for write loads the cache line in the exclusive state,
skipping the intermediate shared state and the later upgrade,
which saves delay and bus traffic.
2) Intel calls using loads to prefetch "preloading".
According to the Intel Optimization manual, a prefetch seems
to act like async I/O: it queues the request but does not
wait for completion, so the prefetch instruction can retire
before the cache line arrives.
A preload would stall retirement until the data arrived.
3) Section 7.4.3 of the Intel Optimization manual says in part:
"Currently, PREFETCH provides greater performance than preloading because:
- Has no destination register, it only updates cache lines.
- Does not stall the normal instruction retirement.
- Does not affect the functional behavior of the program.
- Has no cache line split accesses.
- Does not complete its own execution if that would cause a fault.
Currently, the advantage of PREFETCH over preloading instructions
are processor-specific. This may change in the future.
There are cases where a PREFETCH will not perform the data prefetch.
These include:
- PREFETCH causes a DTLB (Data Translation Lookaside Buffer) miss.
This applies to Pentium 4 processors with CPUID signature (blah blah...)
- An access to the specified address that causes a fault/exception.
- If the memory subsystem runs out of request buffers between the
first-level cache and the second-level cache.
- PREFETCH targets an uncacheable memory region (for example, USWC and UC).
"
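
To make the prefetch/preload distinction concrete, here is a minimal
sketch in C. It assumes GCC or Clang, whose __builtin_prefetch(addr, rw,
locality) is a hint the compiler may lower to a PREFETCHh/PREFETCHW-style
instruction or drop entirely. Whether rw=1 really becomes PREFETCHW
depends on the target and flags. PREFETCH_AHEAD and all the function
names are made up for illustration, and the distance is not tuned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical look-ahead distance in elements; tune per machine. */
#define PREFETCH_AHEAD 64

/* Address of a[i + PREFETCH_AHEAD], computed with integer arithmetic so
   that no out-of-bounds pointer arithmetic is performed. The result may
   lie past the end of the array; it is only ever used as a hint. */
static const void *ahead_addr(const long *a, size_t i)
{
    return (const void *)((uintptr_t)a + (i + PREFETCH_AHEAD) * sizeof *a);
}

/* Prefetch: no destination register, and no bounds check is needed
   because a prefetch of an invalid address does not fault. */
static long sum_prefetch(const long *a, size_t n)
{
    assert(a != NULL || n == 0);
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        __builtin_prefetch(ahead_addr(a, i), 0, 3);   /* rw=0: for read */
        s += a[i];
    }
    return s;
}

/* Preload: an ordinary load whose result is thrown away. It is a real
   access, so it must be guarded against running off the array, and it
   needs a (briefly) free register for its result. */
static long sum_preload(const long *a, size_t n)
{
    assert(a != NULL || n == 0);
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n) {
            /* volatile keeps the compiler from deleting the dead load */
            (void)*(const volatile long *)&a[i + PREFETCH_AHEAD];
        }
        s += a[i];
    }
    return s;
}

/* Prefetch-for-write: rw=1 asks for the line in a writable state.
   A dummy store could not stand in for this, since it would change
   memory. */
static void fill_prefetchw(long *a, size_t n, long v)
{
    assert(a != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        __builtin_prefetch(ahead_addr(a, i), 1, 3);   /* rw=1: for write */
        a[i] = v;
    }
}
```

All three compute the same results as the plain loops; only the memory
access hints differ. Note that the preload version is the only one that
needs the bounds check.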
Eric