In Martin Thompson's talks about Aeron, he describes writer threads incrementing an AtomicLong counter to claim a region of a buffer. The first 4 bytes of that region, the length field, are not written up front. The length is set only once the frame is complete. This tells readers of the buffer that this particular write is complete, and it provides a happens-before relation.
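The claim-then-publish protocol described above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not Aeron's code. The class name, the `[int length][payload]` frame layout, the 8-byte frame alignment and the absence of wrap-around are all hypothetical. Writers claim space with `AtomicLong.getAndAdd`, fill the payload with plain writes, and write the length last with `VarHandle.setRelease`. A reader that sees a non-zero length via `getAcquire` is therefore guaranteed to also see the payload. A zero length means "not complete yet", which depends on the buffer starting out zeroed. `ByteBuffer.allocateDirect` zero-fills it.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical frame layout: [int length][payload bytes], each frame padded to 8 bytes.
public class ClaimThenPublish {
    static final int HEADER_LENGTH = 4;
    static final int FRAME_ALIGNMENT = 8;
    // Views the buffer's bytes as ints so the length field gets release/acquire semantics.
    static final VarHandle INT_VIEW =
            MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.nativeOrder());

    private final ByteBuffer buffer;
    private final AtomicLong tail = new AtomicLong();

    ClaimThenPublish(int capacity) {
        // Direct buffers are zero-filled, so every unwritten length field reads as 0.
        buffer = ByteBuffer.allocateDirect(capacity);
    }

    static int align(int length) {
        return (length + FRAME_ALIGNMENT - 1) & -FRAME_ALIGNMENT;
    }

    /** Writer: claim, fill the payload, publish the length last. Returns the offset, or -1 if full. */
    int write(byte[] payload) {
        int frameLength = HEADER_LENGTH + payload.length;
        int claimed = align(frameLength);
        long offset = tail.getAndAdd(claimed);           // this thread now owns [offset, offset + claimed)
        if (offset + claimed > buffer.capacity()) {
            return -1;                                   // sketch only: no wrap-around
        }
        int index = (int) offset;
        for (int i = 0; i < payload.length; i++) {
            // Absolute puts do not touch the buffer's position, so writers never share state here.
            buffer.put(index + HEADER_LENGTH + i, payload[i]);
        }
        INT_VIEW.setRelease(buffer, index, frameLength); // publish: payload writes happen-before this
        return index;
    }

    /** Reader: returns the payload, or null if the frame at this offset is not complete yet. */
    byte[] read(int index) {
        int frameLength = (int) INT_VIEW.getAcquire(buffer, index);
        if (frameLength == 0) {
            return null;
        }
        byte[] payload = new byte[frameLength - HEADER_LENGTH];
        for (int i = 0; i < payload.length; i++) {
            payload[i] = buffer.get(index + HEADER_LENGTH + i);
        }
        return payload;
    }

    public static void main(String[] args) throws InterruptedException {
        ClaimThenPublish log = new ClaimThenPublish(1024);
        Thread a = new Thread(() -> log.write("hello".getBytes(StandardCharsets.US_ASCII)));
        Thread b = new Thread(() -> log.write("world".getBytes(StandardCharsets.US_ASCII)));
        a.start();
        b.start();
        int index = 0;
        for (int seen = 0; seen < 2; ) {
            byte[] payload = log.read(index);
            if (payload == null) {
                Thread.onSpinWait();                     // frame claimed but not yet published
                continue;
            }
            System.out.println(new String(payload, StandardCharsets.US_ASCII));
            index += align(HEADER_LENGTH + payload.length);
            seen++;
        }
        a.join();
        b.join();
    }
}
```

Because the length is published last, a reader needs no lock: it spins while the length reads zero. The sketch never reuses the buffer, so it does not have to answer the zeroing question that follows.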
My question is about this length field: who is responsible for zeroing it out? Once the buffer has been written, its content could be total gibberish, and if the length field isn't zeroed, a reading thread could falsely assume the frame has been written, and boom.
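One possible answer, assumed here purely for illustration, is to make the consumer responsible. After reading a frame, the consumer zeros the whole frame, length field included. Only then does it advance a head position that producers check before claiming. The sketch below uses fixed 16-byte slots to avoid variable-length wrap-around. The class, slot size and method names are hypothetical. Producers claim with a CAS on `tail` that refuses to get more than one buffer's worth ahead of `head`. The single consumer publishes `head` with `AtomicLong.lazySet`, which is a release store. Its zeroing therefore happens-before the actions of any producer that observes the new head and reuses the slot.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical fixed-size slots: [int length][up to 12 payload bytes] = 16 bytes per slot.
public class ZeroOnConsumeRing {
    static final int SLOT = 16, HEADER = 4, MAX_PAYLOAD = SLOT - HEADER;
    static final VarHandle INT_VIEW =
            MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.nativeOrder());

    private final ByteBuffer buffer;
    private final int capacity;
    private final AtomicLong tail = new AtomicLong(); // next position to claim (grows forever)
    private final AtomicLong head = new AtomicLong(); // next position to consume (grows forever)

    ZeroOnConsumeRing(int slots) {
        capacity = slots * SLOT;
        buffer = ByteBuffer.allocateDirect(capacity); // starts zero-filled
    }

    /** Producer (any number of threads): false if the ring is full or the payload is too big. */
    boolean offer(byte[] payload) {
        if (payload.length > MAX_PAYLOAD) {
            return false;
        }
        long position;
        do {
            position = tail.get();
            if (position - head.get() >= capacity) {
                return false; // that slot has not been zeroed and released by the consumer yet
            }
        } while (!tail.compareAndSet(position, position + SLOT));
        int index = (int) (position % capacity);
        for (int i = 0; i < payload.length; i++) {
            buffer.put(index + HEADER + i, payload[i]);
        }
        INT_VIEW.setRelease(buffer, index, HEADER + payload.length); // publish the length last
        return true;
    }

    /** Consumer (single thread only): read a slot, zero it, then hand it back to producers. */
    byte[] poll() {
        long position = head.get();
        int index = (int) (position % capacity);
        int length = (int) INT_VIEW.getAcquire(buffer, index);
        if (length == 0) {
            return null; // nothing published in this slot yet
        }
        byte[] payload = new byte[length - HEADER];
        for (int i = 0; i < payload.length; i++) {
            payload[i] = buffer.get(index + HEADER + i);
        }
        for (int i = 0; i < SLOT; i++) {
            buffer.put(index + i, (byte) 0); // zero the whole slot, length field included
        }
        head.lazySet(position + SLOT); // release store: the zeroing is ordered before the new head
        return payload;
    }
}
```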
--
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-sympathy+unsub...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--
A related question: what happens if 2 threads do plain writes to independent locations in the same cache line?
If this happens concurrently, can the system run into a 'lost update'? I'm sure it can't, and I guess the cache coherence protocol takes care of it, but I would like confirmation anyway.
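A quick way to see this is to have two threads each increment their own element of a small `int[]`. Adjacent elements almost certainly share a cache line. In Java, the language itself rules out lost updates here: JLS §17.6 ("Word Tearing") requires that an update to one field or array element never interacts with reads or updates of any other field or element. On the hardware side, a core commits a store only while it holds the line exclusively. The line ping-pongs between cores (false sharing, a performance cost), but no write is dropped. The class name and iteration count are arbitrary, and a passing run is a demonstration, not a proof.

```java
// Two threads, each the sole writer of one element of the same small array.
public class AdjacentWrites {
    static int[] run(int iterations) throws InterruptedException {
        int[] counts = new int[2]; // counts[0] and counts[1] are 4 bytes apart
        Thread t0 = new Thread(() -> { for (int i = 0; i < iterations; i++) counts[0]++; });
        Thread t1 = new Thread(() -> { for (int i = 0; i < iterations; i++) counts[1]++; });
        t0.start();
        t1.start();
        t0.join(); // join gives this thread a happens-before edge to each worker's writes
        t1.join();
        return counts;
    }

    public static void main(String[] args) throws InterruptedException {
        int[] counts = run(50_000_000);
        // Each element has a single writer, so both counts end at exactly the iteration count.
        System.out.println(counts[0] + " " + counts[1]);
    }
}
```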
Switching topics slightly, prefetch extending the effective cache line size was causing us some consternation, since we were never able to find where it was documented. Do you have a reference to it? When did it start happening?
It seems to invalidate all software that was carefully written to honor 64-byte cache lines.
IIRC, the Pentium 4 had 128-byte "sectors", but it was never fully explained what these were, and the word died with the P4.
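A conservative layout response, sketched here as an assumption rather than a documented requirement, is to keep independently written hot fields 128 bytes apart rather than 64. The class name and the 128-byte figure are illustrative. Two addresses at least 128 bytes apart can never fall in the same 128-byte-aligned pair of lines, so the array itself does not need any particular alignment.

```java
// Hypothetical helper: one counter per thread, each 128 bytes from its neighbours.
public class PaddedCounters {
    // Assumption: treat 128 bytes (two 64-byte lines) as the unit that must not be shared.
    static final int PAIR_BYTES = 128;
    static final int STRIDE = PAIR_BYTES / Long.BYTES; // 16 longs between counters

    private final long[] slots;

    PaddedCounters(int threads) {
        slots = new long[threads * STRIDE];
    }

    // Each slot has a single writer thread, so a plain increment is enough.
    void increment(int thread) {
        slots[thread * STRIDE]++;
    }

    // Read only after the writers have been joined; otherwise nothing guarantees visibility.
    long get(int thread) {
        return slots[thread * STRIDE];
    }

    public static void main(String[] args) throws InterruptedException {
        int threads = 4;
        int iterations = 10_000_000;
        PaddedCounters counters = new PaddedCounters(threads);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < iterations; i++) {
                    counters.increment(id);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            worker.join(); // makes each worker's writes visible to this thread
        }
        for (int t = 0; t < threads; t++) {
            System.out.println("thread " + t + ": " + counters.get(t));
        }
    }
}
```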
--
Hi Francesco,
About your questions on prefetchers:
- Prefetchers normally kick in only after multiple cache lines have been accessed in a specific pattern, so I wouldn't worry too much about a single cache line.
- There shouldn't be an automatic "get the next line" so much as pattern recognizers: if there's a sequential pattern, the next lines will be prefetched. It's not unconditional.
- Prefetchers tend to only read lines, so by themselves they cannot cause additional classic false sharing (but they may cause additional aborts with TSX).
- The same is true for speculative execution. You have more to fight than just prefetching: speculative execution tends to pull in lots of data early. You can assume the CPU runs 150+ instructions ahead speculatively, if not more.
You can always test by enabling/disabling the prefetchers:
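wrmsr -a 0x1a4 0xf   # disable: sets bits 0-3 (L2 hardware, L2 adjacent cache line, DCU, DCU IP prefetchers)
wrmsr -a 0x1a4 0x0   # enable: clears bits 0-3
# both need root and the msr kernel module (modprobe msr)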
See https://software.intel.com/en-us/articles/disclosure-of-hw-prefetcher-control-on-some-intel-processors for more info.
The wrmsr tool is available at: https://01.org/msr-tools/overview
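To observe the pattern recognition described above, a crude experiment is to walk the same array sequentially and in a shuffled order, once with the prefetchers enabled and once with them disabled via the MSR. This is a rough sketch, not a proper benchmark: there is no JMH, the warm-up is minimal and the timings are noisy. The array size is an assumption meant to exceed the caches, and the class and method names are hypothetical. Each step's index comes from the previous load, so the shuffled walk offers no stride to detect, while the sequential walk does. With the prefetchers disabled, the sequential walk would be expected to lose some of its advantage.

```java
import java.util.Random;

public class WalkPatterns {
    /** Follows next[] for next.length steps from index 0, summing values along the way. */
    static long walk(int[] next, int[] values) {
        long sum = 0;
        int i = 0;
        for (int step = 0; step < next.length; step++) {
            sum += values[i];
            i = next[i]; // the next address depends on this load
        }
        return sum;
    }

    /** next[i] = i + 1 (wrapping): a sequential pattern a hardware prefetcher can detect. */
    static int[] sequential(int n) {
        int[] next = new int[n];
        for (int i = 0; i < n; i++) {
            next[i] = (i + 1) % n;
        }
        return next;
    }

    /** The same indices in a shuffled order, linked into one cycle: no stride to detect. */
    static int[] shuffledCycle(int n, long seed) {
        int[] order = new int[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        Random random = new Random(seed);
        for (int i = n - 1; i > 0; i--) { // Fisher-Yates shuffle
            int j = random.nextInt(i + 1);
            int tmp = order[i];
            order[i] = order[j];
            order[j] = tmp;
        }
        int[] next = new int[n];
        for (int k = 0; k < n; k++) {
            next[order[k]] = order[(k + 1) % n]; // one cycle through every index
        }
        return next;
    }

    public static void main(String[] args) {
        int n = 1 << 23; // 8M ints = 32 MiB per array, assumed larger than the caches
        int[] values = new int[n];
        for (int i = 0; i < n; i++) {
            values[i] = 1;
        }
        int[][] patterns = { sequential(n), shuffledCycle(n, 42) };
        String[] names = { "sequential", "shuffled" };
        for (int round = 0; round < 3; round++) { // later rounds are more representative
            for (int p = 0; p < patterns.length; p++) {
                long start = System.nanoTime();
                long sum = walk(patterns[p], values);
                long ms = (System.nanoTime() - start) / 1_000_000;
                System.out.println(names[p] + ": " + ms + " ms (sum " + sum + ")");
            }
        }
    }
}
```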