Hello...
Bruce Hoult wrote:
> TSO can be easier to program. I'd argue that it sets
> lower limits on the number of cores you can practically
> have, and how tightly they must be coupled. It seems to
> work ok out to at least 32 cores, or 64 cores.
> A thousand might be a different matter.
ARM cores for cloud and server workloads? If a performance loss of 3% to
6% is acceptable, TSO could be considered as an alternative for its
simpler programming model.
About the SC, TSO and RMO hardware memory models:
I have just read the following slides about the performance difference
between the SC (sequential consistency), TSO (total store order) and
RMO (relaxed memory order) hardware memory models.
I think TSO is the better choice: it is only around 3% to 6% slower
than RMO, and it is a simpler programming model than RMO. So I think ARM
should support TSO to be compatible with x86, which is TSO.
Read more here:
https://infoscience.epfl.ch/record/201695/files/CS471_proj_slides_Tao_Marc_2011_1222_1.pdf
About memory models and sequential consistency:
As you have noticed, I am working with the x86 architecture.
Even though x86 gives up on sequential consistency, it’s among the most
well-behaved architectures in terms of the crazy behaviors it allows.
Most other architectures implement even weaker memory models.
The ARM memory model is notoriously underspecified, but is essentially a
form of weak ordering, which provides very few guarantees. Weak ordering
allows almost any operation to be reordered, which enables a variety of
hardware optimizations but is also a nightmare to program at the lowest
levels.
Read more here:
https://homes.cs.washington.edu/~bornholt/post/memory-models.html
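To make the difference concrete, here is a minimal C++ sketch of the classic "message passing" pattern. It is my own illustration and not code from the pages quoted above; the names payload, ready, producer, consumer and run_message_passing are hypothetical. The release store and the acquire load are what forbid the reordering in portable code. As far as I know, on x86 (TSO) compilers emit ordinary loads and stores for them, while on a weakly ordered machine such as ARM they have to become barrier or acquire/release instructions.

```cpp
#include <atomic>
#include <thread>

// Hypothetical names, for illustration only.
int payload = 0;                  // ordinary, non-atomic data
std::atomic<bool> ready{false};   // flag that publishes the data

void producer() {
    payload = 42;                                   // 1) write the data
    ready.store(true, std::memory_order_release);   // 2) publish it; the write
                                                    //    above may not move below
}

int consumer() {
    // Spin until the flag is set; the acquire load keeps the read of
    // payload below from being moved above it.
    while (!ready.load(std::memory_order_acquire)) {
    }
    return payload;   // guaranteed to observe 42
}

// Runs one producer and one consumer and returns what the consumer saw.
int run_message_passing() {
    payload = 0;
    ready.store(false);
    int seen = 0;
    std::thread t1(producer);
    std::thread t2([&seen] { seen = consumer(); });
    t1.join();
    t2.join();
    return seen;
}
```

Calling run_message_passing() from main should return 42 on any conforming implementation. If the two orderings were replaced by std::memory_order_relaxed and payload stayed a plain int, the program would have a data race, and on weakly ordered hardware the consumer could then observe the flag without the data.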
Memory Models: x86 is TSO, TSO is Good
Essentially, the conclusion is that x86 in practice implements the old
SPARC TSO memory model.
The big take-away from the talk for me is that it confirms the
observation made many times before that SPARC TSO seems to be the optimal
memory model. It is sufficiently understandable that programmers can
write correct code without having barriers everywhere. It is
sufficiently weak that you can build fast hardware implementations that
can scale to big machines.
Read more here:
https://jakob.engbloms.se/archives/1435
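The one reordering that TSO does allow is a store followed by a later load from a different address, because the store can sit in the store buffer while the load completes. Here is a minimal C++ sketch of the usual "store buffering" litmus test that shows where a barrier is still needed. It is my own illustration under stated assumptions: the names x, y and store_buffering_once are hypothetical, and the relaxed variant only models what the hardware may do, it is not guaranteed to show the reordering on any given run.

```cpp
#include <atomic>
#include <thread>

// Hypothetical names, for illustration only.
std::atomic<int> x{0};
std::atomic<int> y{0};

// Runs the store-buffering litmus test once.
// Returns true if BOTH threads read 0, i.e. each load appeared to
// execute before the other thread's store became visible.
bool store_buffering_once(bool use_seq_cst) {
    const std::memory_order order = use_seq_cst
        ? std::memory_order_seq_cst
        : std::memory_order_relaxed;

    x.store(0);
    y.store(0);
    int r1 = -1;
    int r2 = -1;

    std::thread a([&] {
        x.store(1, order);    // store to x ...
        r1 = y.load(order);   // ... then load from y
    });
    std::thread b([&] {
        y.store(1, order);    // store to y ...
        r2 = x.load(order);   // ... then load from x
    });
    a.join();
    b.join();

    return r1 == 0 && r2 == 0;
}
```

With use_seq_cst set to true the function can never return true, because sequential consistency forbids that outcome; on x86 this typically costs a locked instruction or a fence on the store. With use_seq_cst set to false the outcome is allowed, even on x86, although with threads created per call it will rarely be observed in practice.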
Thank you,
Amine Moulay Ramdane.