The JMM is defined in terms of synchronizes-with edges that induce happens-before ordering. It doesn't discuss fences; those are an implementation detail, and an implementation may be stronger (but not weaker) than what the model prescribes.
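To make the happens-before point concrete, here is a minimal sketch (the class, field, and method names are made up for illustration). The volatile write to `ready` synchronizes-with the volatile read that observes it, so the plain write to `payload` happens-before the reader's plain read. No fence appears in the source; how the JVM enforces this on a given CPU is up to the implementation.

```java
final class HappensBeforeSketch {
    static int payload;             // plain field, no ordering guarantees of its own
    static volatile boolean ready;  // volatile accesses are synchronization actions

    // Hypothetical helper: runs one writer and one reader, returns what the reader saw.
    static int publishAndConsume() {
        payload = 0;
        ready = false;
        int[] seen = new int[1];

        Thread writer = new Thread(() -> {
            payload = 42;  // plain write, ordered before the volatile write below
            ready = true;  // volatile write (release side of the edge)
        });
        Thread reader = new Thread(() -> {
            while (!ready) {         // volatile read (acquire side of the edge)
                Thread.onSpinWait(); // spin hint, Java 9+
            }
            seen[0] = payload;       // guaranteed by the JMM to observe 42
        });

        reader.start();
        writer.start();
        try {
            writer.join();
            reader.join();           // join() also creates a happens-before edge
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(publishAndConsume());
    }
}
```

If `ready` were a plain field instead, the model would no longer guarantee that the reader sees 42, or even that the loop terminates.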
It's possible that some architectures with weak memory models don't order load/store ops as strongly as x86 (or other TSO archs) does. For example, I hear AArch64 is a bit funky (I can elaborate a bit more if interested).
There's also the aspect that the compiler could, in theory, detect that the two threads don't synchronize with each other and reorder the code. Real compilers don't have that kind of visibility, but the model allows it.
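As a rough illustration of code with no synchronization between two threads, here is a store-buffering style sketch (all names are hypothetical, not taken from the code under discussion). Because `x` and `y` are plain fields, the model permits either the compiler or the hardware to reorder each thread's write and read, so all four combinations of `r1` and `r2`, including both being 0, are allowed outcomes.

```java
final class StoreBufferingSketch {
    static int x, y;    // plain fields: accesses are not synchronization actions
    static int r1, r2;  // what each thread observed

    // Hypothetical helper: runs the two racing threads once and returns {r1, r2}.
    static int[] runOnce() {
        x = 0;
        y = 0;
        r1 = -1;
        r2 = -1;

        // Within each thread, the write and the read touch different variables,
        // so reordering them is invisible to that thread in isolation.
        Thread t1 = new Thread(() -> { x = 1; r1 = y; });
        Thread t2 = new Thread(() -> { y = 1; r2 = x; });

        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();  // join() makes r1 and r2 safely visible to the caller
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] { r1, r2 };
    }

    public static void main(String[] args) {
        int[] r = runOnce();
        // Output varies from run to run; the model allows any of the four pairs.
        System.out.println("r1=" + r[0] + " r2=" + r[1]);
    }
}
```

Declaring `x` and `y` volatile would rule out the (0, 0) result, since volatile accesses take part in a single synchronization order consistent with each thread's program order.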