i4 bandwidth to main memory is ~5 GBps; compare with the AlphaServer
GS1280 EV7 at ~12.8 GBps.
i4 latency is between ~100 and ~200 ns; compare with the EV7 average
latency of ~260 ns, ranging from 83 ns up to 390 ns for the furthest
memory in the torus.
Processor caches, and the associated cache latency, bandwidth and
sizes, are a huge factor here in aggregate application performance, as
main memory is much slower.
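As a rough illustration of why the caches dominate, here is a minimal
sketch of the usual average-access-time arithmetic. The 150 ns figure
is just the midpoint of the i4 range quoted above; the 5 ns cache
latency and the hit rates are made-up placeholders, not measurements
from any of these boxes, and the function name is hypothetical.

```python
def average_access_ns(hit_rate, cache_ns, memory_ns):
    """Average access time: every access pays the cache lookup,
    and the fraction that misses also pays the main-memory latency."""
    return cache_ns + (1.0 - hit_rate) * memory_ns


# Assumed values: 5 ns cache latency and both hit rates are placeholders;
# 150 ns is the midpoint of the ~100 to ~200 ns i4 range quoted above.
for hit_rate in (0.90, 0.99):
    avg = average_access_ns(hit_rate, cache_ns=5.0, memory_ns=150.0)
    print(f"hit rate {hit_rate:.2f}: ~{avg:.1f} ns average")
```

With those assumed numbers, moving from a 90% to a 99% hit rate cuts
the average access time by roughly a factor of three, which is a
bigger swing than most of the main-memory latency differences above.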
If/when the processors and servers arrive, the Kittson improvements
will be interesting. Links to some of the internal documents and
funding information are in the Wikipedia Itanium article
<https://en.wikipedia.org/wiki/Itanium>, for folks interested in a
refresher on those details.
For those that want a refresher on Poulson itself, see
<http://www.realworldtech.com/poulson/>
Did a quick look for Skylake references, and there's very little data
posted. For DDR4 via a recent top-end i7, looks like ~50 GBps, and
latencies ~25 ns.
<http://www.tomshardware.com/reviews/adata-xpg-z1-crucial-ddr4-x99,4007-5.html>
(This data is single-socket, which avoids the overhead of
coordinating across multiple sockets. Poulson was released ~3 years
ago, so direct comparisons with the more recent Broadwell and
just-released Skylake designs are unfair.)
While there are a number of applications that are memory-bound (and the
above data does not address the cache latencies), the newer i4 and
x86-64 boxes are going to be much better at I/O with the PCIe updates,
and with faster controllers.
It's the application improvements and/or the box size reductions and/or
the hardware maintenance support cost reductions that really matter
here, though. Not generic benchmarks.
--
Pure Personal Opinion | HoffmanLabs LLC