@NTB:
We used it more than 12 years ago, but in a mirrored mode, so only 2 nodes for HA failover. PLX were the only ones who had a (sort of) working embedded switch at the time. Supermicro's bridge-in-a-bay offering might have NTB connectivity (not sure, though).
Latency (at a very high level): fabric latency + remote DRAM (Rd/Wr) access cost.
Fabric latency: this is PCIe, so it depends on what you have (Gen2, Gen3, etc.), but it is in the nanosecond range.
Remote DRAM access cost: also nanoseconds.
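To make the additive model concrete, here is a minimal sketch. The class and field names are made up for illustration, and the input numbers are placeholders, not measurements. Plug in figures for your own PCIe generation and platform.

```python
from dataclasses import dataclass


@dataclass
class NtbLatencyModel:
    """High-level NTB access latency model (all values in nanoseconds).

    Hypothetical helper for illustration; not part of any NTB library.
    """
    fabric_ns: float       # PCIe fabric traversal; varies with Gen2/Gen3 etc.
    remote_dram_ns: float  # cost of the remote DRAM read/write itself

    def estimate_ns(self) -> float:
        # Latency ~= fabric latency + remote DRAM (Rd/Wr) access cost
        return self.fabric_ns + self.remote_dram_ns


# Placeholder inputs for illustration only; these are NOT measured values.
model = NtbLatencyModel(fabric_ns=500.0, remote_dram_ns=100.0)
print(f"estimated latency: {model.estimate_ns():.1f} ns")  # estimated latency: 600.0 ns
```

Both terms are in nanoseconds, so the total stays in that range too. That is why this interconnect beat anything network-based we had.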
This is the lowest-latency interconnect we could build, but the servers had to be physically co-located because of cable-length limitations.
To fan out to more than 2 nodes, it might be better to go through an external PCIe switch.
Linux's NTB support was merged only 3-4 years ago. You should also be able to find sample code in the Linux kernel repo. You can use it to create a channel between two peers and start sending/receiving IO.
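For a rough idea of what such a channel looks like, here is a simplified user-space simulation. It is not the kernel API. A plain Python list stands in for the shared NTB memory window, and a boolean flag stands in for the doorbell (the NTB register a peer writes to signal the other side). The class and method names are hypothetical. It shows one direction only: a single-producer/single-consumer ring where the sender advances `head` and the receiver advances `tail`.

```python
from __future__ import annotations


class SharedWindow:
    """Stand-in for an NTB memory window both peers can see (hypothetical).

    On real hardware this memory is reached through a PCIe BAR mapping.
    """

    def __init__(self, num_slots: int):
        self.slots: list = [None] * num_slots
        self.head = 0          # next slot the sender writes (sender-owned)
        self.tail = 0          # next slot the receiver reads (receiver-owned)
        self.doorbell = False  # stand-in for an NTB doorbell interrupt


class Peer:
    def __init__(self, window: SharedWindow):
        self.win = window

    def send(self, payload: bytes) -> bool:
        n = len(self.win.slots)
        next_head = (self.win.head + 1) % n
        if next_head == self.win.tail:
            # Ring full: one slot is kept empty so "full" differs from "empty".
            return False
        self.win.slots[self.win.head] = payload
        self.win.head = next_head
        self.win.doorbell = True  # "ring" the doorbell to notify the peer
        return True

    def receive(self) -> list[bytes]:
        out: list[bytes] = []
        if not self.win.doorbell:
            return out  # nothing signalled yet
        self.win.doorbell = False
        n = len(self.win.slots)
        while self.win.tail != self.win.head:
            out.append(self.win.slots[self.win.tail])
            self.win.slots[self.win.tail] = None
            self.win.tail = (self.win.tail + 1) % n
        return out


window = SharedWindow(num_slots=4)
a, b = Peer(window), Peer(window)
for msg in (b"io-1", b"io-2", b"io-3"):
    a.send(msg)
print(b.receive())  # [b'io-1', b'io-2', b'io-3']
```

With 4 slots the ring holds at most 3 messages, so a fourth `send` before the receiver drains it returns `False`. For bidirectional IO you would use a second ring for the other direction.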
Chetan