The mean end-to-end (from writing to a socket to reading from a socket) round-trip latency across a modern 10G+ switch can be brought down to 30-40 usec on modern hardware with relatively low effort and no specialized equipment (e.g.
https://blog.cloudflare.com/how-to-achieve-low-latency/), and can be driven as low as 3-5 usec with specialized hardware and software stacks (kernel bypass, etc.) (e.g.
http://www.mellanox.com/related-docs/whitepapers/HP_Mellanox_FSI%20Benchmarking%20Report%20for%2010%20%26%2040GbE.pdf).
A trivial round trip ("What time do you have? [my time is X]" answered by "My clock shows Y for your request sent at X" [received at Z]) would allow you to measure the perceived wall clock difference between two machines to within the round trip latency. E.g. the difference between the clocks (at the time measured) in the above sequence is known to be (Z-Y) +/- (Z-X); more tightly, since Y was read at some point between X and Z, your clock minus the other machine's lies between X-Y and Z-Y. You can use various statistical techniques to estimate the difference more closely by repeating the round trip queries many times and across periods of time. E.g. the amazingly effective techniques used (decades ago) by NTP to synchronize clocks to within milliseconds across wide geographical distances and slow/jittery networks still apply even at low latency scales (e.g. start with something like
http://www.ntp.org/ntpfaq/NTP-s-algo.htm or
https://www.cisco.com/c/en/us/about/press/internet-protocol-journal/back-issues/table-contents-58/154-ntp.html and dig into references if interested).
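The round trip arithmetic above can be sketched in a few lines of Python. This is only an illustration: the names `Sample`, `collect`, `best_sample`, and the `query_remote` / `now` callables are made up for the example, and the sign convention (local clock minus remote clock) is a choice made here. Keeping the sample with the smallest round trip is a simple stand-in for the kind of minimum-delay filtering NTP does, not NTP's actual algorithm. It also assumes the clocks do not drift noticeably within a single round trip.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Sample:
    x: float  # local clock when the request was sent
    y: float  # remote clock reading carried back in the reply
    z: float  # local clock when the reply was received

    @property
    def rtt(self) -> float:
        return self.z - self.x

    @property
    def offset(self) -> float:
        # Midpoint estimate of (local clock - remote clock).
        # Y was read somewhere between X and Z, so the true value
        # lies between X-Y and Z-Y.
        return (self.x + self.z) / 2.0 - self.y

    @property
    def error_bound(self) -> float:
        # The midpoint estimate is off by at most half the round trip.
        return self.rtt / 2.0


def collect(query_remote: Callable[[float], float],
            now: Callable[[], float],
            n: int) -> List[Sample]:
    """Run n round trips. query_remote(x) is a hypothetical blocking call
    that sends x to the peer and returns the peer's clock reading."""
    samples = []
    for _ in range(n):
        x = now()
        y = query_remote(x)
        z = now()
        samples.append(Sample(x, y, z))
    return samples


def best_sample(samples: Iterable[Sample]) -> Sample:
    # The shortest round trip gives the tightest bound on the offset.
    return min(samples, key=lambda s: s.rtt)
```

In real use `now` would be something like `time.time` and `query_remote` would wrap the socket write and read; `best_sample(collect(...)).offset` is then the estimate, good to within `error_bound`.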
Keep in mind that at the levels you are looking at, clock skew and drift are very real things. And then there is jitter...
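One rough way to see drift in your own measurements is to fit a straight line through offset estimates taken over time: the slope is the rate at which the two clocks are moving apart, and whatever scatter is left around the line is mostly jitter. The sketch below is a plain least-squares fit with a made-up function name (`fit_drift`); it assumes the drift rate is roughly constant over the measurement window, which is not guaranteed on real hardware.

```python
from typing import Sequence, Tuple


def fit_drift(times: Sequence[float],
              offsets: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line through (time, offset) points.

    Returns (slope, intercept): slope is the change in offset per unit
    of time (e.g. 1e-6 means 1 usec of drift per second), intercept is
    the fitted offset at time 0.
    """
    n = len(times)
    if n < 2 or n != len(offsets):
        raise ValueError("need at least two (time, offset) pairs")
    mean_t = sum(times) / n
    mean_o = sum(offsets) / n
    var_t = sum((t - mean_t) ** 2 for t in times)
    if var_t == 0:
        raise ValueError("all samples share the same timestamp")
    cov = sum((t - mean_t) * (o - mean_o) for t, o in zip(times, offsets))
    slope = cov / var_t
    intercept = mean_o - slope * mean_t
    return slope, intercept
```

Feeding it the local send times and the per-round-trip offset estimates gives a drift rate you can compare against the round trip latency to decide how often you need to re-measure.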