Networking overhead… VLAN routing, perhaps: either 1) an extra network device hop, adding latency from a router/network device, or 2) an overburdened switch handling the routing itself, which still introduces latency. Latency is the killer of storage and network I/O bandwidth.
I’m willing to bet two things:
1) Changing your stripe count from 2 to 1 will produce bandwidth results similar to diagram 2 [54.3MB/s], even if the layout is as in diagram 1 [separate nets].
2) If all your OSS/MDS and client nodes were in a single VLAN/network… you’d see better throughput than diagram 2’s 54.3MB/sec.
So, drop the classful subnets… go with CIDR / supernetted networks to get the IP space you need and eliminate the extra routing latency.
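As a rough sketch (with made-up /24 prefixes -- substitute your own), Python's ipaddress module will tell you the smallest supernet that covers the three existing subnets, which you could then assign as one flat network:

    # Hypothetical /24s used only for illustration; substitute your own.
    import ipaddress

    subnets = [ipaddress.ip_network(n) for n in
               ("192.168.1.0/24", "192.168.2.0/24", "192.168.3.0/24")]

    # Grow a prefix until it covers all three subnets, i.e. the smallest
    # single CIDR block that lets every node live on one flat network.
    covering = subnets[0]
    while not all(s.subnet_of(covering) for s in subnets):
        covering = covering.supernet()

    print(covering)   # -> 192.168.0.0/22 for the example prefixes above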
Regards,
Charles
--
===========================================
Charles Hammitt
Storage Systems Specialist
ITS Research Computing @
The University of North Carolina-CH
===========================================
I’d take a look at how the switch ASICs line up with the connections in the system, to see whether one set is more heavily burdened than another…
And:
If routing is not handled by the switch, I’d check whether there are issues with how routing works between the different networks; perhaps there is a different path, a similarly burdened ASIC, or some other network performance / congestion problem. A simple traceroute might help flush out some of the questions, as would a conversation with your networking team to trace cables and look at interface stats.
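If it helps, here’s a minimal sketch (Linux-specific, just parsing /proc/net/dev) of the kind of per-interface byte/error/drop counters I’d compare on the client and both OSS nodes before and after a test run:

    # Minimal sketch: dump per-interface RX/TX byte, error and drop
    # counters from /proc/net/dev (Linux).
    def read_interface_stats(path="/proc/net/dev"):
        stats = {}
        with open(path) as f:
            for line in f.readlines()[2:]:     # skip the two header lines
                name, data = line.split(":", 1)
                fields = [int(x) for x in data.split()]
                stats[name.strip()] = {
                    "rx_bytes": fields[0], "rx_errs": fields[2],
                    "rx_drop": fields[3], "tx_bytes": fields[8],
                    "tx_errs": fields[10], "tx_drop": fields[11],
                }
        return stats

    for iface, counters in read_interface_stats().items():
        print(iface, counters)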
Regards,
Charles
> 0) Use 3 subnets to assign the 3 nodes.
> 1) Run "netperf" in the two OSS separately, run "netserver" in
> "client";this step could simulate the networking scenario:
> "client" reads data from two OSS, but here is no disk i/o or
> other r/w;
It might be useful for you to see what the OSS-OSS transfer rate
is too, and also the transfer rate in the client->OSS direction,
but since this is purely a networking issue it is a bit
offtopic here.
> 2) The two OSS netperf results are about 200 M/s each, 400 M/s
> in total. So low!
I usually prefer to use 'nuttcp' for this; it is far more
convenient to run.
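If you want to script the extra directions mentioned above, a rough
sketch along these lines should do; it assumes nuttcp is installed, a
receiver was started with "nuttcp -S" on each target host, and the
host names are placeholders (flag spellings are from memory, check
"nuttcp -h"):

    # Rough sketch: run a 10-second transmit and a 10-second receive
    # test against each host, so both directions get measured.  Host
    # names are placeholders; "nuttcp -S" is assumed on each of them.
    import subprocess

    hosts = ["oss1", "oss2"]                 # placeholder host names

    for host in hosts:
        for flag, label in (("-t", "transmit"), ("-r", "receive")):
            result = subprocess.run(["nuttcp", flag, "-T10", host],
                                    capture_output=True, text=True)
            print(host, label, result.stdout.strip())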
> 3) Running netperf on only one OSS, the test result is 950 M/s..
> this result is OK.
That is unusually good.
> 4) All the steps above show that the networking is the bottleneck
> for the read performance.
> When 2 nodes send TCP streams at the same time and only 1 node
> receives, the total throughput is half the normal value.
Well, if you introduce routing, and your router is perhaps not
fully non-blocking at 10Gb/s, or it has insufficient or excessive
buffering, or it subtly changes latency patterns, that's pretty
unexceptional. A lot of studying of 'wireshark' traces will show
which particular limitation applies. For example, enabling TSO
can give very bad results with some NICs.
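To put rough numbers on it, using only the figures quoted above:

    # Back-of-the-envelope check with the figures quoted above.
    single_sender   = 950        # M/s, one OSS sending alone
    two_senders_sum = 2 * 200    # M/s, aggregate with both OSS sending

    # If the receiver's link were the only limit, the two senders
    # should still reach roughly the single-sender figure in aggregate,
    # merely split between them.  Seeing only ~400 in total points at
    # the routed path (blocking, buffering, latency), not the link.
    print("expected aggregate ~", single_sender)
    print("observed aggregate ~", two_senders_sum)
    print("shortfall          ~", single_sender - two_senders_sum)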
But when setting up Lustre one usually tries to engineer the
simplest/best networking case, not a more complicated one.
> So odd.. what caused that? Thanks a lot
Q: "Doctor, if I stab my hand with a fork it really hurts".
A: "Don't do it".
:-)