Dear list,

we have a setup of OSSes and some clients with a dual Gigabit
trunk (miimon=100 mode=802.3ad xmit_hash_policy=layer3+4).
If the clients stripe over targets on different OSSes, they see
the bandwidth of both links. If, however, they stripe over targets on
the same OSS, they only get the bandwidth of one link.
If I attached the OSS with a single 10GbE link, could
a client then use its second link when striping over targets
on the same OSS?
Regards, Ralf
--
Ralf Utermann
_____________________________________________________________________
Universität Augsburg, Institut für Physik -- EDV-Betreuer
Universitätsstr.1
D-86135 Augsburg Phone: +49-821-598-3231
SMTP: Ralf.U...@Physik.Uni-Augsburg.DE Fax: -3411
_______________________________________________
Lustre-discuss mailing list
Lustre-...@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
If I understand it correctly, xmit_hash_policy=layer3+4 would not
allow a single TCP connection to span multiple slaves.
> If the clients stripe over targets on different OSS, they see
> a dual link bandwidth. If however, they stripe over targets on
> the same OSS, they only get the bandwidth of one link.
Each client creates three TCP connections to an OSS: one for
exchanging small control messages, one for incoming bulk messages, and
one for outgoing bulk messages. The control connection can be
ignored for bandwidth considerations. When you're reading, only the
incoming bulk connection on the client is in use, and when writing, only
the outgoing bulk connection is in use. Therefore, for reads or writes to
the same server, any client would use only one of its slaves. I believe
you'd probably see better aggregate bandwidth when reading and
writing simultaneously: the incoming and outgoing bulk connections
should have different source ports and could therefore be hashed to
different slaves.
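
To illustrate why one connection stays on one slave, here is a minimal
Python sketch of the layer3+4 formula as described in older versions of
the kernel's Documentation/networking/bonding.txt (newer kernels hash
differently, so treat this as an approximation). The IP addresses and
source ports are made up for illustration, and 988 is assumed to be the
Lustre acceptor port on the OSS.

```python
import ipaddress

def layer3_4_slave(src_ip, dst_ip, src_port, dst_port, slave_count):
    """Pick a slave index the way older bonding.txt describes layer3+4:
    ((src_port XOR dst_port) XOR ((src_ip XOR dst_ip) AND 0xffff)) mod slaves.
    """
    ip_xor = int(ipaddress.IPv4Address(src_ip)) ^ int(ipaddress.IPv4Address(dst_ip))
    return ((src_port ^ dst_port) ^ (ip_xor & 0xFFFF)) % slave_count

# Hypothetical client and OSS addresses, two slaves in the bond.
CLIENT, OSS, LUSTRE_PORT = "10.0.0.10", "10.0.0.20", 988

# Every packet of one connection has the same 4-tuple, hence the same slave.
bulk_out = layer3_4_slave(CLIENT, OSS, 1023, LUSTRE_PORT, 2)
# A second connection with another source port may land on the other slave.
bulk_in = layer3_4_slave(CLIENT, OSS, 1022, LUSTRE_PORT, 2)
print(bulk_out, bulk_in)
```

With these made-up ports the two bulk connections happen to land on
different slaves, but nothing guarantees that: two source ports that hash
to the same value would share one link.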
> If I would attach the OSS with a single 10GbE link, could
> a client then use the second link, when striping over targets
> on same OSS?
There's a rather complex static configuration that allows for
better overall bandwidth (though between any single client and server
there's still only one link in use):
http://manual.lustre.org/manual/LustreManual16_HTML/MoreComplicatedConfigurations.html#50401393_pgfId-1287958
Thanks,
Isaac
As an alternative, you might try the ksocklnd bonding on clients and
servers, e.g.:
options lnet networks="tcp0(eth0,eth1)"
Then ksocklnd would create two sets of connections (control, bulk in,
and bulk out) and balance traffic over them.
The downside is that it might take a long time for ksocklnd to notice
a downed NIC and avoid it, and when the downed NIC comes back to life
later, ksocklnd might not be able to use it again.
Two more gotchas for ksocklnd bonding:
1. IP routing ultimately determines the outgoing interface and must be
configured properly. For example, if both eth0 and eth1 of clients and
servers belong to the same IP subnet, all outgoing packets might be sent
by the same NIC because the destination IP addresses, though different,
belong to the same destination IP network.
2. All incoming messages might arrive on the same NIC. Please refer to
linux-*/Documentation/networking/ip-sysctl.txt for arp_ignore.
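
As a quick sanity check for the first gotcha, a small Python sketch like
the following could flag interfaces that sit in the same IP subnet. The
interface names and addresses are hypothetical, and the function name is
my own; it only inspects addresses and does not look at the routing
table.

```python
import ipaddress

def interfaces_sharing_subnet(addrs):
    """Return {network: [interface names]} for networks with more than one
    interface. `addrs` maps an interface name to its address in CIDR form."""
    by_net = {}
    for name, cidr in addrs.items():
        net = ipaddress.ip_interface(cidr).network
        by_net.setdefault(net, []).append(name)
    return {str(net): names for net, names in by_net.items() if len(names) > 1}

# Both NICs in one subnet: routing may send everything out of one of them.
same = interfaces_sharing_subnet({"eth0": "192.168.1.10/24",
                                  "eth1": "192.168.1.11/24"})
# One subnet per NIC: each destination network has its own outgoing interface.
split = interfaces_sharing_subnet({"eth0": "192.168.1.10/24",
                                   "eth1": "192.168.2.10/24"})
print(same, split)
```

An empty result means every interface has its own subnet, which is the
layout that lets plain IP routing pick a distinct NIC per destination.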
The 802.3ad specification does not permit striping a single data
path across multiple links, i.e. a single TCP/UDP conversation takes place
over a single physical interface; the TCP/IP stack does not split it apart
so that it can use multiple paths.
If you use a single 10GigE link instead of multiple GigE links (assuming
you have a fast NIC that supports RDMA), you would see more than 1 Gbit/s
of throughput for a single conversation. However, both peers would need to
be using 10GigE NICs.
LACP bonding only provides more aggregate bandwidth over a given link; it
does not double (or triple, quadruple, etc.) the bandwidth available to a
single thread of communication without some application-specific or
hardware-specific optimizations.
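
The difference between aggregate and per-conversation bandwidth can be
put into numbers with a rough Python sketch. It assumes ideal links that
are shared equally by the conversations hashed onto them, which real TCP
traffic only approximates; the flow names are made up.

```python
from collections import Counter

def flow_throughput_gbps(flow_links, link_gbps=1.0):
    """flow_links maps a flow name to the index of the link it is pinned to.
    Each link's capacity is split equally among the flows on that link."""
    load = Counter(flow_links.values())
    return {flow: link_gbps / load[link] for flow, link in flow_links.items()}

# One conversation on a dual GigE bond: the second link stays idle.
single = flow_throughput_gbps({"read": 0})
# Two conversations hashed to different links: aggregate doubles.
spread = flow_throughput_gbps({"read": 0, "write": 1})
# Two conversations hashed to the same link: they share 1 Gbit/s.
collide = flow_throughput_gbps({"read": 0, "write": 0})
print(sum(single.values()), sum(spread.values()), sum(collide.values()))
```

In none of the three cases does a single conversation exceed the speed
of one link, which is the point made above.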
hth,
Klaus
On 7/7/09 6:44 AM, "Ralf Utermann" <ralf.u...@physik.uni-augsburg.de>
etched on stone tablets:
> Dear list,
>
> we have setup of OSS and some clients with a dual Gigabit
> trunk (miimon=100 mode=802.3ad xmit_hash_policy=layer3+4).
> If the clients stripe over targets on different OSS, they see
> a dual link bandwidth. If however, they stripe over targets on
> the same OSS, they only get the bandwidth of one link.
>
> If I would attach the OSS with a single 10GbE link, could
> a client then use the second link, when striping over targets
> on same OSS?
>
> Regards, Ralf