We have a 128-node (8 cores/node) 4x DDR IB cluster with 2:1
oversubscription, and I use the IB network for:
- OpenMPI
- Lustre
- Admin (may change in future)
I'm very interested in using IB QoS, as in the near future I will be
deploying AMD processors with 24 cores/node, so I want to put a
barrier on traffic so that no traffic class (especially OpenMPI) is
starved by the others (especially Lustre I/O). So I read all the
documentation I could get
(http://www.mail-archive.com/lustre-...@lists.lustre.org/msg04092.html was really very helpful)
and made the configuration shown below.
I would really be very grateful if someone on the list could give me
his/her opinion on the proposed configuration below. Any comment will
be welcome, even if the whole thing is complete nonsense, as no one
in my area (as far as I know) is using IB with QoS, and it is really painful.
Personal doubts:
- Am I properly taking 'latency' considerations into account?
- Is there any need to define the 'QoS Switch Port 0 options'?
- Is it worthwhile to use different configurations for the CAs and for
the switches' external ports?
- Is it really necessary to strictly follow the rule that 'the weighting
values for each VL should be multiples of 64', at least in vlarb_high?
- Any other weights you would suggest? (See the rough bandwidth-share
sketch right after the configuration below.)
Thanks in Advance
----- /etc/opensm/qos-policy.conf --------------------
# SL assignment to flows. GUIDs are Port GUIDs
qos-ulps
default :0 # default SL (OPENMPI)
any, target-port-guid 0x0002c90200279295 :1 # SL for Lustre MDT
any, target-port-guid 0x0002c9020029fda9,0x0002c90200285ed5 :2
# SL for Lustre OSTs
ipoib :3 # SL for Administration
end-qos-ulps
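
For reference, the port GUIDs used above can be double-checked on each
server by reading the port's link-local GID from sysfs. A minimal sketch,
assuming the usual /sys/class/infiniband layout; the device name mlx4_0
and port number 1 are just examples:

# Read a port GUID out of sysfs to confirm the GUIDs listed in
# qos-policy.conf (run this on the MDS/OSS itself). Assumes the standard
# /sys/class/infiniband layout; adjust the device/port to your HCA.
from pathlib import Path

def port_guid(device="mlx4_0", port=1):
    gid0 = Path("/sys/class/infiniband/{}/ports/{}/gids/0".format(device, port))
    # GID index 0 is the link-local GID: fe80:: prefix + 64-bit port GUID.
    groups = gid0.read_text().strip().split(":")
    return "0x" + "".join(groups[4:])

if __name__ == "__main__":
    print(port_guid())  # compare against the target-port-guid entries above
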
----- /etc/opensm/opensm.conf -----------------------
#
# QoS OPTIONS
#
# Enable QoS setup
qos TRUE
# QoS policy file to be used
qos_policy_file /etc/opensm/qos-policy.conf
# QoS default options
qos_max_vls 4
qos_high_limit 4
qos_vlarb_high 0:128,1:64,2:0,3:0
qos_vlarb_low 0:192,1:16,2:64,3:8
qos_sl2vl 0,1,2,3,15,15,15,15,15,15,15,15,15,15,15,15
# QoS CA options
qos_ca_max_vls 4
qos_ca_high_limit 4
qos_ca_vlarb_high 0:128,1:64,2:0,3:0
qos_ca_vlarb_low 0:192,1:16,2:64,3:8
qos_ca_sl2vl 0,1,2,3,15,15,15,15,15,15,15,15,15,15,15,15
# QoS Switch Port 0 options
#qos_sw0_max_vls 0
#qos_sw0_high_limit -1
#qos_sw0_vlarb_high (null)
#qos_sw0_vlarb_low (null)
#qos_sw0_sl2vl (null)
# QoS Switch external ports options
qos_swe_max_vls 4
qos_swe_high_limit 255
qos_swe_vlarb_high 0:192,1:16,2:64,3:8
qos_swe_vlarb_low 0:0,1:0,2:0,3:0
qos_swe_sl2vl 0,1,2,3,15,15,15,15,15,15,15,15,15,15,15,15
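
About the weights I asked about above: as far as I understand the VL
arbitration tables, within a single table the arbiter is a weighted
round-robin with weights in units of 64 bytes, so under sustained load
each VL's share of that table is roughly proportional to its weight; I
believe the 'multiples of 64' advice only makes each weight correspond to
whole 4 KB packets and does not change the ratios. A minimal sketch of
that proportional model (plain Python, nothing OpenSM runs; it
deliberately ignores the high/low table interleaving governed by
qos_high_limit):

# Rough model of how the proposed vlarb tables split bandwidth among VLs.
# Within one table the arbiter is (approximately) weighted round-robin, so
# each busy VL gets about weight/sum(weights) of that table's bandwidth.
def shares(vlarb):
    """Parse an opensm 'vl:weight,...' string into per-VL bandwidth fractions."""
    entries = [tuple(int(x) for x in item.split(":")) for item in vlarb.split(",")]
    total = sum(weight for _, weight in entries)
    return {vl: round(weight / total, 3) for vl, weight in entries if weight > 0}

# Tables proposed above (VL0=MPI, VL1=MDT, VL2=OSTs, VL3=IPoIB/admin)
print("CA  high:", shares("0:128,1:64,2:0,3:0"))   # {0: 0.667, 1: 0.333}
print("CA  low :", shares("0:192,1:16,2:64,3:8"))  # {0: 0.686, 1: 0.057, 2: 0.229, 3: 0.029}
print("SWE high:", shares("0:192,1:16,2:64,3:8"))  # same ratios on switch external ports

If that model is roughly right, OpenMPI keeps about two thirds of each
table it appears in, which is the kind of protection I am after.
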
--
Ramiro Alba
Centre Tecnològic de Tranferència de Calor
http://www.cttc.upc.edu
Escola Tècnica Superior d'Enginyeries
Industrial i Aeronàutica de Terrassa
Colom 11, E-08222, Terrassa, Barcelona, Spain
Tel: (+34) 93 739 86 46
--
My own experience was that Lustre traffic often fell victim to
aggressive MPI behavior, especially during collective communications.
> ----- /etc/opensm/qos-policy.conf --------------------
>
>
> # SL assignment to flows. GUIDs are Port GUIDs
> qos-ulps
> default :0 # default SL (OPENMPI)
> any, target-port-guid 0x0002c90200279295 :1 # SL for Lustre MDT
> any, target-port-guid 0x0002c9020029fda9,0x0002c90200285ed5 :2
> # SL for Lustre OSTs
> ipoib :3 # SL for Administration
> end-qos-ulps
My understanding is that the SL is determined only once for each
connected QP (which is what Lustre uses), during connection
establishment. The configuration above seemed to me to be able to catch
connections from clients to servers, but not the other way around.
Servers do connect to clients, though that's not the usual case.
Moreover, Lustre QPs are persistent, so you might end up with quite a
few Lustre QPs on the default SL. I've never done any IB QoS
configuration, but it'd be
good to double check that the config above does catch all connections.
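
Just to make the directionality point concrete, here is a toy sketch
(plain Python, not anything OpenSM actually runs) of how I read the
matching in the qos-policy.conf above: the SL is picked from the target
port GUID at connection setup, so only connections whose target is one of
the listed server ports get SL 1 or 2; the client GUID below is made up.

# Toy model of the qos-policy.conf matching as I read it -- NOT OpenSM code.
MDT_GUID = 0x0002c90200279295
OST_GUIDS = {0x0002c9020029fda9, 0x0002c90200285ed5}

def sl_for_connection(target_port_guid):
    """SL a new connected QP would get, judging only by its target GUID."""
    if target_port_guid == MDT_GUID:
        return 1            # 'target-port-guid ... :1' rule (MDT)
    if target_port_guid in OST_GUIDS:
        return 2            # 'target-port-guid ... :2' rule (OSTs)
    return 0                # falls through to the default SL (MPI)

print(sl_for_connection(0x0002c9020029fda9))  # client -> OST: 2, as intended
print(sl_for_connection(0x0002c90200aabbcc))  # server -> client: 0 (made-up client GUID)
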
If the servers run more than just Lustre, it's possible to distinguish ULP
traffic further by the Lustre ServiceID. If servers serve more than
one Lustre file system, you can divide the traffic further by
assigning each file system a different PKey. But it's probably beyond
your concerns.
Cheers,
Isaac
OK, but the question is whether this unwanted traffic going to the
default SL is significant enough to matter. What do you think?
> good to double check that the config above does catch all connections.
>
> If the servers run more than just Lustre, it's possible to distinguish ULP
> traffic further by the Lustre ServiceID. If servers serve more than
Yes. I saw this possibility on the Lustre mailing list:
http://lists.lustre.org/pipermail/lustre-discuss/2009-May/010563.html
but it is said to have a drawback:
..........................................................................
The next step is to tell OpenSM to assign an SL to this service-id.
Here is an extract of our "QoS policy file":
qos-ulps
default : 0
any, service-id=0x.....: 3
end-qos-ulps
The major drawback of this solution is that the modification we made in
the ofa-kernel is not OpenFabrics Alliance compliant, because the
portspace list is defined in the IB standard.
...........................................................................
> one Lustre file system, you can divide the traffic further by
That's not my case at the moment.
> assigning each file system a different PKey. But it's probably beyond
> your concerns.
What do you think about the 'weights' policy I've suggested in my
configuration?
Thanks for your answer
Kind Regards