
Re: [IBMTCP-L] Multiple OSA's


Grant Taylor

Jan 16, 2020, 10:22:21 PM
to IBMTCP-L
On 1/16/20 9:01 AM, Ron Wells wrote:
> Being forced into OSPF, not needing an extra layer, nor wanting to
> nor the added complexity , no real support for multi OSA's or NIC's
> without OSPF.

Hum.

My reading of IBM z/OS V1R11 Communications Server TCP/IP Implementation
Volume 3 (SG24-7800-00) makes me think that you can support multiple OSAs
/ NICs without OSPF. Particularly chapter 3 — VIPA without dynamic
routing §§ 3.1 and 3.1.1.

§ 3.1.1 — ARP takeover — is exactly what we do with thousands of systems
around the world on all sorts of switches from all sorts of vendors.
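For readers who have not looked at ARP takeover on the wire, here is a minimal sketch of the kind of frame involved: a gratuitous ARP announcing that an IP address is now reachable at a given MAC. The function name and the addresses are hypothetical, and the thread does not show exactly what an OSA emits during takeover, so this is only a generic ARP announcement (request form, sender IP equal to target IP).

```python
import struct


def build_gratuitous_arp(mac: bytes, ip: bytes) -> bytes:
    """Build an Ethernet frame carrying a gratuitous ARP request for `ip`.

    Hypothetical helper for illustration; not taken from any IBM tooling.
    """
    if len(mac) != 6 or len(ip) != 4:
        raise ValueError("expected a 6-byte MAC and a 4-byte IPv4 address")
    broadcast = b"\xff" * 6
    # Ethernet header: destination (broadcast), source, EtherType 0x0806 (ARP)
    eth = broadcast + mac + struct.pack("!H", 0x0806)
    # ARP fixed part: htype=1 (Ethernet), ptype=0x0800 (IPv4),
    # hlen=6, plen=4, oper=1 (request)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    # Sender MAC/IP is the new owner; target IP repeats the same IP,
    # which is what makes the ARP "gratuitous". Target MAC is zeroed.
    arp += mac + ip + b"\x00" * 6 + ip
    return eth + arp
```

Switches and neighbours that see this broadcast update their tables so the IP maps to the new MAC, which is why the takeover needs no routing protocol.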

I fail to see why VIPA wouldn't work.

> Not supporting VIPA or DVIPA OSA environment >> or any multi NIC
> with VIPA.

Can you be more specific? What doesn't work? What are the failures?
Does anybody have any packet captures as evidence? Can anybody share
any horror stories about VIPA not working?

The z/OS Communications Server TCP/IP VIPA slide deck by Linda Harrison
w/ IBM supports this:

Link - z/OS Communications Server TCP/IP VIPA
- https://www-01.ibm.com/support/docview.wss?uid=tss1prs789&aid=1

Page 6 — OSA QDIO Gratuitous ARP Fail-over
Page 8 — Static VIPA for Inbound Connections
Page 9 — Dynamic VIPA (DVIPA)
Page 10 — Sysplex Distributor (Distributed DVIPA)

Per Linda's slide deck, DVIPA is only active on one host (LPAR) at a time.
It just moves which host (LPAR) the DVIPA is active on.

Sysplex Distributor (Distributed DVIPA) is interesting. The VIPA is
active on multiple hosts (LPARs) in a Sysplex at the same time.
/However/ only /one/ of the hosts (LPARs) is communicating with the
external network. /This/ /single/ host (LPAR) is the "Distributor
Stack" which is used to interconnect with the external network. Thus
Distributed DVIPA appears to the external network the same way that
Dynamic VIPA (DVIPA) does.

I see no reason why you would need OSPF for redundancy.

The documentation that I've skimmed this evening, one a Redbook and the
other a slide deck, both from IBM, indicates that two ports on two OSAs
connected to the same network /should/ work perfectly fine, even for
z/OS, even without LACP.

Obviously, it would be good to have each OSA connected to separate
switches which are then interconnected. That way you have OSA and
switch redundancy.

I would love to know more / why VIPA / DVIPA / Distributed DVIPA doesn't
work.



--
Grant. . . .
unix || die

Grant Taylor

Jan 18, 2020, 12:34:42 AM
to IBMTCP-L
On 1/17/20 10:21 AM, Hamilton, Robert wrote:
> <rob>
> Our z/OS vs. ACI UFC/MMA Battle Royale:
>
> We have always used multiple OSAs, specifying MULTIPATH PERCONNECTION
> to allow for some level of sharing of the OSAs. I think the phrase is
> "multiple equal-cost routes".

Equal Cost Multi(ple) Path, a.k.a. E.C.M.P., is an L3 /routing/ related
thing.

My understanding is that ((Distributed) D)VIPA is an L2 /MAC/ related thing.
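To illustrate the L3 side of that distinction, here is a generic sketch of per-flow ECMP: a router hashes the flow's addresses and ports and uses the result to pick one of several equal-cost next hops, so every packet of a connection takes the same path. The names are hypothetical, and this is not a description of how z/OS selects routes under MULTIPATH PERCONNECTION.

```python
import hashlib
from typing import Sequence, Tuple

# (src_ip, src_port, dst_ip, dst_port, protocol)
Flow = Tuple[str, int, str, int, str]


def pick_next_hop(flow: Flow, next_hops: Sequence[str]) -> str:
    """Pick one equal-cost next hop for a flow by hashing its 5-tuple."""
    if not next_hops:
        raise ValueError("need at least one next hop")
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    # The same flow always hashes to the same index, so a connection
    # sticks to one path while different connections spread out.
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]
```

Nothing here touches MAC addresses; the choice is made entirely from L3/L4 fields, which is the point of the contrast with VIPA takeover.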

> For each LPAR we defined an IP address for each OSA, and defined
> a primary (static) VIPA. We only connect in over the primary VIPA
> (for TSO, FTP, etc.; we have static application VIPAs, too), and we
> specify SOURCEVIPAINT to say that the primary VIPA is the source for
> all traffic from any one LPAR.

Okay.

Are the IPs assigned to the OSA effectively just for monitoring or other
similar maintenance?

> We don't do DVIPAs; our applications are...not capable of handling
> switchover to a DVIPA.
>
> Then our network guys installed Cisco ACI for Software-Defined
> Networking, and....Bad Things began to happen. ACI couldn't determine
> at any one point which MAC address was responsible for the primary
> VIPA addresses, so the leaf switch to which we were connected would
> (pardon the expression) flip-flop between the two MAC addresses of
> the two OSAs. When that started happening more than a thousand times
> a second...a change was a-comin'.

Yep. That sounds like the experience that we had prior to
Data-Plane-Learning being disabled.

Aside: I still don't understand what advantage Data-Plane-Learning
provides, particularly when every example I've seen with it enabled
has caused problems.
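The flip-flop described above can be sketched with a toy endpoint table. This is a simplified model under stated assumptions, not ACI's actual implementation: the table either learns IP-to-MAC bindings from every packet's source fields, or only from ARP. The packet tuples, addresses, and function name are all hypothetical.

```python
from typing import Dict, Iterable, Tuple

# (kind, source_ip, source_mac); kind is "arp" or "data"
Packet = Tuple[str, str, str]


def count_moves(packets: Iterable[Packet], learn_from_data: bool) -> int:
    """Count how often an IP's learned MAC changes in a toy endpoint table."""
    table: Dict[str, str] = {}
    moves = 0
    for kind, ip, mac in packets:
        if kind == "data" and not learn_from_data:
            continue  # only ARP is allowed to update the binding
        if ip in table and table[ip] != mac:
            moves += 1
        table[ip] = mac
    return moves


# Hypothetical traffic: ARP says the VIPA lives behind OSA 1, but outbound
# data packets sourced from the VIPA alternate between the two OSA MACs.
OSA1, OSA2 = "02:00:00:00:00:01", "02:00:00:00:00:02"
traffic = [("arp", "10.0.0.10", OSA1)]
for i in range(6):
    traffic.append(("data", "10.0.0.10", OSA1 if i % 2 == 0 else OSA2))
```

With learning from data packets enabled, this model records a move every time the source MAC alternates; with ARP-only learning, the binding stays where the ARP put it.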

> We implemented OMPROUTE, which was the preferred(-by-Cisco)
> solution. (The timeframe here is just less than 3 years ago.) There
> was an issue with ACI at the time that it didn't support more-specific
> routes within routes it was already generating, so Cisco couldn't
> agree to support our network configuration/architecture. DuMPROUTE.

Wow.

Why does Cisco want to use OSPF / OMPROUTE if the VIPAs and OSA IPs are
in the same subnet? OSPF is L3 routing.

I guess I can get behind having VIPAs in different subnets and routing to
them via two or more OSA IPs.

> A white paper from Cisco, somewhat later, said that the better solution
> was to disable the IP Data-Plane Learning option in ACI, which was
> available in APIC Release 4.0(1h).

I'd be curious to read said white paper if you have a copy or a pointer to
it. (See more below.)

> Unfortunately, data-plane-learning is per-VRF (virtual routing and
> forwarding group)...and we only have one VRF. That would require us to
> create a second VRF and separate the mainframe from the other hosts
> on the same subnet, since we want data-plane-learning active for the
> OpenSystems servers on that subnet.

I'm somewhat surprised that the mainframe was in the same subnet as the
OpenSystems.

I'm surprised that ACI doesn't have an answer to this.
It may be convoluted. (Put the mainframe on one VLAN that doesn't use
DPL and use LACP to connect to another VLAN for the OpenSystems that
does use DPL. Extending the broadcast domain across multiple VLANs.)
But I expect that there is an answer to it.

What benefit does DPL provide that traditional ARP & ND don't?
Nobody at Cisco could explain that to me when I asked during our ACI
Battle Royale.

> NO ONE wants to split that Red Sea in two, so disabling
> data-plane-learning was...no.

Meh.

Splitting the Red Sea is work. But it can be worth it.

> The impending problem now is that we are down to three available
> addresses on that subnet and need a dozen more.

Ouch.

> Along the way we had disabled MULTIPATH; IIRC it had led to stale
> endpoints in the leaf switch.
>
> Disabling MULTIPATH means we have the primary/alternate OSA combination
> in effect for each of our LPARs. We ensure (the very best we can)
> that the one we want as primary is started before we start the
> secondary. That OSA shows up in NETSTAT DEVL as the primary, and
> leaves the secondary there as a failover.

That sounds like an operational nightmare. I guess the networking
people foisted some of that onto you since ACI didn't play well in the
sandbox.

> Unfortunately (...don't get me started...) it means that there is
> still traffic over the secondary OSA. The magic pothole is called
> "retransmits", which cause IP to hunt for a better route if it has
> had to retransmit the same packet more than twice. We have people who
> don't want those and feel they are causing monitoring and application
> issues, and have told me just this week to shut down the secondary
> OSA. So far, running with a single OSA has stopped some of the
> application/monitoring errors. I remind people about the relationship
> between correlation and causation, but...we are now down to funneling
> all the traffic on our primary LPAR through a single OSA.

If ACI is doing squirrelly stuff.... Well, all bets are off.

> Still looking for that one ring to rule them all. And may the odds
> be ever in your favor. After all, tomorrow is another day.

Disabling Data Plane Learning and enabling GARP learning fixed our
problems like a light switch. We were also told that both of those
changes were Cisco's "Best Practice" for ACI at the end of '18 /
beginning of '19.

> RON W.: We've talked about this before, about a year ago. Come to
> think of it, it was you sent me the link to that Cisco white paper. B-)

Um.... Not on this mailing list. (I subscribed in '19-12.) Maybe
somewhere else. I've not read a Cisco white paper on this, so I don't
see how I could recommend it to someone. I would be interested in (a
pointer to) a copy of said white paper, if you have it handy.

> </rob>