It has been a while since I reviewed
https://github.com/metal3-io/metal3-docs/pull/132. That's partly because
I'm not sure what the right way to deal with it is.
It's more or less impossible to specify in advance a system as large as
the one proposed without actually implementing it, which the authors
have done. On the other hand, once something has actually been
implemented, it is difficult to change again. I feel like this is
causing some inertia in the design document.
My main feedback has been that we should design around the idea that
Hosts will be assigned to different tenants (as in my proposed evolution
of the BareMetalHost API) in different namespaces, and that the network
configuration should be designed to keep the hosts assigned to each
namespace isolated from those assigned to other namespaces, enforced by
k8s RBAC. We don't necessarily need this to be implemented in the first
iteration, but we should at least be on the path to achieving it. It's
difficult for me to imagine any other compelling reason to integrate
provisioning with network config like this. I gather that in most
organisations the network and server teams generally refuse to work
together even when they're supposed to be building a multi-tenant cloud.
The idea that network teams will allow admin access to the switches for
any lesser purpose is almost inconceivable.
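To make the idea concrete, here is a rough sketch of the kind of per-namespace isolation I mean. This is illustrative only: it assumes the existing namespaced BareMetalHost resource in the metal3.io API group, and the tenant namespace and group names are invented.

```yaml
# Hypothetical sketch: a tenant's access is confined to Hosts in their
# own namespace, so network isolation can follow namespace boundaries
# and be enforced by ordinary k8s RBAC.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: tenant-host-user
  namespace: tenant-a        # made-up tenant namespace
rules:
- apiGroups: ["metal3.io"]
  resources: ["baremetalhosts"]
  verbs: ["get", "list", "watch", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: tenant-a-hosts
  namespace: tenant-a
subjects:
- kind: Group
  name: tenant-a-users       # made-up group name
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: tenant-host-user
  apiGroup: rbac.authorization.k8s.io
```

The point is that any network configuration API would have to line up with the same namespace boundaries for the isolation to mean anything.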
It appears to me that this is not going to make it into the proposal at
any point, because it would require a complete rethink/rewrite of the
existing prototype.
A lesser issue in the same vein is that I believe Maël suggested using
an architecture similar to the Cluster API to handle the multiple
implementations, where platform-specific controllers reconcile resources
for a particular platform that are then referenced from generic
resources. The superficial aspects of this were adopted without any of
the architectural benefits (i.e. allowing third-party controllers that
don't have to be compiled into the main binary), so that a rewrite of
the prototype would not be required. (For what it's worth, in my opinion
this was probably worse than leaving it as it was.)
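For reference, the pattern I understood the suggestion to mean is the one Cluster API uses: a generic resource holds a typed reference to a platform-specific resource, which is reconciled by a separate, independently deployable controller. Something like the following (all names invented for illustration; only the shape of the reference matters):

```yaml
# Hypothetical sketch of the Cluster API-style contract: the generic
# resource knows nothing about the platform, it just points at a
# platform-specific object (cf. Cluster API's infrastructureRef).
# A third-party controller can reconcile the referenced object without
# being compiled into the main binary.
apiVersion: example.metal3.io/v1alpha1   # made-up API group
kind: NetworkConfiguration
metadata:
  name: host-0-network
spec:
  providerRef:                           # made-up field name
    apiVersion: vendor.example.com/v1alpha1
    kind: VendorSwitchConfig             # reconciled by the vendor's own controller
    name: host-0-switch-config
```

The architectural benefit comes entirely from the reference being the contract; adopting the shape without the contract gets you the extra indirection with none of the payoff.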
Essentially it feels like the community is not really being asked for
input, but rather being asked to bless the existing prototype. Perhaps
this is inevitable for such an inherently complex problem. I certainly
appreciate the effort that folks have put into building, documenting,
and demoing this for the community.
However, from my perspective I don't currently have a need for this
feature (although I have a personal interest in seeing something like
this happen). Should a need arise in the future, I currently feel like
it would probably require a wholesale redesign anyway to achieve a
useful level of multitenancy. If the current proposal were to be
accepted into Metal³, that would add the additional complication of
having to develop some sort of upgrade path. In other words, this can
only make _more_ work for future me. Under such circumstances it's
impossible to get excited about doing another review pass (which takes
about half a day because GitHub is terrible).
I'm not writing this to block the proposal, just to explain why I have
been quiet on the review lately (and, given my current workload levels,
for the foreseeable future). And hopefully to generate some discussion
in the community. Maybe I am completely wrong about the multi-tenancy
thing! It would certainly be interesting to hear about the intended use
cases for what the prototype does.
Perhaps a step forward that we could more readily agree on would be to
reduce the scope to this: what, if any, changes are required to the
BareMetalHost CRD to allow a network API to integrate with it, and why?
How would this operate in a multitenant environment as envisaged in the
API decomposition[1] proposal? If we can answer these questions, perhaps
we can add the required integration points in BMH/baremetal-operator
that would allow anybody to build a network API outside of Metal³
without us having to bless any specific one.
cheers,
Zane.
[1]
https://docs.google.com/document/d/1FL8oA0_WNcPdiC-0zrToinl8aol-NPQUjWe-OkcvTqA/edit