But as much as I am not looking forward to cutting and pasting all that
code and reworking it, in the medium term I think we will need to split
the current CRD into multiple CRDs. We ought to be able to delegate RBAC
control of any one of those phases to a user or service without
delegating all of it. For example, CAPM3 should be able to deploy an
image to a host but not make the system forget how to contact the BMC.
Making a major API change is only going to get more painful as time goes
on, so now is the time to start thinking about it.
BareMetalAllocation(?):
Spec:
- Externally provisioned
- Offline
- Root device hints
- Network data(?)
- Metadata(?)
Status:
- Powered on
- Available (i.e. not powered off or provisioned, externally or otherwise)
If this resource doesn't exist, the host would be left powered down. If
it exists, the host would be powered up ready for fast-track booting
unless the 'offline' flag is set. The BareMetalMachine would search for
these resources (rather than BareMetalHosts) - and the hardware
classification controller would apply its labels here. Probably the BMO
would also provide an 'available' label for use as a selector. This
finally provides an easy answer for how to take a host out of service
for maintenance, without forgetting that it exists or where its BMC
credentials are stored: simply delete this resource.
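To make the shape concrete, here is a minimal sketch of what such a resource might look like. Everything here is illustrative: the API group/version, field names, and labels are invented for the sake of discussion, not a settled API.

```yaml
# Hypothetical sketch only; names and fields are illustrative.
apiVersion: metal3.io/v1alpha2
kind: BareMetalAllocation
metadata:
  name: worker-17          # tied by name to the corresponding BareMetalHost
  namespace: metal3
  labels:
    hardware.metal3.io/cpu-arch: x86_64   # applied by the classification controller
    metal3.io/available: "true"           # maintained by the BMO for use as a selector
spec:
  externallyProvisioned: false
  offline: false           # set true to hold the host powered down
  rootDeviceHints:
    deviceName: /dev/sda
status:
  poweredOn: true
  available: true
```

Deleting this manifest would take the host out of service without touching the BareMetalHost or its BMC credentials.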
BareMetalDeployment:
Spec:
- Image URL
- Image checksum
- Checksum algorithm
- User data
- RAID config
- Preboot options (nested virt, SMT, &c.)
Status:
- Complete
- Powered On
- Error message
Annotations:
- reboot.metal3.io
Rather than having a consumer reference, the deployment resource could
simply be owned by whatever resource is doing the deployment.
Predictable naming prevents conflicts between multiple consumers.
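Sketched as a manifest (again purely illustrative; the owner reference shown is just one plausible consumer), the deployment resource might look like:

```yaml
# Hypothetical sketch only; not a settled API.
apiVersion: metal3.io/v1alpha2
kind: BareMetalDeployment
metadata:
  name: worker-17          # predictable naming prevents conflicts between consumers
  namespace: metal3
  annotations:
    reboot.metal3.io: ""
  ownerReferences:         # owned by whatever resource is doing the deployment
  - apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
    kind: Metal3Machine
    name: my-machine
    uid: 1234-abcd
spec:
  image:
    url: http://example.com/images/host.qcow2
    checksum: http://example.com/images/host.qcow2.md5sum
    checksumType: md5
  userData:
    name: worker-user-data   # reference to a Secret with the untrusted user-data
  prebootOptions:
    smt: enabled
status:
  complete: false
  poweredOn: true
  errorMessage: ""
```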
cheers,
Zane.
--
You received this message because you are subscribed to the Google Groups "Metal3 Development List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to metal3-dev+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/metal3-dev/5af117fc-2006-6068-5be6-6226c8fa0faf%40redhat.com.
It looks like I got the RAID Config in the wrong place here; according
to the Ironic docs the RAID configuration is done by the operator and
all the user can do is request a flavour that filters on a particular
RAID level. Come to think of it, that could also save us from having to
do a manual cleaning step before every RAID deployment.
> If we eliminate the online flag entirely (or make it part of a separate
> CRD for expressing the maintenance state of a host), could an Allocation
> and a Deployment be merged into 1 CRD? Aside from the flag they both
> seem to have partial instructions for provisioning a host.
>
> If we have separate resources for deploying an image to a host and for
> managing its power, and both are changed, which controller wins? What is
> more important, finishing the provisioning work or powering the host
> off? We have that same issue today, but all of the logic lives in 1
> controller so at least we can express the priority in 1 code base.
I'd expect that even with multiple CRDs we'd still have only one controller.
> What if we have a BMC type; a Host type for the provisioning instructions, boot instructions, and hardware details;
I think if the hardware details are not going to be separate then they
should be part of the BMC object. You probably don't want to delete them
every time you reprovision.
> and another type to manage things like powering off or rebooting (name TBD).
I do like the idea of a separate reboot request CRD (which will
necessarily need to allow holding the power off long-term as well)
instead of an annotation, if we can agree on what it should look like :)
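One possible shape for such a reboot request CRD, purely as a conversation starter: the kind name and every field below are speculative, nothing here is agreed.

```yaml
# Speculative sketch of a reboot request CRD; all names are invented.
apiVersion: metal3.io/v1alpha2
kind: HostRebootRequest
metadata:
  name: fence-worker-17
  namespace: metal3
spec:
  hostRef:
    name: worker-17
  mode: hard               # soft | hard
  holdPowerOff: true       # keep the host powered off until this CR is deleted
status:
  phase: PoweredOff
```

The `holdPowerOff` field covers the long-term power-off case mentioned above; deleting the resource would let the host power back up.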
I wonder if part of the reason the current interface is awkward is that
we actually want different controls for the 'ready' and 'provisioned'
states, to be used by different classes of actors for different reasons.
> That gives some separation of control via RBAC. Does that buy us enough to go through the trouble of making it possible to migrate the types?
TBH I don't think that's enough separation; it's reasonably foreseeable
that we might have to change it again.
cheers,
Zane.
I don't think we necessarily want to force the BMO to be in charge of
producing the metadata, but we should enable people to build systems
with a component whose role is to provide trusted information about the
environment in which the host is running, and a separate 'user'
component that can provide only the image and the untrusted user-data.
>
> It looks like I got the RAID Config in the wrong place here; according
> to the Ironic docs the RAID configuration is done by the operator and
> all the user can do is request a flavour that filters on a particular
> RAID level. Come to think of it, that could also save us from having to
> do a manual cleaning step before every RAID deployment.
>
>
> I'm not sure what this means. Are you saying Ironic doesn't change the
> RAID settings to match what is required? I thought it did that as part
> of cleaning.
The current PR for adding RAID support adds an additional state
(Preparing, between Ready and Provisioning) that runs a 'manual
cleaning' in Ironic to set up RAID every time we provision. Presumably
we would be able to run this manual cleaning only once and then
provision multiple images with the same RAID configuration without doing
a manual cleaning every time. Although, on reflection, the real
difference is simply that we would remember the last RAID config we set
up on the host. That would be a natural thing to do with this API
proposal, but in fact it is something we could do independently at any
time if we thought it was valuable.
On 26/05/20 1:36 pm, Doug Hellmann wrote:
> Is looking at multiple resources a big concern?
>
>
> As we expand the API to support setting new details like RAID and BIOS
> settings, we have to keep answering questions about when those settings
> can be changed and what it means to change them at other times.
Yeah, I think we've generally settled on an approach where we say they
are fixed at the time we start provisioning. (We should probably try to
do better at making sure we save them in the status so users aren't
misled as to what is current.)
> Having
> them all on one resource means that updates can at least be atomic, so
> that once provisioning starts changes might be ignored. If some of the
> settings are on different resources, then we have to deal with cases
> where one resource is updated but another isn't and some operation
> starts but then the instructions change.
I think I see what you're saying. If a user changes settings in one
resource (either the Host or Allocation) and immediately creates a
BareMetalDeployment, then is there a risk that the controller will read
an outdated cache of the settings that it then starts provisioning with?
Could a sequence of causally-ordered changes to different resources
nevertheless result in them being applied out of order? Given that the
premise of Kubernetes APIs is that resources are always converging to
the current spec, which theoretically makes ordering irrelevant, there's
a good chance that the answer is yes.
The way to solve this is to write the data as status to a single
resource. So, assuming that when a BareMetalDeployment is created we
first copy the desired image details to the BareMetalHost's status and
then deploy based on that, we can ensure that we have the latest
settings from the BareMetalHost itself (the write to the status would
fail if the settings had been updated in the meantime).
Any settings that end up part of the BareMetalAllocation are trickier.
We'll still want to copy them to the BareMetalHost, but this does not
guarantee ordering. A client could ensure ordering by waiting for the
new settings to be copied before creating the BareMetalDeployment,
though this does require some smarts on the part of the client. However,
if we "roll up" the image details from
BareMetalDeployment->BareMetalAllocation->BareMetalHost then we can
guarantee that the ordering is correct.
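Concretely, the "roll up" might leave a snapshot like the following in the Host's status before provisioning starts, so the deploy always acts on settings read from a single resource. The field layout here is invented for illustration:

```yaml
# Illustrative only: image and allocation details rolled up from the
# BareMetalDeployment via the BareMetalAllocation into the Host's status.
apiVersion: metal3.io/v1alpha2
kind: BareMetalHost
metadata:
  name: worker-17
  namespace: metal3
status:
  provisioning:
    image:                 # snapshot copied from the BareMetalDeployment
      url: http://example.com/images/host.qcow2
      checksum: http://example.com/images/host.qcow2.md5sum
    rootDeviceHints:       # snapshot copied from the BareMetalAllocation
      deviceName: /dev/sda
```

Because the snapshot is written with a status update on the Host itself, a stale write is rejected by the API server's optimistic concurrency check and the controller simply retries with fresh data.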
> So my thought with the proposal was to effectively change the
> default based on the stage we are at:
>
> * Only the BMC is specified - host will be powered off, implementing (1)
> * Host is allocated to some resource pool - it will be powered up and
> ready to provision by default. An 'offline' flag allows (2) & (3)
> * Host is provisioned - the reboot annotation (or a new API)
> implements (4)
>
>
> What does "allocated to some resource pool" mean? That a
> BareMetalHostAllocation resource exists
Yes.
> How would I represent an externally provisioned host for which I want
> power control to support fencing?
Good question. I proposed that the ExternallyProvisioned flag would
exist in the BareMetalHostAllocation, but that the reboot annotation
would be on the BareMetalDeployment. So if there is no deployment there
is nowhere to put the annotation. That would be a good reason to
implement the reboot request CRD that we talked about upthread, so you
could still fence an externally provisioned host.
In typing this I realised something important about security. It's a
mistake to link all of these resources in the same namespace, because
then the ability to reboot one host is the ability to reboot all of the
hosts in that namespace (and transferring a host between namespaces
requires deleting the CR with the BMC credentials, which is the thing we
are trying to avoid). Instead, the Allocation should define the
namespace in which to look for BareMetalDeployment and reboot request
CRs for that host (it could default to the same namespace). So the
meaning of the Allocation would be to allocate control over what is
running on the server to a particular namespace.
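For example (the `deploymentNamespace` field name is invented here for illustration), the Allocation could delegate control like this:

```yaml
# Illustrative only; 'deploymentNamespace' is an invented field name.
apiVersion: metal3.io/v1alpha2
kind: BareMetalAllocation
metadata:
  name: worker-17
  namespace: metal3               # same namespace as the Host and BMC secret
spec:
  deploymentNamespace: tenant-a   # where BareMetalDeployment and reboot
                                  # request CRs for this host are looked up
```

A tenant with RBAC rights only in `tenant-a` could then deploy images and reboot this one host without any access to the namespace holding the BMC credentials.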
> How would I represent an externally provisioned host for which I do not
> yet have BMC credentials but where I may have hardware inventory details?
You could go ahead and create the BareMetalHostAllocation (to set the
ExternallyProvisioned flag) and HardwareDetails CRs. Since they're tied
together by name, in theory you wouldn't need to create the
BareMetalHost yet at all, unless we programmed the controller to delete
all Allocations/HardwareDetails resources that weren't associated with a
Host. Or you could just create the Host with blank credentials (I'm not
sure we handle this well today, but it's the ~same problem in either case).
- ZB