I'm experimenting with installing OKD 4.9 using IPI on vSphere 7.0.2 with 3 ESXi hosts.
The installer creates the expected objects: the Fedora CoreOS template, then the bootstrap VM, then the 3 masters (each on a different ESXi host). It gets to the point where the bootstrap node is destroyed, as expected, but cluster creation never completes and no worker node VMs have been created yet. The installer output ends with errors like:
[ERROR] updating grafana: waiting for Grafana Route to become ready failed: waiting for route openshift-monitoring/grafana: no status available
[ERROR] updating kube-state-metrics: reconciling kube-state-metrics Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/kube-state-metrics: got 1 unavailable replicas
[ERROR] updating openshift-state-metrics: reconciling openshift-state-metrics Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/openshift-state-metrics: got 1 unavailable replicas
[ERROR] updating prometheus-adapter: reconciling PrometheusAdapter Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/prometheus-adapter: got 2 unavailable replicas
[INFO] Cluster operator network ManagementStateDegraded is False with :
[INFO] Cluster operator network Progressing is True with Deploying: Deployment "openshift-network-diagnostics/network-check-source" is waiting for other operators to become ready
[ERROR] Cluster initialization failed because one or more operators are not functioning properly.
...
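For reference, this is the kind of inspection I can run from the install host (a sketch; it assumes the `oc` client is installed and KUBECONFIG points at the auth/kubeconfig produced in the install directory):

```shell
# Sketch: inspect why the rollout is stuck. Assumes the oc client is
# installed and <install-dir>/auth/kubeconfig exists.
KUBECONFIG=${KUBECONFIG:-auth/kubeconfig}
export KUBECONFIG
if command -v oc >/dev/null 2>&1 && [ -f "$KUBECONFIG" ]; then
  # Which operators are Degraded / not Available?
  oc get clusteroperators

  # The monitoring/ingress errors above usually mean there are no worker
  # nodes for the router and monitoring pods to schedule on:
  oc get nodes

  # On IPI, workers are created by the machine-api; look for vSphere
  # provisioning errors on the worker MachineSet/Machine objects:
  oc get machinesets,machines -n openshift-machine-api
  oc describe machines -n openshift-machine-api
else
  echo "oc client or $KUBECONFIG not found; run this from the install host" >&2
fi
```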
In .openshift_install.log it reaches 716 of 745 done within a few minutes, but then makes no further progress:
time="2022-03-09T01:44:48+01:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.9.0-0.okd-2022-02-12-140851: 716 of 745 done (96% complete)"
time="2022-03-09T01:45:03+01:00" level=debug msg="Still waiting for the cluster to initialize: Some cluster operators are still updating: authentication, console, image-registry, ingress, kube-apiserver, monitoring"
time="2022-03-09T01:45:44+01:00" level=debug msg="Still waiting for the cluster to initialize: Some cluster operators are still updating: authentication, console, image-registry, ingress, kube-apiserver, monitoring"
time="2022-03-09T01:45:48+01:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.9.0-0.okd-2022-02-12-140851: 719 of 745 done (96% complete)"
time="2022-03-09T01:46:03+01:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.9.0-0.okd-2022-02-12-140851: 722 of 745 done (96% complete)"
time="2022-03-09T01:48:18+01:00" level=debug msg="Still waiting for the cluster to initialize: Some cluster operators are still updating: authentication, console, ingress, monitoring"
time="2022-03-09T02:22:19+01:00" level=error msg="Cluster operator authentication Degraded is True with IngressStateEndpoints_MissingSubsets::OAuthClientsController_SyncError::OAuthServerDeployment_PreconditionNotFulfilled::OAuthServerRouteEndpointAccessibleController_SyncError::OAuthServerServiceEndpointAccessibleController_SyncError::OAuthServerServiceEndpointsEndpointAccessibleController_SyncError::WellKnownReadyController_SyncError: IngressStateEndpointsDegraded: No subsets found for the endpoints of oauth-server\nOAuthClientsControllerDegraded: no ingress for host oauth-openshift.apps.myocp.localdomain.local in route oauth-openshift in namespace openshift-authentication\nOAuthServerDeploymentDegraded: waiting for the oauth-openshift route to contain an admitted ingress: no admitted ingress for route oauth-openshift in namespace openshift-authentication\nOAuthServerDeploymentDegraded: \nOAuthServerRouteEndpointAccessibleControllerDegraded: route \"openshift-authentication/oauth-openshift\": status does not have a host address\nOAuthServerServiceEndpointAccessibleControllerDegraded: Get \"https://172.30.124.149:443/healthz\": dial tcp 172.30.124.149:443: connect: connection refused\nOAuthServerServiceEndpointsEndpointAccessibleControllerDegraded: oauth service endpoints are not ready\nWellKnownReadyControllerDegraded: failed to get oauth metadata from openshift-config-managed/oauth-openshift ConfigMap: configmap \"oauth-openshift\" not found (check authentication operator, it is supposed to create this)"
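The "no admitted ingress" and "status does not have a host address" messages seem to point at the ingress controller itself, so I can also check whether any router pods were ever scheduled and whether the oauth service has endpoints (a sketch; the route and namespace names are taken verbatim from the error above):

```shell
# Sketch: check the ingress controller and the oauth route/endpoints
# named in the authentication operator error. Assumes the oc client and
# <install-dir>/auth/kubeconfig are available.
KUBECONFIG=${KUBECONFIG:-auth/kubeconfig}
export KUBECONFIG
if command -v oc >/dev/null 2>&1 && [ -f "$KUBECONFIG" ]; then
  # Router pods need schedulable (usually worker) nodes:
  oc get pods -n openshift-ingress -o wide
  oc get ingresscontroller default -n openshift-ingress-operator -o yaml

  # The "connection refused" on 172.30.124.149:443 means the oauth
  # service has no ready endpoints; the route should show why it was
  # never admitted:
  oc get endpoints oauth-openshift -n openshift-authentication
  oc get route oauth-openshift -n openshift-authentication -o yaml
else
  echo "oc client or $KUBECONFIG not found; run this from the install host" >&2
fi
```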
One doubt: do the master nodes have to be able to resolve and reach the vCenter Server and the ESXi hosts by name?
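In case it matters, this is how I could verify name resolution from a master (a sketch; `vcenter.localdomain.local` is a hypothetical name, to be replaced with the vCenter FQDN from my install-config):

```shell
# Sketch: check whether a given vCenter/ESXi name resolves from this
# node. VCENTER is a hypothetical placeholder, not my real hostname.
VCENTER=${VCENTER:-vcenter.localdomain.local}
getent hosts "$VCENTER" || echo "cannot resolve $VCENTER from this node"
```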
I can SSH to the master nodes as the core user with my public key and run any debug/verification commands if needed.
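For example, on a master I could look at the kubelet journal and the running containers with something like this (a sketch; `master-0.myocp.localdomain.local` is a hypothetical hostname, to be replaced with one of my actual masters):

```shell
# Sketch: basic on-node checks over SSH as the core user.
# MASTER is a hypothetical placeholder hostname.
MASTER=${MASTER:-master-0.myocp.localdomain.local}
if ssh -o ConnectTimeout=5 "core@${MASTER}" true 2>/dev/null; then
  # Recent kubelet activity on the node:
  ssh "core@${MASTER}" 'sudo journalctl -u kubelet --since "30 min ago" --no-pager | tail -n 50'
  # Containers actually running on the node:
  ssh "core@${MASTER}" 'sudo crictl ps'
else
  echo "cannot ssh to ${MASTER}; set MASTER to a reachable node" >&2
fi
```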
Thanks in advance,
Gianluca