Deploying a cluster with kubevirt-ansible


Ihar Hrachyshka

Nov 27, 2018, 2:58:25 PM
to kubevi...@googlegroups.com
Hi,

I am trying to use kubevirt-ansible to deploy an all-in-one single-node
cluster with KubeVirt (not reusing an existing cluster). I would
prefer plain Kubernetes, but OpenShift may also work for my needs.

What I am doing is this:

0. Install the Galaxy deps: ansible-galaxy install -p $HOME/galaxy-roles -r requirements.yml && export ANSIBLE_ROLES_PATH=$HOME/galaxy-roles
1. Give the node a name in /etc/hosts (kubedev) for its IP address.
2. Configure password-less SSH login to the node.
3. Specify 'kubedev' in the following sections of the inventory file:
masters, etcd, nodes, nfs (the resulting inventory is sketched below).
4. ansible-playbook -i inventory playbooks/cluster/kubernetes/config.yml
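
For reference, the inventory I end up with looks roughly like this (a
sketch, assuming the group names from step 3 are all the single node needs
to be listed under; kubedev is the /etc/hosts alias from step 1):

# all-in-one inventory: the single node plays every role
cat > inventory <<'EOF'
[masters]
kubedev

[etcd]
kubedev

[nodes]
kubedev

[nfs]
kubedev
EOF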

The playbook run fails with:

TASK [kubernetes-master : deploy kubernetes]
*******************************************************************************************************************************************
Tuesday 27 November 2018 11:24:27 -0800 (0:00:02.423) 0:00:54.262 ******
fatal: [kubedev]: FAILED! => {
"changed": true,
"cmd": "/root/deploy_kubernetes.sh | grep \"kubeadm join\"",
"delta": "0:02:15.227363",
"end": "2018-11-27 11:26:43.711808",
"rc": 1,
"start": "2018-11-27 11:24:28.484445"
}

STDERR:

[WARNING Port-6443]: Port 6443 is in use
[WARNING Port-10251]: Port 10251 is in use
[WARNING Port-10252]: Port 10252 is in use
[WARNING FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
[WARNING FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
[WARNING FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
[WARNING FileAvailable--etc-kubernetes-manifests-etcd.yaml]: /etc/kubernetes/manifests/etcd.yaml already exists
[WARNING Port-10250]: Port 10250 is in use
[WARNING Port-2379]: Port 2379 is in use
[WARNING DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
error marking master: timed out waiting for the condition
Error from server (AlreadyExists):
clusterrolebindings.rbac.authorization.k8s.io "add-on-cluster-admin"
already exists
Error from server (AlreadyExists):
clusterrolebindings.rbac.authorization.k8s.io "add-on-default-admin"
already exists
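
Judging by the preflight warnings, the node still carries state from an
earlier kubeadm run (static pod manifests under /etc/kubernetes/manifests,
a non-empty /var/lib/etcd, control-plane ports already bound). My working
assumption is that this is leftover from a previous failed attempt, and
that wiping it before re-running the playbook is the way to at least get
past the warnings; a rough sketch (not something the playbook does for you):

# tear down whatever the previous kubeadm attempt left behind
# (newer kubeadm versions ask for confirmation before wiping)
sudo kubeadm reset
# confirm the leftovers the preflight checks complained about are gone
sudo ls /etc/kubernetes/manifests /var/lib/etcd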

The kubelet journal suggests some issue with the certificates:

Nov 27 11:40:51 kubedev dockerd-current[7820]: E1127 19:40:51.666783
1 authentication.go:62] Unable to authenticate the request due to
an error: [x509: certificate signed by unknown authority, x509:
certificate signed by unknown authority]
Nov 27 11:40:51 kubedev kubelet[14804]: E1127 11:40:51.667243 14804
reflector.go:134]
k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list
*v1.Pod: Unauthorized
Nov 27 11:40:51 kubedev dockerd-current[7820]: E1127 19:40:51.667516
1 authentication.go:62] Unable to authenticate the request due to
an error: [x509: certificate signed by unknown authority, x509:
certificate signed by unknown authority]
Nov 27 11:40:51 kubedev kubelet[14804]: E1127 11:40:51.667845 14804
reflector.go:134] k8s.io/kubernetes/pkg/kubelet/kubelet.go:442: Failed
to list *v1.Service: Unauthorized
Nov 27 11:40:51 kubedev dockerd-current[7820]: E1127 19:40:51.668635
1 authentication.go:62] Unable to authenticate the request due to
an error: [x509: certificate signed by unknown authority, x509:
certificate signed by unknown authority]
Nov 27 11:40:51 kubedev kubelet[14804]: E1127 11:40:51.668963 14804
reflector.go:134] k8s.io/kubernetes/pkg/kubelet/kubelet.go:451: Failed
to list *v1.Node: Unauthorized
Nov 27 11:40:51 kubedev kubelet[14804]: E1127 11:40:51.718960 14804
kubelet.go:2236] node "kubedev" not found

Ideas on what's going on here?
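
For what it's worth, one way to sanity-check the "certificate signed by
unknown authority" part is to verify the kubelet client certificate
against the cluster CA. The paths below are the kubeadm defaults and an
assumption on my side:

# assumption: default kubeadm locations for the cluster CA and the
# rotated kubelet client certificate
sudo openssl verify -CAfile /etc/kubernetes/pki/ca.crt \
    /var/lib/kubelet/pki/kubelet-client-current.pem
# compare issuer/subject of the apiserver serving cert as well
sudo openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -issuer -subject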

After that, I tried the OpenShift route, but I am stuck on incomplete
instructions in README.md. Here is what I do.

1. Again, configure password-less SSH and /etc/hosts.
2. For the inventory file, do the same, except in the nodes section I use
the example from the README: kubedev openshift_node_labels="{'region':
'infra','zone': 'default'}" openshift_schedulable=true
(the resulting inventory is sketched below).
3. ansible-playbook -i inventory -e@vars/all.yml playbooks/cluster/openshift/config.yml
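
Spelled out, the inventory for this attempt is the same single node in
every group, with only the nodes line changed to the README example;
roughly (again a sketch, assuming the stock inventory template):

cat > inventory <<'EOF'
[masters]
kubedev

[etcd]
kubedev

[nodes]
kubedev openshift_node_labels="{'region': 'infra','zone': 'default'}" openshift_schedulable=true

[nfs]
kubedev
EOF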

The playbook run fails because openshift-ansible is not checked out yet, which makes sense:

ERROR! Unable to retrieve file contents
Could not find or access
'/home/ihar/kubevirt-ansible/playbooks/cluster/openshift/openshift-ansible/playbooks/prerequisites.yml'

So I check out the repo:

git clone -b release-3.10 https://github.com/openshift/openshift-ansible

Now, I should probably somehow register the new Ansible roles, so I tried this:

export ANSIBLE_ROLES_PATH=$HOME/galaxy-roles:./openshift-ansible/roles/

It still gives me the same error about the missing prerequisites.yml file.
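
Looking at the path in the error message, the playbook imports
prerequisites.yml by file path relative to playbooks/cluster/openshift/,
not through the roles path, so my assumption is that ANSIBLE_ROLES_PATH
cannot help here and the checkout has to live inside the kubevirt-ansible
tree, i.e. something like:

# assumption: clone openshift-ansible at the exact path Ansible printed
# in the "Unable to retrieve file contents" error
cd ~/kubevirt-ansible
git clone -b release-3.10 https://github.com/openshift/openshift-ansible \
    playbooks/cluster/openshift/openshift-ansible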

Now, perhaps I am just missing some basic background about how Ansible
roles and playbooks are installed and imported, but I am not sure that I,
as a KubeVirt developer, should need to care about these details. At the
very least, the instructions should guide me, ideally with a one-liner to
deploy an all-in-one cluster. Granted, for OpenShift the README also
suggests attaching and configuring an extra disk for Docker storage and
updating the container runtime settings, which I did not do (again,
because I am missing the basic information about how to do it right). But
the Kubernetes option doesn't suggest any such changes, and for OpenShift
the error I receive doesn't seem related to a wrong CRI configuration (it
fails at the very start, before deploying a single container).

Thoughts on what I am doing wrong?

Ihar

Ihar Hrachyshka

Nov 27, 2018, 5:39:20 PM
to kubevi...@googlegroups.com
OK, looks like deploying a Kubernetes cluster is known to be broken:
https://github.com/kubevirt/kubevirt-ansible/issues/455

I also found that there were, at one point, instructions on how to plug in
openshift-ansible, but the info was lost in a README refactoring:
https://github.com/kubevirt/kubevirt-ansible/pull/111
(I am restoring this info here:
https://github.com/kubevirt/kubevirt-ansible/pull/488)

So, with this knowledge, I repeated the deployment, and long into it
(further than I ever got before, which is great) I hit this
error:

TASK [openshift_control_plane : Check for
apiservices/v1beta1.metrics.k8s.io registration]
***************************************************************************************************************************************************
Tuesday 27 November 2018 14:27:24 -0800 (0:00:00.178) 0:10:50.462 ******
FAILED - RETRYING: Check for apiservices/v1beta1.metrics.k8s.io
registration (30 retries left).
...
FAILED - RETRYING: Check for apiservices/v1beta1.metrics.k8s.io
registration (1 retries left).
fatal: [kubedev]: FAILED! => {
"attempts": 30,
"changed": true,
"cmd": [
"oc",
"get",
"apiservices/v1beta1.metrics.k8s.io"
],
"delta": "0:00:00.167471",
"end": "2018-11-27 14:30:12.507846",
"failed_when_result": true,
"rc": 1,
"start": "2018-11-27 14:30:12.340375"
}

STDERR:

error: the server doesn't have a resource type "apiservices"

And if I use the master kubeconfig, then indeed the apiservice is not
registered on the cluster:

[ihar@kubedev kubevirt-ansible]$ oc get apiservices/v1beta1.metrics.k8s.io
Error from server (NotFound): apiservices.apiregistration.k8s.io
"v1beta1.metrics.k8s.io" not found

Thoughts?
Ihar

Ihar Hrachyshka

Nov 29, 2018, 12:05:26 PM
to kubevi...@googlegroups.com
For the record, I reported a formal issue for the OpenShift path here:
https://github.com/kubevirt/kubevirt-ansible/issues/494

To recap, the plain Kubernetes deployment is broken too (both
deploying a new cluster and reusing an existing one):

https://github.com/kubevirt/kubevirt-ansible/issues/455
https://github.com/kubevirt/kubevirt-ansible/issues/4558

This completely blocks me from working on SR-IOV installer
integration. I would love to see some more traction on these issues.

Ihar

Ihar Hrachyshka

Nov 29, 2018, 1:40:35 PM
to kubevi...@googlegroups.com
Sebastian and I are trying to fix the repo to the point where we can
actually deploy on an existing cluster. See some fixes here:
https://github.com/booxter/kubevirt-ansible/tree/fix-kubevirt-ansible

This did NOT bring me to success, but at least it now tries to deploy
Multus. It still fails, now apparently because the procedure broke
networking on the cluster, so Ansible can no longer reach the Kubernetes
API to check whether the Multus pod is up:

# ansible-playbook -i inventory playbooks/kubevirt.yml -e@vars/all.yml -e cluster=kubernetes
...
fatal: [kubedev]: FAILED! => {
"attempts": 20,
"changed": true,
"cmd": "kubectl -n kube-system get daemonset | grep
kube-multus-amd64 | awk '{ if ($3 == $4) print \"0\"; else print
\"1\"}'",
"delta": "0:00:00.094178",
"end": "2018-11-29 20:30:20.185225",
"rc": 0,
"start": "2018-11-29 20:30:20.091047"
}

STDERR:

The connection to the server 10.35.19.194:6443 was refused - did you
specify the right host or port?

My kubectl is broken after the call.
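
Since the failure mode is the API server itself becoming unreachable on
10.35.19.194:6443, the first things I check on the node are generic,
nothing kubevirt-ansible specific: whether anything still listens on the
port and whether the control-plane containers are still up:

# is anything still listening on the apiserver port?
sudo ss -tlnp | grep 6443
# are the static control-plane pods still running? (docker-based node in my case)
sudo docker ps | grep -E 'kube-apiserver|kube-controller|kube-scheduler'
# the kubelet log usually says why the apiserver pod went away
sudo journalctl -u kubelet --since '10 min ago' | tail -n 50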

Ihar

Ihar Hrachyshka

Nov 30, 2018, 4:16:21 PM
to kubevi...@googlegroups.com
And to wrap up the thread: with help from Sebastian and some more
effort, I was able to deploy KubeVirt on an existing Kubernetes cluster.
The fixes needed to get the installation through are in the following PR:
https://github.com/kubevirt/kubevirt-ansible/pull/497