Tectonic Terraform installer fails on Azure


Kai Timmer

Oct 6, 2017, 12:06:52 PM
to CoreOS User
I attempted a Tectonic cluster installation on Azure. The installer eventually failed with the following message:

null_resource.tectonic (remote-exec): A dependency job for tectonic.service failed. See 'journalctl -xe' for details.
Error applying plan:

1 error(s) occurred:

* null_resource.tectonic: 1 error(s) occurred:

* Script exited with non-zero exit status: 1

After logging in to the master, I get the following output from journalctl:

Oct 06 16:03:52 tectonic-test-master-0 python[963]: 2017/10/06 16:03:52.421590 WARNING Failed to flush firewall
Oct 06 16:03:52 tectonic-test-master-0 kubelet-wrapper[933]: W1006 16:03:52.957218     933 cni.go:189] Unable to update cni config: No networks found in /etc/kubernetes/cni/net.d
Oct 06 16:03:52 tectonic-test-master-0 kubelet-wrapper[933]: E1006 16:03:52.957351     933 kubelet.go:2136] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Oct 06 16:03:55 tectonic-test-master-0 kubelet-wrapper[933]: E1006 16:03:55.675555     933 reflector.go:190] k8s.io/kubernetes/pkg/kubelet/kubelet.go:400: Failed to list *v1.Service: Get https://tectonic-test-api.docker-intl.example.com:443/api/v1/services?resourceVersion=0: dial tcp 13.90.197.190:443: i/o timeout
Oct 06 16:03:55 tectonic-test-master-0 kubelet-wrapper[933]: E1006 16:03:55.678435     933 reflector.go:190] k8s.io/kubernetes/pkg/kubelet/kubelet.go:408: Failed to list *v1.Node: Get https://tectonic-test-api.docker-intl.example.com:443/api/v1/nodes?fieldSelector=metadata.name%3Dtectonic-test-master-0&resourceVersion=0: dial tcp 13.90.197.190:443: i/o timeout
Oct 06 16:03:55 tectonic-test-master-0 kubelet-wrapper[933]: E1006 16:03:55.678448     933 reflector.go:190] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:46: Failed to list *v1.Pod: Get https://tectonic-test-api.docker-intl.example.com:443/api/v1/pods?fieldSelector=spec.nodeName%3Dtectonic-test-master-0&resourceVersion=0: dial tcp 13.90.197.190:443: i/o timeout
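
Every kubelet error above ends in `dial tcp 13.90.197.190:443: i/o timeout`, meaning the master could not even open a TCP connection to the API endpoint. A minimal sketch for testing that directly, assuming a Python 3 interpreter on the host you test from; `can_connect` is a hypothetical helper, not part of Tectonic:

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    A False result corresponds to the "dial tcp ...: i/o timeout" (or a refused
    connection) that the kubelet logs when it cannot reach the API server.
    """
    try:
        # create_connection resolves the name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts (socket.timeout is an OSError subclass),
        # refused connections and DNS resolution failures.
        return False
```

For example, run `can_connect("tectonic-test-api.docker-intl.example.com")` on the master. A False result only says that nothing answered on that address within the timeout; during bootstrapping the API server may simply not be up yet, so it does not by itself distinguish a network or load balancer problem from a control plane that never started.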

What can I do to get Tectonic running?

Regards,
Kai

Rob Szumski

Oct 6, 2017, 3:18:26 PM
to Kai Timmer, CoreOS User
Hmm, looks like a bootstrapping failure. This is sometimes caused by slow S3 download speeds. Can you try restarting the machine and/or the tectonic service?

 - Rob


Kai Timmer

Oct 9, 2017, 5:11:52 AM
to CoreOS User
Hi,
I retried several times, always with the same result. I also tried a terraform destroy and restarted the whole process, but still ended up with this error.

Any other hints?

Thanks,
Kai

Rob Szumski

Oct 11, 2017, 2:42:58 PM
to Kai Timmer, CoreOS User
Are there any crashlooping pods? The installer generates a root kubeconfig that should give you kubectl access. If the API server is not up, something else must be going on.

What does `systemctl status tectonic.service` show in terms of failures in the ExecStartPre section?
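
Rob's first question can be checked mechanically once kubectl works. A minimal sketch, assuming you have run `kubectl get pods --all-namespaces -o json` with the installer-generated kubeconfig and loaded the output with `json.load`; `crashlooping_pods` is a hypothetical helper that reads the standard Kubernetes PodList fields:

```python
from typing import Any, Dict, List, Tuple


def crashlooping_pods(pod_list: Dict[str, Any]) -> List[Tuple[str, str, int]]:
    """Return (namespace, pod name, restart count) for every pod that has a
    container currently waiting in CrashLoopBackOff.

    `pod_list` is the parsed JSON of a Kubernetes PodList.
    """
    result = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        # containerStatuses is absent for pods that have not started yet.
        statuses = pod.get("status", {}).get("containerStatuses") or []
        for cs in statuses:
            waiting = cs.get("state", {}).get("waiting") or {}
            if waiting.get("reason") == "CrashLoopBackOff":
                result.append(
                    (meta.get("namespace", ""), meta.get("name", ""), cs.get("restartCount", 0))
                )
                break  # one entry per pod is enough
    return result
```

If kubectl itself times out, the API server is not reachable and, as Rob notes, the problem lies elsewhere; `systemctl status tectonic.service` on the master is then the better starting point.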

- Rob

Kai Timmer

Oct 17, 2017, 6:52:35 AM
to CoreOS User

Hi again,
so, with the new version of the Tectonic installer (tectonic_1.7.5-tectonic.1) the installer finishes fine, but I still can't get the tectonic service up and running.

At startup, tectonic.service complains about a missing dependency, bootkube.service, which can't be started and writes the following log:
eviction_manager.go:238] eviction manager: unexpected err: failed GetNode: node 'tectonic-test-master-0' not found
cni.go:189] Unable to update cni config: No networks found in /etc/kubernetes/cni/net.d
kubelet.go:2136] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready:
cni.go:189] Unable to update cni config: No networks found in /etc/kubernetes/cni/net.d

It seems the problem is the same as with the old installer, but the new one doesn't wait for the script to return; it just "completes" successfully without verifying that the service actually started.
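
The verification step Kai describes as missing amounts to polling the unit until it is up or a deadline passes. A sketch, assuming Python 3 on the master; `unit_is_active` and `wait_until` are hypothetical helpers, and `systemctl is-active --quiet` is used only for its exit status:

```python
import subprocess
import time
from typing import Callable


def unit_is_active(unit: str) -> bool:
    """True if systemd reports `unit` as active (`systemctl is-active` exits 0)."""
    result = subprocess.run(["systemctl", "is-active", "--quiet", unit])
    return result.returncode == 0


def wait_until(predicate: Callable[[], bool], timeout: float, interval: float = 10.0) -> bool:
    """Poll `predicate` until it returns True (-> True) or `timeout` seconds pass (-> False)."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        time.sleep(min(interval, remaining))
```

For example, `wait_until(lambda: unit_is_active("bootkube.service"), timeout=1800)`; the 30-minute timeout is an arbitrary illustration. This assumes the unit stays active after a successful run: a `Type=oneshot` unit without `RemainAfterExit=yes` reports inactive once it finishes, so check its definition with `systemctl cat` before relying on it.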

Regards,
Kai

ste...@retrogaming.org

Oct 17, 2017, 11:16:35 AM
to CoreOS User
Hi,

Similar problem here:
Oct 17 14:59:05 tectonic-test-master-0 bash[752]: [ 1337.905131] bootkube[5]: Tearing down temporary bootstrap control plane...
Oct 17 14:59:05 tectonic-test-master-0 bash[752]: [ 1337.905198] bootkube[5]: Error: API Server is not ready: timed out waiting for the condition
Oct 17 14:59:05 tectonic-test-master-0 bash[752]: [ 1337.905267] bootkube[5]: Error: API Server is not ready: timed out waiting for the condition
Oct 17 14:59:05 tectonic-test-master-0 bash[752]: [ 1337.905336] bootkube[5]: API Server is not ready: timed out waiting for the condition
Oct 17 14:59:05 tectonic-test-master-0 systemd[1]: bootkube.service: Main process exited, code=exited, status=1/FAILURE
Oct 17 14:59:05 tectonic-test-master-0 systemd[1]: Failed to start Bootstrap a Kubernetes cluster.
Oct 17 14:59:05 tectonic-test-master-0 systemd[1]: bootkube.service: Unit entered failed state.
Oct 17 14:59:05 tectonic-test-master-0 systemd[1]: bootkube.service: Failed with result 'exit-code'.

The Terraform deployment itself was fine:

Apply complete! Resources: 124 added, 0 changed, 0 destroyed.

5 nodes are running: 3 workers, 1 master, 1 etcd.

Cheers

Stefan

Jon Mosco

Oct 19, 2017, 1:05:02 PM
to CoreOS User
Same issue here with VMware and the newest Tectonic installer. Tectonic reports success, but the nodes are not configured correctly and produce similar errors:

Oct 19 17:03:57 tectonic01 kubelet-wrapper[755]: W1019 17:03:57.394651     755 cni.go:189] Unable to update cni config: No networks found in /etc/kubernetes/cni/net.d
Oct 19 17:03:57 tectonic01 kubelet-wrapper[755]: E1019 17:03:57.395274     755 kubelet.go:2136] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Oct 19 17:03:57 tectonic01 kubelet-wrapper[755]: E1019 17:03:57.744746     755 reflector.go:190] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:46: Failed to list *v1.Pod: Get https://controllers.example.com:443/api/v1/pods?fieldSelector=spec.nodeName%3Dtectonic01&resourceVersion=0: EOF

Alex Somesan

Oct 23, 2017, 2:28:13 PM
to CoreOS User
Hello Jon, Kai,

I suspect this is a case of misconfigured or overlapping address ranges.

Could you both please post the contents of your tfvars files here (minus any sensitive information)? At the very least, please include all parameters that specify network address ranges.

Also of interest is the log output of the flannel pods from the same nodes where you saw the tectonic.service failures.
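
Checking for overlapping ranges is a pure address computation. A minimal sketch using Python's standard `ipaddress` module; `overlapping_ranges` is a hypothetical helper, and the VNet variable name in the usage note is an assumption, so check the variables file of your installer version for the real name:

```python
import ipaddress
from itertools import combinations
from typing import Dict, List, Tuple


def overlapping_ranges(ranges: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return every pair of named CIDR blocks whose address ranges overlap.

    Pairs are reported in sorted-name order so the output is deterministic.
    """
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in ranges.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(nets), 2)
        if nets[a].overlaps(nets[b])
    ]
```

For example, `overlapping_ranges({"tectonic_cluster_cidr": "10.2.0.0/16", "tectonic_service_cidr": "10.3.0.0/16", "tectonic_azure_vnet_cidr_block": "10.0.0.0/16"})` returns an empty list, because those three /16 blocks are disjoint. Add any other range the nodes must reach (peered VNets, on-premises networks) to the dictionary as well.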

Thank you!
Alex

Stefan Ernst

Oct 23, 2017, 4:42:06 PM
to CoreOS User
Hi,

So this is my tfvars. I left most of it at the defaults (copied from the example tfvars) and only changed the login/DNS config. I stripped out all the comments, so this is everything that is actually configured:

tectonic_admin_email = "admin@..."
tectonic_admin_password_hash = "...."
tectonic_azure_client_secret = "...."
tectonic_azure_external_dns_zone_id = "/subscriptions/..../resourceGroups/dns/providers/Microsoft.Network/dnszones/my.cloud"
tectonic_azure_location = "eastus"
tectonic_azure_ssh_key = "/Users/.../.ssh/id_rsa_azure.pub"
tectonic_base_domain = "my.cloud"
tectonic_calico_network_policy = false
tectonic_cl_channel = "stable"
tectonic_cluster_cidr = "10.2.0.0/16"
tectonic_cluster_name = "tectonic-test"
tectonic_etcd_count = "0"
tectonic_experimental = false
tectonic_license_path = "/Users/.../tectonic-test/tectonic-license.txt"
tectonic_master_count = "1"
tectonic_pull_secret_path = "/Users/.../tectonic-test/config.json"
tectonic_service_cidr = "10.3.0.0/16"
tectonic_vanilla_k8s = false
tectonic_worker_count = "3"

Cheers

Stefan

Kai Timmer

Oct 24, 2017, 8:39:47 AM
to CoreOS User

Hello,
So this is my config file with all the values that differ from the defaults:

tectonic_admin_email = "m...@email.com"
tectonic_admin_password_hash = "$hashedfoobar"
tectonic_azure_client_secret = "80f5911f-c926-4368-a9f2-9a9c768a836a"
tectonic_azure_cloud_environment = "AZUREPUBLICCLOUD"
tectonic_azure_etcd_storage_type = "Premium_LRS"
tectonic_azure_etcd_vm_size = "Standard_DS2_v2"
tectonic_azure_external_dns_zone_id = "/subscriptions/id/resourceGroups/docker-intl-test/providers/Microsoft.Network/dnszones/docker-intl.mydomain.rocks"
tectonic_azure_external_resource_group = "docker-intl-test"
tectonic_azure_location = "westeurope"
tectonic_azure_master_storage_type = "Premium_LRS"
tectonic_azure_master_vm_size = "Standard_DS2_v2"
tectonic_azure_ssh_key = "~/.ssh/id_rsa.pub"
tectonic_azure_worker_storage_type = "Premium_LRS"
tectonic_azure_worker_vm_size = "Standard_DS2_v2"
tectonic_base_domain = "docker-intl.mydomain.rocks"
tectonic_calico_network_policy = false
tectonic_cl_channel = "stable"
tectonic_cluster_cidr = "10.2.0.0/16"
tectonic_cluster_name = "tectonic-test"
tectonic_etcd_count = "0"
tectonic_experimental = false
tectonic_license_path = "~/tectonic/tectonic-license.txt"
tectonic_master_count = "1"
tectonic_pull_secret_path = "~/tectonic/pull-secret.json"
tectonic_service_cidr = "10.3.0.0/16"
tectonic_vanilla_k8s = false
tectonic_worker_count = "2"

I hope this helps tracking the issue down.

Regards,
Kai


Alex Somesan

Oct 30, 2017, 11:51:40 AM
to CoreOS User
Kai, 

Sorry for the late response.
On a very quick skim through your tfvars I see at least two issues:

1) The value for tectonic_azure_cloud_environment is incorrect. It should be one of the values detailed here: https://www.terraform.io/docs/providers/azurerm/index.html#environment
On the other hand, for the public cloud you don't have to specify it, since that is the default in Terraform.

2) The paths to the various files requested by the installer (SSH key, Tectonic license and pull secret) have to be absolute, not relative. This is a known limitation of Terraform.
So instead of tectonic_azure_ssh_key = "~/.ssh/id_rsa.pub" you actually need to have tectonic_azure_ssh_key = "/path/to/home/directory/.ssh/id_rsa.pub" 
The same goes for tectonic_license_path and tectonic_pull_secret_path.
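
Both points can be caught before running Terraform. A minimal pre-flight sketch, assuming simple one-line `key = "value"` assignments in the tfvars file; the helper names are hypothetical, and the set of environment values is taken from the Terraform azurerm provider documentation linked above, so it may differ between provider versions:

```python
import os
import re
from typing import Iterable, List, Optional

# Point 1: values documented for the azurerm provider's `environment` argument
# (assumption: verify against the docs for the provider version you use).
AZURERM_ENVIRONMENTS = {"public", "usgovernment", "german", "china"}

# Point 2: tfvars keys whose values are file paths.
PATH_KEYS = {"tectonic_azure_ssh_key", "tectonic_license_path", "tectonic_pull_secret_path"}

# Matches a simple one-line string assignment such as: key = "value"
ASSIGNMENT = re.compile(r'^(\s*)(\w+)(\s*=\s*)"([^"]*)"(\s*)$')


def check_cloud_environment(value: Optional[str]) -> Optional[str]:
    """Return an error message for an unsupported value, or None if it is fine.

    None (variable not set) is accepted because the provider defaults to "public".
    """
    if value is None or value in AZURERM_ENVIRONMENTS:
        return None
    return (f"unsupported cloud environment {value!r}; use one of "
            f"{sorted(AZURERM_ENVIRONMENTS)} or leave it unset for the public cloud")


def expand_path_vars(lines: Iterable[str]) -> List[str]:
    """Rewrite path-valued assignments so '~' and relative paths become absolute."""
    out = []
    for line in lines:
        m = ASSIGNMENT.match(line)
        if m and m.group(2) in PATH_KEYS:
            indent, key, eq, value, trail = m.groups()
            absolute = os.path.abspath(os.path.expanduser(value))
            line = f'{indent}{key}{eq}"{absolute}"{trail}'
        out.append(line)
    return out
```

For example, `check_cloud_environment("AZUREPUBLICCLOUD")` returns an error message, while `check_cloud_environment(None)` returns None. Run `expand_path_vars` from the directory you run terraform in, because `os.path.abspath` resolves relative paths against the current working directory.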

Let me know if that helped.

Alex