Deployment delete stuck: "Live nodes" when there aren't any


Bryan Sullivan

Nov 8, 2017, 11:01:47 AM
to cloudify-users
Hi all,
If anyone has ideas on what's happening here and how to avoid this stuck situation, let me know.
In the OPNFV Models project (github mirror), I'm working on scripting an automated deployment/test of Cloudify with various cloud-native control plane stacks, including k8s. During the cleanup steps for a k8s-based app "deployment", the Cloudify Manager thinks there are live nodes remaining when there aren't any. This causes a deadlock that I can only resolve with a forced deployment delete. I script this process via the Cloudify API (see k8s-cloudify.sh), but as a fallback I have to run the cfy command directly on the k8s master node, where the Cloudify CLI is installed (that's issue #2: forced deployment delete is not working via the API).

See below for some output showing the problem. Any advice is appreciated.

+++++
Deployment delete stuck: "Live nodes" when there aren't any

bryan@opnfv-admin:~$ curl -u admin:admin --header 'Tenant: default_tenant' http://$manager_ip/api/v3.1/node-instances | jq -r '.items[0].state'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  4712  100  4712    0     0  38415      0 --:--:-- --:--:-- --:--:-- 38622
deleted
bryan@opnfv-admin:~$ 
bryan@opnfv-admin:~$ curl -X DELETE -u admin:admin --header 'Tenant: default_tenant'  http://$manager_ip/api/v3.1/deployments/k8s-hello-world
{"message": "Can't delete deployment k8s-hello-world - There are live nodes for this deployment. Live nodes ids: nginx_service_w0nyub", "error_code": "dependent_exists_error", "server_traceback": "Traceback (most recent call last):\n  File \"/opt/manager/env/lib/python2.7/site-packages/manager_rest/rest/rest_decorators.py\", line 96, in wrapper\n    return func(*args, **kwargs)\n  File \"/opt/manager/env/lib/python2.7/site-packages/manager_rest/rest/rest_decorators.py\", line 136, in wrapper\n    response = f(*args, **kwargs)\n  File \"/opt/manager/env/lib/python2.7/site-packages/manager_rest/rest/resources_v1/deployments.py\", line 164, in delete\n    deployment_id, bypass_maintenance, args.ignore_live_nodes)\n  File \"/opt/manager/env/lib/python2.7/site-packages/manager_rest/resource_manager.py\", line 337, in delete_deployment\n    ('uninitialized', 'deleted')])))\nDependentExistsError: Can't delete deployment k8s-hello-world - There are live nodes for this deployment. Live nodes ids: nginx_service_w0nyub\n"}bryan@opnfv-admin:~$ 
bryan@opnfv-admin:~$ curl -u admin:admin --header 'Tenant: default_tenant' http://$manager_ip/api/v3.1/node-instances | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  4712  100  4712    0     0  31106      0 --:--:-- --:--:-- --:--:-- 31205
{
  "items": [
    {
      "relationships": [],
      "tenant_name": "default_tenant",
      "runtime_properties": {},
      "created_by": "admin",
      "private_resource": false,
      "state": "deleted",
      "version": 12,
      "host_id": null,
      "deployment_id": "k8s-hello-world",
      "scaling_groups": [],
      "id": "kubernetes_master_mg8s8d",
      "node_id": "kubernetes_master"
    },
    {
      "relationships": [
        {
          "target_name": "kubernetes_master",
          "type": "cloudify.kubernetes.relationships.managed_by_master",
          "target_id": "kubernetes_master_mg8s8d"
        }
      ],
      "tenant_name": "default_tenant",
      "runtime_properties": {
        "kubernetes": {
          "status": {
            "pod_ip": null,
            "start_time": null,
            "host_ip": null,
            "reason": null,
            "conditions": null,
            "phase": "Pending",
            "message": null,
            "container_statuses": null
          },
          "kind": "Pod",
          "spec": {
            "hostname": null,
            "security_context": {
              "run_as_non_root": null,
              "supplemental_groups": null,
              "se_linux_options": null,
              "fs_group": null,
              "run_as_user": null
            },
            "image_pull_secrets": null,
            "service_account_name": "default",
            "termination_grace_period_seconds": 30,
            "restart_policy": "Always",
            "active_deadline_seconds": null,
            "node_name": null,
            "dns_policy": "ClusterFirst",
            "host_ipc": null,
            "service_account": "default",
            "host_pid": null,
            "host_network": null,
            "subdomain": null,
            "volumes": [
              {
                "gce_persistent_disk": null,
                "persistent_volume_claim": null,
                "azure_disk": null,
                "glusterfs": null,
                "flocker": null,
                "aws_elastic_block_store": null,
                "azure_file": null,
                "iscsi": null,
                "flex_volume": null,
                "secret": {
                  "items": null,
                  "default_mode": 420,
                  "secret_name": "default-token-n6xd9"
                },
                "photon_persistent_disk": null,
                "empty_dir": null,
                "fc": null,
                "cinder": null,
                "git_repo": null,
                "rbd": null,
                "name": "default-token-n6xd9",
                "vsphere_volume": null,
                "config_map": null,
                "quobyte": null,
                "nfs": null,
                "host_path": null,
                "cephfs": null,
                "downward_api": null
              }
            ],
            "containers": [
              {
                "image_pull_policy": "IfNotPresent",
                "tty": null,
                "stdin": null,
                "stdin_once": null,
                "name": "nginx",
                "env": null,
                "image": "nginx:1.7.9",
                "args": null,
                "volume_mounts": [
                  {
                    "read_only": true,
                    "sub_path": null,
                    "mount_path": "/var/run/secrets/kubernetes.io/serviceaccount",
                    "name": "default-token-n6xd9"
                  }
                ],
                "lifecycle": null,
                "working_dir": null,
                "command": null,
                "security_context": null,
                "termination_message_path": "/dev/termination-log",
                "readiness_probe": null,
                "liveness_probe": null,
                "ports": [
                  {
                    "host_port": null,
                    "container_port": 80,
                    "protocol": "TCP",
                    "host_ip": null,
                    "name": null
                  }
                ],
                "resources": {
                  "requests": null,
                  "limits": null
                }
              }
            ],
            "node_selector": null
          },
          "api_version": "v1",
          "metadata": {
            "uid": "9d493baf-c434-11e7-a57c-b8ca3a5ecb7c",
            "owner_references": null,
            "generation": null,
            "namespace": "default",
            "labels": {
              "app": "nginx"
            },
            "generate_name": null,
            "deletion_timestamp": null,
            "cluster_name": null,
            "finalizers": null,
            "deletion_grace_period_seconds": null,
            "self_link": "/api/v1/namespaces/default/pods/nginx",
            "resource_version": "4010",
            "creation_timestamp": "2017-11-08T03:26:31Z",
            "annotations": null,
            "name": "nginx"
          }
        }
      },
      "created_by": "admin",
      "private_resource": false,
      "state": "deleted",
      "version": 13,
      "host_id": null,
      "deployment_id": "k8s-hello-world",
      "scaling_groups": [],
      "id": "nginx_pod_9rszi7",
      "node_id": "nginx_pod"
    },
    {
      "relationships": [
        {
          "target_name": "kubernetes_master",
          "type": "cloudify.kubernetes.relationships.managed_by_master",
          "target_id": "kubernetes_master_mg8s8d"
        },
        {
          "target_name": "nginx_pod",
          "type": "cloudify.relationships.depends_on",
          "target_id": "nginx_pod_9rszi7"
        }
      ],
      "tenant_name": "default_tenant",
      "runtime_properties": {
        "kubernetes": {
          "status": {
            "load_balancer": {
              "ingress": null
            }
          },
          "kind": "Service",
          "spec": {
            "cluster_ip": "10.97.1.59",
            "external_i_ps": null,
            "load_balancer_ip": null,
            "external_name": null,
            "deprecated_public_i_ps": null,
            "load_balancer_source_ranges": null,
            "selector": {
              "app": "nginx"
            },
            "type": "NodePort",
            "ports": [
              {
                "protocol": "TCP",
                "node_port": 31569,
                "target_port": 80,
                "port": 80,
                "name": null
              }
            ],
            "session_affinity": "None"
          },
          "api_version": "v1",
          "metadata": {
            "uid": "9fc67f6e-c434-11e7-a57c-b8ca3a5ecb7c",
            "owner_references": null,
            "generation": null,
            "namespace": "default",
            "labels": null,
            "generate_name": null,
            "deletion_timestamp": null,
            "cluster_name": null,
            "finalizers": null,
            "deletion_grace_period_seconds": null,
            "self_link": "/api/v1/namespaces/default/services/nginx-service",
            "resource_version": "4027",
            "creation_timestamp": "2017-11-08T03:26:35Z",
            "annotations": null,
            "name": "nginx-service"
          }
        }
      },
      "created_by": "admin",
      "private_resource": false,
      "state": "deleting",
      "version": 19,
      "host_id": null,
      "deployment_id": "k8s-hello-world",
      "scaling_groups": [],
      "id": "nginx_service_w0nyub",
      "node_id": "nginx_service"
    }
  ],
  "metadata": {
    "pagination": {
      "total": 3,
      "offset": 0,
      "size": 0
    }
  }
}
bryan@opnfv-admin:~$ 

Trammell -

Nov 8, 2017, 12:43:45 PM
to cloudify-users
I see that one node instance is in state "deleting", which is different from "deleted". Most certainly something happened in the delete operation that caused it to fail, or get stuck, before the state in the Cloudify API was updated.


```
      "created_by": "admin",
      "private_resource": false,
      "state": "deleting",
      "version": 19,
      "host_id": null,
      "deployment_id": "k8s-hello-world",
      "scaling_groups": [],
      "id": "nginx_service_w0nyub",
      "node_id": "nginx_service"
```
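
To spot such instances without reading the whole listing, a filter along these lines may help. It is only a sketch: it assumes `jq` is installed and relies on the traceback in the original post, which suggests the manager treats only `uninitialized` and `deleted` as non-live states. `live_instances` is a hypothetical helper name, not part of Cloudify.

```shell
# Hypothetical helper: read a node-instances listing (JSON) on stdin and
# print "<id><TAB><state>" for every instance the manager would likely
# still count as live (assumption: anything not uninitialized/deleted).
live_instances() {
  jq -r '.items[]
         | select(.state != "deleted" and .state != "uninitialized")
         | "\(.id)\t\(.state)"'
}

# Example (same manager_ip, credentials, and tenant as in the original post):
#   curl -s -u admin:admin --header 'Tenant: default_tenant' \
#     "http://$manager_ip/api/v3.1/node-instances" | live_instances
```

Against the listing in the original post, this should print only the `nginx_service_w0nyub` instance with state `deleting`.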

You can update the node-instance state in the API, see: http://docs.getcloudify.org/api/v3/#update-node-instance.
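
A minimal sketch of that update follows. It assumes the v3.1 endpoint accepts the same `PATCH` request that the linked docs describe, and that the `version` field should carry the value currently reported for the instance (19 in the listing above); both are assumptions worth checking against your manager. `state_payload` and `set_instance_state` are hypothetical helper names.

```shell
# Hypothetical helpers; manager_ip, credentials, and tenant mirror the
# commands in the original post.

# Build the JSON body for a node-instance update: <version> <state>.
state_payload() {
  printf '{"version": %d, "state": "%s"}' "$1" "$2"
}

# PATCH one node instance: <instance-id> <version> <state>.
# Assumption: omitting runtime_properties leaves them unchanged; verify
# this on your manager version before relying on it.
set_instance_state() {
  curl -s -X PATCH -u admin:admin \
    --header 'Tenant: default_tenant' \
    --header 'Content-Type: application/json' \
    -d "$(state_payload "$2" "$3")" \
    "http://$manager_ip/api/v3.1/node-instances/$1"
}

# Example (not run automatically): mark the stuck instance as deleted,
# then retry the deployment delete.
#   set_instance_state nginx_service_w0nyub 19 deleted
#   curl -X DELETE -u admin:admin --header 'Tenant: default_tenant' \
#     "http://$manager_ip/api/v3.1/deployments/k8s-hello-world"
```

Note that this only changes what the manager has recorded. It does not remove the Kubernetes service itself, so it is worth confirming on the cluster that the resource is really gone first.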

Why the deployment isn't getting updated is a separate problem.
