Kubernetes Plugin 1.2 Agents unable to establish connection to master, but log they can

Brandon

Feb 20, 2018, 12:02:22 AM
to jenkins...@googlegroups.com

Hello, I am trying to use the community helm chart for Jenkins and the Kubernetes plugin 1.2 so that agents can be dynamically provisioned according to build needs.

Ultimately, builds are failing because the agents are not able to connect to the master. However, both the agent log and the master log indicate that a connection was established. What is going on? How can I debug this further?

This setup uses the default agent container, and the Jenkins URL and tunnel are resolved through the services in the jenkins namespace.

k get svc -n jenkins
NAME                    TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
jenkins-jenkins         NodePort    10.233.45.129   <none>        8080:32080/TCP   7d
jenkins-jenkins-agent   ClusterIP   10.233.11.112   <none>        50000/TCP        7d


#master log
INFO: Created Pod: default-37fj9 in namespace jenkins
Feb 12, 2018 11:16:32 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
INFO: Waiting for Pod to be scheduled (0/100): default-37fj9
Feb 12, 2018 11:16:38 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
INFO: Waiting for agent to connect (0/100): default-37fj9
Feb 12, 2018 11:16:39 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
INFO: Waiting for agent to connect (1/100): default-37fj9
Feb 12, 2018 11:16:40 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
INFO: Waiting for agent to connect (2/100): default-37fj9
Feb 12, 2018 11:16:41 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
INFO: Waiting for agent to connect (3/100): default-37fj9
Feb 12, 2018 11:16:42 PM hudson.TcpSlaveAgentListener$ConnectionHandler run
INFO: Accepted JNLP4-connect connection #3 from /10.233.108.132:38682
Feb 12, 2018 11:16:42 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
INFO: Waiting for agent to connect (4/100): default-37fj9
Feb 12, 2018 11:16:43 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
INFO: Waiting for agent to connect (5/100): default-37fj9
Feb 12, 2018 11:16:44 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
INFO: Waiting for agent to connect (6/100): default-37fj9



#agent logs
INFO: Locating server among [http://jenkins-jenkins.jenkins.svc.cluster.local:8080/]
Feb 12, 2018 11:16:41 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve
INFO: Remoting server accepts the following protocols: [JNLP4-connect, JNLP-connect, Ping, JNLP2-connect]
Feb 12, 2018 11:16:41 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Agent discovery successful
  Agent address: jenkins-jenkins-agent.jenkins.svc.cluster.local
  Agent port:    50000
  Identity:      5d:e6:82:6c:06:4a:ef:96:ae:47:09:b2:cc:58:e8:0a
Feb 12, 2018 11:16:41 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
Feb 12, 2018 11:16:41 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to jenkins-jenkins-agent.jenkins.svc.cluster.local:50000
Feb 12, 2018 11:16:41 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Trying protocol: JNLP4-connect
Feb 12, 2018 11:16:43 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Remote identity confirmed: 5d:e6:82:6c:06:4a:ef:96:ae:47:09:b2:cc:58:e8:0a
Feb 12, 2018 11:16:55 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connected

-- 
Brandon Pinske
Site Reliability Engineer
Rabb.it

bra...@rabb.it

Feb 20, 2018, 2:10:20 AM
to Jenkins Users


bra...@rabb.it

Feb 21, 2018, 3:25:00 PM
to Jenkins Users
I've downgraded to the 1.2 version of the plugin and gone through the debug steps listed here https://github.com/jenkinsci/kubernetes-plugin

No luck. 

Carlos Sanchez

Feb 22, 2018, 3:04:31 AM
to Jenkins Users
That log suggests that the slave is not "online" as far as Jenkins is concerned. If you go to the Jenkins nodes page under /computer/, what do you see? Maybe the slave is offline for some reason, such as lack of disk space.
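
The information shown on the /computer/ page is also available as JSON, which can be easier to inspect from a script. A minimal Python sketch, assuming the standard Jenkins remote API at `/computer/api/json` and its `computer`, `displayName`, `offline` and `offlineCauseReason` fields; the sample payload and helper names are hypothetical and not taken from this thread:

```python
import json
from urllib.request import urlopen


def offline_agents(computer_api: dict) -> list:
    """Return (name, reason) pairs for every agent Jenkins reports as offline."""
    result = []
    for node in computer_api.get("computer", []):
        if node.get("offline"):
            reason = node.get("offlineCauseReason") or "no reason reported"
            result.append((node.get("displayName", "?"), reason))
    return result


def fetch_computer_api(jenkins_url: str) -> dict:
    """Fetch /computer/api/json; assumes anonymous read access (add auth otherwise)."""
    with urlopen(jenkins_url.rstrip("/") + "/computer/api/json") as resp:
        return json.load(resp)


# Hypothetical payload for illustration only.
sample = {
    "computer": [
        {"displayName": "master", "offline": False, "offlineCauseReason": ""},
        {"displayName": "default-37fj9", "offline": True,
         "offlineCauseReason": "Disk space is too low"},
    ]
}
print(offline_agents(sample))
```

Against a live instance, `offline_agents(fetch_computer_api("http://jenkins-jenkins.jenkins.svc.cluster.local:8080"))` would list the agents Jenkins considers offline along with the stated cause, if any.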

--
You received this message because you are subscribed to the Google Groups "Jenkins Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-users+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/jenkinsci-users/b17a41a6-25f2-4aff-906d-b53879d33f0e%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

bra...@rabb.it

Feb 22, 2018, 4:15:14 AM
to Jenkins Users
I'm now seeing that the workers are being OOMKilled at some point. That doesn't make sense to me, because all of my Kubernetes workers have more than 50 GB free and this pod only requires 256 MB.

Name:         jnlp-1sh50
Namespace:    jenkins-test
Node:         kubeworker-rwva1-prod-10/10.0.0.217
Start Time:   Thu, 22 Feb 2018 01:08:20 -0800
Labels:       jenkins=slave
              jenkins/jenkins-test-jenkins-slave=true
Annotations:  <none>
Status:       Failed
IP:           10.233.115.31
Containers:
  jnlp:
    Container ID:  docker://4cf893a70ad3d074b9708f076baef9600b0abfa9d5726d41711ccb39adcc9814
    Image:         jenkins/jnlp-slave:3.10-1
    Image ID:      docker-pullable://jenkins/jnlp-slave@sha256:db1cb9e803fe2aeb440435cf0da4195b63685664c5976d2fefead839631d070e
    Port:          <none>
    Args:
      2200e9b3a7435e57a10c48570b0afe4210cbc87a910766503b5eace2d4f32a86
      jnlp-1sh50
    State:          Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Thu, 22 Feb 2018 01:08:21 -0800
      Finished:     Thu, 22 Feb 2018 01:09:10 -0800
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     200m
      memory:  256Mi
    Requests:
      cpu:     200m
      memory:  256Mi
    Environment:
      JENKINS_SECRET:  2200e9b3a7435e57a10c48570b0afe4210cbc87a910766503b5eace2d4f32a86
      JENKINS_TUNNEL:  jenkins-test-jenkins-agent:50000
      JENKINS_NAME:    jnlp-1sh50
      JENKINS_URL:     http://jenkins-test-jenkins:8080
      HOME:            /home/jenkins
    Mounts:
      /home/jenkins from workspace-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-dtwdj (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  workspace-volume:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  default-token-dtwdj:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-dtwdj
    Optional:    false
QoS Class:       Guaranteed
Node-Selectors:  <none>
Tolerations:     <none>
Events:
  Type    Reason                 Age   From                               Message
  ----    ------                 ----  ----                               -------
  Normal  Scheduled              2m    default-scheduler                  Successfully assigned jnlp-1sh50 to kubeworker-rwva1-prod-10
  Normal  SuccessfulMountVolume  2m    kubelet, kubeworker-rwva1-prod-10  MountVolume.SetUp succeeded for volume "workspace-volume"
  Normal  SuccessfulMountVolume  2m    kubelet, kubeworker-rwva1-prod-10  MountVolume.SetUp succeeded for volume "default-token-dtwdj"
  Normal  Pulled                 2m    kubelet, kubeworker-rwva1-prod-10  Container image "jenkins/jnlp-slave:3.10-1" already present on machine
  Normal  Created                2m    kubelet, kubeworker-rwva1-prod-10  Created container
  Normal  Started                2m    kubelet, kubeworker-rwva1-prod-10  Started container
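
Exit code 137 in the output above follows the usual convention of 128 plus the signal number, so it corresponds to signal 9 (SIGKILL), which is the signal the kernel's OOM killer sends. A small Python sketch of that decoding; the helper name is made up for illustration, and it assumes a POSIX system where signal 9 is defined:

```python
import signal


def describe_exit_code(code: int) -> str:
    """Explain a container exit code using the 128 + signal-number convention."""
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with status {code}"


print(describe_exit_code(137))
```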


bra...@rabb.it

Feb 22, 2018, 4:19:03 AM
to Jenkins Users
Disk space and memory utilization are extremely low on all of my hosts, so this is not real resource exhaustion. Judging by the timeline, I suspect the OOMKilled state may be a result of Kubernetes terminating the pod after the disconnection.


INFO: Excess workload after pending Spot instances: 1
Feb 22, 2018 9:05:10 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision
INFO: Template: Kubernetes Pod Template
Feb 22, 2018 9:05:10 AM okhttp3.internal.platform.Platform log
INFO: ALPN callback dropped: HTTP/2 is disabled. Is alpn-boot on the boot class path?
Feb 22, 2018 9:05:10 AM hudson.slaves.NodeProvisioner$StandardStrategyImpl apply
INFO: Started provisioning Kubernetes Pod Template from kubernetes with 1 executors. Remaining excess workload: 0
Feb 22, 2018 9:05:20 AM hudson.slaves.NodeProvisioner$2 run
INFO: Kubernetes Pod Template provisioning successfully completed. We have now 2 computer(s)
Feb 22, 2018 9:05:20 AM okhttp3.internal.platform.Platform log
INFO: ALPN callback dropped: HTTP/2 is disabled. Is alpn-boot on the boot class path?
Feb 22, 2018 9:05:20 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
INFO: Created Pod: jnlp-t2c36 in namespace jenkins-test
Feb 22, 2018 9:05:20 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
INFO: Waiting for Pod to be scheduled (0/100): jnlp-t2c36
Feb 22, 2018 9:05:26 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
INFO: Waiting for Pod to be scheduled (1/100): jnlp-t2c36
Feb 22, 2018 9:05:32 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
INFO: Waiting for Pod to be scheduled (2/100): jnlp-t2c36
Feb 22, 2018 9:05:38 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
INFO: Waiting for Pod to be scheduled (3/100): jnlp-t2c36
Feb 22, 2018 9:05:44 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
INFO: Waiting for Pod to be scheduled (4/100): jnlp-t2c36
Feb 22, 2018 9:05:47 AM hudson.model.Descriptor verifyNewInstance
WARNING: Father of ContainerEnvVar [getValue()=http://jenkins-test-jenkins:8080, getKey()=JENKINS_URL] and its getDescriptor() points to two different instances. Probably malplaced @Extension. See http://hudson.361315.n4.nabble.com/Help-Hint-needed-Post-build-action-doesn-t-stay-activated-td2308833.html
Feb 22, 2018 9:05:50 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
INFO: Waiting for Pod to be scheduled (5/100): jnlp-t2c36
Feb 22, 2018 9:05:56 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
INFO: Waiting for Pod to be scheduled (6/100): jnlp-t2c36
Feb 22, 2018 9:06:02 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
INFO: Waiting for Pod to be scheduled (7/100): jnlp-t2c36
Feb 22, 2018 9:06:08 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
INFO: Waiting for Pod to be scheduled (8/100): jnlp-t2c36
Feb 22, 2018 9:06:14 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
INFO: Waiting for Pod to be scheduled (9/100): jnlp-t2c36
Feb 22, 2018 9:06:20 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
INFO: Waiting for Pod to be scheduled (10/100): jnlp-t2c36
Feb 22, 2018 9:06:26 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
INFO: Waiting for Pod to be scheduled (11/100): jnlp-t2c36
Feb 22, 2018 9:06:32 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
INFO: Waiting for Pod to be scheduled (12/100): jnlp-t2c36
Feb 22, 2018 9:06:38 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
INFO: Waiting for Pod to be scheduled (13/100): jnlp-t2c36
Feb 22, 2018 9:06:44 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
INFO: Waiting for Pod to be scheduled (14/100): jnlp-t2c36
Feb 22, 2018 9:06:50 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
INFO: Waiting for Pod to be scheduled (15/100): jnlp-t2c36
Feb 22, 2018 9:06:56 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
INFO: Waiting for Pod to be scheduled (16/100): jnlp-t2c36
Feb 22, 2018 9:07:02 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
INFO: Waiting for Pod to be scheduled (17/100): jnlp-t2c36
Feb 22, 2018 9:07:08 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
INFO: Waiting for Pod to be scheduled (18/100): jnlp-t2c36
Feb 22, 2018 9:07:14 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
INFO: Waiting for Pod to be scheduled (19/100): jnlp-t2c36
Feb 22, 2018 9:07:20 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
INFO: Waiting for Pod to be scheduled (20/100): jnlp-t2c36
Feb 22, 2018 9:07:26 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
INFO: Waiting for Pod to be scheduled (21/100): jnlp-t2c36
Feb 22, 2018 9:07:32 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
INFO: Waiting for Pod to be scheduled (22/100): jnlp-t2c36
Feb 22, 2018 9:07:38 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
INFO: Waiting for Pod to be scheduled (23/100): jnlp-t2c36
Feb 22, 2018 9:07:44 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
INFO: Waiting for Pod to be scheduled (24/100): jnlp-t2c36
Feb 22, 2018 9:07:50 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
INFO: Waiting for Pod to be scheduled (25/100): jnlp-t2c36
Feb 22, 2018 9:07:56 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
INFO: Waiting for Pod to be scheduled (26/100): jnlp-t2c36
Feb 22, 2018 9:08:02 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
INFO: Waiting for Pod to be scheduled (27/100): jnlp-t2c36
Feb 22, 2018 9:08:03 AM hudson.slaves.CloudRetentionStrategy check
INFO: Disconnecting jnlp-t2c36
Feb 22, 2018 9:08:03 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
INFO: Terminating Kubernetes instance for agent jnlp-t2c36
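
To line the events up, the timestamps in the master log can be parsed directly. A rough Python sketch, assuming the date format seen in the log above and an English locale; the helper name is hypothetical:

```python
from datetime import datetime

LOG_TIME_FORMAT = "%b %d, %Y %I:%M:%S %p"


def parse_log_time(line: str) -> datetime:
    """Parse the leading timestamp of a Jenkins log header line."""
    # The timestamp is the first five whitespace-separated fields,
    # e.g. "Feb 22, 2018 9:05:20 AM".
    stamp = " ".join(line.split()[:5])
    return datetime.strptime(stamp, LOG_TIME_FORMAT)


created = parse_log_time(
    "Feb 22, 2018 9:05:20 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch"
)
terminated = parse_log_time(
    "Feb 22, 2018 9:08:03 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate"
)
print((terminated - created).total_seconds())  # 163.0
```

In this log the agent was terminated 163 seconds after the pod was created, while the launcher was still reporting that it was waiting for the pod to be scheduled.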

Carlos Sanchez

Feb 22, 2018, 5:59:49 AM
to Jenkins Users
OOMKilled has nothing to do with the host's resources; it means the process in the container used more memory than the container's limit allows.
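
To make that concrete: the pod above has a 256Mi limit, while a JVM that is not container-aware typically sizes its default maximum heap from the host's physical memory (commonly one quarter of it). A back-of-the-envelope Python sketch; the 64Gi host size and the one-quarter rule are assumptions for illustration, not values measured in this thread, and whether the Java version in this image behaves that way is not confirmed here:

```python
UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def parse_quantity(q: str) -> int:
    """Convert a Kubernetes binary memory quantity such as '256Mi' to bytes."""
    for suffix, factor in UNITS.items():
        if q.endswith(suffix):
            return int(q[: -len(suffix)]) * factor
    return int(q)  # plain bytes


limit = parse_quantity("256Mi")        # container limit from the pod description
host_memory = parse_quantity("64Gi")   # hypothetical host size
default_max_heap = host_memory // 4    # assumed JVM default: 1/4 of physical RAM

print(limit, default_max_heap, default_max_heap > limit)
```

If that is what is happening, the usual remedies are to raise the container's memory limit or to cap the heap explicitly with `-Xmx`.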


anilk...@mwebware.com

Sep 3, 2018, 2:14:21 PM
to Jenkins Users
Hi all, can anyone please help with the following error?
Error from server (Forbidden): pods is forbidden: User "system:anonymous" cannot list pods in the namespace "default"

Setup Kubernetes CLI (kubectl)
Options:
Kubernetes server endpoint -> server from config (ex: https://ip:6443)
Certificate of certificate authority -> authority data
Credentials -> System Credentials (is that correct?)
I provided the above details in the Jenkins Kubernetes CLI setup.