ISSUE "no nodes available to schedule pods" WHEN submitting jobs using "kubectl"

bhabani nayak

Jul 6, 2017, 6:16:18 PM
to Seldon Users, Manoranjan Jena
Hello,

I have installed Seldon on my Linux machine to explore the recommendation example (Movielens 100K). 
The "ml100k-import-6wt31" pod does not start: it stays Pending with a "FailedScheduling" warning 
and the message "no nodes available to schedule pods". 

I do not think it is a memory problem, as the system has sufficient free memory. I suspect that the 
pods and services inside the minikube node are not able to communicate with each other due to a DNS 
or IP issue. 

I would appreciate your help fixing this. Below are the steps I took and the logs.

Logs attached:
- kubectl describe pod ("ml100k-import-6wt31")
- minikube nodes
- minikube pods
- minikube services
- minikube jobs
- minikube service url
- curl response from seldon server
- nslookup output
- Linux system information
- seldon Server log


What I have done so far

1) Installed seldon by following the document http://docs.seldon.io/install.html

1.1) For single-machine exploration I installed minikube.

2) Started minikube with 12GB of memory
minikube start --memory=12000

3) Tried to set up the recommendation "Movielens 100K Worked Example" by following the document http://docs.seldon.io/ml100k.html 
(kubectl create -f ml100k-import.json)

4) Searched the internet for a fix, but had no luck.


------------------------------- ISSUE -------------------------------

When I create the Kubernetes job (kubectl create -f ml100k-import.json) to download the Movielens data, I see a "FailedScheduling" warning with the message "no nodes available to schedule pods", even though the minikube node is available. The ml100k pod has been in the Pending state for 20h.

ml100k-import-6wt31                          0/1       Pending   0          20h
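
To pull just the stuck pods out of a longer listing, the STATUS column can be filtered. This is a minimal sketch: the helper name pending_pods is made up for illustration, and it assumes the default column layout of `kubectl get pods` (NAME, READY, STATUS, ...).

```shell
# Hypothetical helper: print the names of pods whose STATUS column is Pending.
# Reads `kubectl get pods` output on stdin; assumes STATUS is the third column.
pending_pods() {
  awk '$3 == "Pending" { print $1 }'
}

# Usage against a live cluster (not run here):
#   kubectl get pods | pending_pods
#   kubectl get pods | pending_pods | xargs -n1 kubectl describe pod
```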



---------- kubectl describe pod ml100k-import-6wt31 ------------

root@sprod:~# kubectl describe pod ml100k-import-6wt31
Name: ml100k-import-6wt31
Namespace: default
Node: <none>
Labels: controller-uid=5c869f84-61e4-11e7-bc4b-525400d2a1fd
job-name=ml100k-import
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"Job","namespace":"default","name":"ml100k-import","uid":"5c869f84-61e4-11e7-bc4b-525400d2a1fd","apiVersion...
Status: Pending
IP:
Created By: Job/ml100k-import
Containers:
  ml100k-create:
    Image: seldonio/examples-ml100k:2.2.5_v2
    Port: <none>
    Command:
      /create_ml100k_recommender.sh
    Environment:
      GRAFANA_ADMIN_PASSWORD: <set to the key 'grafana-admin-password.txt' in secret 'grafana-admin-password'> Optional: false
    Mounts:
      /seldon-data from data-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-jh0v8 (ro)
Conditions:
  Type Status
  PodScheduled False
Volumes:
  data-volume:
    Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName: seldon-claim
    ReadOnly: false
  default-token-jh0v8:
    Type: Secret (a volume populated by a Secret)
    SecretName: default-token-jh0v8
    Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: <none>
Events:
  FirstSeen LastSeen Count From SubObjectPath Type Reason Message
  --------- -------- ----- ---- ------------- -------- ------ -------
  20h 9s 4175 default-scheduler Warning FailedScheduling no nodes available to schedule pods
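
The describe output lists the seldon-claim PersistentVolumeClaim as a volume but does not say whether the claim is bound. Checking that is a cheap extra data point; whether it is related to the scheduling warning is an assumption, not something these logs establish. A minimal sketch, with a hypothetical helper unbound_claims that assumes the default `kubectl get pvc` table layout (NAME, STATUS, ...):

```shell
# Hypothetical helper: print "name status" for every claim that is not Bound.
# Reads `kubectl get pvc` output on stdin; assumes STATUS is the second column.
unbound_claims() {
  awk 'NR > 1 && $2 != "Bound" { print $1, $2 }'
}

# Usage against a live cluster (not run here):
#   kubectl get pvc | unbound_claims
#   kubectl describe pvc seldon-claim
```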
  


---------- minikube nodes ------------

root@sprod:~# kubectl get nodes --output=wide
NAME       STATUS    AGE       VERSION   EXTERNAL-IP   OS-IMAGE            KERNEL-VERSION
minikube   Ready     21h       v1.6.4    <none>        Buildroot 2017.02   4.9.13
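
The scheduler skips cordoned nodes, and `kubectl get nodes` shows those with a STATUS of "Ready,SchedulingDisabled" rather than plain "Ready". In the listing above the minikube node reports plain Ready. A small sketch for checking this on a longer listing; the helper name schedulable_nodes is hypothetical and the default column order (NAME, STATUS, ...) is assumed:

```shell
# Hypothetical helper: print nodes whose STATUS is exactly "Ready".
# Cordoned nodes ("Ready,SchedulingDisabled") and NotReady nodes are left out.
# Reads `kubectl get nodes` output on stdin.
schedulable_nodes() {
  awk 'NR > 1 && $2 == "Ready" { print $1 }'
}

# Usage against a live cluster (not run here):
#   kubectl get nodes | schedulable_nodes
```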



---------- minikube pods ------------

root@sprod:~#  kubectl get pods --output=wide
NAME                                         READY     STATUS    RESTARTS   AGE       IP            NODE
influxdb-grafana-842592602-mqznp             2/2       Running   0          21h       172.17.0.12   minikube
kafka-controller-1424591021-jqxpk            1/1       Running   0          21h       172.17.0.13   minikube
kafka-stream-impressions-433453233-4vg67     1/1       Running   0          21h       172.17.0.20   minikube
kafka-stream-predictions-398714225-v0m8m     1/1       Running   0          21h       172.17.0.21   minikube
memcached1-2136693305-hpjrh                  1/1       Running   0          21h       172.17.0.5    minikube
memcached2-2533120572-q04wm                  1/1       Running   0          21h       172.17.0.6    minikube
ml100k-import-6wt31                          0/1       Pending   0          21h       <none>        <none>
mysql-3966129362-jh1fr                       1/1       Running   0          21h       172.17.0.4    minikube
redis-1963070708-qtf5w                       1/1       Running   0          21h       172.17.0.7    minikube
seldon-control-2707388371-j3xxb              1/1       Running   0          21h       172.17.0.11   minikube
seldon-server-3494098190-tn6kh               3/3       Running   0          21h       172.17.0.19   minikube
spark-master-controller-3720462731-rbcqd     1/1       Running   0          21h       172.17.0.15   minikube
spark-ui-proxy-controller-1688034969-v0q8w   2/2       Running   0          21h       172.17.0.18   minikube
spark-worker-controller-3381690000-5jvqq     1/1       Running   0          21h       172.17.0.17   minikube
spark-worker-controller-3381690000-5l27w     1/1       Running   0          21h       172.17.0.16   minikube
td-agent-server-3988194731-c3j6k             1/1       Running   0          21h       172.17.0.14   minikube
zookeeper1-467704625-gk8xf                   1/1       Running   0          21h       172.17.0.9    minikube
zookeeper2-1006738229-c35f9                  1/1       Running   0          21h       172.17.0.8    minikube
zookeeper3-1545771833-8p7lr                  1/1       Running   0          21h       172.17.0.10   minikube



---------- minikube services ------------

root@sprod:~# kubectl get services --output=wide
NAME                  CLUSTER-IP   EXTERNAL-IP   PORT(S)                       AGE       SELECTOR
kafka-service         10.0.0.149   <nodes>       9092:30010/TCP                21h       app=kafka
kubernetes            10.0.0.1     <none>        443/TCP                       21h       <none>
memcached1            10.0.0.197   <none>        11211/TCP                     21h       name=memcached1
memcached2            10.0.0.191   <none>        11211/TCP                     21h       name=memcached2
monitoring-grafana    10.0.0.189   <pending>     80:30002/TCP                  21h       name=influxGrafana
monitoring-influxdb   10.0.0.221   <none>        8083/TCP,8086/TCP             21h       name=influxGrafana
mysql                 10.0.0.21    <none>        3306/TCP                      21h       name=mysql
redis                 10.0.0.207   <none>        6379/TCP                      21h       name=redis
seldon-server         10.0.0.254   <nodes>       80:30015/TCP,5000:30017/TCP   21h       name=seldon-server
spark-master          10.0.0.102   <none>        7077/TCP                      21h       component=spark-master
spark-ui-proxy        10.0.0.22    <pending>     8000:30005/TCP                21h       component=spark-ui-proxy
spark-webui           10.0.0.83    <none>        8080/TCP                      21h       component=spark-master
td-agent-server       10.0.0.93    <none>        24224/TCP,24224/UDP           21h       name=td-agent-server
zookeeper-1           10.0.0.223   <none>        2181/TCP,2888/TCP,3888/TCP    21h       server-id=1
zookeeper-2           10.0.0.68    <none>        2181/TCP,2888/TCP,3888/TCP    21h       server-id=2
zookeeper-3           10.0.0.134   <none>        2181/TCP,2888/TCP,3888/TCP    21h       server-id=3



---------- minikube jobs ------------

root@sprod:~# kubectl get jobs
NAME            DESIRED   SUCCESSFUL   AGE
ml100k-import   1         0            20h



---------- minikube service url ------------

No URL is returned except for seldon-server

root@sprod:~# minikube service --url seldon-server

minikube service --url spark-master
-- no output



---------- curl response from seldon server ------------

When I try to curl the Spark master or ZooKeeper (using their endpoint addresses) from inside the 
seldon-control pod, I get an empty reply from the server. 

root@sprod:~# kubectl exec -ti seldon-control-2707388371-j3xxb -- /bin/bash
root@seldon-control-2707388371-j3xxb:/home/seldon# curl 172.17.0.15:7077
curl: (52) Empty reply from server

However, a request to google.com does get a response.

root@seldon-control-2707388371-j3xxb:/home/seldon# curl google.com
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>
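
curl's exit code helps separate the failure modes here: 6 means the host name could not be resolved, 7 means the TCP connection failed, and 52 means the connection was made but the server sent nothing back. According to the service listing, 7077 is the spark-master port while the Spark web UI is exposed separately on 8080, so 7077 may simply not speak HTTP, in which case code 52 would point to a reachable pod rather than a network fault. The helper below is a sketch; its name explain_curl_exit is hypothetical.

```shell
# Hypothetical helper: translate a curl exit code into a rough diagnosis.
# Only a few codes from the curl manual are covered; anything else is "other".
explain_curl_exit() {
  case "$1" in
    0)  echo "HTTP response received" ;;
    6)  echo "DNS failure: could not resolve host" ;;
    7)  echo "TCP failure: could not connect to host" ;;
    28) echo "operation timed out" ;;
    52) echo "TCP connected, but the server sent no HTTP reply" ;;
    *)  echo "other curl error ($1)" ;;
  esac
}

# Usage from inside a pod (not run here):
#   curl -s -o /dev/null 172.17.0.15:7077; explain_curl_exit "$?"
```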



----------nslookup output------------

root@sprod:~# kubectl exec -ti seldon-control-2707388371-j3xxb -- /bin/bash
root@seldon-control-2707388371-j3xxb:/home/seldon# nslookup zookeeper-1
Server: 10.0.0.10
Address: 10.0.0.10#53

Name: zookeeper-1.default.svc.cluster.local
Address: 10.0.0.223



---------- Linux system information ------------

root@sprod:~# cat /etc/*release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.10
DISTRIB_CODENAME=yakkety
DISTRIB_DESCRIPTION="Ubuntu 16.10"
NAME="Ubuntu"
VERSION="16.10 (Yakkety Yak)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.10"
VERSION_ID="16.10"
VERSION_CODENAME=yakkety
UBUNTU_CODENAME=yakkety

root@sprod:~# uname -a
Linux sprod 4.8.0-58-generic #63-Ubuntu SMP Mon Jun 26 17:08:21 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

root@sprod:~# free -m
              total        used        free      shared  buff/cache   available
Mem:          64379       13037       29760         664       21581       50025
Swap:         65488           0       65488



---------- seldon Server log ------------

root@sprod:~# kubectl logs seldon-server-3494098190-tn6kh seldon-server
Jul 06, 2017 12:12:16 AM org.apache.catalina.startup.SetAllPropertiesRule begin
WARNING: [SetAllPropertiesRule]{Server/Service/Connector} Setting property 'maxSpareThreads' to '100' did not find a matching property.
Jul 06, 2017 12:12:16 AM org.apache.catalina.core.AprLifecycleListener lifecycleEvent
INFO: The APR based Apache Tomcat Native library which allows optimal performance in production environments was not found on the java.library.path: /usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib
Jul 06, 2017 12:12:17 AM org.apache.coyote.AbstractProtocol init
INFO: Initializing ProtocolHandler ["http-bio-8080"]
Jul 06, 2017 12:12:17 AM org.apache.coyote.AbstractProtocol init
INFO: Initializing ProtocolHandler ["ajp-bio-8009"]
Jul 06, 2017 12:12:17 AM org.apache.catalina.startup.Catalina load
INFO: Initialization processed in 1772 ms
Jul 06, 2017 12:12:17 AM org.apache.catalina.core.StandardService startInternal
INFO: Starting service Catalina
Jul 06, 2017 12:12:17 AM org.apache.catalina.core.StandardEngine startInternal
INFO: Starting Servlet Engine: Apache Tomcat/7.0.78
Jul 06, 2017 12:12:17 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /usr/local/tomcat/webapps/examples
Jul 06, 2017 12:12:18 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deployment of web application directory /usr/local/tomcat/webapps/examples has finished in 910 ms
Jul 06, 2017 12:12:18 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /usr/local/tomcat/webapps/ROOT
Jul 06, 2017 12:12:18 AM org.apache.catalina.loader.WebappClassLoaderBase validateJarFile
INFO: validateJarFile(/usr/local/tomcat/webapps/ROOT/WEB-INF/lib/servlet-api-2.5-20081211.jar) - jar not loaded. See Servlet Spec 3.0, section 10.7.2. Offending class: javax/servlet/Servlet.class
Jul 06, 2017 12:12:18 AM org.apache.catalina.loader.WebappClassLoaderBase validateJarFile
INFO: validateJarFile(/usr/local/tomcat/webapps/ROOT/WEB-INF/lib/servlet-api-2.5.jar) - jar not loaded. See Servlet Spec 3.0, section 10.7.2. Offending class: javax/servlet/Servlet.class
Jul 06, 2017 12:12:31 AM org.apache.catalina.startup.TldConfig execute
INFO: At least one JAR was scanned for TLDs yet contained no TLDs. Enable debug logging for this logger for a complete list of JARs that were scanned but no TLDs were found in them. Skipping unneeded JARs during scanning can improve startup time and JSP compilation time.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/seldon/ROOT/WEB-INF/lib/slf4j-jcl-1.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/seldon/ROOT/WEB-INF/lib/slf4j-log4j12-1.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
2017-07-06 00:12:45.546 INFO net.spy.memcached.MemcachedConnection:  Added {QA sa=memcached1/10.0.0.197:11211, #Rops=0, #Wops=0, #iq=0, topRop=null, topWop=null, toWrite=0, interested=0} to connect queue
2017-07-06 00:12:45.547 INFO net.spy.memcached.MemcachedConnection:  Added {QA sa=memcached2/10.0.0.191:11211, #Rops=0, #Wops=0, #iq=0, topRop=null, topWop=null, toWrite=0, interested=0} to connect queue
2017-07-06 00:12:45.551 INFO net.spy.memcached.MemcachedConnection:  Added {QA sa=memcached1/10.0.0.197:11211, #Rops=0, #Wops=0, #iq=0, topRop=null, topWop=null, toWrite=0, interested=0} to connect queue
2017-07-06 00:12:45.552 INFO net.spy.memcached.MemcachedConnection:  Added {QA sa=memcached2/10.0.0.191:11211, #Rops=0, #Wops=0, #iq=0, topRop=null, topWop=null, toWrite=0, interested=0} to connect queue
2017-07-06 00:12:45.555 INFO net.spy.memcached.MemcachedConnection:  Connection state changed for sun.nio.ch.SelectionKeyImpl@43ce1ee7
2017-07-06 00:12:45.556 INFO net.spy.memcached.MemcachedConnection:  Connection state changed for sun.nio.ch.SelectionKeyImpl@7e231328
2017-07-06 00:12:45.561 INFO net.spy.memcached.MemcachedConnection:  Connection state changed for sun.nio.ch.SelectionKeyImpl@63971701
2017-07-06 00:12:45.564 INFO net.spy.memcached.MemcachedConnection:  Connection state changed for sun.nio.ch.SelectionKeyImpl@38fb0801
Jul 06, 2017 12:12:50 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deployment of web application directory /usr/local/tomcat/webapps/ROOT has finished in 32,324 ms
Jul 06, 2017 12:12:50 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /usr/local/tomcat/webapps/host-manager
Jul 06, 2017 12:12:50 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deployment of web application directory /usr/local/tomcat/webapps/host-manager has finished in 176 ms
Jul 06, 2017 12:12:50 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /usr/local/tomcat/webapps/manager
Jul 06, 2017 12:12:50 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deployment of web application directory /usr/local/tomcat/webapps/manager has finished in 98 ms
Jul 06, 2017 12:12:50 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /usr/local/tomcat/webapps/docs
Jul 06, 2017 12:12:50 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deployment of web application directory /usr/local/tomcat/webapps/docs has finished in 61 ms
Jul 06, 2017 12:12:50 AM org.apache.coyote.AbstractProtocol start
INFO: Starting ProtocolHandler ["http-bio-8080"]
Jul 06, 2017 12:12:51 AM org.apache.coyote.AbstractProtocol start
INFO: Starting ProtocolHandler ["ajp-bio-8009"]
Jul 06, 2017 12:12:51 AM org.apache.catalina.startup.Catalina start
INFO: Server startup in 33878 ms


Thanks,
Bhabani