CDAP on Kubernetes using latest cdap-operator

Tuomas D

Nov 16, 2022, 3:52:41 AM
to CDAP User

Hi CDAP Users,

I'm trying to install CDAP on Kubernetes using cdap-operator but am running into some problems. I'm using the following resources as references.

I'm trying this on my local system with the following setup:

  • Rancher Desktop running Kubernetes 1.24.1
  • Elasticsearch, Hadoop (HDFS), and PostgreSQL up and running on the local system
  • See the steps I have taken to set up CDAP at the end of this message

1. cdap-operator fails to start when built from the latest develop branch

When I deploy manager.yaml, the pod does not start because of this error:

  • "container has runAsNonRoot and image will run as root (pod: "controller-manager-647976859b-w4tbx_system(186773f5-27ff-4c13-864f-243f62dc638e)"

After changing runAsNonRoot in manager.yaml to false I get past the above error, but then the livenessProbe and readinessProbe fail and the pod does not stay in the Running state.

Am I missing some step?

controller:latest has not been updated in a long time, so maybe that's the problem?
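
For readers hitting the same probe failures, the deployment's events and logs are usually the quickest way to see why the pod does not stay up. The sketch below is not taken from the cdap-operator documentation. It assumes the operator runs as the deployment controller-manager in the namespace system (the names that appear in the resource listing later in this message) and uses only standard kubectl subcommands. The function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Namespace and deployment name are assumptions taken from the listing in this thread.
NS="${NS:-system}"
DEPLOY="${DEPLOY:-controller-manager}"

# Print the pod-level and container-level securityContext of the deployment,
# which is where runAsNonRoot would be set.
show_security_context() {
  kubectl -n "$NS" get deployment "$DEPLOY" \
    -o jsonpath='{.spec.template.spec.securityContext}{"\n"}{.spec.template.spec.containers[*].securityContext}{"\n"}'
}

# Show the deployment description, recent events (probe failures are
# reported as events) and the last lines of the manager logs.
show_probe_failures() {
  kubectl -n "$NS" describe deployment "$DEPLOY"
  kubectl -n "$NS" get events --sort-by=.lastTimestamp
  kubectl -n "$NS" logs "deployment/$DEPLOY" --all-containers --tail=100
}

# Example:
#   show_security_context
#   show_probe_failures
```

Keeping the commands in functions makes it easy to point them at another namespace, for example NS=default show_probe_failures.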

2. Pipeline Studio and a few other pods are not created

After removing livenessProbe and readinessProbe from manager.yaml, I was able to create a cluster with both the 6.7.1 and the latest CDAP builds.

When accessing the CDAP UI I noticed that not all services were created. The following show up as red in the System Admin UI, and I don't see any corresponding pods created in Kubernetes:

  • Delta Assessor (sometimes does not show up in the CDAP UI as a service at all)
  • Pipeline Studio
  • Wrangler Service

The Pipeline Studio UI shows errors when communicating with the Pipeline Studio service (expected, as the service is not created).

Is there some configuration that could be missing and that keeps these services from being created?

See the list of pods that do get started in the setup steps below.

3. Up-to-date details on CDAP on Kubernetes: configuration, roadmap, etc.

While looking through and trying out the referenced instructions, I found that they look outdated. For example, the sample YAMLs in the CDAP documentation do not work because they reference older API versions.

There is also a section about limitations, and I'm not sure whether it still applies.

I also see that the cdap-operator project is in alpha and that there is not much activity in the git repository. Are there plans to move it to a stable release?

Is there any more detailed documentation on cdap-operator and on configuring CDAP with the operator (reading through 40k+ long YAML or template files is hard)?

Details on the setup steps I have followed

  1. Manually update the local cdap-operator Makefile with the --server-side --force-conflicts options
    • extra options taken from pull request #84
  2. make install
  3. make run
  4. kubectl create namespace system
  5. kubectl apply -k config/crd --server-side --force-conflicts
    • added the same options as in step 1 to avoid errors caused by the large CRD
  6. find config/rbac -type f -not -name "kustomization.yaml" -exec kubectl apply -f {} \;
  7. kubectl apply -f config/manager/manager.yaml
  8. Create cdap-secret as a Secret in Kubernetes
  9. kubectl create serviceaccount cdap
  10. kubectl create clusterrolebinding cdap --clusterrole=edit --serviceaccount=default:cdap
  11. kubectl apply -f config/samples/cdap_v1alpha1_cdapmaster.yaml
    • edited cdap_v1alpha1_cdapmaster.yaml to match my system config (Elasticsearch, HDFS, Postgres)
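
Step 8 does not say how the Secret was created. One possible way is sketched here, assuming a generic Secret named cdap-secret in the default namespace built from one or more local files. Which files CDAP expects in the Secret is not stated in this thread, so the file name in the usage comment is purely hypothetical, and the function name is made up for illustration.

```shell
#!/usr/bin/env bash
# Create (or update) the generic Secret "cdap-secret" from local files.
# Usage: create_cdap_secret FILE [FILE...]
create_cdap_secret() {
  local f
  local -a args=()
  if [ "$#" -eq 0 ]; then
    echo "usage: create_cdap_secret FILE [FILE...]" >&2
    return 1
  fi
  for f in "$@"; do
    args+=("--from-file=$f")   # the key defaults to the file's base name
  done
  # Render the Secret client-side and pipe it to apply, so the command can
  # be re-run after the files change without failing on "already exists".
  kubectl -n default create secret generic cdap-secret "${args[@]}" \
    --dry-run=client -o yaml | kubectl apply -f -
}

# Example (hypothetical file name):
#   create_cdap_secret ./cdap-security.xml
```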

The steps above create the following CDAP resources in Kubernetes:

NAMESPACE   NAME                                           READY   STATUS    RESTARTS      AGE
default     pod/cdap-cdap-userinterface-7c78798b7d-wvk6b   1/1     Running   1 (14h ago)   15h
default     pod/cdap-cdap-runtime-0                        1/1     Running   0             15h
default     pod/cdap-cdap-metadata-74f8fd48f5-rmvgg        1/1     Running   0             15h
default     pod/cdap-cdap-messaging-0                      1/1     Running   0             15h
default     pod/cdap-cdap-metrics-0                        1/1     Running   0             15h
default     pod/cdap-cdap-appfabric-0                      1/1     Running   0             15h
default     pod/cdap-cdap-logs-0                           1/1     Running   0             15h
system      pod/controller-manager-fd48db776-zl9z4         1/1     Running   0             15h
default     pod/cdap-cdap-router-6764f8495f-dbmxf          1/1     Running   0             15h
default     pod/cdap-cdap-preview-0                        1/1     Running   0             15h

NAMESPACE   NAME                                         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)           AGE
system      service/controller-manager-metrics-service   ClusterIP   10.43.119.84    <none>        8443/TCP          15h
default     service/cdap-cdap-router                     NodePort    10.43.124.122   <none>        11015:31273/TCP   15h
default     service/cdap-cdap-userinterface              NodePort    10.43.4.170     <none>        11011:31525/TCP   15h
default     service/cdap-cdap-metadata-service           ClusterIP   10.43.230.152   <none>        46845/TCP         15h
default     service/cdap-cdap-messaging-service          ClusterIP   10.43.123.23    <none>        39215/TCP         15h
default     service/cdap-cdap-runtime                    ClusterIP   10.43.138.54    <none>        41909/TCP         15h
default     service/cdap-cdap-metrics-processor          ClusterIP   10.43.143.126   <none>        41893/TCP         15h
default     service/cdap-cdap-metrics                    ClusterIP   10.43.46.165    <none>        38593/TCP         15h
default     service/cdap-cdap-log-saver                  ClusterIP   10.43.36.123    <none>        35083/TCP         15h
default     service/cdap-cdap-log-query                  ClusterIP   10.43.17.234    <none>        40523/TCP         15h
default     service/cdap-cdap-dataset-executor           ClusterIP   10.43.127.210   <none>        42295/TCP         15h
default     service/cdap-cdap-preview                    ClusterIP   10.43.176.141   <none>        37433/TCP         15h
default     service/cdap-cdap-appfabric                  ClusterIP   10.43.231.144   <none>        41805/TCP         15h
default     service/cdap-cdap-secure-store-service       ClusterIP   10.43.133.173   <none>        41805/TCP         15h
default     service/cdap-cdap-dataset-service            ClusterIP   10.43.61.129    <none>        36075/TCP         15h

NAMESPACE   NAME                                      READY   UP-TO-DATE   AVAILABLE   AGE
system      deployment.apps/controller-manager        1/1     1            1           15h
default     deployment.apps/cdap-cdap-metadata        1/1     1            1           15h
default     deployment.apps/cdap-cdap-router          1/1     1            1           15h
default     deployment.apps/cdap-cdap-userinterface   1/1     1            1           15h

NAMESPACE   NAME                                                 DESIRED   CURRENT   READY   AGE
default     replicaset.apps/cdap-cdap-userinterface-7c78798b7d   1         1         1       15h
default     replicaset.apps/cdap-cdap-metadata-74f8fd48f5        1         1         1       15h
system      replicaset.apps/controller-manager-fd48db776         1         1         1       15h
default     replicaset.apps/cdap-cdap-router-6764f8495f          1         1         1       15h

NAMESPACE   NAME                                   READY   AGE
default     statefulset.apps/cdap-cdap-runtime     1/1     15h
default     statefulset.apps/cdap-cdap-messaging   1/1     15h
default     statefulset.apps/cdap-cdap-metrics     1/1     15h
default     statefulset.apps/cdap-cdap-appfabric   1/1     15h
default     statefulset.apps/cdap-cdap-logs        1/1     15h
default     statefulset.apps/cdap-cdap-preview     1/1     15h

Thanks for your help. Tuomas

UI_Services.jpg

Arjan Bal

Nov 16, 2022, 5:24:27 AM
to cdap...@googlegroups.com
Hi Tuomas,
The Confluence Docs don't mention applying manager.yaml. I think applying manager.yaml caused issue number 1.
For issue 2, can you try deploying CDAP on a cluster with k8s version <= 1.22? We use the k8s Java client library v12 in CDAP. From k8s 1.23 onwards, v12 is not supported, and we haven't tested CDAP on k8s versions > 1.22.
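
To check a cluster against the version limit mentioned above before deploying, something like the following could be used. It is a sketch, not part of CDAP or cdap-operator: the function names are hypothetical, the parsing assumes a major version of 1, and cluster_version approximates the cluster version with the kubelet version reported by the first node.

```shell
#!/usr/bin/env bash
# Highest Kubernetes 1.x minor version reported as tested in this thread.
MAX_TESTED_MINOR=22

# Extract the minor version from strings such as "v1.24.1", "1.22" or "v1.24.1+k3s1".
k8s_minor() {
  local v="${1#v}"        # drop a leading "v"
  v="${v#*.}"             # drop the major component
  v="${v%%.*}"            # drop the patch component, if any
  echo "${v%%[!0-9]*}"    # drop non-numeric suffixes such as "+"
}

# Succeed when the given version is at or below the tested limit.
k8s_version_tested() {
  local minor
  minor="$(k8s_minor "$1")"
  [ -n "$minor" ] && [ "$minor" -le "$MAX_TESTED_MINOR" ]
}

# Approximate the cluster version by the kubelet version of the first node.
cluster_version() {
  kubectl get nodes -o jsonpath='{.items[0].status.nodeInfo.kubeletVersion}'
}

# Example:
#   if k8s_version_tested "$(cluster_version)"; then
#     echo "within the tested range"
#   else
#     echo "newer than the tested range"
#   fi
```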

Regards,
Arjan

--
You received this message because you are subscribed to the Google Groups "CDAP User" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cdap-user+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/cdap-user/4edecd34-bb17-4e8b-a735-66a6cc18dd8an%40googlegroups.com.

Tuomas

Nov 17, 2022, 11:37:06 AM
to cdap...@googlegroups.com

Thanks for the info, Arjan.

I was able to create a working setup following the Confluence Docs with Kubernetes 1.21.

With some local changes I was also able to get a working setup following the CDAP in Kubernetes Deployment Guide with Kubernetes 1.24:

  • this guide and the CDAP-Operator Git README are where manager.yaml is mentioned in the steps
  • I noticed someone updated the cdap-operator image today, and with the latest image the livenessProbe and readinessProbe now work. I still had to remove the --leader-elect argument from manager.yaml

The issue with the Pipeline Studio pod not starting was related to the pipeline jar artifact not being available in HDFS. It was not uploaded automatically, likely because something was inconsistent between the Postgres DB and HDFS. After resetting both the DB and HDFS it started working. I was able to find info in the Fabric logs that helped in debugging this.

If there is a more detailed system and configuration document, it would be great to read through it. I was able to find some configuration options, like changing the master Kubernetes namespace, in https://github.com/cdapio/cdap/blob/develop/cdap-kubernetes/src/main/java/io/cdap/cdap/master/environment/k8s/KubeMasterEnvironment.java
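
A sketch of how such log lines might be pulled out, assuming the "Fabric logs" are the logs of the cdap-cdap-appfabric-0 pod in the default namespace from the listing earlier in the thread. The function name and the default search term "artifact" are illustrative guesses, not strings CDAP is known to log.

```shell
#!/usr/bin/env bash
# Print appfabric log lines matching a case-insensitive pattern.
# Usage: appfabric_log_lines [PATTERN] [TAIL_LINES]
appfabric_log_lines() {
  local pattern="${1:-artifact}"    # assumed search term
  local tail_lines="${2:-2000}"     # how many recent lines to scan
  kubectl -n default logs cdap-cdap-appfabric-0 --tail="$tail_lines" \
    | grep -i -- "$pattern"
}

# Example:
#   appfabric_log_lines artifact 5000
```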

- Tuomas
