Installing Submariner with Helm on OpenShift 4.6


Joe Grap

Mar 22, 2021, 6:16:36 PM
to submariner-users
Hello,

I am following the installation guide for Submariner via Helm.  I have two on-prem OpenShift 4.6 clusters I am looking to link with Submariner.  These two clusters are located on the same subnet, and they do not have overlapping IPs.

I have not configured GlobalNet since the cluster IPs do not overlap, and I also disabled NAT, as I do not think there would be NAT'ing between the gateways.

After the installation of the Broker and Submariner Operator on Cluster A, I noticed that there were errors in the events related to the Lighthouse service account.  Lighthouse Agent and Lighthouse CoreDNS required the "submariner-lighthouse" service account to run.  To move past this temporarily, I created this service account manually and provided it with permissions, and Lighthouse Agent and Lighthouse CoreDNS started coming up.

After this, I observed that Lighthouse Agent was encountering issues talking to the API server on startup for the serviceimports and restarting.  The logs show an error trusting the certificate for the API server.  I will attach the logs.

From this, I have a couple questions:
(1) Is there anything in the guide steps for Helm installation related to the service accounts to be created that I may have misconfigured?
(2)  To resolve the certificate issue, is there anything I can provide to the deployment?  If we needed to work around the issue temporarily, is there any way to disable certificate verification in Lighthouse Agent?

Thank you very much for your time.

Regards,
Joe
submariner-lighthouse-agent.log

Miguel Angel Ajo

Mar 26, 2021, 1:26:42 PM
to Joe Grap, Steve Mattar, submariner-users
Hey Joe!

Sorry for the delay in our response. I was supposed to reply, but I got distracted by our 0.9 items.

On Mon, Mar 22, 2021 at 11:16 PM Joe Grap <grap....@gmail.com> wrote:
> Hello,
>
> I am following the installation guide for Submariner via Helm.  I have two on-prem OpenShift 4.6 clusters I am looking to link with Submariner.  These two clusters are located on the same subnet, and they do not have overlapping IPs.
>
> I have not configured GlobalNet since the cluster IPs do not overlap, and I also disabled NAT, as I do not think there would be NAT'ing between the gateways.

OK, that makes sense. We are adding automatic NAT/no-NAT selection between endpoints in 0.9. Hopefully that setting will go away at some point; for now it will be the fallback if the best configuration can't be determined.
 
> After the installation of the Broker and Submariner Operator on Cluster A, I noticed that there were errors in the events related to the Lighthouse service account.  Lighthouse Agent and Lighthouse CoreDNS required the "submariner-lighthouse" service account to run.  To move past this temporarily, I created this service account manually and provided it with permissions, and Lighthouse Agent and Lighthouse CoreDNS started coming up.


Oh, that sounds like there is a problem with the service account definition in the Helm charts. +Steve Mattar, can you take a look at that?
Let's open a bug to track it in the charts repo.
 
> After this, I observed that Lighthouse Agent was encountering issues talking to the API server on startup for the serviceimports and restarting.  The logs show an error trusting the certificate for the API server.  I will attach the logs.

Hmm, and is the submariner-gateway OK?

The CA and token are passed down from the operator config. You can check them via

     kubectl get Submariner submariner -n submariner-operator -o yaml

Do the contents of the CA and token parameters make sense?
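
As a rough sketch of that check, individual values can be pulled out of the Submariner resource rather than reading the whole YAML. The helper name submariner_field is hypothetical, and the field names in the usage lines (brokerK8sApiServer, brokerK8sCA) are assumptions about the resource's spec that should be confirmed against the full output of the command above.

```shell
# Hypothetical helper: print one field of the Submariner resource's spec.
# The field name is passed as $1; the names below are assumptions.
submariner_field() {
    kubectl get Submariner submariner -n submariner-operator \
        -o "jsonpath={.spec.$1}"
}

# Example usage (needs access to the cluster):
#   submariner_field brokerK8sApiServer   # endpoint the agents connect to
#   submariner_field brokerK8sCA          # base64-encoded CA
```

Comparing the endpoint value with the names in the API server certificate is usually the quickest way to tell whether the two agree.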

I've also sometimes seen misconfiguration in OpenShift where the kubeconfig has skipInsecureCertificates (or a similar setting, such as insecure-skip-tls-verify), so the FQDN of the API endpoint is not checked against the certificate. However, we don't have a setting for insecure certificates in Submariner.


 

> From this, I have a couple questions:
> (1) Is there anything in the guide steps for Helm installation related to the service accounts to be created that I may have misconfigured?
 
I believe that part is a bug. We made changes to the service accounts lately, and they were probably not fully propagated into the Helm charts, although CI seems to be passing.
 
> (2)  To resolve the certificate issue, is there anything I can provide to the deployment?  If we needed to work around the issue temporarily, is there any way to disable certificate verification in Lighthouse Agent?

There is not yet, I'm afraid, but we are considering adding it. Can you help me by opening a bug on the submariner-operator? We can propagate that setting down from there to submariner-gateway and Lighthouse.


Looking at your logs, I see:

     x509: certificate is valid for 172.30.0.1, not 10.5.57.171

which seems to be the IP of the internal endpoint of the API. That is weird.
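
The error says that the address used to reach the API server is not among the names and IPs listed in the certificate it serves. One way to see what a certificate does cover is to print its Subject Alternative Names with openssl. This is only a minimal sketch: cert_sans is a hypothetical helper, and port 6443 in the usage line is an assumption about the API endpoint.

```shell
# Hypothetical helper: read a PEM certificate on stdin and print the
# Subject Alternative Name header plus the line that follows it.
cert_sans() {
    openssl x509 -noout -text | grep -A1 'Subject Alternative Name'
}

# Example usage against a live endpoint (address taken from the error above;
# the port is an assumption):
#   openssl s_client -connect 10.5.57.171:6443 </dev/null 2>/dev/null | cert_sans
```

If the address configured as the broker endpoint does not appear in that list, certificate verification will fail in the way shown in the log.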
 

> Thank you very much for your time.


Sorry for the delay.

Did you have the opportunity to try with subctl? I wonder if it behaves any differently for your specific OpenShift deployment in terms of the certificates.
 

> Regards,
> Joe



--
Miguel Ángel Ajo  @mangel_ajo  
OpenShift / Kubernetes / Multi-cluster Networking team.
ex OSP / Networking DFG, OVN Squad Engineering


Daniel Farrell

Mar 26, 2021, 2:58:10 PM
to Miguel Angel Ajo, Joe Grap, Steve Mattar, submariner-users
On Fri, Mar 26, 2021 at 1:26 PM Miguel Angel Ajo <majo...@redhat.com> wrote:
> Oh, that sounds like there is a problem with the service account definition in the Helm charts. +Steve Mattar, can you take a look at that?
> Let's open a bug to track it in the charts repo.

Do you have a version of the Helm charts with PR #125 (merged about two weeks ago)? I think that should have fixed the Lighthouse SA issue (which is why Helm CI is passing).

Thanks,
Daniel

Joe Grap

Mar 29, 2021, 3:12:45 PM
to Daniel Farrell, Miguel Angel Ajo, Steve Mattar, submariner-users
Hello,

It is quite all right.  I understand you have other things needing your attention.  I appreciate you getting back to me.

> Hmm, and is the submariner-gateway OK?
>
> The CA and token are passed down from the operator config. You can check them via
>
>      kubectl get Submariner submariner -n submariner-operator -o yaml
>
> Do the contents of the CA and token parameters make sense?
>
> I've also sometimes seen misconfiguration in OpenShift where the kubeconfig has skipInsecureCertificates (or a similar setting, such as insecure-skip-tls-verify), so the FQDN of the API endpoint is not checked against the certificate. However, we don't have a setting for insecure certificates in Submariner.

I checked the submariner-operator project for the gateway, and it does not appear that the DaemonSet for the submariner-gateway has any desired instances:
oc get daemonsets -n submariner-operator
NAME                    DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                AGE
submariner-gateway      0         0         0       0            0           submariner.io/gateway=true   134m
submariner-globalnet    0         0         0       0            0           submariner.io/gateway=true   134m
submariner-routeagent   0         0         0       0            0           <none>                       134m
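
The gateway and globalnet DaemonSets in this output select nodes labeled submariner.io/gateway=true, so they schedule nothing until some node carries that label. A minimal sketch of checking for and applying the label follows. The function names and the node name in the usage lines are hypothetical, and since the routeagent DaemonSet has no node selector, its zero count would need a separate explanation.

```shell
# List the nodes that match the node selector shown in the DaemonSet output.
gateway_nodes() {
    oc get nodes -l submariner.io/gateway=true -o name
}

# Label one node (name passed as $1) so the gateway DaemonSet can schedule on it.
label_gateway_node() {
    oc label node "$1" submariner.io/gateway=true
}

# Example usage (hypothetical node name):
#   gateway_nodes
#   label_gateway_node worker-0
```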

I checked the contents of the CA and token, and they seem to make sense.  I didn't look closely at the certificates in the chain after decoding, though.  I did observe that I could not decode the CA with 'base64 -d', and I had to use 'base64 -di'.
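
With GNU coreutils, 'base64 -d' rejects characters outside the base64 alphabet, while the '-i' (--ignore-garbage) option skips them. If the stray characters were whitespace picked up while copying the value (an assumption, since it is not clear what they were), stripping whitespace before decoding has the same effect. The helper name decode_b64 and the file name ca.b64 below are hypothetical.

```shell
# Hypothetical helper: remove all whitespace from stdin, then base64-decode it.
decode_b64() {
    tr -d '[:space:]' | base64 -d
}

# Example usage: decode a copied CA value and show who issued the first
# certificate in it (ca.b64 is a hypothetical file holding the value).
#   decode_b64 < ca.b64 | openssl x509 -noout -subject -issuer
```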

I checked the kubeconfig as well, and I didn't see any parameters for skipping insecure certificates in the file.  I believe the certificates may be self-signed for the OpenShift clusters.  Would this cause issues like this?

> There is not yet, I'm afraid, but we are considering adding it. Can you help me by opening a bug on the submariner-operator? We can propagate that setting down from there to submariner-gateway and Lighthouse.

I have created the following issue in GitHub:  https://github.com/submariner-io/submariner-operator/issues/1198

> Did you have the opportunity to try with subctl? I wonder if it behaves any differently for your specific OpenShift deployment in terms of the certificates.

I did try subctl using https://submariner.io/operations/deployment/subctl/.  I ran into a small issue: the documentation says that I can provide a repository for the 'deploy-broker' command, but it was not a valid option.  We need to be able to configure the location where we obtain the images, since we cannot directly access the internet.  I realize now that this might be an error in the documentation, and that the repository / image overrides may actually belong on the 'join' command.  I can try this again.

> Do you have a version of the Helm charts with PR #125 (merged about two weeks ago)? I think that should have fixed the Lighthouse SA issue (which is why Helm CI is passing).

I may not have these merges.  I may have added the Helm repo just before this merge.  Let me update my Helm repositories, and I can try this again.
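
For reference, refreshing the local chart cache could look like the sketch below. It assumes Helm 3 syntax, the function name is hypothetical, and "submariner" is used only as a search keyword, not as the name of a particular repository.

```shell
# Refresh the local cache of all configured Helm repositories, then list the
# chart versions matching the keyword "submariner" that are now visible.
refresh_submariner_charts() {
    helm repo update &&
        helm search repo submariner --versions
}
```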

Regards, 
Joe

Joe Grap

Apr 8, 2021, 9:56:48 AM
to Daniel Farrell, Miguel Angel Ajo, Steve Mattar, submariner-users
Hello,

A couple of days ago, I was able to get Submariner deployed using subctl.  It appears that the subctl installation used the API server's hostname (api.<cluster domain>) for the broker endpoint, whereas the steps in the Helm installation use the Node 1 IP, gathered from the kubernetes service endpoints.  With the hostname, I was able to get past the certificate issue on startup of Lighthouse Agent.  From there, it was a matter of getting the clusters to communicate with / trust each other.
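
The two ways of deriving the broker endpoint described here can be sketched side by side. Both function names are hypothetical, the jsonpath expression is an assumption about how the node IP is gathered from the kubernetes service endpoints, and the second function assumes the API URL uses the https scheme.

```shell
# Endpoint by IP: the first address behind the "kubernetes" service in the
# default namespace, joined with its first port.
endpoint_from_service() {
    kubectl -n default get endpoints kubernetes \
        -o 'jsonpath={.subsets[0].addresses[0].ip}:{.subsets[0].ports[0].port}'
}

# Endpoint by hostname: take the API server URL of the current login and
# strip the scheme, leaving host:port.
endpoint_from_hostname() {
    url=$(oc whoami --show-server) || return 1
    printf '%s\n' "${url#https://}"
}
```

Whichever value is used, it has to be one that the API server certificate is valid for, which is what the hostname form achieved in this case.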

I'll be trying a Helm installation next, applying what I learned with subctl.

Regards,
Joe