Kiali on AWS EKS

Arturas R.

18 Sep 2020, 5:43:19
to kiali-users
Hello,

I'm stuck googling and cannot find an answer: why does the Kiali operator create so many ReplicaSets, with a new one every few minutes, making Kiali unusable? Any ideas?

see screenshot below

Arturas R.

18 Sep 2020, 6:18:45
to kiali-users
It seems to follow a pattern (columns: NAME, DESIRED, CURRENT, READY, AGE):

kiali-57d8848f89       1         1         0       0s
kiali-599769b9d6       0         1         1       117s
kiali-599769b9d6       0         0         0       117s
kiali-57d8848f89       1         1         1       35s
kiali-7f5585654b       0         0         0       22m
kiali-77fd86cf8d       1         0         0       0s
kiali-57d8848f89       0         1         1       117s
kiali-77fd86cf8d       1         0         0       0s
kiali-57d8848f89       0         1         1       117s
kiali-77fd86cf8d       1         1         0       0s
kiali-57d8848f89       0         0         0       117s
kiali-77fd86cf8d       1         1         1       35s
kiali-7f567bd99b       0         0         0       22m
kiali-5b5cc94d77       1         0         0       0s
kiali-77fd86cf8d       0         1         1       117s
kiali-5b5cc94d77       1         0         0       0s
kiali-77fd86cf8d       0         1         1       117s
kiali-5b5cc94d77       1         1         0       0s
kiali-77fd86cf8d       0         0         0       117s
kiali-5b5cc94d77       1         1         1       24s
kiali-5986ccb8c5       0         0         0       21m
kiali-86d59bfd55       1         0         0       0s
kiali-5b5cc94d77       0         1         1       117s
kiali-86d59bfd55       1         0         0       0s
kiali-5b5cc94d77       0         1         1       117s
kiali-86d59bfd55       1         1         0       0s
kiali-5b5cc94d77       0         0         0       117s
kiali-86d59bfd55       1         1         1       7s
kiali-748c994bd4       0         0         0       21m
kiali-6f9f5d989f       1         0         0       0s
kiali-86d59bfd55       0         1         1       116s
kiali-6f9f5d989f       1         0         0       0s
kiali-86d59bfd55       0         1         1       116s
kiali-6f9f5d989f       1         1         0       0s
kiali-86d59bfd55       0         0         0       116s

Lucas Ponce

18 Sep 2020, 6:21:55
to Arturas R., John Mazzitelli, kiali-users
The operator is also responsible for restarting/re-creating the Pod on configuration changes.

But this looks odd; @John Mazzitelli will probably jump on the thread and comment with more details.

--
You received this message because you are subscribed to the Google Groups "kiali-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kiali-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kiali-users/35004910-14a0-4334-b124-359c43bb9acbn%40googlegroups.com.

John Mazzitelli

18 Sep 2020, 10:15:53
to kiali-users
Are we sure the operator is doing that? Or is the cluster re-starting the pod due to low cluster resources or liveness probes failing?

Can you provide the logs for the kiali operator? What about the events and/or logs for the kiali pods?

In short, I would need more data to be able to figure out why the pods are constantly restarting.

Artūras Radzevičius

18 Sep 2020, 10:21:08
to John Mazzitelli, kiali-users
I will provide logs later today, but failing liveness/readiness probes wouldn't create new ReplicaSets. A new RS means the Deployment has changed and a rollout had to be performed.
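
The point above can be illustrated with a small sketch. A Deployment's ReplicaSets are named `<deployment>-<pod-template-hash>`, so a new name only appears when the pod template changes. The digest below is a stand-in (Kubernetes uses its own hashing, not SHA-256), and the image tag and `config-version` annotation are hypothetical values chosen for illustration.

```python
import hashlib
import json


def template_hash(pod_template: dict) -> str:
    """Stand-in for the pod-template-hash label: a short digest of the template."""
    canonical = json.dumps(pod_template, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()[:10]


def replicaset_name(deployment: str, pod_template: dict) -> str:
    """Build a ReplicaSet-style name from the Deployment name and template digest."""
    return f"{deployment}-{template_hash(pod_template)}"


# Hypothetical pod template; the image tag and annotation are illustrative only.
template = {
    "containers": [{"name": "kiali", "image": "kiali:v1.23.0"}],
    "annotations": {"config-version": "1"},
}

# A failing probe restarts the container but leaves the template untouched,
# so the ReplicaSet name stays the same.
same = replicaset_name("kiali", template)

# Any edit to the template (here, a bumped annotation) yields a new name.
changed = dict(template, annotations={"config-version": "2"})
print(same, replicaset_name("kiali", changed))
```

In this simplified model, the stream of new `kiali-<hash>` names in the earlier listing would correspond to repeated template changes rather than pod restarts.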

John Mazzitelli

18 Sep 2020, 10:26:37
to kiali-users
> I will provide logs later today, but liveness/readiness wouldn't create new
> replicasets. New RS means there is a change in deployment and rollout must
> be performed.

The only reason the operator would change a Deployment is if you are modifying (or SOMETHING is modifying) the Kiali CR and the operator deems it necessary to change the Deployment, which in turn redeploys the pod.

Are you changing the Kiali CR periodically? If so, that explains this.

Artūras Radzevičius

18 Sep 2020, 10:34:41
to John Mazzitelli, kiali-users
I'm not changing the Kiali CR, and it eventually stopped after ~40 mins. It was the initial deployment to the cluster. Could it be related to view-only mode? I saw Ansible output about roles for each namespace, and I have ~30 of them.

John Mazzitelli

18 Sep 2020, 10:40:29
to kiali-users
If it was the initial deployment, the Deployment resource is one of the last resources created (after all the roles are created for the different accessible namespaces).

I would need to see the Kiali operator logs to see if it's running multiple reconciliation loops and, if so, what it is doing in those loops.

John Mazzitelli

18 Sep 2020, 10:42:59
to kiali-users
Oh, and what version of Kiali is this? And how did you install it (what mechanism did you use and what's the command line used if not using OLM)?

Arturas R.

21 Sep 2020, 2:40:05
to kiali-users
Hey, sorry for the late reply.
I'm using Kiali 1.23.0 on AWS EKS 1.16.
I'm deploying the Kiali Operator using Helm.

The operator logs contain a lot that I would need to anonymize, so I can provide specific output if you point me to what you need (a particular Ansible task, etc.).

It does this for every namespace, with no additional logs:
--------------------------- Ansible Task StdOut -------------------------------
TASK [default/kiali-deploy : Create resource [role-viewer] on [kubernetes]] ****
--------------------------- Ansible Task StdOut -------------------------------
TASK [default/kiali-deploy : Create resource [rolebinding] on [kubernetes]] ****
--------------------------- Ansible Task StdOut -------------------------------
TASK [default/kiali-deploy : Create additional Kiali roles in [kubernetes] namespace [my-custom-namespace]] ***

After the namespaces are processed, there are these logs:
--------------------------- Ansible Task StdOut -------------------------------
TASK [default/kiali-deploy : Wait for Monitoring Dashboards CRD to be ready] ***
--------------------------- Ansible Task StdOut -------------------------------
TASK [default/kiali-deploy : Find all Monitoring Dashboards packaged with the operator] ***
--------------------------- Ansible Task StdOut -------------------------------
-------------------------------------------------------------------------------
--------------------------- Ansible Task StdOut -------------------------------
TASK [default/kiali-deploy : Remove Monitoring Dashboards that should no longer be deployed] ***
--------------------------- Ansible Task StdOut -------------------------------
TASK [default/kiali-deploy : Create the Monitoring Dashboards] *****************
--------------------------- Ansible Task StdOut -------------------------------
TASK [default/kiali-deploy : Force the Kiali pod to restart if necessary] ******
--------------------------- Ansible Task StdOut -------------------------------
TASK [default/kiali-deploy : include_tasks] ************************************
--------------------------- Ansible Task StdOut -------------------------------
TASK [default/kiali-deploy : operator_sdk.util.k8s_status] *********************
{"level":"info","ts":1600425881.533758,"logger":"logging_event_handler","msg":"[playbook debug]","name":"my-kiali","namespace":"my-kiali","gvk":"kiali.io/v1alpha1, Kind=Kiali","event_type":"runner_on_ok","job":"2522543887406335640","EventData.TaskArgs":""}
--------------------------- Ansible Task StdOut -------------------------------
 TASK [Log reconciliation processing time] ******************************** 
ok: [localhost] => {
    "msg": "Processing time: [114] seconds"
}
--------------------------- Ansible Task StdOut -------------------------------

 TASK [debug] ******************************** 
ok: [localhost] => {
    "msg": "KIALI RECONCILIATION IS DONE."
}
-------------------------------------------------------------------------------
{"level":"info","ts":1600425881.7596316,"logger":"runner","msg":"Ansible-runner exited successfully","job":"2522543887406335640","name":"my-kiali","namespace":"my-kiali"}
--------------------------- Ansible Task Status Event StdOut  -----------------
PLAY RECAP *********************************************************************
localhost                  : ok=471  changed=3    unreachable=0    failed=0    skipped=103  rescued=0    ignored=0   
-------------------------------------------------------------------------------

After that there is an Ansible debug log.

I'd like to draw attention to this: the reconciliation takes 114s, and a new ReplicaSet is created every 115-117s.
--------------------------- Ansible Task StdOut -------------------------------
 TASK [Log reconciliation processing time] ******************************** 
ok: [localhost] => {
    "msg": "Processing time: [114] seconds"
}

Arturas R.

21 Sep 2020, 2:50:32
to kiali-users
Also, my deployment is done in the following order:
  1. Install Kiali Operator using Helm
  2. Create Kiali CR
My KO deployment values:

nameOverride: "my-kiali"
fullnameOverride: "my-kiali"
cr:
    name: "my-kiali"
    namespace: "my-kiali"

and my CR yaml:
Screenshot 2020-09-21 at 09.49.28.png

John Mazzitelli

21 Sep 2020, 15:14:19
to kiali-users
I'm curious what happens if you do not set nameOverride and fullnameOverride.

We had another issue come up recently that was caused by that.

So - try that out. Do not set the name overrides and see if you get the same problem.

Something must be happening where the resources are changing for each reconciliation run of the operator and that in turn causes another reconciliation. I have not seen that in any tests I've done - so there must be something going on that is changing one or more Kiali resources that is not expected.

Arturas R.

22 Sep 2020, 6:12:55
to kiali-users
I removed the overrides, but it behaved the same as in the previous deployment.
I did some digging and found one (stupid) thing; could it be the cause?

➜  kubectl -n kiali get rs
NAME                              DESIRED   CURRENT   READY   AGE
kiali-5b46b69fbb                  0         0         0       4m19s
kiali-5d7df47d47                  0         0         0       8m41s
kiali-6848bf9fb9                  0         0         0       12m
kiali-7f87d85c97                  1         1         1       2m8s
kiali-86cf9c6587                  0         0         0       6m29s
kiali-kiali-operator-7cb7f7d79b   1         1         1       13m

I took the /kiali-configuration/config.yaml file from a pod of RS kiali-86cf9c6587 and compared it to the config file in a kiali-5b46b69fbb pod, and found that the order of the namespaces had changed.
The number of lines is the same in both config files; only the order differs.
Screenshot 2020-09-22 at 13.11.33.png
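
The comparison described above (same number of entries, different order) can be checked mechanically. This is a minimal sketch; the namespace names are placeholders, not the real list from the cluster, and the function name is made up for illustration.

```python
from collections import Counter


def compare_namespace_lists(a: list[str], b: list[str]) -> str:
    """Classify how two namespace lists differ."""
    if a == b:
        return "identical"
    # Counter ignores order but still notices missing or duplicated entries.
    if Counter(a) == Counter(b):
        return "same entries, different order"
    return "different entries"


# Placeholder lists standing in for the namespaces found in two pods' config files.
config_a = ["ns-a", "ns-b", "ns-c"]
config_b = ["ns-c", "ns-a", "ns-b"]

print(compare_namespace_lists(config_a, config_b))
```

A result of "same entries, different order" would match what the two config files showed here.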

John Mazzitelli

22 Sep 2020, 7:12:39
to kiali-users
Yes, this is something to key on - the change in the ConfigMap.

OK, so the accessible_namespaces setting is changing for some reason between different deployments of Kiali.

Can you tell me what the value of the Kiali CR's deployment.accessible_namespaces is? Is it "[**]" or is it a list of namespaces? How many namespaces? And are those namespaces getting created/destroyed while the operator is processing the CR?

Arturas R.

22 Sep 2020, 7:19:42
to kiali-users
It is [**] and namespaces are not modified during install.

John Mazzitelli

22 Sep 2020, 7:31:50
to kiali-users
OK, I think I might know what this is.

With accessible_namespaces set to [**], the operator will perform a k8s query and find all the namespaces that are visible (it looks like you have about 30 namespaces). It then performs all the role-based tasks it needs to do and then stores each namespace in the list (as you saw) in the config map.

If the operator does not sort those names, and if that k8s API query does not return the same exact list each time (sorted the same way), then what you are seeing does not surprise me.
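
A minimal sketch of that theory, not the operator's actual code: if the namespace list is written into the config in whatever order the query returned it, two runs can render different text for the same set of namespaces, while sorting first makes the output stable. The `render_config` helper and the namespace names are hypothetical.

```python
import json


def render_config(namespaces, sort=False):
    """Render a toy config document; optionally sort the namespace list first."""
    names = sorted(namespaces) if sort else list(namespaces)
    return json.dumps({"deployment": {"accessible_namespaces": names}})


# Two hypothetical query results: same namespaces, different order.
first_query = ["istio-system", "default", "team-a"]
second_query = ["team-a", "istio-system", "default"]

# Without sorting, the rendered text differs, which would look like a config change.
unsorted_changed = render_config(first_query) != render_config(second_query)

# With sorting, both runs render the same text, so nothing appears to change.
sorted_changed = render_config(first_query, sort=True) != render_config(second_query, sort=True)

print(unsorted_changed, sorted_changed)
```

Under these assumptions, only the unsorted rendering would register as a change between runs.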

I wrote this up here: https://github.com/kiali/kiali/issues/3231

I'll try to confirm that this is really the problem and fix it if so.

Thanks.

Arturas R.

22 Sep 2020, 7:38:29
to kiali-users
Makes sense, thanks!

jmaz...@redhat.com

22 Sep 2020, 11:38:16
to kiali-users
Can you take a look at your operator logs and find the log message:

- name: Listing of all accessible namespaces (includes regex matches)
  debug:
    msg: "{{ kiali_vars.deployment.accessible_namespaces }}"

I'm going to continue with the theory that this list is unsorted. If you can look at that log message for each run of the operator, see if you can confirm that the order of the namespaces is different for each run. I suspect it is, and if so, that's the cause of the problem.

Arturas R.

22 Sep 2020, 12:12:49
to kiali-users
Yes. I didn't look at every single log, but I took 3 random logs and the order is different in all of them.

Arturas R.

22 Sep 2020, 12:16:16
to kiali-users
This is run #3 vs run #11
Screenshot 2020-09-22 at 19.14.19.png

Edgar Hernández

23 Sep 2020, 9:59:42
to Arturas R., kiali-users
Not sure why, but I don't see the logs text in the image.
I only see line numbers and squares on a dark/gray background...

Jay Shaughnessy

23 Sep 2020, 10:10:30
to kiali...@googlegroups.com
+1, the attachment does not seem valid.

Arturas R.

23 Sep 2020, 10:21:04
to kiali-users
Did you guys read the whole thread, or are you just commenting at random?

I cannot provide the exact namespace list because it contains sensitive data that would disclose my client's information.

Edgar Hernández

23 Sep 2020, 10:32:33
to Arturas R., kiali-users
Yes, it's understandable that you need to hide/mask data.
And yes, from what Mazz has said, I understand that the order of the namespaces is causing the operator to re-create the Kiali deployment.

But, hey! Give us some hints to understand what we are seeing!
Even with all the context, it's hard to understand what this image means, or what you are trying to communicate.

Please, enable us to understand and help...!

John Mazzitelli

23 Sep 2020, 10:58:12
to kiali-users
I submitted a PR that MIGHT solve this problem. I say "might" because I was unable to replicate, but from the problem description, it sounds like the ordering of the accessible_namespaces is random and not consistent (probably dependent on what the k8s master API returns, I think). If this is the case (and it sounds like it is), my PR should fix this.

See: https://github.com/kiali/kiali-operator/pull/135

This will be in the next release of the operator in a few weeks. If you have a dev environment, feel free to test that out. The PR is already merged in master.

Artūras Radzevičius

23 Sep 2020, 11:43:00
to John Mazzitelli, kiali-users
Thanks for the PR, @John Mazzitelli. Even if it turns out not to be my case, it shouldn't break anything anyway :)

@Edgar and @Jay, I've replaced the actual values with * so you can get the picture.
The left (red) side is the 1st run and the right (green) side is the 5th run of the operator.
Screenshot 2020-09-23 at 18.36.50.png

Edgar Hernández

23 Sep 2020, 14:19:51
to Artūras Radzevičius, John Mazzitelli, kiali-users
Thanks for the image. It helps a lot!

So, the issue went away after ~40 mins. I'm wondering: did it never come back?
It would be a concern if something re-triggers the issue making Kiali unusable again.

@Mazz / @Jay I'm wondering if this fix is a candidate for backport to v1.24. Sounds like an annoying issue and a potential blocker.

Arturas R.

23 Sep 2020, 14:51:53
to kiali-users
Yes, it stopped after 30-40 mins and never happened again.

I've created a new environment and tested 2 scenarios.

Scenario 1:
  1. Deployed Istio
  2. Deployed Prometheus
  3. Deployed Kiali
  4. Created 30 random namespaces
  5. It took only a few runs to get through it: I have 5 old ReplicaSets and 1 active one. I can still see that the namespace order differs between the 1st and 2nd runs, but I only have 3 records for TASK [default/kiali-deploy : Listing of all accessible namespaces (includes regex matches)]. The 3rd run's order is the same as the 2nd run's.
Scenario 2:
  1. Deployed Istio
  2. Deployed Prometheus
  3. Created 30 random namespaces
  4. Deployed Kiali
  5. I already have 8 ReplicaSets after 23 mins, and I can find 8 records of TASK [default/kiali-deploy : Listing of all accessible namespaces (includes regex matches)] in the logs. I've looked at a few runs and each has a different namespace order.

jmaz...@redhat.com

25 Sep 2020, 6:59:16
to kiali-users
Someone hit what I thought was the same problem, only to find out they had a second, unknown to them, Kiali CR trying to install a Kiali in the same deployment namespace. I wonder if the same thing is happening to you.


What happened was someone had created a second Kiali CR in another namespace but it was configured to try to install Kiali in the same place where their first Kiali CR wanted to install it - hence, they were bouncing the Kiali over and over because they confused the operator - two Kiali CRs attempting to install Kiali in the same place.

You should check to make sure that isn't happening to you. Do a "kubectl get kiali --all-namespaces" and make sure you don't have any rogue Kiali CRs around causing your problem.
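
The check above could be automated roughly as follows. This is a sketch under stated assumptions: it expects the parsed JSON from `kubectl get kiali --all-namespaces -o json`, reads the target from `spec.deployment.namespace`, and, as an assumption that may not hold for every operator version, falls back to the CR's own namespace when that field is unset. The sample CR names are invented.

```python
from collections import defaultdict


def find_conflicting_crs(kiali_list: dict) -> dict:
    """Return target namespaces that more than one Kiali CR wants to install into."""
    targets = defaultdict(list)
    for item in kiali_list.get("items", []):
        meta = item["metadata"]
        spec = item.get("spec") or {}
        deployment = spec.get("deployment") or {}
        # Assumption: an unset target defaults to the CR's own namespace.
        target = deployment.get("namespace", meta["namespace"])
        targets[target].append(f'{meta["namespace"]}/{meta["name"]}')
    return {ns: crs for ns, crs in targets.items() if len(crs) > 1}


# Invented sample: two CRs in different namespaces targeting the same namespace.
sample = {
    "items": [
        {"metadata": {"name": "my-kiali", "namespace": "my-kiali"},
         "spec": {"deployment": {"namespace": "my-kiali"}}},
        {"metadata": {"name": "rogue-kiali", "namespace": "other-ns"},
         "spec": {"deployment": {"namespace": "my-kiali"}}},
    ]
}

print(find_conflicting_crs(sample))
```

An empty result would suggest no two CRs are competing for the same target namespace; the real JSON could be loaded with `json.load` and passed in the same way.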

Arturas R.

25 Sep 2020, 7:15:29
to kiali-users
I've checked again, and I have only 1 Kiali CR. Also, the user in that issue mentions a continuous loop of pod restarts; mine happen only during the initial deploy, for ~30 mins.