persistent 503 errors after pod killed


Ryan Harley

Jan 18, 2021, 12:36:25 AM

to envoy-users

Hi guys, I am using Envoy as part of Istio on a Kubernetes deployment. We have a headless service that Istio configures as an ORIGINAL_DST cluster in Envoy. This works fine in the sunny-day case, but when we forcefully delete a pod, we sometimes get "503 no healthy upstream" errors indefinitely afterwards. When this happens, the only things that seem to fix it are restarting the downstream application or stopping all traffic for a while. Any ideas on what could be causing this?

Ryan Harley

Jan 21, 2021, 6:38:54 PM

to envoy-users

I have done some more investigation of this, and it looks like when things go badly, Envoy continuously ejects and unejects the deleted pod. The pod with that IP address no longer exists, so Envoy just goes into an eject/uneject loop. From another container in the cluster, I can see with nslookup that the deleted pod's IP address has been removed from the service. What I don't understand is why Envoy has not removed that IP address from the cluster as well. Does anyone have any ideas about why this would happen?
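For anyone wanting to reproduce the comparison described above, here is a rough Python sketch, not a definitive tool. It assumes the text output of the Envoy admin `/clusters` endpoint (in an Istio sidecar this is usually reachable at `localhost:15000/clusters`, but verify for your setup) and IPv4 addresses only. The cluster name, IP addresses, and the helper names `stale_hosts` and `resolve` are hypothetical and chosen for illustration.

```python
import socket


def stale_hosts(clusters_text, cluster_name, live_ips):
    """Return {ip: health_flags} for hosts Envoy lists that are not in live_ips."""
    stale = {}
    for line in clusters_text.splitlines():
        parts = line.strip().split("::")
        # Per-host health lines look like: cluster::ip:port::health_flags::value
        if len(parts) != 4 or parts[0] != cluster_name or parts[2] != "health_flags":
            continue
        ip = parts[1].rsplit(":", 1)[0]  # drop the port (IPv4 assumed)
        if ip not in live_ips:
            stale[ip] = parts[3]
    return stale


def resolve(hostname):
    """Resolve the IPv4 addresses DNS currently returns for the headless service."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    return {info[4][0] for info in infos}


# Hypothetical sample of /clusters output after the pod at 10.0.0.9 was deleted.
CLUSTER = "outbound|8080||my-svc.default.svc.cluster.local"
SAMPLE = "\n".join([
    CLUSTER + "::default_priority::max_connections::1024",
    CLUSTER + "::10.0.0.5:8080::health_flags::healthy",
    CLUSTER + "::10.0.0.9:8080::health_flags::/failed_outlier_check",
])

# In a real check, live_ips would come from resolve("my-svc.default.svc.cluster.local").
print(stale_hosts(SAMPLE, CLUSTER, {"10.0.0.5"}))
```

Any address this reports is one Envoy still tracks in the cluster even though DNS no longer returns it, which matches the eject/uneject loop described above.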
