ALS failure leads to stale configuration and 503s


Oscar Moreno Garza

Jun 3, 2020, 5:58:41 PM
to envoy-dev
This is a bit hard to replicate, so bear with me (I thought I'd drop by here before filing a GitHub issue with replication steps).

I am also making some assumptions, which I will highlight in green, since I am not very familiar with the Envoy model/code around ALS:


If the AccessLogService goes down while Envoy is serving traffic (I saw a bunch of connection errors to the ALS cluster), all of the messages will queue and eventually the main thread will become resource exhausted, meaning control plane updates will get dropped. Worker threads will still serve traffic with no problem, though.

This results in stale configuration, which may lead to connection errors on cluster/routes updates.

Just wondering if this is by design (it seems a bit weird to me to have logs compromise production traffic, so this might be unintended)?
Is there a knob to tweak this behavior (if an ALS message fails, drop the message; don't keep it in memory and don't retry)?
Or is this a potential bug/issue that was previously unreported?
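
To make the "drop the message, don't keep it in memory and don't retry" behavior concrete, here is a minimal Python sketch of what I have in mind. It is not Envoy code and says nothing about how the gRPC access logger is actually implemented; `BoundedLogBuffer`, `max_entries`, and the `send` callback are hypothetical names used only for illustration.

```python
from collections import deque


class BoundedLogBuffer:
    """Hypothetical log buffer that drops entries instead of growing without bound."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.entries = deque()
        self.dropped = 0

    def log(self, entry):
        # When the buffer is full (e.g. the log sink is unreachable),
        # drop the new entry rather than queueing it.
        if len(self.entries) >= self.max_entries:
            self.dropped += 1
            return False
        self.entries.append(entry)
        return True

    def flush(self, send):
        # send(entry) is assumed to return True on success.
        # Failed entries are counted as dropped and never retried.
        sent = 0
        while self.entries:
            entry = self.entries.popleft()
            if send(entry):
                sent += 1
            else:
                self.dropped += 1
        return sent


# Simulate a sink that is down: only 3 entries are ever held in memory.
buf = BoundedLogBuffer(max_entries=3)
accepted = [buf.log(f"request {i}") for i in range(5)]
sent = buf.flush(lambda entry: False)
```

With something like this, memory use is capped by `max_entries` no matter how long the sink stays down, at the cost of losing log entries.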


Oscar Moreno Garza

Jun 5, 2020, 5:27:58 PM
to envoy-dev
Looking through the code at https://github.com/envoyproxy/envoy/blob/abdbbde827e3a76d014feb9a94ec4f803b1950c3/source/extensions/access_loggers/grpc/grpc_access_log_impl.cc#L115

It seems to suggest this is indeed not by design, but most likely something unintended.

Matt Klein

Jun 9, 2020, 10:24:39 AM
to Oscar Moreno Garza, envoy-dev
Any unbounded ALS memory usage should have been fixed in https://github.com/envoyproxy/envoy/pull/10882.

--
You received this message because you are subscribed to the Google Groups "envoy-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to envoy-dev+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/envoy-dev/33f19769-3fce-4c6f-b8cd-119509814a6co%40googlegroups.com.

Oscar Moreno Garza

Jun 9, 2020, 2:27:20 PM
to envoy-dev
Thanks for the response, Matt. We were running v1.11, which doesn't seem to have that PR merged.