Understanding Envoy memory usage


whja...@gmail.com

Mar 13, 2018, 3:35:18 PM
to envoy-users
Hi,

I have Envoy running as a sidecar process for a gRPC service. Envoy is using a large amount of memory, and I'm trying to understand why and whether there are options for limiting its memory footprint.

Here are charts for the server.memory_heap_size and server.memory_allocated metrics from the admin /stats API.

[Charts of server.memory_heap_size and server.memory_allocated (image001.png, image002.png); images not preserved]
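Both gauges come from the admin /stats endpoint, which returns plain text with one "name: value" pair per line. Below is a minimal Python sketch for polling them. It assumes the admin listener is on localhost:9901 (as in the server_info command in this post) and that both gauges are reported in bytes. The helper names are hypothetical. The difference between the two gauges is, roughly, heap the allocator has reserved but is not currently handing out to Envoy.

```python
import urllib.request

MEMORY_GAUGES = ("server.memory_heap_size", "server.memory_allocated")


def parse_stats(text):
    """Parse Envoy admin /stats plain-text output ("name: value" per line) into a
    dict of integer counters and gauges. Lines whose value is not a single integer
    (for example histogram summaries) are skipped."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if not sep:
            continue
        try:
            stats[name.strip()] = int(value.strip())
        except ValueError:
            continue  # not a plain counter/gauge value
    return stats


def unused_heap_bytes(stats):
    """Approximate heap reserved by the allocator but not currently allocated."""
    return stats["server.memory_heap_size"] - stats["server.memory_allocated"]


def fetch_memory_gauges(admin_url="http://localhost:9901"):
    """Fetch /stats from a running Envoy and return only the two memory gauges."""
    with urllib.request.urlopen(admin_url + "/stats") as resp:
        stats = parse_stats(resp.read().decode("utf-8"))
    return {name: stats.get(name) for name in MEMORY_GAUGES}
```

Calling fetch_memory_gauges() every few seconds and logging the result reproduces the data behind the charts without a separate metrics pipeline.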


# Server info (the binary is from the 1.5.0 Docker image)
$ curl -s localhost:9901/server_info
envoy 9ed62923a8ff6745407046c4451ce757348d966f/Clean/RELEASE live 276 276 0


I turned on heap profiling and am puzzled by what it reports for the period when memory_heap_size is growing:


$ pprof --text envoy-bin ./sidecar_heap_profile.0226.heap
Using local file envoy-bin.
Using local file ./sidecar_heap_profile.0226.heap.
Total: 6482.4 MB
  6482.4 100.0% 100.0%   6482.4 100.0% spdlog::details::os::thread_id::tid
     0.0   0.0% 100.0%      0.0   0.0% allocate_dtv
     0.0   0.0% 100.0%      0.0   0.0% __new_exitfn
     0.0   0.0% 100.0%      0.0   0.0% __cxa_thread_atexit_impl
     0.0   0.0% 100.0%      0.0   0.0% __GI___strdup
     0.0   0.0% 100.0%      0.0   0.0% __GI___pthread_once
     0.0   0.0% 100.0%      0.8   0.0% __libc_start_main
     0.0   0.0% 100.0%   6481.2 100.0% start_thread


It seems odd that a logging-library function for getting the thread ID would account for so much memory.
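The pprof --text table has six columns. "flat" is memory allocated directly in that function, "flat%" is its share of the total, and "sum%" is the running total of flat%. "cum" is memory allocated in the function plus everything it calls, followed by "cum%" and the function name. Below is a small Python sketch that parses those rows and flags a single leaf holding nearly all flat bytes, which is the pattern in the profile above. The 90% threshold and the helper names are arbitrary choices for illustration. A result like this is worth double-checking before chasing it. In this thread, the profile later turned out to be unreliable.

```python
from typing import List, NamedTuple, Optional


class PprofRow(NamedTuple):
    flat: float      # memory allocated in this function itself (unit from the "Total:" line)
    flat_pct: float
    cum: float       # memory allocated in this function plus its callees
    cum_pct: float
    function: str


def parse_pprof_text(text: str) -> List[PprofRow]:
    """Parse the table rows of `pprof --text`: flat, flat%, sum%, cum, cum%, name.
    Header lines such as "Total: ..." or "Using local file ..." are skipped."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 5)  # maxsplit keeps names containing spaces intact
        if len(parts) != 6 or not parts[1].endswith("%"):
            continue
        try:
            flat, flat_pct, _sum_pct, cum, cum_pct = (float(p.rstrip("%")) for p in parts[:5])
        except ValueError:
            continue
        rows.append(PprofRow(flat, flat_pct, cum, cum_pct, parts[5]))
    return rows


def dominant_leaf(rows: List[PprofRow], threshold_pct: float = 90.0) -> Optional[str]:
    """Return the function holding at least threshold_pct of all flat bytes, if any."""
    top = max(rows, key=lambda r: r.flat, default=None)
    if top is not None and top.flat_pct >= threshold_pct:
        return top.function
    return None
```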


OS details:

$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.4 (Maipo)

# I'm running with a custom glibc because RHEL 7's glibc is too old (hopefully that's not relevant)
/opt/glibc-2.18-rhel7-x86_64/lib/ld-linux-x86-64.so.2 --library-path /opt/glibc-2.18-rhel7-x86_64/lib /opt/envoy-1.5.0/envoy-bin -c sidecar.yaml --base-id 12801 --log-path /opt/envoy_log/sidecar.log --service-cluster sidecar


This behavior does seem to be triggered by specific client activity, so I'll be looking into the specifics of that. In the meantime, I wanted to see whether anyone understands this behavior and knows of a way to ask Envoy to limit its memory footprint.


Thanks,


Whitney

Matt Klein

Mar 13, 2018, 3:43:02 PM
to Whitney Jackson, envoy-users
This must be a bug in spdlog, probably related to a specific platform. I haven't heard of this before. If you can come up with any more debugging details that would be helpful. I might also post in the spdlog project.

--
You received this message because you are subscribed to the Google Groups "envoy-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to envoy-users+unsubscribe@googlegroups.com.
To post to this group, send email to envoy...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/envoy-users/8772926d-8723-47a3-8a5e-ece4ff662e9f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.




whja...@gmail.com

Mar 13, 2018, 3:49:11 PM
to envoy-users
Looks like pasting in the charts didn't work...grrr. Trying again using the "Insert Image" button:



whja...@gmail.com

Mar 13, 2018, 4:02:55 PM
to envoy-users
Thanks for the suggestion. I posted to the spdlog project here: https://github.com/gabime/spdlog/issues/659

I'll let you know if I'm able to get more debugging details.

whja...@gmail.com

Mar 13, 2018, 5:36:24 PM
to envoy-users
Here's a minimal config that I was able to use to observe the issue: https://pastebin.com/ytakJzf6

Also, I got this note from the spdlog developer:

Strange... maybe your old platform doesn't support thread-local storage well?
Please try defining SPDLOG_DISABLE_TID_CACHING before including spdlog.h (or uncomment it in tweakme.h) and see if it helps after recompiling.

So I'm going to try using envoyproxy/envoy-build-centos to build a binary rather than using the one from envoyproxy/envoy:v1.5.0. If the problem still happens, I'll try setting SPDLOG_DISABLE_TID_CACHING.

whja...@gmail.com

Apr 10, 2018, 10:03:52 AM
to envoy-users
I just wanted to circle back and close this out. It turned out that adjusting http2_settings (initial_stream_window_size and initial_connection_window_size) fixed my issue.

The heap profile I posted earlier in this thread appears to have been bogus. I'm not exactly sure why, but it seems related to the way I'm running Envoy (the pre-built binary with a custom glibc on RHEL). After following the build instructions for CentOS, my binary behaved the same as before, but the heap profiles looked a lot more sane. At that point I tuned http2_settings and my problem went away.

Matt Klein

Apr 10, 2018, 12:36:04 PM
to Whitney Jackson, envoy-users
Thank you for following up on this. We should probably reduce the defaults for the h2 stream/connection window sizes. They were left huge for legacy reasons.
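Why the window sizes matter: HTTP/2 flow control (RFC 7540, section 5.2) lets a peer send DATA until it exhausts either the per-stream window or the shared connection window. So the data a receiver may have to accept on one connection before granting more window is bounded by the smaller of the connection window and (concurrent streams × stream window).

The back-of-the-envelope Python sketch here is built on several assumptions. It takes 256 MiB as the default for both settings, which is my understanding of Envoy's documented defaults at the time. The 64 KiB / 1 MiB tuned values, the stream count, and the connection count are purely illustrative and are not the values used in this thread. It bounds what peers can push at Envoy, not Envoy's total memory use.

```python
MiB = 1024 * 1024


def max_unacked_bytes_per_connection(stream_window, connection_window, concurrent_streams):
    """Upper bound on DATA bytes one peer can send on a single HTTP/2 connection
    before the receiver sends WINDOW_UPDATE: each stream is capped by its own
    window, and all streams together are capped by the connection window."""
    return min(connection_window, concurrent_streams * stream_window)


def worst_case_total(connections, stream_window, connection_window, concurrent_streams):
    """The same bound summed over several independent connections."""
    return connections * max_unacked_bytes_per_connection(
        stream_window, connection_window, concurrent_streams)


if __name__ == "__main__":
    # Assumed defaults: 256 MiB for both windows; 10 connections x 100 streams.
    big = worst_case_total(10, 256 * MiB, 256 * MiB, 100)
    # Hypothetical tuned values: 64 KiB per stream, 1 MiB per connection.
    small = worst_case_total(10, 64 * 1024, 1 * MiB, 100)
    print(big // MiB, "MiB vs", small // MiB, "MiB")  # 2560 MiB vs 10 MiB
```

With the assumed defaults, the connection window is what binds, so each busy connection can in principle hold 256 MiB in flight. With the smaller windows, the 1 MiB connection window becomes the limit.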
