Memory Usage Analysis


Jason Garber

Jun 18, 2020, 4:27:50 PM
to mod...@googlegroups.com
Hey Graham, All,

We've been running a live event with about 1,000 attendees and are getting hit with up to hundreds of requests per second.  I'm running 20 daemon processes with 20 threads per process.
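
Roughly speaking, the daemon process group is set up along these lines (the group name, display-name and path below are illustrative placeholders rather than our exact directives, but the processes/threads values match):

WSGIDaemonProcess DaaS-TMT-0 processes=20 threads=20 display-name=wsgi-DaaS-TMT-0
WSGIProcessGroup DaaS-TMT-0
WSGIScriptAlias / /path/to/app.wsgi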

Every once in a while the memory of all the processes spikes to 200+ MB each and the load average skyrockets.  I've seen it hit as high as 250 (vs. 0.7 normally).

Running 'service httpd graceful' fixes the issue (for a while).

Normal Example:
[deploy@daas7 DaaS-TMT-0]$ ps aux | grep 'apache' | grep 'TMT-0' | awk '{print $6/1024 " MB " $11}'; cat /proc/meminfo | grep -E '(MemFree|Avail)'; uptime
123.969 MB wsgi-DaaS-TMT-0
123.984 MB wsgi-DaaS-TMT-0
119.52 MB wsgi-DaaS-TMT-0
126.121 MB wsgi-DaaS-TMT-0
121.086 MB wsgi-DaaS-TMT-0
121.016 MB wsgi-DaaS-TMT-0
145.945 MB wsgi-DaaS-TMT-0
118.406 MB wsgi-DaaS-TMT-0
126.672 MB wsgi-DaaS-TMT-0
112.234 MB wsgi-DaaS-TMT-0
111.328 MB wsgi-DaaS-TMT-0
135.461 MB wsgi-DaaS-TMT-0
117.73 MB wsgi-DaaS-TMT-0
136.438 MB wsgi-DaaS-TMT-0
113.359 MB wsgi-DaaS-TMT-0
118.289 MB wsgi-DaaS-TMT-0
123.535 MB wsgi-DaaS-TMT-0
126.746 MB wsgi-DaaS-TMT-0
122.766 MB wsgi-DaaS-TMT-0
115.934 MB wsgi-DaaS-TMT-0
MemFree:         4993068 kB
MemAvailable:   25089688 kB
 13:01:36 up 7 days,  9:27,  4 users,  load average: 0.55, 0.82, 2.46

Server almost unresponsive:
[deploy@daas7 DaaS-TMT-0]$ ps aux | grep 'apache' | grep 'TMT-0' | awk '{print $6/1024 " MB " $11}'; cat /proc/meminfo | grep -E '(MemFree|Avail)'; uptime
275.457 MB wsgi-DaaS-TMT-0
277.633 MB wsgi-DaaS-TMT-0
274.633 MB wsgi-DaaS-TMT-0
285.215 MB wsgi-DaaS-TMT-0
278.156 MB wsgi-DaaS-TMT-0
272.445 MB wsgi-DaaS-TMT-0
277.543 MB wsgi-DaaS-TMT-0
274.371 MB wsgi-DaaS-TMT-0
277.699 MB wsgi-DaaS-TMT-0
273.18 MB wsgi-DaaS-TMT-0
273.363 MB wsgi-DaaS-TMT-0
278.094 MB wsgi-DaaS-TMT-0
276.719 MB wsgi-DaaS-TMT-0
277.074 MB wsgi-DaaS-TMT-0
274.324 MB wsgi-DaaS-TMT-0
275.32 MB wsgi-DaaS-TMT-0
273.684 MB wsgi-DaaS-TMT-0
271.797 MB wsgi-DaaS-TMT-0
283.133 MB wsgi-DaaS-TMT-0
255.16 MB wsgi-DaaS-TMT-0
28.8008 MB /usr/bin/convert
MemFree:          262352 kB
MemAvailable:   18945328 kB
 13:18:50 up 7 days,  9:44,  4 users,  load average: 253.79, 100.74, 40.20

A couple of minutes after running 'service httpd graceful':

[deploy@daas7 DaaS-TMT-0]$ ~/stats.sh
100.383 MB wsgi-DaaS-TMT-0
110.719 MB wsgi-DaaS-TMT-0
101.176 MB wsgi-DaaS-TMT-0
128.449 MB wsgi-DaaS-TMT-0
112.527 MB wsgi-DaaS-TMT-0
109.465 MB wsgi-DaaS-TMT-0
103.875 MB wsgi-DaaS-TMT-0
98.8438 MB wsgi-DaaS-TMT-0
108.414 MB wsgi-DaaS-TMT-0
108.133 MB wsgi-DaaS-TMT-0
107.07 MB wsgi-DaaS-TMT-0
118.824 MB wsgi-DaaS-TMT-0
101.527 MB wsgi-DaaS-TMT-0
127.004 MB wsgi-DaaS-TMT-0
100.871 MB wsgi-DaaS-TMT-0
125.188 MB wsgi-DaaS-TMT-0
100.566 MB wsgi-DaaS-TMT-0
108.91 MB wsgi-DaaS-TMT-0
101.215 MB wsgi-DaaS-TMT-0
109.711 MB wsgi-DaaS-TMT-0
MemFree:         7607044 kB
MemAvailable:   25815540 kB
 13:25:51 up 7 days,  9:51,  4 users,  load average: 1.25, 38.56, 36.12

My main question: does anyone have suggestions for seeing inside the daemon processes, down to the Python object level, to work out what is going on?

Thanks,
Jason

Graham Dumpleton

Jun 18, 2020, 7:42:13 PM
to mod...@googlegroups.com
One possible cause of this is reference cycles between objects which the garbage collector cannot break.

So first off, try creating a background thread that periodically logs the number of objects.

I think it is gc.get_count(). The thresholds for when collection should kick in are given by gc.get_threshold().

If need be, you can then start dumping out counts of objects of particular types that exist by looking at gc.get_objects().
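
As a rough sketch (the interval and the print-based logging are just placeholders, use whatever logging you already have):

import collections
import gc
import threading
import time

def _log_gc_stats(interval=30):
    # Periodically log the collector's generation counters, its thresholds,
    # anything it could not free (gc.garbage), and the most common types of
    # objects still alive.
    while True:
        counts = collections.Counter(type(o).__name__ for o in gc.get_objects())
        print('gc counts:', gc.get_count(),
              'thresholds:', gc.get_threshold(),
              'uncollectable:', len(gc.garbage))
        print('top object types:', counts.most_common(10))
        time.sleep(interval)

# Run it as a daemon thread so it never blocks interpreter shutdown.
threading.Thread(target=_log_gc_stats, daemon=True).start()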

Anyway, this may give some clues. I had to use this many years ago to debug a memory growth issue in Django caused by custom __del__() methods on objects. My memory of what I did is very vague though, and I don't think I have any of the code I used lying around, but I will have a quick search.

Graham


Graham Dumpleton

Jun 18, 2020, 7:46:22 PM
to mod...@googlegroups.com
So I found the following code, which reminds me that the issue was that code being run when the garbage collector invoked __del__() methods was deadlocking, which caused the garbage collector to stop running. The code below uses a trick with a pair of cyclically referenced objects to log each time the garbage collector runs. If you see that it has stopped running, that could be because of a deadlock in a __del__() method.

import time
import threading

class Monitor(object):

    initialized = False
    lock = threading.Lock()

    count = 0

    @classmethod
    def initialize(cls):
        with Monitor.lock:
            if not cls.initialized:
                cls.initialized = True
                cls.rollover()

    @staticmethod
    def rollover():
        print('RUNNING GARBAGE COLLECTOR', time.time())

        class Object(object):
            pass

        # Build a reference cycle that only the garbage collector can break,
        # and hang a Monitor instance off it. When the collector eventually
        # frees the cycle the Monitor is released, its __del__() runs, and a
        # fresh cycle is planted ready for the next collector run.
        o1 = Object()
        o2 = Object()

        o1.o = o2
        o2.o = o1

        o1.t = Monitor()

        del o1
        del o2

    def __del__(self):
        Monitor.count += 1
        Monitor.rollover()

Monitor.initialize()

Graham Dumpleton

Jun 18, 2020, 7:48:23 PM
to mod...@googlegroups.com
I haven't read the discussion, but this was posted as part of:

Graham Dumpleton

Jun 18, 2020, 8:17:43 PM
to mod...@googlegroups.com
Should add that if it does come down to a deadlock in the garbage collector, the next step is to use a variant of:


to dump out stack traces showing where all the threads are, and try to work out which one is blocked.
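
The core of that approach is just walking sys._current_frames(); a bare-bones version (not the exact code from that post) looks something like:

import sys
import threading
import traceback

def dump_stacks():
    # Map thread ids back to thread names, then print where each thread
    # currently is so you can spot the one that is blocked.
    names = {t.ident: t.name for t in threading.enumerate()}
    for thread_id, frame in sys._current_frames().items():
        print('Thread %s (%s):' % (thread_id, names.get(thread_id, '?')))
        print(''.join(traceback.format_stack(frame)))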

Jason Garber

Jun 18, 2020, 11:26:18 PM
to mod...@googlegroups.com
Hey Graham, thank you.  I have not seen these issues outside of heavy load, and have run systems for months without even cycling the daemon processes, so I intend to follow the steps you suggested both outside of and under heavy load.

That being said, do you have any general suggestions for load-testing our application accurately, to simulate realistic high load while I analyze things per your excellent comments from earlier?

Thanks!
Jason
