Modwsi Daemon mode python giving error (until manual restart of apache)

38 views
Skip to first unread message

Sujai Kumar

unread,
May 15, 2019, 11:44:54 AM5/15/19
to modwsgi
Hello All,

We have a Apache/Python/Django using modwsgi. We hit upon a situation where the following error is happening in the site, 

Error Log

2019-05-09 09:18:08,471 - root - ERROR] - Error : Traceback (most recent call last):
  File ".....core/api/api_util.py", line 15, in log_user_access
    caller_func_name = inspect.stack()[1][3]  # retrieves the name of the function that calls this function
  File ".../Python-3.6.3s/lib/python3.6/inspect.py", line 1491, in stack
    return getouterframes(sys._getframe(1), context)
  File ".../Python-3.6.3s/lib/python3.6/inspect.py", line 1468, in getouterframes
    frameinfo = (frame,) + getframeinfo(frame, context)
  File ".../Python-3.6.3s/lib/python3.6/inspect.py", line 1442, in getframeinfo
    lines, lnum = findsource(frame)
  File ".../Python-3.6.3s/lib/python3.6/inspect.py", line 825, in findsource
    if pat.match(lines[lnum]): break
IndexError: list index out of range 

My understanding
I think that the issue is happening because of apache getting reloaded due to below configurations. At the same time the logrotate triggers 'apache reload' around the same time.

Quick Fix
The quick fix that we had to do was to restart the apache. Just after the restart the application started working.

Question:
What has caused this situation and how to avoid this occurring in future so that we are not reloading it manually? Thanks,

Logrotate Configs


[Sujai@machine0009 logrotate.d]$ cat apache

# THIS FILE IS AUTOMATICALLY DISTRIBUTED BY PUPPET.  ANY CHANGES WILL BE

# OVERWRITTEN.


/opt/www/logs/*log {

  compress

  daily

  dateext

  missingok

  olddir /opt/www/logs/roll

  rotate 31

  sharedscripts

  postrotate

    /bin/systemctl reload httpd.service >/dev/null 2>/dev/null || true

  endscript

}

[Sujai@machine0009 logrotate.d]$ cat httpd

/var/log/httpd/*log {

    missingok

    notifempty

    sharedscripts

    delaycompress

    postrotate

        /bin/systemctl reload httpd.service > /dev/null 2>/dev/null || true

    endscript

}


Thanks & regards,
Sujaikumar

Graham Dumpleton

unread,
May 15, 2019, 6:27:48 PM5/15/19
to mod...@googlegroups.com
Hard to tell just from the information you have given because you have truncated the file system paths so can't tell what package this code is executed from. The stack trace doesn't even look complete.

At a guess you are hitting an issue where there if there is a background thread running when the process is shutdown, that thread can start up when the interpreter is in a half destroyed state. This can result in strange things happening due to the interpreter having started to destroy objects and clear our modules.

Is this only happening when the processes are being shutdown?

Graham

--
You received this message because you are subscribed to the Google Groups "modwsgi" group.
To unsubscribe from this group and stop receiving emails from it, send an email to modwsgi+u...@googlegroups.com.
To post to this group, send email to mod...@googlegroups.com.
Visit this group at https://groups.google.com/group/modwsgi.
To view this discussion on the web visit https://groups.google.com/d/msgid/modwsgi/b75f3ce9-7408-4be2-b35f-e131e78bfb2e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Sujai Kumar

unread,
May 15, 2019, 9:36:36 PM5/15/19
to mod...@googlegroups.com
Thanks for looking into this.

Well, the stack trace doesn't have much information. The issue happens when caller function name was about to be logged (using the inspect module). This was a working application for over an year and suddenly the situation has occurred.

My inference is that the python state has gone bad due to the apache reload that happens simultaneously due to logrotate configs. My understanding is that, on apache reload, the python processes goes gracefully down rereads the apache configuration(if any) and comes up. Why would existing python library (inspect module) give the 'list index out of range' error? Is there any other way that this situation could happen?

Again the apache restart (systemctl restart httpd) has to be done manually to bring the python process back to normal state.

Thanks & regards,
Sujaikumar

Graham Dumpleton

unread,
May 15, 2019, 10:33:39 PM5/15/19
to mod...@googlegroups.com
Are you sure you mod_wsgi is pulling in the correct Python shared library, given that it looks like you are using a custom Python installation.

Is Python 3.6 installed as system Python?

Are you wanting to actually use a separate Python installation you installed from source code?

Did you compile mod_wsgi from source code against any separate Python installation? If you did that, was LD_RUN_PATH environment variable set to the lib directory for that Python installation where the Python shared library is, when mod_wsgi was being built, to ensure that at run time it would use that library and not the system Python one.

If you are getting the wrong Python shared library pulled in, and there are patch level differences, you could see all sorts of strange issues.

Sujai Kumar

unread,
May 15, 2019, 10:42:10 PM5/15/19
to mod...@googlegroups.com
Hi Graham,

Yes, mod_wsgi is pulling the right python. I have verified them before writing to you. As mentioned earlier, the system is working fine for more than a year now. This issue happened suddenly and there were no code change recently and hence the concern. The only thing, that I see as external factor is 'apache reload' (as part of logrotate simultaneously). Are there any other things that we need to check? I shall send the complete stack trace in some time.

Thanks & regards,
Sujaikumar

Graham Dumpleton

unread,
May 15, 2019, 10:45:36 PM5/15/19
to mod...@googlegroups.com
As already mentioned, if it is only on process shutdown (caused by a reload), then it is possibly because it is trying to execute code when the interpreter has been half destroyed. You cut off the file system paths and so I can't see what:

File ".....core/api/api_util.py", line 15, in log_user_access

is likely a part of (if open source project) and infer anything about whether a separate thread is running. So a more complete stack trace may help.

Also, given you are using an ancient Python version, what mod_wsgi version are you using?

Sujai Kumar

unread,
May 15, 2019, 11:11:41 PM5/15/19
to mod...@googlegroups.com
Hi Graham,

Here are the version of softwares being used.
Python -> 3.6
mod_wsgi -> 4.6.4
Sujai@machine0009:/usr/lib64/httpd/modules> ldd mod_wsgi.so
        linux-vdso.so.1 =>  (0x00007ffc74cd7000)
        libpython3.6m.so.1.0 => /data/ev/Python-3.6.3s/lib/libpython3.6m.so.1.0 (0x00007fbd76c66000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fbd76a4a000)
        libc.so.6 => /lib64/libc.so.6 (0x00007fbd7667d000)
        libssl.so.1.0.0 => /data/dev/openssl-1.0.2sh/lib/libssl.so.1.0.0 (0x00007fbd7640b000)
        libcrypto.so.1.0.0 => /data/dev/openssl-1.0.2sh/lib/libcrypto.so.1.0.0 (0x00007fbd75fbd000)
        libz.so.1 => /data/dev/Python-3.6.3s/lib/libz.so.1 (0x00007fbd75da2000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007fbd75b9e000)
        libutil.so.1 => /lib64/libutil.so.1 (0x00007fbd7599b000)
        libm.so.6 => /lib64/libm.so.6 (0x00007fbd75699000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fbd773fa000)
Sujai@machine0009 :/usr/lib64/httpd/modules> strings mod_wsgi.so | grep -w -A 1 "wsgi_init"
wsgi_init
4.6.4

Stack Trace of the Error
2019-05-09 09:18:08,471 - root - ERROR] - Error : Traceback (most recent call last):
  File "/data/dev/deployment/production/release/server-python/sf_django/sf_app/views/utils/__init__.py", line 135, in function_wrapper
    return_value = func(caller_object, request, *args, **kwargs)
  File "/data/dev/deployment/production/release/server-python/sf_django/starfish_app/views/api/api_id.py", line 27, in get
    response = get_sub_discipline_wise_api_ids(user_itss_id, parameters["bs_revision_id"])
  File "/data/dev/deployment/production/release/server-python/sf_django/starfish_app/core/api/api_id.py", line 21, in get_sub_discipline_wise_api_ids
    log_user_access(user_itss_id, func_param={"bs_revision_id": bs_revision_id})
  File "/data/dev/deployment/production/release/server-python/sf_django/starfish_app/core/api/api_util.py", line 15, in log_user_access

    caller_func_name = inspect.stack()[1][3]  # retrieves the name of the function that calls this function
  File "/data/dev/Python-3.6.3s/lib/python3.6/inspect.py", line 1491, in stack
    return getouterframes(sys._getframe(1), context)
  File "/data/dev/Python-3.6.3s/lib/python3.6/inspect.py", line 1468, in getouterframes

    frameinfo = (frame,) + getframeinfo(frame, context)
  File "/data/dev/Python-3.6.3s/lib/python3.6/inspect.py", line 1442, in getframeinfo
    lines, lnum = findsource(frame)
  File "/data/dev/Python-3.6.3s/lib/python3.6/inspect.py", line 825, in findsource

    if pat.match(lines[lnum]): break
IndexError: list index out of range

In what case the interpreter will have half destroyed code. Won't the 'apapche reload' perform the process shutdown gracefully (maening, all existing connections to a thread would be served before the thread redeems itself)?

Thanks & regards,
Sujaikumar

Graham Dumpleton

unread,
May 15, 2019, 11:28:48 PM5/15/19
to mod...@googlegroups.com
Haha, for some reason I read it as Python 2.6 when I said ancient.

Anyway, can you grab section of code around line 825 in /data/dev/Python-3.6.3s/lib/python3.6/inspect.py.

Python 3.6.7 doesn't match that code at that line number, so wondering whether you are using older patch revision which was since updated. In 3.6.7 it has:

    if iscode(object):
        if not hasattr(object, 'co_firstlineno'):
            raise OSError('could not find function definition')
        lnum = object.co_firstlineno - 1. #### <----- This is line 825
        pat = re.compile(r'^(\s*def\s)|(\s*async\s+def\s)|(.*(?<!\w)lambda(:|\s))|^(\s*@)')
        while lnum > 0:
            if pat.match(lines[lnum]): break
            lnum = lnum - 1
        return lines, lnum

In your stack trace, line 825 is 3 lines later. Has the code in that function changed, with your version not having the "while lnum > 0"?

Sujai Kumar

unread,
May 16, 2019, 12:47:52 AM5/16/19
to mod...@googlegroups.com
Hi Graham,

The python version is Python-3.6.3 and not 3.6.7. I have built this from source. Inspect module came from build of Python. That code was not manually edited. I still feel that the error has happened to occur with 'inspect' module once. It could have happened with any other module though. I believe that it is do with some sequencing of python getting stopped./started causing the interpreter in unstable state. Do you think if any other case would cause this situation?

Thanks & regards,
Sujaikumar

Graham Dumpleton

unread,
May 16, 2019, 12:59:17 AM5/16/19
to mod...@googlegroups.com
I know you are using Python 3.6.3. I don't have the source code for that version handy, and why I asked if you could provide the code snippet to see whether it was any different.

I already explained one scenario where problems can arise. That is specific to process shutdown. You haven't provided any additional details for me to confirm whether this is occurring on process shutdown or in a different situation. You keep mentioning reload, so can only surmise it is an error on process shutdown. That process would be getting killed though, so it doesn't make sense that it would cause an ongoing problem.

You also provide no details about the configuration of mod_wsgi you are using. So I don't know if you are using embedded mode (not recommended), or daemon mode.

If I had to make another guess based on the little information you have given, the error you are seeing is just a subsequent error due to some bug (which was why curious about inspect code), and not the original error. If you are forced to restart processes, then it may be because your application fails on loading the first time and has made stateful changes which then means it cannot be reloaded. Due to lack of log information (preferable at LogLevel info), I have no way to confirm that.

All I can suggest is ensure you are using daemon mode, not embedded mode. That you set 'startup-timeout' to 15 on the WSGIDaemonProcess, so that if you do have a bad app initialisation occurring, which is quite possible with Django especially, that after 15 seconds of failed loading, it will restart the daemon process automatically.

That is all I can suggest without more complete logs (LogLevel info), which clearly shows contextual log messages from before or after the error so can see what processes are being restarted, whether using daemon mode or not etc etc.

Graham

Sujai Kumar

unread,
May 16, 2019, 1:32:35 AM5/16/19
to mod...@googlegroups.com
Hi Graham,

Please find attached the inspect.py file.

The process is running in Daemon mode.

mod_wsgi configuration
WSGIScriptAlias /rest /data/dev/deployment/production/release/server-python/sf_django/sf_app/wsgi.py process-group=sapp-django

WSGIDaemonProcess  sf_django processes=8 threads=10 maximum-requests=1000000 display-name=djangoa python-path=/data/dev/deployment/production/release/server-python/sf_django
WSGIProcessGroup sapp-django


Please let me know if you need any specific information.

Thanks & regards,
Sujaikumar

inspect.py

Graham Dumpleton

unread,
May 16, 2019, 5:17:06 AM5/16/19
to mod...@googlegroups.com
On 16 May 2019, at 3:32 pm, Sujai Kumar <sujai....@gmail.com> wrote:

Hi Graham,

Please find attached the inspect.py file.

The process is running in Daemon mode.

mod_wsgi configuration
WSGIScriptAlias /rest /data/dev/deployment/production/release/server-python/sf_django/sf_app/wsgi.py process-group=sapp-django

WSGIDaemonProcess  sf_django processes=8 threads=10 maximum-requests=1000000 display-name=djangoa python-path=/data/dev/deployment/production/release/server-python/sf_django
WSGIProcessGroup sapp-django


Please let me know if you need any specific information.

Add:

    startup-timeout=15

to WSGIDaemonProcess and monitor logs in future when reload occurs, preferably enabling LogLevel info level so get additional information.

The most likely cause is that you are hitting a known Django issue whereby Django initialisation is no longer re-entrant. Thus if you have a transient failure on startup during initialisation of Django, such as a database not being ready, and an error occurs, it causes the failure of the WSGI script loading. Problem is that on the next request when the WSGI script is loaded again, it will then encounter the error saying Django cannot be initialised as it isn't re-entrant. I suspect in your case though, that whatever logging code you have for exceptions is itself encountering an error when trying to log details including the stack trace. That error is being displayed and the original error details are being lost, so you cannot see what the real problem is.

The startup-timeout option was specially added to get around this issue with Django initialisation not being re-entrant. When loading of the WSGI script file keeps failing for the time period (in seconds) set for that option, the process will be shutdown and restarted automatically. This way if the original trigger was a transient issue, things will work after a restart. Right now you are restarting manually to recover, but that option can allow it to be automatic. Although your site with that option may keep running, recovering automatically, you should still monitor logs and dig into the underlying cause, removing any custom logging stuff for exceptions you may have, if that itself is failing when logging the original error and hiding it.

There isn't really anything more can offer than this suggestion.

Graham


For more options, visit https://groups.google.com/d/optout.
<inspect.py>

Sujai Kumar

unread,
May 17, 2019, 4:31:53 AM5/17/19
to mod...@googlegroups.com

Thanks Graham for your suggestion.

I got hold of the httpd log and saw this (which seems normal to me), (LogLevel is set to Info)

[Wed May 08 00:01:02.370191 2019] [core:notice] [pid 87977:tid 140036195301504] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND'
[Thu May 09 00:01:09.003672 2019] [mpm_worker:notice] [pid 87977:tid 140036195301504] AH00297: SIGUSR1 received.  Doing graceful restart ==> This is due to logrotate cron
[Thu May 09 00:01:10.531825 2019] [auth_digest:notice] [pid 87977:tid 140036195301504] AH01757: generating secret for digest authentication ...
[Thu May 09 00:01:10.535744 2019] [mpm_worker:notice] [pid 87977:tid 140036195301504] AH00292: Apache/2.4.6 (Red Hat Enterprise Linux) mod_wsgi/4.5.20 Python/3.6 OpenSSL/1.0.2m configured -- resuming normal operations
[Thu May 09 00:01:10.535769 2019] [core:notice] [pid 87977:tid 140036195301504] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND'
[Thu May 09 12:21:17.161693 2019] [mpm_worker:notice] [pid 87977:tid 140036195301504] AH00296: caught SIGWINCH, shutting down gracefully ==> This is because of the manual restart
[Thu May 09 12:21:18.985890 2019] [suexec:notice] [pid 55852:tid 139877324667008] AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Thu May 09 12:21:19.002481 2019] [auth_digest:notice] [pid 55852:tid 139877324667008] AH01757: generating secret for digest authentication ...
[Thu May 09 12:21:19.010133 2019] [mpm_worker:notice] [pid 55852:tid 139877324667008] AH00292: Apache/2.4.6 (Red Hat Enterprise Linux) mod_wsgi/4.5.20 Python/3.6 OpenSSL/1.0.2m configured -- resuming normal operations

Also only error that I have seen so far in error_log is the one below. Do you see that this has caused the issue? I see that after those error the wsgi got loaded successfully.  With all those things, I will take your suggestion of adding start-timeout to the daemon process but would like to understand what has caused this.

[Thu May 09 17:22:21.048348 2019] [wsgi:error] [pid 55855:tid 139877092456192] [remote 10.180.64.220:39802] /data/sfdev/Python-3.6.3s/lib/python3.6/site-packages/urllib3/contrib/socks.py:37: DependencyWarning: SOCKS support in urllib3 requires the installation of optional dependencies: specifically, PySocks.  For more information, see https://urllib3.readthedocs.io/en/latest/contrib.html#socks-proxies
[Thu May 09 17:22:21.048376 2019] [wsgi:error] [pid 55855:tid 139877092456192] [remote 10.180.64.220:39802]   DependencyWarning
[Thu May 09 12:22:39.109198 2019] [wsgi:info] [pid 55857:tid 139877092456192] mod_wsgi (pid=55857): Create interpreter 'starfish-api'.
[Thu May 09 12:22:39.119767 2019] [wsgi:info] [pid 55857:tid 139877092456192] mod_wsgi (pid=55857): Adding '/data/dev/deployment/production/release/server-python/sf_django' to path.
[Thu May 09 12:22:39.126810 2019] [wsgi:info] [pid 55857:tid 139877092456192] [remote 10.180.64.220:39818] mod_wsgi (pid=55857, process='sf-api', application='sf-api'): Loading WSGI script '/data/dev/deployment/production/release/server-python/sf_django/sf_api/wsgi.py'.


Thanks & regards,
Sujaikumar


Graham Dumpleton

unread,
May 17, 2019, 4:47:13 PM5/17/19
to mod...@googlegroups.com
There is nothing out of the ordinary in that as far as logs.

One further suggestion. If you are using daemon mode, and only host the one WSGI application, it is recommended that you use:

    WSGIApplicationGroup %{GLOBAL}

in addition to the WSGIProcessGroup directory that delegates the WSGI application to the daemon process group.

That WSGIApplicationGroup directive, with that value, will tell Python to use the main interpreter context rather than a sub interpreter. This is preferred because not all third party Python modules will work correctly in sub interpreters. So this could avoid problems later on, especially if using any packages in data science field such as numpy, scipy etc, which are known not to work in sub interpreters.

Graham

Reply all
Reply to author
Forward
0 new messages