Deleted file leak, file upload, daemon mode, webob, WSGIApplicationGroup


Ezra Peisach

Mar 2, 2021, 8:42:19 PM
to modwsgi

Ok, this will be complicated.
Recently moved to Python 3,
running mod_wsgi from Apache.

We needed to add:

WSGIApplicationGroup %{GLOBAL}

due to third-party code (numpy, lxml). This means that requests are served by the primary Python process.

WSGIDaemonProcess wsgi_app_ssl processes=10 threads=1 python-path="/path_to_venv..."  

  WSGIProcessGroup  wsgi_app_ssl


We then have some scripts:

WSGIScriptAlias /service/review_v2              /path/doServiceRequest_review.wsgi

 WSGIScriptAlias /service/status_update_tasks_v2 /path/doServiceRequest_ctl_v2.wsgi

.....


Application handling is standard:


from webob import Request, Response

def __call__(self, environment, responseApplication):

        myRequest  = Request(environment)

 .....

The issue is that after a single request that uploads a file (held in a FieldStorage) is processed, the file descriptor is still open, but the file is deleted (as shown by lsof).

This file appears to be open in every httpd process (same filename).
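A minimal way to see what lsof reports, one process at a time, is to read the `/proc/<pid>/fd` symlinks, which the Linux kernel suffixes with ` (deleted)` when the underlying file has been unlinked. This sketch assumes Linux and a user allowed to inspect the target PID (the same user, or root); `deleted_open_files` is a hypothetical helper, not part of mod_wsgi or WebOb.

```python
import os

def deleted_open_files(pid="self"):
    """Return sorted (fd, target) pairs for descriptors of `pid` whose file
    has been unlinked. Linux-only: relies on /proc/<pid>/fd symlinks."""
    fd_dir = "/proc/%s/fd" % pid
    result = []
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        if target.endswith(" (deleted)"):
            result.append((int(name), target))
    return sorted(result)
```

For example, `deleted_open_files(38004)` would list the unlinked temporary files still held by process 38004.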

a) Is the apache configuration correct in this case?

b) Am I missing something here - i.e. is WebOB at fault here?

WebOb uses cgi, which has a cleanup __del__ that is supposed to close the file, but I have not debugged down that far....





Graham Dumpleton

Mar 2, 2021, 8:59:14 PM
to mod...@googlegroups.com

On 3 Mar 2021, at 12:39 pm, Ezra Peisach <ezra.p...@rcsb.org> wrote:


Ok, this will be complicated.
Recently moved to Python 3,
running mod_wsgi from Apache.

We needed to add:

WSGIApplicationGroup %{GLOBAL}

due to third-party code (numpy, lxml). This means that requests are served by the primary Python process.

No, that isn't what it means. Setting the application group forces which sub interpreter context within each process is used. In this case it sets it to the main or first interpreter context, which behaves like command line Python. There will still be a copy of this application (interpreter context) in all 10 of the processes in the daemon process group.
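One way to confirm this is a small diagnostic WSGI script that reports which process and interpreter context served each request. mod_wsgi adds `mod_wsgi.process_group` and `mod_wsgi.application_group` to the WSGI environ, and the application group name is the empty string for `%{GLOBAL}`. Repeated requests should then show several different PIDs, each with an empty application group. This is a debugging sketch, not part of the original setup.

```python
import os

def application(environ, start_response):
    # Keys set by mod_wsgi; the defaults only apply when run outside mod_wsgi.
    process_group = environ.get("mod_wsgi.process_group", "<not mod_wsgi>")
    # An empty string here means the main interpreter (%{GLOBAL}).
    application_group = environ.get("mod_wsgi.application_group", "<not mod_wsgi>")
    text = "pid=%d process_group=%r application_group=%r\n" % (
        os.getpid(), process_group, application_group)
    data = text.encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8"),
                              ("Content-Length", str(len(data)))])
    return [data]
```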

WSGIDaemonProcess wsgi_app_ssl processes=10 threads=1 python-path="/path_to_venv..."  

  WSGIProcessGroup  wsgi_app_ssl

We then have some scripts:

WSGIScriptAlias /service/review_v2              /path/doServiceRequest_review.wsgi

 WSGIScriptAlias /service/status_update_tasks_v2 /path/doServiceRequest_ctl_v2.wsgi

.....



This is where you may now have a problem, as setting the application group globally means both those WSGI applications now run in the same sub interpreter context of each process. If those WSGI applications are not compatible when run together, e.g., both try to use the same global data object of an imported module for different things, then you can get problems.
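The underlying reason is that within one interpreter, modules are cached in `sys.modules`, so every WSGI script that imports a module receives the same module object. The sketch below uses a hypothetical module `shared_settings` and hypothetical paths to show one application silently overwriting the other's module-level setting.

```python
import sys
import types

# Register a hypothetical module as if it had already been imported once.
sys.modules["shared_settings"] = types.ModuleType("shared_settings")

def load_review_app():
    import shared_settings  # resolved from sys.modules: the same object
    shared_settings.OUTPUT_DIR = "/data/review"  # hypothetical path
    return shared_settings

def load_ctl_app():
    import shared_settings
    shared_settings.OUTPUT_DIR = "/data/ctl"  # hypothetical path
    return shared_settings

review_mod = load_review_app()
ctl_mod = load_ctl_app()
```

After both scripts load, `review_mod is ctl_mod` is true and the review application now sees `/data/ctl`. Separate application groups or separate daemon process groups give each script its own `sys.modules`.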

Application handling is standard:


from webob import Request, Response

def __call__(self, environment, responseApplication):

        myRequest  = Request(environment)

 .....

The issue is that after a single request that uploads a file (held in a FieldStorage) is processed, the file descriptor is still open, but the file is deleted (as shown by lsof).

This file appears to be open in every httpd process (same filename).



That would only be the case if there had been multiple requests against the WSGI application.

As mentioned above, there is still a copy of the WSGI application in each process, and thus as each process handles a request, then that process would also end up opening the file.

a) Is the apache configuration correct in this case?



It is okay, but there is the concern over whether your multiple WSGI applications can now run together in the same sub interpreter context.

If both WSGI applications use numpy, you would have to use multiple daemon process groups and keep them separate.

  # Add this outside of VirtualHost to ensure only daemon mode used.

  WSGIRestrictEmbedded On

  # Two daemon process groups.

  WSGIDaemonProcess wsgi_app_ssl_1 processes=5 threads=1 python-path="/path_to_venv..."
  WSGIDaemonProcess wsgi_app_ssl_2 processes=5 threads=1 python-path="/path_to_venv..."

  # Force first into one daemon process group.

  WSGIScriptAlias /service/review_v2 /path/doServiceRequest_review.wsgi process-group=wsgi_app_ssl_1 application-group=%{GLOBAL}

  # And second into other daemon process group.

  WSGIScriptAlias /service/status_update_tasks_v2 /path/doServiceRequest_ctl_v2.wsgi process-group=wsgi_app_ssl_2 application-group=%{GLOBAL}

If one doesn't use numpy, then you can restrict which one has to run in the main interpreter context.

  # Add this outside of VirtualHost to ensure only daemon mode used.

  WSGIRestrictEmbedded On

  # Single daemon process group.

  WSGIDaemonProcess wsgi_app_ssl processes=10 threads=1 python-path="/path_to_venv..."  

  # Force one using numpy into main interpreter context.

  WSGIScriptAlias /service/review_v2 /path/doServiceRequest_review.wsgi process-group=wsgi_app_ssl application-group=%{GLOBAL}

  # For the second, the application group is not specified, meaning it will run in a named sub interpreter whose name is based on the host and URL mount point.

  WSGIScriptAlias /service/status_update_tasks_v2 /path/doServiceRequest_ctl_v2.wsgi process-group=wsgi_app_ssl

Note I am using options to WSGIScriptAlias to set process group and application group instead of the separate directives.

b) Am I missing something here - i.e. is WebOB at fault here?

WebOb uses cgi, which has a cleanup __del__ that is supposed to close the file, but I have not debugged down that far....



Relying on __del__ to clean up file descriptors can be bad, because if something holds the object in memory, it may only be cleaned up later when the garbage collector kicks in.
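A small illustration of that point, assuming CPython 3.4 or later: an object whose only cleanup is in `__del__` and that sits in a reference cycle is not finalized when the last outside reference goes away, only when the cyclic garbage collector runs. `Holder` is a hypothetical stand-in, not WebOb's or cgi's actual class, and the collector is paused only to make the timing deterministic.

```python
import gc
import tempfile

class Holder:
    """Hypothetical object that owns a temporary file and closes it only in __del__."""

    def __init__(self):
        self.file = tempfile.TemporaryFile()
        self.self_ref = self  # reference cycle: refcount never reaches zero

    def __del__(self):
        self.file.close()

def demo():
    """Return (closed right after del, closed after gc.collect())."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep the automatic collector from running mid-demo
    try:
        holder = Holder()
        f = holder.file
        del holder            # the cycle keeps it alive; __del__ has not run
        closed_before = f.closed
        gc.collect()          # the cycle collector finalizes it, running __del__
        closed_after = f.closed
    finally:
        if was_enabled:
            gc.enable()
    return closed_before, closed_after
```

`demo()` returns `(False, True)`. Closing the file explicitly at the end of the request, for example in a `try`/`finally` or `with` block, removes the dependence on when the collector happens to run.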

Anyway, hope that helps explain things.

Graham

Ezra Peisach

Mar 3, 2021, 6:11:08 AM
to mod...@googlegroups.com

Thank you for your response.

If both applications use numpy and lxml, is it safe to use the same global WSGIApplicationGroup, but use separate process groups for each application? The applications are related, but do not interact with each other, except through the database and filesystem.

I will try this.

Independently, for webob, I believe that if a file upload request is larger than 10Kb, it buffers to a temporary file, but never closes it at the end, relying on Python to clean up when the class scope is exited. That I can report independently.
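If the library does not close uploaded files itself, one option is to close them explicitly when the request finishes. PEP 3333 requires the server to call `close()` on the response iterable if it has one, so a middleware can hook cleanup there. This is a sketch: the environ key `example.cleanup`, `cleanup_middleware` and `ClosingIterable` are hypothetical names, not WebOb or mod_wsgi APIs.

```python
CLEANUP_KEY = "example.cleanup"  # hypothetical environ key

class ClosingIterable:
    """Wrap a WSGI response; close() runs registered cleanups after the
    server has finished sending the response."""

    def __init__(self, iterable, callbacks):
        self._iterable = iterable
        self._callbacks = callbacks

    def __iter__(self):
        return iter(self._iterable)

    def close(self):
        try:
            close = getattr(self._iterable, "close", None)
            if close is not None:
                close()
        finally:
            # Run cleanups in reverse registration order. A failing callback
            # stops the rest; a production version might log and continue.
            while self._callbacks:
                self._callbacks.pop()()

def cleanup_middleware(app):
    def wrapper(environ, start_response):
        callbacks = environ.setdefault(CLEANUP_KEY, [])
        try:
            result = app(environ, start_response)
        except BaseException:
            while callbacks:
                callbacks.pop()()
            raise
        return ClosingIterable(result, callbacks)
    return wrapper
```

Inside the application, something like `environ["example.cleanup"].append(myRequest.POST["upload"].file.close)` would register an upload for closing. The field name `upload` is hypothetical, and this assumes the WebOb upload object exposes its temporary file as `.file`.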

--
You received this message because you are subscribed to a topic in the Google Groups "modwsgi" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/modwsgi/rvOgQsj-kN0/unsubscribe.
To unsubscribe from this group and all its topics, send an email to modwsgi+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/modwsgi/9D3B14DB-861B-4B3E-AD3F-91E855A9F8D6%40gmail.com.

Graham Dumpleton

Mar 3, 2021, 6:15:03 AM
to mod...@googlegroups.com

On 3 Mar 2021, at 10:11 pm, Ezra Peisach <ezra.p...@rcsb.org> wrote:

Thank you for your response.

If both applications use numpy and lxml, is it safe to use the same global WSGIApplicationGroup, but use separate process groups for each application? The applications are related, but do not interact with each other, except through the database and filesystem.

That was the example I already provided. Eg.

  # Add this outside of VirtualHost to ensure only daemon mode used.

  WSGIRestrictEmbedded On

  # Two daemon process groups.

  WSGIDaemonProcess wsgi_app_ssl_1 processes=5 threads=1 python-path="/path_to_venv..."
  WSGIDaemonProcess wsgi_app_ssl_2 processes=5 threads=1 python-path="/path_to_venv..."

  # Force first into one daemon process group.

  WSGIScriptAlias /service/review_v2 /path/doServiceRequest_review.wsgi process-group=wsgi_app_ssl_1 application-group=%{GLOBAL}

  # And second into other daemon process group.

  WSGIScriptAlias /service/status_update_tasks_v2 /path/doServiceRequest_ctl_v2.wsgi process-group=wsgi_app_ssl_2 application-group=%{GLOBAL}

I am using the application-group and process-group options on WSGIScriptAlias, instead of WSGIProcessGroup/WSGIApplicationGroup, as the options are more precise and do the same thing. Using both options at the same time also has the side effect of preloading the WSGI script on process start, rather than on the first request, which can be beneficial in some cases.


Ezra Peisach

Mar 3, 2021, 6:46:08 AM
to mod...@googlegroups.com

Thank you for the followup.

With your suggestion (two process groups, application-group on WSGIScriptAlias), I am seeing the file descriptor open in what appears to be the parent process and two subprocesses.

I have a sneaky suspicion that webob is causing a resource leak. Without application-group specified, I see a similar pattern.

Reducing the file upload to under 10k reduces it to a single leak per process.

I will take an independent server - and reduce everything down to as minimal a test case as possible - and see if webob or my code is doing something odd with the Request. My reading of the code is that it should fall out of scope and cleanup.  Similar python class arrangement suggests that cleanups should be happening - but I need more testing.


from lsof:

httpd     38004         xdev   11u      REG              253,0     695699  151322661 /tmp/#151322661 (deleted)
httpd     38004         xdev   12u      REG              253,0     694951  151322663 /tmp/#151322663 (deleted)
httpd     38004 38254   xdev   11u      REG              253,0     695699  151322661 /tmp/#151322661 (deleted)
httpd     38004 38254   xdev   12u      REG              253,0     694951  151322663 /tmp/#151322663 (deleted)
httpd     38004 38255   xdev   11u      REG              253,0     695699  151322661 /tmp/#151322661 (deleted)
httpd     38004 38255   xdev   12u      REG              253,0     694951  151322663 /tmp/#151322663 (deleted)
httpd     38004 38256   xdev   11u      REG              253,0     695699  151322661 /tmp/#151322661 (deleted)
httpd     38004 38256   xdev   12u      REG              253,0     694951  151322663 /tmp/#151322663 (deleted)
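One way to read that output, assuming the column layout of recent lsof versions on Linux (`COMMAND PID TID USER FD TYPE DEVICE SIZE/OFF NODE NAME`, with `TID` filled in only on per-thread rows): group the rows by PID and collect the TIDs. `group_lsof_rows` is a hypothetical helper, and the TID interpretation is an assumption to check against the header line of your lsof.

```python
def group_lsof_rows(text):
    """Group lsof rows by PID, collecting thread IDs and file names.

    Assumes COMMAND PID [TID] USER FD TYPE DEVICE SIZE/OFF NODE NAME, where
    TID appears only on per-thread rows. A purely numeric USER column would
    be misread as a TID.
    """
    groups = {}
    for line in text.strip().splitlines():
        fields = line.split()
        pid = int(fields[1])
        entry = groups.setdefault(pid, {"tids": set(), "names": set()})
        if fields[2].isdigit():
            # Per-thread row: the third column is the thread ID.
            entry["tids"].add(int(fields[2]))
            name_start = 9
        else:
            name_start = 8
        # NAME may contain spaces, e.g. "/tmp/#151322661 (deleted)".
        entry["names"].add(" ".join(fields[name_start:]))
    return groups
```

Applied to the eight rows above, everything lands under PID 38004 with TIDs 38254, 38255 and 38256, which, if the TID assumption holds, would describe one process with several threads rather than separate processes.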

Ezra Peisach

Mar 3, 2021, 1:25:01 PM
to mod...@googlegroups.com

I have tracked down the issue in webob.

A reference assignment copy of a cgi FieldStorage within webob results in resources not being cleaned up.


A minimal test case is:

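  from webob import Request, Response

  # Hypothetical class name; the original shows only the __call__ method.
  class Application(object):

      def __call__(self, environment, responseApplication):
          """Request callable entry point."""
          myRequest = Request(environment)

          myResponse = Response()
          myResponse.status = '200 OK'
          myResponse.content_type = 'text/html'

          # Accessing params parses the request body; this assignment
          # alone is enough to trigger the leak described below.
          p = myRequest.params
          return myResponse(environment, responseApplication)

  # mod_wsgi looks for a module-level callable named "application".
  application = Application()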

results in the resource leak when running in daemon mode. If you remove the "p = ...." line, there is no resource leak.
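To reproduce this outside Apache, a harness along these lines could call the application directly with a multipart upload and compare the process's open descriptor count before and after, closing the response the way a WSGI server would. It is a Linux-only sketch (it counts entries in `/proc/self/fd`); `multipart_body`, `call_once` and the boundary string are hypothetical names chosen for illustration.

```python
import io
import os
from wsgiref.util import setup_testing_defaults

def open_fd_count():
    # Linux-only: one entry per open descriptor of this process.
    return len(os.listdir("/proc/self/fd"))

def multipart_body(field, filename, data, boundary="example-boundary-1234"):
    """Build a single-file multipart/form-data body and its Content-Type."""
    head = ('--%s\r\n'
            'Content-Disposition: form-data; name="%s"; filename="%s"\r\n'
            'Content-Type: application/octet-stream\r\n\r\n'
            % (boundary, field, filename)).encode("ascii")
    tail = ("\r\n--%s--\r\n" % boundary).encode("ascii")
    return head + data + tail, "multipart/form-data; boundary=%s" % boundary

def call_once(app, body, content_type):
    """Run one POST through a WSGI app; return the change in open descriptors."""
    environ = {}
    setup_testing_defaults(environ)
    environ.update({
        "REQUEST_METHOD": "POST",
        "CONTENT_TYPE": content_type,
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    })

    def start_response(status, headers, exc_info=None):
        return lambda data: None  # the legacy write() callable, unused here

    before = open_fd_count()
    result = app(environ, start_response)
    try:
        for _ in result:  # consume the body as a server would
            pass
    finally:
        close = getattr(result, "close", None)
        if close is not None:
            close()
    return open_fd_count() - before
```

For example, `call_once(app, *multipart_body("upload", "big.bin", b"x" * 20000))`, with `app` being the WSGI callable from the minimal test case, would show whether a descriptor survives the request; the field name and payload size are arbitrary.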

I will report to webob developers. I am just going to see if I can use mod_wsgi-express to reproduce the scenario.


Thank you Graham for explaining a better way to configure what I am trying to do.

Graham Dumpleton

Mar 3, 2021, 5:43:15 PM
to mod...@googlegroups.com

On 3 Mar 2021, at 10:46 pm, Ezra Peisach <ezra.p...@rcsb.org> wrote:

Thank you for the followup.

With your suggestion (two process groups, application-group on WSGIScriptAlias), I am seeing the file descriptor open in what appears to be the parent process and two subprocesses.



Add the "display-name" option to WSGIDaemonProcess so you can distinguish which processes are the mod_wsgi daemon processes. See:


When using Apache/mod_wsgi, no WSGI process gets forked. The only process that forks is the Apache parent process, and it doesn't have the WSGI application code loaded and no requests are handled in it.

So I'm not sure if lsof is confusing things by showing a notional separate process ID for each thread in the process, which Linux used to do under the covers (not sure if it still does). Or your WSGI application code may be doing something that causes forked processes to occur. If the latter, and that happens at a point before the file descriptor is cleaned up, and the forked process then exec's something else, the open file will still be marked against that forked subprocess.
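To check the first possibility on Linux, list `/proc/<pid>/task`: every entry there is a thread of that process, and the main thread's ID equals the PID. If the extra IDs that lsof shows appear under the daemon process's `task` directory, they are threads rather than forked children. `thread_ids` is a hypothetical helper.

```python
import os

def thread_ids(pid="self"):
    """Return the thread IDs of a process (Linux-only, via /proc)."""
    return sorted(int(tid) for tid in os.listdir("/proc/%s/task" % pid))
```

For example, `thread_ids(38004)` run as the same user (or root) lists every thread of that httpd process.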