Intermittently Slow Flask app

Gary Conley

Dec 7, 2020, 5:31:04 PM
to modwsgi
I have a Flask app running with mod_wsgi version 4.7.1 on CentOS 7 (httpd), with Python 3.6.

The server is brand new: 64 GB RAM, dual 3 GHz Xeons, all SSD drives, 8 Gb/s fiber to a StorNext SAN. All very fast hardware.

I have a problem where the app slows down dramatically under certain circumstances, and I have had no success to date in finding the cause.

The app itself is fairly simple. It processes images from a directory and uploads them to a DAM (PHP-based, under NGINX) running on another server. The processing consists of generating preview and thumbnail images from the original images (mostly raw file formats - NEF, CR2, etc.) using ImageMagick, ufraw and exiftool, extracting XMP data using exiftool, and uploading these elements to the DAM through the DAM's API.

The processing is multithreaded using Python queues and workers, nothing fancy.

The app gathers data about the images from a MySQL database and persists transactional data to MySQL for workflow-management purposes.

There is a simple HTML template with controls for selecting which directories to upload; it provides the user with feedback on progress, errors and so on using AJAX calls. The user can also abort the upload process.

Under normal circumstances the app will process 2 images per second. There have been instances, however, where the app slows way down, taking 7-10 seconds to upload a single image, a factor of 15 to 20 times slower.

I have run metrics on the various steps of the upload procedure and it appears that every aspect of the app slows down. Generating a preview image, which normally takes less than a second, takes 40 seconds; extracting metadata with exiftool, typically less than half a second, takes 7-10 seconds. Database response seems to remain constant. Upload to the DAM also takes much longer.

When the slowdown occurs, requests from the browser to the app time out. Aborting the upload procedure becomes impossible, and the only way to stop it is to stop Apache (httpd) directly on the server.

We have checked CPU usage (less than 10%), memory usage (less than 8 GB on a 64 GB machine) and also checked I/O, confirming that we were able to read and write at up to 9.5 Gb/s while the app was running at a snail's pace.

The only thing we have been able to isolate as having any effect on the upload speed is the size of the upload queue. When a user selects a directory to be uploaded, the files in that directory (and all sub-directories) are sorted by size and put into a Python queue, from which they are then uploaded with between 10 and 20 threads (user-configurable). We have tested with queues of up to 10,000 files with no issue at all. We had a slowdown with a queue that was over 35,000 images.
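For illustration, the queueing step amounts to something like this minimal sketch (names are hypothetical, not the actual code):

    import os
    from queue import Queue

    def build_upload_queue(directory):
        # Gather every file under the directory tree, sort by size, and queue them.
        paths = []
        for root, _, files in os.walk(directory):
            paths.extend(os.path.join(root, f) for f in files)
        paths.sort(key=os.path.getsize)
        q = Queue()
        for p in paths:
            q.put(p)
        return q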

The content of the images makes little to no difference in speed. Processing huge image files (such as 5 GB PSB files) does slow down the upload process, but only to 1.5 seconds per file. The last instance of slowdown occurred on 5 MB JPEGs. The speed on 60 MB NEFs and 200 KB JPEGs is virtually the same under normal circumstances.

We suspect a memory issue, but don't see any increase in memory usage using htop, top or glances.

We restarted httpd in the hopes it would clear up any memory leak, with no improvement. We even rebooted the machine with similar hopes, again with no improvement.

We tried changing the number of threads in our WSGI config (from 5 to 15) and also changed the number of processes to 2. We even tried setting threads to 1, which had disastrous effects. None of this made any improvement. We put our settings back to 1 process and 5 threads.

Any clues on what could be causing this slowdown, or ideas on how to isolate the cause, would be much appreciated. We've spent days trying to track it down, with the only solution being to break up our jobs into smaller chunks, which is far from optimal for our workflow as it has to be done manually due to the nature of the content.

We can send files if needed, but are not sure what to send.

Thank you.

Gary

Graham Dumpleton

Dec 8, 2020, 1:48:03 AM
to mod...@googlegroups.com
Is the image processing being done within the web application processes, or in a separate set of processes which operate based only on seeing what has been put in the upload directory used to queue up images?

Just trying to understand better how the work is broken up.

Gary Conley

Dec 8, 2020, 11:17:35 AM
to modwsgi
Hi Graham,

Thanks for the reply.

All processing is done within the web application processes, if I understand your question correctly.

This is my first web app, I've done all my prior development as PySide desktop applications and actually migrated this app from a desktop app.

From the upload.html template the user sees a list of directories that can be uploaded. These are all in a central "in box" directory which is updated every 10 seconds. The user selects directories and clicks "Upload Selected", which sends a request containing the selected paths to a '_launch_upload' route in the Flask app.py.

In my Flask app.py the '_launch_upload' route adds the selected paths to a Python queue. I then start a new Python Thread targeting a _start_queue method. _start_queue in turn takes each path, instantiates a "loader" class which subclasses Thread, and then calls the run() method on the loader object, which performs the actual upload of that directory.

The loader object puts the path for each image into a queue (self.q) from which they are processed in parallel up to 24 at a time (user configurable). This is done using a queue/worker configuration as in:

for _ in range(threadcount):
    t = Thread(target=self.worker)
    t.daemon = True
    t.start()
self.q.join()

The worker method is where all the image processing and the upload to the DAM occur.

The workers take images off the self.q queue and process them until the queue is empty.

All image processing is done using subprocess.run calling the appropriate app.
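As a rough sketch of what such a worker looks like (the exiftool arguments and the timeout are illustrative assumptions, not the real code):

    import subprocess
    from queue import Empty

    def worker(self):
        # Pull image paths off self.q until it is drained.
        while True:
            try:
                path = self.q.get_nowait()
            except Empty:
                return
            try:
                # External tool invocation; the arguments shown are only an example.
                subprocess.run(["exiftool", "-json", path],
                               stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                               timeout=60)
            finally:
                self.q.task_done()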

When self.q.join() returns, the first directory has been fully processed and the run method of the loader object returns. This returns control to the start_upload method in app.py, which calls run on the next loader object, and so on until all directories are processed.

I may have implied that the app works on a watch folder basis but this is not the case. It is entirely based on the user selection, for various reasons.

I also realized I failed to mention that we are running in wsgi daemon mode.

As a final note, there is an abort_upload route which can access the loader objects and call a stop_upload method on the object; this sets a keep_loading flag to False, which stops all processing on the loader object. It also empties the queue of any unprocessed directories. For some reason this method has also stopped working. If I call it within the first few seconds of calling launch_upload it works fine, but if I let it go for a bit it no longer has any effect. This had been working well and has since stopped, yet I didn't change anything in this part of the code. It seems to be related to this other problem with the slow uploads, but I can't be 100% certain.
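A rough sketch of that stop-flag pattern, using threading.Event (names are illustrative, not the actual code):

    import threading

    class Loader(threading.Thread):
        # Illustrative stand-in for the loader class described above.
        def __init__(self, paths):
            super().__init__(daemon=True)
            self._stop_event = threading.Event()
            self.paths = paths

        def stop_upload(self):
            # Called from the abort_upload route.
            self._stop_event.set()

        def run(self):
            for path in self.paths:
                if self._stop_event.is_set():
                    break
                print("processing", path)  # placeholder for the real per-image work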

I hope that is all clear.

Let me know if any logs or code would be helpful.

I really appreciate your help. I'm somewhat new to web development as I said, but learning fast!

Best,

Gary

Graham Dumpleton

Dec 8, 2020, 7:02:37 PM
to mod...@googlegroups.com
Performing a lot of image processing in the web application processes is generally a bad idea. Usually you would do this using a backend task queuing system like Celery.

The main reason it is a bad idea is that Python does not perform very well when you have multiple threads and they are heavily CPU bound. This is because the Python global interpreter lock (GIL) will result in Python application code effectively being serialised even though you have multiple threads. So for CPU-bound work, multithreading is a convenience, but not a performant solution.
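As a minimal sketch of how the offloading might look with Celery (the broker URL and task name are assumptions, not a recommendation of specifics):

    # tasks.py
    from celery import Celery

    app = Celery("tasks", broker="redis://localhost:6379/0")

    @app.task
    def process_image(path):
        # Heavy work (ImageMagick, ufraw, exiftool via subprocess) runs in a
        # separate worker process, outside the mod_wsgi request handlers.
        ...

    # In the Flask route you would only enqueue the work:
    #     process_image.delay("/path/to/image.NEF")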

The first thing I would do is confirm how many threads you actually have running in the process, and what they are doing, when the process slows down. For this you can employ the code at:


You will need to update the example code to Python 3, as it is still Python 2.

With that code in place, you can trigger a dump of what all the threads are up to.
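A rough Python 3 equivalent of that dump uses sys._current_frames(); how you trigger it (a signal handler, a debug-only route, etc.) is up to you:

    import sys
    import threading
    import traceback

    def dump_threads(output=sys.stderr):
        # Print a stack trace for every thread currently running in this process.
        names = {t.ident: t.name for t in threading.enumerate()}
        for ident, frame in sys._current_frames().items():
            print("\n--- Thread %s (%s) ---" % (names.get(ident, "unknown"), ident),
                  file=output)
            traceback.print_stack(frame, file=output)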

Look for threads being stuck in code which may not be performant in handling large directory listings or general image processing.

Also look for more threads than expected, perhaps because where you are starting worker threads is getting executed more than once for some reason.

Graham

Gary Conley

Dec 8, 2020, 8:00:16 PM
to modwsgi
Thanks Graham,

I'll do as you suggest and see what we get.

I suspect I have some sort of issue with large directories. As a workaround I've been breaking the directories down into 4,000 images at a time and the performance is acceptable. So, while image processing in the web process may not be a great idea, it is working well for me provided I don't have huge directories. I had one as large as 10,000 that also ran fine, but 30,000+ was a total bust, with performance rapidly going from 2 images per second to 7 seconds per image. With 4,000 images in a directory I get consistent performance of 1-2 images per second.

Not sure if that helps narrow down the investigation, but thought I'd mention it.

Gary

Rory Campbell-Lange

Dec 9, 2020, 2:39:46 AM
to mod...@googlegroups.com
On 08/12/20, Gary Conley (ga...@goldeneraproductions.org) wrote:
> I suspect I have some sort of issue with large directories. As a workaround
> I've been breaking the directories down into 4000 images at a time and the
> performance is acceptable. So, while image processing may not be a great
> idea, it is working well for me provided I don't have huge directories. I
> had one as large as 10,000 that also ran fine, but 30,000+ was a total bust
> with performance rapidly going from 2 images per second to 7 seconds per
> image. With 4000 images in a directory I get consistent performance of 1-2
> images per second.

Off topic, but I suggest not having more than 1,000 files per directory
if you can manage it, as running "ls" against a directory with more
images than that on cloud storage or indifferent storage backends will
cause a noticeable lag.

A common scheme is to work out how many images you might receive within
a peak period. If, for instance, you never receive more than 1,000 images
in an hour, it is worth considering an hourly, date-based subdirectory
structure, for example:

./images
    2020120811/
        2020120811-01.jpg
        2020120811-02.jpg
    2020120812/
        2020120812-01.jpg
        2020120812-02.jpg
    2020120823/
        2020120823-01.jpg
    2020120907/
        2020120907-01.jpg
        2020120907-02.jpg
        2020120907-03.jpg

etc.

Other directory structures based on EXIF data, image type, natural image
naming structures and so on can work too. Also, if the location of each
image is in a database you can avoid doing a directory scan, since you
*know* where each image is. Even so, the subdirectory approach is a good
idea for maintenance and backup purposes.
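For instance, a small sketch of computing an hourly bucket like the ones
above (the function name is illustrative):

    import os
    from datetime import datetime

    def bucket_path(base, filename):
        # Place a file in an hourly subdirectory such as ./images/2020120811/
        bucket = datetime.now().strftime("%Y%m%d%H")
        target = os.path.join(base, bucket)
        os.makedirs(target, exist_ok=True)
        return os.path.join(target, filename)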

Rory


Rory Campbell-Lange

Dec 9, 2020, 2:45:19 AM
to mod...@googlegroups.com
On 09/12/20, Rory Campbell-Lange (ro...@campbell-lange.net) wrote:
> On 08/12/20, Gary Conley (ga...@goldeneraproductions.org) wrote:
> > I suspect I have some sort of issue with large directories. As a workaround
> > I've been breaking the directories down into 4000 images at a time and the
> > performance is acceptable. So, while image processing may not be a great
> > idea, it is working well for me provided I don't have huge directories. I
> > had one as large as 10,000 that also ran fine, but 30,000+ was a total bust
> > with performance rapidly going from 2 images per second to 7 seconds per
> > image. With 4000 images in a directory I get consistent performance of 1-2
> > images per second.
>
> Off topic, but I suggest not having more than 1,000 files per directory
> if you can manage it, as running "ls" against a directory with more
> images than that on cloud storage or indifferent storage backends will
> cause a noticeable lag.

Torek's answer on Stack Overflow suggests that git restricts the number
of files in a directory to 6700 by default

https://stackoverflow.com/a/18732276

Gary Conley

Dec 16, 2020, 2:43:11 PM
to modwsgi
Thanks to Rory and Graham for their suggestions.

For various reasons I haven't had the time to reconfigure the app to output the stack traces, but I have been breaking the directories down into smaller chunks and the app has been running at an acceptable speed. The large directories seem to be the main issue that was killing performance. It is still about half the speed I would expect, and have consistently achieved at times.

I looked into implementing Celery for image processing, but it seems this would require considerable refactoring of my code, which is rather complex and which I'd rather not change.

But there is another, more fundamental issue.

I have configured the app to run as one daemon process, by not setting the processes option in the WSGIDaemonProcess directive, but it seems the app is somehow running with more than one process.

The indication that this is the case is that two routes which act on global variables in my main app.py file intermittently return inconsistent responses.

For example, I have an "abort_upload" route which empties a Python queue of active upload jobs. This queue is a global variable in my main app.py Flask file. Most of the time calling this route has no effect: the app continues to process the queue, and the response I get in the browser indicates there is nothing in the queue. But the app just keeps running for hours and hours, which can only occur if there is work in the queue. At one point I tried calling the abort_upload route continuously and it eventually aborted the queue and responded with the number of items in the queue that I expected.

I also have an ajax call which gets log data from the app every 2 seconds. Typically this works fine, but intermittently it will return log data that is half a day old, then continues to return the current log information.

Graham suggested that there might be a situation where multiple threads are being executed for the same task. At one point we had configured the WSGIDaemonProcess directive for multiple processes and one thread, based on a post on Stack Overflow, which was a disaster as the app then began processing the same image in many concurrent threads!

I have this exact same app running on a dev site with none of these issues. Both are running CentOS 7 and the same versions of Apache and mod_wsgi, and the Apache and WSGI configs look the same. But the production site does appear to be running more than one process.

I have no idea how to determine that there are in fact multiple processes running or how to ensure that there is only one running in the first place. My understanding is that the WSGIDaemonProcess directive should be all I need to ensure a single process for my app and thus ensure only one instance of the global variables.

Thanks in advance for taking the time to go through this and help me out.

Gary

Graham Dumpleton

Dec 16, 2020, 3:43:03 PM
to mod...@googlegroups.com
Can you provide the current configuration you are using for mod_wsgi? Ensure you include the context, i.e., everything in the VirtualHost (except SSL stuff, if worried about that being seen), plus mod_wsgi-specific directives from outside of the VirtualHost as well, e.g., WSGIRestrictEmbedded.

If I can see the configuration I can check whether it is doing what you think it is.

Graham

Gary Conley

Dec 16, 2020, 6:53:04 PM
to modwsgi
Hi Graham

Thanks for the rapid reply.

You will have to bear with me a bit as this is my first Flask app... I've attached what I think you are asking for, plus a log file and my main app.py file. Probably a bit rough around the edges in places. The file attached is a zip file.

This is the only app running under httpd on this server, by the way. And in checking the logs I noticed I have 4 PIDs, tending to indicate I have 4 processes running. You'll probably see from my app that it really should only have one process running, as I'm only trying to do one thing (process images for upload to a DAM), and there is only one user doing this, me.

You will see the entries in the log file from yesterday morning showing an abort being requested. I was the only user and had only one page open to run the app. You will see that at 9:47 am the app reports there are 0 jobs running, and then 5 minutes later reports 1 job running and 3 in the queue. That is accessing two global variables in app.py, phrasea_upload_runnables and upload_queue. What makes no sense at all is that both calls, 5 minutes apart, should have been addressing the same variables with the same values. The PIDs on both those calls are the same.

One other thing I should mention. When we first ran into the performance issues caused by the huge directories, we tried running the app under nginx, which we never got working. nginx is installed but not running on the server. I included my nginx ini file which specifies 4 processes.

Thanks for taking the time to look this over.

Best regards,

Gary

[Attachment: phrasea_upload]

Graham Dumpleton

Dec 16, 2020, 7:45:35 PM
to mod...@googlegroups.com
You have an incorrect path in the configuration, meaning you are actually running embedded mode and not daemon mode. This means you are subject to Apache dynamic process management, and thus you can end up with more than one process.

Your config is:

Listen 82

#/etc/apache2/sites-available/phrasea_upload_flask.conf
<VirtualHost *:82>
   ServerName phrasea_upload
   ServerAlias phrasea_upload.avlib.net

   WSGIDaemonProcess phrasea_uploadapp user=apache group=apache threads=15 home=/var/www/upload-flask
   WSGIScriptAlias / /var/www/upload-flask/phrasea_upload_flask.wsgi

   Alias /static /var/www/upload-flask/static
   Alias /templates /var/www/upload-flask/templates

   <Directory "/var/www/phrasea_upload_flask">
       WSGIProcessGroup phrasea_uploadapp
       Require all granted
       WSGIScriptReloading On
   </Directory>

   LogLevel info
   ErrorLog /etc/httpd/logs/phrasea_upload_flask.log
</VirtualHost>

The path which is wrong is:

    /var/www/phrasea_upload_flask

given in the Directory directive. It doesn't actually match the path for the WSGI script file, meaning that the WSGIProcessGroup directive was ignored.

Change this to:

Listen 82

# Ensure that mod_wsgi embedded mode is disabled so don't accidentally run stuff in embedded mode.
WSGIRestrictEmbedded On

#/etc/apache2/sites-available/phrasea_upload_flask.conf
<VirtualHost *:82>
   ServerName phrasea_upload
   ServerAlias phrasea_upload.avlib.net

   WSGIDaemonProcess phrasea_uploadapp user=apache group=apache threads=15 home=/var/www/upload-flask
   # Set the daemon mode process group and application interpreter context here explicitly.
   WSGIScriptAlias / /var/www/upload-flask/phrasea_upload_flask.wsgi process-group=phrasea_uploadapp application-group=%{GLOBAL}

   Alias /static /var/www/upload-flask/static
   Alias /templates /var/www/upload-flask/templates

   <Directory "/var/www/upload-flask">
       Require all granted
   </Directory>

   LogLevel info
   ErrorLog /etc/httpd/logs/phrasea_upload_flask.log
</VirtualHost>

Rather than use WSGIProcessGroup, I have set process-group on WSGIScriptAlias instead. I have also set application-group to use the main interpreter context and not a sub-interpreter. This can avoid problems with third-party Python modules that don't work properly in sub-interpreters.

You also didn't need WSGIScriptReloading, as that is the default for daemon mode.

That the path didn't match meant that "Require all granted" wasn't being applied either. That it worked without that being applied means you are likely on one of the Linux distributions which somehow break Apache access controls and set access for the whole filesystem or URL namespace at a higher scope somewhere.

Since you have "info" for LogLevel, with daemon mode now being properly applied you should clearly see the WSGI script file being loaded in the daemon mode process. Right now you have:

[Sun Dec 13 03:29:39.291197 2020] [wsgi:info] [pid 16283:tid 139637154567936] [client 10.12.17.31:41458] mod_wsgi (pid=16283, process='', application='phrasea_upload|'): Loading Python script file '/var/www/upload-flask/phrasea_upload_flask.wsgi'.

See how "process" is an empty string. This means that it was using embedded mode. With the fixed config, "process" should show "phrasea_uploadapp" and "application" should be an empty string, the latter indicating the main interpreter context rather than a sub-interpreter.

Graham

Gary Conley

Dec 16, 2020, 11:31:55 PM
to modwsgi
Graham,

It is people like you that make the open-source world go round. Thank you for pointing out my errors and solving them at the same time.

I implemented the new config and the app is running as expected in terms of functionality, which is great.

I still have a bit of an issue with speed, but nothing near as bad as it was before. I have at times seen performance in the sub-0.5-second-per-file range. Same hardware, same app, same code. And it would run for hours at that speed, but gradually slow down. I am now seeing performance in the 0.8 to 1 second per file range. I do understand the issues with Python multithreading and have successfully implemented multiprocessing in other (albeit simpler) standalone applications. But it seems odd that I would have had better performance under the same conditions and then see a gradual slowdown. Rebooting the server made no difference. If it had been a memory leak or something of that nature, one would think rebooting would make a difference.

If it is a simple matter to implement Celery in my app then I may go that route. Unfortunately I'm not familiar enough with it yet to make an accurate assessment. It may not be worth the effort. My app is working as it is and the performance may be adequate. If I could achieve consistent 0.5 seconds per file it would be nice and would actually utilize my hardware, which is basically loafing along right now. The CPU never gets above about 30%, and there are other services running on this machine! So there's plenty of performance to be gained... And there is a phenomenon where it seems to slow down over time.

But I'm now into straight python and off topic for this mailing list.

If you have any ideas I welcome them. But once again thanks for your help. Really appreciated.

Best,

Gary

Graham Dumpleton

Dec 17, 2020, 12:09:02 AM
to mod...@googlegroups.com
I'll go back over and read the email again when I have a chance and see if there is anything else I can suggest, but right now here are a couple of checks you can make which don't require changing any code.

First up, if you didn't feel comfortable about adding that code I linked before for dumping stack traces, at least do an external check for how many threads are running in the process just to make sure it looks right.

For threads=15 you should, from memory, see about 18 threads used by the mod_wsgi daemon process.

There is the main process thread, 15 request handler threads and a number of background threads used by mod_wsgi for checking liveness and handling process shutdown. I can't remember offhand the exact number of the latter, so it may actually be 19, but it also depends on the configuration.

On top of that you will have your background threads. So just check occasionally to make sure the number is what you expect on top of those used by mod_wsgi.

Creating background threads from requests like you are is not usually a good idea. The way you are creating them doesn't look to be thread safe itself, so technically, if multiple requests hit at the same time initially, I think you could end up creating more than one background thread. It really needs some locking around thread creation. I will look at what you are doing and comment more on that later.
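A minimal sketch of guarding that start-up with a lock (names are illustrative):

    import threading

    _background_lock = threading.Lock()
    _background_started = False  # guarded by _background_lock

    def ensure_background_thread(target):
        # Start the background thread at most once, even if several requests
        # arrive at the same time.
        global _background_started
        with _background_lock:
            if not _background_started:
                threading.Thread(target=target, daemon=True).start()
                _background_started = True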

Anyway, you can check the number of threads in a process using 'ps' or by looking in /proc for the process.
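For example, on Linux the thread count can be read from the process status file (a sketch, not production code):

    def thread_count(pid):
        # Read the Threads: field from /proc/<pid>/status (Linux only).
        with open("/proc/%d/status" % pid) as f:
            for line in f:
                if line.startswith("Threads:"):
                    return int(line.split()[1])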


The other thing you might also check is how many temporary files are under /tmp and /var/tmp. This is just a hunch. Some operations create temp files as a side effect, and my thinking here is that maybe they are growing in number over time and not getting cleaned up properly. So I am wondering whether this may be inadvertently slowing the machine down when it does things in the tmp directories. Although /tmp is usually cleared on reboot, the /var/tmp directory usually isn't.

Graham

Graham Dumpleton

Dec 17, 2020, 12:22:30 AM
to mod...@googlegroups.com
BTW, this looks like it could be short-circuited.

    for loaddir in loaddirs:
        if os.path.isdir(loaddir):
            # filecheck = glob(loaddir + '/**/*.*', recursive=True)
            filecheck = []
            for root, _, files in os.walk(loaddir):
                for f in files:
                    filecheck.append(os.path.join(root, f))
            if filecheck:
                # searchpath, startdir, phraseaserver, auth, refspath, jsondir, threadcount, db
                return PhraseaLoader(upload_dir, to_phrasea_dir, phrasea_server, auth, refspath, json_meta_path, threadcount, db, make_previews, debug_mode)
            else:
                print('Empty folder. Nothing to do here. Load directory: {}'.format(loaddir))
                return None

What I mean by that is that since the contents of the filecheck list are never actually used, you don't need to completely populate it. Return as soon as you find an entry.

So you could instead just do the following, which is to return as soon as you find any directory which has a non-empty files list.

    for loaddir in loaddirs:
        if os.path.isdir(loaddir):
            for root, _, files in os.walk(loaddir):
                if files:
                    return PhraseaLoader(upload_dir, to_phrasea_dir, phrasea_server, auth, refspath, json_meta_path, threadcount, db, make_previews, debug_mode)

            print('Empty folder. Nothing to do here. Load directory: {}'.format(loaddir))
            return None

The logic also doesn't seem to be quite right, as it appears to bail out as soon as it finds an empty directory tree, rather than going on to the next one in loaddirs. Should that "return None" really be there? Or is it meant to stop when it finds the first empty directory tree?

Graham

Gary Conley

Dec 17, 2020, 12:59:06 PM
to modwsgi
Thanks very much Graham,

You are spot on. The logic is not correct. I should actually return a list and then iterate through the list in _start_queue. I've modified the code. There is a large job running that I need to let finish and then I'll try it out. This should provide a slight performance improvement.

I'm working on some debug code that will let me see how much each step of the image processing is slowing down. For each image I upload the original image to the DAM using an API call, generate a preview and a thumbnail, calculate a SHA-256 hash, extract metadata with exiftool, upload the preview and thumbnail to the DAM (also with API calls) and record all these transactions in a MySQL database. And there are several other steps involved to qualify the images before upload. So any one of these could be where I'm losing performance. Once I have a grip on that I'll have a better idea where to go.
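For example, a rough sketch of per-step timing with time.perf_counter() (the step labels and step functions are placeholders):

    import time
    from contextlib import contextmanager

    @contextmanager
    def timed(label, timings):
        # Accumulate elapsed seconds per processing step into a dict.
        start = time.perf_counter()
        try:
            yield
        finally:
            timings[label] = timings.get(label, 0.0) + time.perf_counter() - start

    # usage, with hypothetical step functions:
    #     timings = {}
    #     with timed("preview", timings):
    #         make_preview(path)
    #     with timed("exiftool", timings):
    #         extract_metadata(path)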

But again, the big mystery is that I have had twice the performance I'm getting now, somewhat out of the blue without any obvious change. And so we investigate!!

I'll check into your other suggestions to see where that leads me.

Thanks again.

Gary Conley

Dec 20, 2020, 6:59:54 PM
to modwsgi
Hi Graham,

I finally figured out what was causing my slowdowns. It had nothing to do with my Flask app or mod_wsgi but was related to the DAM and how it handles file uploads. That's all PHP code, which is now fixed, and the performance is back to what I was expecting to begin with. Better, actually.

Your help in getting my Flask app running properly in daemon mode enabled me to track this down, so thank you for that. And I do see that I have a thread-safety issue and will dig in and get that sorted out.

Best,

Gary