Pyramid large downloads


Mike Orr

Sep 8, 2025, 5:52:38 PM
to pylons-...@googlegroups.com
I have a Pyramid site that needs to support downloading data on the
fly. The user will go to an incident page, press a button, and it will
pack the data into a temporary zip file and serve it. I've done
downloads before using FileResponse on a persistent file, or setting
the response headers (content-type and content-disposition) and
setting the body to generated CSV. But this time the zip file may be
inconveniently large to hold in memory (250 MB) or too large to hold
at all (2 GB), and I want the zip file and its temporary source
directory to be deleted at the end of the request.

My first thought is to open the file for reading, set
'response.body_file' to it, and delete the file, depending on Unix's
ability to delay deleting the file until all open filehandles are
closed. Is that likely to work through Waitress + Traefik (webserver)
+ load balancer? Or is there another way?
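
Roughly what I have in mind (an untested sketch; building the zip and
cleaning up the source directory are elided, and the filenames are
placeholders):

```
import os
import tempfile

from pyramid.response import Response


def archive(request):  # View callable.
    tmp_dir = tempfile.mkdtemp()
    zip_path = os.path.join(tmp_dir, "incident.zip")
    # ... build the zip file at zip_path ...
    f = open(zip_path, "rb")
    os.unlink(zip_path)  # Unix keeps the data until the open handle is closed.
    response = Response(content_type="application/zip")
    response.content_disposition = "attachment; filename=incident.zip"
    response.body_file = f
    return response
```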

Otherwise I could put it in a persistent directory of recent download
files, but that would be more complicated and I'd have to have a job
that prunes the directory, so I'd rather not pursue that route.

The application is still on Pyramid 1.3 and Python 3.10. It will be
upgraded in the next several months, but not in time for this feature.

--
Mike Orr <slugg...@gmail.com>

Eldav

Sep 8, 2025, 6:03:37 PM
to pylons-discuss
Hello Mike,


Laurent.

Theron Luhn

Sep 8, 2025, 6:11:06 PM
to pylons-...@googlegroups.com
Personally in cases like this I upload the file to S3 and then redirect the user to the signed URL. S3 lifecycle can be configured to delete the file after a day, which keeps storage costs minimal.
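
A sketch of that flow, assuming boto3 and a made-up bucket name (the
zip-building helper is hypothetical):

```
import boto3
from pyramid.httpexceptions import HTTPFound


def archive(request):
    zip_path = build_zip(request)  # hypothetical helper that writes the zip file
    key = "downloads/incident-10830.zip"
    s3 = boto3.client("s3")
    s3.upload_file(zip_path, "my-download-bucket", key)
    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": "my-download-bucket", "Key": key},
        ExpiresIn=3600,  # the signed URL stays valid for an hour
    )
    return HTTPFound(location=url)
```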

If you want to serve the file directly, I’d write to a TemporaryFile and then pass that in as body_file. TemporaryFile will take care of deleting the file once it’s closed, which will happen automatically once all references to it are dropped.
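
For example (a sketch; writing the zip data is left out):

```
import tempfile

from pyramid.response import Response


def archive(request):
    tmp = tempfile.TemporaryFile()
    # ... write the zip data to tmp ...
    tmp.seek(0)
    response = Response(content_type="application/zip")
    response.content_disposition = "attachment; filename=incident.zip"
    # The file is deleted automatically once it's closed / garbage-collected
    # after the response body has been streamed.
    response.body_file = tmp
    return response
```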

You could also use a finished callback to explicitly delete the file. https://docs.pylonsproject.org/projects/pyramid/en/1.3-branch/narr/hooks.html#using-finished-callbacks

Mike Orr

Sep 8, 2025, 6:31:42 PM
to pylons-...@googlegroups.com
Thanks for the ideas. They led me to another one that might work best:
'tempfile.TemporaryDirectory'. I'll try putting that in the request
object, and then having a finished callback that calls .cleanup() on
it to delete the directory. The temporary directory would contain both
the zip file and the temporary source directory the zip file is built
from. I'll let you know whether it works.

--
Mike Orr <slugg...@gmail.com>

Mike Orr

Sep 11, 2025, 10:02:37 AM
to pylons-...@googlegroups.com
The TemporaryDirectory approach worked fine. I tried generating an 800
MB zip file and serving it for download. The only noticeable things
were that it took a minute to run the zip subcommand on hundreds of
files, and that at the end of the download Firefox's progress meter
paused for several seconds before finishing. Both are due to the size
of the file. Here's an approximation of the Python code:

```
import os
import subprocess
import tempfile

import pyramid.response


def archive(request):  # View callable.
    tmp_dir = tempfile.TemporaryDirectory()
    request.add_finished_callback(cleanup_tmp_dir)  # Function defined below.
    request.tmp_dir = tmp_dir
    tpath = tmp_dir.name  # Path of the temporary directory.
    zip_filename = "incident-10830.zip"
    source_dirname = "incident-10830"
    zip_path = os.path.join(tpath, zip_filename)
    source_path = os.path.join(tpath, source_dirname)
    # ... Set up the source directory ...
    command = ["/usr/bin/env", "zip", "-r", zip_filename, source_dirname]
    # Must chdir to the temp directory to run 'zip' so the archive's
    # item paths come out right.
    olddir = os.getcwd()
    os.chdir(tpath)
    try:
        subprocess.run(command)
    finally:
        os.chdir(olddir)
    disposition = "attachment; filename=" + zip_filename
    response = pyramid.response.FileResponse(zip_path)
    response.headers["Content-Disposition"] = disposition
    return response


def cleanup_tmp_dir(request):
    request.tmp_dir.cleanup()  # Delete the temp dir and its contents:
                               # the zip file and its source directory.
```

I used the 'zip' command instead of Python's 'zipfile' module because
the existing script this view is replacing did, and I don't remember
why. I've used 'zipfile' somewhere in another site. The zip command
does give a nice progress report in the log as it adds each file,
showing how much it compressed each one.
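
For reference, here's a sketch of the same step using 'zipfile'
instead of the subprocess (untested; same paths as above):

```
import os
import zipfile


def build_zip(zip_path, source_path, source_dirname):
    # Walk the source directory and store each file under the
    # "incident-10830/..." prefix, mirroring what 'zip -r' produces.
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for dirpath, dirnames, filenames in os.walk(source_path):
            for filename in filenames:
                full = os.path.join(dirpath, filename)
                arcname = os.path.join(
                    source_dirname, os.path.relpath(full, source_path))
                zf.write(full, arcname)
```
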
--
Mike Orr <slugg...@gmail.com>

Jonathan Vanasco

Sep 13, 2025, 5:30:38 PM
to pylons-discuss
Theron Luhn:

> Personally in cases like this I upload the file to S3 and then redirect the user to the signed URL. S3 lifecycle can be configured to delete the file after a day, which keeps storage costs minimal.

I prefer to do this as well.  You can also use pre-signed URLs to ensure that only that user is able to download the file.

Mike Orr:


> Thanks for the ideas. They led me to another one that might work best: 'tempdir.TemporaryDirectory'.

I recommend against this approach, as it does not scale well.  The issue is that it ties up a dedicated Pyramid worker (thread or process) for the duration of the download.  IMHO, it's best to offload stuff like this onto something like Nginx/Apache/HAProxy, which can handle concurrency and streaming better, while freeing up the worker.  A common approach is to stash some metadata about the generated file in a database, then have a cronjob (or similar) read that database and delete expired files after 24 hours.
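
A sketch of the cleanup side, assuming a `downloads` table with `id`, `path`, and `created_at` columns (all of these names are made up):

    # prune_downloads.py -- run from cron; deletes expired generated files.
    import os
    import sqlite3
    import time

    MAX_AGE = 24 * 60 * 60  # seconds

    def prune(db_path="downloads.db"):
        conn = sqlite3.connect(db_path)
        cutoff = time.time() - MAX_AGE
        rows = conn.execute(
            "SELECT id, path FROM downloads WHERE created_at < ?", (cutoff,))
        for row_id, path in rows.fetchall():
            if os.path.exists(path):
                os.remove(path)
            conn.execute("DELETE FROM downloads WHERE id = ?", (row_id,))
        conn.commit()

    if __name__ == "__main__":
        prune()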

If you do have to do this within Pyramid, I suggest creating a dedicated Pyramid App that only serves the relevant views.  That will prevent these views from tying up workers that serve your main app.

Instead of building a stripped-down application that only pulls in the required libraries, you can shard your routes and views.  The technique I use is to put these views in a dedicated folder, like "views_download", and use config file variables like `enable_views_public` and `enable_views_download` to enable/disable them on an instance.  Then your `routes.py` file looks like:

    if enable_views_download:
        config.scan("myapp.views_download")
    if enable_views_public:
        config.scan("myapp.views")

A caveat to this strategy is that you need to run each process on its own port or socket.

With this strategy, any issues with slow clients or too many requests against the download routes will be limited to the download Pyramid application, and won't tie up your normal app.

Mike Orr

Sep 13, 2025, 6:37:07 PM
to pylons-...@googlegroups.com
On Sat, Sep 13, 2025 at 2:30 PM Jonathan Vanasco <jvan...@gmail.com> wrote:
> I recommend against this approach, as it does not scale well. The issue is this requires a dedicated Pyramid worker (thread or process). IMHO, it's best to offload stuff like this onto something like Nginx/Apache/HAProxy that can handle concurrency and streaming better, while freeing up the worker. A common approach is to store some metadata about the generated file and stash it in a database, then have a cronjob (or similar) read that database and delete expired files after 24 hours.

Good to know, but this is a low-volume site with a restricted
userbase. The feature is currently used only a few times a year, and I
could see that going to a couple times a week at most. But while it's
used infrequently, some of its use cases are essential. It replaces an
offline script that will be harder to support in the new IT
infrastructure. And I needed to implement it quickly to fold it into a
release this month.

> > Personally in cases like this I upload the file to S3 and then redirect the user to the signed URL. S3 lifecycle can be configured to delete the file after a day, which keeps storage costs minimal.

I made a prototype of that for another feature, serving attachment
files interactively in the site. But one of the sysadmins recommended
against this approach, saying that using S3 to serve files directly
to users was prone to delays.

Jonathan Vanasco

Sep 13, 2025, 8:43:53 PM
to pylons-discuss
> The feature is currently used only a few times a year, and I could see that going to a couple times a week at most

A common problem is that if someone hits reload, or has an internet connectivity issue, you end up with one worker stalled and a second worker handling the retry.  You can very easily grind a site to a halt with this.  I learned this the hard way.  I now shard all admin routes for apps using the technique mentioned above, so the expensive operations of a few users do not affect the performance of the public routes.  I sometimes layer in S3 as well.

I've never had those issues with S3.  