Serving very large pdf files with django


Gary Roach

Sep 3, 2016, 1:12:04 PM
to django-users
Hi all,

I am working on a project where I need to serve up large (100-150 MB)
static PDF files for viewing. The PDF files are JPG photos of pages from
old log books. Downloading them onto the user's system is out of the
question because of their size. In addition, the user may only need to
see one or two pages of a document. I plan on putting the static files
in their own directory, outside of the Django code and the PostgreSQL
database. I have looked at several add-ons for Django but can't seem to
get a "feel" for the whole problem. Has anyone had a similar problem,
and do they have a good solution?

All help will be sincerely appreciated.

OS Debian Linux
Desktop KDE4
Python 3.5
Django 1.9.x

Gary R.

ludovic coues

Sep 3, 2016, 6:26:44 PM
to django...@googlegroups.com
You have a few options. You have already excluded the user downloading
the full document. You can display the document directly in your page
with something like pdfjs, or you can extract the individual images
from the document on your server and display the images directly.

If you show the PDF, I fear the user's browser will have to download
the whole document anyway, which is the same problem as downloading
the full document. A way around that would be to split the document
into individual pages.
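
For the "extract the individual images" route, here is a minimal sketch. It assumes the pdfimages tool from poppler-utils is installed, which nobody in this thread has confirmed; the function names and paths are made up for illustration. Since the PDFs are just JPG photos, pdfimages with -j should write the embedded JPEGs back out without re-encoding them.

```python
import subprocess


def build_pdfimages_command(pdf_path, output_prefix, first_page=None, last_page=None):
    """Build the argument list for poppler's pdfimages.

    -j saves JPEG-encoded images as .jpg files; -f and -l limit the
    page range (page numbers are 1-based).
    """
    command = ["pdfimages", "-j"]
    if first_page is not None:
        command += ["-f", str(first_page)]
    if last_page is not None:
        command += ["-l", str(last_page)]
    command += [pdf_path, output_prefix]
    return command


def extract_images(pdf_path, output_prefix, first_page=None, last_page=None):
    # No shell is involved, so odd characters in file names are harmless.
    # Raises CalledProcessError if pdfimages exits with a non-zero status.
    subprocess.check_call(
        build_pdfimages_command(pdf_path, output_prefix, first_page, last_page)
    )
```

Calling extract_images("logbook.pdf", "pages/logbook") would write files named after the prefix into the (hypothetical, already existing) pages directory. This would be done once per document on the server, not per request.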
> --
> You received this message because you are subscribed to the Google Groups
> "Django users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to django-users...@googlegroups.com.
> To post to this group, send email to django...@googlegroups.com.
> Visit this group at https://groups.google.com/group/django-users.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/django-users/b5e154b5-3c01-6f39-f4a1-6e285093aa45%40verizon.net.
> For more options, visit https://groups.google.com/d/optout.



--

Cordialement, Coues Ludovic
+336 148 743 42

ADEWALE ADISA

Sep 3, 2016, 7:58:27 PM
to django...@googlegroups.com

Hi,
My view is that instead of combining all the images into one PDF, it's better to leave them as images and serve them individually, publishing a link to each image with text describing each page. The user can then view any page they are interested in, because that single file is too large.


06us...@gmail.com

Sep 4, 2016, 2:52:47 AM
to Django users

Hello,

Maybe you could use ImageMagick to show one page of the PDF as an image? This is what I have implemented in my project: the first page of a PDF file can be previewed before the user downloads the whole file (if needed). In your case, the preview could probably be parametrized to show page 'n' of the PDF file. In my project, the preview image is generated when the PDF file is uploaded and saved by the admin in the myproject/media/files directory. Once PDF files are in the media directory, users can preview them and then download them if needed (only the first page can be previewed; this is an implementation choice).


http://www.imagemagick.org/script/index.php


Implementation principles/credit can be found here :

http://www.yaconiello.com/blog/auto-generating-pdf-covers/


I am attaching the signal receivers file_post_save and file_pre_delete, for information.


Cheers.

Django 1.8
Python 3.4


Note : to install ImageMagick:

sudo apt-get install libmagickwand-dev imagemagick libmagickcore-dev

The convert command can turn any page of a PDF file into a PNG image (for instance). The page index in square brackets starts at 0, e.g.:

convert -thumbnail 1280 test.pdf[0] test_1280.png

Several parameters exist to tune image quality (in my case, I needed density, trim and quality).
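
To preview page 'n' rather than always the first page, the same command can be built from Python. This is only a sketch: build_convert_command and its default width and density are my own choices for illustration, not part of my project.

```python
import subprocess


def build_convert_command(pdf_path, page_number, png_path, width=1280, density=150):
    """Build the ImageMagick argument list that renders one PDF page.

    page_number is 1-based (as a user would say it); ImageMagick's
    [index] suffix is 0-based, hence the "- 1".
    """
    if page_number < 1:
        raise ValueError("page_number starts at 1")
    return [
        "convert",
        "-density", str(density),  # rendering resolution; set before the input is read
        "{}[{}]".format(pdf_path, page_number - 1),
        "-thumbnail", str(width),
        png_path,
    ]


def render_page(pdf_path, page_number, png_path, **options):
    # Passing a list (no shell) keeps unusual file names safe.
    subprocess.check_call(build_convert_command(pdf_path, page_number, png_path, **options))
```

For example, render_page("test.pdf", 3, "test_page3.png") would render the third page, assuming ImageMagick is installed as described above.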

 

models.py

[…]

###########
# SIGNALS #
###########

import os
import subprocess

from django.conf import settings
from django.db.models.signals import post_save, pre_delete


# What to do after a File is saved - receiver definition
def file_post_save(sender, instance=None, **kwargs):
    # This post save function creates a thumbnail for the File.
    # The command is passed as a list (no shell), so file names containing
    # spaces or shell metacharacters cannot break or hijack it.
    source = os.path.join(settings.MEDIA_ROOT, str(instance.file))
    target = os.path.join(settings.MEDIA_ROOT, str(instance.thumbnail))
    subprocess.call([
        "convert",
        "-density", "300",
        "-trim",
        "-thumbnail", str(instance.thumbnail_size),
        source + "[0]",  # [0] selects the first page of the pdf
        "-quality", "100",
        target,
    ])

post_save.connect(file_post_save, sender=File, dispatch_uid="file_post_save_uid")


# What to do before a File is deleted - receiver definition
def file_pre_delete(sender, instance=None, **kwargs):
    # This pre delete function deletes file and thumbnail from media directory
    for name in (instance.file, instance.thumbnail):
        try:
            os.remove(os.path.join(settings.MEDIA_ROOT, str(name)))
        except OSError:
            pass  # the file is already gone; nothing to clean up

pre_delete.connect(file_pre_delete, sender=File, dispatch_uid="file_pre_delete_uid")

Avraham Serour

Sep 4, 2016, 11:00:36 AM
to django-users
If you need to serve the PDF, it will be downloaded to the user's computer, even if you display it in the browser with JS.

What you can do is show screenshots of the PDF pages. You would then be serving an individual JPG for each page rather than a single 100 MB file.

You can create a preview of each page using something like ImageMagick, as mentioned before, and build a layout to view them with a Django template.




Gary Roach

Sep 5, 2016, 10:41:07 PM
to django...@googlegroups.com
Thank you all for your contributions.

I have decided to break down the pdf files into individual jpg pages and serve them that way. I plan on putting each document in its own folder and then have the individual pages listed within the folder. I haven't worked out the exact retrieval scheme yet but it will probably be something like:

Documents
    Document 1
        jpg 1
          :
        jpg n
      :
    Document n

This should work and allow fast retrieval.
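
A rough sketch of how retrieval over that layout might look, using only the standard library. The folder and file names (Documents root, page_1.jpg and so on) and the helper names are placeholders, since the exact scheme is not worked out yet.

```python
import re
from pathlib import Path


def natural_key(path):
    """Sort key that puts page_2.jpg before page_10.jpg."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", path.name)]


def list_pages(documents_root, document_name):
    """Return the .jpg pages of one document folder, in page order."""
    folder = Path(documents_root) / document_name
    return sorted(folder.glob("*.jpg"), key=natural_key)


def page_with_neighbours(pages, number):
    """Return (page, previous_number, next_number) for a 1-based page number.

    previous_number / next_number are None at the ends of the document,
    which is handy for hiding "previous" / "next" links in a template.
    """
    if not 1 <= number <= len(pages):
        raise IndexError("no such page: {}".format(number))
    previous_number = number - 1 if number > 1 else None
    next_number = number + 1 if number < len(pages) else None
    return pages[number - 1], previous_number, next_number
```

A view would call list_pages() for the requested document and hand one page plus its neighbours to the template, so the browser only ever fetches one JPG at a time. If the document name comes from the URL, it would need to be validated before being joined to the root directory.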

Thanks again for your help

Gary R