compressing uploaded file


Daniel Gerzo

Apr 19, 2011, 7:36:43 PM
to django...@googlegroups.com
Hello all,

I am uploading some text files through Django (using a form FileField),
and I am getting InMemoryUploadedFile objects this way. In my
handle_uploaded_subtitles() method, which gets the list of
InMemoryUploadedFile objects, I would like to compress these files (so
that I get either a gzip or a bzip2 file) and then save them to a
FileField model field.

Currently, I have this code:

def handle_uploaded_subtitles(self, files):
    for file in files:
        sub_file = SubtitleFile(file_name=file.name, etc)
        # here I need to compress the file
        sub_file.file.save(file.name, file)

Does anyone here have an idea how I can accomplish this?
Thanks!

--
Kind regards
Daniel

Julio Ona

Apr 19, 2011, 8:22:49 PM
to django...@googlegroups.com
Hi Daniel,

you should see:

or

But basically you should import the compress function from the library and use it.

from bz2 import compress

[...]


def handle_uploaded_subtitles(self, files):
   for file in files:
       sub_file = SubtitleFile(file_name=file.name, etc)
       bz_file = compress(file)
       sub_file.file.save(file.name, bz_file)

Regards,


--
Julio Ona

Daniel Gerzo

Apr 20, 2011, 4:02:54 AM
to django...@googlegroups.com

Hello Julio, thanks for the reply.

I had of course seen both of these before I sent the mail;
unfortunately I couldn't figure out how to use them on my
InMemoryUploadedFile object.

> But basically you should import the compress function from the library
> and use it.
>
> <http://docs.python.org/library/gzip.html#module-gzip>
>
> from bz2 import compress
>
> [...]
>
> def handle_uploaded_subtitles(self, files):
>    for file in files:
>        sub_file = SubtitleFile(file_name=file.name, etc)
>        bz_file = compress(file)

I wish it would be that easy :-)

What you are proposing fails with an exception:

bzfile = compress(file)
argument 1 must be convertible to a buffer, not InMemoryUploadedFile


Further, I wasn't able to find a method in the mentioned libraries that
would make this possible, or at least I didn't figure out how to pass an
InMemoryUploadedFile to them to compress it.

When I try to do this:

file.write(zlib.compress(file.read()))
sub_file.file.save(file.name, file)

that does not seem to compress its content. The file gets saved, but
when I run file(1) on it, it isn't recognized as a gzip file, and
gzip(1) doesn't recognize it either. When I compare the original file
and the new file with diff(1), they are the same except for a few
additional bytes at the end of the new file (I don't think that's the
compressed content, as it's too few bytes).
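
For anyone repeating the file(1) check from Python, here is a minimal sketch that looks at the leading magic bytes. It assumes Python 3; sniff_compression and the sample data are made-up names for illustration, not part of any library. It also shows that zlib.compress() output is not a gzip file at all, which may be part of why gzip(1) rejects it.

```python
import bz2
import gzip
import zlib


def sniff_compression(data: bytes) -> str:
    """Guess the container format from the leading magic bytes."""
    if data[:2] == b"\x1f\x8b":
        return "gzip"
    if data[:3] == b"BZh":
        return "bzip2"
    return "unknown"


# Made-up subtitle-like sample data.
sample = b"1\n00:00:01,000 --> 00:00:02,000\nHello\n" * 50

gzip_kind = sniff_compression(gzip.compress(sample))
bz2_kind = sniff_compression(bz2.compress(sample))
# zlib.compress() emits a raw zlib stream without the gzip header,
# so it does not start with the gzip magic bytes.
zlib_kind = sniff_compression(zlib.compress(sample))
plain_kind = sniff_compression(sample)
```

Only the gzip and bz2 results are reported as compressed containers here; the zlib stream and the plain text both come back as "unknown".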

I also tried:

gzipfile = gzip.GzipFile(fileobj=file)
sub_file.file.save(file.name, gzipfile)

however, that fails for me with this exception:

'NoneType' object is not subscriptable
which comes from
/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py
in __init__:

if mode[0:1] == 'r':
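
A note on this attempt: GzipFile(fileobj=...) treats the wrapped object as the compressed side, so in read mode it would decompress from it rather than compress it. A minimal sketch of the other direction, assuming Python 3 and an upload whose data is available as byte chunks (the list below is a stand-in for uploaded_file.chunks(), and gzip_chunks is a hypothetical helper name), passes the mode explicitly and writes into a separate buffer:

```python
import gzip
import io


def gzip_chunks(chunks) -> bytes:
    """Compress an iterable of byte chunks into one gzip stream."""
    buffer = io.BytesIO()
    # mode is passed explicitly, so GzipFile never has to infer it
    # from the file object it wraps.
    with gzip.GzipFile(fileobj=buffer, mode="wb") as gz:
        for chunk in chunks:
            gz.write(chunk)
    # Closing the GzipFile writes the trailer but leaves buffer open.
    return buffer.getvalue()


# Stand-in for uploaded_file.chunks(); the data is made up.
chunks = [b"first line\n", b"second line\n"]
compressed = gzip_chunks(chunks)
restored = gzip.decompress(compressed)
```

The resulting bytes could then be handed to whatever the FileField's save() accepts; that part is not shown because it depends on the Django version in use.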

So do you have any more specific ideas how to accomplish what I am
trying to do?

Thank you.


--
Kind regards
Daniel Gerzo

Ian Clelland

Apr 20, 2011, 11:56:52 AM
to django...@googlegroups.com
On Wed, Apr 20, 2011 at 1:02 AM, Daniel Gerzo <dge...@gmail.com> wrote:
> Hello Juliom, thanks for reply.
>
> I have of course seen both of these before I sent the mail, unfortunately I couldn't figure out how to use it on my InMemoryUploadedFile object.
>
> > But basically you should import the compress function from the library
> > and use it.
> >
> > <http://docs.python.org/library/gzip.html#module-gzip>
> >
> > from bz2 import compress
> >
> > [...]
> >
> > def handle_uploaded_subtitles(self, files):
> >    for file in files:
> >        sub_file = SubtitleFile(file_name=file.name, etc)
> >        bz_file = compress(file)
>
> I wish it would be that easy :-)
>
> What you are proposing fails with Exception:
>
> bzfile = compress(file)
> argument 1 must be convertible to a buffer, not InMemoryUploadedFile


Well, an InMemoryUploadedFile isn't a real file, so I'm not surprised that that doesn't work. You'll have to pull the data out of it, and compress that.
 
Try something like this:

import bz2

def handle_uploaded_subtitles(self, files):
   for uploaded_file in files:
       sub_file = SubtitleFile(file_name=uploaded_file.name, etc)
       # read the whole upload into memory and compress it in one go
       data = bz2.compress(uploaded_file.read())
       # Here I'm assuming that SubtitleFile.file is a real file object
       sub_file.file.write(data)
       sub_file.file.close()

If your files are large, then you can read them in lines, or in chunks, and use a BZ2Compressor object to compress them one-at-a-time.
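
A minimal sketch of that chunked approach, assuming Python 3; bz2_compress_chunks is a hypothetical helper name and the list of chunks stands in for whatever the upload object yields:

```python
import bz2


def bz2_compress_chunks(chunks):
    """Yield bzip2-compressed pieces for an iterable of byte chunks."""
    compressor = bz2.BZ2Compressor()
    for chunk in chunks:
        piece = compressor.compress(chunk)
        if piece:  # the compressor may buffer and return b""
            yield piece
    # flush() emits whatever is still buffered and ends the stream
    yield compressor.flush()


# Made-up data standing in for the chunks of one uploaded file.
chunks = [b"line %d\n" % i for i in range(1000)]
compressed = b"".join(bz2_compress_chunks(chunks))
restored = bz2.decompress(compressed)
```

Only one chunk of input needs to be held at a time; the joined output here is just for the round-trip check.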

> Further, I wasn't able to find a method in the mentioned libraries that would make this possible, or at least I didn't figure out how to pass an InMemoryUploadedFile to them to compress it.
>
> When I try to do this:
>
> file.write(zlib.compress(file.read()))

Don't do that -- I'm pretty sure that writing a file that you already have open for reading will produce undefined results.
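
One plausible reading of the symptom, assuming the upload is backed by an in-memory buffer that behaves like io.BytesIO (an assumption, not something confirmed for InMemoryUploadedFile here): after read() the position sits at the end, so write() appends the compressed bytes instead of replacing the content. A small Python 3 sketch with made-up data:

```python
import io
import zlib

original = b"some subtitle text\n" * 20
buf = io.BytesIO(original)

# read() leaves the position at the end of the buffer...
compressed = zlib.compress(buf.read())
# ...so write() appends there instead of replacing the content.
buf.write(compressed)

contents = buf.getvalue()
starts_with_original = contents.startswith(original)
appended = contents[len(original):]
```

The buffer ends up holding the untouched original followed by a short compressed tail, which would look like "the same file plus a few extra bytes" to diff(1).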

(Also, I'd try to stay away from using 'file' as a variable name -- it just hides the built-in file type name, and makes it hard to tell what, say, file.read refers to)


--
Regards,
Ian Clelland
<clel...@gmail.com>

Daniel Gerzo

Apr 20, 2011, 7:19:47 PM
to django...@googlegroups.com
On 20.4.2011 17:56, Ian Clelland wrote:
> Well, an InMemoryUploadedFile isn't a real file, so I'm not surprised
> that that doesn't work. You'll have to pull the data out of it, and
> compress that.
> Try something like this:
>
> def handle_uploaded_subtitles(self, files):
>    for uploaded_file in files:
>        sub_file = SubtitleFile(file_name=file.name, etc)
>        data = bz2.compress(uploaded_file.read())
>        # Here I'm assuming that SubtitleFile.file is a real file object
>        sub_file.file.write(data)
>        sub_file.file.close()

No, sub_file.file is a FileField attribute. It expects an object, such
as InMemoryUploadedFile, which has a chunks() method. So your proposed
solution doesn't work either. I need to call sub_file.file.save() in
order to get the uploaded file saved at the proper place.

I did spend quite some time on this to get it working, but finally I
have a solution that, even though it may not be perfect, at least
works. So for whoever comes across this issue, here's the code that works:

def handle_uploaded_files(self, files):
    import bz2
    import StringIO
    from django.core.files.base import ContentFile

    for fobj in files:
        # use a fresh compressor and buffer for every file; a
        # BZ2Compressor cannot be used again after flush()
        bz2comp = bz2.BZ2Compressor()
        result = StringIO.StringIO()
        # compress the data chunk by chunk
        for chunk in fobj.chunks():
            result.write(bz2comp.compress(chunk))
        result.write(bz2comp.flush())
        result.seek(0)
        # create new MyModel object which has FileField attribute
        my_file = MyModel(file_name=fobj.name, etc)
        my_file.file_field.save(fobj.name, ContentFile(result.read()))
        my_file.save()
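
One detail worth checking when adapting this to several files: a BZ2Compressor is finished once flush() has been called, so each file needs its own compressor (and its own output buffer). A minimal Python 3 sketch with made-up data; compress_each is a hypothetical helper and plain byte strings stand in for the uploads:

```python
import bz2

compressor = bz2.BZ2Compressor()
first = compressor.compress(b"file one\n") + compressor.flush()

try:
    compressor.compress(b"file two\n")
    reused_ok = True
except ValueError:
    # CPython refuses to compress after flush()
    reused_ok = False


def compress_each(files):
    """Compress each byte string in files with its own compressor."""
    results = []
    for data in files:
        comp = bz2.BZ2Compressor()  # fresh state per file
        results.append(comp.compress(data) + comp.flush())
    return results


outputs = compress_each([b"file one\n", b"file two\n"])
```

Each entry in outputs is an independent bzip2 stream that decompresses back to its own input.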


> If your files are large, then you can read them in lines, or in chunks,
> and use a BZ2Compressor object to compress them one-at-a-time.

Indeed, it seems to work. Thanks for the ideas :)
