Message from discussion Multiple FileStorage.save() operations

Date: Wed, 1 Aug 2012 06:03:02 -0700 (PDT)
From: Ludvig Ericson <ludvig.eric...@gmail.com>
To: pocoo-libs@googlegroups.com
Message-Id: <9882f000-32f4-4739-8304-9f4e891bbe8d@googlegroups.com>
In-Reply-To: <85b98acf-bca1-44d9-947b-bb9f5e6c23ac@googlegroups.com>
References: <e321db95-279e-40fb-9aa2-fbf05bb93036@googlegroups.com>
 <CA+5DKYwCBC4Pck2qBm3K+sOVjogp9qDTDgJ6+ZRxM07yJQx8UQ@mail.gmail.com>
 <85b98acf-bca1-44d9-947b-bb9f5e6c23ac@googlegroups.com>
Subject: Re: Multiple FileStorage.save() operations
MIME-Version: 1.0
Content-Type: multipart/mixed; 
	boundary="----=_Part_91_30932708.1343826182416"

------=_Part_91_30932708.1343826182416
Content-Type: multipart/alternative; 
	boundary="----=_Part_92_27213874.1343826182417"

------=_Part_92_27213874.1343826182417
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit

*[Sorry if this reply appears twice, I have a non-existent Apps account 
subscribed to this group.]*

Most people expect a function like *F.save* to advance the file pointer to 
the end of the file.

I'm fairly confident this won't change. Also, Werkzeug already stores your 
uploaded file on disk if it's too large to hold in RAM!

As for *StringIO*, it can hold binary data just fine without unnecessary 
encodings.
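
To illustrate the binary-data point, here is a minimal sketch. It uses *io.BytesIO*, which I'm assuming is what you would reach for on Python 3 in place of the byte-oriented *StringIO* discussed here; the sample bytes are made up.

```python
import io

# Arbitrary binary payload, including NUL and non-ASCII bytes (made-up sample data).
payload = bytes([0x50, 0x4B, 0x03, 0x04, 0x00, 0xFF, 0xFE, 0x80])

# Hold the raw bytes in memory; no base64 or other text encoding is involved.
buf = io.BytesIO()
buf.write(payload)

# Rewind before reading, since write() left the pointer at the end.
buf.seek(0)
roundtrip = buf.read()
```

The bytes come back unchanged, so there is no encoding overhead to worry about.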

My advice is to make the validation function operate on the file object as 
returned by *request.files[k]*; that way you can remain agnostic as to 
whether the file has already been stored on disk or exists in RAM.
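
A rough sketch of what I mean, assuming the upload is a zip archive and that the stream is seekable. The helper name *validate_zip_stream* is hypothetical, and the in-memory streams below only stand in for something like *request.files[k].stream*.

```python
import io
import zipfile


def validate_zip_stream(stream):
    """Return True if `stream` holds a readable, uncorrupted zip archive.

    `stream` is any seekable binary file object; this helper and its
    name are hypothetical, not part of Werkzeug.
    """
    try:
        stream.seek(0)
        with zipfile.ZipFile(stream) as archive:
            # testzip() returns the first corrupt member name, or None if all are fine.
            ok = archive.testzip() is None
    except zipfile.BadZipFile:
        ok = False
    finally:
        # Rewind so a later save() copies the whole upload.
        stream.seek(0)
    return ok


# Stand-ins for uploaded data: one real zip built in memory, one bogus payload.
good = io.BytesIO()
with zipfile.ZipFile(good, "w") as archive:
    archive.writestr("hello.txt", "hi")
bad = io.BytesIO(b"this is not a zip archive")

good_ok = validate_zip_stream(good)
bad_ok = validate_zip_stream(bad)
```

Because the helper only reads from the stream and rewinds it afterwards, no temporary copy in /tmp is needed.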

I realize that might not be possible. If not, you can either a) save to 
disk and reopen the saved file, or b) save to disk, rewind the file 
pointer, and save it again to permanent storage.
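
Option b could look roughly like this. It is only a sketch: *save_stream* is a hypothetical stand-in that copies with *shutil.copyfileobj* the way *save()* does, the *BytesIO* object stands in for the uploaded file, and the paths are temporary ones made up for the example.

```python
import io
import os
import shutil
import tempfile


def save_stream(stream, path):
    """Copy `stream` to `path` from its current position (hypothetical helper)."""
    with open(path, "wb") as dst:
        shutil.copyfileobj(stream, dst)


upload = io.BytesIO(b"example upload bytes")  # stand-in for the uploaded file
workdir = tempfile.mkdtemp()
tmp_path = os.path.join(workdir, "tmp_copy")
final_path = os.path.join(workdir, "final_copy")

save_stream(upload, tmp_path)    # first save leaves the pointer at the end
upload.seek(0)                   # option b: rewind before saving again
save_stream(upload, final_path)  # second save now copies the full content

tmp_size = os.path.getsize(tmp_path)
final_size = os.path.getsize(final_path)

shutil.rmtree(workdir)           # clean up the example directory
```

With a real upload the equivalent would be a *seek(0)* on the upload's stream between the two *save()* calls, without touching Werkzeug itself.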

Does that help?

-Ludvig
On Wednesday, August 1, 2012 2:37:31 AM UTC+2, wgoulet wrote:
>
> But does that work for binary data? In my application I'm processing 
> zipfiles and Java keystore files. My read of StringIO is that it works 
> great for anything that can be directly represented as ASCII or Unicode, 
> but it seems like a lot of overhead to base64 encode something so I can 
> store it in memory.
>
> On Tuesday, July 31, 2012 6:31:38 PM UTC-5, mr.meker wrote:
>>
>> You shouldn't need to save a file twice. You are doing extra disk IO when 
>> you should keep it in RAM until it needs to hit the disk. This is what you 
>> should use instead of writing the file to /tmp. 
>> http://docs.python.org/library/stringio.html
>>
>> On Tue, Jul 31, 2012 at 3:27 PM, wgoulet wrote:
>>
>>> Hi,
>>>
>>> I'm a relatively new Python and brand new Flask user, so please bear 
>>> with me.
>>>
>>> I'm developing a webapp that requires that I permit users to upload 
>>> files which are validated before I store them in their final location on 
>>> the web server's local filesystem. To satisfy this requirement, I have 
>>> defined helper methods that I use to create a temporary copy of a 
>>> FileStorage object that is processed to determine if it is valid. Once this 
>>> check passes, I then want to save the uploaded file in a permanent location.
>>>
>>> Here's an example subset of my code:
>>>
>>> def confirm():
>>>     amfile = request.files['zipfile']
>>>     if validate_file(amfile):
>>>         amfile.save(os.path.join("/www/docs", secure_filename(amfile.filename)))
>>>
>>> def validate_file(infile):
>>>     infile.save(os.path.join("/tmp", secure_filename(infile.filename)))
>>>     # My validation code goes in here; I read the file in /tmp and
>>>     # return true or false depending on the results
>>>
>>>
>>> The problem I'm running into is that when I call the save() function on 
>>> a FileStorage object twice in a row, the second save() call creates an 
>>> empty copy of the file. In the code above, I have a valid copy of my file 
>>> stored in /tmp, but the file in /www/docs has zero size. I don't think 
>>> that copying the file from /tmp to /www/docs is the right solution, because 
>>> the validation code could potentially destroy or overwrite the temp copy 
>>> (say, if the file is a zip file, as it is in my case).
>>>
>>> Looking at the source of FileStorage.save(), it looks like the reason 
>>> for this behavior is that the shutil.copyfileobj function advances the 
>>> file pointer when it copies from the source, but it doesn't move it back to 
>>> the beginning of the file stream when it's finished (the copyfileobj docs 
>>> state as much).
>>>
>>> As a simple test, I modified the FileStorage.save() function in my local 
>>> werkzeug install to add a file seek call before the copyfileobj call as 
>>> follows:
>>>
>>>         from shutil import copyfileobj
>>>         close_dst = False
>>>         if isinstance(dst, basestring):
>>>             dst = file(dst, 'wb')
>>>             close_dst = True
>>>         try:
>>>             # Reset file pointer before copying from object
>>>             self.stream.seek(0)
>>>             copyfileobj(self.stream, dst, buffer_size)
>>>         finally:
>>>             if close_dst:
>>>                 dst.close()
>>>
>>> With this change, I can call FileStorage.save() multiple times to 
>>> save multiple copies of the uploaded file.
>>>
>>> Would it make sense to modify FileStorage.save() as I've done here, or 
>>> is there another, better way to achieve my goal?
>>>
>>>
------=_Part_92_27213874.1343826182417--

------=_Part_91_30932708.1343826182416--