Using FileStorage.stream.read() before FileStorage.save() gotcha

wgoulet

Aug 1, 2012, 4:06:14 PM
to pocoo...@googlegroups.com
Hi all,

Thanks to the tips I got yesterday, I switched my application design to stream the contents of my files into memory rather than trying to write my file to disk twice (invoking FileStorage.save() twice).

To get the data from the FileStorage object into a buffer for processing, I decided to use the stream attribute to get an input stream I could read from. But again, after using stream.read() to read a few bytes from the FileStorage object, the file pointer was advanced, so when I used FileStorage.save() later my file was again truncated. I fixed this by calling seek(0) to reset the input stream to the beginning before using save().
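
For what it's worth, here is a minimal sketch of that workaround in a Flask view (the route, form field name and destination path are made up for illustration):

    from flask import Flask, request

    app = Flask(__name__)

    @app.route('/upload', methods=['POST'])
    def upload():
        f = request.files['file']      # werkzeug.datastructures.FileStorage

        header = f.stream.read(512)    # peek at the first bytes; this advances the pointer
        # ... inspect `header` here ...

        f.stream.seek(0)               # rewind, otherwise save() writes a truncated file
        f.save('/tmp/' + f.filename)   # hypothetical destination path
        return 'OK'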

To me, it still seems to make sense to reset the file pointer to the beginning before calling shutil.copyfileobj in the FileStorage.save() method, so that you can guarantee you'll always save the entire file to disk, regardless of what you've done with the input stream.
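
The change I have in mind would amount to something like this (a rough sketch of what a save() with the extra rewind could look like, not werkzeug's actual code; error handling and buffer handling are simplified):

    import shutil

    def save(self, dst, buffer_size=16384):
        # proposed addition: always rewind the stream before copying
        self.stream.seek(0)
        close_dst = False
        if isinstance(dst, str):
            dst = open(dst, 'wb')
            close_dst = True
        try:
            shutil.copyfileobj(self.stream, dst, buffer_size)
        finally:
            if close_dst:
                dst.close()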

Is there any downside to making this change in the library?

Thanks,
Walter

Jan Riechers

Aug 13, 2012, 1:43:55 PM
to pocoo...@googlegroups.com
Hello dear Pocoo-people,

I am currently developing a fairly big web application using Flask, but
have concerns about task handling in general.

At present the setup looks like the following:
Webserver: nginx
Routing: werkzeug (later switching to uWSGI)
Templating: jinja2 (what else? ;) )
Database: mongoDB
Python-Interpreter: pypy 1.9

My question arises from the fact that I can't make use of
greenlet/eventlet with PyPy, but I'm unsure whether I can maintain
availability for web users running only the above setup.

I also found Celery, and since I haven't worked with this extension,
it makes me a bit worried. Two questions arise:

1st: Can I safely use PyPy with Flask to handle a bunch of users at a
time, or will my processes lock each other up?
2nd: Would it make sense to integrate Celery in order to keep task
handling lightweight (I love this greenlet term...) but concurrent, so
the application can handle a load of users at a time?

I would highly appreciate feedback, and once more, thanks to your
crowd for creating Flask, Jinja2 and Werkzeug, wonderful tools to work with!

Jan


Jan Riechers

Aug 17, 2012, 2:57:44 PM
to pocoo...@googlegroups.com
Hello dear Pocoo-people,

I am currently developing a fairly big web application using Flask, but
have concerns about task handling/threading in general.

At present the setup looks like the following:
Webserver: nginx
Routing: werkzeug (later I would like to switch to uWSGI)
Templating: jinja2 (what else?)
Database: mongoDB
Python-Interpreter: pypy 1.9

My question arises from the fact that I can't make use of
greenlet/eventlet with PyPy, but I'm unsure whether I can maintain
availability for web users running only the above setup.

I also found Celery, and since I haven't worked with this extension,
it makes me a bit worried. Two questions arise:

1st: Can I safely use PyPy with Flask to handle a bunch of users at a
time, or will my processes lock each other up, in particular with file
i/o and background processing of scheduled tasks?

Steven Kryskalla

Aug 17, 2012, 4:28:43 PM
to pocoo...@googlegroups.com
On Fri, Aug 17, 2012 at 11:57 AM, Jan Riechers <janp...@freenet.de> wrote:
> Hello dear Pocoo-people,
>
> I am currently developing a fairly big web application using Flask, but have
> concerns about task handling/threading in general.

You might have better luck posting on the flask mailing list:

http://flask.pocoo.org/mailinglist/

Pardon me if you already posted there and didn't get a response.

Also, I suspect the reason not many people have responded is that
your setup is a little unique. I don't think many people are using
PyPy for serving web applications. What made you decide on PyPy?

> Routing: werkzeug (later I would like to switch to uWSGI)

Do you mean application server? uWSGI, afaik, isn't used for routing;
it's a WSGI app server. Werkzeug is used both for routing (URL
mapping) and for its development WSGI server (which Flask uses when
you call app.run).

> 1st: Can I safely use PyPy with Flask to handle a bunch of users at a time,
> or will my processes lock each other up, in particular with file i/o and
> background processing of scheduled tasks?

I would move heavy file i/o and background processing to a separate
process (e.g. using celery, as you mentioned). To keep your Flask app
from "locking up" under too many concurrent users, you must keep the
per-request processing time down (e.g. less than a second). Running a
high-performance app server like uWSGI or gunicorn will let you
process multiple requests concurrently, unlike the Werkzeug
development server, which is single-threaded and can only serve one
response at a time.
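
For example, a gunicorn config along these lines runs several worker
processes side by side (module name, address and worker count are just
placeholders):

    # gunicorn_conf.py -- start with: gunicorn -c gunicorn_conf.py myapp:app
    bind = "127.0.0.1:8000"    # nginx proxies requests to this address
    workers = 4                # each worker is a separate process serving requests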

> 2nd: Would it make sense to integrate Celery in order to keep task handling
> lightweight (I love this greenlet term...) but concurrent, so the application
> can handle a load of users at a time?

Yes, it makes sense, but I would recommend forgetting about your love
of greenlets for a moment. The easiest model (IMO) is to just run
*processes*. Not threads, not greenlets, nothing fancy. Only if your
performance is really hurting should you look at threads or greenlets
as an optimization. The simplest thing that works is just running
multiple background worker processes that take jobs off of a queue.

Celery is one system that does this, but there are others. I have used
pyres in the past with flask and really liked how it worked:

https://github.com/binarydud/pyres/

This looked interesting as well: http://python-rq.org/

There's also this flask snippet: http://flask.pocoo.org/snippets/73/

And there are a bunch of other ones as well (beanstalkd, gearman,
etc.). The ones I linked to are all more lightweight than Celery,
which from what I've seen can be a bit tricky to set up.
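
To make the worker-process model concrete, here is a minimal sketch
using python-rq from the link above (the function and paths are made
up; it assumes a Redis server and an RQ worker process are running):

    # tasks.py -- the job itself, importable by the worker process
    import time

    def process_upload(path):
        time.sleep(10)         # stand-in for heavy file i/o or other slow work
        return path

    # in the flask view: enqueue the job and return immediately
    from redis import Redis
    from rq import Queue
    from tasks import process_upload

    q = Queue(connection=Redis())
    q.enqueue(process_upload, '/tmp/upload.dat')   # picked up by a separate RQ worker process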

best,
Steve