Memory Efficient Way to Handle Large Uploads (via jQuery ajax HTTP POST)

Riyad

Jan 26, 2011, 7:41:26 PM
to play-framework
SITUATION
====================
I implemented a DnD HTML5 image upload app. For folks familiar with
the HTML5 File API, I'm reading the images into a Data URI format
(Base64-encoded image data) and sending that over to Play.

The $.ajax call looks something like this
--------------------
$.ajax({
  type: 'POST',
  url: '/upload',
  data: {fileName: file.name, fileSize: file.size, data: evt.target.result},
  success: function onUploadComplete(response) {
    // stuff
  },
  dataType: 'text'
});
--------------------

On the Play side of things, method def looks like:
-------------------
public static void upload(String fileName, int fileSize, String data)
{
}
-------------------

Everything works.

PROBLEM
====================
Some uploaded images can be large, 8MB or so. In Base64-encoded form,
they are even larger. Keeping all of that in memory as a String while I
decode it (roughly doubling the memory requirements) and process it
(generate thumbnails, etc.) won't scale when this rolls out and
multiple people are uploading large images at the same time.
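
To put rough numbers on that, here is a back-of-the-envelope sketch in Java. The class and method names are made up for illustration, and the 2-bytes-per-char figure assumes a String backed by a UTF-16 char array; real heap usage also depends on how the framework buffers the request, so treat it as a lower bound.

```java
// Hypothetical helper, not part of Play: estimates the in-memory cost of
// holding a Base64-encoded upload as a Java String.
final class Base64Footprint {

    // Padded Base64 emits 4 characters for every 3 input bytes, rounded up.
    static long encodedLength(long rawBytes) {
        return 4 * ((rawBytes + 2) / 3);
    }

    // Assumption: each char of the String occupies 2 bytes (UTF-16 char[]).
    static long utf16Bytes(long chars) {
        return 2 * chars;
    }
}
```

For an 8 MB image (8 * 1024 * 1024 bytes), encodedLength gives 11,184,812 characters, which is about 21.3 MB as a UTF-16 String, and the decoded byte[] adds another 8 MB on top of that.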


QUESTION
====================
Is there a way I can get Play to stream that content directly to a
temporary file I can process from OR should I get access to the stream
in my controller and parcel off those bytes into a file?

I tried getting Play to efficiently shuffle the content off into a
file by changing the "data" type to "File" and seeing if Play would
humor me and do it... it didn't work out.

I tried changing the form type to multipart/blahblah but I get a "no
boundary" exception in the console.

I am really looking for the most memory efficient way to get those
bytes coming in from the client out of memory and onto disk as fast as
possible with as little memory used as possible. So if Play is just
caching the entire file in memory and writing it out in 1 movement,
I'd rather get access to the stream from the client and chunk out the
file contents manually.

I'm just not sure:
1. If Play can do what I want for me, if so, how?
2. If Play cannot do what I want and I need to get my hands dirty,
should I make the controller method no-arg so Play attempts no
bindings and just read the request.body InputStream myself?

Riyad

Jan 26, 2011, 9:11:39 PM
to play-framework
For anyone else interested in this topic, here are a few more things I
found:

** When I created a no-arg upload method (to avoid Play trying to
buffer and bind/instantiate objects for me), memory usage for
processing an HTTP POST of an 8 MB image went from spiking the Play
java process from 90 MB to 520 MB (my heap) to bouncing it from 90 MB
to about 92 MB.

This was interesting to me because it suggested that in the previous
implementation, Play was trying to keep all the form args in memory as
it processed them, whereas with the no-arg method they primarily stay
in the stream until processed (exactly what I want, even if it takes a
bit longer to buffer them out of the stream).
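
For illustration, a minimal sketch of what "staying in the stream" can look like once a no-arg action has the raw body as an InputStream (the request.body mentioned earlier in this thread). This is plain Java rather than Play-specific code; the class name, method name, and 8 KB buffer size are assumptions made for the example, and the wiring into a controller is not shown.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: spools a request body to disk in fixed-size chunks.
final class BodySpooler {

    // Copies `body` to a new temp file and returns its path. Only one
    // 8 KB buffer is held in memory, regardless of the upload size.
    static Path spoolToTempFile(InputStream body) throws IOException {
        Path target = Files.createTempFile("upload-", ".b64");
        byte[] buffer = new byte[8192];
        try (OutputStream out = Files.newOutputStream(target)) {
            int read;
            while ((read = body.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
        return target;
    }
}
```

The file written here still contains whatever the client sent (Base64 text in this case); decoding is a separate step.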

** (For folks messing with FileAPI/DataURI business) I found that
setting headers in my ajax call with the file name, size and type and
then using very simple String concat on the client side JavaScript
allowed me to keep the actual *body* of my post dead-simple and only
contain the base64 content, as opposed to a query string containing all
args and then a Data-URI formatted string
("data:image/jpeg;base64,jSDkjasLksd...")

I liked this because it simplifies the Play controller code and
offloads the minor String operations onto the client instead of me
chopping things up on the server side wasting time.

Minor, but it also allowed me to do some quick sanity checking on the
client side before sending the data (like making sure it is a dataURI
format, it is long enough to be valid, etc.) so by the time things get
to the server, all I need is a null/empty check and I'm off to the
races as far as processing goes.


TIP: For those trying to do something similar, take a look at this
Base 64 encoding lib:
http://iharder.sourceforge.net/current/java/base64/

It's been around for a while and is consistently worked on. I also
replaced commons-codec in a project recently (4sq checkin hack) that
hung consistently for about half my users when calling into the
Commons Codec Base64 class, for whatever reason (I've also encountered
encoding/decoding bugs with the MigBase64 encoding lib). Anyway, one
of the nicest features of the one linked above is that it provides
convenience methods for decoding/encoding to and from Files directly
with a small interim buffer or, even better (for my case), customized
Input/Output streams that can encode/decode directly.

That should allow me to take the request.body InputStream from Play
and run it right through a decoder straight out to a file, keeping
memory usage low, and then fire off a background AWS S3 upload.
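
As a rough sketch of that decode-while-streaming idea: the example below does not use the library linked above. It swaps in the JDK's own java.util.Base64 (available since Java 8), whose Decoder.wrap() gives a decoding InputStream in the same spirit. The class and method names are invented for the example, and the input stream stands in for request.body.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Base64;

// Hypothetical helper: decodes a Base64 request body straight to disk.
final class StreamingBase64Upload {

    // Wraps `body` in a decoding stream and copies the decoded bytes to a
    // new temp file, so neither the Base64 text nor the decoded image is
    // ever held in memory in full. Closing the wrapper also closes `body`.
    static Path decodeToTempFile(InputStream body) throws IOException {
        Path target = Files.createTempFile("upload-", ".bin");
        try (InputStream decoded = Base64.getDecoder().wrap(body)) {
            // createTempFile already created the file, hence REPLACE_EXISTING.
            Files.copy(decoded, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }
}
```

Note that the basic decoder used here is strict: a body containing line breaks or other non-Base64 characters fails with an IOException, which is one reason to keep the POST body down to the bare Base64 content.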

It really feels like things are coming together and I just wanted to
share for anyone else that might run across this in the future and
feel overwhelmed.

Best,
Riyad