Receive large files

s.mu...@gmail.com

Jan 11, 2021, 2:31:52 AM
to RestExpress
Hello,

I've read multiple threads in this forum where users have pasted code to receive a file (link), and another where Todd says RestExpress really isn't designed to receive large files since it holds everything in memory before passing it to the controller (link).

If there is a requirement for a RestExpress server to be able to receive large files, what would be the best approach? The basic approach of "get the request body as a stream, read it in a loop, and keep writing to a file" either crashes with an OOM or just hangs indefinitely.
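
For context, the naive approach I described looks roughly like this (a sketch, not my exact code; it assumes the RE Request exposes the body as a stream via something like a getBodyAsStream()-style accessor, and /tmp/upload.bin is just a placeholder path):

    // Sketch of the naive "read the body in a loop and write to a file" approach.
    // By the time the controller runs, the whole body has already been buffered
    // in memory by RestExpress, which is why this blows up on big files.
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;

    import org.restexpress.Request;
    import org.restexpress.Response;

    public class UploadController {
        public Object create(Request request, Response response) {
            try (InputStream in = request.getBodyAsStream();                    // assumed accessor
                 OutputStream out = new FileOutputStream("/tmp/upload.bin")) {  // placeholder path
                byte[] buffer = new byte[8192];
                int read;
                while ((read = in.read(buffer)) != -1) {
                    out.write(buffer, 0, read);
                }
            }
            catch (IOException e) {
                throw new RuntimeException(e);
            }
            return null;
        }
    }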

Should I use the ChunkedWriteHandler that is already part of the pipeline, and make the client send in "chunked" mode? Should I (and can I) bypass RE and talk to Netty directly somehow, since Netty claims to be optimised for handling large files? The ChunkedWriteHandler sits in the middle of the pipeline, so I'm not sure how to intercept the HTTP message there and write the chunks to a file.

Any pointers will help.

Thanks,
Murali

Todd Fredrich

Jan 13, 2021, 3:17:38 PM
to RestExpress
Hi Murali,
This is a really good question, and it is somewhat difficult for me to answer... As you mentioned, RestExpress was built assuming small-ish JSON resources, so it injects a lot into the Netty pipeline based on those assumptions. And it also performs deserialization on incoming JSON payloads only after the entire dataset has been received.

Netty itself (the underlying I/O library utilized in RestExpress) certainly handles large files, processing each "chunk" of data received independently. If it were me, I wouldn't necessarily use RestExpress for that particular use case and would go "outside" its structures to create a completely new Netty pipeline and Netty handler to process the large files. Perhaps it could be stitched into the same pipeline that RestExpress creates, but I would only do that after I had a working stand-alone Netty process figured out.
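
Very roughly, what I mean by a stand-alone pipeline is something like this (an untested sketch; FileUploadHandler is a placeholder for the chunk-handling handler you would write, and the port is arbitrary):

    // Untested sketch of a stand-alone Netty HTTP server for large uploads.
    import io.netty.bootstrap.ServerBootstrap;
    import io.netty.channel.ChannelInitializer;
    import io.netty.channel.EventLoopGroup;
    import io.netty.channel.nio.NioEventLoopGroup;
    import io.netty.channel.socket.SocketChannel;
    import io.netty.channel.socket.nio.NioServerSocketChannel;
    import io.netty.handler.codec.http.HttpServerCodec;

    public class UploadServer {
        public static void main(String[] args) throws InterruptedException {
            EventLoopGroup boss = new NioEventLoopGroup(1);
            EventLoopGroup workers = new NioEventLoopGroup();
            try {
                ServerBootstrap bootstrap = new ServerBootstrap()
                    .group(boss, workers)
                    .channel(NioServerSocketChannel.class)
                    .childHandler(new ChannelInitializer<SocketChannel>() {
                        @Override
                        protected void initChannel(SocketChannel ch) {
                            ch.pipeline().addLast(new HttpServerCodec());
                            // Deliberately no HttpObjectAggregator, so the handler sees
                            // each HttpContent chunk as it arrives instead of one
                            // fully-buffered FullHttpRequest.
                            ch.pipeline().addLast(new FileUploadHandler()); // placeholder handler
                        }
                    });
                bootstrap.bind(8081).sync().channel().closeFuture().sync();
            }
            finally {
                boss.shutdownGracefully();
                workers.shutdownGracefully();
            }
        }
    }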

If the low-level Netty handling is too fiddly for that particular process, you might try something like https://vertx.io/ that also uses Netty under the covers. Or even use a language created for building large file handling servers like https://golang.org/ or something.

Again, I'm sorry I can't give you an easy answer. If you do end up creating a Netty server that does it well and figure out how the RestExpress-built Netty pipeline could be optimized to handle that as well, I'm very open to collaboration or pull requests for that.

Thanks and good luck,
--Todd

s.mu...@gmail.com

Jan 14, 2021, 10:11:52 AM
to RestExpress
Hi Todd,

Thanks for that response. As you might have guessed, I have a few follow-up questions :D

There's a sample HTTP upload example (Java-based) in the Netty source code, which I've tested and it seems to work fine. I'm still working through some of its pieces, but I have a decent understanding of how it works.
The pipeline looks like this:

HttpRequestDecoder
HttpResponseEncoder
HttpContentCompressor
HttpUploadServerHandler - the class that handles the actual file receiving


I modified the example so that it saves the file sent by the client (which is also provided as a Java program) to local disk.
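
Conceptually, the chunk-by-chunk handling in the example boils down to something like this (a simplified sketch of the idea, not the exact HttpUploadServerHandler code; the /tmp target path is just a placeholder):

    // Simplified sketch of chunk-by-chunk multipart handling with Netty's
    // HttpPostRequestDecoder; large upload data is spilled to disk, not memory.
    import java.io.File;

    import io.netty.channel.ChannelHandlerContext;
    import io.netty.channel.SimpleChannelInboundHandler;
    import io.netty.handler.codec.http.HttpContent;
    import io.netty.handler.codec.http.HttpObject;
    import io.netty.handler.codec.http.HttpRequest;
    import io.netty.handler.codec.http.LastHttpContent;
    import io.netty.handler.codec.http.multipart.DefaultHttpDataFactory;
    import io.netty.handler.codec.http.multipart.FileUpload;
    import io.netty.handler.codec.http.multipart.HttpDataFactory;
    import io.netty.handler.codec.http.multipart.HttpPostRequestDecoder;
    import io.netty.handler.codec.http.multipart.InterfaceHttpData;

    public class FileReceiveHandler extends SimpleChannelInboundHandler<HttpObject> {
        // MINSIZE makes the factory spill upload data above a small threshold to temp files.
        private static final HttpDataFactory FACTORY =
            new DefaultHttpDataFactory(DefaultHttpDataFactory.MINSIZE);

        private HttpPostRequestDecoder decoder;

        @Override
        protected void channelRead0(ChannelHandlerContext ctx, HttpObject msg) throws Exception {
            if (msg instanceof HttpRequest) {
                decoder = new HttpPostRequestDecoder(FACTORY, (HttpRequest) msg);
            }
            if (decoder != null && msg instanceof HttpContent) {
                decoder.offer((HttpContent) msg); // feed each chunk as it arrives
                if (msg instanceof LastHttpContent) {
                    for (InterfaceHttpData data : decoder.getBodyHttpDatas()) {
                        if (data.getHttpDataType() == InterfaceHttpData.HttpDataType.FileUpload) {
                            FileUpload upload = (FileUpload) data;
                            // Move the spilled temp file to its final location (placeholder path).
                            upload.renameTo(new File("/tmp/" + upload.getFilename()));
                        }
                    }
                    decoder.destroy();
                    decoder = null;
                    // ...write an HTTP response back to the client here...
                }
            }
        }
    }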

My questions are:

1. How would I go "outside" RestExpress for a specific route ("/upload" for example), but still keep RestExpress for all the other routes? I have the pipeline ready for upload from the sample program, and I am OK with the default RestExpress pipeline for all the other APIs. Is this possible?
2. Can RestExpress support listening on multiple ports in the same server instance? Like, file uploads on port XXXX and the rest of the APIs on port YYYY? This ties into the previous question, I suppose.

"And it also performs deserialization on incoming JSON payloads only after the entire dataset has been received"
3. When you said the above, did you mean the code in DefaultRequestHandler::processRequest in the RE code base? Could you elaborate a bit more?
Thinking along the same lines, if a client were to send a "multipart/form-data" request to an RE server as it stands today, would I be able to configure a controller to read those parts individually without consuming too much memory?

4. In one of the GitHub issues (issue 22), you mention this:

Multi-part uploads work fine and the results "get" to the controller in the request. Other clients have performed multi-part parsing in the controller method itself and things seem to work fine. Perhaps we can get them to submit their controller logic back to the project...

Would you happen to have access to some of this code so that I can draw some hints from it?
And what did you mean by the results "get" to the controller? Do you mean the "read" method in the controller? How would a POST request from the client go to the "get" on the controller?

Thanks,
Murali