Suggested way to handle file uploads with Web API service


Marcial Rion

Jun 6, 2019, 5:59:46 PM
to vert.x

Hi



Does anyone have suggestions on how to properly handle file uploads (“multipart/form-data”) with Web API service? The only way I found was using routerFactory.setExtraOperationContextPayloadMapper(). However, this forced me to use the blocking file read method, which I’d rather replace with the async one (or did I get something wrong here? I'm still a bit of a noob when it comes to async programming). Btw: an example project can be found at https://github.com/mrion/apicontract, branch "web_api_service" (https://github.com/mrion/apicontract/tree/web_api_service).
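For illustration, here is a minimal sketch of what a non-blocking read of a small upload can look like. It deliberately uses only the JDK (`AsynchronousFileChannel` plus `CompletableFuture`) so that it stays self-contained; it is not the Vert.x API. In Vert.x itself the asynchronous counterpart of `readFileBlocking` should be `vertx.fileSystem().readFile(path, handler)`. The class name `AsyncUploadReader` and its methods are hypothetical and not taken from the example project.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CompletableFuture;

// Hypothetical helper (JDK only): reads a small file without blocking the caller.
final class AsyncUploadReader {
    private AsyncUploadReader() {}

    // Returns immediately; the future completes later on a JDK I/O thread.
    static CompletableFuture<byte[]> readAll(Path file) {
        CompletableFuture<byte[]> result = new CompletableFuture<>();
        AsynchronousFileChannel channel;
        try {
            channel = AsynchronousFileChannel.open(file, StandardOpenOption.READ);
        } catch (IOException e) {
            result.completeExceptionally(e);
            return result;
        }
        try {
            // Small uploads only: the whole file is buffered in memory.
            ByteBuffer buffer = ByteBuffer.allocate(Math.toIntExact(channel.size()));
            readRemaining(channel, buffer, 0L, result);
        } catch (IOException | ArithmeticException e) {
            closeQuietly(channel);
            result.completeExceptionally(e);
        }
        return result;
    }

    // Issues one asynchronous read and re-issues it until the buffer is full or EOF is hit.
    private static void readRemaining(AsynchronousFileChannel channel, ByteBuffer buffer,
                                      long position, CompletableFuture<byte[]> result) {
        channel.read(buffer, position, null, new CompletionHandler<Integer, Void>() {
            @Override
            public void completed(Integer bytesRead, Void attachment) {
                if (bytesRead < 0 || !buffer.hasRemaining()) {
                    closeQuietly(channel);
                    byte[] data = new byte[buffer.position()];
                    buffer.flip();
                    buffer.get(data);
                    result.complete(data);
                } else {
                    readRemaining(channel, buffer, position + bytesRead, result);
                }
            }

            @Override
            public void failed(Throwable error, Void attachment) {
                closeQuietly(channel);
                result.completeExceptionally(error);
            }
        });
    }

    private static void closeQuietly(AsynchronousFileChannel channel) {
        try {
            channel.close();
        } catch (IOException ignored) {
            // Nothing useful can be done if closing fails.
        }
    }
}
```

A caller would chain on the result, e.g. `AsyncUploadReader.readAll(path).thenAccept(bytes -> ...)`, instead of waiting for it.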



Another solution I could imagine is to pass on the uploaded filename and read the file on the “Web API service” side. However, this requires the HTTP server and “Web API event bus service” verticles to run on the same “system instance” (or at least share the respective FS).



An advanced version could also just pass on a reference, and then get the file using some kind of callback to yet another event bus service, which will then deliver the file (HTTP server and file serving event bus service verticles would have to be on the same system/share FS).



Or even better, if the “Web API event bus service” could figure out whether it runs on the same “system instance” as the HTTP server verticle, it could either access the file locally (direct FS read), or else fetch it remotely over the event bus via an additional file serving event bus service verticle (which again would be tied to run on the same “system instance” as the HTTP server verticle).
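A rough sketch of that "local first, remote otherwise" idea, using only the JDK. Everything here is hypothetical: `UploadLocator`, the `RemoteFetcher` interface (a stand-in for the additional file serving event bus service) and the use of a plain readability check as the "same system instance" test are assumptions made for illustration. The local read is blocking for brevity.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: prefer a direct FS read, fall back to a remote fetch.
final class UploadLocator {
    private UploadLocator() {}

    // Stand-in for a file serving service reachable over the event bus (assumption).
    interface RemoteFetcher {
        byte[] fetch(String fileName) throws IOException;
    }

    static byte[] load(Path uploadDir, String fileName, RemoteFetcher remote) throws IOException {
        Path base = uploadDir.toAbsolutePath().normalize();
        Path local = base.resolve(fileName).normalize();
        // Reject names such as "../secret" that would leave the upload directory.
        if (!local.startsWith(base)) {
            throw new IllegalArgumentException("file name escapes upload directory: " + fileName);
        }
        // If the file is visible on this node's FS, read it directly.
        if (Files.isRegularFile(local) && Files.isReadable(local)) {
            return Files.readAllBytes(local);
        }
        // Otherwise the upload lives on another node: ask the remote side for it.
        return remote.fetch(fileName);
    }
}
```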



Any advice, or recommendations?



Thx,


Marcial

Francesco Guardiani

Jun 10, 2019, 3:30:50 AM
to vert.x
Hi,
The event bus is not designed to handle large payloads like file uploads. There are a couple of other threads on this mailing list that explain the issue: https://groups.google.com/forum/#!topic/vertx/42mJE3FqqUA https://groups.google.com/forum/#!msg/vertx/AnZ0ZiSgiqQ

And I don't think it's wise to handle the upload by accessing the local FS either because, as you said, you would have to force your services to run in the same local Vert.x instance.

Why do you need to pass the file through the event bus? What if you just add a handler using addHandlerByOperationId and manage the upload directly? What do you need to do with those files? Load them and store them somewhere?

Francesco

Marcial Rion

Jun 11, 2019, 4:43:58 PM
to vert.x
Hi Francesco

Thanks for your response. I had already come across the links you refer to. However, my files are rather small PDFs, almost all of them containing just one or two pages, usually smaller than 2MB. Compared to the 100MB mentioned in one of the other threads, I did not consider these to be "large" payloads. So I guess I was wrong, and one of my questions therefore would be: where is the size limit as far as the event bus is concerned, i.e. what is considered a "large" payload? Obviously, my example project is also missing a size limit check for the uploaded files.
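A minimal sketch of such a size check, again using only the JDK. The class name `UploadSizeCheck` is hypothetical, and the 2MB limit is just the typical document size mentioned above, not a limit imposed by the event bus. On the Vert.x Web side, `BodyHandler.setBodyLimit(...)` should be able to reject oversized request bodies before the file ever lands on disk.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: rejects uploads above a configured size.
final class UploadSizeCheck {
    // Assumed limit, taken from the "usually smaller than 2MB" figure above.
    static final long MAX_UPLOAD_BYTES = 2L * 1024 * 1024;

    private UploadSizeCheck() {}

    // True if the file on disk is no larger than maxBytes.
    static boolean withinLimit(Path uploadedFile, long maxBytes) throws IOException {
        return Files.size(uploadedFile) <= maxBytes;
    }
}
```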

Basically, there is no need to pass the files through the event bus. In theory, I could just as well process them once they are received by the Web server. However, for future scaling purposes, I decided to decompose my application into separate components (not that I ever expect to really need that, but I always consider a proper design a plus ;-) ). One building block is the Web server, which serves the GUI components (currently planned to be Angular) as well as the API calls used by the GUI. However, I intended to offload the processing of the API calls to dedicated verticles using the Web API service "pattern", so that in the future they could even run on different computing nodes.

One verticle implements user management tasks, the other one the document processing/management. The initial document processing might be time-intensive, so I did not want to do that on the Web server verticle. It will extract text from the PDF (directly or using OCR) and store it in a database. The file itself shall be stored on persistent storage. Furthermore, files can be renamed, searched for content, assigned tags, grouped in "dossiers", ... (kind of a simple app to archive/organize day-to-day paperwork; and yes, I know there are professional solutions for that, but most of them are too complex for my purposes. And hey, using this as a little project to get some hands-on experience with Vert.x is fun :-) ).

Now, as the document "manipulation" (i.e. renaming, searching, tagging, etc.) uses Web API service to forward the calls to the document processing verticle, for the sake of simplicity and consistency (and to have a "clean" architecture and proper separation of concerns), I wanted to handle the file uploads the same way as the other calls to the document processing verticle (in the end, the upload is also part of the "API"). See the "component architecture" below:

                                                            +----------+   user
                                                            | user     |   directory
                                               +--------->  | service  |  +--------->
 +----------------+                            |            | verticle |
 |  +----------+  |                            |            +----------+
 |  | API      |+----------------------------->|
 |  +----------+  |    dispatch API calls      |
 |                |    using Web API service   |            +----------+
 |    Web         |                            |            | doc      |
 |    server      |                            +--------->  | service  |  +--------->
 |    verticle    |                                         | verticle |   database/
 |                |                                         +----------+   file storage
 |  +----------+  |
 |  | "static" |  |
 |  | files    |  |
 |  +----------+  |
 +----------------+

However, it looks like I have to handle the uploads separately from the other API calls. What would you recommend? Another HTTP-based API between the Web server and the doc processing verticle? Otherwise, I could have the Web server store the file in an object store and hand over the reference through the event bus to the doc verticle, where the file is retrieved again for processing (though this seems to be a bit of overkill). Last but not least, I will nevertheless consider pointing the upload directory of the Web server to some shared storage (GlusterFS, Ceph?). Then I could just hand over the filename through the event bus and access the file directly from the doc verticle...

Thanks again for your input...
Marcial



MUNGAI NJOROGE

Jun 11, 2019, 5:21:03 PM
to vert.x
Hi, I had the same issue and solved it by streaming files to IPFS or Amazon object storage. Once I have the reference to the uploaded file, I save it to the files database and pass the upload reference to the destination address.

Note that this does not block the event bus, and all uploads are handled by the first HTTP handler that receives the request.

Passing a file object via the event bus sounds weird, as it is just a file pointer.
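The store-then-pass-a-reference pattern can be sketched roughly as follows. This is not IPFS or any Amazon API: a plain directory stands in for the object store, and the SHA-256 of the content is used as the reference, loosely mimicking content addressing. The class `ReferenceStore` and its methods are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical stand-in for an object store: a directory keyed by content hash.
final class ReferenceStore {
    private final Path root;

    ReferenceStore(Path root) {
        this.root = root;
    }

    // Streams the upload to disk and returns its hex SHA-256 as the reference.
    String put(InputStream upload) throws IOException {
        MessageDigest sha256 = newSha256();
        Path partial = Files.createTempFile(root, "upload-", ".part");
        try (InputStream in = new DigestInputStream(upload, sha256)) {
            Files.copy(in, partial, StandardCopyOption.REPLACE_EXISTING);
        }
        String reference = toHex(sha256.digest());
        Files.move(partial, root.resolve(reference), StandardCopyOption.REPLACE_EXISTING);
        return reference;
    }

    // Called on the receiving side once the reference has arrived as a message.
    byte[] get(String reference) throws IOException {
        if (!reference.matches("[0-9a-f]{64}")) {
            throw new IllegalArgumentException("not a valid reference: " + reference);
        }
        return Files.readAllBytes(root.resolve(reference));
    }

    private static MessageDigest newSha256() {
        try {
            return MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }

    private static String toHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }
}
```

With this shape, the message sent to the destination address carries only the 64-character reference, never the file content itself.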

John.

Julien Viet

Jun 13, 2019, 4:05:22 PM
to ve...@googlegroups.com
Hi,

It looks like an interesting pattern; perhaps you could blog about it?

Julien
