Missing request-post-data/raw (from web-server/http)


Philip McGrath

Jun 29, 2017, 9:08:28 PM
to Racket Users
I'm working on a Racket web application for which I need to proxy certain requests to a non-Racket service over HTTP. I've built a very basic proxy on top of http-sendrecv/url that works quite well for the most part. 

For POST requests, I pass the request-post-data/raw of the original request as the #:data argument of http-sendrecv/url.
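A minimal sketch of the kind of forwarding described above, not the actual proxy from this thread. The names `backend-base`, `target-url`, and `forward` and the backend address are hypothetical, and only the Content-Type header is passed along:

```racket
#lang racket/base
(require net/url
         web-server/http)

;; Hypothetical address of the non-Racket service being proxied to.
(define backend-base (string->url "http://localhost:8888/"))

;; target-url : url? -> url?
;; Re-root the path and query of the incoming request URI on the backend.
(define (target-url uri)
  (combine-url/relative backend-base (url->string uri)))

;; forward : request? -> (values bytes? (listof bytes?) input-port?)
;; Sends the request on, passing request-post-data/raw as #:data.
;; For multipart POSTs that value is #f, so no body is forwarded.
(define (forward req)
  (define ct (headers-assq* #"Content-Type" (request-headers/raw req)))
  (http-sendrecv/url (target-url (request-uri req))
                     #:method (request-method req)
                     #:headers (if ct
                                   (list (bytes-append #"Content-Type: "
                                                       (header-value ct)))
                                   '())
                     #:data (request-post-data/raw req)))
```

`forward` returns the three values that `http-sendrecv/url` produces: the status line, the response headers, and an input port for the response body.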

However, I've discovered that certain POST requests (specifically involving file uploads) are not working as expected. On these requests, Chrome reports that it is performing a request with a header Content-Type:multipart/form-data; boundary=----WebKitFormBoundaryAJOgATwBujJhhtbY and a payload as follows:
------WebKitFormBoundaryAJOgATwBujJhhtbY
Content-Disposition: form-data; name="tool"
corpus.CorpusCreator
------WebKitFormBoundaryAJOgATwBujJhhtbY
Content-Disposition: form-data; name="palette"
default
------WebKitFormBoundaryAJOgATwBujJhhtbY
Content-Disposition: form-data; name="textarea-1014-inputEl"
Type in one or more URLs on separate lines or paste in a full text.
------WebKitFormBoundaryAJOgATwBujJhhtbY
Content-Disposition: form-data; name="upload"; filename="tmp-file.txt"
Content-Type: text/plain
------WebKitFormBoundaryAJOgATwBujJhhtbY--

However, at the Racket level, request-post-data/raw returns #f for these requests — but, adding to my confusion, the bindings still show up in request-bindings/raw.

Why doesn't this content show up in request-post-data/raw? Is there a way to access the raw, original data for these requests, or do I need to somehow reconstruct it from the bindings?

Thanks very much,
Philip

Neil Van Dyke

Jun 29, 2017, 10:44:38 PM
to Philip McGrath, Racket Users
I don't know the answer to your particular questions with `web-server`
(I've made my own implementations of this in the past), and these
comments might not apply to your particular application, but I'll
mention them here for whoever is interested...

It sounds like you're using this, which might preempt your question:

> post-data/raw : (or/c false/c bytes?)

Does your application permit a large file upload (an uploaded DVD-ROM
".iso" file, like for a Linux distro install disc 1, is typically a few
gigabytes, and video files can also get huge), and is your program
(including libraries it uses) going to try to allocate gigabytes at a
time just for one HTTP request?

If the `POST` data is potentially huge, you might want to think about
doing stream reading of it (i.e., not sucking it all into memory before
you do something with it), and sending blocks out your proxy
approximately as soon as they come in (without buffering too much).
That can make your program more robust, lower latency, and maybe even
improve overall speed.
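The block-at-a-time idea can be sketched as below. This is only an illustration under assumptions: `copy-port/limit` is a hypothetical helper, and it presumes you already hold a raw input port for the request body and an output port to the upstream service, which the `web-server` request struct does not hand you directly.

```racket
#lang racket/base

;; copy-port/limit : input-port? output-port? exact-nonnegative-integer?
;;                   -> exact-nonnegative-integer?
;; Copies IN to OUT one block at a time, flushing after each block,
;; and raises an error as soon as more than LIMIT bytes have been seen.
;; Returns the number of bytes copied.
(define (copy-port/limit in out limit #:block-size [block-size 4096])
  (define buf (make-bytes block-size))
  (let loop ([total 0])
    (define n (read-bytes-avail! buf in))
    (cond
      [(eof-object? n) total]
      [else
       (define new-total (+ total n))
       (when (> new-total limit)
         (error 'copy-port/limit "body exceeds ~a bytes" limit))
       (write-bytes buf out 0 n)
       (flush-output out)
       (loop new-total)])))
```

Only one block is held in memory at a time, so an oversized upload is rejected after reading at most `limit` bytes plus one block.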

Or, if you want to keep getting a convenient byte string out of the MIME
parser, and you plan to reject huge `POST` data before it
accidentally or intentionally DoS's your server, that rejection will
probably have to happen either as the HTTP request is being read or in
the MIME multipart parser. (A `POST` isn't always MIME multipart, and
this assumes the HTTP code hands off a pretty raw input port to the
multipart parsing code, which it should.) This is because you can't
assume that HTTP or part headers will tell you the content size before
you read the content: sometimes you have to read to find the EOF or
the MIME boundary string kludge.

I think streaming algorithms are usually the way to go for potentially
huge data. (Well, until you then get into what I'll call "poetic
license" situations, in which you know how to do it in streaming, and
you know why you don't have to stream in this case.)

Philip McGrath

Jun 29, 2017, 11:07:22 PM
to Neil Van Dyke, Racket Users
Thanks for your comments. 

The only legal files to upload in this case are plain text, so I'm not too worried about size. I'm relying on the web-server libraries to deal with any malicious attempts to send overwhelmingly large files (if that's a bad idea, I'd definitely appreciate hearing it!). Other parts of the application are implemented in #lang web-server, including some access control logic surrounding the requests that are proxied to the external service.

With other requests, the post-data/raw field of the request struct has been #f only when the method field is #"GET". With POST requests, it has otherwise contained (and I thought it always would contain) the raw POST data, e.g. #"corpus=austen&tool=corpus.CorpusMetadata". I thought the bindings from the bindings/raw-promise field were simply an abstraction over the post-data/raw (and/or the query part of the uri field), which is why I'm confused that this POST request has bindings but has #f for its post-data/raw.

-Philip



--
You received this message because you are subscribed to the Google Groups "Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to racket-users+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Jay McCarthy

Jun 30, 2017, 9:20:38 AM
to Philip McGrath, Racket Users
Hi Philip,

I don't necessarily know the answer and it's possible that it is an
error. I'll explain what it is doing and maybe that will help us move
forward.

1) The request-bindings/raw is just an abstraction over
request-post-data/raw (and the URI).
2) The request-post-data/raw is always #f for GETs. Are you sure they are POSTs?
3) POSTs with multipart form data are converted into
request-bindings, and the raw, un-parsed data is not made available.
4) If there's no Content-Length header, then the data is not exposed,
even if there is some.

I think that your problem may be (3). It sounds like you expect to see
a copy of the raw data of the request all the time even if it has been
parsed. (The logic of the current behavior is that at the
"application" level there is no POST data, but there is only form
data, but because of "transport" level constraints on the length of
URIs it had to be sent in the data part of the transport layer.)

Jay



--
-=[ Jay McCarthy http://jeapostrophe.github.io ]=-
-=[ Associate Professor PLT @ CS @ UMass Lowell ]=-
-=[ Moses 1:33: And worlds without number have I created; ]=-

Philip McGrath

Jun 30, 2017, 10:02:50 AM
to Jay McCarthy, Racket Users
Thanks, Jay. It is definitely POST, and there is a Content-Length header, so it seems like the problem is indeed #3. I was expecting the raw data to be there even if it had been parsed — I believe the POST data of #"corpus=austen&tool=corpus.CorpusMetadata" was also parsed into bindings (though not from multipart, obviously). 

So it sounds like what I'll need to do is detect when this situation is happening — I guess that would be when the method is POST, the request-post-data/raw is #f, and there are some bindings — and convert the bindings back into multipart form data to give to http-sendrecv/url.
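A rough sketch of that conversion, under stated assumptions: `binding->part` and `bindings->multipart` are hypothetical helper names, the caller chooses a boundary that does not occur in any content, names and filenames need no quoting or escaping, and only the Content-Type header of a file part is carried over.

```racket
#lang racket/base
(require web-server/http)

(define crlf #"\r\n")

;; binding->part : binding? bytes? -> bytes?
;; Serializes one binding as a single multipart/form-data part.
(define (binding->part b boundary)
  (define-values (disposition-extra extra-headers content)
    (cond
      [(binding:file? b)
       ;; Keep only the part's Content-Type header, if it has one.
       (define ct (headers-assq* #"Content-Type" (binding:file-headers b)))
       (values (bytes-append #"; filename=\"" (binding:file-filename b) #"\"")
               (if ct
                   (bytes-append #"Content-Type: " (header-value ct) crlf)
                   #"")
               (binding:file-content b))]
      [else (values #"" #"" (binding:form-value b))]))
  (bytes-append #"--" boundary crlf
                #"Content-Disposition: form-data; name=\"" (binding-id b) #"\""
                disposition-extra crlf
                extra-headers
                crlf
                content crlf))

;; bindings->multipart : (listof binding?) bytes? -> bytes?
;; Joins all parts and appends the closing boundary line.
(define (bindings->multipart bindings boundary)
  (bytes-append
   (apply bytes-append
          (for/list ([b (in-list bindings)])
            (binding->part b boundary)))
   #"--" boundary #"--" crlf))
```

The result of `(bindings->multipart (request-bindings/raw req) boundary)` could then be passed as the #:data argument of http-sendrecv/url, together with a `Content-Type: multipart/form-data; boundary=...` header naming the same boundary. Bindings that came from the URI query string would also appear in that list, so a real proxy may need to filter them out first.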

-Philip




