>>>>> Ivan Shmakov <iv...@siamics.net> writes:
> It took me about a day to write a crude but apparently (more or less)
> working HTTP to HTTPS proxy. (That I hope to beat into shape and
> release via news:alt.sources around next Wednesday or so. FTR, the
> code is currently under 600 LoC long, or 431 LoC excluding comments
> and empty lines.) Some design notes are below.
It took much longer (of course), and the code has by now expanded
about threefold. The HTTP/1 support is much improved, however;
for instance, request bodies and chunked coding should now be
fully supported. Moreover, the relevant code was split off into
a separate HTTP1::MessageStream push-mode parser module (about
a third of the overall code at present), so that it can be used
in other applications.
The no-https.perl code proper still needs some clean-up after
all the modifications it got.
The command-line interface is about as follows. (Not all the
options are as of yet thoroughly tested, though.)
Usage:
    $ no-https
          [-d|--[no-]debug] [--listen=BIND|-l BIND] [--mangle=MANGLE]
          [--connect=COMMAND] [--ssl-connect=COMMAND]
    $ no-https {-h|--help}
BIND is either [HOST:]PORT or, if it includes a /, a file name for a
Unix socket to create and listen on. The default is 8080.
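The BIND rule above can be sketched roughly as follows. This is a Python illustration, not the Perl code of no-https.perl; the function name parse_bind and the tuple layout are hypothetical, and what an omitted HOST binds to is left open (None here).

```python
def parse_bind(bind="8080"):
    """Classify a BIND value as a Unix socket path or a TCP host/port."""
    if "/" in bind:
        # A slash marks a file name for a Unix socket.
        return ("unix", bind)
    # Split on the last colon; with no colon, sep is empty and only
    # the port was given.
    host, sep, port = bind.rpartition(":")
    return ("tcp", host if sep else None, int(port))
```

For instance, parse_bind("localhost:3128") gives a TCP triple, while parse_bind("/tmp/no-https.sock") gives a Unix socket pair.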
COMMAND will have %%, %h, %p replaced with a literal %, target host
and TCP port, respectively. Also, %s and %t are replaced respectively
with a space and a TAB.
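A minimal sketch of the COMMAND substitution, in Python for illustration only (the actual code is Perl and may differ). The name expand_command is hypothetical, and leaving unknown %-sequences untouched is an assumption, not something stated above.

```python
import re

def expand_command(template, host, port):
    """Expand %%, %h, %p, %s and %t in a COMMAND template."""
    table = {"%": "%", "h": host, "p": str(port), "s": " ", "t": "\t"}
    # A single left-to-right pass, so that "%%h" yields a literal "%h"
    # rather than the host name.
    return re.sub(r"%([%hpst])", lambda m: table[m.group(1)], template)
```

With the socat example from this post, expand_command("socat STDIO SOCKS4:localhost:%h:%p,socksport=9050", "example.com", 443) puts example.com and 443 in place of %h and %p.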
MANGLE can be minimal, header, or the name of an App::NoHTTPS::Mangle::
package to require and use. If not specified, default is tried
first, falling back to the internally implemented header.
The --connect= and --ssl-connect= options should make it possible
to use a parent proxy, including a SOCKS one such as that
provided by Tor, like: --connect="socat STDIO
SOCKS4:localhost:%h:%p,socksport=9050". For --ssl-connect=,
a tsocks(1)-wrapped gnutls-cli(1) may be an option.
> Basics
> The basic algorithm is as follows:
> 1. receive a request header from the client; we only allow GET and
> HEAD requests for now, as we do not support request /bodies/ as of yet;
RFC 7230 section 3.3 actually provides simple criteria for
determining whether the request has a body:
The presence of a message body in a request is signaled by a
Content-Length or Transfer-Encoding header field. Request message
framing is independent of method semantics, even if the method does
not define any use for a message body.
As such, and given that message passing was "symmetrized," any
request method except CONNECT is now allowed by the code.
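The criterion quoted above boils down to a check like the one below. This is a Python sketch rather than the Perl code; the function names and the list-of-pairs header representation are assumptions made for the example.

```python
def request_has_body(header_fields):
    """True if a Content-Length or Transfer-Encoding field is present."""
    # Field names are case-insensitive, so compare in lower case.
    names = {name.strip().lower() for name, _value in header_fields}
    return "content-length" in names or "transfer-encoding" in names

def method_allowed(method):
    """Any request method is accepted except CONNECT."""
    # Method names are case-sensitive, hence no case folding here.
    return method != "CONNECT"
```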
> 2. decide the server and connect there;
> 3. send the header to the server;
Preceded by the request line, obviously. (It was considered
a part of the header in the original version of the code.)
> 4. receive the response header;
(Same here, for the status line.)
We also pass any number of "100 Continue" messages here from
server to client before the "payload" response.
> 5. if that's an https: redirect:
> 5.1. connect over TLS, alter the request (Host:, "request target")
> accordingly, go to step 3;
A Host: header is prepended to the request header if the
original has none.
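Step 5.1 together with the Host: handling could look roughly like this in Python (again not the Perl code). The name retarget_request, the origin-form request target and the default port of 443 are assumptions of the sketch.

```python
from urllib.parse import urlsplit

def retarget_request(header_fields, location):
    """Return (host, port, request_target, header_fields) for an https: URI."""
    url = urlsplit(location)
    target = url.path or "/"
    if url.query:
        target += "?" + url.query
    host_value = url.hostname if url.port is None else f"{url.hostname}:{url.port}"
    out, seen = [], False
    for name, value in header_fields:
        if name.lower() == "host":
            # Alter an existing Host: field in place.
            out.append((name, host_value))
            seen = True
        else:
            out.append((name, value))
    if not seen:
        # No Host: field in the original request: prepend one.
        out.insert(0, ("Host", host_value))
    return url.hostname, url.port or 443, target, out
```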
> 6. strip certain headers (such as Strict-Transport-Security: and
> Upgrade:, but also Set-Cookie:) off the response and send the result
> to the client;
Both the decision whether to "eat up" the redirect and how to
alter the header and body of the messages (requests and responses
alike) are left to the "mangler" object. The object is expected to
implement the following methods.
    $ma->message_mangler (PARSER, URI)
        Return a new mangler object for the given HTTP1::MessageStream
        parser state (either request or response) and request URI.
        Alternatively, return a URI of the resource to transparently
        request instead of the given one.
        Return undef if this mangler has nothing to do with the
        given parser state and URI.
    $ma->parser ([PARSER]), $ma->uri ([URI]),
    $ma->start_line ([START-LINE]), $ma->header ([HEADER])
        Get or set the HTTP1::MessageStream object, URI, HTTP/1
        start line and HTTP/1 header, respectively, associated with
        the particular request.
    $ma->chunked_p ()
        Return a true value if the body is to be transmitted
        to the remote using chunked coding. (The associated header
        is set up accordingly.)
    $ma->get_mangled_body_part ()
        Return the next part of the (possibly modified) HTTP/1
        message body. This will typically involve a call to the
        parser object to interpret the portion of the message
        currently in its own buffer.
There're currently two such classes implemented: "minimal" and
"header," and I believe that the above interface can be used to
implement rather arbitrary HTTP message filters.
The "minimal" class removes Upgrade and Proxy-Connection headers
from the messages (requests and responses alike) and causes the
calling code to transparently replace all the https: redirects
with requested resources.
The "header" class also filters Strict-Transport-Security and
Set-Cookie off the responses. (Although the former should have
no effect anyway.)
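The header filtering done by the two classes amounts to something like the following Python sketch (illustrative only; the Perl classes are organized differently, and the names here are made up for the example).

```python
# Dropped from requests and responses alike by both classes.
MINIMAL_DROP = frozenset({"upgrade", "proxy-connection"})
# Additionally dropped from responses by the "header" class.
HEADER_RESPONSE_DROP = MINIMAL_DROP | {"strict-transport-security", "set-cookie"}

def filter_header(header_fields, mangler="header", is_response=True):
    """Return the header fields that survive the given mangler."""
    if mangler == "header" and is_response:
        drop = HEADER_RESPONSE_DROP
    else:
        drop = MINIMAL_DROP
    return [(n, v) for n, v in header_fields if n.lower() not in drop]
```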
There's a minor issue with the handling of https: redirects.
When
http://example.com/ redirects to
https://example.com/foo/bar,
for instance, the links in the latter document will become
relative to the former URI (unless the 'base' URI is explicitly
given in the document); thus <a href="baz" /> will point to
/baz -- instead of the intended /foo/baz. A likely solution
is to only eat up http:SAME to https:SAME redirects, and to rewrite
an http:SOME to https:OTHER redirect to point to http:OTHER instead
(which will then likely result in a redirect to https:OTHER, in turn
eaten up by the mangler).
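The proposed (not yet implemented) rule could be sketched as below, in Python for illustration. The function name and the "pass" / "follow" / "rewrite" labels are hypothetical, and URIs are compared textually, with no normalization of default ports or empty paths.

```python
from urllib.parse import urlsplit, urlunsplit

def handle_redirect(request_uri, location):
    """Decide what to do with a redirect from request_uri to location."""
    req, loc = urlsplit(request_uri), urlsplit(location)
    if loc.scheme != "https":
        # Not an https: redirect; hand it to the client unchanged.
        return ("pass", location)
    if req.scheme == "http" and req[1:] == loc[1:]:
        # http:SAME to https:SAME: fetch over TLS transparently.
        return ("follow", location)
    # http:SOME to https:OTHER: point the client to http:OTHER instead.
    return ("rewrite", urlunsplit(("http",) + tuple(loc[1:])))
```

With the example above, the redirect from http://example.com/ to https://example.com/foo/bar is rewritten to http://example.com/foo/bar, so the client's base URI matches the document it ends up with.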
> 7. copy up to Content-Length: octets from the server to the client --
> or all the remaining data if no Content-Length: is given; (somewhat
> surprisingly, this seems to also work with the "chunked" coding not
> otherwise considered in the code);
Both the chunked coding and client-to-server body passing should
now be supported (although POST requests remain untested).
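For reference, decoding the chunked coding is roughly as follows. This Python sketch takes a complete body in one buffer and ignores trailer fields, unlike a push-mode parser, which has to cope with data arriving piecemeal; decode_chunked is a hypothetical name.

```python
def decode_chunked(data):
    """Decode a complete chunked-coded body given as bytes."""
    body, pos = bytearray(), 0
    while True:
        eol = data.index(b"\r\n", pos)
        # The chunk size is hexadecimal; anything after ";" is a
        # chunk extension and is ignored here.
        size = int(data[pos:eol].split(b";", 1)[0].strip(), 16)
        pos = eol + 2
        if size == 0:
            # The last chunk; trailer fields (if any) are not parsed.
            return bytes(body)
        body += data[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
```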
> 8. close the connection to the server and repeat from step 1 so long
> as the client connection remains active.
[...]
--
FSF associate member #7257
http://am-1.org/~ivan/