Questions about transparent proxy example


Patrick Mézard

Jan 1, 2015, 10:39:43 AM
to gopro...@googlegroups.com
Hello,

I am using goproxy as a transparent HTTP/HTTPS proxy for ad stripping using adblock rules. I have a working version hacked from the transparent proxy example, but a couple of things are unclear to me in:

https://github.com/elazarl/goproxy/blob/master/examples/transparent/transparent.go

My understanding is:
- I can register request/response handlers for regular HTTP transparent proxying
- HTTPS is handled by a dedicated server, wrapping incoming connections in CONNECT and redirecting them to goproxy. Requests made on these connections will be handled by the handlers defined for HTTP.
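
To illustrate the second point, here is a minimal sketch of how a TLS listener could describe an intercepted connection as a CONNECT request before handing it to the proxy. The helper name `syntheticConnect` and the fixed port 443 are assumptions made for illustration and are not part of goproxy; in practice the server name would come from the SNI extension of the client's TLS handshake.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"net/url"
)

// syntheticConnect builds the CONNECT request that a regular proxy client
// would have sent for sniHost. Hypothetical helper, not part of goproxy.
func syntheticConnect(sniHost string) *http.Request {
	hostPort := net.JoinHostPort(sniHost, "443") // assumed HTTPS port
	return &http.Request{
		Method: http.MethodConnect,
		URL:    &url.URL{Host: hostPort},
		Host:   hostPort,
		Header: make(http.Header),
	}
}

func main() {
	req := syntheticConnect("example.com")
	fmt.Println(req.Method, req.URL.Host) // prints "CONNECT example.com:443"
}
```

Once the proxy receives such a request, it can treat the intercepted connection the same way as one opened by a client that was explicitly configured to use a proxy.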

What I am missing is:
1- Why the need for NonproxyHandler? It seems to be used for requests with relative URLs. Why do these have to be handled separately?

2- What is the use for HijackConnect machinery defined in:

https://github.com/elazarl/goproxy/blob/master/examples/transparent/transparent.go#L47

It seems to directly forward CONNECT requests made on :80, bypassing the request/response handlers. Is that correct?

3- Is the ReqHostMatches() in:

https://github.com/elazarl/goproxy/blob/master/examples/transparent/transparent.go#L45

really necessary?

Wouldn't:

proxy.OnRequest().HandleConnect(goproxy.AlwaysMitm)

be good enough?

Thanks for any hints.
--
Patrick Mézard

Matthew Zimmerman

Jan 2, 2015, 10:57:26 AM
to Patrick Mézard, gopro...@googlegroups.com
I wrote this example, and it was my first use of goproxy. I modeled it heavily on the eavesdropper example and hacked on it until it worked for my purposes. Your analysis is probably correct on all points, but I really haven't thought about it. I find the goproxy API a little confusing, but I don't have any good suggestions on how to address that either. I've been working on https://github.com/mzimmerman/whitelistproxy lately. Once I got the goproxy internals working, I haven't had to look at them much.

I'd like to clean up the logging, though, so that an after-the-fact log analysis shows more clearly what happened and when.


Elazar Leibovich

Feb 12, 2015, 4:42:20 AM
to Patrick Mézard, gopro...@googlegroups.com
See comments inline
 
On Thu Jan 01 2015 at 5:39:43 PM Patrick Mézard <pat...@mezard.eu> wrote:
Hello,

What I am missing is:
1- Why the need for NonproxyHandler? It seems to be used for requests with relative URLs. Why do these have to be handled separately?

By default, a goproxy proxy behaves like a regular proxy. That is, it'll ignore non-proxy requests sent directly to it.

This is a good default, but if you want to handle direct requests differently, you have the NonproxyHandler.
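
As a rough illustration of what such a handler has to do in the transparent case: an intercepted client sends an origin-form request ("GET /path" plus a Host header), while a proxy expects an absolute URL. The sketch below uses only the standard library; `toProxyForm` and `parseRaw` are hypothetical names, and the "http" scheme is an assumption that holds only for plain HTTP traffic. A handler installed as NonproxyHandler could apply this rewrite and then hand the request back to the proxy.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"net/http"
	"strings"
)

// toProxyForm rewrites an origin-form request into the absolute form a
// forward proxy expects, using the Host header as the destination.
func toProxyForm(req *http.Request) error {
	if req.Host == "" {
		return errors.New("request has no Host header (e.g. HTTP/1.0)")
	}
	req.URL.Scheme = "http" // assumption: plain HTTP traffic only
	req.URL.Host = req.Host
	return nil
}

// parseRaw parses a raw request as a transparently intercepted client sends it.
func parseRaw(raw string) (*http.Request, error) {
	return http.ReadRequest(bufio.NewReader(strings.NewReader(raw)))
}

func main() {
	req, err := parseRaw("GET /ads.js HTTP/1.1\r\nHost: example.com\r\n\r\n")
	if err != nil {
		panic(err)
	}
	fmt.Println("absolute before:", req.URL.IsAbs())
	if err := toProxyForm(req); err != nil {
		panic(err)
	}
	fmt.Println("absolute after:", req.URL.IsAbs(), req.URL.String())
}
```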
 

2- What is the use for HijackConnect machinery defined in:

  https://github.com/elazarl/goproxy/blob/master/examples/transparent/transparent.go#L47

It seems to directly forward CONNECT requests made on :80, bypassing the request/response handlers. Is that correct?

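IIRC this is for requests done with "curl -p -x proxy:8080", that is, regular HTTP requests, done by issuing a CONNECT to the proxy.

3- Is the ReqHostMatches() in:

  https://github.com/elazarl/goproxy/blob/master/examples/transparent/transparent.go#L45

really necessary?

No, it isn't. This is just a convenience function. In retrospect, I would've put all these convenience functions in a separate, less stable package.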

Alas, I don't want to break backwards compatibility for this now.

I already thought to beautify the API with goproxy v2 (e.g., goproxy.New() instead of the NewHTTPProxy), but I'm not sure it's worth the trouble.

Thanks for using goproxy,

Sorry for the time it took me to respond. If you have any other questions, do not hesitate to ask.

Patrick Mézard

Feb 14, 2015, 6:14:24 AM
to Elazar Leibovich, gopro...@googlegroups.com
Hello,

Sorry for the late feedback, I should have answered Matthew Zimmerman much earlier. In the meantime I published the proxy I was working on here, if anyone is interested:

https://github.com/pmezard/adblock

the proxy bits being around here:

https://github.com/pmezard/adblock/blob/master/adstop/adstop.go#L186

Nothing really interesting, this is a partial copy-paste of the example discussed below. The HTTP part works very well; the HTTPS one works but has random failures and is fairly slow. I run it on an old low-powered ARM device, and I do not know whether the encryption itself burns all the CPU or whether the certificate generation is a problem too. I still have to profile it.

On 12/02/15 10:42, Elazar Leibovich wrote:
> On Thu Jan 01 2015 at 5:39:43 PM Patrick Mézard <pat...@mezard.eu> wrote:
> 2- What is the use for HijackConnect machinery defined in:
>
> https://github.com/elazarl/goproxy/blob/master/examples/transparent/transparent.go#L47
>
> It seems to directly forward CONNECT requests made on :80, bypassing
> the request/response handlers. Is that correct?
>
> IIRC this is for requests done with "curl -p -x proxy:8080", that is,
> regular HTTP requests, done by issuing a CONNECT to the proxy.

Ah proxy tunneling, I see.
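
For readers unfamiliar with tunneling, the mechanism can be sketched with the standard library alone: the proxy reads a CONNECT request, dials the requested target, answers 200, and from then on copies bytes blindly in both directions, which is why request/response handlers never see the traffic. This is not goproxy's implementation; `tunnel`, `serve`, `startEcho`, `startProxy` and `viaTunnel` are hypothetical names, and a local echo server stands in for the real destination.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net"
	"net/http"
)

// serve accepts connections on a fresh loopback port and runs handle on each.
func serve(handle func(net.Conn)) string {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	go func() {
		for {
			c, err := ln.Accept()
			if err != nil {
				return
			}
			go handle(c)
		}
	}()
	return ln.Addr().String()
}

// startEcho stands in for the real destination: it echoes what it receives.
func startEcho() string {
	return serve(func(c net.Conn) { defer c.Close(); io.Copy(c, c) })
}

// startProxy runs a proxy that only understands CONNECT.
func startProxy() string { return serve(tunnel) }

// tunnel handles one CONNECT request and then acts as a plain byte pipe.
func tunnel(client net.Conn) {
	defer client.Close()
	br := bufio.NewReader(client)
	req, err := http.ReadRequest(br)
	if err != nil || req.Method != http.MethodConnect {
		fmt.Fprint(client, "HTTP/1.1 400 Bad Request\r\n\r\n")
		return
	}
	remote, err := net.Dial("tcp", req.Host)
	if err != nil {
		fmt.Fprint(client, "HTTP/1.1 502 Bad Gateway\r\n\r\n")
		return
	}
	defer remote.Close()
	fmt.Fprint(client, "HTTP/1.1 200 OK\r\n\r\n")
	go func() {
		io.Copy(remote, br) // client -> target; br may hold buffered bytes
		if tc, ok := remote.(*net.TCPConn); ok {
			tc.CloseWrite() // tell the target the client is done sending
		}
	}()
	io.Copy(client, remote) // target -> client
}

// viaTunnel plays the client: CONNECT through the proxy, send msg, read the echo.
func viaTunnel(proxyAddr, target, msg string) (string, error) {
	conn, err := net.Dial("tcp", proxyAddr)
	if err != nil {
		return "", err
	}
	defer conn.Close()
	fmt.Fprintf(conn, "CONNECT %s HTTP/1.1\r\nHost: %s\r\n\r\n", target, target)
	br := bufio.NewReader(conn)
	resp, err := http.ReadResponse(br, &http.Request{Method: http.MethodConnect})
	if err != nil {
		return "", err
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("proxy refused: %s", resp.Status)
	}
	fmt.Fprint(conn, msg)
	buf := make([]byte, len(msg))
	if _, err := io.ReadFull(br, buf); err != nil {
		return "", err
	}
	return string(buf), nil
}

func main() {
	reply, err := viaTunnel(startProxy(), startEcho(), "ping")
	fmt.Println(reply, err)
}
```

With curl, the -p (--proxytunnel) flag requests this behaviour even for plain HTTP URLs, which is the case the example's :80 hijacker appears to target.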

> 3- Is the ReqHostMatches() in:
>
> https://github.com/elazarl/goproxy/blob/master/examples/transparent/transparent.go#L45
>
> really necessary?
>
> No, it isn't. This is just a convenience function. In retrospect, I
> would've put all these convenience functions in a separate, less
> stable package.
>
> Alas, I don't want to break backwards compatibility for this now.
>
> I already thought to beautify the API with goproxy v2 (e.g.,
> goproxy.New() instead of the NewHTTPProxy), but I'm not sure it's
> worth the trouble.

Everything is much clearer now. The transparent example should probably be split: it does too many things, some of them unrelated to the "transparent" part. The GitHub code I linked above is really a transparent proxy; I did not need the regular proxying part and had no idea the hijacking part was about tunneling. It took me some time to figure out what was useful and what was not. Plus, the transparent bit was mixed with the TLS part, which I needed too. What do you think of splitting the example into:
- transparent proxy: regular proxy + NonproxyHandler
- HTTPS handling: regular proxy + the TLS unwrapping bits
- proxy tunneling: regular proxy + hijacker

I have not looked at the on-the-fly certificate generation part yet, but is there any caching? Do you feel it could be useful in my case? (and yes I should profile).

Thank you for goproxy.
--
Patrick Mézard

Elazar Leibovich

Feb 14, 2015, 11:26:51 AM
to Patrick Mézard, gopro...@googlegroups.com

Thanks!
Feedback is much appreciated.

Certificate generation is not cached, and might be the culprit.

You can override it though.

I'll be happy to see a pull request to clarify the example if you have time.

Problems in HTTPS are important. I'll be happy to hear more about it as well.

Thanks!

Patrick Mézard

Feb 14, 2015, 12:54:55 PM
to Elazar Leibovich, gopro...@googlegroups.com
On 14/02/15 17:26, Elazar Leibovich wrote:
> Thanks!
> Feedback is much appreciated.
>
> Certificate generation is not cached, and might be the culprit.
>
> You can override it though.

OK, I will try this.

> I'll be happy to see a pull request to clarify the example if you have time.

I will try to find some time. Taking a quick look, here are the first things I am tempted to do:
- Add a section linking to the examples in the main README.md
- Unify examples naming:
* There is camel-case and pure lower-case and lower-case with dashes. I tend to prefer pure lower-case without separator for real utilities but dashes might be better for examples: noRedditAtWorktime => no-reddit-at-worktime
* Something I find really annoying with example binaries in Go libraries is that they have no "namespace" and tend to overwrite each other when you run "go install ./..." on a ton of libraries, sometimes causing the build to fail (just "find src -name example.go" ...). At work, I often remove the examples from the vendored libraries because of this. What do you think of prefixing all examples with "goproxy-", as in "goproxy-no-reddit-at-worktime"?
- Add a README.md in every example directory and mention its existence in the related code file

Then we can talk about reorganizing things.

> Problems in HTTPS are important. I'll be happy to hear more about it as well.

I will check my logs once I have tried to improve the certificate part. Taking a fresh look, some of the issues are related to certificate pinning by specific applications like iTunes, so that part works as expected. I used to get random EOF or unexpected SNI errors though.
--
Patrick Mézard

Patrick Mézard

Apr 19, 2015, 1:51:01 PM
to Elazar Leibovich, gopro...@googlegroups.com
On 14/02/15 17:26, Elazar Leibovich wrote:
> Thanks!
> Feedback is much appreciated.
>
> Certificate generation is not cached, and might be the culprit.

That was it. The fix:
- Overrides the CONNECT action to cache the generated tls.Config based on the requested host
- Extends the previous fix to handle concurrent generations for the same host. If you visit https://example.com you easily end up loading several HTTPS resources from the same domain at the same time. A bit more logic is required to generate only one config.

It helps a lot, tls.Config generation was killing the box.

Code is here if someone is interested:

https://github.com/pmezard/adblock/blob/master/adstop/adstop.go#L197

Maybe I can improve that by generating only for domain/*.domain to avoid per-subdomain configurations. I don't know how hard it would be.

--
Patrick Mézard