Web traffic statistics gathering with goproxy


Tong Sun

Nov 18, 2014, 9:24:20 AM
to gopro...@googlegroups.com
Hi, 


A possible use case that suits goproxy but not Fiddler is gathering statistics on page load times for a certain website over a week. With goproxy you could ask all your users to set their proxy to a dedicated machine running a goproxy server.

I'd very much like to see an example of it. I gave some thought to the implementation, and I think the best approach is not to log to a DB each time a request is made. Instead, it would be better for goproxy to write its own rotating log and provide a tool to dump the log content.

Some more design considerations. If we go this way, please implement an API so that the goproxy command line can tell the goproxy server to wrap up the existing log and start a new one. That way, when we are focused on something, we can start a new log when we begin and another when we finish, so what's left in the middle is exactly what we want. Moreover, since different people (or different cases) might need different levels of logging detail, it would be better to have a mechanism to customize how much detail goes into the log.

This way, httpdump (https://github.com/elazarl/goproxy/blob/master/examples/httpdump/httpdump.go) could be rewritten to dump from the log instead. This gives us finer control over what to dump and when to dump it.

Does it make sense to you as well? 

Thanks

Tong

Elazar Leibovich

Nov 18, 2014, 9:57:32 AM
to Tong Sun, gopro...@googlegroups.com
Makes sense. But don't you think it should be a separate project?

Are you interested in implementing it?

BTW, what's wrong with writing to a database?

--
You received this message because you are subscribed to the Google Groups "goproxy-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to goproxy-dev...@googlegroups.com.
To post to this group, send email to gopro...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/goproxy-dev/d85603f5-f410-4b50-842d-d59a7d0a7e59%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Tong Sun

Nov 18, 2014, 10:09:43 AM
to Elazar Leibovich, gopro...@googlegroups.com
On Tue, Nov 18, 2014 at 9:57 AM, Elazar Leibovich <ela...@gmail.com> wrote:
Makes sense. But don't you think it should be a separate project? 

No, it's like Apache writing its own access log. A separate project would need too much internal information that it can't get; otherwise goproxy would have to expose far too many details to the outside.
 
 BTW, what's wrong with writing to a database?

Writing to its own log has the benefit of being DB-agnostic; otherwise goproxy would have to know about and import every possible DB driver in Go. Moreover, goproxy can freely change its internal log format, as long as the log dump tool understands it. If an external DB is involved, the rigid interface and legacy schema will make improvements very hard.
 

Are you interested in implementing it?

Sorry, I'm still in the learning-by-example phase. I can barely even understand what httpdump is doing.

Please help. 

Thanks

Elazar Leibovich

Nov 18, 2014, 12:25:40 PM
to Tong Sun, gopro...@googlegroups.com
Did you try setting proxy.Verbose = true?
It's not exactly an Apache access log, but it's pretty similar.

Tong Sun

Nov 18, 2014, 6:16:08 PM
to Elazar Leibovich, gopro...@googlegroups.com

On Tue, Nov 18, 2014 at 12:25 PM, Elazar Leibovich <ela...@gmail.com> wrote:
Did you try setting proxy.Verbose = true?

Alright, I'll give it a try. 
Hmmm... sorry for the noob question: how can I tell whether it is making any difference?

Thanks

Elazar Leibovich

Nov 18, 2014, 10:07:53 PM
to Tong Sun, gopro...@googlegroups.com
What do you mean by "making any difference"?
A difference in performance?

Tong Sun

Nov 19, 2014, 12:25:41 AM
to Elazar Leibovich, gopro...@googlegroups.com
Oh, I meant the impact of setting proxy.Verbose = true. 

Having set it to true, how do I gather statistics on page load times? I haven't checked the goproxy source, and there is not much documentation explaining how to get statistics on page load times. Actually, even if I did look at the goproxy source, I wouldn't understand much.

Sorry for being dense. 

Elazar Leibovich

Nov 19, 2014, 2:13:33 AM
to Tong Sun, gopro...@googlegroups.com
You should probably look for benchmarking proxies.

Tong Sun

Nov 19, 2014, 9:05:42 AM
to Elazar Leibovich, gopro...@googlegroups.com

On Wed, Nov 19, 2014 at 2:13 AM, Elazar Leibovich <ela...@gmail.com> wrote:
You should probably look for benchmarking proxies.


Ah, wonderful, Thanks!

Elazar Leibovich

Nov 19, 2014, 9:20:46 AM
to Tong Sun, gopro...@googlegroups.com
Great!
If you benchmark goproxy, please share the results with us.

Tong Sun

Nov 21, 2014, 7:04:06 PM
to Elazar Leibovich, gopro...@googlegroups.com

On Wed, Nov 19, 2014 at 2:13 AM, Elazar Leibovich <ela...@gmail.com> wrote:

You should probably look for benchmarking proxies.


Oooh, after having a closer look, I realized that it is not a benchmarking proxy, but just a random request sender. So unfortunately that's not what I want.

I need a proxy that can gather statistics on page load times. The reason I picked goproxy for this is that I can use it to force myself to learn Go and have a meaningful project to work on. It looks like that is not feasible in the near future, then. Anyway,

Thanks for all your help

Elazar Leibovich

Nov 22, 2014, 10:29:58 AM
to Tong Sun, gopro...@googlegroups.com
I pointed you to this specific tool because it supports defining a proxy server for the benchmark.

Isn't that the case?

Tong Sun

Nov 22, 2014, 12:14:39 PM
to Elazar Leibovich, gopro...@googlegroups.com
Unfortunately no, it is just a random request sender written in a few lines of C. It might support defining a proxy to benchmark when it benchmarks sites with static pages. But what I need is a proxy that can benchmark requests, not a request sender that benchmarks a proxy.

Elazar Leibovich

Nov 22, 2014, 12:17:20 PM
to Tong Sun, gopro...@googlegroups.com
In that case I don't understand. Why do you need the proxy to measure the server time? Why can't you just do that with regular site benchmarking?

What are you trying to measure?

Tong Sun

Nov 24, 2014, 9:12:42 AM
to Elazar Leibovich, gopro...@googlegroups.com
On Sat, Nov 22, 2014 at 12:17 PM, Elazar Leibovich <ela...@gmail.com> wrote:
In that case I don't understand. Why do you need the proxy to measure the server time? Why can't you just do that with regular site benchmarking?
What are you trying to measure?

The reasons that regular site benchmarking won't cut it:

  • password protection
  • rigid workflow. The steps have to go from A -> B -> C, etc.
  • data variety. Even when users walk through the same workflow, their data volume will differ, because the responses vary from person to person
  • web services. Hundreds of different web service calls share the same entry point
  • customization. We need to differentiate the web service calls according to their non-standard, proprietary payloads
  • thus it is impossible to use any site benchmarking tool,
  • but we do have a large volume of manual visit requests
  • that's why a proxy that can benchmark requests is the best fit

Elazar Leibovich

Nov 24, 2014, 11:58:08 AM
to Tong Sun, gopro...@googlegroups.com
I still don't get it. How do you intend to solve those problems with a proxy?

Tong Sun

Nov 24, 2014, 12:03:47 PM
to Elazar Leibovich, gopro...@googlegroups.com
As long as the proxy can gather statistics on page load times, then that's all that I need.

Matthew Zimmerman

Nov 24, 2014, 2:49:05 PM
to Tong Sun, Elazar Leibovich, gopro...@googlegroups.com
When you say "statistics", do you mean an individual HTTP request and
its subsequent response?

In my understanding of the common usage of "page load time", it is
only relevant from a browser's perspective, where the browser needs ~50
resources to put a "page" together, let alone render the HTML
and process any JavaScript. That is something
goproxy is not going to be able to give you.

If you instead want to look at response times, in verbose mode
goproxy tells you when it makes the request and when it returns the
response (for HTTP at least).
https://github.com/elazarl/goproxy/blob/master/proxy.go#L101
https://github.com/elazarl/goproxy/blob/master/proxy.go#L141

A log analysis on that will give you all the data that goproxy
realistically can.

Tong Sun

Nov 24, 2014, 3:10:35 PM
to Matthew Zimmerman, Elazar Leibovich, gopro...@googlegroups.com
On Mon, Nov 24, 2014 at 2:49 PM, Matthew Zimmerman <mzimm...@gmail.com> wrote:
If you instead want to look at response times, in verbose mode
goproxy tells you when it makes the request and when it returns the
response (for HTTP at least).
https://github.com/elazarl/goproxy/blob/master/proxy.go#L101
https://github.com/elazarl/goproxy/blob/master/proxy.go#L141
 
Yes, response times will do. Thanks. 

A log analysis on that will give you all the data that goproxy
realistically can.

If ten different people access the same page at almost the same time, how can I pair up their request and response logs to tell which is which?

