sFlow for HTTP


Peter Phaal

Oct 6, 2010, 10:44:42 PM
to sFlow
Here are some initial ideas for HTTP request and counter structures.
The http_request structure would be exported for sampled HTTP requests
along with an extended socket structure as described in:
http://www.sflow.org/sflow_host.txt

Using sFlow to monitor HTTP is a scalable way to monitor the
performance of large web server clusters or load balancers where
request rates are high and conventional logging solutions generate too
much data or impose excessive overhead. The combination of HTTP and
Memcached sFlow would provide good coverage of the key protocols in a
web 2.0 data center:
http://blog.sflow.com/2010/09/memcached.html

Comments?

----------

enum http_method {
  OTHER   = 0;
  OPTIONS = 1;
  GET     = 2;
  HEAD    = 3;
  POST    = 4;
  PUT     = 5;
  DELETE  = 6;
  TRACE   = 7;
  CONNECT = 8;
}

/* HTTP request */
/* opaque = flow_data; enterprise = 0; format = 2201 */
struct http_request {
  http_method method;    /* method */
  string<255> uri;       /* URI exactly as it came from the client */
  string<32> host;       /* Host value from request header */
  string<255> referer;   /* Referer value from request header */
  string<64> useragent;  /* User-Agent value from request header */
  string<32> authuser;   /* RFC 1413 identity of user */
  unsigned int bytes;    /* content-length of document transferred */
  unsigned int uS;       /* duration of the operation (microseconds) */
  int status;            /* HTTP status code */
}

/* HTTP counters */
/* opaque = counter_data; enterprise = 0; format = 2201 */
struct http_counters {
  int method_option_count;
  int method_get_count;
  int method_head_count;
  int method_post_count;
  int method_put_count;
  int method_delete_count;
  int method_trace_count;
  int method_connect_count;
  int method_other_count;
  int status_1XX_count;
  int status_2XX_count;
  int status_3XX_count;
  int status_4XX_count;
  int status_5XX_count;
  int status_other_count;
}
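
A minimal sketch of how an agent might maintain these counters, assuming they are incremented for every request rather than only for sampled ones. The helper names (`METHOD_FIELDS`, `status_field`, `record_request`) are hypothetical and not taken from any existing implementation; the field names follow the `method_*_count` and `status_*_count` pattern of the structure above.

```python
from collections import Counter

# Methods with a dedicated counter; anything else falls into method_other_count.
METHOD_FIELDS = {
    "OPTIONS": "method_option_count",
    "GET": "method_get_count",
    "HEAD": "method_head_count",
    "POST": "method_post_count",
    "PUT": "method_put_count",
    "DELETE": "method_delete_count",
    "TRACE": "method_trace_count",
    "CONNECT": "method_connect_count",
}


def status_field(status: int) -> str:
    """Map an HTTP status code to its status class counter name."""
    if 100 <= status <= 599:
        return f"status_{status // 100}XX_count"
    return "status_other_count"


def record_request(counters: Counter, method: str, status: int) -> None:
    """Increment one method counter and one status counter for a request."""
    counters[METHOD_FIELDS.get(method.upper(), "method_other_count")] += 1
    counters[status_field(status)] += 1


counters = Counter()
record_request(counters, "GET", 200)
record_request(counters, "PATCH", 404)  # PATCH has no dedicated counter
print(dict(counters))
```

Because each request lands in exactly one method bucket and one status bucket, the method counters and the status counters should each sum to the total number of requests seen.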

neilmckee

Oct 8, 2010, 6:20:05 PM
to sFlow
I think the content length should be a 64-bit integer. It's not
uncommon to download more than 2^32 bytes with one HTTP GET.

Similarly, the 32-bit field for duration in microseconds will
overflow if a GET takes 72 minutes or more. Maybe that's not very
important for the kind of analysis that sFlow is targeted at, so
bumping it up to 64 bits seems unnecessary. Perhaps instead we could
just define 0xffffffff to mean "2^32-1 microseconds or more"?
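
To make the arithmetic concrete, here is a small sketch of the saturating approach. The function name `clamp_duration_us` is hypothetical; the sentinel value 0xffffffff is the one suggested above.

```python
UINT32_MAX = 0xFFFFFFFF  # proposed to mean "2^32-1 microseconds or more"


def clamp_duration_us(duration_us: int) -> int:
    """Saturate a duration in microseconds so it fits an unsigned 32-bit field."""
    if duration_us < 0:
        raise ValueError("duration cannot be negative")
    return min(duration_us, UINT32_MAX)


# 2^32 microseconds expressed in minutes: the point at which the field would wrap.
overflow_minutes = 2**32 / 1_000_000 / 60
print(f"32-bit microsecond field overflows after {overflow_minutes:.2f} minutes")
print(clamp_duration_us(90 * 60 * 1_000_000) == UINT32_MAX)  # a 90 minute GET saturates
```

With saturation, a collector can still tell "very long" requests apart from short ones; without it, a wrapped value would be indistinguishable from a fast request.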

Another useful field we could add here is the mime-type. A string<32>
should be enough to cover it.

Neil

neilmckee

Jul 15, 2011, 2:10:16 PM
to sf...@googlegroups.com
Open-source implementations of this experimental HTTP structure are now available for:


Based on feedback from these implementations, it looks like two more fields should be added to the http_request structure: one to capture the request bytes (e.g. to know the size of a POST operation) and another to capture the "X-Forwarded-For" header field. The proposed structure now looks like this:

/* HTTP request */
/* opaque = flow_data; enterprise = 0; format = 2201 */
struct http_request {
  http_method method;        /* method */
  string<255> uri;           /* URI exactly as it came from the client */
  string<32> host;           /* Host value from request header */
  string<255> referer;       /* Referer value from request header */
  string<64> useragent;      /* User-Agent value from request header */
  string<64> xff;            /* X-Forwarded-For value from request header */
  string<32> authuser;       /* RFC 1413 identity of user */
  string<32> mime_type;      /* Mime-Type (underscore: '-' is not valid in an XDR identifier) */
  unsigned hyper req_bytes;  /* Content-Length of request */
  unsigned hyper resp_bytes; /* Content-Length of response */
  unsigned int uS;           /* duration of the operation (microseconds) */
  int status;                /* HTTP status code */
}
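
As a rough illustration of what this structure looks like on the wire, the sketch below packs the fields using standard XDR rules: big-endian integers, and strings as a 4-byte length followed by the bytes padded to a multiple of 4. The function names are hypothetical, only the record body is encoded (not the enclosing sFlow record or sample headers), and truncating to the field limit is an assumption about how an agent would handle oversized values.

```python
import struct


def xdr_string(value: str, max_len: int) -> bytes:
    """Encode an XDR string<max_len>: length, data, zero padding to 4 bytes."""
    # Byte-level truncation; this may split a multi-byte UTF-8 character.
    data = value.encode("utf-8")[:max_len]
    padding = b"\x00" * (-len(data) % 4)
    return struct.pack(">I", len(data)) + data + padding


def encode_http_request(method, uri, host, referer, useragent, xff,
                        authuser, mime_type, req_bytes, resp_bytes,
                        duration_us, status) -> bytes:
    """Pack the proposed http_request fields in declaration order."""
    return b"".join([
        struct.pack(">I", method),                       # http_method enum
        xdr_string(uri, 255),
        xdr_string(host, 32),
        xdr_string(referer, 255),
        xdr_string(useragent, 64),
        xdr_string(xff, 64),
        xdr_string(authuser, 32),
        xdr_string(mime_type, 32),
        struct.pack(">QQ", req_bytes, resp_bytes),       # unsigned hyper x 2
        struct.pack(">I", min(duration_us, 0xFFFFFFFF)), # saturate to 32 bits
        struct.pack(">i", status),
    ])


GET = 2  # from the http_method enum
record = encode_http_request(GET, "/index.html", "example.com", "", "curl/7.0",
                             "", "", "text/html", 0, 5120, 1500, 200)
print(len(record), record[:4].hex())
```

Empty strings still cost 4 bytes each for the length word, so a request with no Referer, X-Forwarded-For or authuser adds only 12 bytes for those three fields.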


Please comment.

Neil

moseleymark

Aug 10, 2011, 12:51:59 PM
to sFlow
Neil suggested I post to this thread. I've been playing with the nginx
sflow implementation and really like it. The only issue I've had so
far is that the UserAgent string is much too short at 64 bytes.
UserAgent strings are annoyingly (almost baroquely) long but customers
care about them when doing log analysis (we do shared web hosting).
One thing to consider is that a lot of browsers duplicate a bunch of
info at the beginning of the string and append the more unique stuff
later on.

I've increased my sflowtool output to do 255 bytes for useragent,
though I imagine that'll push some packets over the 1500 byte limit,
so probably 128 bytes is a better generic length.

Peter Phaal

Aug 11, 2011, 11:23:47 AM
to sf...@googlegroups.com
Increasing the maximum useragent size to 128 shouldn't cause any
problems and sounds like a good idea.

It is possible for sFlow to transport records larger than the default
maximum datagram size of 1400 bytes (sFlow uses UDP as a transport, so
65,535 is the hard limit). However, keeping the record size well under
the MTU ensures that sFlow operates robustly even if there is packet
loss.
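
A back-of-the-envelope sketch of the worst-case encoded size of the http_request record body for different useragent limits. It assumes the field limits from the July 2011 proposal earlier in this thread and standard XDR padding. The function names are hypothetical, and the totals exclude the sFlow datagram and sample headers, the extended socket structure, and any other records sharing the datagram.

```python
def xdr_string_max(max_len: int) -> int:
    """Largest encoded size of string<max_len>: 4-byte length plus padded data."""
    return 4 + (max_len + 3) // 4 * 4


def http_request_max_bytes(useragent_max: int) -> int:
    """Worst-case size of the http_request body for a given useragent limit."""
    # uri, host, referer, useragent, xff, authuser, mime-type
    string_limits = [255, 32, 255, useragent_max, 64, 32, 32]
    # method (4) + req_bytes (8) + resp_bytes (8) + uS (4) + status (4)
    fixed = 4 + 8 + 8 + 4 + 4
    return fixed + sum(xdr_string_max(n) for n in string_limits)


for limit in (64, 128, 255):
    print(limit, http_request_max_bytes(limit))
# prints:
# 64 792
# 128 856
# 255 984
```

Under these assumptions, moving from 64 to 128 bytes adds at most 64 bytes to the record, while 255 adds up to 192, which leaves noticeably less room in a 1400 byte datagram once the surrounding headers are included.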

Peter Phaal

Oct 18, 2011, 3:50:17 PM
to sf...@googlegroups.com
The structures have been updated based on experience with the Apache, NGINX, Tomcat and node.js implementations:


The following draft specification describes the new structures:


Changes include the addition of the X-Forwarded-For and request_bytes fields to the http_request structure and support for proxy/load balancer functionality.

Please comment on the draft so we can move to finalize the specification.

Peter