Web Sockets in HAR file: a proposal to extend the format


Ivan De Marino

Feb 29, 2012, 1:47:39 PM
to http-archive-...@googlegroups.com
Hello all,

I'm very new to the details of HAR files, and I couldn't find anything about Web Sockets so far. So, here we go.

For work I recently had to implement the W3C Web Sockets specs.
Studying the actual protocol specs (http://tools.ietf.org/html/draft-ietf-hybi-thewebsocketprotocol-17#section-5), it's clear that Web Sockets set themselves too far apart from HTTP transactions to be "traced" as such.

So, I started taking a look at where in the HAR format the communication of Web Sockets could fit.

First, the Handshake - well, the WS handshake is a standard HTTP exchange that ends in a 101 (Switching Protocols) response. It's almost always composed of a single HTTP req/res pair, where the Client asks for a WebSocket, the Server accepts (or refuses), and a set of "special" headers is used (most important: the "Upgrade" one).
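
To make the "special" headers a bit more concrete, here is a minimal Python sketch of how, as I read the protocol spec, the Server derives its "Sec-WebSocket-Accept" header from the Client's "Sec-WebSocket-Key". The function name is my own and purely illustrative; the GUID is the fixed value given in the spec.

```python
import base64
import hashlib

# Fixed GUID that the WebSocket protocol appends to the client's key.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def expected_accept(sec_websocket_key: str) -> str:
    """Return the Sec-WebSocket-Accept value for a given Sec-WebSocket-Key.

    Hypothetical helper: SHA-1 of (key + GUID), then base64-encoded.
    """
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")
```

A tool that records the handshake could use something like this to check that a 101 response really accepts the WebSocket that was asked for.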

After that is done, there is no more HTTP going up and down: the socket on which it's based stays open and messages (chunked in frames) are sent up and down.

Second, the Messages - The official protocol defines the internals of the framing, but what we really care about is the Message. A message is a chunk of bytes or text that is sent or received, full duplex, over the socket established during the Handshake.

Sending and Receiving are completely independent, asynchronous and decoupled. In other words, it's not like "http entries" in the HAR file: a message doesn't have a request AND a response.

What I'm thinking of doing
- The Handshake of a Web Socket will be described by an "entry" in the current HAR format. The fact that it is the "handshake of a WebSocket" will be captured by the "headers", so it's a no-op (good!).
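
As a rough sketch of what "captured by the headers" could mean for a HAR consumer, the Python below checks an existing-style entry for a 101 status plus "Upgrade: websocket" on both sides. It assumes the usual HAR layout (headers as a list of name/value objects); the helper names are made up for illustration.

```python
def header_value(headers, name):
    """Return the value of the first header matching `name` (case-insensitive)."""
    for header in headers:
        if header.get("name", "").lower() == name.lower():
            return header.get("value", "")
    return None


def is_websocket_handshake(entry):
    """Guess whether a HAR entry is a WebSocket handshake (hypothetical helper)."""
    request = entry.get("request", {})
    response = entry.get("response", {})
    if response.get("status") != 101:
        return False
    req_upgrade = header_value(request.get("headers", []), "Upgrade") or ""
    res_upgrade = header_value(response.get("headers", []), "Upgrade") or ""
    return req_upgrade.lower() == "websocket" and res_upgrade.lower() == "websocket"
```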

The issue is that the messages will probably need their own, new "property" in the JSON of the HAR. I was thinking something like:

        ...
        "pages" : [...],
        "messages": [
            {
                "comment": "",              //< optional
                "socketref": "0123456789",  //< ID of the web socket
                "direction": "out",         //< could be "in" or "out"
                "pageref": "page_0",        //< optional, if it was done in a web page
                "timings": {
                    "receive": -1,          //< -1 if direction is "out"
                    "send": 10              //< -1 if the direction is "in"
                },
                "type": "text",             //< could be "text", "binary" or "closing"
                "content": {
                    "size": 33,
                    "text": "\n",           //< optional
                    "comment": ""           //< optional
                }
            }
        ],
        "entries" : [...],
        ...
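
For illustration only, here is a small Python sketch of how a recorder might build one of these objects. The field names come from the draft above; the function name, its parameters, and the choice to measure "size" in bytes are my own assumptions, not part of any HAR spec.

```python
import json


def make_message(socketref, direction, payload, elapsed_ms, pageref=None):
    """Build one entry for the proposed "messages" list (hypothetical helper).

    `payload` is a str for text messages or bytes for binary ones.
    """
    if direction not in ("in", "out"):
        raise ValueError('direction must be "in" or "out"')
    is_text = isinstance(payload, str)
    raw = payload.encode("utf-8") if is_text else payload
    message = {
        "socketref": socketref,
        "direction": direction,
        "timings": {
            # Only the timing matching the direction is meaningful.
            "receive": elapsed_ms if direction == "in" else -1,
            "send": elapsed_ms if direction == "out" else -1,
        },
        "type": "text" if is_text else "binary",
        "content": {"size": len(raw)},  # assumed: size in bytes
    }
    if is_text:
        message["content"]["text"] = payload  # optional field
    if pageref is not None:
        message["pageref"] = pageref  # optional field
    return message


example = make_message("0123456789", "out", "hello", 10, pageref="page_0")
print(json.dumps(example, indent=4))
```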

What's your opinion?
Would this be a good starting point?

I hope to gather some feedback from this group: as far as I understand, this is where the HAR format is discussed.
Right?

Ivan

Ian White

Mar 1, 2012, 12:31:59 PM
to http-archive-...@googlegroups.com
Ivan,

The only thing I noticed after a quick look is that the messages might need to be inside of the "pages" list. Since the web socket connection is opened on a page and closed when you leave the page, it seems to fit better under pages. What do you think of making "messages" a sibling of "entries"?

Disclaimer: Ivan and I work together, and we're both dealing with web sockets in HAR right now. I want to keep the communication open, which is why I'm responding here instead of berating him over Skype.

Ivan De Marino

Mar 1, 2012, 12:45:05 PM
to http-archive-...@googlegroups.com
Hi Ian,

nice to meet you (:P).

Well, I'm actually following the principle already applied in the HAR format: entries represent HTTP req/res pairs, and they refer back to the Page to which they belong via a "pageref" parameter.

That, I suspect, is done because a page is not always there: non-browser clients might be interested in producing HAR files of their HTTP transactions. Those will NOT have a webpage to refer to.

The same applies to the "messages" I'm proposing here (btw, the name "messages" is absolutely arbitrary and in no way recommended by me): yes, in general you do have WS inside webpages. But not always: take our VirtualUser Client, where the user can generate HTTP traffic, but NEVER open a webpage.
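
A quick Python sketch of why a flat list with an optional "pageref" is enough: a consumer can still bucket messages per page, and page-less messages simply land under None. The function name is hypothetical.

```python
from collections import defaultdict


def group_by_page(messages):
    """Group proposed "messages" objects by their optional "pageref".

    Messages without a "pageref" are collected under the key None.
    """
    groups = defaultdict(list)
    for message in messages:
        groups[message.get("pageref")].append(message)
    return dict(groups)
```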

Ivan

PS Both Ian and I work for Neustar Webmetrics :)

Ian White

Mar 1, 2012, 1:04:51 PM
to http-archive-...@googlegroups.com
Right. Carry on. (I jumped the gun, I now see the "pageref" key is in there)

For those following along, our "VirtualUser Client" is pretty much a souped up browsermob-proxy (http://opensource.webmetrics.com/browsermob-proxy/). A proxy that simulates network conditions, records the traffic that passes through, and can spit out har files. When used programmatically as an HTTP client, individual requests can be made -- in which case there's no actual browser navigation that would be considered a "page" in the har output.

Ivan De Marino

Mar 1, 2012, 1:14:16 PM
to http-archive-...@googlegroups.com

On Thursday, March 1, 2012 6:04:51 PM UTC, Ian White wrote:
> Right. Carry on. (I jumped the gun, I now see the "pageref" key is in there)
>
> For those following along, our "VirtualUser Client" is pretty much a souped up browsermob-proxy (http://opensource.webmetrics.com/browsermob-proxy/). A proxy that simulates network conditions, records the traffic that passes through, and can spit out har files. When used programmatically as an HTTP client, individual requests can be made -- in which case there's no actual browser navigation that would be considered a "page" in the har output.

To be precise, we do add a "pageref" even in this case, but it points to a "fake" page that represents the "TestStep".

It still gives some sort of "belonging" to those HTTP transactions, bucketing them into Steps.

But those are details of ours.
 