Suggestions for HAR 1.2


Jan Odvarko

May 3, 2010, 9:20:54 AM5/3/10
to HTTP Archive Specification
Since there is already a list of suggestions for additional HAR
fields, I believe it's a good time to start on HAR 1.2.

Here is the list:

1) Comments
2) Binary response bodies
3) Geo Location
4) Connection speed
5) Socket numbers, proxy info, port/IP + IPV6
6) Render/JS timing events and CPU/Mem utilization
7) Page screenshots
8) Grouping by process
9) Intermediate HTTP responses 1xx
10) Anything else... ?

I am presenting all the suggestions I have collected over time, but
that doesn't mean all of them must go into HAR 1.2. I would
personally pick only those that are immediately useful for an
existing tool, so we get feedback about real usage/scenarios.

My personal vote goes for: #1, #2

---

#1) === Comments ===
Every object (e.g. log, request, response, etc.) should have a field
'comment' that various tools can use to append and/or read provided
comments.

- comment [string, optional] - A comment provided by the user.
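
For illustration, a log object carrying such a comment might look
like this (the comment text is of course made up):

```
"log": {
    "version": "1.2",
    "creator": {...},
    "comment": "Measured on a cold cache, first visit."
}
```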


#2) === Binary response bodies ===
As mentioned also in this thread:
http://groups.google.com/group/http-archive-specification/browse_thread/thread/5b9b2943e87d205d?hl=en

...there are cases where including also binary responses (e.g. images)
into the HAR file would be useful.

This change relates to the existing <content> field. Current
definition:

"content": {
    "size": 33,
    "compression": 0,
    "mimeType": "text/html; charset=utf-8",
    "text": "<html><head></head><body/></html>\n"
}

Binary data should be encoded (e.g. using base64), with the specific
encoding also stored so the importer knows how to decode it.

A new field:
- encoding [string, optional]: Encoding used for response text, e.g.
"base64".

Notes:
* Exporters should implement an option to switch off binary response
export, since the HAR file size can increase dramatically.
* The new 'encoding' field could be avoided if base64 were mandatory.
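
A consumer of the proposed field could decode bodies like this (a
minimal Python sketch; it assumes only the field names discussed
above and the single "base64" encoding value):

```python
import base64

def decode_content(content):
    """Return the response body bytes for a HAR <content> object.

    Uses the proposed optional 'encoding' field: when it equals
    "base64", 'text' holds base64 data; otherwise 'text' is taken
    as plain (already decoded) text.
    """
    text = content.get("text", "")
    if content.get("encoding") == "base64":
        return base64.b64decode(text)
    return text.encode("utf-8")

# A binary (image) response stored with the new field:
content = {
    "size": 3,
    "mimeType": "image/png",
    "text": base64.b64encode(b"\x89PN").decode("ascii"),
    "encoding": "base64",
}
body = decode_content(content)
```

An importer that doesn't recognize a future encoding value should
probably treat the text as opaque rather than guess.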


#3) === Geo Location ===
Geographic location included in HAR. There should be only one
geographic location per page (?), so this could be part of the
existing <page> element.

Current definition:
{
    "startedDateTime": "2009-04-16T12:07:25.123+01:00",
    "id": "page_0",
    "title": "Test Page",
    "pageTimings": {...}
}

New field:
- geoLocation [string, optional] - Geographical location of the
client.

Note:
- 'geoLocation' could rather be part of the <creator> or <browser>
fields.


#4) === Connection speed ===
Connection speed at the time of measurement. I believe that this value
is related to a page load speed and, should be part of the page
element.

New field:
- bitrate [number, optional] - Connection speed (bit/s, bits per
second).


#5) === Socket Numbers ===
Should include the following:
+ Port/IP + IPV6 (Source and destination)
+ Proxy info

I think this info should be part of the <entry> element.
Any specific suggestions for the structure?


#6) === Render/JS timing events and CPU/Mem utilization ===
Additional timing information. It could be part of the <page>
element, which already contains e.g. the <pageTimings> structure.

"pageTimings": {
    "onContentLoad": 1720,
    "onLoad": 2500
}

This structure could contain additional events and timings [ms]. Any
proposals for specific new fields?


#7) === Page screenshots ===
It should also be possible to store a list of page screenshots (taken
at various phases of the page load).

New field: <pageScreenshots> in <page>.
(a list of screenshots)

"pageScreenshots": [
    {
        "data": "",
        "time": 2500
    },
    {
        ...
    }
]

- data [string] - Encoded screenshot data (base64).
- time [number] - Number of milliseconds since the page load started
(page.startedDateTime)
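
An exporter could build one list item like this (a Python sketch; the
field names follow the proposal above, and 'png_bytes' stands for
whatever image data the tool captured):

```python
import base64

def make_screenshot_entry(png_bytes, time_ms):
    """One entry of the proposed <pageScreenshots> list:
    base64-encoded image data plus the offset in milliseconds since
    the page load started (page.startedDateTime)."""
    return {
        "data": base64.b64encode(png_bytes).decode("ascii"),
        "time": time_ms,
    }

shot = make_screenshot_entry(b"\x89PNG\r\n", 2500)
```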


#8) === Grouping by process ===
Additional grouping of requests. Currently requests are grouped by
parent page ID (the 'pageref' field in <entry>), and there is a list
of pages in <pages>.

This would require a new field in <log> called <processes> and a new
field in <entry> called 'processref' (both optional).


#9) === Intermediate HTTP responses 1xx ===
Somebody proposed this one, but I can't remember the detailed
description. Anyone?


Comments?

Honza

Bryan McQuade

May 3, 2010, 9:31:58 AM5/3/10
to HTTP Archive Specification
Thanks Jan. I agree that binary response bodies are immediately useful
and I'd like to see an implementation land soon. I can send you a
patch for netexport if you like.

I think the "encoding" field sounds fine. If omitted it implies that
there is no encoding. Otherwise it could be "base64" or possibly other
values.

RE: the comments:
* Exporters should implement an option that allows to switch off
binary response export since the HAR file size can dramatically
increase.
The HAR file size issue isn't specific to binary content. For
instance, when I export CNN.com right now without any binary bodies,
the HAR is 1.6 MB; nytimes.com is 1.7 MB. So perhaps a general "don't
include any response bodies" option is better. I think exporting all
bodies should be enabled by default, since it's backward compatible
and users should have to be explicit about information they don't
want. For instance, if someone keeps an HTTP archive and forgets to
turn on response bodies, they've lost some very important data in
their archives.

* The new 'encoding' field could be avoided if base64 is mandatory.
I don't think base64 should be mandatory. base64 increases the size
and makes text responses less readable. I like your suggestion of
having an 'encoding' field instead.

Jan Odvarko

May 3, 2010, 10:04:56 AM5/3/10
to HTTP Archive Specification
On May 3, 3:31 pm, Bryan McQuade <bmcqu...@google.com> wrote:
> Thanks Jan. I agree that binary response bodies are immediately useful
> and I'd like to see an implementation land soon. I can send you a
> patch for netexport if you like.
That would be great, thanks!

>
> I think the "encoding" field sounds fine. If omitted it implies that
> there is no encoding. Otherwise it could be "base64" or possibly other
> values.
Yes

> RE: the comments:
> * Exporters should implement an option that allows to switch off
> binary response export since the HAR file size can dramatically
> increase.
> The HAR file size issue isn't specific to binary content. For instance
> when I export CNN.com right now without any binary bodies, the HAR is
> 1.6Mb. nytimes.com is 1.7Mb. So perhaps having a general "don't
> include any response bodies" is a better option.
Good point, OK - I'll create such an option for NetExport.

> I think exporting all
> bodies should be enabled by default since it's backward compatible and
> users should have to be explicit about information that they don't
> want. For instance if someone wants to keep an HTTP archive and they
> forget to turn on response bodies they've lost some very important
> data in their archives.
>
> * The new 'encoding' field could be avoided if base64 is mandatory.
> I don't think base64 should be mandatory. base64 increases the size
> and makes text responses less readable. I like your suggestion of
> having an 'encoding' field instead.
Cool, I also prefer such flexibility.

Honza

Steve Souders

May 3, 2010, 1:09:40 PM5/3/10
to http-archive-...@googlegroups.com, Jan Odvarko
HAR has made great progress toward becoming an industry standard. It
would be good to get more tools to adopt HAR. Let's generate a top
ten list of tools we wish would adopt HAR, and then figure out how to
make that happen. In some cases it's evangelism and support. In other
cases it might require code changes. And in other cases the target
might be an open source project that we could contribute to directly.

This doesn't necessarily need to be on the 1.2 list, but I think wider
adoption is a higher priority now than additional features. (Unless
those features are necessary for wider adoption.)

-Steve

Patrick Meenan

May 3, 2010, 1:44:40 PM5/3/10
to http-archive-...@googlegroups.com, Jan Odvarko
I'd say that binary responses are probably key to wider adoption.
PageSpeed (and WebPagetest) would not be able to run any of the image
optimization checks without access to the actual image data.

I don't know that any of the others are as critical, but I think that
one is pretty key.

-Pat

Patrick Meenan

May 3, 2010, 4:45:59 PM5/3/10
to http-archive-...@googlegroups.com
As I'm cleaning up the WebPagetest HAR export code, I bumped into one
more thing that looks to be missing from the current spec. Is there a
place in the timings for an event object to include SSL negotiation
time? I'm currently adding it into the TCP connect time because that
seemed like the best fit, but that's a pretty big missing piece for
me.

To be backward compatible, we'd probably have to explicitly state
that if it is defined, then the time is also included in the TCP
connect time (unless it belongs somewhere else).

Thanks,

-Pat


Sergey Chernyshev

May 3, 2010, 10:24:33 PM5/3/10
to http-archive-...@googlegroups.com
I agree with Bryan and Patrick that bodies (binary or otherwise) should be treated equally and included in reports when needed.

I believe there should still be an option to save only a "time
skeleton" that contains enough data to draw waterfall diagrams but is
otherwise skinny and optimized for storage. This will allow tools to
collect this data easily and store as many snapshots as possible
without worrying too much about the volume of data.

Having thought about it for a while, I believe the full version
should be the default, and the documentation should make clear that
this is highly encouraged, because there should be one format, not
many.

It's probably a good idea to add a recommendation and best practice
of storing HARs in compressed form (Show Slow already compresses HARs
for storage).

        Sergey


--
Sergey Chernyshev
http://www.sergeychernyshev.com/

Sergey Chernyshev

May 3, 2010, 10:32:02 PM5/3/10
to http-archive-...@googlegroups.com
Thinking about the use case of har_to_pagespeed and similar tools
annotating a waterfall later displayed with HARViewer or the like, I
believe it might be a good idea to introduce some form of standard
for annotations (comments) on the elements.

I'm not 100% sure whether that should be in the core of the HAR spec
or whether the HAR spec should provide an extension framework, but I
can imagine that some items could be annotated with rankings or steps
to improve them (either bodies or network properties like DNS time or
cache properties).

        Sergey

Jan Odvarko

May 4, 2010, 3:31:09 AM5/4/10
to HTTP Archive Specification
On May 3, 10:45 pm, Patrick Meenan <patmee...@gmail.com> wrote:
> As I'm cleaning up the WebPagetest HAR export code I bumped into one more
> thing that looks to be missing from the current spec.  Is there a place in
> the timings for an event object to include SSL negotiation time?
No

> I'm currently adding it into the TCP connect time because that seemed like the
> best fit but that's a pretty big missing one for me.
>
> To be backward compatible we'd probably have to explicitly state that if it
> is defined then the time is also included in the TCP connect time.
Yes

---

The current definition of the <timings> object looks as follows:

"timings": {
    "blocked": 0,   // time spent in a browser queue
    "dns": -1,      // dns resolution time
    "connect": 15,  // time required to create TCP connection
    "send": 20,     // time required to send HTTP request
    "wait": 38,     // waiting for a response
    "receive": 12   // reading entire response
}

The new proposed field would be:
- ssl [number, optional] - Time required for SSL/TLS negotiation. If
this field is defined then the time is also included in the connect
field (to ensure backward compatibility with HAR 1.1). Use -1 if the
timing does not apply to the current request.

So, the following is still true:
entry.time == entry.timings.blocked + entry.timings.dns +
entry.timings.connect + entry.timings.send + entry.timings.wait +
entry.timings.receive;
(where 'entry' is an object in 'log.entries')

Looks good to me, comments?

Honza

Jan Odvarko

May 4, 2010, 3:58:29 AM5/4/10
to HTTP Archive Specification
On May 4, 4:32 am, Sergey Chernyshev <sergey.chernys...@gmail.com>
wrote:
> Thinking about use-case of har_to_pagespeed and similar tools to annotate a
> waterfall later displayed with HARViewer or alike, I believe it might be a
> good idea to introduce some form of standard for annotations (comments) to
> the elements.
> I'm not 100% sure if that should be in the core of HAR spec or HAR spec
> should be providing an extension framework, but I can imagine that some
> items can be annotated with rankings or steps to improve them (either bodies
> or network properties like DNS time or cache properties).
How to extend HAR with custom fields is described in the "Custom
Fields" section of the spec (http://groups.google.com/group/firebug-working-group/web/http-tracing---export-format#Custom%20Fields)

I personally vote for including a 'comment' field [string, optional]
(which can be part of any object) in the core of the HAR spec (unless
there is a reason not to). Since HAR is targeted at HTTP traffic
analysis, I often tend to comment on gathered results to point out
specifics (e.g. why a particular timing is what it is, what the
monitored page(s) is actually doing, what the entire HAR file is
supposed to describe, etc.) and to explain the results to the end
user (for learning purposes or as the result of a performance
analysis of a particular site).

HARViewer currently supports custom comment fields as "_comment".
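
That "ignore foreign custom fields" rule is easy to implement
mechanically; here is a Python sketch of what a consuming parser
might do with a HAR written by another tool, assuming the
underscore-prefix convention for custom fields that HARViewer uses:

```python
def strip_custom_fields(obj):
    """Recursively drop underscore-prefixed (custom) fields, which
    parsers must ignore when the file was written by another tool."""
    if isinstance(obj, dict):
        return {key: strip_custom_fields(value)
                for key, value in obj.items()
                if not key.startswith("_")}
    if isinstance(obj, list):
        return [strip_custom_fields(item) for item in obj]
    return obj

entry = {"time": 50, "_comment": "slow DNS here",
         "timings": {"dns": 40, "_note": "internal"}}
clean = strip_custom_fields(entry)
```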

Honza


Jan Odvarko

May 4, 2010, 4:08:11 AM5/4/10
to HTTP Archive Specification
> It's probably a good idea to add recommendation and best practice of storing
> HARs in compressed form (Show Slow already compresses HARs for storage).
Agreed; in my experience, using (ZIP) compression on a HAR file
dramatically reduces its size (when tested with no binary responses
included, the zipped file was ~20-25% of the original).

I was thinking about a new extension, "zhar", that would refer to a
zipped HAR. Of course, compression can be done automatically when a
HAR file is served over HTTP (using Content-Encoding: gzip), e.g. to
be displayed in the HARViewer.
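
The effect is easy to demonstrate (a Python sketch with synthetic,
repetitive data; real HARs will compress by different ratios):

```python
import gzip
import json

# A toy HAR with many similar entries, standing in for a real export.
har = {"log": {"version": "1.2", "entries": [
    {"request": {"url": "http://example.com/resource/%d" % i}}
    for i in range(200)
]}}

raw = json.dumps(har).encode("utf-8")
packed = gzip.compress(raw)

# Round-trip: the compressed form decodes back to the exact bytes.
restored = gzip.decompress(packed)
```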

Honza



Patrick Meenan

May 4, 2010, 8:03:31 AM5/4/10
to http-archive-...@googlegroups.com
Looks good to me as well.

-Pat

-----Original Message-----
From: http-archive-...@googlegroups.com
[mailto:http-archive-...@googlegroups.com] On Behalf Of Jan
Odvarko
Sent: Tuesday, May 04, 2010 3:31 AM
To: HTTP Archive Specification
Subject: Re: Suggestions for HAR 1.2

Sergey Chernyshev

May 4, 2010, 10:10:02 AM5/4/10
to http-archive-...@googlegroups.com
Jan,

A bit of a side topic - do you envision any problems with including
bodies for HARViewer? I mean, with the browser loading all this data,
and maybe from multiple HARs, browser memory could potentially blow
up.

        Sergey

Sergey Chernyshev

May 4, 2010, 10:13:09 AM5/4/10
to http-archive-...@googlegroups.com
I see - the current extension mechanism is not portable - it's for
internal consumption only, e.g. "Parsers MUST ignore all custom
fields and elements if the file was not written by the same tool
loading the file." This means that for cross-tool usage, fields must
be included in the main spec.

What's the best practice here? For a tool developer to use _something
fields first and, when usage is stable, to propose removing the
underscore and including the field in the next spec version?

             Sergey

Jan Odvarko

May 4, 2010, 10:20:42 AM5/4/10
to HTTP Archive Specification


On May 4, 4:10 pm, Sergey Chernyshev <sergey.chernys...@gmail.com>
wrote:
> A bit of a side-topic - do you envision any problems including bodies for
> HARViewer?
> I mean, browser loading all this data, and maybe from multiple HARs -
> browers memory can potentially blow up.
Yes. Since the Firebug UI is mostly HTML too, we have already run
into this problem with large responses (in the Console and Net
panels). The solution we use is truncating displayed responses to a
reasonable size.

There can also be a link displayed at the end of a truncated
response, saying something like: "Open full response in a new tab".
In the case of huge responses this operation could still freeze/crash
the browser, but it's a way to see more than the HAR Viewer UI shows.

Also note that the HAR viewer displays a response only if it's
manually expanded by the user.

Honza

Jan Odvarko

May 4, 2010, 10:22:42 AM5/4/10
to HTTP Archive Specification
On May 4, 4:13 pm, Sergey Chernyshev <sergey.chernys...@gmail.com>
wrote:
> I see - current extension mechanism is not portable - for internal
> consumption only, e.g. "Parsers MUST ignore all custom fields and elements
> if the file was not written by the same tool loading the file." which means
> that if there is a cross-tool usage, then fields must be included in the
> main spec.
Precisely

> What's the best practice here? for tool developer to use _something fields
> first and when usage is stable to propose removing the underscore and
> including in the next spec version?
Yes

Honza

Sergey Chernyshev

May 4, 2010, 11:56:24 AM5/4/10
to http-archive-...@googlegroups.com
Sounds great - we'll see whether these additional features are
actually needed and used first.

            Sergey

Jan Odvarko

May 7, 2010, 11:24:55 AM5/7/10
to HTTP Archive Specification
Just to summarize proposed additions we want to have in HAR 1.2:


#1) === Comments ===
Every object (log, request, response, etc.) can have a 'comment'
field that various tools/users use to provide a description.

- comment [string, optional]: A description provided by the tool/user.


#2) === Binary response bodies ===
Modified object:

"content": {
    "size": 33,
    "compression": 0,
    "mimeType": "text/html; charset=utf-8",
    "text": "<html><head></head><body/></html>\n",
    "encoding": "base64"
}

New field:
- encoding [string, optional]: Encoding used for response text, e.g.
"base64". If not specified, the text field is HTTP-decoded
(decompressed & unchunked), then transcoded from its original
character set into UTF-8*.

* Not yet sure about the UTF-8. There are some open questions about
escaping Unicode characters.


#10) === SSL/TLS Negotiation timings ===
Modified object:

"timings": {
    "blocked": 0,   // time spent in a browser queue
    "dns": -1,      // dns resolution time
    "connect": 15,  // time required to create TCP connection
    "send": 20,     // time required to send HTTP request
    "wait": 38,     // waiting for a response
    "receive": 12,  // reading entire response
    "ssl": 0        // ssl negotiation
}

New field:
- ssl [number, optional]: Time required for SSL/TLS negotiation. If
this field is defined then the time is also included in the connect
field (to ensure backward compatibility with HAR 1.1). Use -1 if the
timing does not apply to the current request.

The following must be true:
entry.time == entry.timings.blocked + entry.timings.dns +
entry.timings.connect + entry.timings.send + entry.timings.wait +
entry.timings.receive;
(where 'entry' is an object in 'log.entries')
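
The invariant can be checked mechanically; here is a Python sketch
(it treats -1 as "does not apply", so such values contribute nothing,
and it deliberately does NOT add 'ssl' separately, since the proposal
folds it into 'connect'):

```python
def check_entry_time(entry):
    """Verify entry.time against the sum of its timing phases.

    'ssl' is not summed separately: per the proposal it is already
    included in 'connect'. A value of -1 means the phase does not
    apply, so it contributes nothing to the total.
    """
    timings = entry["timings"]
    phases = ["blocked", "dns", "connect", "send", "wait", "receive"]
    total = sum(max(timings.get(phase, -1), 0) for phase in phases)
    return entry["time"] == total

entry = {"time": 85,
         "timings": {"blocked": 0, "dns": -1, "connect": 15, "ssl": 5,
                     "send": 20, "wait": 38, "receive": 12}}
```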

---

Any thoughts about zipping the HAR file? Do we want to extend the
core HAR spec to support zipped HAR files? Also, in order to consume
HAR files online (within the browser), we could extend the core spec
to support JSONP. In that case the HAR log data would have to be
enclosed within a JS function as follows:

function onInputData({
"log": { ... }
});

This doesn't necessarily have to be in the spec, but there should at
least be some guidelines on how to properly handle both cases (ZIP,
JSONP).


Honza

Bryan McQuade

May 7, 2010, 11:39:49 AM5/7/10
to http-archive-...@googlegroups.com
IMO compression should be optional (the spec should not require that
every HAR file be stored compressed). Likewise, HAR programs should
not be required to support compressed HAR files. If compression
support is mandatory, you've required every program that supports HAR
to link in a decompressor, which might not be convenient/possible in
some cases. If a program does not support compressed HAR, then it's
the responsibility of the user to decompress before passing the HAR
file into the program.

Given all this I think you could probably just leave compression out
of the spec or mention it briefly ("In order to store HAR files more
efficiently, it is recommended that you compress HAR files on disk.").

If you are going to make a specific type of compression part of the
spec I would prefer gzip instead of zip since gzip is more widely
supported by web tools today (since gzip is commonly used over HTTP,
whereas zip is not).

Steve Souders

May 7, 2010, 11:55:14 AM5/7/10
to http-archive-...@googlegroups.com
What about Mark Nottingham's suggestions about storing information at
the raw byte level?

http://www.mnot.net/blog/2010/05/05/har

-Steve

Bryan McQuade

May 7, 2010, 12:17:20 PM5/7/10
to http-archive-...@googlegroups.com
Most of the hook points that HAR implementers have access to can't
get at the raw byte streams. For instance, there isn't an API in
Firefox to get at the raw streams, that I'm aware of. It could be an
optional part of the spec, but I don't think many tools would be able
to provide that kind of data with on-the-wire byte accuracy.

Sergey Chernyshev

May 9, 2010, 2:08:50 PM5/9/10
to http-archive-...@googlegroups.com
Re Compression:
I'm not 100% sure about the disk storage - I think it's up to a tool
whether it needs to compress the files when storing them permanently.
ShowSlow, for instance, zips HARs when storing them in MySQL.

As for the transfer encoding, I think we can leave it to HTTP to provide - again, ShowSlow.com uses gzip for compression by default - not internally, but by configuring Apache to do so: http://code.google.com/p/showslow/source/browse/trunk/.htaccess

I don't think it should be part of the spec itself (with an
additional extension, MIME type, and so on), but should probably be
described in a best practices section instead, being quite
traditional for HTTP-based communication.

Re: JSONP
It's a very good feature and should be encouraged. On the other hand,
though, I don't know if it should be part of the spec itself, as it
might bring too much confusion. I think the HAR itself should be just
raw JSON, but recommendations for online serving might include a
"callback" URL parameter that wraps the HAR in a call to a callback
function. BTW, it shouldn't be a function declaration but a call to
it (as you probably meant to do):


onInputData({
        "log": { ... }
});
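
Server-side, producing that wrapped form is a one-liner around the
JSON serialization (a Python sketch; 'onInputData' is just the
callback name from the example above):

```python
import json

def to_jsonp(har, callback="onInputData"):
    """Wrap a HAR object in a JavaScript callback invocation (a call,
    not a declaration) so it can be loaded via a <script> tag."""
    return "%s(%s);" % (callback, json.dumps(har))

payload = to_jsonp({"log": {"version": "1.2", "entries": []}})
```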

Again, I think it's a relatively traditional thing to do with JSON
payloads, so it probably needs to live in best practices.

BTW, I noticed that the HAR spec itself lives within Firebug's group
(probably for historical reasons), and it confused me a lot when I
originally tried to find all the information on the topic. It's
probably a good idea to move it to this group's Pages to encourage
independent usage ;)

Thank you,


        Sergey


--
Sergey Chernyshev
http://www.sergeychernyshev.com/


Jan Odvarko

May 10, 2010, 1:04:26 PM5/10/10
to HTTP Archive Specification
> BTW, I noticed that the HAR spec itself lives within Firebug's group
> (probably for historical reasons), and it confused me a lot when I
> originally tried to find all the information on the topic. It's
> probably a good idea to move it to this group's Pages to encourage
> independent usage ;)
Done, the spec is now here:
http://groups.google.com/group/http-archive-specification/web/har-1-1-spec

Honza

Jan Odvarko

May 10, 2010, 1:14:03 PM5/10/10
to HTTP Archive Specification
The following points (from this thread) also express my view.

== Compression ==
- Given all this I think you could probably just leave compression
out of the spec or mention it briefly ("In order to store HAR files
more efficiently, it is recommended that you compress HAR files on
disk.").

- I don't think it should be part of the spec itself (with additional
extension, mime type and so on), but should probably be described in
a best practices section instead, being quite traditional for
HTTP-based communication.

== JSONP ==
- I don't think it should be part of the spec itself (with additional
extension, mime type and so on), but should probably be described in
a best practices section instead, being quite traditional for
HTTP-based communication.

---

All sounds great to me. I would give it some more time to allow for
additional thoughts/comments/proposals, but my feeling is that we are
close to HAR 1.2!

Honza


Jan Odvarko

May 10, 2010, 1:15:09 PM5/10/10
to HTTP Archive Specification
On May 9, 8:08 pm, Sergey Chernyshev <sergey.chernys...@gmail.com>
wrote:
> It's a very good feature and should be encouraged. On the other hand though
> - I don't know if it should be part of the spec itself as it might bring too
> much confusion. I think the HAR itself should be just raw JSON, but
> recommendations for on-line serving might include "callback" URL parameter
> that should wrap HAR into a callback to a function. BTW, it shouldn't be a
> function declaration, but a call to it (as you probably meant to do):
Yes, precisely.
Honza