Since there is already a list of suggestions for additional HAR
fields, I believe it's a good time to start on HAR 1.2.
Here is the list:
1) Comments
2) Binary response bodies
3) Geo Location
4) Connection speed
5) Socket numbers, proxy info, port/IP + IPv6
6) Render/JS timing events and CPU/Mem utilization
7) Page screenshots
8) Grouping by process
9) Intermediate HTTP responses 1xx
10) Anything else... ?
I am presenting all the suggestions I have collected over time, but
that doesn't mean all of them must go into HAR 1.2. I would personally
pick only those that are immediately useful for an existing tool, so
that we get feedback about real usage scenarios.
My personal vote goes for: #1, #2
---
#1) === Comments ===
Every object (e.g. log, request, response, etc.) should have a
'comment' field that various tools can use to append and/or read
comments.
- comment [string, optional] - A comment provided by the user.
#2) === Binary response bodies ===
As mentioned also in this thread:
http://groups.google.com/group/http-archive-specification/browse_thread/thread/5b9b2943e87d205d?hl=en
...there are cases where also including binary responses (e.g. images)
in the HAR file would be useful.
This change would affect the existing <content> field. Current
definition:
"content": {
"size": 33,
"compression": 0,
"mimeType": "text/html; charset=utf-8",
"text": "<html><head></head><body/></html>\n"
}
Binary data should be encoded (e.g. using base64), and the encoding
used should also be stored so the importer knows how to decode it.
A new field:
- encoding [string, optional] - Encoding used for the response text,
e.g. "base64".
Notes:
* Exporters should implement an option to switch off binary response
export, since it can dramatically increase the HAR file size.
* The new 'encoding' field could be avoided if base64 is mandatory.
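
To make the proposal concrete, here is a minimal Python sketch of how an
exporter and an importer might handle the new field. The helper names
make_content and read_body are hypothetical, and treating anything that is
not valid UTF-8 as binary is only an assumption made for this sketch; a real
exporter would more likely decide based on the MIME type.

```python
import base64


def make_content(body: bytes, mime_type: str) -> dict:
    """Build a HAR-style <content> object (hypothetical helper).

    Bodies that are valid UTF-8 are stored as plain text. Anything else
    is base64-encoded and flagged with the proposed 'encoding' field.
    """
    content = {"size": len(body), "mimeType": mime_type}
    try:
        content["text"] = body.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: a body that is not valid UTF-8 is treated as binary.
        content["text"] = base64.b64encode(body).decode("ascii")
        content["encoding"] = "base64"
    return content


def read_body(content: dict) -> bytes:
    """Recover the original bytes from a <content> object (hypothetical helper)."""
    if content.get("encoding") == "base64":
        return base64.b64decode(content["text"])
    # No 'encoding' field: the text is assumed to be UTF-8 in this sketch.
    return content["text"].encode("utf-8")
```

If base64 were made mandatory for every body, the 'encoding' check in
read_body would disappear, but text responses would no longer be readable
directly in the HAR file.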
#3) === Geo Location ===
The geographic location would be included in the HAR. There should be
only one geographic location per page (?), so this could be part of
the existing <page> element.
Current definition:
{
"startedDateTime": "2009-04-16T12:07:25.123+01:00",
"id": "page_0",
"title": "Test Page",
"pageTimings": {...}
}
New Field:
- geoLocation [string, optional] - Geographical location of the
client.
Note:
- 'geoLocation' could instead be part of the <creator> or <browser>
fields.
#4) === Connection speed ===
Connection speed at the time of measurement. I believe this value is
related to page load speed and should be part of the <page> element.
New Field:
- bitrate [number, optional] - Connection speed (bit/s, bits per
second).
#5) === Socket Numbers ===
Should include the following:
+ Port/IP + IPv6 (source and destination)
+ Proxy info
I think this info should be part of the <entry> element.
Any specific suggestions for the structure?
#6) === Render/JS timing events and CPU/Mem utilization ===
Additional timing information. It could be part of the <page> element,
which already contains the <pageTimings> structure:
"pageTimings": {
"onContentLoad": 1720,
"onLoad": 2500
}
This structure could contain additional events and timing [ms]. Any
proposals for specific new fields?
#7) === Page screenshots ===
It should also be possible to store a list of page screenshots (taken
at various phases of the page load).
New field: <pageScreenshots> in <page>.
(list of screenshots)
"pageScreenshots": [
{
"data": "",
"time": 2500
},
{
...
}
]
- data [string] - Encoded screenshot data (base64).
- time [number] - Number of milliseconds since the page load started
(page.startedDateTime)
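
As an illustration, here is a minimal Python sketch of how a tool might
build one screenshot item. The helper name make_screenshot and its
parameters are hypothetical. It assumes both timestamps are ISO 8601
strings in the same format as page.startedDateTime.

```python
import base64
from datetime import datetime


def make_screenshot(image: bytes, page_started: str, taken_at: str) -> dict:
    """Build one screenshot item (hypothetical helper).

    'time' is the offset in milliseconds between the page start and the
    moment the screenshot was taken; 'data' is the base64-encoded image.
    """
    start = datetime.fromisoformat(page_started)
    taken = datetime.fromisoformat(taken_at)
    elapsed_ms = round((taken - start).total_seconds() * 1000)
    return {
        "data": base64.b64encode(image).decode("ascii"),
        "time": elapsed_ms,
    }
```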
#8) === Grouping by process ===
Additional grouping of requests. Currently requests are grouped by
parent page ID ('pageref' field in <entry>) + there is a list of pages
<pages>.
This would require a new field in <log> called <processes> and a new
field in <entry> called 'processref' (all optional).
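
To show how a viewer might use this, here is a minimal Python sketch that
groups entries by 'processref'. The function name group_by_process, the
shape of the items in <processes> (an object with an 'id'), and the sample
values are assumptions made for illustration only; nothing about that
structure has been agreed on yet.

```python
from collections import defaultdict


def group_by_process(log: dict) -> dict:
    """Group entries by their 'processref' (hypothetical helper).

    Entries without a 'processref' are collected under the key None,
    since the field is optional.
    """
    groups = defaultdict(list)
    for entry in log.get("entries", []):
        groups[entry.get("processref")].append(entry)
    return dict(groups)


# Hypothetical sample data: the 'processes' item layout is an assumption.
sample_log = {
    "processes": [{"id": "process_0"}, {"id": "process_1"}],
    "entries": [
        {"pageref": "page_0", "processref": "process_0"},
        {"pageref": "page_0", "processref": "process_1"},
        {"pageref": "page_0", "processref": "process_0"},
        {"pageref": "page_0"},
    ],
}
grouped = group_by_process(sample_log)
```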
#9) === Intermediate HTTP responses 1xx ===
Somebody proposed this one, but I can't remember the detailed
description. Anyone?
Comments?
Honza