including image bodies in HAR files

Bryan McQuade

Apr 28, 2010, 2:42:01 PM
to HTTP Archive Specification, mdst...@google.com
Hi,

I see that NetExport, and I believe HttpWatch as well, do not include
image response bodies in the HAR file.

We'd like to be able to include image responses in HAR files so we can
process them using the har_to_pagespeed command line tool and other
tools that use HAR with the Page Speed SDK
(http://code.google.com/p/page-speed/wiki/DownloadPageSpeed?tm=2).

I propose base64-encoding binary resource bodies. It would also be
important to annotate the body encoding in the HAR file so the
processor knows how to decode the body.
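
A minimal sketch of what this proposal could look like in practice, written in Python. The `encoding` field name and the shape of the content object are assumptions made for illustration; nothing in the thread so far fixes how the annotation would be spelled. The helper names `encode_body` and `decode_body` are hypothetical.

```python
import base64
import json


def encode_body(body: bytes, mime_type: str) -> dict:
    """Build a HAR-style content object holding a base64-encoded body."""
    return {
        "size": len(body),          # size of the original, undecoded bytes
        "mimeType": mime_type,
        "text": base64.b64encode(body).decode("ascii"),
        "encoding": "base64",       # assumed annotation field name
    }


def decode_body(content: dict) -> bytes:
    """Recover the raw bytes, using the annotation to pick the decoder."""
    text = content.get("text", "")
    if content.get("encoding") == "base64":
        return base64.b64decode(text)
    # No annotation: treat the body as plain text.
    return text.encode("utf-8")


# Round-trip the first bytes of a PNG file through JSON, as a HAR file would.
png_signature = b"\x89PNG\r\n\x1a\n"
content = encode_body(png_signature, "image/png")
round_tripped = decode_body(json.loads(json.dumps(content)))
```

Without the annotation, a processor could not tell a base64 string from a text body that merely looks like one, which is why the decoder branches on the field rather than guessing.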

Would you be open to supporting base64 encodings for binary resource
bodies in HAR files?

Bryan McQuade

Apr 29, 2010, 12:38:30 PM
to Sergey Chernyshev, http-archive-...@googlegroups.com, mdst...@google.com
I'm not actually aware of anything else missing from HAR files at the moment.

HAR maintainers, any thoughts on how we can add images and other
binary resources to HAR files?

On Wed, Apr 28, 2010 at 4:09 PM, Sergey Chernyshev
<sergey.c...@gmail.com> wrote:
> Which brings up a more general question - are there any other parts of the
> sequence that don't get included in HAR?
>             Sergey

Jan Odvarko

Apr 29, 2010, 12:51:20 PM
to http-archive-...@googlegroups.com
I believe that including image bodies (base64 encoded) would be a great
addition to HAR, and I think we could include it in HAR 1.2.

I am attending the WWW2010 conference (Raleigh, NC) this week, so please
let me get back to this as soon as I am back (next week). There is
more we could still include in 1.2 (e.g. comments).

Honza

Sergey Chernyshev

Apr 28, 2010, 4:09:26 PM
to http-archive-...@googlegroups.com, mdst...@google.com, Bryan McQuade
Which brings up a more general question - are there any other parts of the sequence that don't get included in HAR?

            Sergey

Andy Sterland

Apr 29, 2010, 4:05:17 PM
to http-archive-...@googlegroups.com
Could this be done for all content that isn't text? Or could we at least have a clear definition of what makes a response an image?

It could be a content-type of image/*, but many sites return images as text/html because they serve images from a dynamic URI such as a PHP or ASP.NET page. In that case either the content has to be sniffed or some extra information has to be provided. For a browser, the fact that the URI was referenced in an img tag is a huge indicator, but that is not something all recorders can know.

Anyway, it would be great to have clarity on what qualifies as an image, as not all exporters will have the same perspective.
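
To make the ambiguity above concrete, here is a rough Python sketch of one possible rule: trust a declared image/* content type, and otherwise sniff the first bytes of the body for well-known image signatures. The function names and the choice of formats (PNG, GIF, JPEG) are assumptions for illustration, not a definition anyone in this thread has agreed on.

```python
from typing import Optional

# Leading byte signatures of a few common image formats.
IMAGE_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
]


def sniff_image_type(body: bytes) -> Optional[str]:
    """Return a MIME type if the body starts with a known image signature."""
    for signature, mime_type in IMAGE_SIGNATURES:
        if body.startswith(signature):
            return mime_type
    return None


def is_image(content_type: str, body: bytes) -> bool:
    """Hypothetical rule: declared image/* type, or a sniffed signature."""
    declared = content_type.split(";")[0].strip().lower()
    if declared.startswith("image/"):
        return True
    return sniff_image_type(body) is not None


# A GIF served by a dynamic page that mislabels it as text/html.
mislabelled = is_image("text/html; charset=utf-8", b"GIF89a\x01\x00\x01\x00")
plain_html = is_image("text/html", b"<html></html>")
```

A recorder that sees only headers would get the first case wrong, which is the gap between exporters that the post points out.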

Sergey Chernyshev

Apr 29, 2010, 5:34:39 PM
to http-archive-...@googlegroups.com
Yeah, if all bodies can be in the HAR, that would be great. It should be optional, though, as space constraints might not allow people to store the whole thing (it can easily be a few megabytes a pop on quite ordinary sites).

I imagine tools like NetExport will have an option to export full bodies, but the default should probably not be full (I imagine ShowSlow dying pretty quickly if an automated HAR beacon saves data into it periodically ;)).

        Sergey

Mark Nottingham

Apr 29, 2010, 8:30:42 PM
to http-archive-...@googlegroups.com, mdst...@google.com, Bryan McQuade
Timely question.

I'm looking at integrating HAR (import and export, although not necessarily at once) into REDbot (<http://redbot.org/>).

The problem I have is that I don't have a lot of confidence that different HAR producers will create the format in exactly the same way. Because part of what RED does is HTTP conformance checking, it's important that it has access to the exact header values (not post-processed ones), the delimiters, spaces, and indeed the bytes on the wire to check for things like headers in the wrong character encoding.

RED could still work with HAR as-is, of course, but it wouldn't be able to run a fair number of tests.

Has anyone considered using a format that just annotates (probably by byte offset) the raw HTTP response with things like timing information, etc.?
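
As a purely illustrative Python sketch of that idea (no such format is defined in this thread, and the `offset`, `length`, `label`, and `received_ms` names and the timing values are all made up), the raw response bytes stay untouched and the metadata points into them:

```python
# Raw bytes exactly as received, including the unusual double space
# after "Content-Type:" that a post-processed header list would lose.
raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type:  text/plain\r\n"
    b"\r\n"
    b"hello"
)

# The body starts right after the blank line that ends the header block.
header_end = raw.index(b"\r\n\r\n") + 4

# Hypothetical annotations: each one names a byte range and attaches data.
annotations = [
    {"offset": 0, "length": header_end,
     "label": "headers", "received_ms": 120},
    {"offset": header_end, "length": len(raw) - header_end,
     "label": "body", "received_ms": 135},
]


def span(data: bytes, annotation: dict) -> bytes:
    """Return the exact bytes an annotation refers to."""
    start = annotation["offset"]
    return data[start:start + annotation["length"]]


headers_bytes = span(raw, annotations[0])
body_bytes = span(raw, annotations[1])
```

Because nothing is parsed and re-serialized, a conformance checker still sees the original delimiters and whitespace.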

Cheers,
--
Mark Nottingham http://www.mnot.net/

Jan Odvarko

May 3, 2010, 9:23:19 AM
to HTTP Archive Specification
I have summarized all suggested additions to the HAR spec here:
http://groups.google.com/group/http-archive-specification/browse_thread/thread/58ad65da99f4e30e?hl=en

I have also included the request for binary responses (#2).

Honza

On Apr 28, 10:09 pm, Sergey Chernyshev <sergey.chernys...@gmail.com>
wrote:
> Which brings up a more general question - are there any other parts of the
> sequence that don't get included in HAR?
>
>             Sergey
>