FW: evangelizing pcap2har

19 views
Skip to first unread message

Jake Holland

unread,
Oct 26, 2010, 6:09:07 PM10/26/10
to pcap...@googlegroups.com
Attached file contains sample.pcap, sample_bad.har, and sample_good.har

-----Original Message-----
From: Jake Holland
Sent: Tuesday, October 26, 2010 3:06 PM
To: 'pcap...@googlegroups.com'
Subject: RE: evangelizing pcap2har

I spoke too soon when I said the viewer didn't complain. What it did was re-use my prior post, plus throw some junk on the end of what it was displaying. Apparently this viewer (this one is honza, right?) does not understand pageref, but it does understand page_ref:
http://www.softwareishard.com/har/viewer/

Here's a pcap and the associated har file in 2 versions: _bad uses "page_ref" and works on the honza viewer, and _good uses "pageref", as the spec says, and does not work. (Sorry I didn't just grab a new pcap and send it earlier, I was trying to get to lunch...)

I downloaded Fiddler, it appears to understand both equally. There's a red caution-looking symbol on each line, but I couldn't see how to get an actual error message, so perhaps there's something wrong with the "good" one also.

I have not tried HttpWatch, nor any of the other several readers listed here: http://groups.google.com/group/http-archive-specification/web/har-adopters

-----Original Message-----
From: pcap...@googlegroups.com [mailto:pcap...@googlegroups.com] On Behalf Of Libo Song
Sent: Tuesday, October 26, 2010 1:37 PM
To: pcap...@googlegroups.com
Cc: Ryan Witt
Subject: Re: evangelizing pcap2har

On Tue, Oct 26, 2010 at 4:15 PM, Jake Holland <jhol...@fastsoft.com> wrote:
> Hi Libo,
>
> Thanks for fixing the format problems, I hadn't noticed those, and the
> viewer I was using didn't complain.  I agree those changes for format
> are correct according to the har spec (c.f.
> http://groups.google.com/group/http-archive-specification/web/har-1-2-sp
> ec?hl=en).
>
> I also agree with the changes to include the connecting time in the
> first request, and that the logging changes are beneficial, so thanks
> for those as well.
>
> However, while reviewing your patch I had a few comments and questions
> about some changes I didn't fully agree with. Here are the questions
> (they refer to specific patches which I've included below all the
> questions):
>
>
> 1. I wasn't sure I understood patch0.  Can you send a pcap where this is
> meaningful?
>
> I think it's saying if there's a repeated header, then instead of making
> a list of values it concatenates the values--is that correct?
>
> If so, I'm not sure this represents correct behavior. I'm not surprised
> if it causes problems the original way, but I think I just read in the
> http spec that a comma should be inserted if you want to merge repeated
> http headers without changing the semantics:
> http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.2
> (The last paragraph in the section, specifically.)
> Based on this, I've made it "d[k] += ','+v" instead of "d[k] += v" and
> took out the commented part.
>

There are some repeated headers, especially the "Set-Cookie" header.
You fix is correct to insert "," between values, my fix was a quick hack
to make the harviewer happy.

>
> 2. In patch1, I think you use req and then afterwards check that it's
> usable.  If the req was bad, I think you get an exception in the first
> part.  Based on this reasoning, I've put the "if not req: continue"
> clause first.
>

Here I am trying to get the connection timing from the handshake packets.
Yes, it the req was bad, we should ignore it.

>
> 3. In patch2, I think the name change was not complete, so I changed "as
> e" to "as error" and the .append to use error instead of e.

Thanks for catching this.

>
>
> 4. In patch3, on "self.entries.sort(lambda x,y: x.ts_start -
> y.ts_start)" I got a "TypeError: comparison function must return int,
> not long", so I changed it to "int(x.ts_start - y.ts_start)".

I had seen this problem before, but I thought is was solved by convert
ts_connection to int value ts_start in millisecond. And I do not get that
error message any more.

>
>
> With the changes I've described, I've applied your patch (fixing
> conflicts where I had previously changed the logging that you removed)
> and tried it on the same pcap and viewer as before, and unless you have
> objections or can suggest a better integration path, I'll push them to
> my branch with attribution ("applied patches from Libo Song mostly
> fixing format errors") and request another pull from Andrew. Thanks.
>

Thanks.

>
> (patch fragments below)
>
> <patch0>
> diff --git a/dpkt_http_replacement.py b/dpkt_http_replacement.py
> index 4556979..bc0d69b 100644
> --- a/dpkt_http_replacement.py
> +++ b/dpkt_http_replacement.py
> @@ -24,9 +24,10 @@ def parse_headers(f):
>         k = l[0][:-1].lower()
>         v = len(l) != 1 and l[1] or ''
>         if k in d:
> -            if not type(d[k]) is list:
> -                d[k] = [d[k]]
> -            d[k].append(v)
> +#            if not type(d[k]) is list:
> +#                d[k] = [d[k]]
> +#            d[k].append(v)
> +            d[k] += v
>         else:
>             d[k] = v
>     return d
> </patch0>
>
>
> <patch1>
> diff --git a/http/flow.py b/http/flow.py
> index c7f0806..6e2e17d 100644
> --- a/http/flow.py
> +++ b/http/flow.py
> @@ -38,12 +39,21 @@ class Flow:
>                 pairable_responses.extend( [None for i in
> range(len(requests) - len(pairable_responses))] )
>             # if there are more responses, we would just ignore them
> anyway, which zip does for us
>             # create MessagePair's
> +            connected = False
>             for req, resp in zip(requests, responses):
> +                if not connected and tcpflow.handshake:
> +                    req.ts_connect = tcpflow.handshake[0].ts
> +                    connected = True
> +                else:
> +                    req.ts_connect = req.ts_start
> +                if not req:
> +                    logging.warning("Request is missing.")
> +                    continue
>                 self.pairs.append(MessagePair(req, resp))
>         except LookupError:
>             # there were no responses after the first request
>             # there's nothing we can do
> -            pass
> +            logging.warning("Request has no reponse.")
>
>  class MessagePair:
>     '''
> </patch1>
>
> <patch2>
> diff --git a/pcap.py b/pcap.py
> index fc2065c..d601318 100644
> --- a/pcap.py
> +++ b/pcap.py
> @@ -48,17 +48,14 @@ class TCPFlowAccumulator:
> <skipping>...</skipping>
>         except dpkt.dpkt.NeedData as e:
> +            log.warning(error)
>             log.warning('A packet in the pcap file was too short, '
>                         'debug_pkt_count=%d' % debug_pkt_count)
>             self.errors.append((None, e))
> </patch2>
>
> <patch3>
> diff --git a/httpsession.py b/httpsession.py
> index 9dd366b..3f109cb 100644
> --- a/httpsession.py
> +++ b/httpsession.py
> @@ -133,8 +134,8 @@ class HTTPSession(object):
>             # parse basic data in the pair, add it to the list
>             self.entries.append(Entry(msg.request, msg.response))
>
> -        # LSONG sort the entries on start
> -        self.entries.sort(lambda x,y: x.ts_start < y.ts_start)  # LSONG
> +        # Sort the entries on start
> +        self.entries.sort(lambda x,y: x.ts_start - y.ts_start)
>         self.user_agent = self.user_agents.dominant_user_agent()
>     def json_repr(self):
>         '''
> </patch3>
>
> -----Original Message-----
> From: pcap...@googlegroups.com [mailto:pcap...@googlegroups.com] On
> Behalf Of Libo Song
> Sent: Tuesday, October 26, 2010 11:22 AM
> To: pcap...@googlegroups.com
> Cc: Ryan Witt
> Subject: Re: evangelizing pcap2har
>
> I also generated a patch on the master branch, with some cleanup code.
> I have not tested on either Fiddler or HttpWatch. But I looked at the
> har
> file, and verified than "page_ref" is not "pageref", and the status code
> is
>  200 instead of "200".
>
> -libo
>
>
>

sample.tgz

Jake Holland

unread,
Oct 26, 2010, 8:33:45 PM10/26/10
to pcap...@googlegroups.com
I wrote an email to the address in their contacts on the hanzo viewer
and asked them to change to accept "pageref" as well as "page_ref".

I'm leaving that fix out for now in mine, because as far as I've seen
yet, more people accept the wrong version than accept the per-spec
version, particularly the highest hit when you search for "har viewer".
So I'll push that change when their implementation and the spec agree
with each other.

The other changes are pushed to my fork and another pull requested.
Thanks Libo.

Reply all
Reply to author
Forward
0 new messages