Possible additions to HttpArchive?


Yusuke Tsutsumi

Oct 4, 2011, 1:44:07 PM
to HTTP Archive
Hi Steve!

So I've been talking to my colleagues at Zillow, and there is some
functionality they would like implemented in HttpArchive. I know some
of these might not fit HttpArchive's goals, but I was hoping you could
help me figure out a sensible way to add them.

1. Graphs for load time, document complete, start render, and fully
loaded

These are the performance statistics we are looking for, and it looks
like you generate these graphs from the columns in the Pages table.
Since HttpArchive isn't looking for this data, I believe it wouldn't
make sense to add new columns to that table. Instead, I was thinking
of adding another table that contains just the timing values plus a
field for the page label, so the two can be linked, and keeping a
separate .inc file with the methods that graph them. That would make
it extensible but an optional part of HttpArchive.

2. An API for triggering an HttpArchive run on a specific URL.

We were imagining a way to start a run of a specific site with an
HTTP request, something along the lines of:

httparchive.org/run?url=ESCAPED_URL_HERE&key=AUTHENTICATION_KEY

The key would be some sort of hash, so that not just anyone could
start a run.
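A minimal sketch of how such a request could be built and checked, assuming the "some sort of hash" is an HMAC-SHA256 of the target URL under a shared secret. The secret, the HMAC choice, and the function names (make_key, build_run_request, is_authorized) are illustrative assumptions; only the /run?url=...&key=... shape comes from the proposal, and that endpoint does not exist yet.

```python
import hashlib
import hmac
from urllib.parse import urlencode

# Hypothetical shared secret; HMAC-SHA256 is an assumed choice of "hash".
SECRET = b"replace-with-a-private-secret"


def make_key(target_url: str, secret: bytes = SECRET) -> str:
    """Derive an authentication key by hashing the target URL with the secret."""
    return hmac.new(secret, target_url.encode("utf-8"), hashlib.sha256).hexdigest()


def build_run_request(target_url: str, secret: bytes = SECRET) -> str:
    """Build a request of the proposed form /run?url=ESCAPED_URL&key=KEY."""
    # urlencode percent-escapes the target URL, including ':' and '/'.
    query = urlencode({"url": target_url, "key": make_key(target_url, secret)})
    return "http://httparchive.org/run?" + query


def is_authorized(target_url: str, key: str, secret: bytes = SECRET) -> bool:
    """Server side: recompute the key and compare in constant time."""
    return hmac.compare_digest(make_key(target_url, secret), key)


if __name__ == "__main__":
    print(build_run_request("http://www.zillow.com/"))
```

Binding the key to the URL means a leaked request only lets someone rerun that one site; a per-user key, as discussed later in the thread, would be simpler to hand out.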

My thinking is that this could be done along with the redesign of the
batch_process and batch_start methods, which you said needed to be
made more efficient.

I was also thinking about adding a page for simpler configuration of
HttpArchive, possibly with settings such as which graphs and trends to
show, so that people running private instances of HttpArchive can see
the information relevant to them.

Please let me know your thoughts on these. I know they aren't
necessarily in the scope of HttpArchive, but we feel HttpArchive
could become a versatile performance-testing tool as well as a great
look at trends in Internet technology.

Thanks! Sincerely,
-Yusuke

Steve Souders

Oct 4, 2011, 4:45:54 PM
to HTTP Archive
Hi, Yusuke.

1. There are three fields in the pages table that capture what you
want: onLoad, onContentLoaded, and renderStart. If you look in the
mysqldumps you'll see that we actually do collect the data for onLoad
and renderStart. I believe there was an issue with onContentLoaded. We
could revisit that and see whether it works; the problem is probably
in the IE test agent or the XML results. So you don't need another
table or any .inc changes. Just fix the data-gathering part. If I've
understood correctly and you want the fix, please file a bug. You
could start by checking whether onContentLoaded appears in the XML
results and is simply not being extracted.
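A rough sketch of that check: parse a saved XML result and report which timing fields are present. The element names in FIELDS (loadTime, render, domContentLoadedEventStart) are assumptions about the test agent's XML, not confirmed here, so verify them against a real results file first. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from pages-table field to XML element name.
FIELDS = {
    "onLoad": "loadTime",
    "renderStart": "render",
    "onContentLoaded": "domContentLoadedEventStart",
}


def find_timings(xml_text: str) -> dict:
    """Return {field: text or None}, searching every descendant of the root."""
    root = ET.fromstring(xml_text)
    found = {}
    for field, tag in FIELDS.items():
        elem = root.find(".//" + tag)  # first matching descendant, or None
        found[field] = elem.text.strip() if elem is not None and elem.text else None
    return found


def missing_fields(xml_text: str) -> list:
    """List the fields whose element is absent or empty in the XML."""
    return [field for field, value in find_timings(xml_text).items() if value is None]
```

If onContentLoaded shows up as missing here but the value is visible elsewhere in the raw file under a different name, the bug is in extraction rather than in the test agent.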

2. That sounds great. There's already the concept of a key:
wptky.inc.php. It's NOT committed to SVN. All it does is define
"$wptApiKey". We could ask Pat how the keys are generated and extend
the code to handle multiple keys. We'd probably also need to add a
"username" querystring param with which to look up the keys. So far
this part is easy. The hard part is how these results would fit into
the bigger system. HA is based on the concept of "runs" (for example,
the select list on the Trends page), so the entire concept of trends,
stats, selecting runs, etc. would have to be rethought. Alternatively,
these results could be accessed via a completely different UI.
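A minimal sketch of the multiple-key idea, written in Python rather than the PHP the .inc.php files suggest. The API_KEYS table stands in for the single $wptApiKey; its usernames and keys are made up, and authorize is a hypothetical helper, not existing HttpArchive code. How the real keys are generated is still the open question for Pat.

```python
import hmac
from urllib.parse import parse_qs, urlparse

# Hypothetical per-user key table replacing the single $wptApiKey.
API_KEYS = {
    "zillow": "example-key-1",
    "someone-else": "example-key-2",
}


def authorize(request_url: str, keys: dict = API_KEYS) -> bool:
    """Look up the key for the 'username' param and compare it with 'key'."""
    params = parse_qs(urlparse(request_url).query)
    username = params.get("username", [None])[0]
    key = params.get("key", [None])[0]
    if username is None or key is None:
        return False
    expected = keys.get(username)
    if expected is None:
        return False  # unknown user
    # Constant-time comparison avoids leaking how much of the key matched.
    return hmac.compare_digest(expected.encode("utf-8"), key.encode("utf-8"))
```

For example, a request to `/run?url=...&username=zillow&key=example-key-1` would be accepted, while a wrong key or an unknown username would be rejected.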

I hope to rewrite batch_* in the next few weeks. I'll start making
sure there's a bug for everything I'm working on so others can track
progress (about half the stuff I do isn't captured in bugs). But I
think the URL API is pretty unrelated to the batch_* rewrite.

Can you scope out the config page in more detail? Maybe in a bug or a
private email to me.

Thanks.

-Steve

Yusuke Tsutsumi

Oct 5, 2011, 6:18:25 PM
to HTTP Archive
Hi Steve,

1. Sorry, my mistake; I can't believe I missed that. I'll implement the
graphs and see whether something is off with the onContentLoaded trends.

2. I think either a different UI, or simply keeping those individual
results out of the main Trends and Stats pages. Is it possible to
distinguish a run of all sites from a single run of a specific site?
Perhaps a boolean column in MySQL could record whether each entry in
Pages was part of a single-site run or an All-Sites run.

I imagined a structure where an individual run could be viewed on
viewsites.php and would contribute to the trends for that individual
web page, but not to the main Trends or Stats pages. Those would show
only results from All-Sites runs.
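A small sketch of the boolean-column idea, using Python's built-in SQLite as a stand-in for MySQL. The isAllSitesRun column and the helper functions are hypothetical; the pages table here is cut down to a few columns (url, label, onLoad) for illustration and is not HttpArchive's real schema. The main trend filters on the flag, while a single site's trend includes every run of that site.

```python
import sqlite3

# Cut-down pages table; isAllSitesRun is the hypothetical boolean column.
SCHEMA = """
CREATE TABLE pages (
    pageid INTEGER PRIMARY KEY,
    url TEXT NOT NULL,
    label TEXT NOT NULL,                      -- run label, e.g. 'Oct 1 2011'
    onLoad INTEGER,                           -- milliseconds
    isAllSitesRun INTEGER NOT NULL DEFAULT 1  -- 1 = All-Sites run, 0 = single-site run
)
"""


def add_page(conn, url, label, on_load, all_sites=True):
    """Record one page result, flagged by the kind of run it came from."""
    conn.execute(
        "INSERT INTO pages (url, label, onLoad, isAllSitesRun) VALUES (?, ?, ?, ?)",
        (url, label, on_load, 1 if all_sites else 0),
    )


def main_trend(conn):
    """Average onLoad per run label, counting only All-Sites runs."""
    return conn.execute(
        "SELECT label, AVG(onLoad) FROM pages WHERE isAllSitesRun = 1 "
        "GROUP BY label ORDER BY MIN(pageid)"
    ).fetchall()


def site_trend(conn, url):
    """Every run of one site, including single-site runs."""
    return conn.execute(
        "SELECT label, onLoad FROM pages WHERE url = ? ORDER BY pageid", (url,)
    ).fetchall()
```

With two All-Sites results and one single-site rerun, the rerun appears in that site's trend but leaves the main trend's averages untouched.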

I will submit a bug outlining the config page in greater detail, along
with another bug for the run-starting API.

Very excited about these! Thanks for working with me on getting these
into HttpArchive.

Sincerely,
-Yusuke