Testing Service


Pat Meenan

Feb 22, 2011, 3:23:49 PM
to web-testin...@googlegroups.com
Ok, I think we have quorum to at least get started.  I was thinking that the best process would be something along the lines of:

1 - Define the kinds of operations that will need to be performed (which will turn into the APIs)

2 - Agree on a standard API calling convention

3 - Work out the details of the individual APIs.

If that sounds reasonable I'll throw out the first volley on what I think needs to be exposed and a little bit of rationale for each (once we get it refined a little better we can start documenting on the wiki):

Testing Service

Location/Configuration Enumeration
We need a way to query for the different test machine configurations and capabilities.  I think this will probably need to be a filterable query that returns a flat list rather than a hierarchy: for some consumption models it may make sense to say "give me all of the physical locations" and then list the configurations in each location, but I can also see a model where you would want the list of tester pools that have a particular browser or that are on a particular carrier.

If we go this route we'll probably need a collection of entry-points:
- Query for the list of parameters that can be used for filtering and their possible values
- Query for the actual tester pools (and identify fields of data you'd like back about each pool)
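
To make the two entry-points concrete, here's a rough sketch of what consuming them might look like (the endpoint names, parameters and fields below are placeholders only, nothing here is decided):

    import requests

    BASE = "http://testing.example/api"  # hypothetical service root

    # 1 - ask the service which filter parameters it supports and their values
    filters = requests.get(BASE + "/filters").json()
    # e.g. {"location": ["dulles", "london"], "browser": ["IE8", "Firefox"],
    #       "connectivity": ["DSL", "3G"]}

    # 2 - query the tester pools themselves, filtered, asking for specific fields back
    pools = requests.get(BASE + "/pools",
                         params={"browser": "IE8",
                                 "fields": "id,location,connectivity"}).json()
    for pool in pools["pools"]:
        print(pool["id"], pool["location"], pool["connectivity"])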

Test Submission
The ability to submit a test request to a given tester pool/configuration

Check Status/Retrieve Test Result
I'm thinking that the fetching of a result and checking of the status of a test request can be combined into a single interface, but I'm not religious about it.  If the test isn't complete yet then just the status would be returned.
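
As a sketch of what I mean (names invented purely for illustration), a single call could return either the status or the finished result:

    import requests

    resp = requests.get("http://testing.example/api/result",
                        params={"test_id": "110222_AB12CD"}).json()

    if resp["statusCode"] == 200:    # complete - the result payload is included
        result = resp["data"]
    elif resp["statusCode"] < 200:   # 1xx - still queued or running
        print("not done yet:", resp["statusText"])
    else:                            # 4xx/5xx - something went wrong
        print("test failed:", resp["statusText"])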

Cancel/Delete Test Result
The testing service isn't expected to provide long-term storage of results.  At a minimum you'll need to be able to cancel a pending test but it wouldn't hurt to allow for an explicit delete of ephemeral results as well.

Notification/Beaconing of Results
In addition to polling for status/result we should allow for the testing service to push results (or just notification of results being available) using a standard interface.

Thanks,

-Pat

Alois Reitbauer

Feb 22, 2011, 4:24:50 PM
to web-testin...@googlegroups.com
Pat,

this sounds good. It is pretty much what I also need for my distributed ShowSlow testing service.  I also think we need a synchronous and an asynchronous approach toward retrieving test results. Infrastructure-wise the synchronous approach will be easier to implement.  

One thing we still might want to put on the list is accessing the actual test data results. The people building the data storage might not be the ones ultimately using the data. In our case, for example, dynaTrace Ajax Edition queries an online service to get performance benchmark data. I would also like to add this functionality to the spec.

For protocols I would go for REST-based calls and JSON as the data format. Both are easy to implement and can be used across programming languages - OK, C++ might still be a bit hard here ;-).

// Alois

Pat Meenan

Feb 22, 2011, 4:49:14 PM
to web-testin...@googlegroups.com
Yeah, when we get to the test data archive service I was going to recommend it being flexible enough to accommodate public results (and enumeration interfaces for public and private) which should be a good fit for the benchmark data repository.

Sync vs. Async is an interesting point.  I think we might be able to accommodate having the test request take a sync flag and return the results directly, but not every provider will be able to implement that (some of my tests can take hours to complete if the queues get backed up, for example).

I like REST + JSON/JSONP as well, but I'd like to see something where we have a well-defined structure that works for JSON/JSONP as well as XML (probably influenced by the fact that I'm a C++ programmer at heart and the JSON libraries all SUCK).  I think we can come up with something that's not too onerous to implement and makes everyone happy.
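
For example, an envelope along these lines (purely illustrative) would serialize naturally to JSON, be trivially wrappable in a JSONP callback, and map one-to-one onto XML elements:

    # illustrative response envelope - same shape for every API call
    response = {
        "statusCode": 200,        # 1xx = in progress, 200 = ok, 4xx/5xx = error
        "statusText": "Ok",
        "data": {                 # call-specific payload lives under a single key
            "testId": "110222_AB12CD",
        },
    }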

Christopher Joel

Feb 22, 2011, 5:15:35 PM
to web-testin...@googlegroups.com, Pat Meenan
Pat,

Your proposed flow follows our internal testing flow almost identically. A few thoughts:

Test submission:
In our internal testing tools, we found it useful for the test submission service to respond with information about the test's position in the job queue. The user interface uses this data while attempting to recommend delivery mechanisms to the user; a short job queue means that the user could potentially wait for results to appear on the page, whereas a long queue may lead to an email response being the preferable delivery mechanism.
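
Something along these lines (field names invented purely for illustration) would give the UI what it needs to make that recommendation:

    # hypothetical submit response carrying queue information
    submit_response = {
        "statusCode": 101,
        "statusText": "Queued",
        "data": {
            "testId": "110222_AB12CD",
            "queuePosition": 3,           # short queue: let the user wait on the page
            "estimatedWaitSeconds": 600,  # long queue: suggest e-mail delivery instead
        },
    }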

Asynchronous / synchronous model:
During our development phase, we started with a synchronous model and moved to an asynchronous model after testing queues grew too long (which was basically the same time we started to have queues).

Check status / data retrieval service:
Combining these services is a good idea and would result in more efficient / simplified service consumption.

Formats:
From the perspective of a web-based application consuming this data, JSON is definitely the preferred format, but flexibility is always a plus.

Result notification / beaconing:
If email is not the obvious candidate for this, are there any likely solutions on your mind currently? This is a really interesting piece of the flow to consider.

Chris

Sergey Chernyshev

Feb 22, 2011, 6:34:41 PM
to web-testin...@googlegroups.com, Pat Meenan
I believe we need to make sure we cover the ability for all parties to implement API keys and test prioritization - if not in the protocols themselves, then at least made possible through URL namespacing and the like.
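
e.g. something like this, purely as an illustration of the namespacing idea:

    import requests

    # hypothetical: the API key lives in the URL namespace, priority is just a parameter
    requests.post("http://testing.example/api/K_1234ABCD/test",
                  data={"url": "http://www.example.com/", "priority": 5})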

I'll probably add that we might need to dig a bit into the data formats for test results as well.
I think there are quite a few cases where we can standardize the data returned by testing tools - HAR (for capturing network activity) being one example, some ranking/metric data format is another one (which I'm particularly interested in), video capture might be another.

          Sergey

Pat Meenan

Feb 22, 2011, 6:15:56 PM
to Christopher Joel, web-testin...@googlegroups.com
I think we can combine the synchronous/async and immediate status by just having the test submit response return the same information as the status/retrieval.  If the test could be completed synchronously then the data will be available directly in the result and if not you'll have the normal information that you'd get from a status check.
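
Roughly like this (a sketch only, with made-up names) - one code path handles both the sync and async cases:

    import requests

    resp = requests.post("http://testing.example/api/test",
                         data={"url": "http://www.example.com/", "sync": 1}).json()

    if resp["statusCode"] == 200:
        result = resp["data"]              # provider ran it synchronously - done
    else:
        test_id = resp["data"]["testId"]   # same fields a status check would return
        # poll /result with test_id, or wait for the notification POST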

For the result notification/beaconing, I don't expect end-users to be consuming the service directly but for something to be built on top of it.  E-mail doesn't work well for service-to-service notification; I was thinking of a simple HTTP POST to a supplied URL, which would allow for a push model instead of relying just on polling.  You could build something that took the HTTP POST and turned it into a mail message, but I don't think we want to use SMTP as a transport for the service itself.  It should probably also be an optional optimization, with polling always being supported.  Sort of hidden in here is also allowing for things like Page Speed/YSlow beacons, but I see that more as a test configuration option than an explicit way to get results.
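
As a sketch, the receiving side could be as simple as this (Flask is used only for illustration and the payload fields are made up):

    from flask import Flask, request

    app = Flask(__name__)

    @app.route("/test-complete", methods=["POST"])
    def test_complete():
        notification = request.get_json()
        # e.g. {"testId": "110222_AB12CD", "status": "complete",
        #       "resultUrl": "http://testing.example/api/result?test_id=110222_AB12CD"}
        print("test finished:", notification["testId"])
        return "", 204

    if __name__ == "__main__":
        app.run(port=8080)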

Thanks,

-Pat

Patrick Lightbody

Feb 22, 2011, 6:52:13 PM
to web-testin...@googlegroups.com
These seem fine to me - all pretty straightforward. The one I think is not so clear is the test automation itself. Perhaps that is part of Test Submission, but we need to figure out _what_ is being submitted. A URL? A script? My two cents:

We need to support scripts, since there are so many areas of a site worth tracking for performance that can't be identified with a simple URL. So the question is: what language should the scripts be in, and what underlying APIs should we support? Personally, I think Selenium has done a fantastic job of working to become a standard. In fact, Opera has already signed up to write their own implementation of Selenium, and we (the Selenium devs, which I am part of) are working to lobby Mozilla and Google to do the same for their browsers.

That said, simply saying "we'll use Selenium" isn't enough, because the traditional Selenium languages are either too simple for complex use (Selenese, the HTML/table-style format) or too insecure for a hosted, shared service (Java/Ruby/C#/etc). What we've done at BrowserMob is adopt JavaScript (using Rhino), wrapped around Selenium. It lets our users write their scripts in a full-featured language that can easily be sandboxed. It's a bonus that JS is a platform-neutral language, so whether you're a Ruby zealot or Python pro or Java guru, everyone can get behind JS :)

In short: my recommendation is that the scripts should be Selenium + JavaScript. 

Also, related to this: does the site being tested have to be accessible to the network we're submitting to, or should we consider some sort of tunnel/proxy system to allow external services to test internal networks? (IMO this is probably out of scope for now)

--
We provide FREE website monitoring and load testing

Patrick Lightbody
BrowserMob


Christopher Joel

Feb 22, 2011, 7:13:13 PM
to web-testin...@googlegroups.com, Patrick Lightbody
I'm totally on the same page as you with regards to beaconing. I was personally hoping that email was not the preferred solution. I'm glad that's true! A simple POST hook sounds like a good answer to me.

We currently append additional metrics along with the HAR data that we collect, including video capture URLs, so I second Sergey's suggestion. It would be nice if we could standardize measurements for other aspects that impact performance.

Perhaps my perspective is unrealistic, but I feel like a standard set of APIs does not require tying implementors to a specific software stack in this case. It seems like it would be more flexible for everyone if we could define API calls and the responses to expect, and leave the details of actual data collection to the person implementing the service.

Selenium is great, but is it in scope to prescribe an explicit software stack that an implementor of this standard must use? I feel like there is more long term value if we lean more in the direction of standardizing request and response formats.

While I hesitate to agree that tying to a specific testing framework is the right direction, I think there is great merit to Patrick Lightbody's idea that a scripting language qualify as a value for the testing target. JavaScript's portability would also afford implementors freedom of choice when it comes to choosing software to run the test.

Chris

Patrick Lightbody

Feb 22, 2011, 7:22:40 PM
to Christopher Joel, web-testin...@googlegroups.com
I hear you on not trying to be too specific to certain software stacks, but I think Selenium has enough going for it that it's something we should consider hanging our hat on. Obviously if we're too generic in this project very little will get done :)

The main reasons I think Selenium's APIs (wrapped in a neutral language like JS) are a safe way to go:

1) It's open source, so therefore likely to be Less Evil

2) It's very popular - I'm happy to share stats on site traffic to the Selenium website, downloads, etc

3) It's working well with "competing" projects, such as Watir - future versions of Watir are wrapping around Selenium's core engine (WebDriver), resulting in Watir simply being a preferred Ruby-based API around the Selenium core

4) Now with Opera on board, there is evidence of at least one browser vendor supporting it directly. If we can get all the vendors on board, Selenium moves away from being an implementation in the stack and becomes simply an API that the browser vendors implement against.

Patrick

--
We provide FREE website monitoring and load testing

Patrick Lightbody
BrowserMob



Pat Meenan

Feb 22, 2011, 7:31:46 PM
to web-testin...@googlegroups.com
FWIW, I think we're going to have to support multiple scripting languages for testing anything beyond just straight URLs.  Most testing services have an existing proprietary language of some kind that is going to need to continue to be supported.  The main thing we'd need to be able to standardize on is how to package up the script and deliver it (do they need to be self-contained and text-based or do we allow for referencing external files and packaging up a collection of files?).

It would be great if we had a reference implementation that could be shared across projects and that we'd eventually converge towards as an industry but I think requiring a specific implementation is going to be a little too restrictive for bringing everyone on board.

It looks like most of the discussions are getting into implementation specifics so I'll go ahead and start documenting the high-level functions and the implementation notes that have been going around and we can start fleshing out some more of the details.

Thanks,

-Pat

Patrick Lightbody

Feb 22, 2011, 7:38:33 PM
to web-testin...@googlegroups.com

FWIW, I think we're going to have to support multiple scripting languages for testing anything beyond just straight URLs.  Most testing services have an existing proprietary language of some kind that is going to need to continue to be supported.  The main thing we'd need to be able to standardize on is how to package up the script and deliver it (do they need to be self-contained and text-based or do we allow for referencing external files and packaging up a collection of files?).

Probably supporting a package of files is important. For example, a script might need to upload a file or maybe we need to click on a Flash element that can't be easily expressed via text but can via an image.

So, side-stepping the Selenium conversation and focusing more on the API: if you want to support continuous monitoring, I'd recommend that the spec not require that the script (or tarball of script data) be submitted directly every time. Rather, the script should be registered once, and then simply the ID of the script would be sent to the servers (along with perhaps a last-modified time). This allows implementations to cache the script locally while also reducing overall bandwidth and stress on any internal queueing systems.
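
Sketched out (hypothetical endpoints and fields only):

    import requests

    # register (or update) the script package once
    with open("checkout-flow.tar.gz", "rb") as f:
        reg = requests.post("http://testing.example/api/scripts",
                            files={"package": f}).json()
    script_id = reg["data"]["scriptId"]

    # every subsequent run only references the registered script by ID
    requests.post("http://testing.example/api/test",
                  data={"scriptId": script_id,
                        "scriptModified": "2011-02-22T19:38:00Z"})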

Hope that makes sense and doesn't make things too complex.

Patrick

Pat Meenan

Feb 22, 2011, 7:41:30 PM
to web-testin...@googlegroups.com
Yep, that sounds like a great idea.  There's going to be a fair amount of overhead on running scripts anyway and registering them through an additional interface keeps the actual test submission itself simpler.

Patrick Lightbody

Feb 22, 2011, 8:43:30 PM
to web-testin...@googlegroups.com
I think that the initial beacon should sum up and standardize the beacons that ShowSlow is using plus what others provide today. AFAIK ShowSlow at the moment consumes three different beacons - Page Speed, dynaTrace and YSlow. I think there is space for standardization ;-).  I think many parts of the beacon will be optional while others are required. If a tool cannot create a video - that's fine, and the same is true for other metrics.

On the topic of beacons, last time this came up on the list I was proposing that we push as much as possible to post-processing. That is: capture just the HAR and any other unique data, and then compute things like YSlow and PageSpeed scores against that captured data. Things like har_to_page_speed are making that more of a reality.

Patrick




Christopher Joel

Feb 22, 2011, 9:20:12 PM
to web-testin...@googlegroups.com
This has probably been mentioned somewhere already, but in response to this I'd like to mention that some metrics related to page performance (an analysis of the post-script-execution DOM such as the one DOM Monster does, for instance) are difficult or impossible to derive from HAR data alone, so let's not forget them.

Chris


Patrick Lightbody

Feb 22, 2011, 9:24:39 PM
to web-testin...@googlegroups.com
Agreed - that came up the last time this was brought up. But even still, it would be nice to capture just that data rather than capture a score that doesn't reflect raw data but instead reflects an interpretation of that data. Given that YSlow and PageSpeed and whatever scores are ultimately just a set of rules, it seems to me capturing those makes a lot less sense than capturing the root data that feeds into the rules. Of course, I'm open to being convinced otherwise :)

Patrick

Alois Reitbauer

Feb 22, 2011, 9:41:37 PM
to web-testin...@googlegroups.com
I think we should capture both kinds of data. The dynaTrace ShowSlow beacon, for example, sends both. I also second the statement that some data must be precomputed. We, for example, capture a lot of data which we pre-process to get metrics out of it. The raw data is still available, but sending a couple of MBs for every test case would be overkill - I guess :-)

The truth will as always be somewhere in the middle ;-)

// Alois

Sergey Chernyshev

Feb 22, 2011, 9:53:22 PM
to web-testin...@googlegroups.com
Regarding the scripting, I think we have to bite off small enough pieces to be able to chew them. Clearly there are many ways to skin the scripting cat and, in general, a gazillion ways to instruct a robot to do testing. I believe we need to go in a direction that does not impose anything that should be optional, and that provides means for extensibility.

I think we can define a small set of standard "testing instructions", which should include:
  • single URL to be tested
    the simplest form, which can be copied by a user, triggered by a bookmarklet and so on - a minimal address, enough for a large set of tools
  • Selenium script wrapped into something, with some attachments if needed
    I'll leave this to the experts as I don't have enough experience here
  • custom code package, maybe with minimal meta-data like a type
    assuming that the hub and tester know what the code is supposed to do and trust each other, it can be any file that can be executed - an exe, a jar, a shell script in any language and so on.
I think this makes it more or less universal, but if somebody has some other universal format to propose, we can probably add it to the list for the reference implementation. In any case, I believe all of these instructions should be implemented in a way that makes them interchangeable. We'll just define these three as examples, but the format should be easily extensible.
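
For example, all three could travel through the same submission call and differ only in the type flag (field names here are invented just to illustrate the idea):

    # three interchangeable "testing instruction" payloads (illustrative only)
    url_test      = {"type": "url", "target": "http://www.example.com/"}
    selenium_test = {"type": "selenium", "scriptId": "S_42",
                     "attachments": ["upload.png"]}
    package_test  = {"type": "package", "scriptId": "P_17",
                     "meta": {"runtime": "shell"}}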

Now, regarding deriving stuff from the HAR: obviously it can be used widely, considering that one of the hardest parts of testing is collecting actual network activity, but it is still far from covering everything in the world and can't be used for the majority of cases. We should, however, make it a solid part of the standard and provide enough examples for it.

For an example of an exception to the HAR rule, look at the new piece of data ShowSlow supports as of this Presidents' Day - DOM Monster! metrics (yes, Alois, it's not 3 tools ;)). This bookmarklet tests the browser, not network activity at all.

In some cases, everything the hub will store is just a reference to another server - take, for example, the test link that ShowSlow stores for tests run on WebPageTest. It's enough to reference the data; there is no need to duplicate it.

My point is that there is no universal answer for beacons; we can just provide the most common cases, like:
  • HAR payload
  • a few specific high-level statistics in aggregated form (possibly calculated from the HAR) that have become standard (e.g. TTFB, time to render, time to onLoad, total page size)
  • best-practice-based ranking (like YSlow/Page Speed/dynaTrace)
  • plus some way to add any others
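
A rough sketch of what such a beacon could carry, with everything beyond the HAR optional (all field names are invented):

    beacon = {
        "testId": "110222_AB12CD",
        "har": {"log": {}},                  # full HAR payload when available
        "metrics": {                         # aggregated high-level statistics
            "ttfb_ms": 120,
            "render_ms": 850,
            "onload_ms": 2300,
            "bytes_total": 412345,
        },
        "rankings": [                        # best-practice scores, optional
            {"tool": "Page Speed", "score": 82},
            {"tool": "YSlow", "score": "B"},
        ],
        "extensions": {                      # anything tool-specific goes here
            "resultUrl": "http://testing.example/result/110222_AB12CD/",
        },
    }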
Maybe we should invite people who are doing wire-level listening and ask them whether storing pcap files makes sense.

Just my 2cents.

I think we have to start relatively small, close to the hardware and then add higher level standards, otherwise we'll never finish it ;)

Pat, do you think we should use http://www.webperformancecentral.com/ for this initiative and start drafting the actors, sections of standards and so on?

           Sergey

Alois Reitbauer

Feb 22, 2011, 8:19:11 PM
to web-testin...@googlegroups.com
I think that the initial beacon should sum up and standardize the beacons that ShowSlow is using plus what others provide today. AFAIK ShowSlow at the moment consumes three different beacons - Page Speed, dynaTrace and YSlow. I think there is space for standardization ;-).  I think many parts of the beacon will be optional while others are required. If a tool cannot create a video - that's fine, and the same is true for other metrics.

Regarding Patrick's proposal on Selenium: I think we need support for both URLs and scripts. We can do this by providing a type flag in the request. My question is what you mean by Selenium scripts. Are we talking about the "old" API or the new WebDriver API? I also do not know whether WebPagetest can consume these scripts. For me they can be proprietary for now (having the type field) and be a focus of further standardization. If we make any changes here a lot of tools will have to be adapted.

// Alois


Christopher Joel

Feb 22, 2011, 7:35:18 PM
to web-testin...@googlegroups.com, Pat Meenan
Again, I think Selenium is great, and could potentially stand as the ideal reference implementation right now. I'm just not convinced that standardizing against any specific software stack is the right idea because, as Pat indicated, everyone is doing their own thing right now.

Chris

Eddie Jaoude

Feb 23, 2011, 2:07:28 AM
to web-testin...@googlegroups.com, Christopher Joel
Great discussion, all. It's really getting interesting now.

My 2 pence (sorry, from London), regarding the result notification:
Pushing the notification would be better than continuous polling (using
something like XMPP), especially for large companies that use monitoring
screens and will have the result overview page open 24/7. However, could
this be a phase 2 enhancement?

Another clarification we will eventually need is what areas of expertise
we all have and what parts we will each develop. Also, is it worth using
something like GitHub for the code development? (Sorry if I'm jumping ahead,
it just came to mind, that's all.)

Pat Meenan

Feb 23, 2011, 8:18:56 AM
to web-testin...@googlegroups.com
On the development side, I think it's important to remember that we're
defining a spec, not building an application or service. My team will
commit to migrating WebPagetest to use (and expose) the standard
interfaces but I expect lots of other implementations to surface (both
commercial and open source). More importantly, I hope to see lots of
new applications built that can consume the testing service. If group
members want to self-form into project teams that build on top of the
specs I'd love to help in any way I can but that's not the primary
purpose of the group.

I think the NOC View/Monitoring screen details probably fall into the
application domain and there are lots of ways to crack that nut but I
don't think we want to push the implementation details into the spec.
XMPP is interesting in that it's easy to route through firewalls and
network zones but my gut is telling me that that may be going further
than we need to (at least for the initial design). You could always
build an XMPP relay on top of the HTTP POST notification as part of a
toolset or application.

Thanks,

-Pat

Pat Meenan

Feb 23, 2011, 8:29:41 AM
to web-testin...@googlegroups.com
I set up a Google Code project that can host both the documentation and reference implementations, test suites, etc: http://code.google.com/p/web-testing-framework/. That way it's not tied to me and I feel a little better about the SLAs and backups.  I have set up some of the more active members of the discussions as group owners, and if anyone is interested in working on the docs or code directly just let us know and we can get you added as a committer.  The wiki support isn't nearly as nice as a full Sites page but I think it will work for the docs.

I'll get some initial pages started today and then we can take the discussion from there.

Thanks,

-Pat

Pat Meenan

Feb 23, 2011, 4:00:39 PM
to web-testin...@googlegroups.com
Ok, I have modified my original suggestions based on the feedback and put it up on the wiki here: http://code.google.com/p/web-testing-framework/wiki/TestingServiceAPI.  I think we have enough at a high level to start working on the actual details of each of the services.  I'll start up a separate thread for the interface standards themselves and then we can start getting into the nitty-gritty.


Thanks,

-Pat
