Cleaning up test results / managing result aggregation in "broad" categories

Mihai Balan

Sep 14, 2012, 8:20:55 AM
to browse...@googlegroups.com
Hi everybody,
I'm currently using Browserscope to track feature support for some new stuff we're adding in WebKit (and consequently in Chrome/Chromium). Up to this point, for each new feature we would add a new test in our suite and then run it again. As such, with the latest version of WebKit or Chrome Canary all tests appear as passing.

However, I'm considering moving to a model where tests for all features are written up front. Initially some of them will fail, but they will progressively pass as we implement the respective features. My concern is that the early failing results will keep affecting the aggregated results even after a feature has been implemented. Using a very simple example, here's my understanding of how Browserscope works:
  • let's say we have 3 features, A, B and C.
  • Initially, none of them is implemented, so in WebKit X.0 we'll have three scores of 0
  • Then, in WebKit X.1, A is implemented. Now, running the test suite, I'll have one 100 score for A and two 0 scores for B and C
  • Then, in WebKit X.2, B is implemented. Now, running the test suite, I'll have two 100 scores for A and B, and one 0 score for C
  • Finally, in WebKit X.3, C is implemented too. Now, running the test suite, I'll have three 100 scores for A, B and C
  • At this point, my rows in Browserscope look like this:
    (A, B, C)
    0, 0, 0
    100, 0, 0
    100, 100, 0
    100, 100, 100

Here's my question now: If I use a "broad" category, such as "Top Browsers" or "Browser families", will C show as being unsupported? (Given it has three 0 results and only one 100 result)

If so, what's the best way to make sure my results, even when aggregated, show that C is supported (if you feel it sounds a little counter-intuitive, you're right :) )? Should I run the tests several more times for each new release? (I doubt that would actually scale.) Is there a way to delete individual test results, or even all the test results for a given user agent?

Hope this makes some sense to you :)

Thanks in advance,
Mihai

--
Balan Mihail-Alexandru

Blog: http://blog2michou.blogspot.com/
Photoblog: http://mihaibalan.wordpress.com/

Lindsey Simon

Sep 17, 2012, 2:43:31 AM
to browse...@googlegroups.com
Hello Mihai,

Yes - you've hit upon a situation where Browserscope is not very good. The aggregates will indeed be wrong until the median catches up.
There is not currently a way to delete test results.
The best we can do is try to keep Top Browsers more up to date / cutting edge - I think it's the main list people look at, and in fact I think no one really looks at the Families aggregates very seriously - they are just too broad to be useful now that browser versions release with such frequency.
Does this answer your question? 
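
To make the "median catches up" point concrete, here is a minimal sketch that aggregates the A/B/C rows from the example above. It assumes the aggregate is a plain per-feature median over all submitted rows, and it uses Python's `statistics.median`, which averages the two middle values for an even number of rows; Browserscope's actual tie handling may differ. The `aggregate` helper is hypothetical, not part of Browserscope.

```python
from statistics import mean, median

# One row per WebKit release (X.0 .. X.3); columns are features A, B, C,
# copied from the example earlier in the thread.
rows = [
    (0, 0, 0),
    (100, 0, 0),
    (100, 100, 0),
    (100, 100, 100),
]


def aggregate(rows, func):
    """Apply func to each feature column across all rows (hypothetical helper)."""
    return tuple(func(column) for column in zip(*rows))


median_scores = aggregate(rows, median)
mean_scores = aggregate(rows, mean)

print("median (A, B, C):", median_scores)
print("mean   (A, B, C):", mean_scores)
```

Under these assumptions the median for C stays at 0 even though the newest release passes, and it only flips once passing rows outnumber the failing ones.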



miChou

Sep 18, 2012, 3:45:28 AM
to browse...@googlegroups.com
Thanks for the answer. It does clear things up, although that also means there's no quick fix for our use-case :D

That being said, there are a couple more things to ask:
1. What is the actual function used when doing the aggregation? I suspect it's average, but it'd be useful to know for sure.
2. (and the most important one) To be totally honest, clearing up the results table would've been just a workaround. In my use case, what I would really like to do, as simply as possible, is show the test results for the latest version of each browser. How hard would it be to add a new category (e.g. v=latest) that returns the latest browsers in each "family"? I'm not asking you to code it - I would gladly do it if you're OK with the idea.
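
A rough sketch of what a "latest per family" selection could look like, just to illustrate the idea. It assumes user-agent names have the form "Family major.minor.build" (e.g. "Chrome 23.0.1259") and compares versions numerically. The function names `parse_ua` and `latest_per_family` are made up for this example and are not Browserscope code.

```python
import re


def parse_ua(name):
    """Split 'Chrome 23.0.1259' into ('Chrome', (23, 0, 1259))."""
    match = re.fullmatch(r"(.+?)\s+(\d+(?:\.\d+)*)", name.strip())
    if match is None:
        raise ValueError(f"unrecognized user-agent name: {name!r}")
    family, version = match.groups()
    return family, tuple(int(part) for part in version.split("."))


def latest_per_family(ua_names):
    """Return {family: ua_name}, keeping the highest version seen per family."""
    best = {}
    for name in ua_names:
        family, version = parse_ua(name)
        # Tuples compare element by element, so (23, 0) > (9, 0) as expected.
        if family not in best or version > best[family][0]:
            best[family] = (version, name)
    return {family: name for family, (_, name) in best.items()}


if __name__ == "__main__":
    names = ["Chrome 22.0.1229", "Chrome 23.0.1259", "Firefox 14", "Firefox 15.0.1"]
    print(latest_per_family(names))
```

Comparing versions as integer tuples rather than strings avoids ranking "9.0" above "23.0".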

Thanks a lot in advance,
m.

Steve Souders

Sep 18, 2012, 2:32:34 PM
to browse...@googlegroups.com, miChou
1. Browserscope shows the median result (not average).
2. I wonder if you might achieve your goal by using the &ua parameter to specify the "latest" version. For example, I recently ran a user test that found a bug where Chrome 23 did NOT clear localStorage. It got fixed in Chrome 23.0.1259.
    - All Versions (v=3) has 8 rows for Chrome 23 which might be more detail than I want to show.
    - The aggregate results for Major Versions (v=1) shows that Chrome 23 fails the test (when actually the latest version of Chrome 23 passes the test).
    - I can generate a URL that shows the results for the latest version of Chrome by manually (yes, manually) adding the desired UA names with the &ua parameter. This shows that "Chrome 23.0.1259" passes the localStorage test.
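
Building such a URL by hand gets tedious, so here is a small sketch of generating it. Only the `v` and `ua` parameter names come from the steps above; the base URL is a placeholder, and joining several UA names with commas is an assumption that should be checked against a real results page. `results_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Placeholder only: substitute the real results-table URL for your test.
BASE_URL = "http://example.org/results-table"


def results_url(ua_names, version_level=3, base_url=BASE_URL):
    """Build a results URL restricted to the given user-agent names.

    Assumes multiple names are passed as one comma-separated `ua` value.
    """
    query = urlencode({"v": version_level, "ua": ",".join(ua_names)})
    return f"{base_url}?{query}"


if __name__ == "__main__":
    print(results_url(["Chrome 23.0.1259"]))
```

`urlencode` takes care of escaping the spaces in names such as "Chrome 23.0.1259".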

-Steve
To view this discussion on the web visit https://groups.google.com/d/msg/browserscope/-/gBoSUIRqaCYJ.

Lindsey Simon

Sep 21, 2012, 5:17:09 PM
to browse...@googlegroups.com
On Tue, Sep 18, 2012 at 9:45 AM, miChou <mihai...@gmail.com> wrote:
> Thanks for the answer. It does clear things up, although that also means there's no quick fix for our use-case :D

Does keeping the top list more up to date not solve your issue, though? That is, if the "family" results list is inaccurate, it doesn't seem like it really matters.

> That being said, there are a couple more things to ask:
> 1. What is the actual function used when doing the aggregation? I suspect it's average, but it'd be useful to know for sure.
> 2. (and the most important one) To be totally honest, clearing up the results table would've been just a workaround. In my use case, what I would really like to do, as simply as possible, is show the test results for the latest version of each browser. How hard would it be to add a new category (e.g. v=latest) that returns the latest browsers in each "family"? I'm not asking you to code it - I would gladly do it if you're OK with the idea.

This is technically what we ought to do for "top" - notice we have "Top Desktop Edge" and we could have a "Top Mobile Edge" category too. I'd love it if you could help us keep these lists up to date, and I would gladly grant you commit and push access to do so. JD Dalton also helps with this.
It's slightly a pain in the butt because you'll need to update both a Python file and a CSV file that includes release dates (to help with graphing).
Are you game for that?!
 