All,
ActiveData[1] is a warehouse of our test times, build times, and whatever
other properties or measures I can get my hands on. It has a primitive
query tool[2] that a human can use to send queries to the public service
[3]. The `unittest` table has results from all tests that emit
structured logs, and is the largest at 4 billion records.
Getting a feel for what is in ActiveData will take a little time. I
would suggest starting with a short list of example documents.
`{"from": "unittest"}` will give you some example records, and
`{"from":"unittest", "groupby":"build.platform"}` will give you the
record count for each platform. There are also some starter docs on the
query tool page [2]. Please forgive the low speed; the service is
scaled down to minimize cost.
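If it helps, here is a minimal Python sketch of sending those example
queries to the public service [3]. The endpoint URL comes from [3]; the
assumption that it accepts a plain JSON POST is mine, so check the
query tool page [2] if the request format differs.

    # Minimal sketch: POST a JSON query document to the public service [3].
    import json
    import urllib.request

    ACTIVEDATA_URL = "http://activedata.allizom.org/query"

    def query(q):
        """Send a query document (a dict) and return the decoded response."""
        req = urllib.request.Request(
            ACTIVEDATA_URL,
            data=json.dumps(q).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read().decode("utf-8"))

    # A few example records from the unittest table:
    print(query({"from": "unittest"}))

    # Record counts broken down by platform:
    print(query({"from": "unittest", "groupby": "build.platform"}))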
Some other tables:
* {"from":"unittest.result.subtests"} a table of all nested unittest
documents: only has the failure messages, for now
* {"from":"jobs"} the buildbot job properties and timing (builds are not
properly categorized, he Treeherder code should be used for this),
* {"from":"jobs.action.timings"} each buildbot step, each mozharness
step, and their timing
* {"from":"orange_factor"} copy of the orangefactor data, plus `push_date`
* {"from":"perf"} all the Talos results, including replicates
I have only started to ingest the TaskCluster information.
[1] https://wiki.mozilla.org/Auto-tools/Projects/ActiveData
[2] http://activedata.allizom.org/tools/query.html
[3] http://activedata.allizom.org/query
On 2015-11-05 09:18, L. David Baron wrote:
> On Wednesday 2015-11-04 12:46 -0500, William Lachance wrote:
>> On 2015-11-04 10:55 AM, William Lachance wrote:
>>> 1. Relatively deterministic.
>>> 2. Something people actually care about and are willing to act on, on a
>>> per-commit basis. If you're only going to look at it once a quarter or
>>> so, it doesn't need to be in Perfherder.
>>>
>>> Anyway, just thought I'd open the floor to brainstorming. I'd prefer to
>>> add stuff incrementally, to make sure Perfherder can handle the load,
>>> but I'd love to hear all your ideas.
>> Someone mentioned "test times" to me in private email.
> That was me. (I didn't feel like sending a late-at-night
> one-sentence email to the whole list, and figured there was a decent
> chance that somebody else would mention it as well.)
>
> I think they're worth tracking because we've had substantial
> performance regressions (I think including as bad as a doubling in
> times) that weren't caught quickly, and led to substantially worse
> load on our testing infrastructure.
>
>> I do think test times are worth tracking, but probably not in Perfherder:
>> test times might not be deterministic depending on where / how they're
>> running (which makes it difficult to automatically detect regressions and
>> sheriff them on a per commit basis) and regardless there's too much data to
>> really be manageable by Perfherder's intended interface even if that problem
>> were magically solved.
> It seems like if we're running the same tests on different sorts of
> machines, we could track different perf numbers for the test run on
> different machine classes.
>
> We'd also want to measure the test time and *not* the time spent
> downloading the build.
>
> And we'd probably want to measure the total time across chunks so
> that we don't count redistribution between chunks as a set of
> regressions and improvements.
>
> So that does make it a bit difficult, but it does seem doable.
>
>> As a possible alternative, I believe Kyle Lahnakoski's ActiveData project
>> (https://wiki.mozilla.org/Auto-tools/Projects/ActiveData) already *does*
>> track this type of data but last I heard he was looking for more feedback on
>> how to alert/present it to the platform community. If you have any ideas on
>> this, please let him know (he's CC'ed). :)