Data platform & tools update, Q1 2017


Georg Fritzsche

Apr 12, 2017, 2:22:09 PM
to fx-team, Firefox Dev, dev-platform, FHR-dev
The data platform and tools teams work on our core Telemetry system and
the data pipeline, provide core datasets, and maintain some central
data viewing tools.

To make new work more visible, we intend to provide quarterly updates.

What's new in the last few months?

On the data collection side, scalars
<https://gecko.readthedocs.io/en/latest/toolkit/components/telemetry/telemetry/collection/scalars.html>
are now supported through the pipeline, so new flag and count histograms
are now disallowed on Desktop in favour of boolean and uint scalars.

Event Telemetry
<https://gecko.readthedocs.io/en/latest/toolkit/components/telemetry/telemetry/collection/events.html>
is now ready for adoption. A general events table
<https://sql.telemetry.mozilla.org/queries/3415/source#table> is available,
a sync events table is coming up, and further uses are being looked at.

For documentation, we re-worked the guide for adding new Telemetry probes
<https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Adding_a_new_Telemetry_probe>
and extended the detailed data collection documentation
<https://gecko.readthedocs.io/en/latest/toolkit/components/telemetry/telemetry/collection/>.

The prototype for making probe history
<https://georgf.github.io/fx-data-explorer/> more discoverable now has
daily updates and supports Nightly too.

For filing or finding bugs, there is now a new Data Platform and Tools
<https://bugzilla.mozilla.org/describecomponents.cgi?product=Data%20Platform%20and%20Tools>
product. Note that client-side bugs still go into the separate
Toolkit::Telemetry component
<https://bugzilla.mozilla.org/enter_bug.cgi?product=Toolkit&component=Telemetry>.

The data pipeline work powers results for re:dash
<https://sql.telemetry.mozilla.org/> and custom analysis
<https://analysis.telemetry.mozilla.org/> among other things.

Notable recent work here includes:

- Providing efficient lookup of client histories using HBase
  <https://python-moztelemetry.readthedocs.io/en/stable/userguide.html#module-moztelemetry.hbase>.
- Experimental support for Zeppelin
  <https://mail.mozilla.org/pipermail/fhr-dev/2017-March/001210.html>, a
  new notebook type that improves on Jupyter.
- The Telemetry dashboard
  <https://telemetry.mozilla.org/new-pipeline/dist.html> is now faster
  through a dedicated read replica and client-side caching.
- The Dataset API now has a select method
  <http://python-moztelemetry.readthedocs.io/en/stable/userguide.html#moztelemetry.dataset.Dataset.select>
  to return a subset of fields.
- Providing a framework for testable Python ETL jobs
  <https://github.com/mozilla/python_etl> generated from a template
  <https://github.com/harterrt/cookiecutter-python-etl>.
- Direct-to-parquet
  <https://mozilla-services.github.io/lua_sandbox_extensions/parquet/sandboxes/heka/output/s3_parquet.html>
  is in production, making it easier to build datasets from incoming pings.
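
To illustrate what returning a subset of fields means for the select method
mentioned above, here is a minimal sketch of the general idea in plain
Python. It is not the moztelemetry implementation or its API: the helper
select_fields and the sample ping are hypothetical, made up for this
illustration. See the linked user guide for the real method.

```python
def select_fields(record, **fields):
    """Return a flat dict holding only the requested fields of a nested record.

    Each keyword maps an output name to a dotted path into the record.
    Hypothetical helper for illustration only.
    """
    result = {}
    for name, path in fields.items():
        value = record
        for key in path.split("."):
            # A missing key (or a non-dict on the way) yields None.
            value = value.get(key) if isinstance(value, dict) else None
            if value is None:
                break
        result[name] = value
    return result


# Made-up sample record, loosely shaped like a ping.
ping = {
    "clientId": "abc",
    "environment": {"system": {"os": {"name": "Linux"}}},
    "payload": {"histograms": {}},
}

subset = select_fields(
    ping,
    client_id="clientId",
    os_name="environment.system.os.name",
)
```

Keeping only the fields an analysis needs means less data has to be carried
through the rest of a job.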


The data tools work powers tools that make data analysis more accessible
across Mozilla.

Updates here are:

- For re:dash <https://sql.telemetry.mozilla.org/>, the UI improved to
  make the dashboard list more accessible.
- re:dash query issues were reduced by handling failing queries using
  exponential back-off.
- There is also a python re:dash client
  <https://github.com/mozilla/redash_client> (h/t to emtwo), allowing
  programmatic generation of queries and dashboards.
- The distribution viewer <https://gauss.telemetry.mozilla.org/> is now
  live, making distributions of a set of important Firefox metrics available.
- The analysis service <http://analysis.telemetry.mozilla.org/> gained
  features
  <https://github.com/mozilla/telemetry-analysis-service/blob/master/WHATSNEW.md>
  like persistent cluster storage and the ability to extend cluster lifetimes.
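
For readers unfamiliar with the exponential back-off mentioned above for
failing re:dash queries, the general technique can be sketched as follows.
This is a generic illustration, not the code used in re:dash: the function
name retry_with_backoff, its parameters, and the added jitter are all
assumptions.

```python
import random
import time


def retry_with_backoff(run_query, max_attempts=5, base_delay=1.0,
                       max_delay=60.0, sleep=time.sleep):
    """Call run_query(), retrying failures with exponentially growing delays.

    Hypothetical helper for illustration only.
    """
    for attempt in range(max_attempts):
        try:
            return run_query()
        except Exception:
            # After the final attempt, give up and surface the error.
            if attempt == max_attempts - 1:
                raise
            # The delay doubles each time: base, 2*base, 4*base, ... (capped).
            delay = min(max_delay, base_delay * 2 ** attempt)
            # A little random jitter keeps many failing queries from
            # retrying in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

Passing the sleep function as a parameter keeps the sketch easy to test
without actually waiting.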


Coming soon

For the next few months, interesting projects in the pipeline include:

- Work to decrease data latency by sending the last ping of a Firefox
  session immediately. We will also start sending timely pings for new users
  and updates.
- Rebooting documentation
  <https://docs.google.com/presentation/d/1zWbzDCNkM5tzR9K6WgO4vR7fpiuJDP-JBNLrYDsbeUA/edit#slide=id.g1d58c03b5b_0_1>,
  providing guidance as well as tying existing documentation together.
- Supporting new data collection from add-ons in Telemetry, starting
  with events.


Contact us

Please reach out to us with any questions or concerns.

You can find us on IRC in #telemetry and #datapipeline.

The main mailing list for data topics is fhr-dev
<https://mail.mozilla.org/listinfo/fhr-dev>.

Bugs can be filed in one of these components
<https://wiki.mozilla.org/Telemetry#Filing_Bugs>.

You can also find us on Twitter as @MozTelemetry
<https://twitter.com/moztelemetry>.