Hi!
2014/1/21 Jean-Yves Perrier <jper...@mozilla.com>
> On 20/01/2014 20:02, Luke Crouch wrote:
>
>> * What is the primary problem and for whom?
>>
> The primary problem for writers is that the same data is repeated several
> times on MDN:
> 1. On the main article page
> 2. On tutorial pages
> 3. On summary pages (lists of properties, interfaces, methods, ...; for
> Gecko or for all browsers)
>
> And all this is multiplied by the number of locales.
> When something changes in compatibility (browser XY now implements it), an
> editor must edit dozens of pages to keep everything coherent. This doesn't
> happen, so we end up with inconsistencies.
This has a huge impact on maintaining relevant data across all of MDN.
Because our data are not centralized, changing a single piece of information
requires manual edits on many pages. This has two consequences: first, it
takes a lot of time and is a potential source of errors (each time we edit
the same thing we can make a mistake); second, localization requires several
different people to edit different pages, which increases the risk of
introducing mistakes and of having outdated data across pages, making that
information less accurate for users.
So in short:
- For users: data can be incoherent (and therefore useless), so we need
to find a way to keep our data always in sync across all of MDN.
- For contributors: maintaining data is time-consuming, so we need a way
to make data contribution easier and less error-prone.
As a consequence, if users are frustrated by useless data, it will hurt our
ability to convince them that MDN is a trustworthy resource, and they will
move away from MDN. On the other side, if contributing data is a pain in the
ass, fewer contributors will make the effort and the average quality of our
data will drop, which risks creating a vicious circle with the previous
problem: less accurate data = fewer users; fewer users = fewer contributors;
fewer contributors = less accurate data… and the loop is closed.
Of course this is not the biggest problem we can find on MDN, but it is
something we should fix before it is too late.
>> * Are there alternative or complementary solutions already?
>>
> We do it manually, but I have refused to create some summary pages until
> now because they would be unmaintainable.
Manual maintenance is showing its limits and is already slowing us down in
using that data to improve our content.
TL;DR: regarding MDN's needs, I don't know of any alternative solutions, but
I do know of some complementary ones.
From my knowledge, the only project I know of that does this is the
caniuse.com web site. Their data are pretty good, and their contribution
workflow has aspects that are both good and bad. Basically, they export the
content of their database as JSON files on GitHub and re-import them once
changed.
I see two issues with caniuse.com:
- They do not provide a way to directly access the data remotely to use
it somewhere else (if we want to use their data, we have to import it into
our own database… and that is definitely something we should consider);
- The scope of their data is quite narrow, and we have a larger set of
technologies and browsers to cover on MDN, especially legacy stuff.
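To make the "import it into our own database" idea a bit more concrete, here
is a minimal Python sketch. The URL and the JSON schema (a "data" object with
a "title" per feature) are assumptions based on a quick look at their GitHub
repository and would need to be verified before relying on them:

import json
import urllib.request

# Assumed location of the caniuse dataset on GitHub; to be verified.
URL = "https://raw.githubusercontent.com/Fyrd/caniuse/master/data.json"

with urllib.request.urlopen(URL) as response:
    dataset = json.loads(response.read().decode("utf-8"))

# Assumed schema: one entry per feature, keyed by a short identifier,
# with a human-readable title and per-browser support information.
for feature_id, feature in dataset.get("data", {}).items():
    print(feature_id, "-", feature.get("title"))

From there, loading each feature into whatever store we choose would be
straightforward; the hard part is mapping their browser/version vocabulary
onto ours.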
Browserscope.org is another data source I'm aware of. It could provide us
with a huge amount of raw data, especially if we want to automate grabbing
information from tests someone could write. However, it is very rough data
that cannot be used as is, and there are two issues that need to be handled
carefully:
- The data does not necessarily fit all our needs. At first glance, it
lacks the ability to annotate the data with some context (we currently do
this a lot on MDN), for example how to work around a missing feature in a
given browser.
- The data can be tainted. Browserscope is a statistical tool that can
produce false positives or false negatives depending on how the features
were tested (who wrote the tests, who maintained them, who ran the tests,
under which conditions, etc.). Therefore we cannot grab the data blindly; we
would need some review process to minimize possible errors, and it's
uncertain whether that is worth the effort (but it's worth checking).
Finally, the official W3C test suites are another possible data source. They
cover a lot of ground and are actively maintained. However, the data are not
easy to get, and they do not necessarily cover a large set of browsers (and
their various versions).
Those are the three examples I have in mind; I would be pleased if someone
has others to share :)
>> * How will we measure if we impact the problem or not?
>>
> This is obvious in this case: if we use it, it solves the problem; if it
> doesn't solve it, we just won't use it.
>
This is harder than it looks. The impact on users is very hard to measure.
A simple change in the number of visitors or in the number of contributions
would not be significant. The best way to measure the impact on users would
be to use a listening platform able to track what people think about MDN
content. But that is difficult and requires specific knowledge and tools. My
company builds such a tool (http://en.steerious.com/), but there are others.
The impact on contributors is also very hard to measure. It would be
possible if we built some specific contribution paths and tracked them in
order to count contributions, but as we don't have any existing data about
such specific contributions, it will be hard to check for progress.
As this project intends to improve quality (better data, easier
contribution), it's pretty hard to measure the direct effect of such an
improvement. The only thing we can measure for sure is the accomplishment of
what we set out to do. For example, it is currently possible to check
whether data are out of sync (warning: it's not easy to do) and to check,
after we have improved our system, whether the data are more coherent.
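For what it's worth, here is a very rough sketch of how such a check could
work: fetch the same article in a few locales and compare only the version
numbers found in the compatibility table. The article, the URL pattern, and
the id="compat-table" marker are hypothetical placeholders, not the real MDN
markup, so this is only an illustration of the approach:

import re
import urllib.request

LOCALES = ["en-US", "fr", "ja"]
ARTICLE = "Web/CSS/transform"   # hypothetical example article

def compat_numbers(html):
    # Placeholder marker: whatever stable attribute the compatibility table
    # carries in the markup. Localized pages should keep the same version
    # numbers, so comparing the numeric cells is locale-independent.
    match = re.search(r'<table[^>]*id="compat-table".*?</table>', html, re.S)
    if match is None:
        return None
    return sorted(re.findall(r"\d+(?:\.\d+)?", match.group(0)))

versions = {}
for locale in LOCALES:
    url = "https://developer.mozilla.org/{0}/docs/{1}".format(locale, ARTICLE)
    with urllib.request.urlopen(url) as response:
        versions[locale] = compat_numbers(response.read().decode("utf-8"))

# If any locale reports a different set of version numbers (or no table at
# all), the pages are out of sync for this article.
reference = versions["en-US"]
out_of_sync = [loc for loc, nums in versions.items() if nums != reference]
if out_of_sync:
    print("Out of sync with en-US:", out_of_sync)

Running something like this over a sample of articles before and after the
change would at least give us a rough coherence metric.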
I'm welcoming any ideas on that matter.
> A second problem to solve is to assess and improve the quality of the
> data we have. Here, the creation of summary pages that help with
> assessment, and the comparison of our data with different sources, will
> allow us to discover errors.
I agree. The idea is that in our current state, we are limited in what we
can do with our data. If we rationalize the way we handle the data, we will
be able to create new ways to use it on MDN, or elsewhere (there is already
interest from people in reusing our data: webplatform.org wants to do it,
and caniuse.com already uses MDN as a source of information for its own
data).
So rationalizing our compatibility data opens up many opportunities. If we
expose our data through an API, it will be easy to measure whether people
use it.
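Just to illustrate, here is a tiny sketch of what such an API could look
like, with a naive per-feature counter so usage can be measured. The
endpoint, the data store, and the example values are all hypothetical:

from collections import Counter
from flask import Flask, jsonify

app = Flask(__name__)
hits = Counter()          # naive usage metric: number of requests per feature

# Hypothetical centralized store; values are placeholders, not real data.
COMPAT = {
    "css-transforms": {"firefox": "16", "chrome": "36", "ie": "10"},
}

@app.route("/compat/<feature>")
def compat(feature):
    hits[feature] += 1    # lets us see whether people actually use the API
    return jsonify(COMPAT.get(feature, {}))

if __name__ == "__main__":
    app.run()

In practice we would plug real analytics in front of it, but even a counter
like this would already tell us more than we know today.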
This is just my own vision; if anyone has ideas to share, please do :)
Best,