Hi!
2014/1/21 Jean-Yves Perrier <jper...@mozilla.com>
> On 20/01/2014 20:02, Luke Crouch wrote:
>
>> * What is the primary problem and for whom?
>>
> The primary problem for writers is that the same data is repeated several
> times on MDN:
> 1. On the main article page
> 2. On tutorial pages
> 3. On summary pages (lists of properties, interfaces, methods, ...; for
> Gecko or for all browsers)
>
> And all this is multiplied by the number of locales.
> When something changes in compatibility (browser XY now implements it), an
> editor must edit dozens of pages to keep everything coherent. This doesn't
> happen, so we end up with inconsistencies.
This has a huge impact on maintaining relevant data across all of MDN.
Because our data are not centralized, changing a single piece of information
requires manual edits on many pages. This has two consequences: first, it
takes a lot of time and is a potential source of errors (each time we edit
the same thing we can make a mistake); second, localization requires several
different people to edit different pages, which increases the risk of
introducing mistakes and of having outdated data across pages, making that
information less accurate for users.
So in short:
- For users: data can be incoherent (and therefore useless), so we need
to find a way to keep our data always in sync across all of MDN.
- For contributors: maintaining data is time-consuming, so we need a way
to make data contribution easier and less error-prone.
As a consequence, if users are frustrated by useless data, it will hurt our
ability to convince them that MDN is a trustworthy resource, and they will
move away from MDN. On the other side, if contributing data is a pain in the
ass, fewer contributors will make the effort and the average quality of our
data will drop, which risks creating a vicious circle with the previous
problem: less accurate data = fewer users; fewer users = fewer contributors;
fewer contributors = less accurate data… and the loop is closed.
Of course this is not the biggest problem we can find on MDN, but it is
something we should fix before it is too late.
>> * Are there alternative or complementary solutions already?
>>
> We do it manually, but I have refused to create some summary pages until
> now because they would be unmaintainable.
Manual maintenance is showing its limits and is already slowing us down in
using that data to improve our content.
TL;DR: regarding MDN's needs, I don't know of any alternative solutions, but
I do know of some complementary ones.
From my knowledge, the only project I know of that does this is the
caniuse.com web site. Their data are pretty good, and their contribution
workflow has aspects that are both good and bad. Basically, they export the
content of their database as JSON files on GitHub and re-import them once
changed.
I see two issues with caniuse.com:
- They do not provide a way to directly access the data remotely to use
it somewhere else (if we want to use their data, we have to import it into
our own database… and that is definitely something we should consider);
- The scope of their data is quite narrow, and we have a larger set of
technologies and browsers to cover on MDN, especially legacy stuff.
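To make the "import it into our own database" idea a bit more concrete, here
is a minimal Python sketch. The URL and the JSON schema (a "data" object with
a "title" per feature) are assumptions based on a quick look at their GitHub
repository and would need to be verified before relying on them:

import json
import urllib.request

# Assumed location of the caniuse dataset on GitHub; to be verified.
URL = "https://raw.githubusercontent.com/Fyrd/caniuse/master/data.json"

with urllib.request.urlopen(URL) as response:
    dataset = json.loads(response.read().decode("utf-8"))

# Assumed schema: one entry per feature, keyed by a short identifier,
# with a human-readable title and per-browser support information.
for feature_id, feature in dataset.get("data", {}).items():
    print(feature_id, "-", feature.get("title"))

From there, loading each feature into whatever store we choose would be
straightforward; the hard part is mapping their browser/version vocabulary
onto ours.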
Browserscope.org is another data source I'm aware of. It could provide us
with a huge amount of raw data, especially if we want to automate grabbing
information from tests someone could write. However, it is very rough data
that cannot be used as is, and there are two issues that need to be handled
carefully:
- The data does not necessarily fit all our needs. At first glance, it
lacks the ability to annotate the data with some context (we currently do
this a lot on MDN), for example how to work around a missing feature in a
given browser.
- The data can be tainted. Browserscope is a statistical tool that can
produce false positives or false negatives depending on how the features
were tested (who wrote the tests, who maintained them, who ran the tests,
under which conditions, etc.). Therefore we cannot grab the data blindly; we
would need some review process to minimize possible errors, and it's
uncertain whether that is worth the effort (but it's worth checking).
Finally, the official W3C test suites are another possible data source. They
cover a lot of ground and are actively maintained. However, the data are not
easy to get, and they do not necessarily cover a large set of browsers (and
their various versions).
Those are the three examples I have in mind; I would be pleased if someone
has others to share :)
>> * How will we measure if we impact the problem or not?
>>
> This is obvious in this case: if we use it, it solves the problem; if it
> doesn't solve it, we just won't use it.
>
This is harder than it looks. The impact on users is very hard to measure.
A simple change in the number of visitors or in the number of contributions
would not be significant. The best way to measure the impact on users would
be to use a listening platform able to track what people think about MDN
content. But that is difficult and requires specific knowledge and tools. My
company builds such a tool (http://en.steerious.com/), but there are others.
The impact on contributors is also very hard to measure. It would be
possible if we built some specific contribution paths and tracked them in
order to count contributions, but as we don't have any existing data about
such specific contributions, it will be hard to check for progress.
As this project intends to improve quality (better data, easier
contribution), it's pretty hard to measure the direct effect of such an
improvement. The only thing we can measure for sure is the accomplishment of
what we set out to do. For example, it is currently possible to check
whether data are out of sync (warning: it's not easy to do) and to check,
after we have improved our system, whether the data are more coherent.
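For what it's worth, here is a very rough sketch of how such a check could
work: fetch the same article in a few locales and compare only the version
numbers found in the compatibility table. The article, the URL pattern, and
the id="compat-table" marker are hypothetical placeholders, not the real MDN
markup, so this is only an illustration of the approach:

import re
import urllib.request

LOCALES = ["en-US", "fr", "ja"]
ARTICLE = "Web/CSS/transform"   # hypothetical example article

def compat_numbers(html):
    # Placeholder marker: whatever stable attribute the compatibility table
    # carries in the markup. Localized pages should keep the same version
    # numbers, so comparing the numeric cells is locale-independent.
    match = re.search(r'<table[^>]*id="compat-table".*?</table>', html, re.S)
    if match is None:
        return None
    return sorted(re.findall(r"\d+(?:\.\d+)?", match.group(0)))

versions = {}
for locale in LOCALES:
    url = "https://developer.mozilla.org/{0}/docs/{1}".format(locale, ARTICLE)
    with urllib.request.urlopen(url) as response:
        versions[locale] = compat_numbers(response.read().decode("utf-8"))

# If any locale reports a different set of version numbers (or no table at
# all), the pages are out of sync for this article.
reference = versions["en-US"]
out_of_sync = [loc for loc, nums in versions.items() if nums != reference]
if out_of_sync:
    print("Out of sync with en-US:", out_of_sync)

Running something like this over a sample of articles before and after the
change would at least give us a rough coherence metric.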
I'm welcoming any ideas on that matter.
> A second problem to solve is to assess and improve the quality of the
> data we have. Here, the creation of summary pages that help with
> assessment, and the comparison of our data with different sources, will
> allow us to discover errors.
I agree. The idea is that in our current state, we are limited in what we
can do with our data. If we rationalize the way we handle the data, we will
be able to create new ways to use it on MDN, or elsewhere (there is already
interest from people in reusing our data: webplatform.org wants to do it,
and caniuse.com already uses MDN as a source of information for its own
data).
So rationalizing our compatibility data opens up many opportunities. If we
expose our data through an API, it will be easy to measure whether people
use it.
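Just to illustrate, here is a tiny sketch of what such an API could look
like, with a naive per-feature counter so usage can be measured. The
endpoint, the data store, and the example values are all hypothetical:

from collections import Counter
from flask import Flask, jsonify

app = Flask(__name__)
hits = Counter()          # naive usage metric: number of requests per feature

# Hypothetical centralized store; values are placeholders, not real data.
COMPAT = {
    "css-transforms": {"firefox": "16", "chrome": "36", "ie": "10"},
}

@app.route("/compat/<feature>")
def compat(feature):
    hits[feature] += 1    # lets us see whether people actually use the API
    return jsonify(COMPAT.get(feature, {}))

if __name__ == "__main__":
    app.run()

In practice we would plug real analytics in front of it, but even a counter
like this would already tell us more than we know today.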
This is just my own vision; if anyone has ideas to share, please do :)
Best,