
Let's talk slugs and how we use it in MDN/BC

Renoir Boulanger

Nov 23, 2015, 3:26:28 PM
to dev-mdn
Hi folks,

We're about to start migrating MDN pages to use BrowserCompat ("BC")
backed compatibility tables.

This process will have us and our contributors insert a macro with
a "slug" (e.g. `{{EmbedCompatTable("foo")}}`). That will display the
API-backed table to some users, and the traditional wiki-backed table
to the rest. Eventually, the wiki-backed table will be removed,
leaving just the slug connecting the MDN page to the BC API feature,
and moving data management to the API and contribution interface.

The problem with the slugs is that they look very similar to an MDN
URL, which gives the impression that a user could change parts of one
to get another table, but that promise breaks quickly.

A discussion arose about the format of the slug: whether it should give a
textual representation (e.g. "web-css-border-radius"), whether it should
simply be an opaque string (e.g. "000-000-000"), and other similar
questions.

We're still at the early stage of BC, but we’re about to make a big
leap by migrating MDN pages to use BC backed tables.

The reason for prioritizing this now: the JavaScript pages are good
candidates for conversion.

We thought of using the JavaScript topic because errors in the existing
tables have been fixed and the tables can be easily imported, the
translators for this section do a good job of keeping their versions up
to date with the English page, and the JavaScript topic is the one with
the most traffic on MDN.

On the flip side, the JavaScript topic pages are the ones with the
longest URLs and would hit this issue quickly.

All of this is to say that we could have a good number of pages on which
to roll out the BC backed tables and do some stress-testing.

Not many pages have the macro yet (about 20), so maybe we should make
a change before it's too late.

This essay is about finding a future-friendly solution for identifying
what data we display on MDN and on BC API side, so we won't have
regrets later.

In the following sections, I'll cover the potential problems we might
face with the current implementation, make a requirement proposal and
give options with pros and cons so we can make an informed decision.



# The current state

BC's data comes, for the most part, from MDN, so it's natural that we
associate a documentation page URL 1:1 with a feature.

For example, we could be tempted to use "Web/CSS/border-radius" as an
identifier directly in BC. But it's not a good idea for many
reasons, one of them being the limitation on which characters we can use
in the path (e.g. the bar part in "/api/foo/bar") or query (e.g. after
the "?") part of a URL. Because the identifier needs to be usable in a
URL, we are limited to ASCII alphanumeric characters.
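
To illustrate the character limitation, here is a minimal sketch using Python's standard `urllib.parse`. The `slug` query parameter name mirrors the `?slug=foo` style of call; everything else is only an illustration, not BC code.

```python
from urllib.parse import quote, urlencode

mdn_path = "Web/CSS/border-radius"

# Used as a single path segment, every "/" has to be percent-encoded,
# otherwise it would be read as a path separator.
as_path_segment = quote(mdn_path, safe="")

# The same escaping happens when the value is sent as a query parameter.
as_query = urlencode({"slug": mdn_path})

print(as_path_segment)
print(as_query)
```

The escaped forms stay valid, but they are no longer something a contributor would comfortably read or type, which is why the raw MDN path is a poor identifier.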

The initial BC identifier scheme, referred to as "slug", roughly consists
of lowercasing an MDN URL (e.g. "Web/CSS/border-radius") and replacing
parts of it to make it a valid URL part (e.g. "web-css-border-radius").

The requirement to fit within ASCII alphanumerics and be path-like
forces us to replace the slash ("/") with a dash and apply other
manipulations in an attempt to ensure we have unique identifiers.
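
As a rough sketch of that transformation (an approximation for illustration only; the function name `naive_slug` is hypothetical and this is not the importer's actual code):

```python
def naive_slug(mdn_path: str) -> str:
    """Lowercase an MDN path and join its segments with dashes."""
    segments = [segment for segment in mdn_path.split("/") if segment]
    return "-".join(segments).lower()


print(naive_slug("Web/CSS/border-radius"))
print(naive_slug("/Web/JavaScript/Reference/Global_Objects/RegExp"))
```

Note that underscores already present in the MDN path (e.g. "Global_Objects") survive next to the new dashes, which is one source of the dash versus underscore ambiguity.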

Because we stay close to the MDN URL scheme, in which we don't omit any
words, the string quickly gets long and hard to predict (e.g.
"web_javascript_reference_global_objects_numbea9a2b"). In the
end, this may confuse users, and the slug would lose its meaning if the
original page changed location.

This scheme seemed to make sense at first, but it gets out of control
quickly. With the compound need for not-too-long and unique
identifiers, we are forced to truncate and add random characters, which
breaks the "usable for humans" principle of slugs.
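
To make the truncation problem concrete, here is a hypothetical sketch. It assumes a 50-character limit (the length of the truncated examples in this thread) and uses a short hash suffix instead of random characters so that the result is repeatable; the function name `shorten` and both of those choices are assumptions, not the current implementation.

```python
import hashlib

MAX_LENGTH = 50  # assumed limit, matching the length of the truncated examples


def shorten(slug: str, taken: set, max_length: int = MAX_LENGTH,
            suffix_length: int = 5) -> str:
    """Truncate a slug, adding a hash suffix when the truncation collides."""
    candidate = slug[:max_length]
    if candidate not in taken:
        return candidate
    # Collision: keep a shorter prefix and append a few hex characters.
    digest = hashlib.sha1(slug.encode("utf-8")).hexdigest()[:suffix_length]
    return slug[:max_length - suffix_length] + digest


parent = shorten("web-javascript-reference-global_objects-numberformat", set())
child = shorten(
    "web-javascript-reference-global_objects-numberformat-resolvedoptions",
    {parent},
)
print(parent)
print(child)
```

Both results fit the limit and are unique, but nobody could guess the second one from the page URL.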

One solution could be to not use slugs at all, and to use the database
insert ID or another "auto increment" number. Unfortunately, due to the
nature of databases, we can't be sure to get the same ID between full
import runs, which would break the contract of identifying features
uniquely and consistently.


## The limitations

As previously stated, one reason we can't use a path-like string (i.e.
"Web/CSS/border-radius", converted to "web-css-border-radius") as a
resource identifier is that it quickly gets out of control, and we
can't help users predict the syntax to use.

Here are a few MDN paths and the "slugs" they produce with the
current implementation:

/Web/JavaScript/Reference/Global_Objects/RegExp, into
web-javascript-reference-global_objects-regexp

/Web/JavaScript/Reference/Global_Objects/NumberFormat, into
web-javascript-reference-global_objects-numberform

/Web/JavaScript/Reference/Global_Objects/NumberFormat/resolvedOptions, into
web_javascript_reference_global_objects_numbea9a2b

/Web/CSS/display, into
web-css-display

/Web/CSS/background-color, into
web-css-background-color


Notice the following:

* We can't know for certain when we should use a dash ("-")
instead of an underscore ("_")

* Some words are redundant making the slug
longer and could be omitted

* We can't know how to separate categorization
(e.g. Web, CSS) from predicate (e.g. border-radius)

* The text is directly bound to an MDN URL


Besides, we can't copy-paste strings with dashes easily. When we
double click, we don't get the full string selected but only the word
between the dashes.


With those examples in mind, we can observe the following limitations:


1. Users might want to use the same compatibility table on another MDN page.

Not only on the page that specifically explains the feature.


2. A human might not be able to guess what to change in a too long or
cryptic string.

For example, the Number JavaScript API method
NumberFormat.resolvedOptions has
"web_javascript_reference_global_objects_numbea9a2b" as a slug.


3. On MDN, content moves around independently and so URLs can change.

At any time, it's most likely that somebody will rework
documentation about a feature and move content and URLs around.

We could try to find a way to harmonize how we refer to a feature, and a
way to translate that reference into an identifier for BC internal
purposes, without adding complexity at the places where users interact
with it.


4. BC isn't running as part of the MDN runtime.

It can't know about things that got renamed. It is debatable
whether or not it should track renames or act differently because of them.

The exception is helping us migrate the data that MDN currently holds.


5. URLs on MDN often also represent a locale.

A User-Agent feature will work the same anywhere, regardless of
which language the user understands.

The only difference is that BC displays data in English and allows us to
tell it in which language we want the data. If the translation exists,
it'll be displayed.


6. In the foreseeable future, MDN won't be the only place to display
compatibility tables.

We may want the ability to merge datasets between BC sites, but for
this we would need a systematic way of creating an identifier.

An incremental or one-time unique string such as a UUID wouldn't
work for this, unless other sites constantly import from our database
dump or we keep a mapping somewhere.

A solution to this could be that we hash a human-friendly slug so we
meet both sides (slug for Kuma + opaque identifier for BC API). That
would be another discussion though.




# Requirements

For an ideal solution, we'd require the following.


## BC Identifiers

* ASCII Alphanumeric + "_", "-", and URL encoded
(ref: URL code points
<https://specs.webplatform.org/url/webspecs/develop/#url-code-points>)
* No collisions
* (optional, if we want human-usable slugs)
Predictable scheme


## Slugs


Those would apply, unless we aren’t using slugs at all.

Remember: the idea of a slug is that a human can read and predict it.

* One character to mark categorization+predicate
separation
* No collisions
* No ambiguity and not too long
* Makes it possible to visualize what it’s
about
* Max length of 100 characters
* No dash, as it’s hard to copy-paste
dashed strings

In this case, we're likely to need a way to convert into a valid BC
API identifier.
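
As a minimal sketch, the slug requirements that can be checked mechanically could look like this. The function name `slug_problems` and its messages are hypothetical, and the "one character to mark categorization+predicate separation" rule is left out because that character hasn't been chosen.

```python
import re

MAX_SLUG_LENGTH = 100
ALLOWED = re.compile(r"[A-Za-z0-9_]*")  # ASCII alphanumerics and underscore


def slug_problems(slug: str, existing=frozenset()) -> list:
    """Return the list of requirements a candidate slug violates."""
    problems = []
    if not slug:
        problems.append("empty")
    if len(slug) > MAX_SLUG_LENGTH:
        problems.append("longer than 100 characters")
    if "-" in slug:
        problems.append("contains a dash")
    if not ALLOWED.fullmatch(slug.replace("-", "")):
        problems.append("characters outside ASCII alphanumerics and underscore")
    if slug in existing:
        problems.append("collides with an existing slug")
    return problems


print(slug_problems("css_display"))
print(slug_problems("web-css-display", existing={"web-css-display"}))
```

"No ambiguity" and "makes it possible to visualize what it's about" can't be verified by code; those would still need a human review.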





# Proposed solution


The question boils down to this: should we use slugs, or should we use an
opaque string (e.g. “000-000-000”, a UUID, or “ab1ab2ab3ab4ab5ab”, a hash)?


Also, do we want the same string to represent a given feature
on both systems: in the KumaScript macro AND in BC API URLs?


If we decide, as a community, that the BC API uses opaque identifiers
(e.g. "/api/v1/features/000-000-000"), it doesn't mean we can’t put
tools in place for MDN contributors to know what to paste in the page.


We are facing two possible solution paths:


1. Don't use a slug at all in Kuma; use the same identifier as BC API

2. Use an unlimited slug in Kuma; convert into BC API elsewhere


Depending on the choice, we’ll be able to tell what to prioritize
about it and see when we can make this change.

--
Renoir Boulanger
Mozilla Corporation

Justin Crawford

Nov 23, 2015, 5:51:37 PM
to Renoir Boulanger, dev-mdc, dev-mdn
+dev-mdc

With this project we are attempting to create a canonical, addressable
database of web platform features. I think we will one day dream of new MDN
products and documentation that depend on the taxonomy we create here. A
strong and widely recognized platform feature taxonomy would be a
compelling way to organize and link all manner of content about web
development from any number of sources.

So I think the slug we choose should stand alone (i.e. should not be
dependent on MDN content organization), and should describe the feature.

Justin




--
Justin Crawford
hoos...@mozilla.com

Sebastian Zartner

Nov 24, 2015, 2:54:14 AM
to Justin Crawford, dev-mdc, dev-mdn
When I created the cssinfo, csssyntax and svginfo templates, I thought that
page authors shouldn't be required to enter any referencing string. The
templates themselves should be able to get the right data.
So from a page author's point of view one should just be required to add
{{EmbedCompatTable}} as macro and the template itself should be smart
enough to get the correct data.

Having this in mind, a mapping needs to happen automatically. And this
mapping could be done via the MDN URL (which is currently already exposed
via the mdn_uri property in the BC API). Whether the BC API URLs contain a
human-readable slug or an opaque identifier would then be irrelevant,
as referencing would happen via an MDN URL search.
Of course in case of pages being moved around, we'd need to have a way to
map the new URL to the one stored in the BC API. How that's done may be
discussed individually.
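
A minimal sketch of such a lookup, in Python. Only the mdn_uri property itself is mentioned above; the shape of the lookup tables and the helper names `doc_path` and `find_feature` are assumptions for illustration.

```python
from urllib.parse import urlparse


def doc_path(mdn_url: str) -> str:
    """Drop host and locale, keeping the part after "/docs/"."""
    path = urlparse(mdn_url).path
    _, separator, after_docs = path.partition("/docs/")
    return (after_docs if separator else path).strip("/")


def find_feature(mdn_url: str, features_by_path: dict, moved_pages=None):
    """Look a feature up by MDN page path, following known page moves."""
    path = doc_path(mdn_url)
    path = (moved_pages or {}).get(path, path)
    return features_by_path.get(path)


# Hypothetical data: page path (as stored with the feature) -> feature id.
features = {"Web/CSS/background": "feature-123"}
# Hypothetical record of a page move: new path -> path stored in the BC API.
moved = {"Web/CSS/Reference/background": "Web/CSS/background"}

print(find_feature("https://developer.mozilla.org/en-US/docs/Web/CSS/background", features))
print(find_feature("https://developer.mozilla.org/fr/docs/Web/CSS/background", features))
print(find_feature("https://developer.mozilla.org/en-US/docs/Web/CSS/Reference/background", features, moved))
```

Stripping the locale makes the same table resolve for translated pages, as long as the translated page keeps the English path.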

The question about whether the BC URL should be a human-readable slug or an
opaque string may be more important for when other sites or tools start
using the API.
And IMO a human-readable slug is easier to reference, though it wouldn't
hurt to offer both ways to access the data.

Regarding the automatic creation of the slugs, they should be shortened as
much as possible. Referring to your examples:

/Web/JavaScript/Reference/Global_Objects/RegExp could become
js_RegExp

/Web/JavaScript/Reference/Global_Objects/NumberFormat could become
js_NumberFormat

/Web/JavaScript/Reference/Global_Objects/NumberFormat/resolvedOptions could
become
js_NumberFormat_resolvedOptions

/Web/CSS/display could become
css_display

/Web/CSS/background-color could become
css_background-color

/Web/CSS/CSS_Flexible_Box_Layout/Using_CSS_flexible_boxes could become
css_Using_CSS_flexible_boxes

/Web/CSS/min-width could become
css_min-width

/Web/CSS/@viewport/min-width could become
css_viewport_min-width

I.e. only use the area and the page part for referencing and be smart about
adding the related object, @-rule, etc.
Conflicts may generally be resolved by adding the second last part of the
URL path and/or by numbering.
The different parts are always separated by underscores (could also be
dashes, but it must be unified throughout the API).
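
A rough sketch of this shortening, in Python. The `AREA_PREFIX` and `STRUCTURAL` tables and the rule for when to keep the parent segment are assumptions chosen to reproduce the examples above; conflict handling is left out.

```python
AREA_PREFIX = {"JavaScript": "js", "CSS": "css"}  # assumed abbreviations
STRUCTURAL = {"Reference", "Global_Objects"}      # assumed "noise" segments


def short_slug(mdn_path: str) -> str:
    """Build "<area>_<related object or @-rule>_<page>" from an MDN path."""
    parts = [part for part in mdn_path.split("/") if part]
    area = parts[1]                 # parts[0] is expected to be "Web"
    rest = parts[2:]
    pieces = [AREA_PREFIX.get(area, area.lower())]
    if len(rest) >= 2:
        parent = rest[-2]
        if parent.startswith("@"):
            pieces.append(parent[1:])           # @-rule, e.g. @viewport
        elif area == "JavaScript" and parent not in STRUCTURAL:
            pieces.append(parent)               # related object
    pieces.append(rest[-1])
    return "_".join(pieces)


print(short_slug("/Web/JavaScript/Reference/Global_Objects/NumberFormat/resolvedOptions"))
print(short_slug("/Web/CSS/@viewport/min-width"))
```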

Sebastian

Renoir Boulanger

Nov 25, 2015, 10:05:11 AM
to dev-mdn
Hi folks!
--
*Renoir Boulanger*
*Mozilla Corporation*

Check my availability: <https://freebusy.io/rboul...@mozilla.com>
Telegram: <https://telegram.me/renoirb>

Jeremie Patonnier

Nov 25, 2015, 10:58:21 AM
to Sebastian Zartner, dev-mdc, dev-mdn
+1 with Sebastian,

We should create a "slug" free macro like {{EmbedCompatTable}} and hide the
underlying mapping between MDN URL and BC identifier.

This is important because things are not stable on either side: MDN URLs are
subject to change, and BC ids could change as long as the API is not set in
stone (and it shouldn't be as long as MDN is the only client for the API).

My 2ct,
Jeremie



--
Jeremie
.............................
Web : http://jeremie.patonnier.net
Twitter : @JeremiePat <http://twitter.com/JeremiePat>

Renoir Boulanger

Nov 25, 2015, 3:34:24 PM
to Sebastian Zartner, dev-mdc, dev-mdn, Jeremie Patonnier
+1 to Sebastian.

And, thanks for the thoughtful response.

On Wed, Nov 25, 2015 at 10:57 AM, Jeremie Patonnier wrote:

> +1 with Sebastian,
>
> We should create a "slug" free macro like {{EmbedCompatTable}} and
> hide the underlaying mapping between MDN URL and BC identifier.
>
> This is important because on both side things are not stable: MDN URL
> are subject to change and BC id could change as long as the API is not
> set in stone (and it shouldn't be as long as MDN is the only client for the
> API).
>
> My 2ct,
> Jeremie
>
> 2015-11-24 8:53 GMT+01:00 Sebastian Zartner <sebastia...@gmail.com>:
>
>> When I created the cssinfo, csssyntax and svginfo templates, I thought
>> that page authors shouldn't be required to enter any referencing string.
>> The templates themselves should be able to get the right data.
>> So from a page author's point of view one should just be required to add
>> {{EmbedCompatTable}} as macro and the template itself should be smart
>> enough to get the correct data.

That's a very nice feature to have. We aren't there yet though.

>> Having this in mind, a mapping needs to happen automatically.

We aren't there yet :(

BC has a lot of features and can import a big portion of the compatibility
data, but that part of the system hasn't been defined yet.

We **are** at the stage of defining how "mapping needs to happen
automatically".

>> And this mapping could be done via the MDN URL (which is currently
>> already exposed via the mdn_uri property in the BC API).

I doubt that part was designed to remain on BC API once we're done migrating
data out of MDN.

But there's definitely a way we could create a non-MDN centric identifier
using MDN URLs.

>> Whether the BC API URLs contain a human-readable slug or an opaque
>> identifier would then be irrelevant in that case,

Think about the relationship between a GitHub pull request and the git
commits attached to it.

A Kuma macro call such as
{{EmbedCompatTable("something-human-friendly")}} would be the equivalent of
a pull request.

The features and child-features attached to the identifier behind
"something-human-friendly" would be the git commit hashes.

We don't have a similar pattern for that yet.

>> as referencing would happen via a MDN URL search.

We can't rely on that in the long term.

If we have an opaque "at insert time" identifier (e.g. number, or UUID),
there will be no way of guessing the identifier for a given feature.

We would have to store somewhere a relationship between the human-friendly
string and the opaque identifier.

>> Of course in case of pages being moved around, we'd need to have a way to
>> map the new URL to the one stored in the BC API. How that's done may be
>> discussed individually.

True.

Maybe a set of transformations to apply to an MDN URL, creating the
identifier at import time?

>> The question about whether the BC URL should be a human-readable slug or an
>> opaque string may be more important for when other sites or tools start
>> using the API.

BC doesn't share MDN’s runtime code; they're two separate web applications.

BC is designed to be decoupled and I doubt we want to revert that decision.

>> And IMO a human-readable slug is easier to reference, though it wouldn't
>> hurt to offer both ways to access the data.

(snip)

+1

I have a few ideas in mind on how to adapt MDN URLs into human-friendly
slugs.

I won't continue on that tangent because I'd like to clarify priorities
before discussing one avenue.

Next up: questions we may want to answer before going further in this
discussion.

--
*Renoir Boulanger*
Web Browser Compatibility Data Lead
*Mozilla Corporation*

Renoir Boulanger

Nov 25, 2015, 3:37:44 PM
to dev-mdn, dev-mdc
Let's continue this discussion about slugs and requirements.

Instead of me talking about how we could do it, something I’m keen to do,
let’s dive into what we may or may not want.

The technical solution will impose itself once we get an agreement.

My objective is to clarify untold or silent requirements so that we can
validate, or reject them, in the clear.

I carefully picked the use-cases below because I couldn't find a clear
decision made about them.

The first four are from the perspective of BC when used within Kuma (MDN);
the last one is from BC’s perspective and the contract we could make
with the web.



Please answer yes/no to the following;

A) I want to be able to tell manually which compatibility table to display
on a given page. Regardless of where it is in MDN.

B) I want to be able to read, and understand, the referencing string in
the Kuma macro (i.e. not an opaque string or number)

C) I want to be able to visualize the relationship between features (e.g.
{{EmbedCompatTable("/css/properties/background/origin")}} for
background-origin support feature for background CSS property)

D) I want Kuma to be able to "guess" what table to display based on the
MDN URL (i.e. in ".../Web/CSS/background", use {{EmbedCompatTable}}, will
show background CSS compatibility table)

E) As BC, the Web Application and API, I want to allow clients to be able
to recreate any identifiers systematically (i.e. like GitHub does with git
commit hashes, and how we can view the commit on GitHub)



Once we've figured out those requirements, we'll be able to define a set of
rules and a mechanism to enforce our choices.

Notice that disagreeing with "E" would require us to create IDs "at
insert time". Such an ID could be a number or a generated UUID, but in both
cases it would be an identifier unique to the deployment. With
this design decision, we wouldn't be able to sync data from a staging
instance hosting data we want to merge into production. Furthermore, what
if another entity maintains a CSS properties compatibility-data system
running BC, and **DOESN’T IMPORT** from Mozilla/MDN backups or database dumps?

This situation may not be fun to think about, but our design could include
a way to pull data between systems instead of creating a technical walled
garden.
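
To illustrate what "E" buys us, here is a minimal sketch in Python. It assumes both deployments agree on a canonical feature name and derive the identifier from it with SHA-1, in the spirit of git commit hashes; the name format and the `feature_id` and `merge` helpers are hypothetical.

```python
import hashlib


def feature_id(canonical_name: str) -> str:
    """Derive the identifier from the feature name, not from insert order."""
    return hashlib.sha1(canonical_name.encode("utf-8")).hexdigest()


def merge(production: dict, staging: dict) -> dict:
    """Merge two datasets keyed by identifier; staging wins on conflicts."""
    merged = dict(production)
    merged.update(staging)
    return merged


# Two independent deployments compute the same key for the same feature.
production = {feature_id("css/display"): {"name": "display", "notes": "old"}}
staging = {feature_id("css/display"): {"name": "display", "notes": "reviewed"}}

merged = merge(production, staging)
print(len(merged))
```

With insert-time IDs, the two records would carry unrelated keys and the merge would produce two entries for the same feature.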

Sebastian Zartner

Nov 25, 2015, 4:17:10 PM
to Renoir Boulanger, dev-mdc, dev-mdn
I'll answer those questions with yes and no. Though some of them probably
require further clarification, so let me know if I should explain them.

On 25 November 2015 at 21:36, Renoir Boulanger <ren...@mozilla.com> wrote:

> Please answer yes/no to the following;
>
> A) I want to be able to tell manually which compatibility table to display
> on a given page. Regardless of where it is in MDN.
>

Yes.


> B) I want to be able to read, and understand, the referencing string in
> the Kuma macro (i.e. not an opaque string or number)
>

Yes.

> C) I want to be able to visualize the relationship between features (e.g.
> {{EmbedCompatTable("/css/properties/background/origin")}} for
> background-origin support feature for background CSS property)
>

No.


> D) I want Kuma to be able to "guess" what table to display based on the
> MDN URL (i.e. in ".../Web/CSS/background", use {{EmbedCompatTable}}, will
> show background CSS compatibility table)
>

Yes.


> E) As BC, the Web Application and API, I want to allow clients to be able
> to recreate any identifiers systematically (i.e. like GitHub does with git
> commit hashes, and how we can view the commit on GitHub)
>

Yes.

Sebastian

Renoir Boulanger

Nov 27, 2015, 6:58:19 PM
to dev-mdn, dev-mdc, John Whitlock, Sebastian Zartner
Now that we’ve gone through requirements from an MDN contributor
perspective, let’s go through what John and I discussed.

@John, please correct me if I'm wrong. I’m persisting notes here to
improve understanding of the system; I may be wrong on some aspects.

Hopefully, this will address technical constraints, answer
requirements and, most importantly, put us all on the same page.

The following assumes that Sebastian’s answers are representative
of other stakeholders. I’m doing this because I can’t wait for
more feedback, and his answers feel similar to what I’ve deduced from
conversations I’ve had so far.


# Recap of the requirements shared earlier this week

I’ve taken the liberty of commenting on what the impacts on the
system may be if we follow this course.

I’ll use them to support what John and I discussed earlier this week.

<aside>Entries below are archived in [MDN BrowserCompat Backlog][1]
spreadsheet.</aside>


> A) I want to be able to tell manually which
> Compatibility table to display on a given
> page. Regardless of where it is in MDN.

YES — Now archived as Backlog item #58



> B) I want to be able to read, and understand, the
> referencing string in the Kuma macro (i.e. not
> an opaque string or number)

YES — Now archived as Backlog item #59



> C) I want to be able to visualize the relationship
> between features (e.g. {{EmbedCompatTable
> ("/css/properties/background/origin")}} for
> background-origin support feature for
> background CSS property)

NO — Now archived as Backlog item #57

I feel uneasy about not prioritizing #57. It means we’ll have to set
an aliasing system in place. Not a bad thing in itself.

The reason I’m uneasy is that if we were to make something that looks
like a URL or a directory tree, we should clean up the data and
fulfill the metaphor.

Let’s see later, when we have more feedback, whether or not we should
take #57 into account at all.



> D) I want Kuma to be able to "guess" what table to
> display based on the MDN URL (i.e. in
> ".../Web/CSS/background", use {{EmbedCompatTable}},
> will show background CSS compatibility table)

YES — Now archived as Backlog item 60

I haven’t had a chance to discuss this aspect with John yet.



> E) As BC, the Web Application and API, I want
> to allow clients to be able to recreate any identifiers
> systematically (i.e. like GitHub does with git
> commit hashes, and how we can view the commit
> on GitHub)

YES — Now archived as Backlog item #61

A hard binding between a human-friendly slug system and an API
identifier can’t happen if we want to allow slugs to be rewritten, as the
resulting hash would also change.

The way I saw this requirement was as part of #57.

The heart of the issue that started this thread is about the
limitation of what we can use in a Kuma macro (e.g.
{{EmbedCompatTable('foo')}}) **because** we were using it directly as
a BC API call (e.g. `?slug=foo`).

Having (#57 false, and #61 true) made me revise the position I had
earlier this week;

> (...) the impact of disagreeing with #61, would require we make
> "at insert time" ID. That ID could be a number or generated UUID,
> but in both cases still get an identifier that would be unique to each
> BC deployment. With this design decision, we wouldn't be able to
> sync data between a staging instance hosting data we want to merge
> into production. (...)

This may be a case where it’s best to have unique
identifiers at insert time for the BC API and a way to map slugs to the
opaque identifier. An aliasing system of sorts.
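
A minimal sketch of what such an aliasing system could look like, in Python. The `AliasRegistry` class and its method names are hypothetical; the opaque identifier is a random UUID generated at insert time.

```python
import uuid


class AliasRegistry:
    """Maps human-friendly slugs to opaque, insert-time identifiers."""

    def __init__(self):
        self._id_by_slug = {}

    def register(self, slug: str) -> str:
        """Create a new opaque identifier and bind the slug to it."""
        if slug in self._id_by_slug:
            raise ValueError(f"slug already in use: {slug}")
        opaque_id = str(uuid.uuid4())
        self._id_by_slug[slug] = opaque_id
        return opaque_id

    def add_alias(self, new_slug: str, existing_slug: str) -> None:
        """Let a second slug point at the same feature (e.g. after a rename)."""
        if new_slug in self._id_by_slug:
            raise ValueError(f"slug already in use: {new_slug}")
        self._id_by_slug[new_slug] = self._id_by_slug[existing_slug]

    def resolve(self, slug: str) -> str:
        """Return the opaque identifier behind a slug."""
        return self._id_by_slug[slug]


registry = AliasRegistry()
display_id = registry.register("css_display")
registry.add_alias("display_css", "css_display")
print(registry.resolve("display_css") == display_id)
```

Slugs can then be renamed or added without touching the identifier that the BC API exposes.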

All of this to say that making an insert-time unique identifier, such
as an opaque string, a number, or a UUID on BC API side, isn’t a bad
idea after all.

As long as (#58, #59, #60) are fulfilled, and MDN contributors
don’t have to deal with this complexity or fill a page with cryptic
or ambiguous code.


[1]: <http://bit.ly/mdn-browsercompat-backlog>

--
Renoir Boulanger
Web Browser Compatibility Data Lead
Mozilla Corporation

Stephanie Hobson

Nov 30, 2015, 4:32:16 PM
to Sebastian Zartner, dev-mdc, dev-mdn
Ali asked me to look at the email thread and weigh in and I have a new
proposal for how the importer could auto-generate the "slugs".[5]

At the BC meeting on Tuesday we agreed we need to pass the macro a
parameter:
- so it is robust to page moves
- so it is robust to translation (URLs change with translation)
- so we can embed the same table on multiple pages

I think we can continue with the original idea of using slugs. Building on
Renoir's requirements questions and other discussion on this thread, I'll add:
- this parameter cannot change (because we are relying on it)
- a human readable slug will reduce errors and increase efficiency
- the most important information is at the *end* of the MDN URL[1]
- we don't need to encode section information in the slug, it is available
through the API[2]

I propose we generate slugs by:
1) starting with the last two segments[3] of the MDN path
2) lowercasing them
3) replacing anything that isn't a letter or a number with an underscore
4) collapsing multiple underscores down to one
5) stripping underscores from the beginning and end
6) looking to see if there is a conflict with an existing slug

If there is (estimated 39 current cases):
7) add an underscore to the end of the slug
8) sanitize the first segment of the URL after "docs/Web" and add it to the
end too
Examples:

/Web/JavaScript/Reference/Global_Objects/RegExp ->
regexp_global_objects

/Web/JavaScript/Reference/Global_Objects/NumberFormat ->
numberformat_global_objects

/Web/JavaScript/Reference/Global_Objects/NumberFormat/resolvedOptions ->
resolvedoptions_numberformat

/Web/CSS/display ->
display_css

/Web/CSS/background-color ->
background_color_css

/Web/CSS/CSS_Flexible_Box_Layout/Using_CSS_flexible_boxes ->
using_css_flexible_boxes_css_flexible_box_layout

/Web/CSS/min-width ->
min_width_css

/Web/CSS/@viewport/min-width ->
min_width_viewport

Web/HTML/Element/a ->
a_element_html

Web/SVG/Element/a ->
a_element_svg
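
The steps and examples above can be sketched roughly as follows. This is
an illustration, not the importer's code: the function names are
hypothetical, "letter or number" is assumed to mean ASCII `a-z0-9` after
lowercasing, and the last two segments are swapped before joining
because that is what the examples show, although it is not listed as a
step.

```python
import re


def sanitize(text):
    """Steps 2-5: lowercase, replace non-alphanumerics, collapse, strip."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9]", "_", text)  # assumption: ASCII only
    text = re.sub(r"_+", "_", text)
    return text.strip("_")


def path_segments(path):
    """Split an MDN path into its non-empty segments."""
    return [segment for segment in path.split("/") if segment]


def base_slug(path):
    """Steps 1-5: the last two segments, last segment first."""
    last_two = path_segments(path)[-2:]
    return sanitize("_".join(reversed(last_two)))


def section_slug(path):
    """Steps 7-8: append the sanitized first segment after 'Web'."""
    segments = path_segments(path)
    section = segments[segments.index("Web") + 1]
    return base_slug(path) + "_" + sanitize(section)


def generate_slugs(paths):
    """Step 6: assign slugs in order; a later colliding path gets steps 7-8."""
    slugs = {}
    taken = set()
    for path in paths:
        slug = base_slug(path)
        if slug in taken:
            slug = section_slug(path)
        taken.add(slug)
        slugs[path] = slug
    return slugs
```

Note that `generate_slugs` only disambiguates the later of two
conflicting paths, so the first `Element/a` page would keep `a_element`.
The two `a_element_*` examples above correspond to calling
`section_slug` on both pages.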

Based on an export of all MDN /web/ URLs this gives us fewer than:
- 6 remaining naming conflicts
- 64 slugs over 50 characters (I suspect most of these pages do not have
compat tables)
I suggest we solve these manually.

Here is a spreadsheet with examples[6] (it uses dashes instead of
underscores[4]):
https://docs.google.com/spreadsheets/d/183GTY9Iq6uG2uN3QTTSUvTvDpiIhj1nOuYJRH7GG0TA/edit?usp=sharing

This would give us a human readable, guessable from the URL, unchanging,
meaningful slug to use in the macros.

Thanks,
Stephanie.


[1] This is not my insight, but I can't find where it was said to provide
attribution.
[2] Sebastian confirmed this in response to Renoir's requirements.
[3] Using only the last segment results in 2009 naming conflicts. This is
easy to resolve, but it means there are 2009 cases where someone cannot
guess correctly on the first try, compared to 39 if we go straight to two
segments.
[4] Underscores are better because they are easier to copy and paste.
[5] https://xkcd.com/927/
[6] The algorithm I used to generate the spreadsheet actually applies steps
7 and 8 to both matches when there is a URL conflict, but I think that is
unnecessarily complex and not feasible going forward.


On Wed, Nov 25, 2015 at 1:16 PM, Sebastian Zartner <
sebastia...@gmail.com> wrote:

> I'll answer those questions with yes and no. Though some of them probably
> require further clarification, so let me know if I should explain them.
>
> On 25 November 2015 at 21:36, Renoir Boulanger <ren...@mozilla.com>
> wrote:
>
> > Please answer yes/no to the following;
> >
> > A) I want to be able to tell manually which compatibility table to
> > display on a given page. Regardless of where it is in MDN.
> >
>
> Yes.
>
>
> > B) I want to be able to read, and understand, the referencing string in
> > the Kuma macro (i.e. not an opaque string or number)
> >
>
> Yes.
>
> > C) I want to be able to visualize the relationship between features (e.g.
> > {{EmbedCompatTable("/css/properties/background/origin")}} for
> > background-origin support feature for background CSS property)
> >
>
> No.
>
>
> > D) I want Kuma to be able to "guess" what table to display based on the
> > MDN URL (i.e. in ".../Web/CSS/background", use {{EmbedCompatTable}}, will
> > show background CSS compatibility table)
> >
>
> Yes.
>
>
> > E) As BC, the Web Application and API, I want to allow clients to be able
> > to recreate any identifiers systematically (i.e. like GitHub does with git
> > commit hashes, and how we can view the commit on GitHub)
> >
>
> Yes.
>
> Sebastian

Renoir Boulanger

Nov 30, 2015, 10:56:41 PM
to Stephanie Hobson, dev-mdc, dev-mdn, Sebastian Zartner
On Mon, Nov 30, 2015 at 4:31 PM, Stephanie Hobson <sho...@mozilla.com> wrote:
> Ali asked me to look at the email thread and weigh in and I have a new
> proposal for how the importer could auto-generate the "slugs".[5]
>
> At the BC meeting on Tuesday we agreed we need to pass the macro a
> parameter:
> - so it is robust to page moves
> - so it is robust to translation (URLs change with translation)
> - so we can embed the same table on multiple pages
>
> I think we can continue with the original idea of using slugs. To Renoir's
> requirements questions and the other discussion on this thread I'll add:
> - this parameter cannot change (because we are relying on it)


I have a proposal for that.

(More below)


> - a human readable slug will reduce errors and increase efficiency
> - the most important information is at the *end* of the MDN URL[1]
> - we don't need to encode section information in the slug, it is available
> through the API[2]


Unless we want to allow characters that are not ASCII alphanumeric in
slugs for KumaScript macros, while still ending up with a valid BC API URL.

(More below)


> I propose we generate slugs by:
> 1) starting with the last two segments[3] of the MDN path
> 2) lowercase them
> 3) replace anything that isn't a letter or a number with an underscore
> 4) collapse multiple underscores down to one
> 5) strip underscores from beginning and end
> 6) look to see if there is a conflict with an existing slug
> if there is (estimated 39 current cases):
> 7) add an underscore to the end of the slug
> 8) sanitize the first segment of the URL after "docs/Web" and add it to the
> end too


I like this slug proposal, except for two details I pointed out above:

- the use of the underscore as a categorization separator
- using the slug as a direct identifier, since we may want to rename slugs

My opinion is that it would be simpler if we map each "to replace"
pattern to only one replacement.

Since the underscore is already used as the replacement for one or more
spaces, we can't also use it as a separator.

For separation, how about we allow the slash back?

The original reason we said we shouldn't use the slash was that it isn't
an ASCII alphanumeric character. That matters *because* we take the
string used in the KumaScript macro and append it directly to the BC API
URL.

If we allow the slash back, we'll need a way to convert the slug to
something that is ASCII alphanumeric. We can use a hashing mechanism to
solve that problem.

In addition to Stephanie's proposal (minus the underscore as separator),
here's mine:

* Use John's idea of an "insert time" identifier: a UUID
(0000-0000-0000) that will be immutable.
* Use the slug as an alias within MDN for macros
* When Kuma makes a call to BC, it'll convert the macro argument
to its SHA1 representation

All of this could prepare the ground for a desirable use-case:

As an
External BC relier,
I want to be able to
read a copy of the current data from the filesystem

In the not-so-distant future, we could let client systems
(e.g. Atom IDE) download the data as a directory
of JSON files. By doing so, we can save our future selves from getting
hammered by HTTP requests.

Since we would already have the slash as separator, we could support
data snapshots by dumping smaller JSON files in a matching directory
hierarchy.
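
As a rough sketch of that snapshot idea: each slash-separated slug
becomes a directory path ending in one JSON file. The function names,
the `.json` suffix, and the shape of the feature data are all
assumptions made for illustration; nothing here reflects an existing BC
export format.

```python
import json
import tempfile
from pathlib import Path


def snapshot_path(root, slug):
    """Map a slash-separated slug to a JSON file path under root."""
    parts = [part for part in slug.split("/") if part]
    if not parts or any(part in (".", "..") for part in parts):
        raise ValueError(f"unusable slug: {slug!r}")
    return Path(root).joinpath(*parts[:-1], parts[-1] + ".json")


def write_snapshot(root, features):
    """Write one JSON file per feature; `features` maps slug -> data."""
    written = []
    for slug, data in features.items():
        target = snapshot_path(root, slug)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(json.dumps(data, indent=2), encoding="utf-8")
        written.append(target)
    return written


if __name__ == "__main__":
    # Hypothetical feature data, written to a throwaway directory.
    with tempfile.TemporaryDirectory() as root:
        for path in write_snapshot(root, {"/css/border-radius": {"name": "border-radius"}}):
            print(path.relative_to(root))
```

A client could then read `css/border-radius.json` from disk instead of
making an HTTP request per feature.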

A call to the BC API, when we know the item's ID (UUID, or anything
opaque), would look like:

https://browsercompat.org/api/v1/features/0000-0000-0000-0000

This identifier MUST *never* change.

As for the slugs, remember that they are not meant for human search. We
should see them as an aliasing system for KumaScript, so that we don't
need to force reliers to store a list mapping "this feature name" to
"an opaque identifier". We'll have a BC API frontend and most likely a
search engine anyway.

From KumaScript, if a user uses a macro call like this

{{EmbedCompatTable("/css/border-radius")}}

When Kuma runs an update, it'll make a BC API call and replace the
argument with a hash, which solves the ASCII alphanumeric issue for us:

https://browsercompat.org/api/v1/features/?hash=29d0c2214b12ebb2acb15c4ea152a5ced7e9b4c9
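
A minimal sketch of that lookup, assuming the hash is the SHA1 hex
digest of the UTF-8 bytes of the macro argument exactly as written. The
message does not say whether the slug is normalized first, and it has
not been checked that the hash in the URL above corresponds to
"/css/border-radius". The function names are hypothetical; the base URL
and the `hash` query parameter are taken from the example above.

```python
import hashlib
from urllib.parse import urlencode

# Base URL as it appears in the example above.
API_BASE = "https://browsercompat.org/api/v1/features/"


def slug_hash(slug):
    """SHA1 hex digest of the slug; always 40 ASCII hex characters."""
    return hashlib.sha1(slug.encode("utf-8")).hexdigest()


def feature_lookup_url(slug):
    """Build the hash-based lookup URL for a macro argument."""
    return API_BASE + "?" + urlencode({"hash": slug_hash(slug)})


if __name__ == "__main__":
    print(feature_lookup_url("/css/border-radius"))
```

Whatever characters the slug contains, the value sent to the API is
plain hexadecimal, so slashes in the macro argument never reach the URL.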

If you want to see the slug patterns I'm suggesting, take a look
at [1]. It's written in JavaScript and you can fork it and make your own
replacement proposal.


[1]: <http://renoirb.com/prototypes/2015/browsercompat/slug-scheme.html>

--
Renoir Boulanger
Web Browser Compatibility Data Lead
Mozilla Corporation

Check my availability: <http://calendly.com/renoirb>
Appear.in: <https://appear.in/renoirb>
Telegram: <https://telegram.me/renoirb>

Sebastian Zartner

Dec 1, 2015, 2:25:49 AM
to Stephanie Hobson, dev-mdc, dev-mdn
Well elaborated, Stephanie!

On 30 November 2015 at 22:31, Stephanie Hobson <sho...@mozilla.com> wrote:

> Ali asked me to look at the email thread and weigh in and I have a new
> proposal for how the importer could auto-generate the "slugs".[5]
>
> At the BC meeting on Tuesday we agreed we need to pass the macro a
> parameter:
> - so it is robust to page moves
> - so it is robust to translation (URLs change with translation)
> - so we can embed the same table on multiple pages
>

I'm still not very happy about *requiring* the slug to be passed as a
parameter. I'd have kept it optional, because that makes things easier for
authors on pages where the slug normally doesn't change, i.e. all API pages.
Though I can live with that.


> I think we can continue with the original idea of using slugs. To Renoir's
> requirements questions and the other discussion on this thread I'll add:
> - this parameter cannot change (because we are relying on it)
> - a human readable slug will reduce errors and increase efficiency
> - the most important information is at the *end* of the MDN URL[1]
> - we don't need to encode section information in the slug, it is available
> through the API[2]
>

Just a note: While you mention this, your algorithm still includes the
section in many cases.


> I propose we generate slugs by:
> 1) starting with the last two segments[3] of the MDN path
>

The step that switches the segments is missing here.

> 2) lowercase them
> 3) replace anything that isn't a letter or a number with an underscore
> 4) collapse multiple underscores down to one
> 5) strip underscores from beginning and end
> 6) look to see if there is a conflict with an existing slug
>

I assume we need to refer to the URL path here, not the slug. Otherwise one
of the resulting slugs would be shorter.


> if there is (estimated 39 current cases):
> 7) add an underscore to the end of the slug
> 8) sanitize the first segment of the URL after "docs/Web" and add it to
> the end too
>
> Examples:
>
> /Web/JavaScript/Reference/Global_Objects/RegExp ->
> regexp_global_objects
>
> /Web/JavaScript/Reference/Global_Objects/NumberFormat ->
> numberformat_global_objects
>
> /Web/JavaScript/Reference/Global_Objects/NumberFormat/resolvedOptions ->
> resolvedoptions_numberformat
>
> /Web/CSS/display ->
> display_css
>
> /Web/CSS/background-color ->
> background_color_css
>
> /Web/CSS/CSS_Flexible_Box_Layout/Using_CSS_flexible_boxes ->
> using_css_flexible_boxes_css_flexible_box_layout
>
> /Web/CSS/min-width ->
> min_width_css
>
> /Web/CSS/@viewport/min-width ->
> min_width_viewport
>
> Web/HTML/Element/a ->
> a_element_html
>
> Web/SVG/Element/a ->
> a_element_svg
>

It takes some getting used to seeing the last two sections switched.
Especially in the last two examples, it's a bit strange to read.

> Based on an export of all MDN /web/ URLs this gives us fewer than:
> - 6 remaining naming conflicts
> - 64 slugs over 50 characters (I suspect most of these pages do not have
> compat tables)
>

Are those listed in the spreadsheet somewhere? If not, could you please add
them?

Sebastian