What if we designed the L20n syntax from scratch today?

Staś Małolepszy

Jul 15, 2015, 1:34:39 PM
to tools...@lists.mozilla.org
I made a thought exercise of imagining what it would be like to design the
L20n syntax from scratch, today.

I was guided by the following goals:

- Make the most commonly used syntax easy on the eyes and easy to
understand.
- Improve error recovery and make it easier to use HTML in translations.
- Make it easy to parse complex expressions. They are important for the
advanced features of L20n, but cost a lot in terms of parsing and resolving.
- Allow dashes in entity names.
- Make it possible to reference entities by dynamic names unknown at parse
time.
- Make it clearer that translation variants are different permutations of
the same translation data and as such, they are different than attributes
and are not part of the social contract.

I created a sample file with a new proposed syntax. Nothing is set in
stone yet and I'm looking forward to hearing your thoughts about the goals
and the proposal.

https://gist.github.com/stasm/c99010a8ab6d467562ba

Thanks,
-stas

Zibi Braniecki

Jul 15, 2015, 4:52:12 PM
to mozilla-t...@lists.mozilla.org
Cool stuff Stas!

On Wednesday, July 15, 2015 at 10:34:39 AM UTC-7, Staś Małolepszy wrote:
> I made a thought exercise of imagining what it would be like to design the
> L20n syntax from scratch, today.
>
> I was guided by the following goals:
>
> - Make the most commonly used syntax easy on the eyes and easy to
> understand.

My only concern with using '{' and '}' for expanders is that they are much more common in strings than '{{' and '}}', which means that you now have to escape a normal character because of its special function in L20n.

Not sure how to evaluate how much of a problem it is. I hate escaping characters, but I believe that '{' character is indeed rarely used in regular text.

I searched through our translations and what comes to mind are:

- CSS in L10n will have to be escaped
- JS Template strings will have to be escaped because they use `Hello, ${user.name}` notation
- Some of our build tools use ${AB-CD}
- In one place Gaia Email app uses {name}
https://github.com/mozilla-b2g/gaia/blob/1ee07b1bf55894dc7bad8e4a2e31d77c59cf4106/apps/email/locales/email.en-US.properties#L664

Overall, it doesn't look bad. I'm a bit worried about overlapping syntax with ES6 Templates.

> - Improve error recovery and make it easier to use HTML in translations.

Here I feel like we're fixating on HTML. If L20n syntax is supposed to be language agnostic and useful for different environments, then fixating on making it convenient for HTML may be risky. How will it work for Rust? Python? Jinja? CSS?

I'm also of course curious what Hashes will look like.

> - Make it easy to parse complex expressions. They are important for the
> advanced features of L20n, but cost a lot in terms of parsing and resolving.

I don't like this change.

First, I believe that we should avoid naming pollution which I believe we should achieve by using namespaces.

@intl.*
@cldr.*
@gaia.*

On top of that, I don't think that Lisp-style expressions are helping here.
I've been using Lisp-style expressions in multiple environments and I always felt like they're alien compared to classic C syntax.

First, I believe it's harder to notice an error and harder to create a useful recovery message. Consider:

`callExpr param` - all possible combinations of mistakes - `callExpr,param`, `callExpr.param`, etc.
`callExpr(param)` - we can easily report to the user sth like `callExpr[param)` and suggest a solution

Secondly, by far the language that localizers will know the most is JavaScript or some expression subset of CSS3/CSS4 (which follows JS syntax). Lowering the entry barrier by presenting them familiar syntax to code they may know even if they only know a little is a huge benefit.

While it may be a matter of habit and preference, I believe that imposing something likely less familiar to localizers for disputable benefit is a wrong design decision.

As a mental experiment, design a Polish cldr.plural macro with your syntax and show it to a sample of localizers next to the current L20n expression syntax version.
What do you think they'll likely find more familiar? Where do you think they'll likely make more mistakes in?

> - Allow dashes in entity names.

I would not claim it to be a major win especially since it would require localizers to switch between JS syntax and Lisp syntax for things like `{- email-count emails-total}`

Also, teaching people that a variable may be `email` and a variable may be `count` and that we do have a subtraction operator `-` but that `email-count` is not an expression but an entity ID is imho very confusing.

> - Make it possible to reference entities by dynamic names unknown at parse
> time.

I like this feature, but not sure if there's a real use case for that and if so, I'd expect it to be rare enough that we shouldn't design syntax around it imho. (in other words, {{ :[variable] }} would do)

> - Make it clearer that translation variants are different permutations of
> the same translation data and as such, they are different than attributes
> and are not part of the social contract.

I like it, although I'd like to combine it with globals, which I think are valuable to preserve.

I'm also not sure if ":" is better than "$", especially if we want to use ':' to denote value variants. Because in that narrative that makes entities variants of context?

My suggestion:

- $entity1:masculine
- $entity1.ariaLabel
- @cldr.plural
- user.name

All in all, I'm all in favor of revisiting the syntax and I think it's a good moment for that. I'd love us to get to Gaia devs and localizers and ask them for feedback. I'd love them to tinker with our current syntax and tell us where they have problems and come up with ideas. I think it's a perfect moment for last revision of the syntax and probably the last one before we have to make a call and stabilize it if we want to start using it in time for Firefox OS 2.5.

zb.

Staś Małolepszy

Jul 15, 2015, 8:49:12 PM
to Zibi Braniecki, mozilla-t...@lists.mozilla.org
On Wed, Jul 15, 2015 at 10:52 PM, Zibi Braniecki <
zbigniew....@gmail.com> wrote:

> Cool stuff Stas!
>

Thanks for a super-quick reply! This is fantastic feedback.


> On Wednesday, July 15, 2015 at 10:34:39 AM UTC-7, Staś Małolepszy wrote:
> > I made a thought exercise of imagining what it would be like to design the
> > L20n syntax from scratch, today.
> >
> > I was guided by the following goals:
> >
> > - Make the most commonly used syntax easy on the eyes and easy to
> > understand.
>
> My only concern with regards to using '{', '}' for expanders is that it's
> much more common to use it in a string than '{{', '}}'. Which means that
> now you have to escape a normal character because of its special function
> in L20n.
>

That's the point, actually :) By forcing { and } to be always escaped we
can make sure that literal { } are either meaningful or are errors. We
don't have this luxury with <> now.
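
A minimal sketch of that idea in Python, assuming a backslash is the escape character and that placeables do not nest (neither is settled in the proposal; `split_placeables` is a made-up name). Once every literal brace has to be escaped, any bare brace is either the edge of a placeable or a reportable error:

```python
def split_placeables(text):
    """Split text into ('text', s) and ('placeable', s) parts.

    Assumptions for illustration only: a backslash escapes the next
    character, and placeables cannot be nested.
    """
    parts, buf, i = [], [], 0
    in_placeable = False
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            buf.append(text[i + 1])  # an escaped character is always literal
            i += 2
            continue
        if ch == "{":
            if in_placeable:
                raise ValueError(f"unexpected '{{' at index {i}")
            if buf:
                parts.append(("text", "".join(buf)))
            buf, in_placeable = [], True
        elif ch == "}":
            if not in_placeable:
                raise ValueError(f"unescaped '}}' at index {i}")
            parts.append(("placeable", "".join(buf).strip()))
            buf, in_placeable = [], False
        else:
            buf.append(ch)
        i += 1
    if in_placeable:
        raise ValueError("unclosed '{'")
    if buf:
        parts.append(("text", "".join(buf)))
    return parts
```

With this, `split_placeables("Hello, { user }!")` yields a text part, a placeable part and another text part, while a stray `}` or an unclosed `{` raises instead of silently ending up in the output.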


>
> I searched through our translations and what comes to mind are:
>
> - CSS in L10n will have to be escaped
> - JS Template strings will have to be escaped because they use `Hello,
> ${user.name}` notation
> - Some of our build tools use ${AB-CD}
> - In one place Gaia Email app uses {name}
>
> https://github.com/mozilla-b2g/gaia/blob/1ee07b1bf55894dc7bad8e4a2e31d77c59cf4106/apps/email/locales/email.en-US.properties#L664
>
> Overall, it doesn't look bad. I'm a bit worried about overlapping syntax
> with ES6 Templates.
>

Yeah, good point. I'm not too worried about Bash variables or that one
variable in Gaia's Email. I'm not sure about Jinja or ES6 templates.
Would they be orthogonal to L20n or complementary? Does it even make sense
to have (escaped) es6 templates in translations?


>
> > - Improve error recovery and make it easier to use HTML in translations.
>
> Here I feel like we're fixating on HTML. If L20n syntax is supposed to be
> language agnostic and useful for different environments, then fixating on
> making it convenient for HTML may be risky. How will it work for Rust?
> Python? Jinja? CSS?
>

This is again a good point and I'm aware that there is a world out there
beyond HTML :) That said, I think it's well aligned with Mozilla's mission
to push for HTML as the de facto standard of building localizable UI. And
while I'm not ruling out Rust or Python, I want to focus on making a
localization library for the Web in the first step. If we keep dreaming
about being the ultimate tool for all needs, we risk getting blocked and
not having a ready solution for our first target: the Web. HTML is the
stepping stone.


>
> I'm also of course curious what Hashes will look like.
>

It's in the gist:

; index is defined in []
; hashes are defined as series of hash pairs {}
; the default hash pair is marked with {* }
; hash keys are symbols or strings

{ notifications [(plural n)]
{* one "{ n } notification"}
{many "{ n } notifications"}
{"two words" "{ n } notifications"}}

I went for the hash-pair approach instead of key-value-map one to emphasize
the fact that variants are independent units of translation. They're
different facets of the entity and I wanted to make that clearer with a
syntax that's similar to the syntax of the whole entity.
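
To make the lookup semantics concrete, here is a rough Python sketch of choosing a variant from a series of hash pairs, falling back to the pair marked with `*`. The tuple layout and the name `select_variant` are assumptions for illustration, not part of the proposal:

```python
def select_variant(pairs, key):
    """Pick the value whose key matches, else the value of the default pair.

    `pairs` is a list of (key, value, is_default) tuples in source order;
    this layout is a hypothetical stand-in for the parsed hash pairs.
    """
    default = None
    for pair_key, value, is_default in pairs:
        if pair_key == key:
            return value
        if is_default:
            default = value
    if default is None:
        raise KeyError(f"no variant {key!r} and no default pair")
    return default


notifications = [
    ("one", "{ n } notification", True),         # {* one ...}
    ("many", "{ n } notifications", False),      # {many ...}
    ("two words", "{ n } notifications", False), # {"two words" ...}
]
```

Here `select_variant(notifications, "many")` returns the plural string, and an unknown key such as `"few"` falls back to the default `one` pair.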



>
> > - Make it easy to parse complex expressions. They are important for the
> > advanced features of L20n, but cost a lot in terms of parsing and
> resolving.


> I don't like this change.
>

Let's discuss! :)


>
> First, I believe that we should avoid naming pollution which I believe we
> should achieve by using namespaces.
>
> @intl.*
> @cldr.*
> @gaia.*
>

I like namespacing, but 1) I'm afraid of creating spurious hierarchies with
one element and 2) it can be achieved by convention, e.g. intl-datetime.


>
> On top of that, I don't think that Lisp-style expressions are helping
> here.
> I've been using lisp style expressions in multiple environments and I
> always felt like they're alien compared to classic C syntax.
>

Consider this: you can probably write an entire S-expression parser in 100
lines of code. They're super-easy to implement too, which will make the
resolver simpler. We want expressions to support complex logic and in L20n
1.0 we paid for them while arguably they aren't used much. Only a handful
of localizer-engineers will write new expression code. So I think there's
a benefit of providing complex expressions without bundling a complex
implementation. I don't think C-like expressions give us that edge at all.
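
As a rough illustration of the size involved, here is a small S-expression reader in Python. It is only a sketch: it handles parentheses, double-quoted strings without escapes, integers and bare symbols, and it ignores the `{ }`, `[ ]` and `:entity` parts of the proposal. The names `Symbol`, `tokenize`, `read` and `parse` are made up for this example:

```python
import re

TOKEN = re.compile(
    r'\s*(?:(?P<paren>[()])|"(?P<string>[^"]*)"|(?P<atom>[^\s()"]+))'
)


class Symbol(str):
    """A bare name such as `plural` or `n`, distinct from a string literal."""


def tokenize(source):
    pos, source = 0, source.rstrip()
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at index {pos}")
        pos = match.end()
        yield match.lastgroup, match.group(match.lastgroup)


def read(tokens, i):
    """Read one expression starting at tokens[i]; return (node, next index)."""
    if i >= len(tokens):
        raise SyntaxError("unexpected end of input")
    kind, value = tokens[i]
    if kind == "paren":
        if value == ")":
            raise SyntaxError("unexpected ')'")
        form, i = [], i + 1
        while i < len(tokens) and tokens[i] != ("paren", ")"):
            node, i = read(tokens, i)
            form.append(node)
        if i >= len(tokens):
            raise SyntaxError("missing ')'")
        return form, i + 1
    if kind == "string":
        return value, i + 1
    try:
        return int(value), i + 1
    except ValueError:
        return Symbol(value), i + 1


def parse(source):
    tokens = list(tokenize(source))
    tree, end = read(tokens, 0)
    if end != len(tokens):
        raise SyntaxError("unexpected tokens after expression")
    return tree
```

For example, `parse("(mod n 10)")` gives the nested list `['mod', 'n', 10]`, where the first two items are `Symbol` instances and the last is an `int`.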


>
> First, I believe it's harder to notice an error and harder to create a
> useful recovery message. Consider:
>
> `callExpr param` - all possible combinations of mistakes -
> `callExpr,param`, `callExpr.param`, etc.
> `callExpr(param)` - we can easily report to the user sth like
> `callExpr[param)` and suggest a solution
>

A form always looks the same (operand ...args). There are fewer kinds of
parsing errors. OTOH, we're able to catch more errors in the runtime: e.g.
(entity 1) should return "Invalid argument type passed to &entity".


>
> Secondly, by far the language that localizers will know the most is
> JavaScript or some expression subset of CSS3/CSS4 (which follows JS
> syntax). Lowering the entry barrier by presenting them familiar syntax to code
> they may know even if they only know a little is a huge benefit.
>

> While it may be a matter of habit and preference, I believe that imposing
> something likely less familiar to localizers for disputable benefit is a
> wrong design decision.
>

Our C-style expressions are very limited: they're always a single
expression which makes the ternary if the only way to branch, and that's
tedious.  I think S-expressions give us a bit more flexibility by allowing
different interpretations of the arguments passed into form, like in (cond
...) which takes an even number of args and pairs them together to create a
multi-branch if-else.
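
A sketch of how such a form could be evaluated, written in Python over nested lists. Everything here is an assumption made for illustration: the operator table, the `quote` form used to mark string literals, and the `else` variable bound to a true value are not taken from the proposal.

```python
import operator

OPS = {
    "+": operator.add,
    "-": operator.sub,
    "mod": operator.mod,
    "=": operator.eq,
    "<": operator.lt,
}


def evaluate(form, env):
    """Evaluate a form given as nested lists; bare strings are variable names."""
    if isinstance(form, str):
        return env[form]  # variable lookup
    if not isinstance(form, list):
        return form  # number literal
    head, *args = form
    if head == "quote":
        return args[0]  # literal value, not evaluated
    if head == "cond":
        if len(args) % 2:
            raise TypeError("cond expects an even number of arguments")
        # Pair the arguments up as (test, result) and take the first match.
        for test, result in zip(args[0::2], args[1::2]):
            if evaluate(test, env):
                return evaluate(result, env)
        return None
    return OPS[head](*(evaluate(arg, env) for arg in args))


rule = ["cond",
        ["=", "n", 1], ["quote", "one"],
        "else", ["quote", "many"]]
```

Evaluating `rule` with `{"n": 1, "else": True}` picks the first branch and with `{"n": 5, "else": True}` falls through to the second, without any nested ternaries.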

I also noticed that while S-expression can get out of hand in real-world
code, we only mostly need them for arithmetic operations (and possibly
simple string manipulation in the future).  And once you get past RPN,
they're even simpler than C-style syntax! See the next paragraph for code
samples.

>
> As a mental experiment, design a Polish cldr.plural macro with your syntax
> and show it to a sample of localizers next to the current L20n expression
> syntax version.
> What do you think they'll likely find more familiar? Where do you think
> they'll likely make more mistakes in?
>

I did that! I'm sure I'm not being objective here, but the new syntax
looks leaner to me.

https://gist.github.com/stasm/c99010a8ab6d467562ba#file-plural-new-clj
https://gist.github.com/stasm/c99010a8ab6d467562ba#file-plural-old-php
(also in tinker: http://goo.gl/X4pvwe)
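
For readers who don't want to open the gist: the rule both files encode is small. The following Python sketch is not the code from the gist; it follows the CLDR categories for Polish as I understand them, restricted to non-negative integers, and the function name is made up:

```python
def polish_plural(n):
    """Return the plural category for a non-negative integer in Polish."""
    if n == 1:
        return "one"
    # 2-4, 22-24, 32-34, ... but not 12-14, 112-114, ...
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return "few"
    return "many"
```

Whichever expression syntax is chosen has to express these three branches, including the modulo arithmetic and the excluded 12-14 range.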


>
> > - Allow dashes in entity names.
>
> I would not claim it to be a major win especially since it would require
> localizers to switch between JS syntax and Lisp syntax for things like `{-
> email-count emails-total}`
>

Why would they switch to JS?


>
> Also, teaching people that a variable may be `email` and a variable may be
> `count` and that we do have a subtraction operator `-` but that `email-count`
> is not an expression but an entity ID is imho very confusing.
>

The subtraction operator can only be found as the callee in form, so it's
really more like an operand or simply a function call: (- a b).


>
> > - Make it possible to reference entities by dynamic names unknown at
> parse
> > time.
>
> I like this feature, but not sure if there's a real use case for that and
> if so, I'd expect it to be rare enough that we shouldn't design syntax
> around it imho. (in other words, {{ :[variable] }} would do)
>

I've noticed this could be helpful when you'd want to dynamically choose an
entity in some other translation: maybe byte units, maybe city names,
maybe something else. People seem to want a data-store-like structure that
they can retrieve things from. Without dynamic references to entities, they
resort to using hash values, which is bad because hashes are for variants of
the same thing, or to attributes, which is also wrong because why would these
translations pretend to be metadata of some made-up entity? I think all of
those should be separate private entities and it should be easy to access
them without knowing the identifier a priori.
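
A tiny Python sketch of what a dynamic reference amounts to, using the byte-units case. The entity names, the `unit-` prefix convention and both function names are hypothetical:

```python
ENTITIES = {
    "unit-kb": "kB",
    "unit-mb": "MB",
    "unit-gb": "GB",
}


def lookup(entities, name):
    """Return the entity called `name`, or raise a descriptive error."""
    try:
        return entities[name]
    except KeyError:
        raise LookupError(f"unknown entity: {name}") from None


def format_size(entities, amount, unit):
    # The entity name is computed from data that is only known at runtime.
    return f"{amount} {lookup(entities, 'unit-' + unit)}"
```

`format_size(ENTITIES, 5, "mb")` resolves `unit-mb` at runtime; each unit stays a separate entity instead of becoming a hash variant or an attribute of some made-up parent.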


>
> > - Make it clearer that translation variants are different permutations
> of
> > the same translation data and as such, they are different than attributes
> > and are not part of the social contract.
>
> I like it, although I'd like to combine it with globals, which I think are
> valuable to preserve.
>

I'm keeping globals in form of library-provided symbols! &defmacro or &and
are what we call globals in 1.0 and 2.0. The only convenience is that you
don't have to type the & if you're calling them in an S-expr: (and &true
&true).

> I'm also not sure if ":" is better than "$", especially if we want to use
> ':' to denote value variants. Because in that narrative that makes entities
> variants of context?
>

Great point, I had this same doubt myself. I went for : for entity
references to make them visually lightweight but distinct. For variant
access I considered / but I thought we might want to use it when we
implement imports, like this:

(import "../other/common.l20n" "common")
{ hello "Hello, { :common/user }." }



> My suggestion:
>
> - $entity1:masculine
> - $entity1.ariaLabel
> - @cldr.plural
> - user.name


How about the... hash? :)

:entity#one#feminine
:entity.aria-label
&cldr-plural (when used not as callee)
(cldr-plural 3)
user.name

Also note that all of these shorthands will require a little bit of special
parsing because they're not S-expressions. We could work around this
problem (at least for now) by only allowing access via dedicated symbols:

(variant :entity "one" "feminine")
(attr :entity "aria-label")
(attr user "name")

Or by making them callable (it's a bit awkward, I'll admit, but simple!):

(#feminine (#one :entity))
(.aria-label :entity)
(.name user)

This is btw what Clojure does, too. It works because behind the scenes,
keywords and property accessors (.prop) implement the IFn protocol and they
can look themselves up in the argument if called as a function.
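
The same trick can be sketched in Python with a callable object that looks its own name up in whatever it is applied to. The `Accessor` class and the sample data are assumptions for illustration; Python's standard `operator.itemgetter` behaves similarly:

```python
class Accessor:
    """A name that, when called, looks itself up in its argument."""

    def __init__(self, name):
        self.name = name

    def __call__(self, target):
        return target[self.name]


one = Accessor("one")
feminine = Accessor("feminine")

# Hypothetical entity with two levels of variants.
entity = {
    "one": {"masculine": "nowy", "feminine": "nowa"},
    "many": {"masculine": "nowych", "feminine": "nowych"},
}
```

Here `feminine(one(entity))` mirrors `(#feminine (#one :entity))`: the innermost call selects the `one` variant and the outer call selects `feminine` from the result.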

Last but not least, I'd like to keep @ out of the syntax and use it for
docstrings like @param.


> All in all, I'm all in favor of revisiting the syntax and I think it's a
> good moment for that. I'd love us to get to Gaia devs and localizers and
> ask them for feedback. I'd love them to tinker with our current syntax and
> tell us where they have problems and come up with ideas. I think it's a
> perfect moment for last revision of the syntax and probably the last one
> before we have to make a call and stabilize it if we want to start using it
> in time for Firefox OS 2.5.
>
>
Yay! Let's keep the ball rolling. I think we learned a lot from previous
attempts and from Gaia, but there's also the uncharted territory of the
untapped potential, so to speak.

Thanks again,
-stas

Zibi Braniecki

Jul 16, 2015, 2:31:34 AM
to mozilla-t...@lists.mozilla.org
On Wednesday, July 15, 2015 at 5:49:12 PM UTC-7, Staś Małolepszy wrote:
> That's the point, actually :) By forcing { and } to be always escaped we
> can make sure that literal { } are either meaningful or are errors. We
> don't have this luxury with <> now.

Yeah, which in turn may lead to a confusing state between unclosed expander vs. unclosed entity. But I'm ok with experimenting with that!

> Yeah, good point. I'm not too worried about Bash variables or that one
> variable in Gaia's Email. I'm not sure about Jinja or ES6 templates.
> Would they be orthogonal to L20n or complementary? Does it even make sense
> to have (escaped) es6 templates in translations?

I don't think it does. But the developer dealing with es6 templates right next to L20n overlays may be confused. I'm also wondering what will happen with angular/react templates in the context of l20n.

> This is again a good point and I'm aware that there is world out there
> beyond HTML :) That said, I think it's well aligned with Mozilla's mission
> to push for HTML as the de facto standard of building localizable UI.

Good point. I think that convinces me.

> It's in the gist:
>
> ; index is defined in []
> ; hashes are defined as series of hash pairs {}
> ; the default hash pair is marked with {* }
> ; hash keys are symbols or strings
>
> { notifications [(plural n)]
> {* one "{ n } notification"}
> {many "{ n } notifications"}
> {"two words" "{ n } notifications"}}

Ok, then we have the same open-close character for entities, hash values and expanders. That's worrying.

> I went for the hash-pair approach instead of key-value-map one to emphasize
> the fact that variants are independent units of translation. They're
> different facets of the entity and I wanted to make that clearer with a
> syntax that's similar to the syntax of the whole entity.

I don't think I like it. Once again, the most likely comparison people will have is JS Hash Values and I think it's a good one. I prefer

{notifications[plural(n)] {
*one: "{ n } notifications",
many: "{ n } notifications",
other: "{ n } notifications"
}}

> I like namespacing, but 1) I'm afraid of creating spurious hierarchies with
> one element and 2) it can be achieved by convention, e.g. intl-datetime.

I believe that that's exactly the argument that PHP authors used and that led them to an API mess.

I'm much happier with a one-element namespace than with a flat list of namespace-like conventions with strpos, strslice, intl-numberformat and cldrPlural.

> Consider this: you can probably write an entire S-expression parser in 100
> lines of code. They're super-easy to implement too, which will make the
> resolver simpler.

I would prefer us not to design a challenging piece of L20n around how easy it is to implement, but how easy it is to learn and use.

> We want expressions to support complex logic and in L20n 1.0 we paid for them while arguably they aren't used much.

Which is ok. We now introduce them in 3.x slowly and see what we use. I believe that the original scope of the expression syntax was designed to express plural macros and I think it's a good limit.

> Only a handful of localizer-engineers will write new expression code. So I think there's a benefit of providing complex expressions without bundling a complex implementation.

I don't believe we know that. I don't think we got to the point where we know how commonly localizers will use expressions and my gut feeling is that they will use them more often than we assume.

And as I said before, I would prefer us not to focus on implementation cost when we're working on design.

> I don't think C-like expressions give us that edge at all.

And I don't think that that's the edge we should be focusing on.

> A form always looks the same (operand ...args). There are fewer kinds of
> parsing errors. OTOH, we're able to catch more errors in the runtime: e.g.
> (entity 1) should return "Invalid argument type passed to &entity".

How is it better than entity[1] or entity(1)?

> Our C-style expressions are very limited: they're always a single
> expression which makes the ternary if the only way to branch, and that's
> tedious.

I don't understand your point here.

> I think S-expressions give us a bit more flexibility by allowing
> different interpretations of the arguments passed into form, like in (cond
> ...) which takes an even number of args and pairs them together to create a
> multi-branch if-else.

I don't think that we need more flexibility and definitely I don't think that making looser syntax that leaves more to the interpretation is going to get us fewer errors.

> I did that! I'm sure I'm not being objective here, but the new syntax
> looks leaner to me.

I believe that you are not objective here. It's a matter of taste but you're introducing a lot of esoteric tokens like &true, mod, or, and (which look like variables), while at the same time trying to minimize the number of special characters.

I'm going to even skip the fact that the way we teach people math is by teaching them "a + b" not "+ a b", so the former is by design more recognizable. I believe that your approach is going to significantly increase the entry barrier and basically make the code harder to maintain when new people will try to work with pre-existing localizations.

I also feel like we're going to be less "webby" because all web technologies use C-like notation for arithmetic operations and JS-Object style notation for Hashes (see CSS, JS, JSON).

> > I would not claim it to be a major win especially since it would require
> > localizers to switch between JS syntax and Lisp syntax for things like `{-
> > email-count emails-total}`
> >
>
> Why would they switch to JS?

Sorry, I meant developers here. I just don't think that lack of '-' in ID is a problem. CSS doesn't use them, JS doesn't use them. Seems like web technologies do just fine without it.

> The subtraction operator can only be found as the callee in form, so it's
> really more like an operand or simply a function call: (- a b).

Well, the subtraction operator may also be found in the middle of the ID in your example.

> I've noticed this could be helpful when you'd want to dynamically choose an
> entity in some other translation: maybe byte units, maybe city names,
> maybe something else. People seem to want a data-store-like structure that
> they can retrieve things from. Without dynamic references to entities, they
> resort to using hash values, which is bad because hashes are for variants of
> the same thing, or to attributes, which is also wrong because why would these
> translations pretend to be metadata of some made-up entity? I think all of
> those should be separate private entities and it should be easy to access
> them without knowing the identifier a priori.

I'm really worried about a scenario where people try to pass values from the user to l20n and resolve entity with that ID. I like the idea that the developer defines and calls ID names.

And, as I said, we can do this perfectly fine without lisp expressions.

> I'm keeping globals in form of library-provided symbols! &defmacro or &and
> are what we call globals in 1.0 and 2.0. The only convenience is that you
> don't have to type the & if you're calling them in an S-expr: (and &true
> &true).

That's even more complicated and confusing I believe.

Also, I much prefer to have `and` as part of the expression syntax than a global call.

> How about the... hash? :)
>
> :entity#one#feminine

I think I prefer $entity:one:feminine over that.

I also prefer :entity:one:feminine. But I believe that the opening ':' is confusing.

> :entity.aria-label
> &cldr-plural (when used not as callee)
> (cldr-plural 3)

Definitely prefer @cldr.plural(3).

I can imagine going back to your idea of special-casing name `global` or `env`. and then env.cldr.plural instead of '@'.

> Also note that all of these shorthands will require a little bit of special
> parsing because they're not S-expressions. We could work around this
> problem (at least for now) by only allowing access via dedicated symbols:

I think that we should discuss pieces of your proposal separately instead of building one on top of another to avoid the illusion that it's an "all-or-nothing" scenario. I like many of your changes, I don't think that your expression syntax proposal is a good change. I'd prefer to discuss those things without false implications.

> Last but not least, I'd like to keep @ out of the syntax and use it for
> docstrings like @param.

I'm not sure how special characters in a comment collide with expression syntax. In docstrings, the '@id' are only meaningful if they are at the beginning of the line preceded by whitespace. We won't have any case like that with expressions.

zb.

Staś Małolepszy

Jul 16, 2015, 7:28:40 AM
to Zibi Braniecki, mozilla-t...@lists.mozilla.org
On Thu, Jul 16, 2015 at 8:31 AM, Zibi Braniecki <
zbigniew....@gmail.com> wrote:

> On Wednesday, July 15, 2015 at 5:49:12 PM UTC-7, Staś Małolepszy wrote:
> > That's the point, actually :) By forcing { and } to be always escaped we
> > can make sure that literal { } are either meaningful or are errors. We
> > don't have this luxury with <> now.
>
> Yeah, which in turn may lead to a confusing state between unclosed
> expander vs. unclosed entity. But I'm ok with experimenting with that!
>

I think expander vs. entity are fine, but you made me realize that there's
a problem with unclosed entities vs. opening of hash variants. We need a
better way for the parser to find the beginning of the next entity if the
previous one has not been closed. Back to the drawing board for hashes :)


>
> > It's in the gist:
> >
> > ; index is defined in []
> > ; hashes are defined as series of hash pairs {}
> > ; the default hash pair is marked with {* }
> > ; hash keys are symbols or strings
> >
> > { notifications [(plural n)]
> > {* one "{ n } notification"}
> > {many "{ n } notifications"}
> > {"two words" "{ n } notifications"}}
>
> Ok, then we have the same open-close character for entities, hash values
> and expanders. That's worrying.
>

See above. I need to change this part of the proposal. Thanks for
pointing it out.


>
> > I went for the hash-pair approach instead of key-value-map one to
> emphasize
> > the fact that variants are independent units of translation. They're
> > different facets of the entity and I wanted to make that clearer with a
> > syntax that's similar to the syntax of the whole entity.
>
> I don't think I like it. Once again, the most likely comparison people
> will have is JS Hash Values and I think it's a good one. I prefer
>
> {notifications[plural(n)] {
> *one: "{ n } notifications",
> many: "{ n } notifications",
> other: "{ n } notifications"
> }}
>

I don't think we should mimic JS code too much.  There's value in using a
different syntax; we have different semantics.


>
> > I like namespacing, but 1) I'm afraid of creating spurious hierarchies
> with
> > one element and 2) it can be achieved by convention, e.g. intl-datetime.
>
> I believe that that's exactly the argument that PHP authors used and that
> led them to an API mess.
>

PHP is a full-on language which was used to build Facebook. Please tell me
you don't want to build the next Facebook in L20n ;)


> I'm much happier with a one-element namespace than with a flat list of
> namespace-like conventions with strpos, strslice, intl-numberformat and
> cldrPlural.
>

OK, how about the slash character which I used in my previous email when I
mentioned imports? Note this is just part of the name, not a new kind of
expression.

(cldr/plural n)
(intl/format-date datetime)


>
> > Consider this: you can probably write an entire S-expression parser in
> 100
> > lines of code. They're super-easy to implement too, which will make the
> > resolver simpler.
>
> I would prefer us not to design a challenging piece of L20n around how
> easy it is to implement, but how easy it is to learn and use.
>

I think that Lisp is easy to learn because it's so simple. So it's not
only on the merits of the simplicity of implementation that I suggest we
use it.

However, I think much of our discussion here boils down to who likes
S-expressions and who doesn't. We might want other people to weigh in. It
would also make sense (as you suggest later) to see what we could take from
this proposal that we both agree on and see what depends on the fact that I
used Lisp for expressions (like dashes in names). I tried to compile such a
list at the end of this email.


> > We want expressions to support complex logic and in L20n 1.0 we paid
> for them while arguably they aren't used much.
>
> Which is ok. We now introduce them in 3.x slowly and see what we use. I
> believe that the original scope of the expression syntax was designed to
> express plural macros and I think it's a good limit.
>

Why is it OK?  L20n.js already is 70kb of JavaScript code, 30kb minified.
It's huge. If we want people to use it at all we should be careful about
adding more code to it.


>
> > Only a handful of localizer-engineers will write new expression code.
> So I think there's a benefit of providing complex expressions without
> bundling a complex implementation.
>
> I don't believe we know that. I don't think we got to the point where we
> know how commonly localizers will use expressions and my gut feeling is
> that they will use them more often than we assume.
>

I'd like to share that gut feeling, but the fact that we haven't come up with
UI concepts for L20n in tools like Pootle tells me that it's not going to be
soon that we'll have all localizers writing expressions.  I don't want to
optimize our syntax for a rare use-case of writing expressions. I think
the gist of my thinking is to base expressions on something which is proven
to work and easy to learn and implement. Remember v1's compiler and all
the different types of functions it created to be secure? I don't want to
repeat that.


> > A form always looks the same (operand ...args). There are fewer kinds of
> > parsing errors. OTOH, we're able to catch more errors in the runtime:
> e.g.
> > (entity 1) should return "Invalid argument type passed to &entity".
>
> how is it better than entity[1] or entity(1) ?
>

I think it's better suited for our minimal system because it's just one
kind of syntax that we need to support.

(operand ...args)

In the current http://l20n.github.io/spec/grammar.html you can make a
mistake on every level of the expression syntax. In conditional expr you
can forget the colon. In logic expr you can omit one operand. In call
expr you can mix parens. In computed property and attribute exprs you can
mix brackets. There's just so many permutations of errors and we need to
support all of them.
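
To illustrate the "one kind of syntax" argument, here is a minimal Python
sketch of a reader for (operand ...args) forms. It is not the L20n.js parser:
the function names and error messages are hypothetical, and string literals
are left out to keep it short. Under those assumptions the reader can fail in
only a handful of ways: a missing paren, a stray paren, empty input or
trailing input.

```python
# Hypothetical sketch of a reader for forms shaped like (operand ...args).
# Not the L20n.js parser; names and error messages are made up for illustration.

def tokenize(source):
    """Split source into parens and atoms (string literals are not handled)."""
    return source.replace("(", " ( ").replace(")", " ) ").split()


def read_form(tokens):
    """Consume one form from the token list and return it as nested lists."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    token = tokens.pop(0)
    if token == ")":
        raise SyntaxError("unexpected )")
    if token != "(":
        return token  # an atom: a name or a number, kept as a string
    form = []
    while tokens and tokens[0] != ")":
        form.append(read_form(tokens))
    if not tokens:
        raise SyntaxError("missing )")
    tokens.pop(0)  # drop the closing paren
    return form


def parse(source):
    """Parse exactly one form and reject anything left over."""
    tokens = tokenize(source)
    form = read_form(tokens)
    if tokens:
        raise SyntaxError("unexpected trailing input")
    return form
```

For example, parse("(- a (mod n 10))") yields a nested list whose first
element is always the callee, whatever the callee happens to be.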


>
> > Our C-style expressions are very limited: they're always a single
> > expression which makes the ternary if the only way to branch, and that's
> > tedious.
>
> I don't understand your point here.
>

See https://gist.github.com/stasm/c99010a8ab6d467562ba#file-plural-old-php
for how the complex branching needs to be implemented in one single
expression using three ternary ifs. I think that's a limitation of the
current C-like syntax.
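
As a rough illustration of the difference (this is not the code from the
gist), here is the same branching written twice in Python: once as a single
expression of chained conditionals and once with a hypothetical `cond` helper
that pairs its arguments up. The rule used is the CLDR plural rule for Polish
integers; the function names are assumptions made for this sketch.

```python
def plural_nested(n):
    """Single-expression style: every branch is a chained conditional."""
    return ("one" if n == 1 else
            "few" if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14 else
            "many")


def cond(*clauses):
    """Hypothetical multi-branch helper: takes (test, result) pairs in order
    and returns the result of the first test that is true."""
    if len(clauses) % 2 != 0:
        raise ValueError("cond expects an even number of arguments")
    for test, result in zip(clauses[0::2], clauses[1::2]):
        if test:
            return result
    raise ValueError("no clause matched")


def plural_cond(n):
    """cond style: a flat list of test/result pairs, no nesting."""
    return cond(
        n == 1, "one",
        2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14, "few",
        True, "many",
    )
```

Note that Python evaluates all arguments before calling `cond`; a real
evaluator would evaluate each test lazily, in order.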


>
> > I think S-expressions give us a bit more flexibility by allowing
> > different interpretations of the arguments passed into form, like in
> (cond
> > ...) which takes an even number of args and pairs them together to
> create a
> > multi-branch if-else.
>
> I don't think that we need more flexibility and definitely I don't think
> that making a looser syntax that leaves more to interpretation is going to
> get us fewer errors.
>

We can move part of the error reporting to runtime which is usually more
informative. We can also run static analysis at parse time (which is easy
with S-expressions because code is data) and possibly detect problems
before runtime.
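
A minimal sketch of what such a check could look like, assuming forms have
already been read into nested Python lists with the callee first. The ARITY
table and every function name in it are hypothetical; nothing here comes from
a spec.

```python
# Hypothetical table of system functions and the argument counts they accept.
# None means "any number of arguments".
ARITY = {"cldr/plural": 1, "mod": 2, "=": 2, "cond": None}


def check(form, problems=None):
    """Walk a parsed form (nested lists) and collect problems before runtime."""
    if problems is None:
        problems = []
    if not isinstance(form, list):
        return problems  # atoms need no checking in this sketch
    if not form:
        problems.append("empty form")
        return problems
    callee, args = form[0], form[1:]
    if isinstance(callee, list):
        problems.append("callee must be a name")
    elif callee.startswith(":"):
        pass  # a macro call such as (:my-plural n); its arity is unknown here
    elif callee not in ARITY:
        problems.append(f"unknown function {callee}")
    elif ARITY[callee] is not None and ARITY[callee] != len(args):
        problems.append(
            f"{callee} expects {ARITY[callee]} args, got {len(args)}"
        )
    for arg in args:
        check(arg, problems)  # nested forms are checked the same way
    return problems
```

Because the tree is plain data, the whole analysis is one recursive walk;
there is no separate node type per kind of expression.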


>
> > I did that! I'm sure I'm not being objective here, but the new syntax
> > looks leaner to me.
>
> I believe that you are not objective here. It's a matter of taste but
> you're introducing a lot of esoteric tokens like &true, mod, or, and (which
> look like variable), while at the same time trying to minimize the number
> of special characters.
>

I think this comes down to teaching users that the opening paren is always
special and calls system functions (unless you call a macro with
(:my-plural ..)). Note that you can't call variables so it's not
ambiguous. If we had a symbol for calling other symbols, the following
would have the same meaning:

(cldr-plural n)
(call &cldr-plural n)


>
> I'm going to even skip the fact that the way we teach people math is by
> teaching them "a + b" not "+ a b", so the former is by design more
> recognizable. I believe that your approach is going to significantly
> increase the entry barrier and basically make the code harder to maintain
> when new people will try to work with pre-existing localizations.
>

Looks like you didn't skip it all ;) I don't think I want people to be
adding often in L20n. What's the use case? Even though I'd like to allow
full expression in placeables, I don't think we should tell localizers to
start doing arithmetic inside of translations. I expect expressions to be
mostly used in macros and I expect macros to be written by tech-savvy
localizers who can easily grasp RPN.



> I also feel like we're going to be less "webby" because all web
> technologies use C-like notation for arithmetic operations and JS-Object
> style notation for Hashes (see CSS, JS, JSON).
>

I think the definition of "webby" isn't very clear and may mean different
things to different people. I agree that RPN is non-standard on the client
side.


> Sorry, I meant developers here. I just don't think that lack of '-' in ID
> is a problem. CSS doesn't use them, JS doesn't use them. Seems like web
> technologies do just fine without it.
>

CSS very much uses dashes. A lot of dashes everywhere! In ids, class
names, property names, variables, you name it. Which in fact is my
inspiration! I want to be able to do this:

<p id="hello-word" l10n-id="hello-world"></p>

and not this:

<p id="hello-word" l10n-id="helloWorld"></p>


> > The subtraction operator can only be found as the callee in form, so it's
> > really more like an operand or simply a function call: (- a b).
>
> Well, the subtraction operator may also be found in the middle of the ID in
> your example.
>

In RPN it's not an operator if it's "in the middle". That's the benefit
which I want to reap: make syntax unambiguous.


> Also, I much prefer to have `and` as part of the expression syntax than a
> global call.
>

I know I'm repeating myself, but the whole point of RPN and S-expressions
is to not have operators. Everything is a function call.


> > How about the... hash? :)
> >
> > :entity#one#feminine
>
> I think I prefer $entity:one:feminine over that.
>

I could see this working out quite well, yes. Let me think about it.


> I can imagine going back to your idea of special-casing name `global` or
> `env`. and then env.cldr.plural instead of '@'.
>

Ugh. That's even more hierarchy... :(


> I think that we should discuss pieces of your proposal separately instead
> of building one of top of another to avoid the illusion that it's
> "all-or-nothing" scenario. I like many of your changes, I don't think that
> your expression syntax proposal is a good change. I'd prefer to discuss
> those things without false implications.
>

The consequences of using S-expressions for our use-case are as follows:

- we can use dashes in entity names, like CSS
this-is-a-valid-identifier

- we can use the single colon in references
:brandName

- we can use the question mark in entity and macro names
(:between? 2 1 3)

- every system function is called in the same manner, with (callee arg).
With C-like syntax, there's the confusion between referencing and calling:
@callee vs. @callee(arg).

- the only thing that we need to reserve is the & prefix which denotes
system functions. We can add anything there without the risk of clashing
with the existing syntax. C-like syntax prevents us from ever using &, |,
: etc. OTOH, we still need bool literals with the prefix like in
S-expressions: @true and @false.

- expressions use fewer special characters and are consistent. Compare the
following indexes (I'm already using parts of my proposed changes)
[@cldr/plural(n) user["gender"]]
[(cldr/plural n) (attr user "gender")]

- we're not limited to single-expression bodies. we can pass any number
of args to a function and define custom runtime behaviors. For instance we
could add a `condp` function (condition using a fixed predicate, similar to
a switch)
(condp &= username
"Mary" "she"
"John" "he"
"other")

Sure we could add a @condp global in the current syntax and call it with
any number of args, but why would @condp be a function and not @if?
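
For reference, a rough Python sketch of the behavior I have in mind for
`condp`, modeled on Clojure's form of the same name. The signature and the
error raised when nothing matches are assumptions for illustration only.

```python
import operator


def condp(predicate, value, *clauses):
    """Return the result paired with the first test for which
    predicate(test, value) is true. A trailing unpaired clause, if present,
    is used as the default."""
    has_default = len(clauses) % 2 == 1
    pairs = clauses[:-1] if has_default else clauses
    for test, result in zip(pairs[0::2], pairs[1::2]):
        if predicate(test, value):
            return result
    if has_default:
        return clauses[-1]
    raise ValueError("no matching clause")


def pronoun(username):
    """Mirrors the (condp &= username ...) example above."""
    return condp(operator.eq, username,
                 "Mary", "she",
                 "John", "he",
                 "other")
```

The point is that the pairing of arguments is decided by the function at
runtime, not by the grammar.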


Do we want to say 'no' to all of these advantages? We want to build a
small yet powerful domain language with expressions. Lisp syntax sounds
like a perfect fit to me, tbh.



> > Last but not least, I'd like to keep @ out of the syntax and use it for
> > docstrings like @param.
>
> I'm not sure how special characters in a comment collide with expression
> syntax. In docstrings, the '@id' are only meaningful if they are at the
> beginning of the line preceded by whitespace. We won't have any case like
> that with expressions.
>

Wouldn't the following be confusing? @param looks like a global.

/* @param $n Number of hellos */
<hello[@cldr.plural($n)] { ... }>


Thanks again, I'll revisit the design of hashes and will report back here.
-stas

Axel Hecht

Jul 16, 2015, 10:34:37 AM7/16/15
to mozilla-t...@lists.mozilla.org
I'm with gandalf that we should probably discuss expression changes
independently of entity markup changes.

I also think that I've got a patch to make us less brittle in the html
markup and error recovery case, I think we should roll that in and see
what it gives.

I also look back at my experience over using moz.l20n and/or tinker to
actually dive into the merits of the file format.

I think that we're in the scenario where ease of use is still dominated
by oversights and bugs in our toolchain. Which is OK, as we should make
sure the case where everything is fine works great.

If I had my way, we'd discuss changes to the file format based on
branched toolings, and then actually used stuff.

It's an interesting question on which level of tooling we want. Error
reporting is nice, and I think has to be there at the very least.

There's also a bunch of opportunities in terms of code assist, in
particular around indexes, expressions, plurals, etc. I wonder how many
of the pros and cons are related to this, and again, discussing them
without using them feels hard, at least to me.

Onto the actual proposals, I have a metric ton of stop-energy-NIH for
the lisp stuff. Like, I used emacs for ages, and never configured it to
my liking because of lisp. When it's Lisp, it's personal, y'know.

Structuring names, I'd side with gandalf on "keep it short, s...", the
namespace and name pairing looks strong enough. Unrelated on whether '-'
should be an ok part of an identifier, though.

On ':' in hash pairs or not, I don't have a strong preference, I guess.
I dropped the need for a : after the entity ID, and kept it for hashes,
just because I didn't bother with the idea of changing hashes from JS.

Axel

Zibi Braniecki

Jul 16, 2015, 2:21:17 PM7/16/15
to mozilla-t...@lists.mozilla.org
On Thursday, July 16, 2015 at 4:28:40 AM UTC-7, Staś Małolepszy wrote:
> I don't think we should mimic JS code too much. There's value in using a
> different syntax; we have different semantics.

We do, so it's very obvious to me that when we need to store things that we can't store in the JS data model, we need to change. The default hash value is an example of where we extend. Thanks to that, everyone who has ever seen JS, PHP, Python or literally any of the popular languages that store hashes will recognize this:

{
x: "foo",
y: "foo2"
}

And on top of that we can store additional bit of information that we need. Minimal learning curve.

> PHP is a full-on language which was used to build Facebook. Please tell me
> you don't want to build the next Facebook in L20n ;)

No, and as you know, that was not my point. I'd appreciate if you didn't fall back to demagogy.

> OK, how about the slash character which I used in my previous email when I
> mentioned imports? Note this is just part of the name, not a new kind of
> expression.

> (cldr/plural n)
> (intl/format-date datetime)

Hahaha. I'm ok with slash but exactly as part of expression syntax ;)

I imagine that we could introduce imports this way. "cldr" is automatically imported.

> I think that Lisp is easy to learn because it's so simple. So it's not
> only on the merits of the simplicity of implementation that I suggest we
> use it.

I believe that that's your personal subjective experience not shared by majority of people working with IT.

> However, I think much of our discussion here boils down to who likes
> S-expressions and who doesn't. We might want other people to weigh in. It
> would also make sense (as you suggest later) to see what we could take from
> this proposal that we both agree on and see what depends on the fact that I
> used Lisp for expressions (like dashes in names). I tried to compile such a
> list at the end of this email.

Awesome. I agree. Let's separate the pieces that all three of us agree on, and let's keep the ones where we disagree isolated.

> Why is it OK? L20n.js is already 70kb of JavaScript code, 30kb minified.
> It's huge. If we want people to use it at all we should be careful about
> adding more code to it.

I agree. And yet, I don't think that the solution is to impair the user experience and the learning curve, and make users learn a new expression syntax in order to shave 10kb off of that.

> I'd like to share that gut feeling but given how we haven't come up with UI
> concepts for L20n in tools like Pootle tells me that it's not going to be
> soon that we'll have all localizers writing expressions.

Agree.

> I don't want to optimize our syntax for a rare use-case of writing expressions. I think
> the gist of my thinking is to base expressions on something which is proven
> to work and easy to learn and implement. Remember v1's compiler and all
> the different types of functions it created to be secure? I don't want to
> repeat that.

And I pretty much believe that we should repeat much of that.

The thing with the syntax is that you can't optimize it for current properties scope plus one/two things and then figure out how it works 2 years from now when we'll add full expression syntax. We won't be able to change it. At all.
And I am actually quite confident that if we went with your expression syntax that it would be a major reason for people not to adopt L20n.

> I think it's better suited for our minimal system because it's just one
> kind of syntax that we need to support.

I don't think we should design L20n syntax around its minimal form, but its final form. That's why 1.x was so useful.

There was a project, not long ago, to design an l10n library around minimal use and then add features as they became necessary.

I thought we had learned our lesson.

> In the current http://l20n.github.io/spec/grammar.html you can make a
> mistake on every level of the expression syntax. In conditional expr you
> can forget the colon. In logic expr you can omit one operand. In call
> expr you can mix parens. In computed property and attribute exprs you can
> mix brackets. There's just so many permutations of errors and we need to
> support all of them.

And we do, because they are parse errors. Which is awesome.

> See https://gist.github.com/stasm/c99010a8ab6d467562ba#file-plural-old-php
> for how the complex branching needs to be implemented in one single
> expression using three ternary ifs. I think that's a limitation of the
> current C-like syntax.

I don't think it's a problem.

> We can move part of the error reporting to runtime which is usually more
> informative.

Disagree. Parse errors are way safer and more useful than runtime ones. The earlier we catch a problem the better.

> We can also run static analysis on parse time (which is easy
> with S-expressions because code is data) and possibly detect problems
> before runtime.

Sorry, I don't even know if you're still serious. So, you're presenting static analysis on S-expressions for its ability to detect errors as an advantage over parser syntax errors...

> I think this comes down to teaching users that the opening paren is always
> special and calls system functions (unless you call a macro with
> (:my-plural ..)). Note that you can't call variables so it's not
> ambiguous. If we had a symbol for calling other symbols, the following
> would have the same meaning:
>
> (cldr-plural n)
> (call &cldr-plural n)

So, once again, you're basically saying that whatever other Web technology they know is not useful and we'll teach them the new way. That's the learning curve I don't believe we should introduce.

> Looks like you didn't skip it all ;)

Oh I did. I can dig into that much more.

> I don't think I want people to be adding often in L20n.

They probably don't. But they may want to execute logical operations or arithmetic operations. And, as I pointed out in the part you skipped, the way our civilisation teaches people about them (both in spoken/written languages and logic/math) is with "A x B x C", not "x A B C".

> I expect expressions to be
> mostly used in macros and I expect macros to be written by tech-savvy
> localizers who can easily grasp RPN.

And I believe that those same localizers would much prefer to work with JS-like, C-like syntax.

> I think the definition of "webby" isn't very clear and may mean different
> things to different people. I agree that RPN is non-standard on the client
> side.

I believe that when learning L20n, familiarity with Web technologies should lower the entry barrier and make it easier to read/modify/write.

Even languages we think of as "next" like L20n for Rust, Python, C++ will have the same characteristic. Unless we see lisp-like environment based L20n implementation in the future, I believe my point is valid.

> CSS very much uses dashes. A lot of dashes everywhere! In ids, class
> names, property names, variables, you name it. Which in fact is my
> inspiration!

Does it mean you also want to play funny games with our syntax later on, like how CSS tries to deal with the subtraction operator in calc vs. IDs with dashes?

Sweet little decisions like "The + and - operators must always be surrounded by whitespace. The * and / operators do not require whitespace" ?

Do you want to play games with:

--foo-faa: 100px;
--foo-faa2: 50px;

width: calc(var(--foo-faa) - var(--foo-faa2));

And repeat the introduced inconsistency and secondary nature of variables.

> I want to be be able to do this:
>
> <p id="hello-word" l10n-id="hello-world"></p>
>
> and not this:
>
> <p id="hello-word" l10n-id="helloWorld"></p>

And I'm ok with doing the latter to avoid toCamelCase/fromCamelCase stories between the language that doesn't have mathematical operators and one that does.

> In RPN it's not an operator if it's "in the middle". That's the benefit
> which I want to reap: make syntax unambiguous.

I mean that the subtraction operator, "-", may be found in the middle of an ID, as in "my-name". It's not an operator anymore from the parser's perspective, but it looks like one to the human eye.

> I know I'm repeating myself, but the whole point of RPN and S-expressions
> is to not have operators. Everything is a function call.

I feel like I'm telling you my perspective on what we should do with L20n syntax and you respond by telling me what the point of some other approach is.

> Ugh. That's even more hierarchy... :(

True.

> The consequences of using S-expressions for our use-case are as follows:
>
> - we can use dashes in entity names, like CSS
> this-is-a-valid-identifier

Not a goal. Would be nice to have, but quite minor value imho. Languages that can subtract should not have "-" in IDs.

> - we can use the single colon in references
> :brandName

I don't believe it's specific to S-expressions.

> - we can use the question mark in entity and macro names
> (:between? 2 1 3)

Not a goal.

> - every system function is called in the same manner, with (callee arg).
> With C-like syntax, there's the confusion between referencing and calling:
> @callee vs. @callee(arg).

Antigoal. I much prefer the syntax to easily catch when the user did something else than he wanted. That's why we separated entity ID references from variables. To catch early that the user wanted to reference an entity that doesn't exist rather than saying "well, maybe it's a variable?".

Same with other pieces. I prefer to say "well, you wanted to call and it's not callable" than "oh, you want something X with arg Y and we'll try some things and maybe one of them will work".

> - the only thing that we need to reserve is the & prefix which denotes
> system functions. We can add anything there without the risk of clashing
> with the existing syntax. C-like syntax prevents us from ever using &, |,
> : etc.

And I don't see this as a value. "hello-$w|rl:d" IDs are not a huge value.

btw. I really believe that you're rationalizing your preference by trying to present the ability to push arbitrary characters into entity ID's as a huge win.

> OTOH, we still need bool literals with the prefix like in
> S-expressions: @true and @false.

I don't think we decided on boolean literals. Last time we discussed, we dropped them.

> - expressions use fewer special characters and are consistent. Compare the
> following indexes (I'm already using parts of my proposed changes)
> [@cldr/plural(n) user["gender"]]
> [(cldr/plural n) (attr user "gender")]

If you want to not distinguish globals separately (as you do in your sexpr example) then don't put '@' in front of the global.

Apart from that, yeah. The former looks much better to me.

> - we're not limited to single-expression bodies. we can pass any number
> of args to a function and define custom runtime behaviors. For instance we
> could add a `condp` function (condition using a fixed predicate, similar to
> a switch)
> (condp &= username
> "Mary" "she"
> "John" "he"
> "other")

So, we could add switch if we needed it. I don't see a huge value in it.

> Sure we could add a @condp global in the current syntax and call it with
> any number of args, but why would @condp be a function and not @if?

Because there's no major use case and we're dealing with basics for now.

>
> Do we want to say 'no' to all of these advantages?

Absolutely. I find the cost too high: introducing exotic syntax, raising the entry barrier, diverging from the expression language of other web technologies (CSS, JS, DOM, JSON etc.) and moving lots of syntax errors to runtime. Absolutely.

> We want to build a small yet powerful domain language with expressions. Lisp syntax sounds like a perfect fit to me, tbh.

I don't agree with you.

> Wouldn't the following be confusing? @param looks like a global.
>
> /* @param $n Number of hellos */
> <hello[@cldr.plural($n)] { ... }>

Hmm, idk. I don't see it as very confusing.

zb.

Richard Olsson

Jul 18, 2015, 8:33:52 AM7/18/15
to mozilla-t...@lists.mozilla.org
Wow, I'm loving the discussion. I made a lukewarm attempt at joining the work on L20n maybe six months or a year back, but never quite got into it because the project didn't feel as alive back then. I'm very happy to see such fundamental topics as the syntax being discussed, and am inspired to give it another try. So please bear with me since I'm still pretty new to the project.

Personally, what attracted me to L20n was the syntax, and not the integration with the web front-end stack. I think the original syntax works great, and although there are some minor tweaks in Stas's proposal that I wouldn't mind, I think Zibi has the more valid points on most issues being discussed in this thread.

My main concern when I came across L20n was l20n.js, and how the L20n language to me felt very tightly coupled to JavaScript and HTML. It seems to me as if many of the concerns being voiced in this thread are related to that same fact, and I don't see why it needs to be that way.

If the desired syntax is too (technically) complex, making the parser heavier than is considered ideal for a browser app, why not pre-compile to a format that can be more easily parsed instead of changing the source syntax? The mo/po approach is actually one of the few things that I don't personally hate about gettext. :)

This might seem a little bit off-topic, but it's part of a general way of thinking of L20n and the JS API as separate, which could actually affect some of the syntax questions. One example is the global @screen context, relevant in this discussion because of the @ prefix.

I'm gonna go out on a limb here and say that I don't think globals should be in the language at all. While responsive localization is a great thing, the screen size should be a context provided by the browser JS API, and hence just be a normal $var. Most web localizers wouldn't know the difference, but it keeps the language cleaner and avoids taking on responsibilities which can prove to be very difficult, like how to deal with timezones for @hour (locale != timezone), or @screen when localizing on the back-end.

I have lots more to say about the API and separating L20n from the SDKs, but nothing that is strictly relevant to the syntax of L20n, so I'll leave it for now. :)

Cheers
/R

Staś Małolepszy

Jul 20, 2015, 10:13:35 AM7/20/15
to Zibi Braniecki, mozilla-t...@lists.mozilla.org
On Thu, Jul 16, 2015 at 8:21 PM, Zibi Braniecki <
zbigniew....@gmail.com> wrote:

>
>
> {
> x: "foo",
> y: "foo2"
> }
>
> And on top of that we can store additional bit of information that we
> need. Minimal learning curve.
>

I see how this might be minimal learning curve for someone who understands
a bit of programming. For those who haven't had such experience at all,
this is black magic. It's familiar to us and it's easy to confuse this
with thinking that this is *the* minimal learning curve. In fact, I could
argue that S-expressions are the minimal learning curve too!


>
> > PHP is a full-on language which was used to build Facebook. Please tell
> me
> > you don't want to build the next Facebook in L20n ;)
>
> No, and as you know, that was not my point. I'd appreciate if you didn't
> fall back to demagogy.
>

It's not. It's a crucial point. PHP was badly designed without
namespaces. It's bad design because PHP is a general programming
language. It currently has over 5,000 built-in functions. That's
terrible. L20n is a domain-specific language and I hope we'll manage to
keep the number of built-ins below 100 and more like 10, really.


>
> > OK, how about the slash character which I used in my previous email when
> I
> > mentioned imports? Note this is just part of the name, not a new kind of
> > expression.
>
> > (cldr/plural n)
> > (intl/format-date datetime)
>
> Hahaha. I'm ok with slash but exactly as part of expression syntax ;)
>
> I imagine that we could introduce imports this way. "cldr" is
> automatically imported.
>

OK, that's an interesting approach which I like. This is also related to
the sentiment Richard has about globals vs. developer $vars. Maybe we
could get rid of global completely with such imports?


>
> > I think that Lisp is easy to learn because it's so simple. So it's not
> > only on the merits of the simplicity of implementation that I suggest we
> > use it.
>
> I believe that that's your personal subjective experience not shared by
> majority of people working with IT.
>

And what's that belief based on?


> The thing with the syntax is that you can't optimize it for current
> properties scope plus one/two things and then figure out how it works 2
> years from now when we'll add full expression syntax. We won't be able to
> change it. At all.
>

S-expressions give us that flexibility by moving syntax constructs like +
operator to runtime, where + is a function call. The syntax stays the same.


> And I am actually quite confident that if we went with your expression
> syntax that it would be a major reason for people not to adopt L20n.
>

Why are you confident again, here?


> > I don't think I want people to be adding often in L20n.
>
> They probably don't. But they may want to execute logical operations or
> arithmetic operations. And, as I pointed out in the part you skipped, the
> way our civilisation teaches people about them (both in spoken/written
> languages and logic/math) is with "A x B x C", not "x A B C"
>

You say "they probably don't" and then you say "may want to execute
arithmetic operations". What kind of operations are those? Can you think
of specific examples? Realistically, do we need more than some kind of
plural support which can be extended to take into account metadata bools
like "is animate" or "is a person" and some kind of string lookup function
(startsWith, endsWith)?

I think we're overengineering L20n big time. It shows in the discussion
about the dash in identifiers. Is L20n supposed to be a programming
language? Or is it supposed to be a data store format with some expression
syntax? I firmly believe it's the latter. We'd be using YAML if it wasn't
for multiline strings on our side and significant whitespace on YAML's
side. I want the expression syntax to be well thought-out and baked into
the language, but perhaps we're focusing too much on it and in fact we're
ending up designing the whole syntax for an edge-case!

> > CSS very much uses dashes. A lot of dashes everywhere! In ids, class
> > names, property names, variables, you name it. Which in fact is my
> > inspiration!
>
> Does it mean you also want to play funny games with our syntax later on
> like how CSS tries to deal with the subtraction operator in calc vs. IDs with
> dashes?
>

> Sweet little decisions like "The + and - operators must always be
> surrounded by whitespace. The * and / operators do not require whitespace" ?
>

No, and that is why S-expressions are part of my proposal. I fully believe
that if our syntax is not based on S-expressions we should not allow dashes
in names. My point was about being webby. CSS ids feel webby to me and we
don't have them right now.


> And I'm ok with doing the latter to avoid toCamelCase/fromCamelCase
> stories between the language that doesn't have mathematical operators and
> one that does.
>

If we allow dashes in names there won't be need for toCamelCase. We can
fix what DOM got wrong :)


> > The consequences of using S-expressions for our use-case are as follows:
> >
> > - we can use dashes in entity names, like CSS
> > this-is-a-valid-identifier
>
> Not a goal. Would be nice to have, but quite minor value imho. Languages
> that can subtract should not have "-" in IDs.
>

That's not true. All Lisps can subtract :) Arguably, the dash is more
webby: CSS and HTML both support it.


> > - we can use the single colon in references
> > :brandName
>
> I don't believe it's specific to S-expressions.
>

It depends on the character that we use for entity references, but you're
right, it's not specific.


>
> > - every system function is called in the same manner, with (callee arg).
> > With C-like syntax, there's the confusion between referencing and
> calling:
> > @callee vs. @callee(arg).
>
> Antigoal. I much prefer the syntax to easily catch when the user did
> something else than he wanted. That's why we separated entity ID references
> from variables. To catch early that the user wanted to reference an entity
> that doesn't exist rather than saying "well, maybe it's a variable?".
>
> Same with other pieces. I prefer to say "well, you wanted to call and it's
> not callable" than "oh, you want something X with arg Y and we'll try some
> things and maybe one of them will work".
>

Is @os a property or a method? Is @hour a property or a method?
@deviceType? I think we're creating confusion by allowing globals to be
both callable and non-callable. I don't think we'll want to pass things
around by reference, so maybe they should always be callable?


> And I don't see this as a value. "hello-$w|rl:d" IDs are not a huge
> value.
>

Of course not, but using & or | in the future as a special name might be
helpful. C-like syntax forbids a lot of characters, really. Also, naming
bool-returning functions as `person?` instead of `isPerson` is a nice
humanizing aspect that I really like in Lisp and Ruby.

-stas

Staś Małolepszy

Jul 20, 2015, 11:11:15 AM7/20/15
to Richard Olsson, mozilla-t...@lists.mozilla.org
Hey Richard, thanks for taking part in the discussion!

On Sat, Jul 18, 2015 at 2:33 PM, Richard Olsson <r...@richardolsson.se> wrote:

>
> Personally what attracted me to L20n was the syntax, and not the
> integration with the web front-end stack. I think the original syntax works
> great, and although there are some minor tweaks in Stas's proposal that I
> wouldn't mind, I think Zibi has the more valid points on most issues being
> discussed in this thread.
>

The thing that bothers me in the current syntax is how cryptic it is. Take
a look at this relatively simple example:

<unreadEmails[@cldr.plural($emailCount)] {
one: "You have one unread email",
other: "You have {{ $emailCount }} unread emails"
}>

There are all kinds of brackets in there, and three special characters: ., @
and $, and it's not clear what they mean or how to even google them. Part of
this might actually be on purpose: having the dollar sigil in the
placeable might increase the chance of the localizer not translating the
variable name itself (I've seen this happen more than I wish I had :). But
that's just a hypothesis which we haven't verified.

Also, we keep talking about how the localizers will want to reference other
entities in translations, but do we have any other example of this other
than the {{ brandName }}?

I don't want localizers to do the following (which is redundant,
complicated and suffers from capitalization problems):

<_tab {
nominative: "tab",
genitive: "tab's"
}>

<closeTab "Close {{ _tab.nominative }}?">
<eraseTab "Erase all of this {{ _tab.genitive }} history?">

I want them to do the following:

<closeTab "Close tab?">
<eraseTab "Erase all of this tab's history?">

I've written about this before:

http://informationisart.com/20/
http://informationisart.com/19/

And I'm not sure if one of my examples (reproduced below) is what I want
the localizers to do anymore:

<_uniteDeMesure {
B: "o",
KB: "Ko",
MB: "Mo",
GB: "Go",
TB: "To"
}>

<availableSize "Il reste {{ $size }} {{ _uniteDeMesure[$unit] }}">

We shouldn't be using hash values like that. Hash values should be a
collection of variants of the same entity depending on some external
variable, not data stores. I'm now thinking that we should offer a syntax
to dynamically reference entities by computed name and encourage localizers
to add many small private entities to their translation files:

<_unitB "o">
<_unitKB "Ko">
<_unitMB "Mo">
<_unitGB "Go">
<_unitTB "To">
<availableSize "Il reste {{ $size }} {{ @get('_unit' + $unit) }}">
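
A minimal Python sketch of what a lookup by computed name could look like inside a resolver. It assumes entities live in a flat dictionary; `ENTITIES`, `get_entity` and `available_size` are hypothetical names, not part of any L20n API.

```python
# Many small private entities, one per unit, as in the example above.
ENTITIES = {
    "_unitB": "o",
    "_unitKB": "Ko",
    "_unitMB": "Mo",
    "_unitGB": "Go",
    "_unitTB": "To",
}

def get_entity(name):
    # The name is only known at runtime, so a missing entity shows up
    # here as a KeyError rather than as a parse-time error.
    return ENTITIES[name]

def available_size(size, unit):
    # Rough equivalent of {{ @get('_unit' + $unit) }}.
    return "Il reste {} {}".format(size, get_entity("_unit" + unit))
```

One consequence of this design is visible in the sketch: a typo in the unit name cannot be caught when the file is parsed, only when the string is resolved.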

The same is true for expressions and macros. What are the actual
use-cases? We might want to focus on advanced plurals support (including
animacy and personality) and some sort of string manipulation / lookup.
Also an easy way to select a specific branch of code from things like
@screen or @hour (hence my suggestion to add cond and condp, or a switch
statement).


> If the desired syntax is too (technically) complex, making the parser
> heavier than is considered ideal for a browser app, why not pre-compile to
> a format that can be more easily parsed instead of changing the source
> syntax? The mo/po approach is actually one of the few things that I don't
> personally hate about gettext. :)
>

I think this is a valid point and we've been in fact doing this for a while
in Firefox OS already: all resource files are parsed and concatenated on
buildtime into JSONs.
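
As an illustration of the build-time step, here is a small Python sketch that turns the simplest kind of entity, `<id "value">`, into JSON. It is only a toy: the regular expression ignores hashes, attributes and expressions, and the output shape is an assumption, not the format actually used in Firefox OS.

```python
import json
import re

# Matches only the simplest entity form: <identifier "string value">
ENTITY_RE = re.compile(r'<(\w+)\s+"((?:[^"\\]|\\.)*)">')

def precompile(source):
    # Collect id/value pairs and serialize them once, at build time,
    # so the runtime only needs JSON.parse-style loading.
    entries = {name: value for name, value in ENTITY_RE.findall(source)}
    return json.dumps(entries, ensure_ascii=False, sort_keys=True)
```

The runtime then loads the JSON with a stock parser instead of running the L20n parser on every page load.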

One of the criteria for "webbiness" is the ability to do a F5-style
development, and any compile step prevents that. My hope is that we can
have a parser that's fast enough to not be forced to use precompilation.
Maybe this is where my desire to have a lean syntax comes from :).


> This might seem a little bit off-topic, but it's part of a general way of
> thinking of L20n and the JS API as separate, which could actually affect
> some of the syntax questions. One example is the global @screen context,
> relevant in this discussion because of the @ prefix.
>
> I'm gonna go out on a limb here and say that I don't think globals should
> be in the language at all. While responsive localization is a great thing,
> the screen size should be a context provided by the browser JS API, and
> hence just be a normal $var. Most web localizers wouldn't know the
> difference, but it keeps the language cleaner and avoids taking on
> responsibilities which can prove to be very difficult, like how to deal
> with timezones for @hour (locale != timezone), or @screen when localizing
> on the back-end.
>

I like the idea of moving some (or all) globals into the $var space.
However, to address your point about some globals not being available in
certain environments, I'd like to point out that globals have always been
intended as platform- and environment-specific.

-stas

Zibi Braniecki

Jul 20, 2015, 11:59:33 AM
to mozilla-t...@lists.mozilla.org
On Monday, July 20, 2015 at 7:13:35 AM UTC-7, Staś Małolepszy wrote:
> I see how this might be minimal learning curve for someone who understands
> a bit of programming. For those who haven't had such experience at all,
> this is black magic. It's familiar to us and it's easy to confuse this
> with thinking that this is *the* minimal learning curve. In fact, I could
> argue that S-expressions are the minimal learning curve too!

I disagree, I also don't think that it's the right topic (as in, I'm not sure if the answer to this question should be a major component of the decision on what syntax we use).


> > I believe that that's your personal subjective experience not shared by
> > majority of people working with IT.
> >
>
> And what's that belief based on?

Gut feelings, personal experience with software, nothing sharp enough to make definite statements.

> S-expressions give us that flexibility by moving syntax constructs like +
> operator to runtime, where + is a function call. The syntax stays the same.

I think you misunderstood what I wrote above.

>
> > And I am actually quite confident that if we went with your expression
> > syntax that it would be a major reason for people not to adopt L20n.
> >
>
> Why are you confident again, here?

Once again: gut feeling, personal belief, experience with software. Wherever I write about my personal take, which is a matter of preference, I denote that. It's the same strength of argument as the vast majority of the whole thread from your side: personal preference.

> You say "they probably don't" and then you say "may want to execute
> arithmetic operations". What kind of operations are those? Can you think
> of specific examples? Realistically, do we need more than some kind of
> plural support which can be extended to take into account metadata bools
> like "is animate" or "is a person" and some kind of string lookup function
> (startsWith, endsWith)?

I don't know :) I don't know which direction L20n will take as we'll add more features. Will we end up with some sort of responsive localization? With dynamic string length adaptation? With string lookups?

I believe that JS-style expression is a safe bet for what we may want to do in the future.

> I think we're overengineering L20n big time. It shows in the discussion
> about the dash in identifiers. Is L20n supposed to be a programming
> language? Or is it supposed to be a data store format with some expression
> syntax? I firmly believe it's the latter. We'd be using YAML if it wasn't
> for multiline strings on our side and significant whitespace on YAML's
> side. I want the expression syntax to be well thought-out and baked into
> the language, but perhaps we're focusing too much on it and in fact we're
> ending up designing the whole syntax for an edge-case!

I agree that expressions are a minor piece of the L20n format. I don't know if they will be in the future. I believe that JS-style expression syntax allows us to provide the minimum now, and easily extend-while-staying-familiar later.

We'd be using YAML, awesome. And we'd be storing JS-style expression in that YAML.

I just don't agree with you, I guess, with your belief that JS-expression syntax is overcomplicated, overengineered and overall bad for our goals. I believe it's perfect for our goals :)

> No, and that is why S-expressions are part of my proposal. I fully believe
> that if our syntax is not based on S-expressions we should not allow dashes
> in names. My point was about being webby. CSS ids feel webby to me and we
> don't have them right now.

I don't believe that dash-based IDs are at the heart of what we should try to improve in our syntax.

At the same time, I'm ok with improving that. We can introduce semantics similar to what JS has to solve it.

But I don't believe that we should transition away from JS-style to lisp-style to achieve them.

> If we allow dashes in names there won't be need for toCamelCase. We can
> fix what DOM got wrong :)

See one above.

> That's not true. All Lisps can subtract :) Arguably, the dash is more
> webby: CSS and HTML both support it.

And JS supports it too.

> Is @os a property or a method? Is @hour a property or a method?
> @deviceType? I think we're creating confusion by allowing globals to be
> both callable and non-callable. I don't think we'll want to pass things
> around by reference, so maybe they should always be callable?

Yeah, maybe they should. I would be ok with "@hour" being shortcut for "@hour()", but maybe we want to stay explicit.

> Of course not, but using & or | in the future as a special name might be
> helpful. C-like syntax forbids a lot of characters, really. Also, naming
> bool-returning functions as `person?` instead of `isPerson` is a nice
> humanizing aspect that I really like in Lisp and Ruby.

I see it as another risk of unfamiliar syntax that increases the chance that a localizer will try to translate it. Humanizing expression syntax in localization language may be an anti-pattern for me.

Which is maybe also part (just part) of why I'm not that worried about cryptic syntax. I expect localizers to copy&paste a lot as they learn, and the less "en-US"-like the expression looks, the less risk that they'll try to translate it. "isPerson($name)" sounds way less risky than "person? name".


zb.

Axel Hecht

Jul 20, 2015, 5:18:33 PM
to mozilla-t...@lists.mozilla.org
Gonna reduce this to a response without trying to do context:


S-Expressions:

There's two things about S-Expressions:

* Operators require whitespace separation
This is basically that (- foo bar) and (-foo bar) are two different things.

I think that's a fine conversation to have, and I see how enabling
operator chars inside IDs can be a pro.

* Operator-first notation
This one, sorry, is just awkward. No kid learns math this way. 1 + 3 is
elementary school, and I might go into details of how I suffer from
people not getting this right, but seriously, this is something we
should build on, and not break.
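
The whitespace point from the first bullet can be shown with a toy tokenizer. This is a Python sketch of the usual Lisp-style approach (split on parentheses and whitespace), not the tokenizer from the proposal.

```python
def tokenize(source):
    # Parentheses are the only delimiters besides whitespace, so "-"
    # is an ordinary identifier character.
    return source.replace("(", " ( ").replace(")", " ) ").split()

# "-" stands alone here, so it is read as the operator name:
#   tokenize("(- foo bar)")  gives  ['(', '-', 'foo', 'bar', ')']
# Without the space, "-foo" is a single identifier:
#   tokenize("(-foo bar)")   gives  ['(', '-foo', 'bar', ')']
```

The same rule is what would allow dashes inside entity names: `close-tab` is one token because nothing splits on `-`.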

Conditionals:

I think there's a valid point whether the ternary is the best way to
express conditionals.

foo ? one : two
vs
if(condition, if_true), if(condition, if_true, else_value)


Semantical structures:

This is one of the basic design principles I had way back when.

There's no semantics to hashes.

Semantics depend on the language and the grammar of that language. They
also to some extent depend on the use case in the app, like, refactoring
strings into app and toolkit.

The basic gist of hashes is that, in a software context, the number of
possibilities is finite, and thus you can always create a hash mapping
to get things right.

There's an explicit denial of semantics in that approach.


eval(), in a sense:

{{ @get('_unit' + $unit) }}

vs

{{ _uniteDeMesure[$unit] }}

From my Math point-of-view, there are two cases:

Linear functions, which are implemented through matrix multiplications
(internal entity with an index), and
non-linear functions, which are macros.

There's a tad of anti-Perl here, in that, if I can rule out an
alternative way of implementing stuff, I'd rule it out.

Note, from a POV of "let's have a string per thing", requesting the same
entity for B, kB, mB etc, is already a premature optimization. But it's
also OK in practice, so I don't think we should spend too much effort to
create a white theory for it.

That's what I remember to put into this so far.

Axel

Richard Olsson

Jul 21, 2015, 5:32:02 AM
to mozilla-t...@lists.mozilla.org
On Monday, 20 July 2015 17:11:15 UTC+2, Staś Małolepszy wrote:
> Hey Richard, thanks for taking part in the discussion!

Thank you! I feel very welcome. :)

> The thing that bothers me in the current syntax is how cryptic it is. Take
> a look at this relatively simple example:
>
> <unreadEmails[@cldr.plural($emailCount)] {
> one: "You have one unread email",
> other: "You have {{ $emailCount }} unread emails"
> }>

I really don't think it's that bad. Maybe the @cldr part is hard to understand, and the {} brackets might be redundant (since the entity name can't contain whitespace and the index always ends with ], so it could still be parsed without the {}).

And I definitely like the <> entity enclosure over more {} brackets.

> I've written about this before:
>
> http://informationisart.com/20/
> http://informationisart.com/19/

I think you're right about this, but I'm unsure how this would affect syntax. I mean, you still need variable expansion/placeables and there are some use cases for entity references, so it would still need to be in the language, don't you think?

> <availableSize "Il reste {{ $size }} {{ @get('_unit' + $unit) }}">

I'm not a big fan of this to be honest. I prefer dereferencing a hash/object over this eval-type construct.

> The same is true for expressions and macros. What are the actual
> use-cases?

I think there are many use cases, with plurals being the most prominent one. I think we should leave it up to the localization engineer to decide whether they want to use macros/expressions. I can imagine different time formats and things like that as well (where you want it to say "just now", "an hour ago", "several hours ago", "yesterday", "May 11th" etc. depending on time passed).

> We might want to focus on advanced plurals support (including
> animacy and personality) and some sort of string manipulation / lookup.
> Also an easy way to select a specific branch of code from things like
> @screen or @hour (hence my suggestion to add cond and condp, or a switch
> statement).


> I think this is a valid point and we've been in fact doing this for a while
> in Firefox OS already: all resource files are parsed and concatenated on
> buildtime into JSONs.

I'm not a big fan of the (somewhat verbose) JSON AST syntax. I like compact file formats and have designed a few in my days. So if we wanted something more compact, I could take a look.

> One of the criteria for "webbiness" is the ability to do a F5-style
> development, and any compile step prevents that.

I agree, and what a lot of libraries do nowadays is provide a runtime parser for development purposes, and then a compiler for deployment/production. One example among many being React.js.

Personally I rarely use the runtime parsers because I set up a build environment with fast, automatic build-on-save, but it still makes sense and could be one option. The runtime parsers are often viable options for production as well, although not recommended.

> I like the idea of moving some (or all) globals into the $var space.
> However, to address your point about some globals not being available in
> certain environments, I'd like to point out that globals have always been
> intended as platform- and environment-specific.

I understand that was always the intention, and that's what made me wonder whether they need to be their own thing at all (since they're not really global in any other sense than the $vars are), instead of using the already existing context variable space.

Staś Małolepszy

Jul 21, 2015, 9:55:16 AM
to Axel Hecht, mozilla-t...@lists.mozilla.org
Thanks for sharing your thoughts, Axel!

On Mon, Jul 20, 2015 at 11:19 PM, Axel Hecht <l1...@mozilla.com> wrote:

>
> * Operators require whitespace separation
> This is basically that (- foo bar) and (-foo bar) are two different things.
>
> I think that's a fine conversation to have, and I see how enabling
> operator chars inside IDs can be a pro.
>

The point about required whitespace is a good one. It's worth noting that
our current syntax also requires whitespace in certain places, e.g. before
an attribute's id.


> * Operator-first notation
> This one, sorry, is just awkward. No kid learns math this way. 1 + 3 is
> elementary school, and I might go into details of how I suffer from people
> not getting this right, but seriously, this is something we should build
> on, and not break.
>

I'm not going to argue with that :) Would it change anything if in my
examples I called the adding function `add` and not `+`? Compare: (add 1
3) vs. (+ 1 3). That's arguably even more humanized than 1 + 3.
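
To make the comparison concrete, here is a small Python sketch of an S-expression evaluator in which `+` and `add` are just two names for the same runtime function. The `FUNCTIONS` table and the helper names are illustrative assumptions, not the proposed L20n runtime.

```python
import operator

# Operators are ordinary entries in a function table, looked up at runtime.
FUNCTIONS = {"+": operator.add, "add": operator.add, "-": operator.sub}

def parse(tokens):
    # Build a nested list from a flat token list.
    token = tokens.pop(0)
    if token == "(":
        expr = []
        while tokens[0] != ")":
            expr.append(parse(tokens))
        tokens.pop(0)  # drop the closing ")"
        return expr
    try:
        return int(token)
    except ValueError:
        return token  # a symbol, kept as a string

def evaluate(expr):
    if isinstance(expr, list):
        func = FUNCTIONS[expr[0]]  # the head names the function
        args = [evaluate(arg) for arg in expr[1:]]
        return func(*args)
    return expr

def run(source):
    tokens = source.replace("(", " ( ").replace(")", " ) ").split()
    return evaluate(parse(tokens))
```

Here `run("(add 1 3)")` and `run("(+ 1 3)")` go through exactly the same code path; adding a new operator means adding a table entry, with no change to the grammar.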

S-expressions aside, there's something else that I'd like to question: the
usefulness of "1 + 3" in localization. Languages are really hard to get
right if you apply maths and logic to them. Something that I really loved
when I first saw it is Rust's pattern matching with guards.
https://doc.rust-lang.org/book/patterns.html It looks like something that
would be very useful for localization, perhaps more so than adding two
numbers.

While I'd like to allow the following to be possible (using the old syntax
and skipping plurals):

<lastVisit "Your last visit was {{ @Math.roundUp($days / 7) }} weeks ago.">

... I don't want to encourage localizers to write such code. First of all,
if the developer passes $days, you should probably use days. OTOH, if some
language uses weeks more often than days, then I think I'd like the
localization community to create a common set of macros for this language
so that the localizers can do the following instead:

import('common');
<lastVisit "Your last visit was {{ _toWeeks($days) }} weeks ago.">

The implementation can be thus hidden from the localizer while still
keeping them in control of what's happening. The more I think about it,
the more I'm convinced that possibly the only expression type that I'd like
to be in a wide-spread use is the call expression: toWeeks(),
prettyDate(), plural() etc. Which also works really well in S-expression,
btw.
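
A Python sketch of what such a shared macro might hide from the localizer. `to_weeks` mirrors the hypothetical `_toWeeks` above and simply rounds up, like the `@Math.roundUp($days / 7)` expression; plurals are skipped here too.

```python
import math

def to_weeks(days):
    # Same arithmetic as @Math.roundUp($days / 7), kept out of the
    # translation file and maintained once per language.
    return math.ceil(days / 7)

def last_visit(days):
    # What the localizer's string resolves to; only the call is visible
    # in the translation.
    return "Your last visit was {} weeks ago.".format(to_weeks(days))
```

For example, 15 days rounds up to 3 weeks, while exactly 14 days gives 2.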

Perhaps the gist of my thinking is this: we've designed the expression
syntax such that it's too powerful for non-programmer-localizers (whom we
want to encourage to only use the call expression) and not convenient
enough for the programmer-localizers: the limitation of a single
expression per body is troubling and our ternary-if syntax is cumbersome.
I posit that we have two target audiences with different needs. If the
expression syntax is going to be used by the few programmer-localizers, I
question the decision to implement a complex syntax which is difficult to
parse and has complex precedence rules. My proposal to use S-expressions
was an attempt to work around this: make the expression syntax very
general and lean, so that it's still possible to encode complex logic, but
at the same time don't invest time and resources into maintaining it.



> Semantical structures:
>
> This is one of the basic design principles I had way back when.
>
> There's no semantics to hashes.
>

I like the simplicity of this approach :) I think the need for semantics
arose when we started thinking about tools like compare-locales and the
social contract between the developer and the localizer. We wanted to
understand if hash members were or were not part of the social contract.
Do you have more thoughts on this?

Related to this is my blog post on the asymmetry of L20n:
http://informationisart.com/21/



> There's a tad of anti-Perl here, in that, if I can rule out an alternative
> way of implementing stuff, I'd rule it out.
>

That's a really good rule to keep in mind in general, I like it.


> Note, from a POV of "let's have a string per thing", requesting the same
> entity for B, kB, mB etc, is already a premature optimization. But it's
> also OK in practice, so I don't think we should spend too much effort to
> create a white theory for it.
>

Hmm, I'm not sure I agree here. You're right that it's a bit of an
optimization and that the developer could also choose the unit on their
side and then use one of the predefined translations corresponding to that
unit (did I understand you right?). But OTOH, I feel like it would be
better if the developer passed a raw value into the translation which the
localizer can take and transform at will. For instance, suppose there's a
language which increases the unit denomination on an off-by-one order of
magnitude cadence, so: 3B, 30B, 300B, 3000B, 30KB, 300KB, 3000KB, 30MB,
300MB, 3000MB, 30GB etc. This is possible if we leave the choice of the
unit to the localizer. The same applies to dates.
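
A Python sketch of that hypothetical cadence, assuming decimal (1000-based) units and that the developer passes a raw byte count. The `format_size` name and the `threshold` parameter are invented for the example.

```python
UNITS = ["B", "KB", "MB", "GB", "TB"]

def format_size(size_in_bytes, threshold=10000):
    # The imagined language keeps the smaller unit one order of magnitude
    # longer: it only moves up once the value reaches 10000, not 1000.
    value = size_in_bytes
    index = 0
    while value >= threshold and index < len(UNITS) - 1:
        value //= 1000
        index += 1
    return "{}{}".format(value, UNITS[index])
```

With the default threshold, 3000 bytes stays "3000B" and 30000 bytes becomes "30KB"; passing `threshold=1000` gives the more familiar "3KB". The point is that the threshold lives on the localizer's side, so the developer's code does not change.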


-stas

Staś Małolepszy

Jul 21, 2015, 10:17:50 AM
to Richard Olsson, mozilla-t...@lists.mozilla.org
On Tue, Jul 21, 2015 at 11:31 AM, Richard Olsson <r...@richardolsson.se> wrote:

>
>
> I think you're right about this, but I'm unsure how this would affect
> syntax. I mean, you still need variable expansion/placeables and there are
> some use cases for entity references, so it would still need to be in the
> language, don't you think?
>

Yes, naturally. But perhaps if we agree that entity references should not
be common, we could make their syntax "harder" ($entity) and make variable
syntax "easier" (var).


> I'm not a big fan of the (somewhat verbose) JSON AST syntax. I like
> compact file formats and have designed a few in my days. So if we wanted
> something more compact, I could take a look.
>

That would be great!

Also, have you seen the recent change in the JSON syntax? We simplified it
a bit in https://github.com/l20n/l20n.js/pull/49.

$ echo "<hello \"Hello, world\">" | ./tools/parse.js -o entries
hello: Hello, world



> > One of the criteria for "webbiness" is the ability to do a F5-style
> > development, and any compile step prevents that.
>
> I agree, and what a lot of libraries do nowadays is provide a runtime
> parser for development purposes, and then a compiler for
> deployment/production. One example among many being React.js.
>
> Personally I rarely use the runtime parsers because I set up a build
> environment with fast, automatic build-on-save, but it still makes sense
> and could be one option. The runtime parsers are often viable options for
> production as well, although not recommended.
>

That's an interesting approach. Zibi, Axel, what do you guys think?


>
>
> I understand that was always the intention, and that's what made me wonder
> whether they need to be their own thing at all (since they're not really
> global in any other sense than the $vars are), instead of using the already
> existing context variable space.
>

Yup, I think we're on the same page here.

Thanks!
-stas