formatters on sections

Andy Chu

Jun 19, 2009, 3:45:06 AM

to JSON Template

I was talking with a co-worker today at lunch, and found this nice use
case. This is extremely annoying for our tech-writers:

The problem is escaping <>& in code, especially when having to go back
and edit it. It's very error prone. A very nice use of formatters
is:

{.section snippet|html}
if (n < 3 && m > 3) {
  var s = {s|js-string};  # I can also format a JavaScript string here, and THEN escape it as HTML too!
}
{.end}
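
As a rough illustration of that chaining, here is a minimal Python sketch. The names js_string and expand_snippet are hypothetical, and using json.dumps to produce the JavaScript string literal is an assumption made for brevity; none of this is existing JSON Template code.

```python
import html
import json


def js_string(value):
    # Assumption: a JSON string literal is close enough to a JavaScript
    # string literal for the purposes of this sketch.
    return json.dumps(value)


def expand_snippet(s):
    # Step 1: expand the body, formatting the substitution as a JS string.
    body = "if (n < 3 && m > 3) {\n  var s = %s;\n}" % js_string(s)
    # Step 2: the section-level formatter escapes the whole expanded body.
    return html.escape(body)


if __name__ == "__main__":
    print(expand_snippet('a < b & "quoted"'))
```

The point is the ordering: the inner formatter runs during expansion, and the section formatter only ever sees the finished string.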

Here is another very nice use from my documentation:

{.section template|json-template-code} # syntax highlight as JSON Template
{.end}

{.section data-dictionary|json} # syntax highlight JSON dict
{.end}

{.section result|html} # syntax highlight the expanded result as HTML
{.end}

I use all of these in the same doc. If you look at the scheme I use to
generate my docs, it's actually a bit awkward now:

http://code.google.com/p/json-template/source/browse/#svn/trunk/doc

Generate a graph image inside an HTML doc:

[.section graph|dot-format]
digraph {
...
[variables] [here] [too] # expansion occurs before dot-format
}
[.end]

The formatter could render the .dot file as an image graph, and then
return an <img src="rendered.png">.
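
A hedged Python sketch of what such a formatter might look like. The function dot_format, the render callback and the cache set are all hypothetical; a real render callback would have to invoke Graphviz, which is left out here so the example stays self-contained.

```python
import hashlib
import html


def dot_format(dot_source, render, cache):
    """Return an <img> tag for the rendered graph.

    render(dot_source, filename) is a hypothetical callback that would
    write the image; cache is a set of filenames already rendered.
    """
    # Name the file after the content so identical graphs are rendered once.
    digest = hashlib.sha1(dot_source.encode("utf-8")).hexdigest()
    filename = "graph-%s.png" % digest[:12]
    if filename not in cache:
        render(dot_source, filename)
        cache.add(filename)
    return '<img src="%s">' % html.escape(filename)
```

Keying the filename on the expanded source is one way to get the server-side caching mentioned in this message; other schemes would work as well.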

I have seen people do this type of thing with Javascript (including
google-code-prettify, which is used on code.google.com). However,
it's a lot faster to render if it's done on the server side (and
cacheable, etc.).

Currently, you could do this with a static JSON file, but this is annoying:

{"data-dictionary": "{\"name\": \"value\"}",
"template": "...",
"result": "<a href=\"...\">"
}
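
To see where the backslashes come from, here is a small Python sketch (the variable names are illustrative only). Embedding one JSON document as a string value inside another forces every inner quote to be escaped.

```python
import json

# The inner document is serialized first, producing a plain string.
inner = json.dumps({"name": "value"})

# Storing that string as a value in the outer document escapes its quotes.
outer = json.dumps({"data-dictionary": inner})

if __name__ == "__main__":
    print(inner)
    print(outer)
```

Writing the outer form by hand is the annoying part; a formatter on a section would let the inner document stay unescaped in the template source.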

So I think this feature is quite simple to implement, general, and
consistent with the rest of JSON Template. Also, I don't think other
template engines solve this problem. This would be a really nice
feature which would make JSON Template stand out in the (large) crowd.
People generally evaluate technologies by features, and unfortunately
minimalism is not considered a feature. Especially if there was an
add-on library of all these formatters, I could see it becoming quite
popular.

Andy

Andy Chu

Jun 19, 2009, 11:39:52 PM

to JSON Template

Here's a single model, where both formatters and predicates work on
sections and substitutions. This model is nice because it can do many
things:

1) Prove that this is a specification error.
http://code.google.com/p/json-template/issues/detail?id=21 . Right
now substituting null's raises an UndefinedVariable, which isn't
right.

2) Explain why sections can have formatters (they can't now, only
substitutions can)

3) Explain why predicates can be applied to *both* sections and
substitutions, and why formatters can't be overloaded as predicates.

4) Explain why formatters can be chained but predicates can't, e.g.
{foo|formatter1|formatter2}

Definitions:

A JSON value is as the standard says: null, true, false, number,
string, list, object.

A string is a special case of a JSON value.

A substitution identifies a single name that may or may not be
expanded into the output.

A directive is a literal, substitution, or a section.

A section identifies a sequence of directives that may or may not be
expanded into the output.

The *value* of a substitution is the value from the data dictionary
with the corresponding name. Any dictionary node is a JSON value,
hence the substitution's value is a JSON value.

The *value* of a section is the value of the sequence of directives it
contains, concatenated. This definition is recursive since sections
can contain sections. The value of each directive must be a string,
since only strings can be concatenated.

Formatters are functions from an *arbitrary* JSON value -> JSON value
(null is included in both domain and range!). This implies they can
be chained. And it also implies that they may modify any other value.
Any output can be replaced by applying the formatter to the output.
(notice I make NO specific reference to either sections or
substitutions, it's consistent!)

(The default formatter is "str", which is the identity function on strings)

Predicates are functions from arbitrary JSON values -> booleans
(which are also a special case of JSON values). Predicates *decide*
whether a value should be shown. (I also don't need to make any
reference to sections or substitutions here).

(The default predicate is like bool() in Python, which maps null,
false, "", [], {} --> false, and everything else to true. This is how
sections and repeated sections currently decide to expand the {.or}
section)
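
A minimal Python sketch of these two definitions, under the assumption that JSON values are represented as ordinary Python objects. The helper names apply_formatters and default_predicate are hypothetical.

```python
import html
from functools import reduce


def apply_formatters(value, formatters):
    # Formatters map a JSON value to a JSON value, so a chain such as
    # {foo|formatter1|formatter2} is plain left-to-right composition.
    return reduce(lambda current, fmt: fmt(current), formatters, value)


def default_predicate(value):
    # Mirrors Python's bool(): None, False, "", [] and {} are false.
    # Note that bool() also treats 0 as false, which the text does not list.
    return bool(value)
```

Because predicates return a boolean rather than a value to display, there is nothing useful to feed into a second stage, which is one way to read why they do not chain.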

So then we have 2x2=4 combinations:

{name|html}

{.section program|html}
if (n < 3 && m > 4) ...
{.end}

{num|plural?} -- shown iff Plural(num) is true

{.section group|plural?}
There are {num} people in group {name}
{.or}
There is one person in group {name}
{.end}

So they are orthogonal, and this implementation will be very clean,
without special cases. Since sections can appear inside sections,
conceptually there will be a stack of formatters. But all the
implementations are recursive interpreters, so this stack can
just be the host language stack.
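
A sketch of how that could look in a recursive interpreter, in Python. The tuple-based node format and the expand function are hypothetical, and the sketch simplifies name lookup to the innermost context only (no stack walk).

```python
import html


def expand(node, context, formatters):
    """Return the string value of a directive."""
    kind = node[0]
    if kind == "literal":
        return node[1]
    if kind == "subst":
        _, name, fmt = node
        return formatters[fmt](context[name])
    if kind == "section":
        _, name, fmt, body = node
        inner = context[name]
        # The value of a section is the concatenation of its directives.
        text = "".join(expand(child, inner, formatters) for child in body)
        # The section formatter runs as the recursive call returns, so the
        # "stack of formatters" is just the host language call stack.
        return formatters[fmt](text)
    raise ValueError("unknown directive: %r" % (kind,))
```

Nested sections need no extra bookkeeping here: each level formats its own concatenated result before handing it to its parent.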

I thought of this this morning, so there could be holes, but it looks
very sound to me. It cleans up some specification errors, and there's
an algebra of values and formatters. And it solves practical problems
too!

Andy

Russ Cox

Jun 20, 2009, 2:02:49 AM

to json-t...@googlegroups.com

Here's a single model, where both formatters and predicates are the
same operation: filtering. This model is nice because it has only
one definition of expression and assigns only one meaning to |.

Definitions:

A JSON value is as the standard says: null, true, false, number,
string, list, object.

A string is a special case of a JSON value.

A substitution identifies a single value that may or may not be
expanded into the output.

A directive is a literal, substitution, or a section.

A section identifies a sequence of directives that may or may not be
expanded into the output.

Filters are functions from an *arbitrary* JSON value -> JSON value

(null is included in both domain and range!). This implies they can
be chained. And it also implies that they may modify any other value.

An expression has the form name ("|" name)*.
To evaluate the expression, the leading name is looked up
in the value stack (same as current JSON template) to produce
an initial value. Then each .name is applied, replacing the value
with the result of looking that name up in value, which must
be a JSON map. Then each |name is applied, replacing
the value with the result of applying the named filter to the value.
The expression a|b|c|d would be written d(c(b(a))) in many
programming languages. The special leading name "@" means
the top element on the value stack.
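
The evaluation rule above can be sketched in Python as follows. The functions lookup and evaluate, and the representation of the value stack as a list of Python objects, are assumptions made for illustration.

```python
def lookup(path, stack):
    # Resolve the leading name by walking the value stack from the top,
    # then follow any .name components through nested maps.
    first, *rest = path.split(".")
    if first == "@":
        value = stack[-1]
    else:
        for frame in reversed(stack):
            if isinstance(frame, dict) and first in frame:
                value = frame[first]
                break
        else:
            raise KeyError(first)
    for key in rest:
        value = value[key]
    return value


def evaluate(expr, stack, filters):
    # a|b|c|d is evaluated as d(c(b(a))).
    head, *filter_names = expr.split("|")
    value = lookup(head, stack)
    for name in filter_names:
        value = filters[name](value)
    return value
```

The same evaluate function serves substitutions and sections alike, which is the single-meaning property this model is after.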

A substitution has the form "{" expr "}". In the output, it is
replaced by the result of evaluating expr.

A section has the form
"{.section" expr "}" true-body ("{.or}" false-body)? "{.end}"
In the output, it is replaced by the result of the following procedure.
First, expr is evaluated to produce a value v.
If v is not false or empty, v is pushed on the value stack, true-body
is executed, and v is popped off. Otherwise, false-body is executed
if present.

A repeated section has the form
"{.repeated section" expr "}" body ("{.or}" false-body)? "{.end}"
[Yes there's also an alternates-with but it doesn't matter here.]
In the output, it is replaced by the result of the following procedure.
First, expr is evaluated to produce a value v, which must be an array.
For each element of v, that element is pushed on the stack, body is
executed, and the element is popped off. If v is empty, false-body
is executed if present.
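
Both procedures can be sketched in a few lines of Python. The names run_section and run_repeated are hypothetical, bodies are modelled as callables that receive the value stack, and is_empty is one reading of "false or empty".

```python
def is_empty(value):
    # One reading of "false or empty": null, false, "", [] and {}.
    return (value is None or value is False
            or value == "" or value == [] or value == {})


def run_section(value, stack, true_body, false_body=None):
    if not is_empty(value):
        stack.append(value)
        try:
            return true_body(stack)
        finally:
            stack.pop()
    return false_body(stack) if false_body else ""


def run_repeated(value, stack, body, false_body=None):
    if not isinstance(value, list):
        raise TypeError("a repeated section needs an array")
    if not value:
        return false_body(stack) if false_body else ""
    pieces = []
    for element in value:
        stack.append(element)
        try:
            pieces.append(body(stack))
        finally:
            stack.pop()
    return "".join(pieces)
```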

As a result of this definition, one style of filters is as "predicates"
which return either their input or false. Using them can enable or
disable sections appropriately:

{.section num|>1}
There are {num|english} people in group {name}
{.or}
{.section num|==1}
There is one person in group {name}
{.or}
There are no people in group {name}. How sad.
{.end}
{.end}
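
A Python sketch of the predicate-as-filter idiom used in that template. The FILTERS table and the describe function are hypothetical, and the {num|english} formatter is left out.

```python
# Each "predicate" is an ordinary filter: it returns its input when the
# condition holds and False otherwise.
FILTERS = {
    ">1": lambda value: value if value > 1 else False,
    "==1": lambda value: value if value == 1 else False,
}


def describe(num, name):
    # Nested sections: the inner one lives in the {.or} branch of the outer.
    if FILTERS[">1"](num) is not False:
        return "There are %d people in group %s" % (num, name)
    if FILTERS["==1"](num) is not False:
        return "There is one person in group %s" % name
    return "There are no people in group %s. How sad." % name
```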

It is also possible to use filters to transform the data before iterating
over it, for example to select only the public fields from a data structure
definition:

{.repeated section fields|public?}
Field {name} has type {type}.
{.end}

Or to order an array in a certain way:

{.repeated section people|sort-by-phone-number}
{name} {phone-number}
{.end}

Or perhaps to create a potentially large data structure on the fly:

{.section x|primes-up-to-10000}
{@} is prime.
{.end}
Again, the primes are: {x|primes-up-to-10000}.
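
For concreteness, such a filter might be written like this in Python. The function names are hypothetical; the filter ignores its input and returns a freshly built list, using a standard sieve of Eratosthenes.

```python
def primes_up_to(limit):
    # Sieve of Eratosthenes.
    if limit < 2:
        return []
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for n in range(2, int(limit ** 0.5) + 1):
        if is_prime[n]:
            for multiple in range(n * n, limit + 1, n):
                is_prime[multiple] = False
    return [n for n, flag in enumerate(is_prime) if flag]


def primes_up_to_10000(_value):
    # A filter in the sense above: JSON value in, JSON value out.
    return primes_up_to(10000)
```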

In this implementation, evaluation of expressions has just
one behavior: a|b|c|d means take a, pipe it through b, c, and d,
and use the result as the value of the expression, no matter
what the context.

Neither sections nor substitutions have predicates or formatters.
Data has filters. Instead of 4 cases, there is 1.
Predicates are an idiom, not a built-in concept.

Russ

Andy Chu

Jun 20, 2009, 3:48:11 PM

to json-t...@googlegroups.com

OK interesting. I see what you mean more clearly with the single
definition of expressions. Either a simple name or a
name|formatter|... chain is an expression, and you just keep the
existing rule which determines whether to show a section.

But there's a problem as you've defined it: sections can't contain
.name lookups now. Even if they could, the lookup is "direct", not
the type that walks the stack. The fundamental problem is that for a
section, you're looking for context to test for truth and push on the
stack, while in a substitution, you're looking for a value to put in
the output. These just can't be made the same.

> An expression has the form name ("|" name)*.
> To evaluate the expression, the leading name is looked up
> in the value stack (same as current JSON template) to produce
> an initial value. Then each .name is applied, replacing the value
> with the result of looking that name up in value, which must
> be a JSON map. Then each |name is applied, replacing
> the value with the result of applying the named filter to the value.
> The expression a|b|c|d would be written d(c(b(a))) in many
> programming languages. The special leading name "@" means
> the top element on the value stack.

As mentioned, we would need 2 types of expressions here. In my
formulation, there are separate but similar syntaxes for substitutions
and sections, and then the formatters apply to *values*. You haven't
defined the value of a section here -- only the value of an
expression, which may appear inside a section directive.

So it looks like you're punting on the formatters on (expanded value
of) sections. But I think this is just too useful to pass up.

> A section has the form
> "{.section" expr "}" true-body ("{.or}" false-body)? "{.end}"
> In the output, it is replaced by the result of the following procedure.
> First, expr is evaluated to produce a value v.
> If v is not false or empty, v is pushed on the value stack, true-body
> is executed, and v is popped off. Otherwise, false-body is executed
> if present.

The asymmetry here will lead to awkwardness in practice. Say you have
{"num": 3}, then the most compact thing to do is:

{.section num|plural?}
There are {@} people here.
{.or}
There is {num} person here.
{.end}

Of course you can use {num} in the first clause because of the stack
walk, but this subtlety is confusing.

In my formulation, we retain the original rule that the "num" context
is pushed on the stack if it's non-empty. The predicate determines
which one to execute. These decisions would be orthogonal -- in yours
they're tied together because you discard the original value in favor
of the filtered one.

Also, the definition for plural? here is:

def Plural(num):
    if num > 1:
        return num
    else:
        return None

I'd rather write:

def Plural(num):
    return num > 1
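
A small Python sketch of why the two definitions are not interchangeable under the filtering model. The helper value_seen_by_true_body is hypothetical and models only the "push the filtered value" step of that model.

```python
def plural_filter(num):
    # Filter style: return the input, or None to disable the section.
    return num if num > 1 else None


def plural_predicate(num):
    # Boolean style: just answer the question.
    return num > 1


def value_seen_by_true_body(filter_fn, num):
    # Under the filtering model the filtered value is what gets pushed,
    # so it is what {@} would expand to. None means the {.or} clause runs.
    result = filter_fn(num)
    return result if result else None
```

With the boolean version, {@} inside the section would expand to a boolean rather than the number, which is why the filtering model needs the longer definition.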

> A repeated section has the form
> "{.repeated section" expr "}" body ("{.or}" false-body)? "{.end}"
> [Yes there's also an alternates-with but it doesn't matter here.]
> In the output, it is replaced by the result of the following procedure.
> First, expr is evaluated to produce a value v, which must be an array.
> For each element of v, that element is pushed on the stack, body is
> executed, and the element is popped off. If v is empty, false-body
> is executed if present.
>
> As a result of this definition, one style of filters is as "predicates"
> which return either their input or false. Using them can enable or
> disable sections appropriately:
>
> {.section num|>1}
> There are {num|english} people in group {name}
> {.or}
> {.section num|==1}
> There is one person in group {name}
> {.or}
> There are no people in group {name}. How sad.
> {.end}
> {.end}

This brings up something I've been thinking about -- chaining of
conditions (e.g. elif, elsif). I would rather have a flat structure
than a nested one.

> It is also possible to use filters to transform the data before iterating
> over it, for example to select only the public fields from a data structure
> definition:
>
> {.repeated section fields|public?}
> Field {name} has type {type}.
> {.end}

Right. In either formulation, I would write that like this, to avoid
using a loop in the formatter:

{.repeated section fields}
{.section @|public?} {# Repeated sections move the cursor each time}

Field {name} has type {type}.
{.end}

{.end}

public? operates on an element rather than a list of elements.
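
The difference between the two styles, sketched in Python. The "public" key on each field and all three function names are assumptions made for illustration.

```python
def is_public(field):
    # Element-level predicate: looks at one field at a time.
    return bool(field.get("public"))


def public_fields(fields):
    # List-level filter, as in {.repeated section fields|public?}:
    # the loop lives inside the filter.
    return [field for field in fields if is_public(field)]


def render_fields(fields):
    # Per-element style: the repeated section owns the loop and the
    # predicate only decides whether each element is shown.
    lines = []
    for field in fields:
        if is_public(field):
            lines.append("Field %s has type %s." % (field["name"], field["type"]))
    return "\n".join(lines)
```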

> Or to order an array in a certain way:
>
> {.repeated section people|sort-by-phone-number}
> {name} {phone-number}
> {.end}

Unrelated to this discussion -- in either formulation, you can have

{.repeated section people|sort phone-number}
{name} {phone-number}
{.end}

{.repeated section people|sort name}
{name} {phone-number}
{.end}

Where sort is a single formatter that takes a string argument. This
works like the template-file ("include" feature) if you haven't seen
it.
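
A sketch of a formatter that takes an argument, in Python. The FORMATTERS registry, sort_by and apply_formatter are hypothetical names, and splitting on the first space is an assumed convention for separating the formatter name from its argument.

```python
def sort_by(items, key_name):
    # Return a new list ordered by the named field of each element.
    return sorted(items, key=lambda item: item[key_name])


FORMATTERS = {"sort": sort_by}


def apply_formatter(spec, value):
    # "sort phone-number" -> formatter "sort" with argument "phone-number".
    name, _, argument = spec.partition(" ")
    return FORMATTERS[name](value, argument)
```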

----

About overloading | operator -- note that I tried to use ? as an
operator instead of |. However, this just looks ugly:

{.section ? plural}
{.end}

{var?plural}

The ? reads better after the predicate name. As an infix operator it
also looks too much like a ternary operator, but with the position of the
predicate swapped.

I'm open to suggestions on syntax that distinguishes predicates from
formatters. Formatters can be chained, so | is appropriate, but
predicates so far can't, although it's conceivable to have multiple
predicates.

> Or perhaps to create a potentially large data structure on the fly:
>
> {.section x|primes-up-to-10000}
> {@} is prime.
> {.end}
> Again, the primes are: {x|primes-up-to-10000}.
>
> In this implementation, evaluation of expressions has just
> one behavior: a|b|c|d means take a, pipe it through b, c, and d,
> and use the result as the value of the expression, no matter
> what the context.
>
> Neither sections nor substitutions have predicates or formatters.
> Data has filters. Instead of 4 cases, there is 1.
> Predicates are an idiom, not a built-in concept.

So, as mentioned, you do have 2 cases for expressions. And my
formulation has 2 as well -- this is because I've introduced the
concept of "values" of sections and substitutions, and defined
predicates/formatters on values, not a 2x2 matrix.

In the end I think that the practical matter of formatting sections is
vital. This was asked for by users, and it will fix nastiness I have
in my doc/makedocs.py stuff. There will be a bunch of silly Python
code eliminated in favor of simple, declarative templates.

It's very interesting how there can be two consistent but different
interpretations of the same language. I'm glad that coherent models
can be made.

thanks,
Andy

Andy Chu

Jun 20, 2009, 3:56:41 PM

to json-t...@googlegroups.com

To be very succinct about it:

- You're trying to make sections and substitutions the same by
defining common expressions, which include formatters. But sections
and substitutions are not the same.

- However, the *values* of both sections and substitutions *are* the
same in my model (strings), and thus formatters and predicates can
apply to both.

thanks,
Andy

Andy Chu

Jun 20, 2009, 4:20:49 PM

to json-t...@googlegroups.com

Another possibility is to define ? as a unary suffix operator, which
means "use my argument to determine whether to show this value". I'll
have to think about this a bit.

Andy

Russ Cox

Jun 22, 2009, 2:20:48 PM

to json-t...@googlegroups.com

> To be very succinct about it:
>
> - You're trying to make sections and substitutions the same by
> defining common expressions, which include formatters. But sections
> and substitutions are not the same.

I am trying to make the *argument* to sections and substitutions the same.
That is different from making the concepts the same.

> - However, the *values* of both sections and substitutions *are* the
> same in my model (strings), and thus formatters and predicates can
> apply to both.

I understand this. The problem is that this interpretation
makes the evaluation of the expression context-sensitive,
which is almost never done and almost always a mistake.

Russ

Russ Cox

Jun 22, 2009, 4:14:08 PM

to json-t...@googlegroups.com

> But there's a problem as you've defined it: sections can't contain
> .name lookups now. Even if they could, the lookup is "direct", not
> the type that walks the stack. The fundamental problem is that for a
> section, you're looking for context to test for truth and push on the
> stack, while in a substitution, you're looking for a value to put in
> the output. These just can't be made the same.

Sure they could. It's a bit surprising
that they're different now anyway.
In both cases you are looking for a value.

The fundamental problem I have with predicates
as a special case is the reuse of the | syntax.
If predicates were written {.section num?plural}
then it would be clear they were a different case.

> In the end I think that the practical matter of formatting sections is
> vital. This was asked for by users, and it will fix nastiness I have
> in my doc/makedocs.py stuff. There will be a bunch of silly Python
> code eliminated in favor of simple, declarative templates.

I think this would be great, but I wonder
if there is better syntax. On the page,
{.section a|b} looks like a section with
argument a|b, but in fact it is {(.section a)|b}.
I think it would make more sense to add a
separate "pipe this block through a filter"
operator than overload .section. .section is
about testing conditions and walking into structs.

Your example:

{.section program|html}
if (n < 3 && m > 4) ...
{.end}

does not have any substitutions, and that
sounds like a common case. So why
should the reformatting of a block require
pushing a new variable on the evaluation stack?

Russ

Andy Chu

Jun 23, 2009, 2:00:52 AM

to json-t...@googlegroups.com

On Mon, Jun 22, 2009 at 1:14 PM, Russ Cox<r...@swtch.com> wrote:
>
>> But there's a problem as you've defined it: sections can't contain
>> .name lookups now. Even if they could, the lookup is "direct", not
>> the type that walks the stack. The fundamental problem is that for a
>> section, you're looking for context to test for truth and push on the
>> stack, while in a substitution, you're looking for a value to put in
>> the output. These just can't be made the same.
>
> Sure they could. It's a bit surprising
> that they're different now anyway.
> In both cases you are looking for a value.

In the .section case, you don't want to look *up* the stack for
something to push onto it. That's just confusing and not necessary.
It's conceivable that sections could have dotted lookup, so the
expression syntax would be the same, but the way you evaluate the
expressions will still be different.

> The fundamental problem I have with predicates
> as a special case is the reuse of the | syntax.
> If predicates were written {.section num?plural}
> then it would be clear they were a different case.

Yes, and since I want to reserve room for filtering sections, we can't
use | unless the ? in plural? becomes part of the language.

I tried the {.section num?plural} syntax -- it still just feels a
little wrong. It would be good to imagine what more than 2 cases
would look like:

{.section num .plural?}

There are {@} people here.

{.==1?}
There is only one person here.
{.or}
There is nobody here.
{.end}
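
However the syntax ends up, the semantics of a flat chain could be as simple as this Python sketch. The pick_clause function and the (predicate, text) pair representation are hypothetical.

```python
def pick_clause(value, clauses, default):
    # clauses is an ordered list of (predicate, text) pairs; the first
    # predicate that accepts the value wins, like if / elif / else.
    for predicate, text in clauses:
        if predicate(value):
            return text
    return default


CLAUSES = [
    (lambda n: n > 1, "There are {n} people here."),
    (lambda n: n == 1, "There is only one person here."),
]


def describe(n):
    return pick_clause(n, CLAUSES, "There is nobody here.").format(n=n)
```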

Not saying that's the best syntax, but if we need multiple cases,
there should be room for expansion.

> I think this would be great, but I wonder
> if there is better syntax. On the page,
> {.section a|b} looks like a section with
> argument a|b, but in fact it is {(.section a)|b}.
> I think it would make more sense to add a
> separate "pipe this block through a filter"
> operator than overload .section. .section is
> about testing conditions and walking into structs.
>
> Your example:
>
> {.section program|html}
> if (n < 3 && m > 4) ...
> {.end}
>
> does not have any substitutions, and that
> sounds like a common case. So why
> should the reformatting of a block require
> pushing a new variable on the evaluation stack?

You can avoid this by having the default argument be @:

{.section|html}
Just literal strings
{.end}

Django has something called "blocktrans" which is just plain ugly:

http://docs.djangoproject.com/en/dev/topics/i18n/

But I think you're right about the common case not involving
substitutions. And in that case the translation can be done outside
of JSON Template -- as a filter on the JSON data dictionary. The
"JSON Config" language I mentioned to you will support this -- it's an
expression language that evaluates to JSON.

There are quite a few cases where you want to do multiple template
expansions, so the configuration language will "coordinate" those.
That's nicer than having the template language itself specify multiple
stages of filtering.

Another complication is that some filters can be done piece-by-piece,
like HTML escaping, while others can't, like wikification. I want to
preserve the option of compiling to JavaScript or C without too much
fuss.
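
The distinction can be demonstrated with a short Python sketch. The toy_wikify function is a hypothetical stand-in for real wikification: it only turns *text* into bold, which is enough to show the effect.

```python
import html
import re


def toy_wikify(text):
    # Hypothetical stand-in for wikification: *text* becomes <b>text</b>.
    return re.sub(r"\*(.+?)\*", r"<b>\1</b>", text)


def piecewise(filter_fn, pieces):
    # Apply the filter to each output piece separately, then concatenate.
    return "".join(filter_fn(piece) for piece in pieces)


# Output pieces as an interpreter might emit them, with markup that
# happens to span a piece boundary.
PIECES = ["x < *bo", "ld* & y"]
WHOLE = "".join(PIECES)
```

HTML escaping gives the same answer either way because it works character by character; the toy wikifier does not, because its pattern can straddle two pieces.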

So I think I'll put the section filtering off at least until JSON
Config is done, and I can handle the common case there. It would be
extremely convenient to do things inline in the template, but it will
complicate the implementation, and I'm concerned about the JavaScript
code size. I want to hold off for the larger picture of composing
these little languages.

For now we'll have to come up with some syntax for predicates on
sections, since those problems can't be solved outside the language.
I'm open to more suggestions on syntax.

I like the idea of trying to make it an idiom, but the asymmetry
between the evaluation context of the section clauses looks like a
dealbreaker. The subtlety is that the section argument specifies an
evaluation context for ALL clauses -- and the predicate decides
*which* clause to show. The {.or} clause should be evaluated in the
named context if it exists, regardless of the result of the predicate.
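
One possible reading of that rule, sketched in Python. The function expand_section is hypothetical, clauses are modelled as callables that receive their evaluation context, and falling back to the outer context when the named value is missing or empty is an assumption.

```python
def expand_section(context, name, predicate, true_clause, or_clause):
    value = context.get(name)
    if not value:
        # No named context to push: the {.or} clause sees the outer one.
        return or_clause(context)
    # The section argument fixes the context for ALL clauses; the
    # predicate only decides which clause is shown.
    chosen = true_clause if predicate(value) else or_clause
    return chosen(value)
```

For example, with a group of one person the {.or} clause still sees the group's own fields:

```python
group_is_plural = lambda group: group["num"] > 1
many = lambda group: "There are %d people in group %s" % (group["num"], group["name"])
one = lambda group: "There is one person in group %s" % group["name"]

print(expand_section({"group": {"num": 1, "name": "staff"}},
                     "group", group_is_plural, many, one))
```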

Andy
