autoescaping for template

Mike Samuel

Jun 14, 2011, 12:18:32 AM
to golang-nuts
Go nuts,

I would like to provide an optional contextual autoescaping mechanism
for the template package ( http://golang.org/pkg/template/ ).

I was thinking that I would provide functions similar to
template.MustParse and friends that change {field} directives to
{field|formatterAppropriateToContext} before delegating to the other
parsing functions.
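
A minimal sketch of that rewrite step, assuming the old `{field}` / `{field|formatter}` directive syntax. The `addFormatter` helper and its regular expression are hypothetical, and the sketch applies one fixed formatter everywhere, whereas the proposal is to pick the formatter from the surrounding context:

```go
package main

import (
	"fmt"
	"regexp"
)

// fieldDirective matches a bare {field} directive. Directives that start
// with a dot ({.section}, {.end}) or that already carry a formatter
// ({field|html}) do not match. The pattern is an illustrative assumption.
var fieldDirective = regexp.MustCompile(`\{([A-Za-z_][A-Za-z0-9_.]*)\}`)

// addFormatter rewrites every bare {field} into {field|formatter}.
// A contextual autoescaper would choose the formatter per directive;
// this hypothetical helper uses the same one for all of them.
func addFormatter(src, formatter string) string {
	return fieldDirective.ReplaceAllString(src, "{${1}|"+formatter+"}")
}

func main() {
	src := "<b>{Name}</b> {.section Items}{Title|html}{.end}"
	fmt.Println(addFormatter(src, "html"))
	// prints: <b>{Name|html}</b> {.section Items}{Title|html}{.end}
}
```

The rewritten source would then be handed to the ordinary parsing functions unchanged.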

If you want detail, http://js-quasis-libraries-and-repl.googlecode.com/svn/trunk/safetemplate.html
explains why I think autoescaping is important and how I think this
can be implemented without incurring additional per-template
invocation overhead.

Does this sound like something go system developers might find useful?

If so, would anyone have pointers on where support code should live,
or can anyone help review code?

cheers,
mike

Rob 'Commander' Pike

Jun 14, 2011, 12:29:22 AM
to Mike Samuel, golang-nuts

I'm in the middle of writing a new template package in part to make this sort of thing easy. Once that's done and soaked, the existing package will be deprecated.

In the meantime, though, it shouldn't be too hard to do if you're willing to provide your own formatter map.

-rob

Mike Samuel

Jun 14, 2011, 12:45:19 AM
to Rob 'Commander' Pike, golang-nuts
2011/6/13 Rob 'Commander' Pike <r...@google.com>:

In case you're not too far along with the template language for
feature requests, autoescaping would be very easy to implement if I
can get a handle to the parse trees for a bundle of templates since
the easiest way to implement this is as a parse tree bundle to parse
tree bundle transformation. Having a whole compilation unit worth of
parse trees helps since I'm doing static analysis and being able to
look into callees where possible helps.

This is my first dive into Go, so I could start by trying to get
some of the least possibly controversial bits written and conforming
with http://golang.org/doc/contribute.html and then send out a code
review request. Those early bits won't depend on the specifics of the
template language -- they depend on details of the HTML, JS, and CSS
grammars that make them attackable.

Or would it be better to first float a design document on this list
for criticism?

Rob 'Commander' Pike

Jun 14, 2011, 1:05:03 AM
to mikes...@gmail.com, golang-nuts

Sounds like a plausible plan, but let's discuss it first. Always good to do that. That'll help me in my universe too.

Not sure why a bundle is better than a tree. Or are you thinking about nested templates?

-rob

Mike Samuel

Jun 14, 2011, 1:56:21 AM
to Rob 'Commander' Pike, golang-nuts

Perhaps "bundle" is the wrong word.

Many template languages have a way to include the output from one
template in another. If your new scheme does, then it would help to
be able to rewrite both callers and callees at the same time to
handle cases like the below (apologies for the ad-hoc syntax). If
there is no call mechanism or one tree contains multiple template
definitions, then one tree works.

{template main}
<a {call foo} onclick="var {call foo}; alert(href)">
{/template}

{template foo}
href="{field1}"
{/template}

> -rob

Rob 'Commander' Pike

Jun 14, 2011, 2:40:24 AM
to mikes...@gmail.com, golang-nuts
roger. that's what i called nested templates.

-rob

Mike Samuel

Jun 14, 2011, 5:47:16 PM
to golang-nuts
How about the following order of work and very high level design?



Task 1 - define context ( http://js-quasis-libraries-and-repl.googlecode.com/svn/trunk/safetemplate.html#contexts
)

A context corresponds to a parser state in the HTML/CSS/JS/URI grammar
represented as a tuple of enum values.

A context can pack into less than 20 bits, but contexts are not used during
template expansion, so I don't see any point in making the code more
obscure by doing a bunch of bit twiddling.

Even unpacked, a context is smaller than 64b, so code using them will
pass them by value: func (ctx Context) ... instead of func (ctx
*Context) ... below.

In Go, I will define enums thus:

type State uint8

const (
    StateHtmlPcdata State = iota
    // other state definitions
)

and similarly for the other enum types.

Then I will define a Context as a struct of enums.
type Context struct {
    state State
    ...
}

The zero enum value for each enum type is the value used in the
default template start context, so Context{} is the default template
start context.

A context should be immutable so the members are private, not
embedded, and there are getters per field.
func (ctx Context) State() State

Context supports a few operators:
ctx.Equals(Context) -- fieldwise equality

Union(ctx1, ctx2) -- the lower bound of the two; see docs above

ctx.BeforeDynamicValue() -- context after a forced epsilon
transition.

ctx.State() etc. -- getter

ctx.WithState(State) etc. -- derives one context from another

Testing will focus on the union and epsilon transition operators.
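
A minimal sketch of the value-type Context described in Task 1, assuming a single field. `StateHtmlPcdata` comes from the message above; the other state names are hypothetical, and `Union` and `BeforeDynamicValue` are left out because their rules live in the linked design document:

```go
package main

import "fmt"

// State is one component of a Context.
type State uint8

const (
	StateHtmlPcdata State = iota // zero value: default template start state
	StateHtmlTag                 // hypothetical extra state
	StateURL                     // hypothetical extra state
)

// Context has only unexported fields, so code outside the package cannot
// mutate it. It is small enough to pass by value.
type Context struct {
	state State
}

// State is the getter for the state field.
func (ctx Context) State() State { return ctx.state }

// WithState derives a new Context; the receiver is a copy, so the
// original value is untouched.
func (ctx Context) WithState(s State) Context {
	ctx.state = s
	return ctx
}

// Equals is fieldwise equality, which == already provides for a struct
// whose fields are all comparable.
func (ctx Context) Equals(other Context) bool { return ctx == other }

func main() {
	start := Context{} // default template start context
	inURL := start.WithState(StateURL)
	fmt.Println(start.State() == StateHtmlPcdata, inURL.Equals(start), start.Equals(Context{}))
	// prints: true false true
}
```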



Task 2 - Context propagation across chunks of HTML

Need to be able to compute a context after a chunk of HTML given the
chunk and the context before character 0 of the chunk.

The chunk of HTML is a string of raw text from the template between
two directives. E.g. in "<a href='/foo/{bar}.html'>", there are two
chunks of HTML "<a href='/foo/" and ".html'>".

Computing the context requires parsing tokens using a combined
HTML/JS/CSS/URI grammar. There is no common lexical grammar for these 4
languages, which makes goyacc of limited utility. And chunks can
start/end inside what are typically considered tokens in those
languages' lexical grammars.

I will define a
func ProcessRawText(contextBefore Context, rawText string) (contextAfter Context)
that uses a PEG style parser based upon a table mapping States to
arrays of possible transitions.

Each transition is a tuple of
token *regexp.Regexp
lookahead *regexp.Regexp
func (token string, contextBefore Context) Context

Go's regexp package does not support lookahead assertions hence the
extra lookahead pattern.

There is already a substantial test-suite written in another language
that I can translate.
Since Go's regexp package does not support case-insensitivity, I will
try to improve test coverage of mixed-case and upper-case element and
attribute names.
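
A reduced sketch of the table-driven approach from Task 2, assuming only three states and double-quoted attributes. The state names, the `transition` field names, and the token patterns are all hypothetical; the lookahead pattern is checked against the text that follows the token, since lookahead assertions are not available inside the pattern itself:

```go
package main

import (
	"fmt"
	"regexp"
)

type State uint8

const (
	StatePCDATA State = iota
	StateTag
	StateAttrValue
)

type Context struct{ state State }

// transition mirrors the tuple described above.
type transition struct {
	token     *regexp.Regexp // must match at the start of the remaining text
	lookahead *regexp.Regexp // optional; must match what follows the token
	next      func(token string, before Context) Context
}

// to builds a transition function that always moves to state s.
func to(s State) func(string, Context) Context {
	return func(string, Context) Context { return Context{state: s} }
}

// transitions maps each state to its candidates, tried in order (PEG style).
var transitions = map[State][]transition{
	StatePCDATA: {
		{token: regexp.MustCompile(`^<`), lookahead: regexp.MustCompile(`^[A-Za-z]`), next: to(StateTag)},
		{token: regexp.MustCompile(`^[^<]+|^<`), next: to(StatePCDATA)},
	},
	StateTag: {
		{token: regexp.MustCompile(`^>`), next: to(StatePCDATA)},
		{token: regexp.MustCompile(`^="`), next: to(StateAttrValue)},
		{token: regexp.MustCompile(`^[^>=]+|^=`), next: to(StateTag)},
	},
	StateAttrValue: {
		{token: regexp.MustCompile(`^"`), next: to(StateTag)},
		{token: regexp.MustCompile(`^[^"]+`), next: to(StateAttrValue)},
	},
}

// ProcessRawText returns the context after rawText, given the context
// before its first character.
func ProcessRawText(contextBefore Context, rawText string) Context {
	ctx := contextBefore
	for len(rawText) > 0 {
		matched := false
		for _, tr := range transitions[ctx.state] {
			tok := tr.token.FindString(rawText)
			if tok == "" {
				continue
			}
			rest := rawText[len(tok):]
			if tr.lookahead != nil && !tr.lookahead.MatchString(rest) {
				continue
			}
			ctx, rawText, matched = tr.next(tok, ctx), rest, true
			break
		}
		if !matched {
			rawText = rawText[1:] // skip a byte so the loop always terminates
		}
	}
	return ctx
}

func main() {
	mid := ProcessRawText(Context{}, `<a href="/foo/`)
	end := ProcessRawText(mid, `.html">`)
	fmt.Println(mid.state == StateAttrValue, end.state == StatePCDATA)
	// prints: true true
}
```

Because chunks are processed one after another, a chunk may begin or end in the middle of a tag or attribute value, and the returned context carries that position into the next chunk.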



Task 3 - Define Sanitizers
http://js-quasis-libraries-and-repl.googlecode.com/svn/trunk/safetemplate.html#sanitization_functions

This requires defining a number of functions akin to
template.HtmlFormatter.
E.g. they satisfy the signature
func (w io.Writer, format string, value ...interface{})

These are the most performance sensitive part of autoescaping since
the cost is incurred per-template invocation.

Testing will check correctness by porting existing sanitizer test
suites, and will measure performance by benchmarking.



Task 4 - Define sanitized content types.

http://js-quasis-libraries-and-repl.googlecode.com/svn/trunk/safetemplate.html#sanitized_content_types

Define an interface that sanitizer functions can recognize (via
TypeSwitchStmt?) to avoid over-escaping.

type SanitizedContent interface {
// Coerces to a string of the associated content kind.
fmt.Stringer
// One of the ContentKind consts.
ContentKind() ContentKind
}

Define a concrete type that implements SanitizedContent.

Testing will focus on expanding the task 3 test suite to cover
sanitized types.
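
A minimal sketch of how a sanitizer could use a type switch to avoid over-escaping, assuming the interface above. The `ContentKind` constants, the concrete `safeHTML` type, and the `escapeForHTML` function are hypothetical names:

```go
package main

import (
	"fmt"
	"html"
)

// ContentKind identifies what kind of content a value was sanitized as.
type ContentKind uint8

const (
	ContentKindPlain ContentKind = iota
	ContentKindHTML
)

// SanitizedContent is the interface from the message above.
type SanitizedContent interface {
	fmt.Stringer
	ContentKind() ContentKind
}

// safeHTML is a hypothetical concrete type for already-sanitized HTML.
type safeHTML string

func (s safeHTML) String() string           { return string(s) }
func (s safeHTML) ContentKind() ContentKind { return ContentKindHTML }

// escapeForHTML escapes v for an HTML PCDATA context unless v is already
// known to be sanitized HTML.
func escapeForHTML(v interface{}) string {
	switch x := v.(type) {
	case SanitizedContent:
		if x.ContentKind() == ContentKindHTML {
			return x.String() // already safe in this context
		}
		return html.EscapeString(x.String())
	default:
		return html.EscapeString(fmt.Sprint(v))
	}
}

func main() {
	fmt.Println(escapeForHTML("<b>hi</b>"))           // prints: &lt;b&gt;hi&lt;/b&gt;
	fmt.Println(escapeForHTML(safeHTML("<b>hi</b>"))) // prints: <b>hi</b>
}
```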



Task 5 - Implement parse tree transformation to propagate context and

This requires implementing the autosanitize algorithm described at
http://js-quasis-libraries-and-repl.googlecode.com/svn/trunk/safetemplate.html#context_propagation

The nature of the parse tree is TBD.

Testing will focus on porting existing test suites.



Task 6 - Provide APIs

Update template APIs to make it easy to opt-into auto-escaping.

This work is only useful to template clients who wish to produce HTML,
XML, CSS, JS, or URIs.

Perhaps the API should be exposed via the template package or via a
package whose name reflects the web soup assumptions.

Implement an end-to-end test suite that uses the template APIs to
generate HTML, and then parses the output to ensure it is well-formed
HTML without unsanitized payloads.

Mike Samuel

Jun 21, 2011, 1:09:22 AM
to golang-nuts
I put together a patch for Task 1 at http://codereview.appspot.com/4625052

Would anyone like to take a look? If so, to what handle should I hg
mail it?


On Jun 14, 2:47 pm, Mike Samuel <mikesam...@gmail.com> wrote:
> How about the following order of work and very high level design?
>
> Task 1 - define context (http://js-quasis-libraries-and-repl.googlecode.com/svn/trunk/safetemp...

Mike Samuel

Aug 3, 2011, 8:55:42 PM
to golang-nuts
I've been looking at exp/template to see where contextual autoescaping
might fit in.

It's only relevant to templates used to produce strings of HTML, CSS,
or JavaScript, so it should probably not go in exp/template.

Is there a preferred place it should go?

I'd like to implement autoescaping as a transformation on parse trees
such as the Set returned by Parse.

How should template extensions work to rewrite template nodes? The
nodeType enum, listNode struct, and other node structs defined in
exp/template/parse.go all seem to be private to that module.

Rob 'Commander' Pike

Aug 3, 2011, 9:01:37 PM
to Mike Samuel, golang-nuts

They're private until you figure out what needs to be public. The plan was to keep things private and away from dependent fingers until we know what needs to be exported.

The idea is to write new packages, say template/html etc., that import the existing template and do something to the parse tree and perhaps execution engine, although I hope it's just the parse tree, which gets statically verified and slightly rewritten, perhaps by appending processors to the execution pipelines.

So I think we're on the same page.

-rob

Mike Samuel

Aug 3, 2011, 9:18:32 PM
to Rob 'Commander' Pike, golang-nuts
2011/8/3 Rob 'Commander' Pike <r...@google.com>:

Great. Then I'll work on exp/template/html and I'll send out a
monolithic change once I have it working that will include both
exp/template/html and changes to exp/template to make certain bits
public.

During the course of review, some of those public bits might be moved
back to private, and if/when it looks like exp/template/html is close
to ready I can split the monolithic change up as appropriate.

I may muck with the js and html predefined global functions used in pipelines.

Rob 'Commander' Pike

Aug 3, 2011, 9:28:43 PM
to mikes...@gmail.com, golang-nuts

Let's discuss this a little beforehand once you have some details worked out. (The word "monolithic" scares me.) I prefer to chat about the design before the code is written rather than through the codereview process.

-rob

Mike Samuel

Aug 3, 2011, 10:05:25 PM
to Rob 'Commander' Pike, golang-nuts
2011/8/3 Rob 'Commander' Pike <r...@google.com>:
>
> On Aug 4, 2011, at 11:18 AM, Mike Samuel wrote:

> Let's discuss this a little beforehand once you have some details worked out. (The word "monolithic" scares me.) I prefer to chat about the design before the code is written rather than through the codereview process.

Fair enough. I sketched out a design earlier in this thread at
http://groups.google.com/group/golang-nuts/browse_thread/thread/e8bc7c771aae3f20/abdf127060ab7bf6?lnk=gst&q=autoescaping+order+of+work#abdf127060ab7bf6

I think I have a better handle on task 5 which relates to the parse
tree transformation.

I define a cascading inferences struct which contains
* a mapping from template name to template root
* a mapping from template name to the computed start context and end context
* a list of cloned templates
* a set of pipeline functions that need to be added to {{pipeline}}s
* possibly a list of warnings that might be displayed if this inferences
struct turns out to be part of the consistent view of the template set

Then I implement
func propagateContext(templates *Set, out *inferences)
which performs http://js-quasis-libraries-and-repl.googlecode.com/svn/trunk/safetemplate.html#context_propagation
and a function to apply the consistent *inferences to the *Set

The nodes all seem to have line numbers which is great, but I'm not
quite clear on how to get errors/warning messages back to the user or
whether panic is the most appropriate thing to do for errors and log
for warnings.
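
A rough sketch of what such an inferences struct might look like. Every name here is hypothetical, `node` stands in for whatever the parse tree root type turns out to be, and "cascading" is interpreted as a speculative layer that falls back to a parent layer on lookup, which is an assumption rather than something stated above:

```go
package main

import "fmt"

type Context struct{ state uint8 }

// node is a placeholder for a template parse-tree root.
type node struct{ name string }

// contextPair records the computed start and end contexts of a template.
type contextPair struct{ start, end Context }

// inferences collects what context propagation has learned so far.
type inferences struct {
	parent   *inferences            // assumed: layer to fall back to
	roots    map[string]*node       // template name -> template root
	contexts map[string]contextPair // template name -> start/end context
	clones   []*node                // cloned templates
	funcs    map[string]bool        // pipeline functions to add
	warnings []string               // shown only if this layer is kept
}

// lookup searches this layer, then its ancestors.
func (inf *inferences) lookup(name string) (contextPair, bool) {
	for i := inf; i != nil; i = i.parent {
		if cp, ok := i.contexts[name]; ok {
			return cp, true
		}
	}
	return contextPair{}, false
}

func main() {
	base := &inferences{contexts: map[string]contextPair{"main": {}}}
	// A speculative layer that can be discarded if it proves inconsistent.
	guess := &inferences{parent: base, contexts: map[string]contextPair{
		"foo": {start: Context{state: 1}, end: Context{state: 1}},
	}}
	_, inGuess := guess.lookup("main")
	_, inBase := base.lookup("foo")
	fmt.Println(inGuess, inBase)
	// prints: true false
}
```

Keeping warnings on the layer rather than emitting them immediately means a discarded guess never produces user-visible noise.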

Nigel Tao

Aug 4, 2011, 5:56:22 AM
to mikes...@gmail.com, Rob 'Commander' Pike, golang-nuts
On 4 August 2011 12:05, Mike Samuel <mikes...@gmail.com> wrote:
> 2011/8/3 Rob 'Commander' Pike <r...@google.com>:
>>
>> On Aug 4, 2011, at 11:18 AM, Mike Samuel wrote:
>
>> Let's discuss this a little beforehand once you have some details worked out. (The word "monolithic" scares me.) I prefer to chat about the design before the code is written rather than through the codereview process.
>
> Fair enough.  I sketched out a design earlier in this thread at
> http://groups.google.com/group/golang-nuts/browse_thread/thread/e8bc7c771aae3f20/abdf127060ab7bf6?lnk=gst&q=autoescaping+order+of+work#abdf127060ab7bf6

On that thread, you've broken up your intent into a number of
different tasks, but I think you're slicing the overall body of work
on the wrong axis.

For example, the first patch you proposed is at
http://codereview.appspot.com/4625052

This defines a zillion constants and states, and it's hard for me as a
reviewer to see the forest for the trees (even after I've read the
paper at http://js-quasis-libraries-and-repl.googlecode.com/svn/trunk/safetemplate.html).
For example, I'm not sure if "type State uint8" is the best
representation, but I can't tell without seeing how it'll be used. In
Go, we often represent a state machine's state by its transition
function, such as exp/template/lex.go's "type stateFn func(*lexer)
stateFn".
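
For readers unfamiliar with that idiom, here is a small sketch of a state machine whose states are functions that return the next state. The `scanner` type and the two states are hypothetical and far simpler than a real HTML grammar; the sketch only tracks whether a chunk ends inside a tag:

```go
package main

import "fmt"

// scanner holds the input and what has been learned so far.
type scanner struct {
	input string
	pos   int
	inTag bool
}

// stateFn is a state: it consumes input and returns the next state,
// or nil when the input is exhausted.
type stateFn func(*scanner) stateFn

// scanText consumes text until the start of a tag.
func scanText(s *scanner) stateFn {
	for s.pos < len(s.input) {
		c := s.input[s.pos]
		s.pos++
		if c == '<' {
			return scanTag
		}
	}
	return nil
}

// scanTag consumes the inside of a tag until the closing '>'.
func scanTag(s *scanner) stateFn {
	s.inTag = true
	for s.pos < len(s.input) {
		c := s.input[s.pos]
		s.pos++
		if c == '>' {
			s.inTag = false
			return scanText
		}
	}
	return nil
}

// endsInTag reports whether chunk stops in the middle of a tag.
func endsInTag(chunk string) bool {
	s := &scanner{input: chunk}
	for state := stateFn(scanText); state != nil; {
		state = state(s)
	}
	return s.inTag
}

func main() {
	fmt.Println(endsInTag(`<form action="/foo/`), endsInTag(`<p>Hello `))
	// prints: true false
}
```

With this representation there is no enum to switch on: adding a state means adding a function, and the driver loop never changes.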

Instead, for a first patch, I would like to see something that's
end-to-end but with a simplified functionality. I'd pick just two
types of escaping (URL and HTML) and demonstrate that if my template
was:
`<form action="/foo/{{.Name}}/bar">Hello {{.Name}} etcetera`
then I could recognize the first action as a URL and the second as HTML PCDATA.

The URL-ness recognition doesn't have to be born perfect. Once we
agree on an overall design, we can check it in and iterate towards
covering all the CSS Double-Quoted URI corner cases.

If I have some time over the weekend, I might try this myself.

Rob 'Commander' Pike

Aug 4, 2011, 7:19:33 AM
to mikes...@gmail.com, golang-nuts, Nigel Tao
i agree with nigel but think (based on an idea from russ) it should start with an even simpler first step that just examines a parse tree and prints out useful information about the state of the parse tree at the action points.

to facilitate that, it probably makes sense for me to do the work of breaking out the lexer and parser into subpackages of template that yours and other analyzers can use directly. how does that sound?

-rob

Mike Samuel

Aug 4, 2011, 11:52:33 AM
to Rob 'Commander' Pike, golang-nuts, Nigel Tao
2011/8/4 Rob 'Commander' Pike <r...@google.com>:

> i agree with nigel but think (based on an idea from russ) it should start with an even simpler first step that just examines a parse tree and prints out useful information about the state of the parse tree at the action points.
>
> to facilitate that, it probably makes sense for me to do the work of breaking out the lexer and parser into subpackages of template that yours and other analyzers can use directly. how does that sound?
>
> -rob

I can put together an end-to-end solution for a simplified HTML
grammar that ignores HTML comments, RCDATA and CDATA tags, that
naively assumes all templates start in a PCDATA context, and only
admits double quoted attributes and only tries to specially handle URI
attributes.

I don't think having the template lexer available would help.

Russ Cox

Aug 4, 2011, 12:13:31 PM
to mikes...@gmail.com, Rob 'Commander' Pike, golang-nuts, Nigel Tao
> I can put together an end-to-end solution for a simplified HTML
> grammar that ignores HTML comments, RCDATA and CDATA tags, that
> naively assumes all templates start in a PCDATA context, and only
> admits double quoted attributes and only tries to specially handle URI
> attributes.
>
> I don't think having the template lexer available would help.

To be more concrete, the proposal is that Rob will
take on the work to make the syntax tree available,
probably as types defined in a separate package,
the same way that exp/regexp has exp/regexp/syntax.
The idea is that the syntax tree is a decent way
to let higher-level packages analyze the templates.

Having done that, I think there's actually little at
the beginning that has to be designed. I'd be
happy with

package html // exp/template/html
func Analyze(*Set)

where Analyze just prints things it learns.
Getting to a point where that actually works
should be a whole bunch of small-step CLs during
which we can help you (Mike) pick up effective Go
idioms and you can help us understand the problem
domain better. By the time Analyze is printing the
right information everyone will be up to speed on
the two halves and we can figure out the right way
to use the analysis. It might be that we want to
give the new package the same API as template,
or it might be that we want to make it something you
ask for while using the real template.

The ideal progression would be a sequence of small
(~500 line, definitely not more than 1000 line) CLs,
so that we can do fine adjustments on the overall
trajectory in flight.

Russ

Mike Samuel

Aug 4, 2011, 1:08:46 PM
to r...@golang.org, Rob 'Commander' Pike, golang-nuts, Nigel Tao
2011/8/4 Russ Cox <r...@golang.org>:

Understood. I will watch exp/template for any syntax checkins before
doing any further work.
