Re: "html/dom" alternative to html/template for true separation of concerns?


Egon

Sep 13, 2017, 3:58:47 PM
to golang-nuts
If you want to manipulate HTML files then there is https://godoc.org/golang.org/x/net/html,
but it comes with all the dangers of potential injection attacks and so on... which "html/template" avoids.
Writing something that injects into the specific nodes and afterwards encodes shouldn't be a big problem.

If you want to write HTML directly from code then writing a simple html encoder with the necessary models


Anyway, it's unclear what you are proposing or need. In general, standard libraries shouldn't do everything,
and this, whatever it is, probably belongs in a third-party package.

+ Egon

On Wednesday, 13 September 2017 22:02:02 UTC+3, Karv Prime wrote:
Hello,

I only recently found my way to Go. I'm a (former?) fullstack web dev, and as I ran into a PHP-related problem (DOMDocument not working with HTML5 tags; I'd choose another solution stack if the language weren't a fixed point in history), I was looking to see whether Go already has a good way to manipulate HTML files. The templating is fine, but in my humble opinion there's a problem...

Problem: IMHO templating in its current form is flawed. Inserting placeholders (e.g. "{{.nav}}") probably isn't an optimal solution, as it just tells the code "hey, act upon me". It seems to be a shallow solution to prevent code mix-ins, but it fails to really separate the concerns.

Solution: If there were a Go package to directly manipulate the DOM, it would be very helpful for separating markup and code. The code would act on the markup file (*.html) to create the page/site/module/... (whatever is needed).

Pros:
- Frontend devs could create their own pages, modules, etc. without thinking about any special tags they'd need.
-> '<main></main>' instead of '<main>{{.content}}</main>'
-> '<meta name="description" content="">' instead of '<meta name="description" content="{{.description}}">'
- Error/Exception if some tag/id/class/... has not been found instead of admins possibly not knowing about it.
-> You can act upon it and tell the users "Oops, something went wrong, we're looking into it." so they know that the current state of the site isn't what they should see.
-> Better an empty element (and the admin knows about it) instead of users seeing placeholders.
- It's easier to avoid any problems with funny users trying to trick the system.
- In theory faster than templating solutions (untested claim, so there's a big question mark)?
- It favors modular frontends (main site, nav, main content, reusable modules (e.g. for items on a sales platform)) instead of a single file with placeholders
- It favors cleaner code and true SoC instead of the oft-preferred workflow "just a little HTML in the code to create each item faster" or vice versa.
- ...

Cons:
- If there are elements unknown to the backend-devs, they will probably stay empty
-> Possible solution could be some kind of taint-checking for empty elements after page creation
- "Duplicate" code if there's frontend scripting that changes parameters according to AJAX results, but that's almost unavoidable.
- Probably more communication needed between backend- and frontend-devs
-> Possible solution, the aforementioned taint-checking, to see these problems in testing, if they should arise
- ...

Feel free to contribute your thoughts, other pros/cons, etc. :)

Kind regards
Karv

Andy Balholm

Sep 13, 2017, 4:42:20 PM
to karv....@gmail.com, golang-nuts
It sounds like what you’re wanting to do is basically what is called Template Animation at http://www.workingsoftware.com.au/page/Your_templating_engine_sucks_and_everything_you_have_ever_written_is_spaghetti_code_yes_you

You might be interested in goquery (github.com/PuerkitoBio/goquery), which provides jQuery-like syntactic sugar over x/net/html.

Andy

Andy Balholm

Sep 13, 2017, 6:43:10 PM
to golang-nuts
You may not be aware that the html/template package does automatic escaping. So if a template has <div id=not-so-secure-blogpost>{{.Blogpost}}</div> and Blogpost contains <script>alert("Pwned")</script>, the result will be something like <div id=not-so-secure-blogpost>&lt;script&gt;alert(&#34;Pwned&#34;)&lt;/script&gt;</div>

Assigning to the div’s innerHTML would be bad in this case, but appending a text node to the div would be safe.
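
To make the automatic escaping concrete, here is a minimal sketch using only the standard library. The function name renderPost and the quoted id attribute are illustrative choices, not something from the thread; the template and the hostile input mirror the example above.

```go
package main

import (
	"fmt"
	"html/template"
	"strings"
)

// renderPost executes a one-line template with untrusted input and
// returns the resulting HTML. The name is hypothetical.
func renderPost(post string) (string, error) {
	tmpl, err := template.New("post").Parse(`<div id="blogpost">{{.Blogpost}}</div>`)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	// A plain string value is escaped for the context it lands in,
	// here ordinary element content.
	err = tmpl.Execute(&out, map[string]string{"Blogpost": post})
	return out.String(), err
}

func main() {
	html, err := renderPost(`<script>alert("Pwned")</script>`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(html)
}
```

The script tag comes out as inert text (&lt;script&gt;...), which is the same effect as appending a text node rather than assigning to innerHTML.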

Andy

On Sep 13, 2017, at 2:10 PM, karv....@gmail.com wrote:

I don't know why it's unclear what I'm proposing, but I'll try a 2nd time:

Something similar to: http://php.net/manual/en/book.dom.php

Or, even simpler:
- Find Tags, IDs, Classes, etc. in an HTML document.
- Something similar to Element.innerHTML to put content into these tags (https://developer.mozilla.org/en-US/docs/Web/API/Element/innerHTML)
- Something similar to Element.setAttribute to change attributes of DOM elements (https://developer.mozilla.org/en-US/docs/Web/API/Element/setAttribute)
- Maybe some validation if the HTML DOM is correct
- Possibly some sanity checking to see if there are any empty tags or empty tag attributes (e.g. empty content on some meta tag)

In short: Load some HTML code, and manipulate the HTML Document Object Model instead of being dependent on placeholders.

Yes, a standard library shouldn't do everything. But the same goes for templating, so that isn't really an argument against including it in the standard library if one of the main foci of Golang is the Web.

I wasn't ignoring the security model. If someone uses Golang to create a comment section on the web, the same could happen with templates if the developer isn't aware of possible security issues. There is no difference whether some unchecked user content is injected into <div id="not-so-secure-blogpost">{{.blogpost}}</div> or <div id="not-so-secure-blogpost"></div>. So I really don't see how "html/template" avoids this issue if some coder doesn't watch how user content is handled. Escaping the user content (or other security features) can be implemented too, yes - but that should be some other package imho.

Kind regards
Karv


Andy Balholm

Sep 13, 2017, 9:58:02 PM
to Karv Prime, golang-nuts
Why does automatic escaping make html/template completely impractical? (Or did I guess the antecedent of “it” incorrectly?)

Andy

On Sep 13, 2017, at 4:30 PM, Karv Prime <karv....@gmail.com> wrote:

Thank you for the heads up. So it is completely impractical for the needed purpose.

In that case it would be truly bad. That's why user input should always be checked. Such a blogpost shouldn't even come that far. ^^ It could be escaped before it gets to the database (not strictly necessary thanks to prepared statements etc., but that depends on the use case); at the very least it should be escaped before it hits the visual representation.

Let's stay with the blogpost example to give some further insight and assume the following [folder]/file structure:
[site]
- site.html (the full site, but without nav and main, as well as data that depends on which page is shown, language, etc. (html lang, title, keywords, etc.))
- nav.html (only the navigation, which doesn't depend on anything, but exists as its own module)
- main.html (main content - in our case the blog - that has different blogposts)
[modules]
- blogpost.html (a single blogpost and how it should look)

So the application should first stitch together site, nav, and main. Whether that happens at runtime or a template is created beforehand is a matter of optimization and doesn't really matter in our example. As the user requested the page in Latin, lang, title, keywords, etc. are filled in accordingly. If code injection were possible up to that point, there would be other security concerns, as no user data has been used so far. We have 5 blogposts, as our blog went live a day ago and up to now only spambots have been here. But user entries are user entries, so let's process them. Take the blogpost.html file and fill <div class="username"></div> as well as <div class="userpost"></div> with their content: escape the content, then insert it the same way innerHTML works in JS. Put these 5 blogposts into main and send it to the user.
Another user clicks "blog" in the navigation but has JS activated - so it only loads the main content. Again the 5 blogposts, but not the full site.
Some other user is active on the blog, but gets updates every 10 minutes or via server-sent events - as the previous user complained about the botposts, he now only gets a representation of blogpost.html sent with the content to be prepended before the other posts.

Yes, one could realize that solely with templates. But every time just a little thing has to be changed (e.g. another navigation link added), someone has to touch the whole site.html file (Git be praised, but nonetheless that's not great for really big sites, so a separation is at least sometimes practical). The downside is that every HTML guy needs to learn "how to do templating in language X", be it Golang, Twig, Smarty, ... instead of just creating plain, simple HTML which can be manipulated by the code via the HTML DOM. And if something is missing, it creates a warning, which is practical too (if the full site without the dynamic stuff gets stitched together beforehand from some kind of easily maintainable [meta] page, it could stay with the previous version until the oversight is fixed, or whatever one wants to do with that information).

The problem "some coders could actually forget to check user input" can be solved with taint checking (if the content comes from a "secure" source (e.g. a .html file) there is no need for a warning, but if it's from a database all hell should break loose). But as files could under certain circumstances also be user-created (e.g. some esoteric database where every blog entry is a file), there's a problem here. One can't prevent coders from making mistakes. PHP tried, it failed. ^^ Java has no taint checking for user data injected into a SQL query; Perl and Ruby have it. Maybe the solution would be to allow a coder to choose between an unescaped innerHTML and an escaped one.
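
For comparison, the site/nav/main/blogpost split can also be approximated in html/template with named templates, so that a navigation change only touches the nav source. This is a rough sketch under assumptions: the markup strings, the Post and Page types, and the load and render helpers are all invented for illustration, and in a real project each string would presumably live in its own .html file.

```go
package main

import (
	"fmt"
	"html/template"
	"strings"
)

// Post is a hypothetical blog entry; both fields hold untrusted input.
type Post struct {
	Username string
	Body     string
}

// Page is the hypothetical data for a full page render.
type Page struct {
	Lang  string
	Posts []Post
}

// files stands in for site.html, nav.html, main.html and blogpost.html.
var files = map[string]string{
	"site":     `<html lang="{{.Lang}}"><body>{{template "nav"}}{{template "main" .Posts}}</body></html>`,
	"nav":      `<nav><a href="/blog">Blog</a></nav>`,
	"main":     `<main>{{range .}}{{template "blogpost" .}}{{end}}</main>`,
	"blogpost": `<article><div class="username">{{.Username}}</div><div class="userpost">{{.Body}}</div></article>`,
}

// load parses every piece into one template set so they can call each other.
func load() (*template.Template, error) {
	root := template.New("root")
	for name, src := range files {
		if _, err := root.New(name).Parse(src); err != nil {
			return nil, err
		}
	}
	return root, nil
}

// render executes one named template from the set.
func render(t *template.Template, name string, data interface{}) (string, error) {
	var out strings.Builder
	err := t.ExecuteTemplate(&out, name, data)
	return out.String(), err
}

func main() {
	tmpl, err := load()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	posts := []Post{{Username: "bot", Body: "<b>spam</b>"}}
	// Full page for a first visit.
	page, err := render(tmpl, "site", Page{Lang: "la", Posts: posts})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(page)
	// A single post fragment, e.g. for an asynchronous update.
	frag, err := render(tmpl, "blogpost", posts[0])
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(frag)
}
```

The same blogpost template serves both the full page and the fragment case, and user content is escaped in both. It does not remove the placeholders from the markup, which is the objection raised above.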

Egon

Sep 14, 2017, 1:42:56 AM
to golang-nuts
On Thursday, 14 September 2017 00:11:11 UTC+3, Karv Prime wrote:
I don't know why it's unclear what I'm proposing, but I'll try a 2nd time:

The devil is in the details :), but this makes it clearer.

I just had a few different ideas floating around in my head that could fit the first description (e.g. is it similar to shadow DOM, direct DOM manipulation, DOM manipulation through some abstraction, separate layers etc.).
 


Egon

Sep 14, 2017, 1:48:46 AM
to golang-nuts
On Thursday, 14 September 2017 02:40:41 UTC+3, Karv Prime wrote:
Thank you for the heads up. So it is completely impractical for the needed purpose.

In that case it would be truly bad. That's why user input should always be checked. Such a blogpost shouldn't even come that far. ^^ It could be escaped before it gets to the database (not strictly necessary thanks to prepared statements etc., but that depends on the use case); at the very least it should be escaped before it hits the visual representation.

See https://rawgit.com/mikesamuel/sanitized-jquery-templates/trunk/safetemplate.html#problem_definition - it goes into depth on the implications of a particular design.

tl;dr: practice has shown that people forget proper sanitization. Automatic sanitization with type inference is easy to use and gets things right unless you start using template.HTML directly.
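
As a small illustration of what "automatic" means here, html/template picks the escaping based on where the value lands in the markup. The sketch below is only an assumption-laden demo: the three snippets, the contexts map and the renderIn helper are made up for this example.

```go
package main

import (
	"fmt"
	"html/template"
	"strings"
)

// contexts places the same value in three different HTML contexts.
// The markup is invented for this sketch.
var contexts = map[string]string{
	"text":  `<p>{{.}}</p>`,
	"attr":  `<p title="{{.}}"></p>`,
	"query": `<a href="/search?q={{.}}">go</a>`,
}

// renderIn executes the template registered under the given context name.
func renderIn(context, value string) (string, error) {
	src, ok := contexts[context]
	if !ok {
		return "", fmt.Errorf("unknown context %q", context)
	}
	tmpl, err := template.New(context).Parse(src)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	err = tmpl.Execute(&out, value)
	return out.String(), err
}

func main() {
	for _, name := range []string{"text", "attr", "query"} {
		out, err := renderIn(name, "<b>x</b>")
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("%-5s %s\n", name, out)
	}
}
```

In the text and attribute slots the value is entity-escaped, while in the query string it is percent-encoded. The caller passes the same plain string each time and never chooses an escaper.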



Andy Balholm

Sep 14, 2017, 10:03:51 AM
to Karv Prime, golang-nuts
I still don’t understand why automatic escaping makes html/template impractical for the purpose you were describing. Is it because the blog post would be HTML rather than plain text? In that case, you would need to convert it to the template.HTML type before passing it to the template, and it would be rendered without escaping. (Then sanitizing the HTML to prevent XSS would be up to you, of course.) The html/template package escapes content by default, but there are ways to get around it if you tell it you know what you are doing.

Andy

On Sep 13, 2017, at 7:01 PM, Karv Prime <karv....@gmail.com> wrote:

It = html/template
"The purpose" = the one I thought I could use it for and described above.

Marvin Renich

Sep 14, 2017, 10:34:42 AM
to golang-nuts
* Karv Prime <karv....@gmail.com> [170913 22:01]:
> It = html/template
> "The purpose" = the one I thought I could use it for and described above.

I'm still not sure you understand the capabilities of html/template.
This playground snippet might help you:

https://play.golang.org/p/_1KSiZbwh-

package main

import (
	"fmt"
	"html/template"
	"os"
)

// Trusted is markup the program itself vouches for.
var Trusted = `<div style="font-weight:bold">This Is Bold</div>`

// Untrusted stands in for user-supplied content.
var Untrusted = `<div style="display:none">pwnd</div>`

var MyHtml = `<body>
<p>{{.systemContent}}</p>
<p>{{.userContent}}</p>
</body>`

func main() {
	var tmpl, err = template.New("").Parse(MyHtml)
	if err != nil {
		fmt.Fprintf(os.Stderr, "error parsing template: %s\n", err)
		os.Exit(1)
	}
	var data = make(map[string]interface{})
	// template.HTML marks the value as safe, so it is inserted verbatim.
	data["systemContent"] = template.HTML(Trusted)
	// A plain string is escaped automatically.
	data["userContent"] = Untrusted
	err = tmpl.Execute(os.Stdout, data)
	if err != nil {
		fmt.Fprintf(os.Stderr, "error executing template: %s\n", err)
		os.Exit(1)
	}
}

You as the programmer get to decide which sources are trusted and which
are not.

If you are happy with goquery, as someone else suggested, that's fine.
But the template package may be simpler to work with and do what you
want. Note that with template, you can put some, or even most, of your
logic in the template, or you can go the other way around and have only
simple {{.foo}} tags in the template that get replaced with large chunks
of HTML that you generate in your program (which seems to be very close
to what the Template Animation link encourages), or somewhere in
between.

...Marvin


Andy Balholm

Sep 14, 2017, 11:44:42 AM
to Karv Prime, golang-nuts
The placeholders never show up in template output. If the data is missing, the placeholders normally just disappear; in some cases there might be an error, depending on exactly what kind of “missing” it is.
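
If an error is preferred over a silently empty element, the template packages have a "missingkey=error" option for map data. A minimal sketch, assuming a made-up page string and a hypothetical render helper with a strict flag:

```go
package main

import (
	"fmt"
	"html/template"
	"strings"
)

// page is an invented one-placeholder template.
const page = `<main>{{.content}}</main>`

// render executes page against a map. With strict set, a key the
// template asks for but the map lacks becomes an execution error
// instead of an empty string.
func render(data map[string]string, strict bool) (string, error) {
	tmpl := template.New("page")
	if strict {
		tmpl = tmpl.Option("missingkey=error")
	}
	if _, err := tmpl.Parse(page); err != nil {
		return "", err
	}
	var out strings.Builder
	if err := tmpl.Execute(&out, data); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	empty := map[string]string{}

	out, err := render(empty, false)
	fmt.Printf("lenient: %q, err: %v\n", out, err)

	out, err = render(empty, true)
	fmt.Printf("strict:  %q, err: %v\n", out, err)
}
```

The option only concerns map lookups; a template that names a field a struct does not have already fails at execution time without it.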

Andy

On Sep 14, 2017, at 8:14 AM, Karv Prime <karv....@gmail.com> wrote:

As it would get a little bit confusing if I replied to everyone in separate posts, I'll answer in a single post. I hope you don't mind. At least now it's past 16:00 and not past 04:00 and I have a clearer mind. ^^

@Egon: I've read the whole article - yes, many coders sadly do forget about proper sanitization of user-input. As I'm pretty focused on security, I know about the implications of many design-approaches. Easy-to-use approaches are neat and in that certain case super useful - but sadly not for my use-case. ^^

@Andy Balholm: No, the "blog posts" are not HTML. Again: There is a reusable HTML snippet. That snippet can be filled with user content - which truly needs to be sanitized due to security concerns. If the snippet gets sent to the user via asynchronous request, there's nothing more to do, as JS takes care of putting it into place. But if the whole page has to be rendered, that snippet needs to be put into the page before the whole page gets sent to the user. The other way would be to leave the complete rendering to the user's browser, which comes with its very own disadvantages (e.g. no scripting available, etc.).
I thought that the whole package auto-sanitizes the content as you've stated before. Now, okay, it's usable for that use case. It's not perfect with all the artifacts one needs to put into the HTML code, but if necessary I can work with that. ^^

@Marvin Renich: Thank you for this information. I'm new to Golang and I probably misunderstood one comment here as "the (whole) template package does automatic escaping", so I didn't look further - my mistake. So it would be possible to implement everything via the template package - yet there's the disadvantage of needing to put artifacts into the markup which then get replaced by the wanted content (I have to look into it further - if there's an error when there is no data for some template code, it's perfectly fine... otherwise it will look like some websites where the artifacts are visible to the user if they didn't get replaced).

Marvin Renich

Sep 14, 2017, 12:29:03 PM
to golang-nuts
* Karv Prime <karv....@gmail.com> [170914 11:14]:
> ... - yet
> there's the disadvantage of the need to put artifacts into the markup which
> then get replaced by the wanted content

You have to do that anyway, you just use different artifacts. Each
location where a substitution will occur must be uniquely identified,
whether it is <div class="summaryData"></div> or
<div>{{.summaryData}}</div>.

If summaryData has any HTML at all, then the programmer and the designer
must coordinate on styles (at least style names) anyway (the Template
Animation post glosses over this point), so there is not even a need for
the <div> wrapper; just put {{.summaryData}} and let the program supply
any necessary style or class attributes. The resulting HTML will have
one less unnecessary wrapper element.

I'm not saying the <div> wrapper can't be in the template if it is
useful for other purposes, but that it is not needed for the template
substitution, whereas it is needed as a placeholder when doing DOM
manipulation.

...Marvin


Marvin Renich

Sep 14, 2017, 3:45:43 PM
to golang-nuts
* Karv Prime <karv....@gmail.com> [170914 13:16]:
> I wouldn't agree on "there is not even a need for the <div> wrapper" part.
> If the HTML tags are produced entirely by code, it comes with its own
> issues - suddenly there is a thing that wasn't in the markup - it would
> probably reduce maintainability. If there's already a file with <div
> class="summaryData">[...]</div> it can be reused as the designer sees fit.
> Let's, for example, assume 2 cases.

As I said in my previous post, if the <div> element is useful for the
designers, e.g. to position or align the whole replacement summaryData
(which I am assuming is significantly more complicated than some
unformatted text), then by all means put it in.

The point about not needing the <div> tags was really more directed at
use cases where you are inserting some simple replacement in the middle
of the content of a larger element, e.g.

<p>Hello, {{.Name}}, you last logged in on {{.PrevLogin}}.</p>

Doing this with DOM manipulation requires extra <span> elements around
the items to be replaced. The snippet above is much easier to read and
maintain than the corresponding HTML with <span> elements, and produces
smaller, more readable HTML to send over the wire.
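
For reference, the greeting snippet above can be executed roughly like this. The Visit type, the greet helper and the date layout are assumptions made for the sketch; only the template text comes from the post.

```go
package main

import (
	"fmt"
	"html/template"
	"strings"
	"time"
)

// greeting is the snippet from the post, unchanged.
const greeting = `<p>Hello, {{.Name}}, you last logged in on {{.PrevLogin}}.</p>`

// Visit is a hypothetical data type for the two placeholders.
type Visit struct {
	Name      string
	PrevLogin string
}

// greetingTmpl is parsed once at startup; Must panics on a syntax error.
var greetingTmpl = template.Must(template.New("greeting").Parse(greeting))

// greet fills both placeholders in one pass, with no wrapper elements.
func greet(name, prevLogin string) (string, error) {
	var out strings.Builder
	err := greetingTmpl.Execute(&out, Visit{Name: name, PrevLogin: prevLogin})
	return out.String(), err
}

func main() {
	last := time.Date(2017, time.September, 13, 21, 58, 0, 0, time.UTC)
	html, err := greet("Karv & Co", last.Format("Jan 2, 2006"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(html)
}
```

The ampersand in the name is escaped on the way out, so the inline substitution stays safe without any extra markup around it.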

The <div> tag also _may_ not be needed in cases where the replacement
content is significantly larger, contains multiple elements, and a
container is not needed to separate the replacement from elements before
and after it.

> Case 1: Main file has '[...]<main></main>[...]', the module is the
> aforementioned div with summary data. The coder has to know where to put
> it. Does it belong directly in main? Is it inside another element? The
> backend dev doesn't know without input from the frontend devs - so backend
> devs are involved deeper into the frontend design than they should be.

No, the main file has

<main><some-elements />{{.summaryData}}<more-elements /></main>

The program is told where to put it by the designer.

My point was that the designer must use something to identify the
location for the replacement, and there is little difference to the
designer whether the file has

<div id="summaryData"></div>
or
<div>{{.summaryData}}</div>

I posit that the second is much easier to recognize as a placeholder for
data that will be supplied later; easier for the designers, easier for
the programmers, easier for outside consultants unfamiliar with the
project, easier for new hires, and easier for management.

...Marvin

jens.s...@gmail.com

Oct 19, 2017, 10:36:27 AM
to golang-nuts
@Karv Prime
I was looking for a library for this but I have not been able to find any yet. Did you have any luck? I have worked with http://www.thymeleaf.org/ for Java extensively and I really enjoyed it. It's a lot easier to work with the front-end people with legal HTML. DOM manipulation gives a lot more separation than is possible with simple template/injection, in my opinion. Maybe we should start making a golang lib for this... :-)

@Marvin Renich
Not to start a war over which approach is better, but when using DOM manipulation,


<p>Hello, {{.Name}}, you last logged in on {{.PrevLogin}}.</p>

would be:

<p th:text="#{hello}">All this text will be replaced by the hello var. This way the front end developer can test with long/short texts (user names) and see how they look</p>

Also, if doing localization, you would not have static text like that in your HTML template.

Also:
<div th:replace="header"></div>  - is much better IMO. The front-end guy can put in whatever html he needs to test the div content and it will all get replaced when the dom is parsed.


