How to use HTML escapes in Field titles

Joe Barnhart

Jul 24, 2017, 9:00:09 PM
to web2py-users
I have field titles which need to indicate "less than or equal" or "greater than" symbols.  HTML provides such escapes with the character sequences &le; and &gt; respectively.  But when I use these sequences in the "title" of a Field, the displayed SQLFORM prints them literally, as "&le;" for example, instead of displaying the "less than or equal" symbol.  If I use the Field titles directly in a table on my own, they work as expected.

Is there a reason why the "title" of a Field is afforded this extra protection from me, the hapless programmer?

Here's a table definition, cuz Anthony always asks for one.  (And he's right to.)

db.define_table("event",
    Field("id_meet", "reference meet", label=T("Meet")),
    Field("id_session", "reference session", label=T("Session")),
    Field("id_event", "reference event_index", label=T("Event")),
    #Field("day", "integer", notnull=True, label=T("Day")),
    Field("ord", "integer", notnull=True, label=T("Order")),
    Field("swim_num", "string", notnull=True, label=T("Number")),
    elapsed_time_field("lcm_gt",label=T("LCM >")),
    elapsed_time_field("lcm_le",label=T("LCM &le;")),
    elapsed_time_field("scm_gt",label=T("SCM >")),
    elapsed_time_field("scm_le",label=T("SCM &le;")),
    elapsed_time_field("scy_gt",label=T("SCY >")),
    elapsed_time_field("scy_le",label=T("SCY &le;")),
    Field("nt_ok", "boolean", label=T("Allow NT")),
    format=event_list,
    migrate=current.settings.migrate)


You will notice I'm using a Field factory "elapsed_time_field" to create the Field objects.  It just defines each as an integer field and presets the validator and formatter.  Before you ask.... Yes, I tried it with and without the T() operator.  There was no change on the SQLFORM and the table still works as expected.

-- Joe

Joe Barnhart

Jul 24, 2017, 9:45:10 PM
to web2py-users
Ugh.  Replace the word "title" with "label" everywhere.  Global search and replace.

Joe Barnhart

Jul 26, 2017, 7:43:14 PM
to web2py-users
It's even worse than I imagined.

Leaving off the T() operation, I find that my field labeled "LCM >" is actually sanitized at some point into:

<label class="control-label col-sm-5" for="event_join_lcm_gt" id="event_join_lcm_gt__label">LCM &gt;: </label>

Yes, something in the process has recognized the character ">" and changed it to "&gt;".  But the field "LCM &le;" was sanitized into:

<label class="control-label col-sm-5" for="event_join_lcm_le" id="event_join_lcm_le__label">LCM &amp;le;: </label>

In this case, not only was the &le; not recognized by the sanitizer, it actually DE-SANITIZED it by removing the ampersand and sanitizing it separately.

This is the problem with automatic "stuff" -- such as hidden magic sanitization.  When it goes wrong, it goes very wrong.  And you need to spend hours with the source trying to figure out where it went wrong and if there's an easy fix without modifying the distro.

-- Joe

Anthony

Jul 26, 2017, 8:16:55 PM
to web2py-users
On Wednesday, July 26, 2017 at 7:43:14 PM UTC-4, Joe Barnhart wrote:
> It's even worse than I imagined.
>
> Leaving off the T() operation, I find that my field labeled "LCM >" is actually sanitized at some point into:
>
> <label class="control-label col-sm-5" for="event_join_lcm_gt" id="event_join_lcm_gt__label">LCM &gt;: </label>
>
> Yes, something in the process has recognized the character ">" and changed it to "&gt;".  But the field "LCM &le;" was sanitized into:
>
> <label class="control-label col-sm-5" for="event_join_lcm_le" id="event_join_lcm_le__label">LCM &amp;le;: </label>
>
> In this case, not only was the &le; not recognized by the sanitizer, it actually DE-SANITIZED it by removing the ampersand and sanitizing it separately.

Both of the above are encoded as expected -- the ">" character is converted to "&gt;", and the "&" character in "&lt;" is converted to "&amp;". This is consistent and expected behavior. If you want to end up with "&lt;", then why not just start with "<"?
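
The two results above can be reproduced outside the framework. This is only a sketch: it uses Python's standard-library html.escape as a stand-in for web2py's own escaping (an assumption, the two are not the same code), and html.unescape to approximate what the browser then displays.

```python
import html

# Stand-in for the framework's default escaping. Assumption: stdlib
# html.escape, not web2py's own helper.
escaped_gt = html.escape("LCM >")
escaped_le = html.escape("LCM &le;")

# Roughly what a browser shows after decoding the entities once.
displayed_gt = html.unescape(escaped_gt)
displayed_le = html.unescape(escaped_le)

for raw, esc, shown in (("LCM >", escaped_gt, displayed_gt),
                        ("LCM &le;", escaped_le, displayed_le)):
    print(repr(raw), "->", repr(esc), "-> displayed as", repr(shown))
```

In both cases the page ends up displaying exactly the characters that were in the label string, which is why "&le;" shows up literally in the form.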

What is your ultimate goal? Do you not want the final HTML to include the "&gt;" and "&lt;" HTML entities so they display as ">" and "<" on the page?

Also, what do you mean by, "If I use the Field titles directly in a table on my own, they work as expected?" In each case, what do you want to display on the page, and what are you expecting in the raw HTML?

Anthony

Joe Barnhart

Jul 26, 2017, 8:28:01 PM
to web2py-users
Hi Anthony --

The problem is that I don't want "LT" but rather "LE", i.e. "less than or equal to."

As far as my table comment, I meant that when I used the SQL table and its Fields to create an SQLTABLE, the labels "just worked" and produced a column header with the desired symbol instead of printing "&le;" in the column heading.  So SQLTABLE behavior differed from SQLFORM in this manner.

I have found a workaround, finally, which lets me have symbols in both forms and tables:

elapsed_time_field("lcm_gt",label=CAT(T("LCM"),XML(" &gt;"))),
elapsed_time_field("lcm_le",label=CAT(T("LCM"),XML(" &le;"))),
elapsed_time_field("scm_gt",label=CAT(T("SCM"),XML(" &gt;"))),
elapsed_time_field("scm_le",label=CAT(T("SCM"),XML(" &le;"))),
elapsed_time_field("scy_gt",label=CAT(T("SCY"),XML(" &gt;"))),
elapsed_time_field("scy_le",label=CAT(T("SCY"),XML(" &le;"))),

I first tried the obvious, letting T() handle the substitution but again, that doesn't work.  The &xx; character escapes get printed literally in the form label instead of creating the symbol I intended.  So I got around the problem with CAT().  Still, an awful amount of work and hassle to reverse-engineer and make a workaround for something that just should have worked.

I'm not sure why &xx; is "sanitized" to begin with.  It seems like an extreme form of sanitizing, to eliminate any and all special characters from form labels.

-- Joe

Joe Barnhart

Jul 26, 2017, 9:46:47 PM
to web2py-users
Actually I've thought a little more about it and I think this construction is better.

elapsed_time_field("lcm_gt",label=XML(T("LCM %s",("&gt;",)))),
elapsed_time_field("lcm_le",label=XML(T("LCM %s",("&le;",)))),
elapsed_time_field("scm_gt",label=XML(T("SCM %s",("&gt;",)))),
elapsed_time_field("scm_le",label=XML(T("SCM %s",("&le;",)))),
elapsed_time_field("scy_gt",label=XML(T("SCY %s",("&gt;",)))),
elapsed_time_field("scy_le",label=XML(T("SCY %s",("&le;",)))),

-- Joe

Anthony

Jul 26, 2017, 9:50:36 PM
to web2py-users
> As far as my table comment, I meant that when I used the SQL table and its Fields to create an SQLTABLE, the labels "just worked" and produced a column header with the desired symbol instead of printing "&le;" in the column heading.  So SQLTABLE behavior differed from SQLFORM in this manner.

Can you show your SQLTABLE code? When I do SQLTABLE(..., headers='labels'), which uses the field labels to generate the headers, I get the exact same output as in a SQLFORM (i.e., the values are escaped).
 
> I have found a workaround, finally, which lets me have symbols in both forms and tables:
>
> elapsed_time_field("lcm_gt",label=CAT(T("LCM"),XML(" &gt;"))),
> elapsed_time_field("lcm_le",label=CAT(T("LCM"),XML(" &le;"))),
> elapsed_time_field("scm_gt",label=CAT(T("SCM"),XML(" &gt;"))),
> elapsed_time_field("scm_le",label=CAT(T("SCM"),XML(" &le;"))),
> elapsed_time_field("scy_gt",label=CAT(T("SCY"),XML(" &gt;"))),
> elapsed_time_field("scy_le",label=CAT(T("SCY"),XML(" &le;"))),

Why bother with the CAT()? Just do:

label=XML(T("LCM &le;"))

which is your original label simply wrapped in XML().
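
As a rough model of why the XML() wrapper changes the outcome, here is a self-contained sketch. The Raw class and render function are hypothetical stand-ins invented for illustration, not web2py's implementation; html.escape again substitutes for the framework's escaping.

```python
import html


class Raw:
    """Hypothetical stand-in for a 'trusted markup' wrapper such as XML()."""

    def __init__(self, text):
        self.text = str(text)


def render(value):
    """Escape plain strings; pass Raw-wrapped markup through unchanged."""
    if isinstance(value, Raw):
        return value.text
    return html.escape(str(value))


plain = render("LCM &le;")         # escaped: the entity is neutralized
trusted = render(Raw("LCM &le;"))  # emitted as-is: the entity survives

print(plain)
print(trusted)
```

The point of the pattern is that escaping is the default and the developer opts out explicitly, per value, for markup known to be safe.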
 
> I first tried the obvious, letting T() handle the substitution but again, that doesn't work.  The &xx; character escapes get printed literally in the form label instead of creating the symbol I intended.  So I got around the problem with CAT().  Still, an awful amount of work and hassle to reverse-engineer and make a workaround for something that just should have worked.

First, I'm not sure why the complaint. The book clearly states that all text injected in views is escaped, and this is to protect against XSS vulnerabilities (as well as for proper HTML parsing and displaying of special characters intended as page content). The book also clearly states how you can override the default escaping -- by wrapping the content in XML(), which works exactly as expected in this case. No need for any reverse-engineering or workarounds -- just use the documented functionality.

Second, before you vent, keep in mind that the framework, documentation, and support are all being provided by folks who are volunteering their time and energy for free.
 
> I'm not sure why &xx; is "sanitized" to begin with.  It seems like an extreme form of sanitizing, to eliminate any and all special characters from form labels.

I think that's fairly standard -- see https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet#Output_Encoding_Rules_Summary. Aside from security issues, the escaping also serves to ensure that various special characters are displayed properly (e.g., converting "&" to "&amp;" results in the "&" being displayed on the page, which is typically what we want). Of course there are cases where we don't need/want the escaping, but the framework cannot know for sure where those cases are, so we err on the side of better security and leave it up to the developer to explicitly mark text that should not be escaped.
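
One consequence of this, following the same logic as starting with "<" rather than "&lt;": a label written with the literal "≤" character (U+2264) contains nothing for the escaper to change. The sketch below only checks that with Python's standard library. It is an untested assumption that this works end to end in a given web2py app, since that also depends on the source file encoding and on the page being served as UTF-8.

```python
import html

# The literal character, written as an escape so this file stays ASCII.
label = "LCM \u2264"

# Escaping only touches &, <, > and quotes, so the label is unchanged.
escaped = html.escape(label)

# The named entity decodes to the very same character.
decoded_entity = html.unescape("&le;")

print(escaped == label, decoded_entity == "\u2264")
```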

Anthony

Joe Barnhart

Jul 27, 2017, 12:38:00 AM
to web2py-users
Thank you for your help, Anthony.

-- Joe
