XML unescaping


Jeremy Mawson

Mar 16, 2009, 8:13:13 AM
to lif...@googlegroups.com
Hi,

I've been mucking around with Lift and having a great time. I have cooked up a page that retrieves XML from a datasource and renders it. However, some of the text elements I extract are HTML-encoded. When rendered in the browser they show up as HTML source rather than rendered HTML.

I looked for a Scala utility to unescape this and found scala.xml.Utility.unescape, but could not get it working.

Here's how I've tried to use it.

      val title = Utility.unescape(result \ "title" text, new StringBuilder)

Unfortunately this always gives me a value of null, even though result.\("title").text is something like "Tsvangirai's wife killed in &lt;b&gt;car&lt;/b&gt; crash - ABC News".

What could I be doing wrong? I realise this is probably a plain old Scala question, but I hope someone here can help me anyway.

Thanks
Jeremy

Derek Chen-Becker

Mar 16, 2009, 9:51:56 AM
to lif...@googlegroups.com
Well, it may be that the XML output portion of Scala is escaping your ampersands a second time. For instance, check out this session in the interpreter:

scala> val title = "Catsby &amp; Twisp"
title: java.lang.String = Catsby &amp; Twisp

scala> val escaped = <span>{title}</span>
escaped: scala.xml.Elem = <span>Catsby &amp;amp; Twisp</span>

scala> val unescaped = <span>{ scala.xml.Unparsed(title) }</span>
unescaped: scala.xml.Elem = <span>Catsby &amp; Twisp</span>


Note that if you embed a String within XML elements, Scala will automatically escape any ampersands (along with < and >) unless you wrap the String in a scala.xml.Unparsed instance. The second expression there (escaped) will show up in the browser as the literal text "Catsby &amp; Twisp", because its ampersand was escaped a second time; the Unparsed version displays as "Catsby & Twisp". I know you're asking about the Utility object, but I think that would be fixing the symptom rather than the cause.
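
On the Utility question itself: as far as I can tell, scala.xml.Utility.unescape takes a single bare entity name such as "amp" rather than a whole string, and returns null for anything else, which would explain the null. A minimal sketch, assuming the scala-xml library is on the classpath (UnescapeDemo is just an illustrative name):

```scala
import scala.xml.Utility

object UnescapeDemo {
  // unescape looks up ONE predefined entity name (lt, gt, amp, quot, apos)
  // and appends the matching character to the builder it was given.
  val known: StringBuilder = Utility.unescape("amp", new StringBuilder)

  // A whole string is not an entity name, so the lookup fails and null comes back.
  val unknown: StringBuilder = Utility.unescape("Catsby &amp; Twisp", new StringBuilder)

  def main(args: Array[String]): Unit = {
    println(known)           // the builder now holds "&"
    println(unknown == null) // true
  }
}
```

So passing the full title text in as the first argument will always hit the null branch, whatever the text contains.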

Derek

Jeremy Mawson

Mar 16, 2009, 9:36:42 PM
to lif...@googlegroups.com
Thanks Derek. Familiarity with the APIs is one of the tricks when moving to a new language I guess.

This worked for me, but I have a follow-on issue.

Just as background, I am rendering search results which are provided as XML. Here's my binding code:
      result => bind("entry", chooseTemplate("listings", "listing", xhtml),
        "title" -> <a href={result.url}>{Unparsed(result.title)}</a>,
        "description" -> Unparsed(result.description),
        "link" -> Text(result.url))
    })

This fails to compile as the Unparsed in the description line is not a valid parameter for the bind function. (I'm not sure why, it's just a fancy Node like any other right?) The exact error is:

overloaded method value bind with alternatives (String,net.liftweb.util.Box[(scala.xml.NodeSeq) => scala.xml.NodeSeq],net.liftweb.util.Box[(scala.xml.PrefixedAttribute) => scala.xml.MetaData],scala.xml.NodeSeq,net.liftweb.util.Helpers.BindParam*)scala.xml.NodeSeq <and> (String,scala.xml.NodeSeq,net.liftweb.util.Helpers.BindParam*)scala.xml.NodeSeq cannot be applied to (java.lang.String,scala.xml.NodeSeq,net.liftweb.util.Helpers.TheBindParam,(String, scala.xml.Unparsed),net.liftweb.util.Helpers.TheBindParam)

If I change that line to "description" -> Text(Unparsed(result.description)) it compiles, but the Text constructor will re-escape so I'm back to square one.

If I change the line to "description" ->  <span>{Unparsed(result.description)}</span>, it compiles but I have an unwanted span tag and worse ... if result.description is not well formed XML my page will fail to render! Firefox complains of an XML Parsing Error. The description field has an unmatched <br> tag (literally &lt;br&gt;) in the middle of it to force it onto two lines.

So my first question is, how can I avoid the extra <span> tag?
Secondly, can I render (!X)HTML via Lift?

Thanks
Jeremy


--
Jeremy Mawson
Senior Developer | Online Directories

Sensis Pty Ltd
222 Lonsdale St
Melbourne 3000
E: jeremy...@sensis.com.au

Marc Boschma

Mar 16, 2009, 11:41:31 PM
to lif...@googlegroups.com

On 17/03/2009, at 12:36 PM, Jeremy Mawson wrote:

> If I change the line to "description" ->  <span>{Unparsed(result.description)}</span>, it compiles but I have an unwanted span tag and worse ... if result.description is not well formed XML my page will fail to render! Firefox complains of an XML Parsing Error. The description field has an unmatched <br> tag (literally &lt;br&gt;) in the middle of it to force it onto two lines.


Try "description" -> <xml:group>{Unparsed(result.description)}</xml:group>

That wraps the string in a scala XML group node...
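
A minimal sketch of the difference, assuming scala-xml is on the classpath; GroupDemo and the sample description are made up for illustration:

```scala
import scala.xml.{NodeSeq, Unparsed}

object GroupDemo {
  // Hypothetical description containing markup that should pass through untouched.
  val description = "first line<br/>second line"

  // <xml:group> builds a scala.xml.Group: a container with no tag of its own.
  val grouped: NodeSeq = <xml:group>{Unparsed(description)}</xml:group>

  // For comparison, the <span> wrapper adds an element around the same content.
  val spanned: NodeSeq = <span>{Unparsed(description)}</span>

  def main(args: Array[String]): Unit = {
    println(grouped.toString) // serialised without any wrapping element
    println(spanned.toString) // serialised inside <span>...</span>
  }
}
```

Either value is a NodeSeq, so both should be acceptable on the right-hand side of a bind parameter; only the group leaves the markup unwrapped.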

With respect to the <br> tag, it should be <br/> or <br></br> to be well formed. If you want to support non-well-formed XML from the database, wouldn't you need to parse it and convert it to well-formed XML, either beforehand or upon retrieval?

Regards,

Marc

Jeremy Mawson

Mar 16, 2009, 11:51:49 PM
to lif...@googlegroups.com
Thanks Marc.  <xml:group> works nicely.

For this exercise this is hypothetical, but it matches very closely a project I have worked on in the past using Struts and JiBX...

Say the data was sourced from an external party's service and there was a contractual agreement not to alter the data in any way, i.e. I'm stuck with the poorly formed HTML. Probably one could agree with the partner that the transformation to valid XHTML is appropriate, but I'll let the question stand anyway.

Is poorly formed (but otherwise supported-by-browsers) HTML renderable via Lift at all?

Cheers
Jeremy




David Pollak

Mar 16, 2009, 11:58:09 PM
to lif...@googlegroups.com
On Mon, Mar 16, 2009 at 8:51 PM, Jeremy Mawson <jeremy.ma...@gmail.com> wrote:
> Thanks Marc.  <xml:group> works nicely.
>
> For this exercise this is hypothetical, but it matches very closely a project I have enabled in the past using struts and JIBX...
>
> Say the data was sourced from an external party's service and there was a contractual agreement to not alter the data in any way? I.E. I'm stuck with the poorly formed HTML. Probably one could agree with the partner that the transformation to valid XHTML is appropriate, but I'll let the question stand anyway.
>
> Is poorly formed (but otherwise supported-by-browsers) HTML renderable via Lift at all?

If it's supported by the browser, it will be rendered, but Firefox and Chrome will both complain about malformed XHTML.

You could run the String through an HTML parser (there are a few floating around for Java that will parse poorly formed HTML) and then walk the nodes and build XML.  I would argue that this would satisfy any contractual requirements, although I no longer practice law, so I can't argue it on your behalf. :-)
 


--
Lift, the simply functional web framework http://liftweb.net
Beginning Scala http://www.apress.com/book/view/1430219890
Follow me: http://twitter.com/dpp
Git some: http://github.com/dpp

Marc Boschma

Mar 17, 2009, 12:44:41 AM
to lif...@googlegroups.com
To quote David from a previous thread on the mailing list:

I've enhanced LiftRules as follows:

 /**
  * A partial function that determines content type based on an incoming
  * RequestState and Accept header
  */
 var determineContentType: PartialFunction[(Can[RequestState], Can[String]), String] = {
   case (_, Full(accept)) if accept.toLowerCase.contains("application/xhtml+xml") =>
     "application/xhtml+xml"

   case _ => "text/html"
 }

You can change the determineContentType Partial Function in Boot.scala 
to accomplish your goals.

So maybe you could add in Boot.scala

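// Give the new case an explicit type so it can be chained with orElse,
// and parenthesise the match so it parses as a guard.
val textOnly: PartialFunction[(Can[RequestState], Can[String]), String] = {
  case (Full(req), _) if (req.path match {
    case "text" :: "only" :: _ => true
    case _ => false
  }) => "text/html"
}
LiftRules.determineContentType = textOnly orElse LiftRules.determineContentType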

which would set the content type of any page at or under "/text/only" to "text/html"; any other path would fall through to the standard Lift determineContentType partial function...

Obviously you could define your own function to check the path rather than in-line it...
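
The chaining itself can be tried outside Lift. A minimal sketch using only the standard library, with Option standing in for Can and a hypothetical Req case class standing in for RequestState (neither is Lift's actual API):

```scala
object ContentTypeDemo {
  // Hypothetical stand-in for Lift's request type; only the path matters here.
  final case class Req(path: List[String])

  type Decide = PartialFunction[(Option[Req], Option[String]), String]

  // Mirrors the default rule quoted above: XHTML only if the client accepts it.
  val default: Decide = {
    case (_, Some(accept)) if accept.toLowerCase.contains("application/xhtml+xml") =>
      "application/xhtml+xml"
    case _ => "text/html"
  }

  // Only defined for paths starting with /text/only; everything else falls through.
  val textOnly: Decide = {
    case (Some(Req("text" :: "only" :: _)), _) => "text/html"
  }

  // orElse tries textOnly first and delegates to default where it is undefined.
  val decide: Decide = textOnly orElse default

  def main(args: Array[String]): Unit = {
    val xhtml = Some("application/xhtml+xml")
    println(decide((Some(Req(List("text", "only", "page"))), xhtml))) // text/html
    println(decide((Some(Req(List("index"))), xhtml)))                // application/xhtml+xml
  }
}
```

Putting the path test in the pattern rather than in a guard keeps the override readable, and the order of the orElse operands decides which rule wins.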

David: Would req.param("x") be equivalent to S.param("x")?

Regards,

Marc

Jeremy Mawson

Mar 17, 2009, 12:06:03 AM
to lif...@googlegroups.com
Thanks David.



TylerWeir

Mar 17, 2009, 4:32:07 AM
to Lift
> scala> val title = "Catsby &amp; Twisp"

+1 for the Penny Arcade reference.


Derek Chen-Becker

Mar 17, 2009, 9:59:03 AM
to lif...@googlegroups.com
Glad someone caught it :)