toString not generating valid XML

Marilou Landes

Aug 14, 2015, 7:10:34 PM

to Lucee, mla...@illinois.edu

There is a difference between the ACF10 toString( value ) output and the Lucee toString( value ) output when the value is XML. I've attached a simple HTML document that is read using xmlParse( ). I've also attached 2 additional files that represent the output from ACF10 and Lucee (FINAL 4.5.2.007).

Lucee output seems to be missing the first line (<?xml ...) and has not properly closed the META, LINK, BR and IMG tags.

Suggestions for resolving this?

Before.html

AfterACF10.html

AfterLucee.html

Francesco Pepe

Aug 14, 2015, 8:16:32 PM

to Lucee, mla...@illinois.edu

Images are missing...

Adam Cameron

Aug 15, 2015, 10:36:16 AM

to Lucee, mla...@illinois.edu

On Saturday, 15 August 2015 00:10:34 UTC+1, Marilou Landes wrote:

There is a difference between the ACF10 toString( value ) output and the Lucee toString( value ) output when the value is XML. I've attached a simple HTML document that is read using xmlParse( ). I've also attached 2 additional files that represent the output from ACF10 and Lucee (FINAL 4.5.2.007).

You didn't include code to demonstrate how you're processing the HTML file, which is pretty key to this.

However this will do it:

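// Directory containing this template and before.html
dir = expandPath("./");
// Name the output file after the engine running the test
platform = structKeyExists(server, "lucee") ? "lucee" : "coldFusion";
// Parse the source file into an XML document object
xml = xmlParse("#dir#before.html");
// Serialize the object back to a string and write it out for comparison
fileWrite("#dir##platform#.xml", toString(xml));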

Running that on CF11 and Lucee 4.5, I am seeing the same thing you are.

Suggestions for resolving this?

1. File a bug.
2. Wait for someone to fix it, or
3. Fix it yourself.

Lucee's XML parsing just doesn't seem to work in this situation.

--

Adam

Adam Cameron

Aug 15, 2015, 10:41:41 AM

to Lucee, mla...@illinois.edu

On Saturday, 15 August 2015 01:16:32 UTC+1, Francesco Pepe wrote:

Images are missing...

The images aren't really relevant. It's the contents of the files that the OP means for us to look at.

Adam Cameron

Aug 15, 2015, 10:42:25 AM

to Lucee, mla...@illinois.edu

On Saturday, 15 August 2015 15:36:16 UTC+1, Adam Cameron wrote:

Lucee's XML parsing just doesn't seem to work in this situation.

There is also a second bug with this: https://luceeserver.atlassian.net/browse/LDEV-492

"Two for the price of one" day today, apparently.

Mark Drew

Aug 15, 2015, 12:15:06 PM

to lu...@googlegroups.com

Instead of using xmlParse, Lucee should have an htmlParse function (can't check right now as I'm on my phone); try that.

Mark Drew

- Sent by typing with my thumbs.

--
See Lucee at CFCamp Oct 22 & 23 2015 @ Munich Airport, Germany - Get your ticket NOW - http://www.cfcamp.org/
---
You received this message because you are subscribed to the Google Groups "Lucee" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lucee+un...@googlegroups.com.
To post to this group, send email to lu...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/lucee/3d62876e-b162-4108-b339-09d1644fe528%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Mark Drew

Aug 15, 2015, 12:17:37 PM

to lu...@googlegroups.com

http://docs.lucee.org/reference/functions/htmlparse.html

This does exactly what you intend.

Mark Drew

- Sent by typing with my thumbs.


Marilou Landes

Aug 15, 2015, 10:48:47 PM

to Lucee, mla...@illinois.edu

The code you set up is what I'm doing. Thank you for confirming the behavior. I'll report the bug and patiently wait for someone to fix it ... and find a work-around while I'm waiting.

Marilou Landes

Aug 15, 2015, 10:55:06 PM

to Lucee

I will look at the htmlParse function, but the issue is really with toString(), where the XML document object is written back out as a string. Empty elements such as <br> need to be closed properly, as in <br />.
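
To make the <br> versus <br /> difference concrete outside CFML, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It is only an analogy, not Lucee's implementation: the SOURCE string is a hypothetical cut-down stand-in for Before.html, and the point is simply that one parsed tree can be written out with either an XML or an HTML output method.

```python
import xml.etree.ElementTree as ET

# Hypothetical cut-down stand-in for Before.html (not the attached file).
SOURCE = '<html><body><img src="someFile.jpg" alt="someFile" /><br /></body></html>'

root = ET.fromstring(SOURCE)

# XML output method: empty elements are written self-closed, e.g. <br />.
as_xml = ET.tostring(root, encoding="unicode", method="xml")

# HTML output method: void elements such as <img> and <br> are left open.
as_html = ET.tostring(root, encoding="unicode", method="html")

print(as_xml)
print(as_html)
```

The HTML form can no longer be parsed as XML, which matches the symptom reported for the Lucee output.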

Mark Drew

Aug 16, 2015, 4:56:40 AM

to lu...@googlegroups.com

So you are using xmlParse to turn a non-XML document into an XML document? (The toString is just serializing it.)


Mark Drew

Aug 16, 2015, 5:04:14 AM

to lu...@googlegroups.com

I tried the following example using your test files, and I got the output that you might expect:
<cfscript>
    xmlHTML = xmlParse(FileRead("Before.html"));
    htmlHtml = htmlParse(FileRead("Before.html"));

    dump(toString(xmlHTML));
    dump(toString(htmlHtml));
</cfscript>

The output of xmlParse is:

<html lang="en"> <head><META http-equiv="Content-Type" content="text/html; charset=UTF-8"><META content="text/html; charset=UTF-8" http-equiv="Content-Type"> <title>XMLParse and toString Test</title> <link href="style.css" rel="stylesheet" type="text/css"> </head> <body> <h1 data-eid="1" data-pid="56" tabindex="0">XMLParse and toString Test</h1> <img alt="someFile" src="someFile.jpg"> <br> </body> </html>

That keeps the HTML form of the markup even though xmlParse was used; in an (odd?) way I can see why it would do this.

The output of htmlParse is more XML-like:

<?xml version="1.0" encoding="UTF-8"?><html xmlns="http://www.w3.org/1999/xhtml" lang="en"><head><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"/><title>XMLParse and toString Test</title><link href="style.css" rel="stylesheet" type="text/css"/></head><body> <h1 data-eid="1" data-pid="56" tabindex="0">XMLParse and toString Test</h1> <img alt="someFile" src="someFile.jpg"/> <br clear="none"/> </body></html>

Which I think is closer to what you were expecting (the br and img tags are closed).

Hope that helps

Regards

Mark Drew


Adam Cameron

Aug 16, 2015, 5:25:25 AM

to Lucee

You are somewhat missing the point, Mark. The source document is XML. The dialect of mark-up the file contains might be XHTML, but it's well-formed and it is valid XML. Look at it.

So all the operation is doing is taking some XML, converting it to a Lucee XML object, then turning it back into XML again.

And Lucee is ballsing up the last step. That is the issue here: the toString() function is not emitting valid XML. The problem does not lie with the initial parsing.

Mark Drew

Aug 16, 2015, 5:51:07 AM

to lu...@googlegroups.com

Aha
I got you,

But they did ask for a workaround, no?

Anyway, I didn't miss the overall point. There does seem to be a bug, you are right, but I gave an option for getting around it.

The HTML source in Before.html might have closing tags, but it is NOT marked as proper XHTML (it's marked as HTML5), so it gets processed (by TagSoup? I am guessing here) as a non-XML variant, right?

<!DOCTYPE html>
<html lang="en">

Not marked as

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">

And if you DO mark Before.html as proper XHTML, xmlParse then handles it correctly, producing:

<?xml version="1.0" encoding="UTF-8"?><html xmlns="http://www.w3.org/1999/xhtml"> <head><META content="text/html; charset=UTF-8" http-equiv="Content-Type"/> <title>XMLParse and toString Test</title> <link href="style.css" rel="stylesheet" type="text/css"/> </head> <body> <h1 data-eid="1" data-pid="56" tabindex="0">XMLParse and toString Test</h1> <img alt="someFile" src="someFile.jpg"/> <br/> </body> </html>

So in the first case you are saying "hey Lucee, xmlParse this HTML5 doc, which doesn't need stuff to be closed", and it then handles it following the rules of HTML5.

If you properly mark it up as XHTML, then it DOES handle it with those rules. Basically Lucee is doing what you asked it to.

Regards

Mark Drew


James Holmes

Aug 16, 2015, 5:51:31 AM

to lu...@googlegroups.com

I looked at it. I see this:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"><html><head><META http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body> <div> <h1>XMLParse and toString Test</h1> <img src="http://someFile.jpg" alt="someFile"> <br> </div> </body></html>

That is not valid XML.

--

Shu Ha Ri: Agile/Lean Product Development blog - http://www.bifrost.com.au/

Agile in 140 characters or less - https://twitter.com/James_R_Holmes

Whatever LinkedIn is for - http://www.linkedin.com/in/jrholmes


Mark Drew

Aug 16, 2015, 6:03:03 AM

to lu...@googlegroups.com

It's not XML

validity error: Validation failed: no DTD found !


Adam Cameron

Aug 16, 2015, 6:34:36 AM

to Lucee

On Sunday, 16 August 2015 10:51:07 UTC+1, Mark Drew wrote:

The source is HTML in Before.html might have closing tags but it is NOT marked as proper XHTML (it's marked as HTML5) so it gets processed by (I am guessing here TagSoup? ) as a non XML variant right?

Sorry, you're fixating on a tangential explanatory comment I made, which overall is not (well, ought not be) relevant. I was using the term "XHTML" merely as an explanation that HTML can indeed also be XML. XHTML has more rules beyond just that, as you point out.

xmlParse() should not be guessing at dialects, it should be doing what it's told: here's some XML... parse it.

I slightly misspoke before though. I should have restricted my description of the doc's appropriateness to the notion of "well-formed", not "valid". On reading the XML spec, these are two different things (I did not realise there was this distinction, nor that those terms are meaningful in the context of XML). XML is "well-formed" if its tags balance out etc. But to be "valid" it needs to have a doctype and a DTD (although I suspect these days a schema would also be OK in lieu of a DTD... I'm only reading this thing superficially).
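
As a rough illustration of the "well-formed" half of that distinction, here is a minimal Python sketch using the standard library's xml.etree.ElementTree, which checks well-formedness only and performs no DTD or schema validation. The helper name is_well_formed and both sample strings are made up for this example; they are not the attached files.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if text parses as XML. No DTD or schema validation is done."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# No DTD is supplied, but every tag is balanced or self-closed: well-formed.
closed = '<!DOCTYPE html><html lang="en"><body><br /></body></html>'

# The same markup with <br> left open: the tags no longer balance.
unclosed = '<!DOCTYPE html><html lang="en"><body><br></body></html>'

print(is_well_formed(closed))    # True
print(is_well_formed(unclosed))  # False
```

A validating parser would additionally need a DTD (or schema) to check the first document against, which is the extra requirement that "valid" adds.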

Now... xmlParse() could have been implemented to reject any non-VALID XML, however I think we can all agree that's unnecessarily restrictive. Who here has really spent much time ensuring their XML has a DTD, and the XML follows it? I actually have in the past, but not for a very long time as it all seems like a lot of work for little real-world gain. Similarly with schemas. Anyhow, Adobe took the pragmatic route and decided "well-formed" was the requirement for an XML string to be parseable. This is the cue Railo and then Lucee should (and as far as I can tell: did) take.

If an XML string has DTD information, then the XML must comply with the DTD. Same with a schema.

However the XML parser should not make unsolicited guesses as to what the dialect of the XML is. If no such information is provided, then no such information should be inferred. And if Lucee is doing that (as you speculate it might be), it is wrong to do so.

Adam Cameron

Aug 16, 2015, 6:39:48 AM

to Lucee

On Sunday, 16 August 2015 10:51:31 UTC+1, James Holmes wrote:

I looked at it. I see this:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"><html><head><META http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body> <div> <h1>XMLParse and toString Test</h1> <img src="http://someFile.jpg" alt="someFile"> <br> </div> </body></html>

That is not valid XML.

No, that isn't. But... where did you get that from? It's not the doc under discussion here.

Adam Cameron

Aug 16, 2015, 6:47:27 AM

to Lucee

On Sunday, 16 August 2015 10:51:07 UTC+1, Mark Drew wrote:

The source is HTML in Before.html might have closing tags but it is NOT marked as proper XHTML (it's marked as HTML5) so it gets processed by (I am guessing here TagSoup? ) as a non XML variant right?

Whatever it's doing, it's predicated on the outer tags being <html>. If I just change those to be <foo> - even if I leave the <!DOCTYPE html> - then it behaves as one would expect it to.

So it does seem like it's performing some unsolicited guesswork here.
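
One rule with exactly this shape is the XSLT default output method: when no method is given, a serializer picks HTML output if the root element is named html (in any case) and has no namespace, and XML output otherwise. Whether Lucee's toString() actually goes through such a serializer is an assumption that nobody in this thread has confirmed. The Python sketch below only imitates that rule with the standard library; default_output_method and serialize are hypothetical names, and the inputs are cut-down examples.

```python
import xml.etree.ElementTree as ET

def default_output_method(root: ET.Element) -> str:
    """Pick "html" only for a root named html (any case) with no namespace."""
    # ElementTree stores namespaced tags as "{uri}local", so a leading "{"
    # means the root element is in a namespace.
    if not root.tag.startswith("{") and root.tag.lower() == "html":
        return "html"
    return "xml"

def serialize(text: str) -> str:
    """Parse text, then write it back using the guessed output method."""
    root = ET.fromstring(text)
    return ET.tostring(root, encoding="unicode", method=default_output_method(root))

print(serialize("<html><body><br /></body></html>"))  # <html><body><br></body></html>
print(serialize("<foo><body><br /></body></foo>"))    # <foo><body><br /></body></foo>
```

Under this rule a root carrying the XHTML namespace is no longer treated as plain html, which would also be consistent with the XHTML-marked version of the file coming out as XML.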

Mark Drew

Aug 16, 2015, 7:52:12 AM

to lu...@googlegroups.com

Could be. I think it might be the SAX parser, but that is as far as I got looking into the XMLUtil and parseXml funcs.

Mark Drew

- Sent by typing with my thumbs.


James Holmes

Aug 16, 2015, 8:01:30 AM

to lu...@googlegroups.com

That's the source of before.html according to my machine. Are you seeing something different?


Adam Cameron

Aug 16, 2015, 8:31:18 AM

to Lucee

On Sunday, 16 August 2015 13:01:30 UTC+1, James Holmes wrote:

That's the source of before.html according to my machine. Are you seeing something different?

Yup. The file contains this:

<!DOCTYPE html>
<html lang="en">

<head><META http-equiv="Content-Type" content="text/html; charset=UTF-8" />


<title>XMLParse and toString Test</title>
<link href="style.css" rel="stylesheet" type="text/css" />
</head>
<body>

<h1 data-eid="1" data-pid="56" tabindex="0">XMLParse and toString Test</h1>
<img src="someFile.jpg" alt="someFile" />
<br />


</body>
</html>

That's from the actual file, not from browsing to it and looking at the source. I can only presume your browser is showing you a back-working of its parsed DOM document, not the original file contents.

James Holmes

Aug 16, 2015, 8:38:21 AM

to lu...@googlegroups.com

Yep. Interestingly, the same behaviour Lucee is exhibiting.
