
tdom and CDATA


yaha...@gmail.com

Nov 2, 2017, 9:14:43 AM
I have an issue with tdom and CDATA, and I wonder if it is a bug:


% package require tdom
0.8.3
% dom parse {<a><b>1</b><![CDATA[test of & <bad> format]]><c>12</c></a>}
domDoc0x9baf9a0
% domDoc0x9baf9a0 asXML
<a>
<b>1</b>
test of &amp; &lt;bad&gt; format<c>12</c>
</a>



What happened to my nice CDATA section? Isn't the whole idea of CDATA sections that they stay as they are?

Ashok

Nov 2, 2017, 10:24:28 AM
Not an XML expert, but looks fine to me. When read, the CDATA is stored
as the text

"test of & <bad> format"

When output as XML the special characters are replaced with their entity
encoding. The output is equivalent to the CDATA version.

Any application parsing the two forms (CDATA and the asXML output)
should see the same text.
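
A minimal sketch of that claim, assuming tDOM is installed and using the default [dom parse] behaviour. The variable names are only illustrative. Both spellings of the character data should end up as the same text node value:

```tcl
package require tdom

# The same character data, once as a CDATA section and once entity-escaped.
set withCdata   {<a><b>1</b><![CDATA[test of & <bad> format]]><c>12</c></a>}
set withEscapes {<a><b>1</b>test of &amp; &lt;bad&gt; format<c>12</c></a>}

set doc1 [dom parse $withCdata]
set doc2 [dom parse $withEscapes]

# In both trees the second child of <a> is the text node between <b> and <c>.
set text1 [[lindex [[$doc1 documentElement] childNodes] 1] nodeValue]
set text2 [[lindex [[$doc2 documentElement] childNodes] 1] nodeValue]

puts [string equal $text1 $text2]
puts $text1

# Free the DOM trees when done.
$doc1 delete
$doc2 delete
```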

/Ashok

Andreas Leitgeb

Nov 2, 2017, 12:25:55 PM
Ashok <pal...@yahoo.com> wrote:
> Not an XML expert, but looks fine to me. When read, the CDATA is stored
> as the text

Not an XML expert, either.
Something *else* still looks strange:

set orig {<a><b>1</b><![CDATA[test of & <bad> format]]><c>12</c></a>}

set d1 [dom parse $orig]
set d1X [$d1 asXML]

set d2 [dom parse $d1X]
set d2X [$d2 asXML]

string equal $d1X $d2X ;# --> 0

So, they're apparently not "equivalent". (the difference is,
that another newline gets added before "test of ...")

> Any application parsing the two forms (CDATA and the asXML output)
> should see the same text.

Well, tdom itself may see something different between its input and its
asXML output. The parse & asXML transformation obviously isn't "idempotent".

PS:
>> % package require tdom
>> 0.8.3
ditto here.

yaha...@gmail.com

Nov 2, 2017, 1:19:46 PM
I know that for the application it does not matter. My issue is more of esthetics.
The asXML result is kept in the log and the CDATA form is more readable than the parsed version. At least I would like to have:
% domDoc0x9baf9a0 asXML -withCDATA

Rolf Ade

Nov 2, 2017, 6:37:55 PM

yaha...@gmail.com writes:
> I know that for the application it does not matter. My issue is more of esthetics.
> The asXML result is kept in the log and the CDATA form is more
> readable than the parsed version.

I'm sorry that tDOM doesn't please you in this regard as much as you
wish.

As others told you and you know, the promise is to keep the "infoset".
And that you can get.

> At least I would like to have:
> % domDoc0x9baf9a0 asXML -withCDATA

I would like to please you but I'm afraid things are not so simple as
they may look at first.

Well, I was not involved in writing all those standards; I only try to
implement them in a sane way and bridge as much as possible between
them.

One key feature of tDOM is its XPath implementation. XPath has a
slightly different data model than DOM. XSLT is closely coupled with XPath.
They all expect CDATA sections as being text and "as much character data
as possible is grouped into each text node: a text node never has an
immediately following or preceding sibling that is a text node".

This is what you get with [dom parse ...] - an XPath/XSLT "ready" DOM
tree.

This does not mean that tDOM's DOM implementation isn't able to handle
CDATA sections. You may create, insert, query them with DOM methods and
you will get them serialized as CDATA sections with asXML.

So, asXML doesn't need a -withCDATA option; it already writes them
without one. It's [dom parse ...] that needs something like a -strict
option, meaning that <b> foo <![CDATA[test of & <bad> format]]> bar</b>
ends up as the element node "b" with three child nodes: the text node
" foo ", the CDATA section node "test of & <bad> format" and the text
node " bar".

You can't reliably do XPath on such a tree (or at least not naively), but
you can create such trees with script code, and you can even convert such
a tree right now with [$doc normalize -forXPath], which groups all
successive text and CDATA section nodes into one.
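
A rough sketch of what that looks like in script code, assuming tDOM is installed. It builds the "b" element with DOM methods, serializes it, and then applies the normalization mentioned above. The variable names are just for illustration:

```tcl
package require tdom

# Build <b> with a text node, a CDATA section node and another text node.
set doc  [dom createDocument b]
set root [$doc documentElement]

$root appendChild [$doc createTextNode "foo "]
$root appendChild [$doc createCDATASection "test of & <bad> format"]
$root appendChild [$doc createTextNode " bar"]

# List the node types of the children before normalization.
set typesBefore {}
foreach child [$root childNodes] {
    lappend typesBefore [$child nodeType]
}
puts $typesBefore
puts [$doc asXML -indent none]

# Group adjacent text and CDATA section nodes into one text node,
# which gives the shape XPath expects.
$doc normalize -forXPath
set countAfter [llength [$root childNodes]]
puts $countAfter
puts [$doc asXML -indent none]
```

The first asXML call should show the CDATA section written out as such; after the normalization the element should hold a single text node, serialized with entity escapes.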

It's all there and sane and understandable but just not that -strict
knob for your use case. I'm sorry.




Rolf Ade

Nov 2, 2017, 6:40:35 PM

Andreas Leitgeb <a...@logic.at> writes:
> Not an XML expert, either.
> Something *else* still looks strange:
>
> set orig {<a><b>1</b><![CDATA[test of & <bad> format]]><c>12</c></a>}
>
> set d1 [dom parse $orig]
> set d1X [$d1 asXML]
>
> set d2 [dom parse $d1X]
> set d2X [$d2 asXML]
>
> string equal $d1X $d2X ;# --> 0
>
> So, they're apparently not "equivalent". [...]
>
> [...] The parse & asXML transformation obviously isn't "idempotent"

Please try:

set orig {<a><b>1</b><![CDATA[test of & <bad> format]]><c>12</c></a>}

set d1 [dom parse -keepEmpties $orig]
set d1X [$d1 asXML -indent none]

set d2 [dom parse -keepEmpties $d1X]
set d2X [$d2 asXML -indent none]

string equal $d1X $d2X ;# --> 1

So, if you want "idempotency", just ask for it. (The -keepEmpties flag
isn't needed in this example, but in general it is.)

The default for [dom parse ...] is to throw away any white-space-only
text node. With -keepEmpties this white space is kept as a TEXT_NODE.

The default for [$doc asXML] is to generate some "pretty printed"
serialization. With -indent none (or -indent no, same result) no white
space whatsoever will be added in between the string representations of
the nodes of the DOM tree.
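
To make the first default visible, here is a small sketch, assuming tDOM is installed and using a made-up, indented input. It counts the child nodes of the root with and without -keepEmpties:

```tcl
package require tdom

# Input with white-space-only text between the elements.
set xml "<a>\n  <b>1</b>\n  <c>12</c>\n</a>"

set plain [dom parse $xml]
set kept  [dom parse -keepEmpties $xml]

set plainCount [llength [[$plain documentElement] childNodes]]
set keptCount  [llength [[$kept documentElement] childNodes]]

puts $plainCount   ;# 2: only <b> and <c>
puts $keptCount    ;# 5: white space, <b>, white space, <c>, white space

# With -indent none the serializer adds no white space of its own.
puts [$kept asXML -indent none]
```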

Both defaults are of course not the right thing if you look at them from
an XML zealot viewpoint. If you need full strictness in this detail
you have the options.

(Well, if you do XSLT transformations, it is recommended to parse both
source and stylesheet with -keepEmpties unless you know for sure that
you don't need it.)

These defaults show a bias for "XML as a data format" (versus "XML as a document
format"), and that is where tDOM came from.

> PS:
>>> % package require tdom
>>> 0.8.3
> ditto here.

This doesn't matter. It has been this way for more than 15 years.

The zealots have an argument for claiming that the defaults are the wrong
way around. (What is this argument? "The principle of least surprise". If
you know your recommendations, then you expect to get even the white
space in your XML data as text nodes. And if you do XSLT, in any case.)

Turning these defaults around isn't an option (it would introduce
data-driven bugs in a lot of code). I could add a global flag to switch the
defaults for those who prefer that. But isn't that mostly
bikeshedding?


yaha...@gmail.com

Nov 3, 2017, 12:33:49 AM
Thanks for the detailed response.
tdom has pleased me for many years, so I cannot complain. It is a small issue that I can live with.
Maybe version 0.9 will have the -strict flag so I will wait :-)

Harald Oehlmann

Nov 3, 2017, 5:10:58 AM
Could someone "save" Rolf's explanation in a tutorial place?
- fossil wiki of tdom
- wiki.tcl.tk ?

It just sounds like a great bunch of information.

Thanks,
Harald

Rich

Nov 3, 2017, 12:59:35 PM
wiki.tcl.tk is open for editing by everyone. You can add the
information you want to add yourself, without asking anyone.

Rolf Ade

Nov 3, 2017, 10:06:46 PM
It's too late for 0.9. That was released 2017-08-24. I've started to
implement such a "keep CDATA sections" switch. Expect it to be in the
next release. Much earlier on trunk.

yaha...@gmail.com

Nov 9, 2017, 4:51:24 AM

> It's too late for 0.9. That was released 2017-08-24. I've started to
> implement such a "keep CDATA sections" switch. Expect it to be in the
> next release. Much earlier on trunk.

This is great, thanks so much.
For now I had to make an ugly bypass so everybody would be happy. Will keep my eyes open on the tdom trunk.

Rolf Ade

Nov 9, 2017, 4:19:21 PM
It's already on trunk. Lemme know about any problems you may have with
it.
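
A hedged sketch of trying the switch. The thread does not name the option; the spelling -keepCDATA below is an assumption based on later tDOM releases (0.9.1 and up) and may not match the trunk check-in discussed here:

```tcl
package require tdom

set orig {<a><b>1</b><![CDATA[test of & <bad> format]]><c>12</c></a>}

# Assumption: the "keep CDATA sections" switch is spelled -keepCDATA.
set doc [dom parse -keepCDATA $orig]

# Collect the node types of the children of <a>.
set types {}
foreach child [[$doc documentElement] childNodes] {
    lappend types [$child nodeType]
}
puts $types

# The CDATA section should now survive the round trip through asXML.
puts [$doc asXML -indent none]
```

Note that a tree parsed this way is not in the XPath-ready form described earlier in the thread, so XPath queries over it may need a [$doc normalize -forXPath] first.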

yaha...@gmail.com

Nov 9, 2017, 11:17:53 PM
Apologies for my ignorance but I am not clear how to download the trunk from http://core.tcl.tk/tdom/dir?ci=tip

The github https://github.com/tDOM/tdom is not updated so obviously it is not the latest trunk.

Gerald Lester

Nov 9, 2017, 11:25:16 PM
On 11/09/2017 10:17 PM, yaha...@gmail.com wrote:
> Apologies for my ignorance but I am not clear how to download the trunk from http://core.tcl.tk/tdom/dir?ci=tip

Use fossil, as in:

mkdir tdom
fossil clone http://core.tcl.tk/tdom tdom.fossil
cd tdom
fossil open ../tdom.fossil

>
> The github https://github.com/tDOM/tdom is not updated so obviously it is not the latest trunk.
>


--
+----------------------------------------------------------------------+
| Gerald W. Lester, President, KNG Consulting LLC |
| Email: Gerald...@kng-consulting.net |
+----------------------------------------------------------------------+

Christian Gollwitzer

Nov 10, 2017, 2:14:26 AM
Am 10.11.17 um 05:25 schrieb Gerald Lester:
> On 11/09/2017 10:17 PM, yaha...@gmail.com wrote:
>> Apologies for my ignorance but I am not clear how to download the
>> trunk from http://core.tcl.tk/tdom/dir?ci=tip
>
> Use fossil, as in:
>
> mkdir tdom
> fossil clone http://core.tcl.tk/tdom tdom.fossil
> cd tdom
> fossil open ../tdom.fossil

Alternatively, if you just want the trunk sources as an archive: on the
site http://core.tcl.tk/tdom/dir?ci=tip click on the check-in number
8cd193981c. Then you'll see Downloads ZIP / Tarball.

Christian

Harald Oehlmann

Nov 10, 2017, 3:29:58 AM
For this feature, you probably need to log in first (anonymous is OK; see
the login page).