
tDOM: <ELEM/> vs. <ELEM></ELEM>


Mikhail Teterin

May 8, 2008, 2:22:15 PM
Hello!

When an element has no children, tDOM's string representation of it (as
obtained via the asXML-method) will produce

<ELEM/>

I agree, that this is more concise and should be the default. Unfortunately,
I'm facing a home-grown upstream XML-parser, which breaks on the above
"short-cut"... The XML, that my program generates, must be parsable by that
piece...

Can the asXML-method, please, grow an option (i.e., -fullempties), which
would cause it to produce:

<ELEM></ELEM>

instead?

Thanks!

-mi

Darren New

May 8, 2008, 2:33:26 PM
Mikhail Teterin wrote:
> Can the asXML-method, please, grow an option (i.e., -fullempties), which
> would cause it to produce:
>
> <ELEM></ELEM>
>
> instead?

That looks like something you could, after-the-fact, fix with a regexp,
especially if you don't have any attributes on the tag. Just as a
suggestion?
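
A minimal sketch of such a post-processing step in Tcl, for illustration only. The proc name expandEmptyTags is made up here and is not part of tDOM. It assumes attribute values are properly quoted, and it would also rewrite anything that merely looks like an empty-element tag inside comments or CDATA sections:

```tcl
# Hypothetical helper (not part of tDOM): rewrite <ELEM .../> as <ELEM ...></ELEM>.
proc expandEmptyTags {xml} {
    # \1 = element name, \2 = attribute list (possibly empty).
    # Quoted attribute values are matched as a whole, so a "/>" inside
    # a value does not end the tag early.
    set re {<([^\s<>/!?]+)((?:\s+[^\s=<>/]+\s*=\s*(?:"[^"]*"|'[^']*'))*)\s*/>}
    regsub -all -- $re $xml {<\1\2></\1>} result
    return $result
}

puts [expandEmptyTags {<a><x/><y id="1" /></a>}]
# prints: <a><x></x><y id="1"></y></a>
```

Applied to the string returned by asXML, this leaves elements that already have an explicit end tag untouched.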

--
Darren New / San Diego, CA, USA (PST)
"That's pretty. Where's that?"
"It's the Age of Channelwood."
"We should go there on vacation some time."

Gerald W. Lester

May 8, 2008, 2:54:53 PM

The two are not the same. <ELEM></ELEM> has an empty text node as its child
whereas <ELEM/> has no children, period.


--
+--------------------------------+---------------------------------------+
| Gerald W. Lester |
|"The man who fights for his ideals is the man who is alive." - Cervantes|
+------------------------------------------------------------------------+

Christian Nassau

May 8, 2008, 3:06:44 PM
Mikhail Teterin wrote:
> Can the asXML-method, please, grow an option (i.e., -fullempties), which
> would cause it to produce:
>
> <ELEM></ELEM>
>
> instead?

You could try "asHTML" instead of "asXML": for example the script

> package require tdom
> dom parse {
> <a>
> <b>one<c>two</c></b>
> <x/>
> </a>
> } doc
> puts [$doc asHTML]

gives

> <a>
> <b>one<c>two</c>
> </b><x></x>
> </a>

HTH,

--
=> Christian Nassau, http://www.nullhomotopie.de


Mikhail Teterin

May 8, 2008, 3:41:40 PM
Darren New wrote:
> That looks like something you could, after-the-fact, fix with a regexp,
> especially if you don't have any attributes on the tag. Just as a
> suggestion?

I do, actually, have attributes, but that's not a show-stopper... I cooked
up the patch for tDOM already and sent it to Rolf...

With any luck the next version of tDOM will have that as an option.

-mi

Mikhail Teterin

May 8, 2008, 3:46:42 PM
Gerald W. Lester wrote:
> The two are not the same.  <ELEM></ELEM> has an empty text node as its
> child whereas <ELEM/> has no children, period.

What's next? Are you going to insist, using a 0 instead of NaN is a bad
idea? ;-)

Most (all?) XML-tools will happily "translate" <ELEM></ELEM> into <ELEM/>
and back. tDOM included...

I don't have enough /theoretical/ knowledge of XML to argue, but
enough /practical/ experience to want it as an easy option...

Yours,

-mi

Gerald W. Lester

May 8, 2008, 4:39:39 PM

I think you misunderstood what I was attempting to say.

I was attempting to point out that the "home-grown upstream XML-parser"
might be actually *looking* for the missing child node (like it is required
by the DTD) and is complaining because a required node is not there.

Also, as you point out, tDOM's parser (particularly if -keepEmpties is not
supplied, which is the normal mode) creates the same tree for both forms.

Mikhail Teterin

May 8, 2008, 7:32:04 PM
Gerald W. Lester wrote:
> I was attempting to point out that the "home-grown upstream XML-parser"
> might be actually looking for the missing child node (like it is required
> by the DTD) and is complaining because a required node is not there.

No, it most certainly does not. And there are no DTDs :0 It just claims, the
file is "not valid XML".

-mi

Rolf Ade

May 9, 2008, 9:16:45 AM
Mikhail Teterin wrote:
>When an element has no children, tDOM's string representation of it (as
>obtained via the asXML-method) will produce
>
> <ELEM/>
>
>I agree, that this is more concise and should be the default. Unfortunately,
>I'm facing a home-grown upstream XML-parser, which breaks on the above
>"short-cut"... The XML, that my program generates, must be parsable by that
>piece...

(Mikhail has also mailed me privately (which is OK), pointing to
this clt posting. But in general it is better to post such things (at
least also) on the tdom mailing list. I don't follow clt closely
enough that you could be sure I noticed a tdom-related posting
here.)

That XML piece above is absolutely well-formed; any compliant parser
must understand that. You are right; what your tool writes is OK.

So they have a crappy XML parser. They are not able or willing to fix
their XML parser, but say to you: Deliver us some special XML
serialisation. We don't want this other perfectly legal XML
serialisation.

That happens, in fact.

To make that really clear: The others are to blame. They don't support
XML (although I'm pretty sure they claim that in their marketing
papers); they may support a subset of the XML recommendation,
leaving out essential parts ('must' parts, not something optional).

(That's probably the moment to admit, that the somewhat strange
-escapeAllQuot asXML option has its reason in exactly such a story.)

>Can the asXML-method, please, grow an option (i.e., -fullempties), which
>would cause it to produce:
>
> <ELEM></ELEM>
>
>instead?

In general, that's unnecessary feature cruft. (Or in the light of that
written above: In theory, it's unnecessary feature cruft.) And it's a
step more down the road to a kitchen sink asXML method. The 'right
thing' to do would be, that they fix their borken parser.

On the other hand ... To insist on the 'right thing' may be a noble
habit, but often has ramifications. And since I've committed more or
less the same sin against the pure doctrine with -escapeAllQuot, I have
some sympathy. But an ever-growing parameter list for
tcldom_treeAsXML() doesn't look right (although I did the same
with -escapeAllQuot, I confess, because it was the simplest thing).

I'll do something.

rolf

Rolf Ade

May 9, 2008, 9:39:35 AM
Gerald W. Lester wrote:

>Mikhail Teterin wrote:
>> When an element has no children, tDOM's string representation of it (as
>> obtained via the asXML-method) will produce
>>
>> <ELEM/>
>>
>> I agree, that this is more concise and should be the default. Unfortunately,
>> I'm facing a home-grown upstream XML-parser, which breaks on the above
>> "short-cut"... The XML, that my program generates, must be parsable by that
>> piece...
>>
>> Can the asXML-method, please, grow an option (i.e., -fullempties), which
>> would cause it to produce:
>>
>> <ELEM></ELEM>
>>
>> instead?
>
>The two are not the same. <ELEM></ELEM> has an empty text node as its child
>whereas <ELEM/> has no children, period.

The two pieces of serialized XML, "<ELEM/>" and "<ELEM></ELEM>", are
exactly the same. In any serialized XML you could replace one way of
writing with the other without any change to the information in that
XML. If you parse that back into a DOM tree, then even
<ELEM></ELEM> won't create an empty text node as a child of ELEM. See
this example.

package require tdom

dom parse {<ELEM></ELEM>} doc
$doc documentElement root
puts [llength [$root childNodes]]

==> 0

On the other hand, there is such a thing as an empty text node. If you
start from a DOM tree, then that's the way to force the output Mikhail
wants (and that may be what Gerald tried to say):

package require tdom

set doc [dom createDocument ELEM]
puts [$doc asXML]
# Add a text node with empty text, and serialize again
$doc documentElement root
$root appendChild [$doc createTextNode ""]
puts [$doc asXML]

That returns:

<ELEM/>

<ELEM></ELEM>

With this, one is able to force the requested way of serialization, with
some hackery.

rolf

Gerald W. Lester

May 9, 2008, 10:43:07 AM
Rolf Ade wrote:
> ...

> On the other side, there is such a thing as empty text nodes. If you
> come from a DOM tree, then that's the way to force the output, Mikhail
> wants (and that may be, what Gerald tried to say):
>
> package require tdom
>
> set doc [dom createDocument ELEM]
> puts [$doc asXML]
> # Add a text node with empty text, and serialize again
> $doc documentElement root
> $root appendChild [$doc createTextNode ""]
> puts [$doc asXML]
>
> That returns:
>
> <ELEM/>
>
> <ELEM></ELEM>
>
> With this, one is able to force the requested way of serialization, with
> some hackery.

Rolf, that was indeed what I was attempting to say, along with the guess
(proved incorrect) that there may have been a DTD requiring a (possibly
empty) text node there.

From the later post, it appears to be just a broken XML parser.

Mikhail Teterin

May 9, 2008, 10:47:37 AM
Rolf Ade wrote:
> And it's a step more down the road to a kitchen sink asXML method.

Well, what's wrong with a "kitchen sink" asXML method, really? You know a
lot more about XML than most people using your package, naturally. It would
thus be better, if you implemented the options for various "flavors" (i.e.
work-arounds for buggy parsers) of output, instead of forcing users to walk
the XML-trees themselves, when they need output to be in a certain shape...

I perfectly understand the reluctance to even lift a finger because of
somebody else's incompetence, but hope, you'll overcome it -- for the sake
of those, who are, like myself, in between...

Thanks. Yours,

-mi

Gerald W. Lester

May 9, 2008, 10:48:24 AM
Rolf Ade wrote:
>...

>
> On the other hand ... To insist on the 'right thing' may be a noble
> habit, but often has ramifications. And since I've done more or less
> the same sin against the pure doctrine with -escapeAllQuot I've some
> sympathy. But a more and more growing parameter list to
> tcldom_treeAsXML() doesn't look right (although I've done the same
> with -escapeAllQuot, confessed, because it's the simplest thing.)
>
> I'll do something.

A stray idea, how about a callback option that is handed a tag, value,
closetag and returns the same. Then in cases like this the closetag and
value would be the empty string (i.e. the input would be the list <elem/> {}
{}) and the call back would return a list of:
<elem> {} </elem>.

It would be expensive when it has to be used, but very powerful.
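
To make the idea concrete, here is a rough sketch of what such a callback could look like on the Tcl side. tDOM has no such hook; the proc name expandEmptyElement and the three-argument contract (tag, value, closetag in, the same three out) are assumptions taken only from the description above:

```tcl
# Hypothetical callback: tDOM offers no such option. This only illustrates
# the proposed contract of getting {tag value closetag} and returning the same.
proc expandEmptyElement {tag value closetag} {
    # Only rewrite self-closing tags; pass everything else through unchanged.
    if {$closetag eq "" && [string match {<*/>} $tag]} {
        # Strip the leading "<" and the trailing "/>" to get name plus attributes.
        set inner [string trimright [string range $tag 1 end-2]]
        # The element name is everything up to the first whitespace.
        regexp {^\S+} $inner name
        return [list "<$inner>" $value "</$name>"]
    }
    return [list $tag $value $closetag]
}

puts [expandEmptyElement <elem/> {} {}]
# prints: <elem> {} </elem>
```

A serializer supporting this would have to call the proc once per element, which is where the cost comes from.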

Again, just a stray idea.

Rolf Ade

May 11, 2008, 7:13:32 AM
Mikhail Teterin wrote:
>Rolf Ade wrote:
>> And it's a step more down the road to a kitchen sink asXML method.
>
>Well, what's wrong with a "kitchen sink" asXML method, really?

There are not that many overwhelming reasons to choose XML as a
data format. Among the few is that you don't have much discussion at
the syntax level. Your XML is either well-formed, or it is
not. Your XML is well-formed. It's their bug.

Add to that, XML has enough momentum that various proven, compliant
parsers are readily available, with bindings for all kinds of
programming languages.

The only purpose of the proposed output option is to support a certain
broken parser (without, for sure, any notable 'market share'). Trying
to support every possible broken parser is the road to the kitchen sink.

The only thing that keeps me from a clear no is my own 'fall of man'
with -escapeAllQuot. But that case was different in important
details. In particular, there was no simple workaround, but in your
case there is.

Something like the following will do it (you may add bells and
whistles as needed); there's no need to patch the tdom sources:

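proc serializeForThisDamnBrokenParser {doc} {
    # Select every element that has no child nodes at all.
    foreach node [$doc selectNodes {//*[not(count(node()))]}] {
        # An empty text node makes asXML emit <ELEM></ELEM> instead of <ELEM/>.
        $node appendChild [$doc createTextNode ""]
    }
    return [$doc asXML]
}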

rolf
