
lxml empty versus self closed tag


Robin Becker

Mar 2, 2022, 10:33:43 AM
I'm using lxml.etree.XMLParser and would like to distinguish

<tag></tag>

from

<tag/>

I seem to have e.getchildren()==[] and e.text==None for both cases. Is there a way to get the first to have e.text==''?
--
Robin Becker

Dieter Maurer

Mar 2, 2022, 2:20:54 PM
I do not think so (at least not without a DTD):
`<tag/>' is just a shorthand notation for '<tag></tag>' and
the difference has no influence on the DOM.

Note that `lxml` is just a Python binding for `libxml2`.
All the parsing is done by this library.
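
A minimal sketch of the behaviour described above, assuming a default `etree.XMLParser()` with no DTD loaded. The variable names (`explicit`, `self_closed`, `same`) are only illustrative. It parses both spellings and compares what the resulting elements expose:

```python
from lxml import etree

# Default parser, no DTD involved (an assumption for this sketch).
parser = etree.XMLParser()

explicit = etree.fromstring("<tag></tag>", parser)
self_closed = etree.fromstring("<tag/>", parser)

# Both elements report no text and no children after parsing.
for label, elem in (("<tag></tag>", explicit), ("<tag/>", self_closed)):
    print(label, "text:", repr(elem.text), "children:", len(elem))

# Serialising both trees again yields identical bytes, so the original
# spelling is not recoverable from the parsed tree.
same = etree.tostring(explicit) == etree.tostring(self_closed)
print("identical after round trip:", same)
```

`len(elem)` is used here instead of `e.getchildren()`, which lxml documents as deprecated.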

Robin Becker

Mar 3, 2022, 4:22:47 AM
On 02/03/2022 18:39, Dieter Maurer wrote:
> Robin Becker wrote at 2022-3-2 15:32 +0000:
>> I'm using lxml.etree.XMLParser and would like to distinguish
>>
>> <tag></tag>
>>
>> from
>>
>> <tag/>
>>
>> I seem to have e.getchildren()==[] and e.text==None for both cases. Is there a way to get the first to have e.text==''
>
> I do not think so (at least not without a DTD):

I have a DTD which has

<!ELEMENT tag (content)*>

so I guess the empty case is allowed as well as the self closed.

I am converting from an older parser which has text=='' for <tag></tag> and text==None for the self closed version. I
don't think I really need to make the distinction. However, I wonder how lxml can present an empty string content
deliberately or if that always has to be a semantic decision.

> `<tag/>' is just a shorthand notation for '<tag></tag>' and
> the difference has no influence on the DOM.
>
> Note that `lxml` is just a Python binding for `libxml2`.
> All the parsing is done by this library.
yes I think I knew that
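
On the question of presenting empty string content deliberately: the parser does not seem to produce it, but an application can assign it after parsing. The following is a sketch only, relying on my understanding that lxml keeps an empty text node when `text` is set to `''`. Which elements should get this treatment remains a semantic decision for the caller.

```python
from lxml import etree

elem = etree.fromstring("<tag/>")
before = etree.tostring(elem)   # serialised form straight after parsing

# Explicitly give the element empty string content.
elem.text = ""
after = etree.tostring(elem)    # serialised form with the empty text set

print(repr(elem.text), before, after)
```

This only affects trees the application modifies itself. It does not recover which spelling the source document used.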

Dieter Maurer

Mar 3, 2022, 4:56:16 AM
Robin Becker wrote at 2022-3-3 09:21 +0000:
>On 02/03/2022 18:39, Dieter Maurer wrote:
>> Robin Becker wrote at 2022-3-2 15:32 +0000:
>>> I'm using lxml.etree.XMLParser and would like to distinguish
>>>
>>> <tag></tag>
>>>
>>> from
>>>
>>> <tag/>
>>>
>>> I seem to have e.getchildren()==[] and e.text==None for both cases. Is there a way to get the first to have e.text==''
>>
>> I do not think so (at least not without a DTD):
>
>I have a DTD which has
>
><!ELEMENT tag (content)*>
>
>so I guess the empty case is allowed as well as the self closed.

Potentially, something changes when `content` contains `PCDATA` (as
one possibility), but I doubt it.