How to insert EntityReferences into attributes?


Phil Spiby

Feb 21, 2018, 4:40:38 PM
to nokogiri-talk
I am currently using Nokogiri 1.6.2.1 and finding it impossible to insert entity references into attributes in XML mode.

To illustrate the problem, here is a minimal script:

```
require 'nokogiri'
xml = Nokogiri::XML::Document.new
elem = Nokogiri::XML::Node.new('elem', xml)
er = Nokogiri::XML::EntityReference.new( xml, 'reg' )
xml.root = elem
xml.root << er
elem['attr'] = er
puts xml
```

I was expecting:
```
<?xml version="1.0"?>
<elem attr="&reg;">&reg;</elem>
```

but got:
```
<?xml version="1.0"?>
<elem attr="&amp;reg;">&reg;</elem>
```

Any suggestions?

Mike Dalessio

Feb 25, 2018, 4:35:18 PM
to nokogiri-talk
Hi Phil,

Thanks for asking this question. It definitely pokes at the edges of libxml2's behavior. It's worth noting, though, that Xerces (which backs the JRuby implementation) behaves identically, so this behavior derives from similar interpretations of the XML 1.0 spec.

The key bit, I think, is that "&reg;" is an unknown entity to libxml2's XML parser (though it's known to the HTML parser). The list of known entities in XML is defined here:


but the TL;DR is it's quot, amp, apos, lt, and gt.
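For reference, the predefined set is small enough to write out. The sketch below is plain Ruby written for illustration (`PREDEFINED_XML_ENTITIES` and `predefined_entity?` are hypothetical names, not part of Nokogiri); it just encodes the five names listed in the XML 1.0 spec and shows that `reg` is not one of them.

```ruby
# The five entities an XML 1.0 processor recognises without any DTD
# (XML 1.0, section 4.6 "Predefined Entities"). "reg" is not among them.
PREDEFINED_XML_ENTITIES = {
  "lt"   => "<",
  "gt"   => ">",
  "amp"  => "&",
  "apos" => "'",
  "quot" => "\""
}.freeze

# Hypothetical helper: true when &name; is usable in a document with no DTD.
def predefined_entity?(name)
  PREDEFINED_XML_ENTITIES.key?(name)
end

p predefined_entity?("amp") # prints true
p predefined_entity?("reg") # prints false
```

Anything outside this table, including `&reg;`, has to be declared in a DTD before an XML parser will treat it as a known entity.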

It's useful to note that the structure of the attribute is what we'd expect:

```
require 'nokogiri'
xml = Nokogiri::XML::Document.new
elem = Nokogiri::XML::Node.new('elem', xml)
xml.root = elem
elem.set_attribute("attr1", Nokogiri::XML::EntityReference.new( xml, 'reg' ))
elem.set_attribute("attr2", Nokogiri::XML::EntityReference.new( xml, 'amp' ))
puts xml.inspect
# => #<Nokogiri::XML::Document:0xf278f8 name="document" children=[#<Nokogiri::XML::Element:0xf277cc name="elem" attributes=[#<Nokogiri::XML::Attr:0xf26df4 name="attr1" value="&reg;">, #<Nokogiri::XML::Attr:0xf26de0 name="attr2" value="&amp;">]>]>
```

The key point is that the known and unknown entity references are stored the same way, as `value="&reg;"` and `value="&amp;"`. This tells us the issue arises at serialization time: one is escaped because it's an unknown entity reference, and the other is left alone.
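The `&amp;reg;` in Phil's output is consistent with the attribute ending up as the five literal characters `&reg;` stored as text, which a serializer then escapes like any other attribute text. The sketch below is a simplified attribute escaper written only to illustrate that effect; `escape_attr` and `ATTR_ESCAPES` are hypothetical names, and this is not libxml2's or Nokogiri's actual serialization code.

```ruby
# Simplified attribute-value escaping (illustration only, not libxml2):
# '&', '<' and '"' cannot appear literally inside a double-quoted
# attribute value, so each is replaced by a predefined entity reference.
ATTR_ESCAPES = { "&" => "&amp;", "<" => "&lt;", "\"" => "&quot;" }.freeze

def escape_attr(text)
  text.gsub(/[&<"]/, ATTR_ESCAPES)
end

# Literal text "&reg;" comes out exactly as in the observed output.
puts escape_attr("&reg;") # prints &amp;reg;
```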

My recommendation would be to declare the entity in a DTD. However, when I created an internal subset (DTD) for the document and declared the entity there, the output didn't change:

```
require 'nokogiri'
xml = Nokogiri::XML::Document.new
xml.create_internal_subset("author", nil, "author.dtd")
xml.create_entity("reg", Nokogiri::XML::EntityDecl::INTERNAL_GENERAL, nil, nil, "Ⓡ")
elem = Nokogiri::XML::Node.new('elem', xml)
er = Nokogiri::XML::EntityReference.new( xml, 'reg' )
xml.root = elem
elem.set_attribute("attr", er)
puts xml
# => <?xml version="1.0"?>
#    <!DOCTYPE author SYSTEM "author.dtd" [
#    <!ENTITY reg "Ⓡ">
#    ]>
#    <elem attr="&amp;reg;"/>
puts xml.inspect
# => #<Nokogiri::XML::Document:0x12f7488 name="document" children=[#<Nokogiri::XML::DTD:0x12f7384 name="author" children=[#<Nokogiri::XML::EntityDecl:0x12f7348 "<!ENTITY reg \"Ⓡ\">\n">]>, #<Nokogiri::XML::Element:0x12f72f8 name="elem" attributes=[#<Nokogiri::XML::Attr:0x12e9b58 name="attr" value="&reg;">]>]>
puts xml.errors.inspect
# => []
```
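What the DTD declaration is supposed to buy can be modeled in a few lines. The toy expander below is an assumption for illustration, not libxml2's algorithm (`expand_entities` is a hypothetical helper): it resolves `&name;` against the predefined entities plus any declared ones and rejects everything else. Under this model, declaring `reg` makes `&reg;` resolvable, which is why the unchanged serialized output above is surprising.

```ruby
# Toy model of entity resolution (illustration only, not libxml2):
# &name; resolves against the predefined entities plus entities declared
# with <!ENTITY name "...">; any other name is treated as an error.
PREDEFINED = { "lt" => "<", "gt" => ">", "amp" => "&",
               "apos" => "'", "quot" => "\"" }.freeze

def expand_entities(text, declared = {})
  table = PREDEFINED.merge(declared)
  text.gsub(/&([A-Za-z_][\w.-]*);/) do
    name = Regexp.last_match(1)
    table.fetch(name) { raise ArgumentError, "undeclared entity: #{name}" }
  end
end

# U+00AE is the registered sign.
puts expand_entities("&reg;", "reg" => "\u00AE")
begin
  expand_entities("&reg;")
rescue ArgumentError => e
  puts e.message # prints undeclared entity: reg
end
```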

I'm a bit lost. Does anybody on the list know what's going on?



Phil Spiby

Feb 26, 2018, 4:51:29 AM
to nokogiri-talk
Hi Mike,

Thanks for looking into this for me. I had also tried declaring an internal subset and the entity, in case the problem was an unrecognised entity declaration.

If anyone else can identify the source of the problem, I'd appreciate it. Is this a candidate for a bug report? If so, where and how should I file it?