Help Adding Element Declarations to an Internal Subset


Justin Angel

Mar 21, 2017, 2:54:12 PM
to nokogiri-talk
Hello all,

I'm trying, and failing, to add an element definition to the internal subset of a DTD. Entities are added just fine, but I can't seem to add element definitions. I've tried using Nokogiri::XML::ElementDecl, but that class doesn't appear to work the same way as EntityDecl, which adds the entity definition to the internal subset. Any chance someone would be kind enough to help out a fellow Rubyist?

For instance:

require 'nokogiri'

# Create the document
doc = Nokogiri::XML::Document.new()

# SYSTEM External ID
doc.create_internal_subset("arbitrary", nil, "https://127.0.0.1/some.dtd")

# Add an entity to an internal subset
Nokogiri::XML::EntityDecl.new('test', doc, nil, nil, nil, "Dereferenced!")

# Create the root element
doc.root = Nokogiri::XML::Element.new("arbitrary", doc)

# Create a child element
child = Nokogiri::XML::Element.new("child", doc)

# Add the entity reference to the child
child.content = Nokogiri::XML::EntityReference.new(doc, "test")

# Add the child to the root element
doc.root.add_child(child)

puts doc

returns:

<?xml version="1.0"?>
<!DOCTYPE arbitrary SYSTEM "https://127.0.0.1/some.dtd" [
<!ENTITY test "Dereferenced!">
]>
<arbitrary>
  <child>&amp;test;</child>
</arbitrary>

Thanks in advance!

Mike Dalessio

Mar 21, 2017, 4:55:12 PM
to nokogiri-talk
Heya,

Thanks for asking this question! There are two things going on here. First is the fact that `#content=` will escape whatever you give it. Try changing


child.content = Nokogiri::XML::EntityReference.new(doc, "test")

to

child.add_child Nokogiri::XML::EntityReference.new(doc, "test")

and you'll see that the output

<child>&amp;test;</child>

changes to 

<child>&test;</child>


OK, halfway there. Now the next problem is that entity references are dereferenced by libxml2 only at parse time. Since you added an EntityReference directly to the DOM, an EntityReference it shall sadly remain.

You could try to work around this by re-parsing the document:

doc2 = Nokogiri::XML(doc.to_xml)
puts doc2.to_xml

but you'll see that the reference is still only a reference:

<child>&test;</child>

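And so the last secret is revealed: make sure to tell Nokogiri at parse time that you don't want entity references left in your document. That is what the `noent` option does (it substitutes each reference with its replacement text):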

doc2 = Nokogiri::XML(doc.to_xml) {|config| config.noent}
puts doc2.to_xml

outputs

<?xml version="1.0"?>
<!DOCTYPE arbitrary SYSTEM "https://127.0.0.1/some.dtd" [
<!ENTITY test "Dereferenced!">
]>
<arbitrary>
  <child>Dereferenced!</child>
</arbitrary>


Hope this has been enlightening!

-m




--
You received this message because you are subscribed to the Google Groups "nokogiri-talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nokogiri-talk+unsubscribe@googlegroups.com.
To post to this group, send email to nokogi...@googlegroups.com.
Visit this group at https://groups.google.com/group/nokogiri-talk.
For more options, visit https://groups.google.com/d/optout.

Wayne Brissette

Mar 21, 2017, 4:58:26 PM
to nokogiri-talk

On Mar 21, 2017, at 3:54 PM, Mike Dalessio <mike.d...@gmail.com> wrote:

> Hope this has been enlightening!


It has been for me… I’m always discovering things on this list! Thanks Mike. 

-Wayne

Justin Angel

Mar 22, 2017, 7:09:00 AM
to nokogiri-talk
Hi again Mike,

Thanks for always helping out! However, I think my original question was poorly phrased. I still struggle with the XML terminology so my thoughts aren't always very clear. I'll let XML do the talking for me :)

Here is the current XML output (also, thanks for catching that error with the reference being escaped...that probably saved me a few hours):

<?xml version="1.0"?>
<!DOCTYPE arbitrary SYSTEM "https://127.0.0.1/some.dtd" [

    <!ENTITY test "Dereferenced!">

]>
<arbitrary>
  <child>&test;</child>
</arbitrary>

Here is the desired XML output (note the additional element definition within the DTD):

<?xml version="1.0"?>
<!DOCTYPE arbitrary SYSTEM "https://127.0.0.1/some.dtd" [

    <!ENTITY test "Dereferenced!">
    <!ELEMENT new_element ANY>

]>
<arbitrary>
  <child>&test;</child>
</arbitrary>

Mike Dalessio

Mar 24, 2017, 8:46:08 AM
to nokogiri-talk
Ah, I see what you're asking. Sorry for the tangent about entities.

Right, so, as it turns out you're probably the first person in the history of Ruby to ever want to do this! Nokogiri doesn't have any functionality to create or manipulate element declarations. The ElementDecl class is pretty much read-only, meaning that if your XML has one, you can parse and inspect it.

I tried and failed to find a workaround for you to do this with the current version of Nokogiri.
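
Outside of Nokogiri's API, one blunt stopgap is to edit the serialized text rather than the DOM. The sketch below is not a Nokogiri feature: `inject_internal_decl` is a hypothetical helper that inserts a declaration just before the `]>` closing the internal subset. It assumes the DOCTYPE already has an internal subset (as it does once an entity has been added) and that no `]>` sequence appears earlier inside the subset itself, for example within a quoted entity value.

```ruby
# Hypothetical helper (not part of Nokogiri): insert a declaration just
# before the "]>" that closes the internal subset of serialized XML.
def inject_internal_decl(xml_text, declaration)
  doctype_start = xml_text.index('<!DOCTYPE')
  raise ArgumentError, 'no DOCTYPE found' unless doctype_start

  # Assumes the first "]>" after the DOCTYPE keyword ends the internal subset.
  subset_end = xml_text.index(']>', doctype_start)
  raise ArgumentError, 'no internal subset found' unless subset_end

  # Work on a copy so the caller's string is left untouched.
  xml_text.dup.insert(subset_end, "#{declaration}\n")
end

# Sample serialized document, copied from the output earlier in the thread.
serialized = <<~XML
  <?xml version="1.0"?>
  <!DOCTYPE arbitrary SYSTEM "https://127.0.0.1/some.dtd" [
  <!ENTITY test "Dereferenced!">
  ]>
  <arbitrary>
    <child>&test;</child>
  </arbitrary>
XML

patched = inject_internal_decl(serialized, '<!ELEMENT new_element ANY>')
puts patched
```

In practice the input would come from `doc.to_xml`, and the patched text could be handed back to `Nokogiri::XML` if a DOM is needed again. This is string surgery, so it is only as reliable as the assumptions stated above.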

Let me noodle on it.



Justin Angel

Mar 24, 2017, 9:23:12 AM
to nokogiri-talk
Hi Mike,

As strange as it sounds, this excites me! If required, is there any way that I can assist with the development process? I have a fairly strong understanding of Ruby and a beginner-level understanding of C (geared toward vulnerability research)... just let me know!

Mike Dalessio

May 11, 2017, 5:43:35 PM
to nokogiri-talk
Justin,

I've created this GitHub issue to track this feature request. Let's continue the conversation there!


-m

