How do I 'lift and shift' whole sections?

Wayne Brissette

Jun 23, 2020, 11:17:13 AM
to nokogiri-talk
I have the following structure in some XML documents:

<chapter><title>Chapter 1</title>
  <sect1><title>1.1</title></sect1>
  <sect1><title>1.2</title>
    <sect2><title>1.2.1</title></sect2>
    <sect2><title>1.2.2</title><body>body1.2.2</body>
      <sect3><title>1.2.2.1</title></sect3>
      <sect3><title>1.2.2.2</title>
        <sect4><title>1.2.2.2.1</title></sect4>
      </sect3>
      <sect3><title>1.2.3</title>
        <sect4><title>1.2.3.1</title></sect4>
        <sect4><title>1.2.3.2</title></sect4>
      </sect3>
     </sect2>
</sect1>
  <sect1>1.2.4</sect1>
</chapter>

The problem is that I'm transforming these documents into another XML flavor that doesn't allow <section> elements to contain other <section> elements.

Chapters also need to be converted to a section element, which leaves the result without a single root element, so I'm parsing the document as a fragment.

In the end, the above snippet should look like this: 

<section><title>Chapter 1</title></section>
<section><title>1.1</title></section>
<section><title>1.2</title></section>
<section><title>1.2.1</title></section>
<section><title>1.2.2</title><body>body1.2.2</body></section>
<section><title>1.2.2.1</title></section>
<section><title>1.2.2.2</title></section>
<section><title>1.2.2.2.1</title></section>
<section><title>1.2.3</title></section>
<section><title>1.2.3.1</title></section>
<section><title>1.2.3.2</title></section>
<section>1.2.4</section>

I thought I had the right answer: first rename the sect1 through sect4 elements to section, then run a loop like this:

content.css('section > section > section').each do |node|
  children = node.element_children
  if !children.nil?
    node.add_next_sibling(children)
  end
end

That worked fine until I realized that it didn't capture all of my other child elements. I then tried storing each section and writing the sections out as siblings of their parent, but one of two things happened: either they came out in reverse order, or they ended up out of order in general because of the processing loop.


I'm really stuck on this one. I know there must be an easy way to do this that I'm just not thinking of.

 -Wayne

wbrisett

Jun 23, 2020, 12:07:57 PM
to nokogiri-talk
Got it! Why is it that whenever I post something after struggling for hours, I finally 'get it'?

For the record, or in case anybody else ever needs to do this, here is how I got it to work.


require 'nokogiri'

xml = <<-EOXML
<chapter><title>Chapter 1</title>
  <sect1><title>1.1</title></sect1>
  <sect1><title>1.2</title>
    <sect2><title>1.2.1</title></sect2>
    <sect2><title>1.2.2</title><body>body1.2.2</body>
      <sect3><title>1.2.2.1</title></sect3>
      <sect3><title>1.2.2.2</title>
        <sect4><title>1.2.2.2.1</title></sect4>
      </sect3>
      <sect3><title>1.2.2.3</title>
        <sect4><title>1.2.2.3.1</title></sect4>
        <sect4><title>1.2.2.3.2</title></sect4>
      </sect3>
     </sect2>
</sect1>
  <sect1><title>1.3</title></sect1>
</chapter>
EOXML

content = Nokogiri::XML.fragment(xml)


# Rename sect1 through sect4 so a single selector matches every level.
content.css('sect1, sect2, sect3, sect4').each do |node|
  node.name = "section"
end

# Lift and shift: starting with sections nested at least three levels deep,
# move every section found inside a node out so it follows that node.
content.css('section > section > section').each do |node|
  children = node.css('section')
  node.add_next_sibling(children)
end

content.css('section > section').each do |node|
  children = node.css('section')
  node.add_next_sibling(children)
end

content.css('section').each do |node|
  children = node.css('section')
  node.add_next_sibling(children)
end

# Finally, lift the sections out of the chapter and rename the chapter itself.
content.css('chapter').each do |node|
  children = node.css('section')
  node.add_next_sibling(children)
  node.name = "section"
end

# Print the flattened result.
puts content.to_xml