Nokogiri sort recursive sub levels and international characters


Noyan Aydin

Sep 24, 2017, 12:17:52 PM
to nokogiri-talk
Hi,

I'm new to Ruby and fascinated by it, especially by Nokogiri. I work for a translation company, and the documents I work with are mostly multilingual XML files.

I need to sort a number of UTF-8 encoded XML index files (from which HTML files are produced externally). Because the files are translated, the first level has to be recreated: an entry that used to start with "a" might need to move under the letter "b". The sort key also needs to be compared with language-aware ordering rather than plain code point order.

This is a sample. The bold nodes (the "g" letter nodes) should be re-created. The red entries ("Base" and "Alarms" here) should move to other parent nodes. The depth of the nodes is not fixed: I found up to 6 levels, sometimes fewer.

<?xml version = "1.0" encoding="UTF-8"?>
<ix xmlns="urn:Index-Schema">
  <g t="A" o="002">
    <i t="adım">
      <j l="222"/>
    </i>
    <i t="Base">
      <j l="105"/>
    </i>
    <i t="adil">
      <j l="105"/>
    </i>
  </g>
  <g t="B" o="002">
    <i t="Alarms">
      <i t="stop alarms">
        <i t="alarm list">
          <j l="79"/>
        </i>
      </i>
      <i t="add alarms">
        <j l="82"/>
      </i>
      <i t="silent alarms">
        <j l="80"/>
      </i>
    </i>
  </g>
</ix>

The above XML should end up like this:

<?xml version = "1.0" encoding="UTF-8"?>
<ix xmlns="urn:Index-Schema">
  <g t="A" o="002">
    <i t="adım">
      <j l="222"/>
    </i>
    <i t="adil">
      <j l="105"/>
    </i>
    <i t="Alarms">
      <i t="add alarms">
        <j l="82"/>
      </i>
      <i t="silent alarms">
        <j l="80"/>
      </i>
      <i t="stop alarms">
        <i t="alarm list">
          <j l="79"/>
        </i>
      </i>
    </i>
  </g>
  <g t="B" o="002">
    <i t="Base">
      <j l="105"/>
    </i>
  </g>
</ix>

I need a recursive function, but since the first-level "g" nodes are not to be preserved, it gets messy.
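The recursive part on its own can be sketched without any XML library. This is a minimal sketch, not a Nokogiri solution: Entry and sort_entries are hypothetical names invented for the example, Entry stands in for an "i" element (its "t" attribute plus its nested "i" children), and the "j" page references are left out. With Nokogiri the title would presumably be read from each node's "t" attribute instead.

```ruby
# Hypothetical stand-in for an <i> element: its t attribute and nested entries.
Entry = Struct.new(:title, :children)

# Returns a new tree with the entries sorted at every depth.
# The optional block supplies the sort key; the default is a plain downcase,
# which is only adequate for ASCII titles.
def sort_entries(entries, &key)
  key ||= ->(title) { title.downcase }
  entries
    .sort_by { |entry| key.call(entry.title) }
    .map { |entry| Entry.new(entry.title, sort_entries(entry.children, &key)) }
end

alarms = Entry.new("Alarms", [
  Entry.new("stop alarms", [Entry.new("alarm list", [])]),
  Entry.new("add alarms", []),
  Entry.new("silent alarms", [])
])

sorted = sort_entries([alarms])
puts sorted.first.children.map(&:title).inspect
```

Because the function calls itself on each entry's children, the depth of the tree does not matter. A language-aware key can be passed as the block in place of the default.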

What I tried is this: I took the root and removed all of its children, so that I could build a new index after sorting. I kept one letter node "g" as a template. I extracted all the sub "i" nodes with XPath, added them to a temporary node, and sorted that temporary node's children. Then, at each change of first letter in the sorted list, I added a new "g" node to the root and put the sorted sub nodes under that new letter.
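The last step, starting a new letter group at each change of first letter, can be sketched on plain strings. This is a minimal sketch under assumptions: group_letter and regroup are hypothetical helper names, the titles are assumed to be already in their final sorted order, and the :turkic option assumes a Turkish target file (it maps "i" to "İ"). With Nokogiri, each resulting pair would become one new "g" node holding the matching "i" nodes.

```ruby
# Upper-cased first letter of a title, using Turkic case mapping
# ("i" becomes "İ", "ı" becomes "I").
def group_letter(title)
  title[0].upcase(:turkic)
end

# Expects titles already in their final sorted order.
# Returns [[letter, [titles...]], ...], one pair per run of equal first letters.
def regroup(sorted_titles)
  sorted_titles.chunk { |title| group_letter(title) }.to_a
end

groups = regroup(["adım", "adil", "Alarms", "Base"])
groups.each { |letter, titles| puts "#{letter}: #{titles.join(', ')}" }
```

Since chunk only merges consecutive items, the input has to be sorted first; otherwise the same letter can appear in more than one group.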

Even though I told the story as if I had succeeded, it all got messy. I could not sort the temporary node's children with language-aware ordering, and I could not add the list back as I described. Any help would be greatly appreciated!
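For the ordering itself: Ruby's default String comparison goes by code point, so "ı" (U+0131) lands after every ASCII letter and capitals sort before lowercase. Below is a minimal sketch of a language-aware sort key, assuming the target language is Turkish. COLLATION_ORDER and turkish_sort_key are hypothetical names written for this example, not part of Ruby or Nokogiri, and a complete solution would more likely rely on an ICU-based collation library.

```ruby
# Hypothetical collation table: space first, then the Turkish alphabet
# in lowercase. Written by hand for this sketch.
COLLATION_ORDER = " abcçdefgğhıijklmnoöprsştuüvyz".chars.freeze
COLLATION_RANK = COLLATION_ORDER.each_with_index.to_h.freeze

# Returns an array of integers that can be used with sort_by.
def turkish_sort_key(text)
  text.downcase(:turkic).each_char.map do |ch|
    # Characters outside the table (q, w, x, digits, punctuation) sort
    # after the table, in code point order.
    COLLATION_RANK.fetch(ch) { COLLATION_ORDER.size + ch.ord }
  end
end

titles = ["adım", "Base", "adil", "Alarms"]
sorted = titles.sort_by { |title| turkish_sort_key(title) }
puts sorted.inspect
```

On the sample titles this yields adım, adil, Alarms, Base, which matches the expected output, whereas a plain sort gives Alarms, Base, adil, adım.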