Re: [nokogiri-talk] beginner question - remove node based on a condition

Mike Dalessio

Jun 2, 2013, 1:19:31 PM
to nokogiri-talk
Hi,

Thanks for asking this question. The best way to solve this problem is by using XPath queries to look up specific nodes:

```
doc = Nokogiri::XML xml

doc.xpath("//ProjectListItem").each do |item|  # for each node named ProjectListItem
  info = item.xpath("./ProjectInfo").first # find the ProjectInfo node under it
  if info.attribute("Status").value == "Completed" # check the value of the attribute
    item.unlink # and conditionally remove it
  end
end
```

Make sense?


On Sat, Jun 1, 2013 at 4:23 AM, Sebastjan Hribar <sebastja...@gmail.com> wrote:
Hi,

I'm a beginner at nokogiri and I'm stuck with what I imagine should not be difficult.

I need to modify an XML file. I've familiarized myself with the XML structure and went through the nokogiri tutorials but I'm having trouble with referencing certain elements in the structure.

I have an XML file which contains a list of projects and I need to completely remove all projects which have the status "Completed":

----------------------------------------------------------------------------------------------------------------------------
    </ProjectListItem>
    <ProjectListItem Guid="d339bbf3-b7e3-4158-9db6-f617456bbb22" ProjectFilePath="177_6_10082012\177_6_10082012.proj">
      <ProjectInfo CompletedAt="2012-08-10T11:00:37.2956776Z" StartedAt="2012-08-01T07:26:44.9643707Z" IsInPlace="false" DueDate="2012-08-10T16:00:00Z" IsImported="false" Name="177_6_10082012" Status="Completed" CreatedBy="user" CreatedAt="2012-08-01T07:24:53.7892586Z" />
      <SettingsBundle Guid="00000000-0000-0000-0000-000000000000">
        <SettingsBundle>
          <SettingsGroup Id="PublishProjectOperationSettings">
            <Setting Id="OrganizationPath">/somepath</Setting>
            <Setting Id="OrganizationIds">someID</Setting>
            <Setting Id="ServerUri">ps.http://server/</Setting>
            <Setting Id="PublicationStatus">Published</Setting>
            <Setting Id="ServerUserName">user</Setting>
            <Setting Id="ServerUserType">WindowsUser</Setting>
            <Setting Id="LastSyncedAt">05/30/2013 19:52:48</Setting>
            <Setting Id="PermissionsDenied">False</Setting>
          </SettingsGroup>
        </SettingsBundle>
      </SettingsBundle>
    </ProjectListItem>
----------------------------------------------------------------------------------------------------------------------------

My poor attempt is this:

require 'nokogiri'

    f = File.open('projects.xml')
    xdoc = Nokogiri::XML(f)
    xdoc.each do |node|
      if node.attribute("Status") == "Completed"
        node.parent.remove
      end
    end
    f.save
    f.close

I need help learning how to reference the elements properly so that I can delete the entire project node. If someone could point me in the right direction I'd very much appreciate it.

Kind regards,

seba


Sebastjan Hribar

Jun 2, 2013, 3:41:00 PM
to nokogi...@googlegroups.com
Hi,

thank you for your help. This really helps me understand it better. However, the completed projects don't get removed. I tried opening the file with different parameters but nothing changed.
To test it I've called print on item and all items were printed to the console, so everything until the unlink seems to be ok.

What am I doing wrong?

regards,
seba

On Sunday, June 2, 2013 at 19:19:31 UTC+2, Mike Dalessio wrote:

Sebastjan Hribar

Jun 3, 2013, 2:13:59 PM
to nokogi...@googlegroups.com
hi,

I've just tested this code:

require 'nokogiri'

f = File.open('3.xml')
doc = Nokogiri::XML f
doc.xpath("//book").each do |item|       # for each node named ProjectListItem
    item.remove                                     # and conditionally remove it
  end


against this file:

http://msdn.microsoft.com/en-us/library/windows/desktop/ms762271%28v=vs.85%29.aspx


again, nothing happens, but if I call "print item" all book items get printed to the terminal. How do I remove them from the file? Same for projects above.

regards,
seba

On Saturday, June 1, 2013 at 10:23:34 UTC+2, Sebastjan Hribar wrote:

Hassan Schroeder

Jun 3, 2013, 2:58:19 PM
to nokogi...@googlegroups.com
On Mon, Jun 3, 2013 at 11:13 AM, Sebastjan Hribar
<sebastja...@gmail.com> wrote:

> f = File.open('3.xml')
> doc = Nokogiri::XML f
> doc.xpath("//book").each do |item| # for each node named ProjectListItem
> item.remove # and conditionally remove it
> end

> again, nothing happens, but if I call "print item" all book items get
> printed to the terminal. How do I remove them from the file? Same for
> projects above.

First, you need to add your conditional in there, but I believe you're
primarily confused about IO.

The 'doc' you are manipulating is an in-memory representation that
has no intrinsic relationship with the file from which it was read.

Once you've done whatever to the 'doc', you need to explicitly write
it out to a file (overwriting the old data if desired) to save it.

HTH,
--
Hassan Schroeder ------------------------ hassan.s...@gmail.com
http://about.me/hassanschroeder
twitter: @hassan

Sebastjan Hribar

Jun 4, 2013, 5:46:50 AM
to nokogi...@googlegroups.com

> First, you need to add your conditional in there, but I believe you're
> primarily confused about IO.
>
> The 'doc' you are manipulating is an in-memory representation that
> has no intrinsic relationship with the file from which it was read.
>
> Once you've done whatever to the 'doc', you need to explicitly write
> it out to a file (overwriting the old data if desired) to save it.



Thank you for your help. Unfortunately I have an additional problem. The code below produces an empty XML file with only the XML declaration at the top (<?xml version="1.0"?>).

-----------------------------------------------------------------------------------------------------
require 'nokogiri'

f = File.open('projects.xml', "w")

doc = Nokogiri::XML f
doc.xpath("//ProjectListItem").each do |item|   
  info = item.xpath("./ProjectInfo").first  
  if info.attribute("Status").value == "Completed"
    item.unlink                                    
  end
end

f.write(doc)
f.close
-----------------------------------------------------------------------------------------------------


If I do it the way below, the file is overwritten and the completed projects are in fact removed, but empty lines remain where they used to be.
I need the completed projects removed, but there should be no empty lines.
-----------------------------------------------------------------------------------------------------
require 'nokogiri'

f = File.open('projects.xml', "r")

doc = Nokogiri::XML f
doc.xpath("//ProjectListItem").each do |item|   
  info = item.xpath("./ProjectInfo").first  
  if info.attribute("Status").value == "Completed"
    item.unlink                                    
  end
end

f.close
fin = File.open("projects.xml", "w")
fin.write(doc)
fin.close
-----------------------------------------------------------------------------------------------------

regards,
seba

Hassan Schroeder

Jun 4, 2013, 12:51:40 PM
to nokogi...@googlegroups.com
On Tue, Jun 4, 2013 at 2:46 AM, Sebastjan Hribar
<sebastja...@gmail.com> wrote:

> If I do it the way below, the file is overwritten and the completed projects
> are in fact removed, but the empty lines remain.
> I need the completed projects removed, but there should be no empty lines.

Blank lines in the file don't really matter when it's XML, but if you
really want to eliminate them:

Either loop through the doc and find empty text nodes and remove
them, or (me being lazy) reparse the modified document using the
NOBLANKS ParseOption, write it out and done :-)

Sebastjan Hribar

Jun 4, 2013, 1:41:08 PM
to nokogi...@googlegroups.com
On 04. 06. 2013 18:51, Hassan Schroeder wrote:
> On Tue, Jun 4, 2013 at 2:46 AM, Sebastjan Hribar
> <sebastja...@gmail.com> wrote:
>
>> If I do it the way below, the file is overwritten and the completed projects
>> are in fact removed, but the empty lines remain.
>> I need the completed projects removed, but there should be no empty lines.
> Blank lines in the file don't really matter when it's XML, but if you
> really want to eliminate them:
>
> Either loop through the doc and find empty text nodes and remove
> them, or (me being lazy) reparse the modified document using the
> NOBLANKS ParseOption, write it out and done :-)
>
> HTH,
took the lazy path:) thx a lot, works perfectly!

Just a few side notes:
- thank you all for your patience
- I went through the IO again so I hopefully won't ask stupid questions
again in the future
- you're right, I used the XML with blanks and the software in question
actually "repaired" (removed the blanks itself) when it loaded up the xml

regards,
seba

Sebastjan Hribar

Jun 5, 2013, 4:07:54 AM
to nokogi...@googlegroups.com
This is my final version of the method to remove completed projects:

-----------------------------------------------------------------------------
  def remove_completed(f)
    file = File.open(f, "r")
    doc = Nokogiri::XML(file)
    #completed_projects = []####################################

    doc.xpath("//ProjectListItem").each do |item|
      info = item.xpath("./ProjectInfo").first
      if info.attribute("Status").value == "Completed"
        #completed_projects << info.attribute("Name").value###############################
        item.unlink
      end
    end
    #return completed_projects###########################
    file.close

    file = File.open(f, "w")
    file.write(doc)
    file.close

    file = File.open(f, "r")
    doc2 = Nokogiri::XML(file) {|x| x.noblanks}
    file.close

    file = File.open(f, "w")
    file.write(doc2)
    file.close
  end
-----------------------------------------------------------------------------
I have a problem when I use the commented lines as well to populate the array. The array completed_projects gets populated, but items do not get unlinked. If I only unlink the items without generating and populating the array as seen above, everything works fine.
I need this array to use it for moving the respective folders to an archive.

Is this related to nokogiri or have I made a pure ruby mistake?

regards,
seba

Hassan Schroeder

Jun 5, 2013, 5:40:33 AM
to nokogi...@googlegroups.com
On Wed, Jun 5, 2013 at 1:07 AM, Sebastjan Hribar
<sebastja...@gmail.com> wrote:

> return completed_projects ###########################
> file.close

> I have a problem when I use the commented lines as well to populate the
> array. The array completed_projects gets populated, but items do not get
> unlinked. If I only unlink the items without generating and populating the
> array as seen above, everything works fine.
> I need this array to use it for moving the respective folders to an archive.
>
> Is this related to nokogiri or have I made a pure ruby mistake?

As soon as you call `return` your method ends; nothing past that point
is going to be executed. (Try putting some print statements in to see.)

All you need to do is put 'completed_projects' as the last statement
of the method.
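
A small pure-Ruby illustration of the difference (the method names and the log array are invented for this sketch):

```ruby
# Hypothetical helper names, for illustration only.
def early_return(log)
  names = ["177_6_10082012"]
  return names          # the method ends here
  log << "cleanup ran"  # never reached
end

def implicit_return(log)
  names = ["177_6_10082012"]
  log << "cleanup ran"  # runs, because nothing has returned yet
  names                 # the last expression becomes the return value
end

log_a = []
log_b = []
result_a = early_return(log_a)
result_b = implicit_return(log_b)
```

Both calls hand back the same array, but only the second one gets as far as the cleanup step.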

BTW, there's no reason to write the file twice; you can reparse the
'doc' in memory before writing it out to the file.

Sebastjan Hribar

Jun 5, 2013, 3:13:49 PM
to nokogi...@googlegroups.com
On 05. 06. 2013 11:40, Hassan Schroeder wrote:
> On Wed, Jun 5, 2013 at 1:07 AM, Sebastjan Hribar
> <sebastja...@gmail.com> wrote:
>
>> return completed_projects ###########################
>> file.close
>> I have a problem when I use the commented lines as well to populate the
>> array. The array completed_projects gets populated, but items do not get
>> unlinked. If I only unlink the items without generating and populating the
>> array as seen above, everything works fine.
>> I need this array to use it for moving the respective folders to an archive.
>>
>> Is this related to nokogiri or have I made a pure ruby mistake?
> As soon as you call `return` your method ends; nothing past that point
> is going to be executed. (Try putting some print statements in to see.)
>
> All you need to do is put 'completed_projects' as the last statement
> of the method.
You're right of course. I found out what the problem was. This is my
first project that I'm trying to do on BDD principles and I'm using
minitest. The problem was in my poorly written test.

> BTW, there's no reason to write the file twice; you can reparse the
> 'doc' in memory before writing it out to the file.
>
How can I reparse an in memory document? Do I even need "doc2" variable
then? Because

doc2 = Nokogiri::XML(doc) {|x| x.noblanks}

where I try to parse the doc which is the in memory document, doesn't
work. I get this:

/home/sebah/.rvm/gems/ruby-1.9.3-p392/gems/nokogiri-1.5.9/lib/nokogiri/xml/document.rb:53:in
`parse': undefined method `empty?' for
#<Nokogiri::XML::Document:0x000000008f5160> (NoMethodError)
from
/home/sebah/.rvm/gems/ruby-1.9.3-p392/gems/nokogiri-1.5.9/lib/nokogiri/xml.rb:33:in
`XML'
from xml1.rb:24:in `remove_completed'
from xml1.rb:33:in `<main>'

Apart from that, my archiver is complete. The final step is the gui
(green_shoes).

regards,
seba

Hassan Schroeder

Jun 5, 2013, 6:49:19 PM
to nokogi...@googlegroups.com
On Wed, Jun 5, 2013 at 12:13 PM, Sebastjan Hribar
<sebastja...@gmail.com> wrote:

> How can I reparse an in memory document? Do I even need "doc2" variable
> then? Because
>
> doc2 = Nokogiri::XML(doc) {|x| x.noblanks}
>
> where I try to parse the doc which is the in memory document, doesn't work.

> /home/sebah/.rvm/gems/ruby-1.9.3-p392/gems/nokogiri-1.5.9/lib/nokogiri/xml/document.rb:53:in
> `parse': undefined method `empty?' for
> #<Nokogiri::XML::Document:0x000000008f5160> (NoMethodError)
> from
> /home/sebah/.rvm/gems/ruby-1.9.3-p392/gems/nokogiri-1.5.9/lib/nokogiri/xml.rb:33:in
> `XML'
> from xml1.rb:24:in `remove_completed'
> from xml1.rb:33:in `<main>'

Can you post (or better, gist) your version that shows that error?

Sebastjan Hribar

Jun 6, 2013, 6:12:35 AM
to nokogi...@googlegroups.com
On 06. 06. 2013 00:49, Hassan Schroeder wrote:
> On Wed, Jun 5, 2013 at 12:13 PM, Sebastjan Hribar
> <sebastja...@gmail.com> wrote:
>
>> How can I reparse an in memory document? Do I even need "doc2" variable
>> then? Because
>>
>> doc2 = Nokogiri::XML(doc) {|x| x.noblanks}
>>
>> where I try to parse the doc which is the in memory document, doesn't work.
>> /home/sebah/.rvm/gems/ruby-1.9.3-p392/gems/nokogiri-1.5.9/lib/nokogiri/xml/document.rb:53:in
>> `parse': undefined method `empty?' for
>> #<Nokogiri::XML::Document:0x000000008f5160> (NoMethodError)
>> from
>> /home/sebah/.rvm/gems/ruby-1.9.3-p392/gems/nokogiri-1.5.9/lib/nokogiri/xml.rb:33:in
>> `XML'
>> from xml1.rb:24:in `remove_completed'
>> from xml1.rb:33:in `<main>'
> Can you post (or better, gist) your version that shows that error?
>
here it is:
https://gist.github.com/sebastjan-hribar/5720554

regards
seba

Hassan Schroeder

Jun 6, 2013, 9:07:42 AM
to nokogi...@googlegroups.com
On Thu, Jun 6, 2013 at 3:12 AM, Sebastjan Hribar
<sebastja...@gmail.com> wrote:

>>> How can I reparse an in memory document?

> https://gist.github.com/sebastjan-hribar/5720554

To eliminate the 'doc2' variable, replace line 28 with

file.write( Nokogiri::XML( doc.serialize ) { |x| x.noblanks } )

HTH,

Sebastjan Hribar

Jun 6, 2013, 2:33:43 PM
to nokogi...@googlegroups.com
On 06. 06. 2013 15:07, Hassan Schroeder wrote:
> On Thu, Jun 6, 2013 at 3:12 AM, Sebastjan Hribar
> <sebastja...@gmail.com> wrote:
>
>>>> How can I reparse an in memory document?
>> https://gist.github.com/sebastjan-hribar/5720554
> To eliminate the 'doc2' variable, replace line 28 with
>
> file.write( Nokogiri::XML( doc.serialize ) { |x| x.noblanks } )
>
> HTH,
last piece of the puzzle:)
thank you very much!
regards,
seba