Help with parsing iTunes XML Document

omarvelous

Jan 18, 2010, 11:38:15 PM
to nokogiri-talk
Hey, I'm a newb trying to figure this Nokogiri stuff out. So far
it looks very promising for what I'm trying to achieve. I'm looking to
parse a snippet of my iTunes Library XML document, available at the
following link: http://pastie.org/784283.

As you can see, this isn't the typical XML format (at least compared
with the various parsing examples I've found on the web, noko or not)...

Ultimately what I would like to do is capture the key and the value
on each line inside the "/plist/dict/dict/dict" XPath, i.e. each
"Track" dict element, and turn them into a Hash object. Any
guidance on the best way to go about this? I'm assuming I might have
to parse it and create a converted XML document? Then again, if I can
build the Hash directly, the conversion wouldn't actually be necessary.

I've tried a few different things; one major problem is that the file
has tabs and newlines by default, so my test output is full of empty
whitespace. Any way to ignore those as well?

Any guidance is greatly appreciated!

Thanks in Advance!

Aaron Patterson

Jan 19, 2010, 2:22:56 AM
to nokogi...@googlegroups.com
Hello!

On Mon, Jan 18, 2010 at 8:38 PM, omarvelous <om...@omarvelous.com> wrote:
> Hey, I'm a newb trying to figure this Nokogiri stuff out. So far
> it looks very promising for what I'm trying to achieve. I'm looking to
> parse a snippet of my iTunes Library XML document, available at the
> following link: http://pastie.org/784283.

Awesome. We can help with this.

> As you see this isn't the typical (at least from the various parsing
> examples I've found on the web, noko or not) XML format...
>
> Ultimately what I would like to do is capture the Key and the Values,
> on each line contained within the "/plist/dict/dict/dict" xpath
> corresponding to a "Track" dict element to create a Hash object. Any
> Guidance on the best way to go about this? I'm assuming I might have
> to parse and create a converted XML document? I figure if I can do
> that much, then that actually wouldn't be necessary.

Yes, I've performed a task similar to this before.

> I've tried a few different things, one major problem is that the file
> by default has Tabs and New Lines, therefore outputting empty spaces
> in my testing... Any way to ignore those as well?

We can indeed. Nokogiri nodes have a blank? method that will help,
but my examples below will make it clearer.

> Any guidance is greatly appreciated!

There are two ways to go about this. We'll cover the Nokogiri version
first (since this is the Nokogiri list), followed by an alternate
version. Below is an example that parses your XML file. I've tried
to comment it so it will be self-explanatory:

require 'rubygems'
require 'nokogiri'

list = []
# File.read reads the whole file and closes it again
doc = Nokogiri::XML(File.read(ARGV[0]))

# Find each track dictionary and loop through it
doc.xpath('/plist/dict/dict/dict').each do |node|

  hash = {}
  last_key = nil

  # Stuff the key/value pairs into the hash. We know a key is followed by
  # a value, so we'll just skip blank nodes, save the key, then when we
  # find the value, add it to the hash
  node.children.each do |child|

    next if child.blank? # Don't care about blank (whitespace-only) nodes

    if child.name == 'key'
      # Save off the key
      last_key = child.text
    else
      # Use the key we saved
      hash[last_key] = child.text
    end
  end

  list << hash # push on to our list
end

# Do something interesting with the list
p list

If you want to be OS X specific, there is actually an easier way. The
iTunes XML file is a plist file (a special format that most Mac apps
use). RubyCocoa ships with OS X, and actually has a built-in
function for converting plist XML files into hashes and lists.

The following example takes advantage of RubyCocoa to do the parsing
for us, then we just deal with normal Ruby hashes and lists:

require 'osx/cocoa'

OSX.load_plist(File.read(ARGV[0]))['Tracks'].each do |id, info|
  p info['Artist'] => info['Name']
end

Hope that helps!

--
Aaron Patterson
http://tenderlovemaking.com/

omarvelous

Jan 20, 2010, 1:13:27 AM
to nokogiri-talk
Works like a charm!!! Man, you're great! I will try to digest this
more... Just read it over... Got it! I completely understand it
now... So simple...

Thanks a lot Aaron!

On Jan 19, 2:22 am, Aaron Patterson <aaron.patter...@gmail.com> wrote:
> Hello!
>
