Parsing an XML feed using ElementTree

Sithembewena Lloyd Dube

May 24, 2011, 6:13:31 AM
to django...@googlegroups.com
Hi Everyone,

I am trying to parse an XML feed and display the text of each child node, without any success. My code in the Python shell is as follows:

>>> import urllib
>>> from xml.etree import ElementTree as ET

>>> content = urllib.urlopen('http://xml.matchbook.com/xmlfeed/feed?sport-id=&vendor=TEST&sport-name=&short-name=Po')
>>> xml_content = ET.parse(content)

I then check the xml_content object as follows:

>>> xml_content
<xml.etree.ElementTree.ElementTree instance at 0x01DC14B8>

And now, to iterate through its child nodes and print out the text of each node:

>>> for node in xml_content.getiterator('contest'):
...     name = node.attrib.get('text')
...     print name
...
>>>

Nothing is printed, even though the document does have 'contest' tags with text in them. If I try to count the contest tags and increment an integer (to see that the document is traversed) I get the same result - the int remains at 0.

>>> i = 0
>>> for node in xml_content.getiterator('contest'):
...     i += 1
...
>>> i
0

What am I getting wrong? Any hints would be appreciated.
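
One way to check what the parser actually saw is to list the root tag and every distinct tag in the tree. The sketch below is Python 3 and uses a made-up inline document rather than the real feed; `SAMPLE` and `tag_counts` are hypothetical names, and `iter()` is the current spelling of `getiterator()`, which was removed in Python 3.9.

```python
from collections import Counter
from xml.etree import ElementTree as ET

# Hypothetical sample standing in for the real feed.
SAMPLE = """<feed>
  <contest><name>A v B</name></contest>
  <contest><name>C v D</name></contest>
</feed>"""


def tag_counts(root):
    """Count how often each tag occurs anywhere in the tree (root included)."""
    return Counter(node.tag for node in root.iter())


root = ET.fromstring(SAMPLE)
counts = tag_counts(root)
print(root.tag)
print(dict(counts))
```

If 'contest' is missing from the counts, or only appears in a namespaced form such as '{...}contest', then a loop over 'contest' will match nothing.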

--
Regards,
Sithembewena Lloyd Dube

Daniel Roseman

May 24, 2011, 6:42:34 AM
to django...@googlegroups.com
This isn't really a Django question...

Nevertheless, the issue is probably in the line "name = node.attrib.get('text')". This gets the attribute named 'text' from the current node, which would only work if your XML looked like this:

    <contest text="foo"/>

However, what you probably have is this:

    <contest>foo</contest>

in which case you just want to access the `text` property directly:

    name = node.text
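
The difference between the two lookups can be seen on a pair of one-line documents. This is a minimal sketch using the two example elements above; the variable names are only for illustration.

```python
from xml.etree import ElementTree as ET

with_attribute = ET.fromstring('<contest text="foo"/>')
with_text = ET.fromstring('<contest>foo</contest>')

# attrib.get() looks up an XML attribute by name;
# .text is the character data directly inside the element.
print(with_attribute.attrib.get('text'))  # foo
print(with_attribute.text)                # None
print(with_text.attrib.get('text'))       # None
print(with_text.text)                     # foo
```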

--
DR.

Тимур Зарипов

May 24, 2011, 11:10:22 AM
to django...@googlegroups.com
I'd really, really suggest using the lxml library for XML parsing: it has XPath support built in.
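
For a sense of what path-based lookup looks like, here is a minimal Python 3 sketch. It uses the standard library's ElementTree, which supports only a limited XPath subset through `findall()`, so that it runs without extra packages; lxml's `xpath()` method accepts full XPath 1.0 expressions. The sample document and variable names are made up for illustration.

```python
from xml.etree import ElementTree as ET

# Hypothetical sample standing in for the real feed.
SAMPLE = """<feed>
  <contest><name>A v B</name><league>X</league></contest>
  <contest><name>C v D</name><league>Y</league></contest>
</feed>"""

root = ET.fromstring(SAMPLE)

# Every <name> that is a direct child of a <contest>, at any depth.
names = [node.text for node in root.findall('.//contest/name')]

# Only the contests whose <league> child has the text 'Y'.
league_y = [c.findtext('name') for c in root.findall(".//contest[league='Y']")]

print(names)
print(league_y)
```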

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To post to this group, send email to django...@googlegroups.com.
To unsubscribe from this group, send email to django-users...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/django-users?hl=en.

Brian Bouterse

May 24, 2011, 3:57:36 PM
to django...@googlegroups.com
+1 for xpath

I also like using xml.dom.minidom since it is so simple and straightforward.

If your XML is poorly formed, go with Beautiful Soup.

Brian


--
Brian Bouterse
ITng Services

Masklinn

May 24, 2011, 4:07:49 PM
to django...@googlegroups.com
On 2011-05-24, at 21:57 , Brian Bouterse wrote:
> +1 for xpath
>
> I also like using xml.dom.minidom <http://docs.python.org/library/xml.dom.minidom.html>
> since it is so simple and straightforward.
>
I'm sorry, but I whole-heartedly disagree with this. ElementTree is orders of magnitude simpler and more straightforward than the unending pain of working with the DOM interface.
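
To make the comparison concrete, here is a small Python 3 sketch that pulls the same values out of a made-up document with both APIs. The sample and the two function names are hypothetical, and which version reads better is left to the reader.

```python
from xml.dom import minidom
from xml.etree import ElementTree as ET

# Hypothetical sample standing in for the real feed.
SAMPLE = (
    "<feed>"
    "<contest><name>A v B</name></contest>"
    "<contest><name>C v D</name></contest>"
    "</feed>"
)


def names_with_minidom(xml_text):
    doc = minidom.parseString(xml_text)
    names = []
    for contest in doc.getElementsByTagName('contest'):
        name_node = contest.getElementsByTagName('name')[0]
        # In the DOM, the text is a separate child node of the element.
        names.append(name_node.firstChild.data)
    return names


def names_with_elementtree(xml_text):
    root = ET.fromstring(xml_text)
    # In ElementTree, the text hangs directly off the element.
    return [contest.findtext('name') for contest in root.iter('contest')]


print(names_with_minidom(SAMPLE))
print(names_with_elementtree(SAMPLE))
```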

Brian Bouterse

May 24, 2011, 4:26:46 PM
to django...@googlegroups.com
We all have our opinions.  Either way this conversation is OT from Django.


Sithembewena Lloyd Dube

May 25, 2011, 8:35:32 AM
to django...@googlegroups.com
Hi Everyone,

Thanks for all your suggestions. I read up on gzip and urllib, and in the process learned that I could use urllib2, as it's the newer form of that library.

Herewith my solution: I don't know how elegant it is, but it works just fine.

import gzip
import os
import StringIO
import urllib2
from xml.etree import ElementTree as ET

# MEDIA_ROOT and Contest are assumed to be defined elsewhere in the
# Django project (settings and models respectively).

def get_contests():
    url = 'http://xml.matchbook.com/xmlfeed/feed?sport-id=&vendor=TEST&sport-name=&short-name=Po'
    req = urllib2.Request(url)
    # Ask for gzip only, since the response is decompressed with gzip below.
    req.add_header('Accept-Encoding', 'gzip')
    opener = urllib2.build_opener()
    response = opener.open(req)
    compressed_stream = StringIO.StringIO(response.read())
    data = gzip.GzipFile(fileobj=compressed_stream).read()
    # Write the decompressed XML to disk, parse it, then clean up.
    current_path = os.path.realpath(MEDIA_ROOT + '/xml-files/d.xml')
    with open(current_path, 'wb') as data_file:
        data_file.write(data)
    xml_data = ET.parse(current_path)
    fields = ('name', 'league', 'acro', 'time', 'home', 'away')
    contest_list = []
    for contest_parent_node in xml_data.getiterator('contest'):
        contest = Contest()
        for contest_child_node in contest_parent_node:
            # Copy only the known child tags that carry non-empty text.
            if contest_child_node.tag in fields and contest_child_node.text:
                setattr(contest, contest_child_node.tag, contest_child_node.text)
        contest_list.append(contest)
    try:
        os.remove(current_path)
    except OSError:
        pass
    return contest_list
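
For readers on Python 3, the same idea can be sketched without the temporary file, by decompressing and parsing in memory. This is not the code from the thread: `fetch_feed`, `parse_contests`, `FIELDS` and `SAMPLE` are hypothetical names, the sample document is made up, and plain dicts stand in for the `Contest` model.

```python
import gzip
import urllib.request
from xml.etree import ElementTree as ET

# Child tags copied from each <contest>, taken from the solution above.
FIELDS = ('name', 'league', 'acro', 'time', 'home', 'away')


def fetch_feed(url):
    """Download the feed, asking the server for gzip compression."""
    req = urllib.request.Request(url, headers={'Accept-Encoding': 'gzip'})
    with urllib.request.urlopen(req) as response:
        return response.read()


def parse_contests(payload):
    """Return one dict per <contest>, decompressing the payload if needed."""
    if payload[:2] == b'\x1f\x8b':  # gzip magic number
        payload = gzip.decompress(payload)
    root = ET.fromstring(payload)
    contests = []
    for contest_node in root.iter('contest'):
        contest = {}
        for child in contest_node:
            # Keep only the known child tags that carry non-empty text.
            if child.tag in FIELDS and child.text:
                contest[child.tag] = child.text
        contests.append(contest)
    return contests


# Made-up sample standing in for the real feed.
SAMPLE = (
    b"<feed><contest><name>A v B</name><league>X</league>"
    b"<home>A</home><away>B</away><time></time></contest></feed>"
)
```

Usage would look like `parse_contests(fetch_feed(url))`. Checking for the gzip magic number means the parser also copes with a server that ignores the header and sends plain XML.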

Many thanks!
Regards,
Sithembewena Lloyd Dube

Sithembewena Lloyd Dube

May 25, 2011, 8:37:47 AM
to django...@googlegroups.com
P.S.: I was aware that I posted a non-Django question; I just took the chance that someone here may have needed to do the same.

Thanks!