xml data or other?
Newsgroups: comp.lang.python
From: Artie Ziff <artie.z...@gmail.com>
Date: Fri, 09 Nov 2012 04:54:43 -0800
Subject: xml data or other?
Hello,

I want to process XML-like data like this:

<testname=ltpacpi.sh>
        <description>
                ACPI (Advanced Control Power & Integration) testscript for 2.5 kernels.

        <\description>
        <test_location>
                ltp/testcases/kernel/device-drivers/acpi/ltpacpi.sh
        <\test_location>
<\testname>

After I manually edit the data above, the Python module
xml.etree.ElementTree parses it without errors.

The edits were substituting '/' for '\' in the end tags, and adding the
following structure:

<?xml version="1.0"?>
<data>
   <testname name="ltpacpi.sh">
     ...
   </testname>
</data>

Is there a name for the format above (perhaps xhtml)?
I'd like to find a python module that can translate it to proper xml.
Does one exist? etree?

Many thanks!
az


 
Newsgroups: comp.lang.python
From: rusi <rustompm...@gmail.com>
Date: Fri, 9 Nov 2012 05:50:15 -0800 (PST)
Subject: Re: xml data or other?
On Nov 9, 5:54 pm, Artie Ziff <artie.z...@gmail.com> wrote:

> Hello,

> I want to process XML-like data like this:
<snipped>
> Edits were substituting '/' for '\' on the end tags, and adding the
> following structure:

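If that's all you want, you can try the following:

# obviously this should come from a file
# raw string, so that Python does not read the "\t" in <\test_location> as a tab
brokenInput = r"""<testname=ltpacpi.sh>
        <description>
                ACPI (Advanced Control Power & Integration) testscript for 2.5 kernels.

        <\description>
        <test_location>
                ltp/testcases/kernel/device-drivers/acpi/ltpacpi.sh
        <\test_location>
<\testname>"""

prefix = """<?xml version="1.0"?>
<data>
"""

postfix = """</data>"""

# turn every backslash into a forward slash and add the wrapper element
correctedInput = prefix + brokenInput.replace("\\", "/") + postfix
# submit correctedInput to etree
# (the bare '&' and the <testname=...> tag still need fixing before it will parse)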


 
Newsgroups: comp.lang.python
From: shivers.p...@yahoo.co.uk
Date: Tue, 13 Nov 2012 06:05:49 -0800 (PST)
Subject: Re: xml data or other?

Maybe an XML tool would be better; there is a good list of XML tools here: http://www.xml-data.info

 
Newsgroups: comp.lang.python
From: Dave Angel <d...@davea.name>
Date: Tue, 13 Nov 2012 13:01:17 -0500
Subject: Re: xml data or other?
On 11/09/2012 07:54 AM, Artie Ziff wrote:

> Is there a name for the format above (perhaps xhtml)?

The only word I can think of is "broken."  XML, HTML and XHTML all
use forward slashes.

> I'd like to find a python module that can translate it to proper xml.
> Does one exist? etree?

I think you've already figured it out.  Just take your description and
turn it into Python.  In other words, replace all "<\" with "</" and
perhaps " \>" with " />", although your example doesn't happen to have
any of these.  Tack an XML header on, and try to parse it with etree.
If you can't, then let someone manually fix it.

Or better, fix the program upstream that's creating this mess.  There
isn't a reliable way to "fix" all the possible broken xml it might be
creating, without reverse engineering it.
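
A minimal sketch of the "repair, then try to parse" approach described above. The function name try_parse, the <data> wrapper element and the sample text are illustrative assumptions, not part of any library. It only repairs the end tags; it does nothing about a bare '&' or '<' inside the text, so such input still comes back as None.

```python
import xml.etree.ElementTree as ET

XML_HEADER = '<?xml version="1.0"?>\n'


def try_parse(broken_text):
    """Apply the simple textual repair, then attempt to parse.

    Returns the root Element, or None if the text is still not well-formed
    (in which case someone has to fix it by hand).
    """
    # "<\tag>" becomes "</tag>"; other backslashes are left alone
    repaired = broken_text.replace("<\\", "</")
    # a single wrapper element (hypothetical name) gives the document one root
    document = XML_HEADER + "<data>\n" + repaired + "\n</data>"
    try:
        return ET.fromstring(document)
    except ET.ParseError:
        return None


# raw string, so "\t" stays a backslash followed by "t"
sample = r"""<test_location>
    ltp/testcases/kernel/device-drivers/acpi/ltpacpi.sh
<\test_location>"""

root = try_parse(sample)
if root is None:
    print("still broken, needs a manual fix")
else:
    print(root.find("test_location").text.strip())
```

Matching on "<\" rather than on every backslash keeps any backslashes that appear in the text itself untouched.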

--

DaveA


 
Newsgroups: comp.lang.python
From: Artie Ziff <artie.z...@gmail.com>
Date: Sun, 18 Nov 2012 05:32:26 -0800
Subject: Re: xml data or other?
On 11/9/12 5:50 AM, rusi wrote:
> On Nov 9, 5:54 pm, Artie Ziff <artie.z...@gmail.com> wrote:
> # submit correctedinput to etree

I was very grateful to get a "leg up" on getting started down the
right path with my coding. Many thanks to you, rusi. I took your
excellent advice and have this working.

import os
import re
import sys

class Converter:
    PREFIX = """<?xml version="1.0"?>
    <data>
    """
    POSTFIX = "</data>"
    def __init__(self, data):
        self.data = data
        self.writeXML()
    def writeXML(self):
        # <testname=foo.sh>  becomes  <testname name="foo.sh">
        pattern = re.compile('<testname=(.*)>')
        replaceStr = r'<testname name="\1">'
        xmlData = pattern.sub(replaceStr, self.data)
        # every backslash becomes a forward slash, so <\tag> becomes </tag>
        self.dataXML = self.PREFIX + xmlData.replace("\\", "/") + self.POSTFIX

###  main
# input to script is directory:
# sanitize trailing slash
testPkgDir = sys.argv[1].rstrip('/')
# Within each test package directory is doc/testcases
tcDocDir = "doc/testcases"
# set input dir, containing broken files
tcTxtDir = os.path.join(testPkgDir, tcDocDir)
# set output dir, to write proper XML files
tcXmlDir = os.path.join(testPkgDir, tcDocDir + "_XML")
if not os.path.exists(tcXmlDir):
    os.makedirs(tcXmlDir)
# iterate through files in input dir
for filename in os.listdir(tcTxtDir):
    # set filepaths
    filepathTXT = os.path.join(tcTxtDir, filename)
    base = os.path.splitext(filename)[0]
    fileXML = base + ".xml"
    filepathXML = os.path.join(tcXmlDir, fileXML)
    # read broken file, convert to proper XML
    with open(filepathTXT) as f:
        c = Converter(f.read())
    # write the converted text; "with" closes the file even on error
    with open(filepathXML, 'w') as xmlFO:   # xmlFileObject
        xmlFO.write(c.dataXML)

###

I am writing the XML files to disk so I can see what's happening. My
plan is to keep the XML data in memory and parse it with
xml.etree.ElementTree.

Unfortunately, XML parsing fails because of angle brackets inside the
description tags. In particular, xml.etree.ElementTree.parse()
aborts on '<' inside XML data such as the following:

<testname name="cron_test.sh">
     <description>
         This testcase tests if crontab <filename> installs the cronjob
         and cron schedules the job correctly.
     <\description>

##

What is the right way to handle the extra angle brackets?
Substitute on a line-by-line basis, if that works?
Or learn to write a simple stack-style parser, or a
recursive-descent parser, as it may be called?

I am open to comments on making my code more readable,
more Pythonic, or otherwise better.

Many thanks
AZ


 
Newsgroups: comp.lang.python
From: rusi <rustompm...@gmail.com>
Date: Sun, 18 Nov 2012 07:54:05 -0800 (PST)
Subject: Re: xml data or other?
On Nov 18, 6:32 pm, Artie Ziff <artie.z...@gmail.com> wrote:

Start with cgi.escape perhaps?
http://docs.python.org/2/library/cgi.html

 
Newsgroups: comp.lang.python
From: rusi <rustompm...@gmail.com>
Date: Sun, 18 Nov 2012 07:58:12 -0800 (PST)
Subject: Re: xml data or other?
On Nov 18, 8:54 pm, rusi <rustompm...@gmail.com> wrote:

> Start with cgi.escape perhaps? http://docs.python.org/2/library/cgi.html

This may be a better link for starters
http://wiki.python.org/moin/EscapingHtml
(Note the escaping xml at the bottom)
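
As a rough illustration of escaping, here is a small sketch that uses xml.sax.saxutils.escape from the standard library rather than cgi.escape (a swapped-in function, chosen because it is meant for XML text). The variable names and the sample sentence are made up for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# hypothetical description text containing characters that are special in XML
raw_text = "tests if crontab <filename> installs the cronjob & schedules it"

# escape() replaces "&", "<" and ">" with the matching entity references
escaped = escape(raw_text)
print(escaped)

# once escaped, the text can sit between tags and the parser accepts it
element = ET.fromstring("<description>" + escaped + "</description>")
print(element.text == raw_text)
```

The escaping has to be applied to the text between the tags only; running it over a whole file would also escape the tags themselves.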

 
Newsgroups: comp.lang.python
From: "Prasad, Ramit" <ramit.pra...@jpmorgan.com>
Date: Mon, 19 Nov 2012 21:42:00 +0000
Subject: RE: xml data or other?

I think your description text should be in a CDATA section.
http://en.wikipedia.org/wiki/CDATA#CDATA_sections_in_XML
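
For comparison, a sketch of what the CDATA route could look like. The helper name wrap_cdata and the sample text are assumptions made for illustration. The one sequence a CDATA section cannot contain is "]]>", so the helper splits it across two sections.

```python
import xml.etree.ElementTree as ET


def wrap_cdata(text):
    """Wrap text in a CDATA section so '<' and '&' need no escaping."""
    # "]]>" would end the section early, so close the section after "]]"
    # and open a new one before ">"
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


text = "This testcase tests if crontab <filename> installs the cronjob"
document = "<description>" + wrap_cdata(text) + "</description>"

element = ET.fromstring(document)
# the parser hands back the original text, angle brackets included
print(element.text == text)
```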

~Ramit

Newsgroups: comp.lang.python
From: Stefan Behnel <stefan...@behnel.de>
Date: Tue, 20 Nov 2012 06:48:20 +0100
Subject: Re: xml data or other?
Prasad, Ramit, 19.11.2012 22:42:

Ah, don't bother with CDATA. Just make sure the data gets properly escaped,
any XML serialiser will do that for you. Just generate the XML using
ElementTree and you'll be fine. Generating XML as literal text is not a
good idea.
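
One way this could look, as a sketch only: pull the pieces out of the broken text with regular expressions, build the tree with ElementTree, and let the serialiser do the escaping. The two patterns and the function name build_testname are assumptions based on the samples posted in this thread; real files may need more cases.

```python
import re
import xml.etree.ElementTree as ET

# <testname=cron_test.sh> : capture everything after "=" up to ">"
NAME_RE = re.compile(r"<testname=([^>]*)>")
# <description> ... <\description> : the backreference \1 requires the
# end tag to repeat the start tag's name
CHILD_RE = re.compile(r"<(\w+)>(.*?)<\\\1>", re.DOTALL)


def build_testname(broken_text):
    """Build a <testname> element from one broken test description."""
    match = NAME_RE.search(broken_text)
    if match is None:
        raise ValueError("no <testname=...> start tag found")
    testname = ET.Element("testname", name=match.group(1))
    for tag, body in CHILD_RE.findall(broken_text):
        child = ET.SubElement(testname, tag)
        child.text = body.strip()  # stored as plain text, no escaping needed here
    return testname


sample = r"""<testname=cron_test.sh>
    <description>
        This testcase tests if crontab <filename> installs the cronjob
        and cron schedules the job correctly.
    <\description>
<\testname>"""

root = build_testname(sample)
xml_text = ET.tostring(root, encoding="unicode")  # "<" in text is escaped here
print(xml_text)

# the serialised text parses again and gives back the original wording
reparsed = ET.fromstring(xml_text)
print(reparsed.find("description").text)
```

Because the tree is kept in memory, the write-to-disk step becomes optional; the element returned by build_testname can be queried directly.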

Stefan


 