<!-- Settings for the algorithm to be performed
-->
<Algorithm>
<!-- The algorithm type.
-->
<!-- The supported options are:
-->
<!-- - Alg0
-->
<!-- - Alg1
-->
<Type>Alg1</Type>
<!-- the location/path of the input file for this algorithm
-->
<path>./Alg1.in</path>
</Algorithm>
<!-- Relevant information during the processing will be written to a
logfile -->
<Logfile>
<!-- the location/path of the logfile (i.e. where to put the
logfile) -->
<path>c:\tmp</path>
<!-- verbosity level (i.e. how much to print)
-->
<!-- The supported options are:
-->
<!-- - 0 (nothing printed)
-->
<!-- - 1 (print on error)
-->
<verbosity>1</verbosity>
</Logfile>
So there are comments, whitespace etc ... in it.
I would like to be able to put everything into some sort of structure
such that i can access it as:
structure['Algorithm']['Type'] == Alg1
I was wondering if there is something out there that does this.
I found and tried a few things:
1) http://code.activestate.com/recipes/534109-xml-to-python-data-structure/
It simply doesn't work. I get the following error:
raise exception
xml.sax._exceptions.SAXParseException: <unknown>:1:2: not well-formed
(invalid token)
But i removed everything from the file except: <?xml version="1.0"
encoding="UTF-8"?>
and i still got the error.
Anyway, i looked at ElementTree, but that error out with:
xml.parsers.expat.ExpatError: junk after document element: line 19, column 0
Anyway, if anyone can give me advice of point me somewhere i'd greatly
appreciate it.
thanks
matt
That's not XML. XML documents have exactly one root element, i.e. you need
an enclosing element around these two tags.
> So there are comments, whitespace etc ... in it.
> I would like to be able to put everything into some sort of structure
Including the comments or without them? Note that ElementTree will ignore
comments.
> such that i can access it as:
> structure['Algorithm']['Type'] == Alg1
Have a look at lxml.objectify. It allows you to write
alg_type = root.Algorithm.Type.text
and a couple of other niceties.
> I was wondering if there is something out there that does this.
> I found and tried a few things:
> 1) http://code.activestate.com/recipes/534109-xml-to-python-data-structure/
> It simply doesn't work. I get the following error:
> raise exception
> xml.sax._exceptions.SAXParseException:<unknown>:1:2: not well-formed
> (invalid token)
"not well formed" == "not XML".
> But i removed everything from the file except:<?xml version="1.0"
> encoding="UTF-8"?>
> and i still got the error.
That's not XML, either.
> Anyway, i looked at ElementTree, but that error out with:
> xml.parsers.expat.ExpatError: junk after document element: line 19, column 0
In any case, ElementTree is preferable over a SAX based solution, both for
performance and maintainability reasons.
Stefan
Why?
...
> So there are comments, whitespace etc ... in it.
> I would like to be able to put everything into some sort of structure
> such that i can access it as:
> structure['Algorithm']['Type'] == Alg1
Unless I have a very good reason otherwise, I would just put everything
in nested dicts in a .py file to begin with. Or if I needed cross
language portability, a JSON file, which is close to the same thing.
--
Terry Jan Reedy
Hmm, right, good call. There's also the configparser module in the stdlib
which may provide a suitable format here.
Stefan
On 2/21/2011 11:22 AM, Terry Reedy wrote:
> On 2/21/2011 12:30 PM, Matt Funk wrote:
>> Hi,
>> I was wondering if someone had some advice:
>> I want to create a set of xml input files to my code that look as
>> follows:
>
> Why?
mmmh. not sure how to answer this question exactly. I guess it's a
design decision. I am not saying that it is best one, but it seemed
suitable to me. I am certainly open to suggestions. But here are some
requirements:
1) My boss needs to be able to read the input and make sense out of it.
XML seems fairly self explanatory, at least when you choose suitable
names for the properties/tags etc ...
2) I want reproducability of a given run without changes to the code.
I.e. all the inputs need to be stored external to the code such that the
state of the run is captured from the input files entirely.
>
>
> ...
>> So there are comments, whitespace etc ... in it.
>> I would like to be able to put everything into some sort of structure
>> such that i can access it as:
>> structure['Algorithm']['Type'] == Alg1
>
> Unless I have a very good reason otherwise, I would just put
> everything in nested dicts in a .py file to begin with.
That is certainly a possibility and it should work. However, eventually
the input files might be written by a webinterface (likely php). Even
though what you are suggesting is still possible it just feels a little
awkward (i.e. to use php to write a python file).
> Or if I needed cross language portability, a JSON file, which is close
> to the same thing.
>
again, i am certainly not infallible and open to suggestions of there is
a better solution.
thanks
matt
My xml file looks like (which i got from the internet):
<?xml version="1.0"?>
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications
with XML.</description>
</book>
</catalog>
Then i try to access it as:
from lxml import etree
from lxml import objectify
inputfile="../input/books.xml"
parser = etree.XMLParser(ns_clean=True)
parser = etree.XMLParser(remove_comments=True)
root = objectify.parse(inputfile,parser)
I try to access elements by (for example):
print root.catalog.book.author.text
But it errors out with:
AttributeError: 'lxml.etree._ElementTree' object has no attribute 'catalog'
So i guess i don't understand why this is.
Also, how can i print all the attributes of the root for examples or
obtain keys?
I really appreciate your help
thanks
matt
On 2/21/2011 10:43 AM, Stefan Behnel wrote:
> Matt Funk, 21.02.2011 18:30:
>> I want to create a set of xml input files to my code that look as
>> follows:
>> So there are comments, whitespace etc ... in it.
>> I would like to be able to put everything into some sort of structure
>
> Including the comments or without them? Note that ElementTree will
> ignore comments.
>
>
>> such that i can access it as:
>> structure['Algorithm']['Type'] == Alg1
>
Change that to
parser = objectify.makeparser(ns_clean=True, remove_comments=True)
> root = objectify.parse(inputfile,parser)
Change that to
root = objectify.parse(inputfile,parser).getroot()
Stefan
parser = objectify.makeparser(ns_clean=True, remove_comments=True)
root = objectify.parse(inputfile,parser).getroot()
print root.catalog.book.author.text
which still gives the following error:
AttributeError: no such child: catalog
matt
root == <catalog>
Try "print root.tag", and read the tutorial.
Stefan
How about skipping the whole xml thing? You can dynamically import any
python module, even if it does not have a python filename. I show an
example from my own code, slightly modified. I just hand the function a
filename, and it tries to import the file. If the input file now
contains variables like
algorithm = 'fast'
I can access the variables with
input = getinput('f.txt')
print input.algorithm.
Good luck,
Paul
+++++++++++++
import imp
def getinput(inputfilename):
"""Parse inputs to program from the given file."""
try:
# http://docs.python.org/library/imp.html#imp.load_source
parameters = imp.load_source("parameters", inputfilename)
except IOError as err:
print >>sys.stderr, '%s: %s' % (str(err), inputfilename)
print >>sys.stderr, 'The specified input file was not found -
exiting'
sys.exit(IO_ERROR)
# Verify presence of all required input parameters, see input.py
example file.
required_parameter_names = ['setting1', 'setting2', 'setting3']
msg = 'Required parameter name not found in input file: {0}'
for parameter_name in required_parameter_names:
assert hasattr(parameters, parameter_name),
msg.format(parameter_name)
return parameters
That sounds like ConfigParser actually suits your needs.
Stefan
Having tried XML for something similar, I would strongly advise against
it. It has been nothing but a nightmare.
XML is acceptable for machine to machine communication where the two
sides cannot agree a common
language in advance or they can't coordinate format changes. Even then
it is slow and verbose.
Use the config module if the configuration is simple to moderately complex.
Consider JSON or Python (source) if your requirements are really
complicated.
Regards
Ian
> How about skipping the whole xml thing? You can dynamically import any python module, even if it does not have a python filename.
Great example!
Can you do the same with a cStringIO based file that exists in memory
vs. on disk? Your example requires a physical file on disk. Is it
possible to accomplish the same using a string vs. a file or temporary
file?
Thank you,
Malcolm (not the OP)
Malcolm,
I never had the interest, so I did not try. Why don't you just try it?
Just insert your StringIO file into the function and see if it works, or
if it crashes and burns. Let me know what you find out!
Anyway, you could always just write a temporary file. It won't consume a
lot of resources, I imagine.
Cheers,
Paul
Again, thanks for the time and effort you put in to answer my questions
(and, in Stefan's case for writing tools and making them available to
everyone) and pointing me in the better direction.
matt
For that, you can provide an XSL stylesheet for the XML file which
will cause it to be displayed in a browser in a nice format.
John Nagle
Or rather CSS, which you will likely need anyway, even if you decide to use
an XSLT before hand. CSS is for layout (headings, tables, positioning
etc.), XSLT is only needed when you require major structural changes or
format conversions, which likely won't be necessary here.
Stefan