What YAML engine do you use?


Reinhold Birkenfeld

Jan 20, 2005, 12:09:13 PM
Hello,

I know that there are different YAML engines for Python out there (Syck,
PyYaml, more?).

Which one do you use, and why?

For those of you who don't know what YAML is: visit http://yaml.org/!
You will be amazed, and never think of XML again. Well, almost.

Reinhold

Diez B. Roggisch

Jan 20, 2005, 12:20:33 PM
> I know that there are different YAML engines for Python out there (Syck,
> PyYaml, more?).
>
> Which one do you use, and why?

I first used yaml, tried to migrate to syck. What I like about syck is that
it is faster and doesn't try to create objects but only dicts - but it
crashed if the number of yaml objects grew larger. So I still use yaml.

>
> For those of you who don't know what YAML is: visit http://yaml.org/!
> You will be amazed, and never think of XML again. Well, almost.

It is certainly nice.

--
Regards,

Diez B. Roggisch

Jonas Galvez

Jan 20, 2005, 12:24:32 PM
to pytho...@python.org
Diez B. Roggisch wrote:
> I first used yaml, tried to migrate to syck. What I like about
> syck is that it is faster and doesn't try to create objects but
> only dicts - but it crashed if the number of yaml objects grew
> larger. So I still use yaml.

Hmm.. I've never had any problems with syck. In fact, I'm using it in
a small project now where I store a helluva lot of data in yaml files...

Strange.

Istvan Albert

Jan 20, 2005, 12:34:33 PM
Reinhold Birkenfeld wrote:

> You will be amazed, and never think of XML again.

XML with elementtree is what makes me never have to think about XML again.

Istvan.

Irmen de Jong

Jan 20, 2005, 1:14:19 PM
Istvan Albert wrote:
> XML with elementtree is what makes me never have to think about XML again.

+1 QOTW

-Irmen

Paul Rubin

Jan 21, 2005, 1:23:24 AM
Reinhold Birkenfeld <reinhold-birk...@wolke7.net> writes:
> For those of you who don't know what YAML is: visit http://yaml.org/!
> You will be amazed, and never think of XML again. Well, almost.

Oh please no, not another one of these. We really really don't need it.

rm

Jan 21, 2005, 12:30:47 PM

Well, I did look at it, and as a text format it is more readable than XML.
Furthermore, XML's verbosity is incredible; YAML's is not.
People are abusing the genericity of XML to put everything into it.

Parsing and working with XML are highly optimized, so there's not really
a problem in that sector. But transferring the same data in a YAML
format, rather than an XML format, is much more economical. But networks
are getting faster, right?

Nowadays, people are trying to create binary XML, XML databases,
graphics in XML (btw, I'm quite impressed by SVG), you have XSLT, you
have XSL-FO, ... .

And I think, YAML is a nice initiative.

bye,
rm

Fredrik Lundh

Jan 21, 2005, 12:54:50 PM
to pytho...@python.org
"rm" wrote:

> well, I did look at it, and as a text format is more readable than XML is.

judging from http://yaml.org/spec/current.html (750k), the YAML designers are
clearly insane. that's the most absurd software specification I've ever seen. they
need help, not users.

</F>

A.M. Kuchling

Jan 21, 2005, 1:04:10 PM
On Fri, 21 Jan 2005 18:30:47 +0100,
rm <r...@rm.net> wrote:
> Nowadays, people are trying to create binary XML, XML databases,
> graphics in XML (btw, I'm quite impressed by SVG), you have XSLT, you
> have XSL-FO, ... .

Which is an argument in favor of XML -- it's where the activity is, so it's
quite likely you'll encounter the need to know XML. Few projects use YAML,
so the chance of having to know its syntactic details is small.

--amk

A.M. Kuchling

Jan 21, 2005, 2:20:03 PM
On Fri, 21 Jan 2005 18:54:50 +0100,
Fredrik Lundh <fre...@pythonware.com> wrote:
> judging from http://yaml.org/spec/current.html (750k), the YAML designers are
> clearly insane. that's the most absurd software specification I've ever seen. they
> need help, not users.

IMHO that's a bit extreme. Specifications are written to be detailed, so
consequently they're torture to read. Seen the ReStructured Text spec
lately?

The basic idea -- a data dumping format that's human-readable -- isn't a bad
one. OTOH, I can't recall wanting such a thing -- when I want readable
output I'm happy using unreadable pickle files, unpickling the object and
calling a .dump() or .as_text() method.

But YAML seems to have started out with the goal of being human-writable,
something you would write in Emacs, and that seems to have gotten lost; the
format is now just as complicated as Restructured Text, but more cryptic
(the URI namespacing for tags, for example), not really simpler than
XML and in some ways weaker (e.g. only two encodings supported, more
complicated escaping rules).

For a pure Python application, I can't see a need for YAML; use
pickle/cPickle instead, because they're already there. Exchanging
serialized objects between Python/Perl/Ruby scripts might be a good use case
for YAML, but XML has wider software support and S-expressions are simpler,
so my inclination would be to use them instead of YAML.
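
For reference, the pickle round trip mentioned above looks roughly like this in current Python, where cPickle has been folded into the pickle module. The settings dict is made up for illustration. Note that unpickling data from an untrusted source can execute arbitrary code, so this only suits files the application wrote itself.

```python
import pickle

# Hypothetical application state; any picklable object works.
settings = {"name": "demo", "retries": 3, "hosts": ["a.example", "b.example"]}

blob = pickle.dumps(settings)   # opaque bytes, not meant for human eyes
restored = pickle.loads(blob)   # only safe for data you produced yourself

print(restored == settings)
```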

--amk

Reinhold Birkenfeld

Jan 21, 2005, 2:26:36 PM
A.M. Kuchling wrote:
> On Fri, 21 Jan 2005 18:54:50 +0100,
> Fredrik Lundh <fre...@pythonware.com> wrote:
>> judging from http://yaml.org/spec/current.html (750k), the YAML designers are
>> clearly insane. that's the most absurd software specification I've ever seen. they
>> need help, not users.
>
> IMHO that's a bit extreme. Specifications are written to be detailed, so
> consequently they're torture to read. Seen the ReStructured Text spec
> lately?

Agreed. If you just want to use it, you don't need the spec anyway.

> The basic idea -- a data dumping format that's human-readable -- isn't a bad
> one. OTOH, I can't recall wanting such a thing -- when I want readable
> output I'm happy using unreadable pickle files, unpickling the object and
> calling a .dump() or .as_text() method.
>
> But YAML seems to have started out with the goal of being human-writable,
> something you would write in Emacs,

Exactly. I use it as a format for config files the user can edit
directly without much thinking (the explanation at the top of the file is 3
lines).

> and that seems to have gotten lost; the
> format is now just as complicated as Restructured Text, but more cryptic
> (the URI namespacing for tags, for example), not really simpler than
> XML and in some ways weaker (e.g. only two encodings supported, more
> complicated escaping rules).

In most cases you don't need the complicated things, and the
http://www.yaml.org/refcard.html isn't very complex either.

Reinhold

Fredrik Lundh

Jan 21, 2005, 3:59:57 PM
to pytho...@python.org
A.M. Kuchling wrote:

> IMHO that's a bit extreme. Specifications are written to be detailed, so
> consequently they're torture to read. Seen the ReStructured Text spec
> lately?

I've read many specs; YAML (both the spec and the format) is easily
among the worst ten-or-so specs I've ever seen.

ReST and YAML share the same deep flaw: both formats are marketed
as simple, readable formats, and at first glance, they look simple and
readable -- but in reality, they're messy as hell, and chances are that the
thing you're looking at doesn't really mean what you think it means (unless
you're the official ReST/YAML parser implementation). experienced designers
know how to avoid that; the ReST/YAML designers don't even understand
why they should.

> But YAML seems to have started out with the goal of being human-writable,
> something you would write in Emacs, and that seems to have gotten lost; the
> format is now just as complicated as Restructured Text, but more cryptic
> (the URI namespacing for tags, for example), not really simpler than
> XML and in some ways weaker (e.g. only two encodings supported, more
> complicated escaping rules).

http://www.modelsmodelsmodels.biz/images/hmo033.jpg

</F>

Daniel Bickett

Jan 21, 2005, 4:20:54 PM
to pytho...@python.org
Istvan Albert wrote:
> XML with elementtree is what makes me never have to think about XML again.

I second that. I heard about yaml and I read into it, but when I tried
to use it I didn't seem to get in touch with all of the glory
surrounding it. The yaml module -- when I tried to use it -- was very
error prone, and simply didn't work. I didn't have the time to go
through and try to tweak it because I was pressed for time and needed a
quick solution. As for syck, I don't know if it was just me, but when
I downloaded it I got a whole lot of directories with obscure names
and files with .c extensions. So, discouraged, I gave up on yaml.

Elementtree, on the other hand, is wonderful :)

Irmen de Jong wrote:
> +1 QOTW

I second that, as well.

here's-to-appreciating-the-end-without-having-to-be-interested-in-the-means-ly
y'rs
Daniel Bickett

Bengt Richter

Jan 21, 2005, 8:29:09 PM

<rant>
I thought XML was a good idea, but IMO requiring quotes around
even integer attribute values was an unfortunate decision. I don't buy
their rationale of keeping parsing simple -- as if extracting a string
with no embedded space from between an equal sign and terminating white
space were that much harder than extracting the same delimited by double quotes.
The result is cluttering SVG with needless cruft around numerical graphics parameters.
</rant>

OTOH, I think the HTML XML spec is very readable, and nicely designed.
At least the version 1.0 spec I snagged from W3C a long time ago.
... I see the third edition at http://www.w3.org/TR/REC-xml/ is differently styled,
(I guess new style sheets) but still pretty readable (glancing at it now).

Regards,
Bengt Richter

Peter Hansen

Jan 21, 2005, 8:36:33 PM
A.M. Kuchling wrote:
> On Fri, 21 Jan 2005 18:54:50 +0100,
> Fredrik Lundh <fre...@pythonware.com> wrote:
>>judging from http://yaml.org/spec/current.html (750k), the YAML designers are
>>clearly insane. that's the most absurd software specification I've ever seen. they
>>need help, not users.
>
> IMHO that's a bit extreme. Specifications are written to be detailed, so
> consequently they're torture to read. Seen the ReStructured Text spec
> lately?
[...]
> But YAML ... the format is now ... not really simpler than
> XML and in some ways weaker (e.g. only two encodings supported, more
> complicated escaping rules).

As I recall, one of the key original goals for XML was that the
parsers be relatively easy to write (relative to SGML).

Judging by that YAML spec, I can imagine that a YAML parser could
well be much more difficult to write than an XML parser would be.

Anyone have personal experience with this?

(Yes, I know people don't write parsers as often as they use
them, and that's probably some of the justification behind YAML,
but looking at that YAML spec, I find it hard to imagine I could
ever remember enough of it to write a YAML file by hand, and
yet I can and do write XML files by hand often.)

-Peter

Fredrik Lundh

Jan 22, 2005, 8:49:59 AM
to pytho...@python.org
Reinhold Birkenfeld wrote:

> Agreed. If you just want to use it, you don't need the spec anyway.

but the guy who wrote the parser you're using had to read it, and understand it.
judging from the number of crash reports you see in this thread, chances are that
he didn't.

</F>

Steve Holden

Jan 22, 2005, 10:34:08 AM
Bengt Richter wrote:

> On Fri, 21 Jan 2005 12:04:10 -0600, "A.M. Kuchling" <a...@amk.ca> wrote:
>
>
>>On Fri, 21 Jan 2005 18:30:47 +0100,
>> rm <r...@rm.net> wrote:
>>
>>>Nowadays, people are trying to create binary XML, XML databases,
>>>graphics in XML (btw, I'm quite impressed by SVG), you have XSLT, you
>>>have XSL-FO, ... .
>>
>>Which is an argument in favor of XML -- it's where the activity is, so it's
>>quite likely you'll encounter the need to know XML. Few projects use YAML,
>>so the chance of having to know its syntactic details is small.
>>
>
> <rant>
> I thought XML was a good idea, but IMO requiring quotes around
> even integer attribute values was an unfortunate decision. I don't buy
> their rationale of keeping parsing simple -- as if extracting a string
> with no embedded space from between an equal sign and terminating white
> space were that much harder than extracting the same delimited by double quotes.

It isn't that much harder, but if there are two ways to do the same
thing then effectively one of them has to become a special case, thereby
complicating the code that has to handle it (in this case the parser).

"There should be one (and preferably only one) ..." should be a familiar
mantra around here :-)

> The result is cluttering SVG with needless cruft around numerical graphics parameters.
> </rant>
>

It seems to me the misunderstanding here is that XML was ever intended
to be generated directly by typing in a text editor. It was rather
intended (unless I'm mistaken) as a process-to-process data interchange
metalanguage that would be *human_readable*.

Tools that *create* XML are perfectly at liberty not to require quotes
around integer values.
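
As a small illustration of that point, here is a sketch using the standard library's xml.etree.ElementTree (the elementtree package praised in this thread, as later bundled with Python). The helper name element_with_numbers is made up; it lets the caller pass plain integers and does the string conversion itself, so the quotes only appear in the serialized output.

```python
import xml.etree.ElementTree as ET


def element_with_numbers(tag, **attrs):
    """Hypothetical helper: accept ints/floats, store them as strings."""
    return ET.Element(tag, {name: str(value) for name, value in attrs.items()})


circle = element_with_numbers("circle", cx=10, cy=20, r=5)
text = ET.tostring(circle, encoding="unicode")
print(text)  # the serializer, not the author, adds the quotes

# Reading it back gives strings; converting to int is the reader's job.
parsed = ET.fromstring(text)
radius = int(parsed.get("r"))
```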

> OTOH, I think the HTML XML spec is very readable, and nicely designed.
> At least the version 1.0 spec I snagged from W3C a long time ago.
> .... I see the third edition at http://www.w3.org/TR/REC-xml/ is differently styled,
> (I guess new style sheets) but still pretty readable (glancing at it now).
>
> Regards,
> Bengt Richter

regards
Steve
--
Steve Holden http://www.holdenweb.com/
Python Web Programming http://pydish.holdenweb.com/
Holden Web LLC +1 703 861 4237 +1 800 494 3119

Doug Holton

Jan 22, 2005, 2:18:57 PM
Fredrik Lundh wrote:

> A.M. Kuchling wrote:
>
>
>>IMHO that's a bit extreme. Specifications are written to be detailed, so
>>consequently they're torture to read. Seen the ReStructured Text spec
>>lately?
>
>
> I've read many specs; YAML (both the spec and the format) is easily
> among the worst ten-or-so specs I've ever seen.

What do you expect? YAML is designed for humans to use, XML is not.
YAML also hasn't had the backing and huge community behind it like XML.
XML sucks for people to have to write in, but is straightforward to
parse. The consequence is hordes of invalid XML files, leading to
necessary hacks like Mark Pilgrim's universal RSS parser. YAML
flips the problem around, making it harder perhaps to implement a
universal parser, but better for the end-user who has to actually use
it. More people need to work on improving the YAML spec and
implementing better YAML parsers. We've got too many XML parsers as it is.

rm

Jan 22, 2005, 2:31:34 PM
Doug Holton wrote:

> What do you expect? YAML is designed for humans to use, XML is not.
> YAML also hasn't had the backing and huge community behind it like XML.
> XML sucks for people to have to write in, but is straightforward to
> parse. The consequence is hordes of invalid XML files, leading to
> necessary hacks like Mark Pilgrim's universal RSS parser. YAML
> flips the problem around, making it harder perhaps to implement a
> universal parser, but better for the end-user who has to actually use
> it. More people need to work on improving the YAML spec and
> implementing better YAML parsers. We've got too many XML parsers as it is.

100% right on, stuff (like this)? should be easy on the users, and if
possible, on the developers, not the other way around. But developers
come second. Now, I didn't check the specs, they might be difficult,
they might be incorrect, maybe their stated goal is not reached with
this implementation of their idea. But I'd love to see a generic,
pythonic data format.

bye,
rm

Fredrik Lundh

Jan 22, 2005, 2:53:17 PM
to pytho...@python.org
"rm" <r...@rm.net> wrote:

> 100% right on, stuff (like this)? should be easy on the users, and if possible, on the developers,
> not the other way around.

I guess you both stopped reading before you got to the second paragraph
in my post. YAML (at least the version described in that spec) isn't easy on
users; it may look that way at a first glance, and as long as you stick to a
small subset, but it really isn't. that's not just bad design, that's plain evil.

and trust me, when things are hard to get right for developers, users will
suffer too.

</F>

rm

Jan 22, 2005, 3:00:04 PM
you stopped reading too early as well, I guess:

"maybe their stated goal is not reached with this implementation of
their idea"

and the implementation being the spec,

furthermore, "users will suffer too", I'm suffering if I have to use
C++, with all its exceptions and special cases.

BTW, I pickpocketed the idea that if there is a choice where to put the
complexity, you never put it with the user. "pickpocket" is strong, I've
learned it from an analyst who was 30 years in the business, and I
really respect the guy, basically he was always right and right on. On
the other hand, the management did not always like what he thought :-)

bye,
rm

Doug Holton

Jan 22, 2005, 3:14:08 PM
Fredrik Lundh wrote:
> and trust me, when things are hard to get right for developers, users will
> suffer too.

That is exactly why YAML can be improved. But XML proves that getting
it "right" for developers has little to do with getting it right for
users (or for saving bandwidth). What's right for developers is what
requires the least amount of work. The problem is, that's what is right
for end-users, too.

Doug Holton

Jan 22, 2005, 3:18:10 PM
rm wrote:
> this implementation of their idea. But I'd love to see a generic,
> pythonic data format.

That's a good idea. But really Python is already close to that. A lot
of times it is easier to just write out a python dictionary than using a
DB or XML or whatever. Python is already close to YAML in some ways.
Maybe even better than YAML, especially if Fredrik's claims of YAML's
inherent unreliability are to be believed. Of course he develops a
competing XML product, so who knows.

Fredrik Lundh

Jan 22, 2005, 3:35:11 PM
to pytho...@python.org
"rm" <r...@rm.net> wrote:

> furthermore, "users will suffer too", I'm suffering if I have to use C++, with all its exceptions
> and special cases.

and when you suffer, your users will suffer. in the C++ case, they're likely to
suffer from spurious program crashes, massively delayed development projects,
obscure security holes, etc.

</F>

Daniel Bickett

Jan 22, 2005, 3:41:10 PM
to pytho...@python.org
Doug Holton wrote:
> What do you expect? YAML is designed for humans to use, XML is not.
> YAML also hasn't had the backing and huge community behind it like XML.
> XML sucks for people to have to write in, but is straightforward to
> parse. The consequence is hordes of invalid XML files, leading to
> necessary hacks like Mark Pilgrim's universal RSS parser. YAML
> flips the problem around, making it harder perhaps to implement a
> universal parser, but better for the end-user who has to actually use
> it. More people need to work on improving the YAML spec and
> implementing better YAML parsers. We've got too many XML parsers as it is.

However, one of the main reasons that XML is so successful is that
its roots are shared by (or, perhaps, in) a markup language that a
vast majority of the Internet community knows: HTML.

In its most basic form, I don't care what anyone says, XML is VERY
straightforward. Throughout the entire concept of XML (again, in its
most basic form) the idea of opening and closing tags (with the
exception of the standalone tags, however still very simple) is
constant, for all different data types.

In my (brief) experience with YAML, it seemed like there were several
different ways of doing things, and I saw this as one of its failures
(since we're all comparing it to XML). However I maintain, in spite of
all of that, that it can easily boil down to the fact that, for
someone who knows the most minuscule amount of HTML (a very easy thing
to do, not to mention most people have a tiny bit of experience to
boot), the transition to XML is painless. YAML, however, is a brand
new format with brand new semantics.

As for the human read-and-write-ability, I don't know about you, but I
have no trouble whatsoever reading and writing XML. But alas, I don't
need to. Long live elementtree (once again) :-)

Daniel Bickett

Paul Rubin

Jan 22, 2005, 4:25:10 PM
Daniel Bickett <dbic...@gmail.com> writes:
> In my (brief) experience with YAML, it seemed like there were several
> different ways of doing things, and I saw this as one of its failures
> (since we're all comparing it to XML).

YAML looks to me to be completely insane, even compared to Python
lists. I think it would be great if the Python library exposed an
interface for parsing constant list and dict expressions, e.g.:

[1, 2, 'Joe Smith', 8237972883334L, # comment
{'Favorite fruits': ['apple', 'banana', 'pear']}, # another comment
'xyzzy', [3, 5, [3.14159, 2.71828, []]]]

I don't see what YAML accomplishes that something like the above wouldn't.

Note that all the values in the above have to be constant literals.
Don't suggest using eval. That would be a huge security hole.
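
Later Python versions ship something close to this in the standard library: ast.literal_eval accepts only literals and containers of literals. A minimal sketch using the example above (the L suffix is dropped because Python 3 has no separate long type):

```python
import ast

source = """[1, 2, 'Joe Smith', 8237972883334, # comment
 {'Favorite fruits': ['apple', 'banana', 'pear']}, # another comment
 'xyzzy', [3, 5, [3.14159, 2.71828, []]]]"""

data = ast.literal_eval(source)
print(data[4]["Favorite fruits"])

# Anything that is not a plain literal is rejected instead of being run.
rejected = False
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError:
    rejected = True
print("rejected:", rejected)
```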

Stephen Waterbury

Jan 22, 2005, 4:40:16 PM
to pytho...@python.org
Steve Holden wrote:
> It seems to me the misunderstanding here is that XML was ever intended
> to be generated directly by typing in a text editor. It was rather
> intended (unless I'm mistaken) as a process-to-process data interchange
> metalanguage that would be *human_readable*.

The premise that XML had a coherent design intent
stretches my credulity beyond its elastic limit.

rm

Jan 22, 2005, 4:43:24 PM

true, it's easy enough to separate the data from the functionality in
python by putting the data in a dictionary/list/tuple, but it stays
source code.

rm

Fredrik Lundh

Jan 22, 2005, 4:56:42 PM
to pytho...@python.org
Stephen Waterbury wrote:

> The premise that XML had a coherent design intent
> stretches my credulity beyond its elastic limit.

the design goals are listed in section 1.1 of the specification.

see tim bray's annotated spec for additional comments by one
of the team members:

http://www.xml.com/axml/testaxml.htm

(make sure to click on all (H)'s and (U)'s in that section for the
full story).

if you think that the XML 1.0 team didn't know what they were
doing, you're seriously mistaken. it's the post-1.0 standards that
are problematic...

</F>

Alex Martelli

Jan 22, 2005, 5:00:35 PM
Paul Rubin <http://phr...@NOSPAM.invalid> wrote:
...

> lists. I think it would be great if the Python library exposed an
> interface for parsing constant list and dict expressions, e.g.:
>
> [1, 2, 'Joe Smith', 8237972883334L, # comment
> {'Favorite fruits': ['apple', 'banana', 'pear']}, # another comment
> 'xyzzy', [3, 5, [3.14159, 2.71828, []]]]
>
> I don't see what YAML accomplishes that something like the above wouldn't.
>
> Note that all the values in the above have to be constant literals.
> Don't suggest using eval. That would be a huge security hole.

I do like the idea of a parser that's restricted to "safe expressions"
in this way. Once the AST branch merge is done, it seems to me that
implementing it should be a reasonably simple exercise, at least at a
"toy level".

I wonder, however, if, as an even "toyer" exercise, one might not
already do it easily -- by first checking each token (as generated by
tokenize.generate_tokens) to ensure it's safe, and THEN eval _iff_ no
unsafe tokens were found in the check. Accepting just square brackets,
braces, commas, constant strings and numbers, and comments, should be
pretty safe -- we'd no doubt want to also accept minus (for unary
minus), plus (to make complex numbers), and specifically None, True,
False -- but that, it appears to me, still leaves little margin for an
attacker to prepare an evil string that does bad things when eval'd...
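
A toy version of that token check, written against Python 3's tokenize module. The names SAFE_OPS, SAFE_NAMES, SAFE_PREFIXES and guarded_eval are made up for this sketch. One wrinkle that postdates this thread: on interpreters that tokenize f-strings as ordinary STRING tokens, a string literal can itself contain code, so the sketch also rejects any string prefix it does not recognize. Treat it as an illustration of the idea, not a vetted sandbox.

```python
import io
import tokenize

SAFE_OPS = {"[", "]", "{", "}", "(", ")", ",", ":", "-", "+"}
SAFE_NAMES = {"None", "True", "False"}
SAFE_PREFIXES = {"", "r", "u", "b", "br", "rb"}   # no f-strings
SKIPPED = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE, tokenize.ENDMARKER}


def string_prefix(text):
    """Return the lowercased letters that precede the opening quote."""
    for i, ch in enumerate(text):
        if ch in "'\"":
            return text[:i].lower()
    return text.lower()


def check_tokens(source):
    """Raise ValueError on the first token outside the whitelist."""
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type in SKIPPED or tok.type == tokenize.NUMBER:
            continue
        if tok.type == tokenize.STRING and string_prefix(tok.string) in SAFE_PREFIXES:
            continue
        if tok.type == tokenize.OP and tok.string in SAFE_OPS:
            continue
        if tok.type == tokenize.NAME and tok.string in SAFE_NAMES:
            continue
        raise ValueError("unsafe token: %r" % (tok.string,))


def guarded_eval(source):
    """Check every token first, and only then hand the text to eval."""
    check_tokens(source)
    return eval(source, {"__builtins__": {}}, {})


print(guarded_eval("[1, -2, 'a', {'k': (None, True)}]  # note"))
```

The whitelist has to be re-audited whenever the language grows new token kinds, which is the weak point of this approach.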


Alex


Michael Spencer

Jan 22, 2005, 5:06:22 PM
to pytho...@python.org
Paul Rubin wrote:

> YAML looks to me to be completely insane, even compared to Python
> lists. I think it would be great if the Python library exposed an
> interface for parsing constant list and dict expressions, e.g.:
>
> [1, 2, 'Joe Smith', 8237972883334L, # comment
> {'Favorite fruits': ['apple', 'banana', 'pear']}, # another comment
> 'xyzzy', [3, 5, [3.14159, 2.71828, []]]]
>
> I don't see what YAML accomplishes that something like the above wouldn't.
>
> Note that all the values in the above have to be constant literals.
> Don't suggest using eval. That would be a huge security hole.

Not hard at all, thanks to compiler.ast:

>>> import compiler
...
>>> class AbstractVisitor(object):
...     def __init__(self):
...         self._cache = {} # dispatch table
...
...     def visit(self, node,**kw):
...         cls = node.__class__
...         meth = self._cache.setdefault(cls,
...             getattr(self,'visit'+cls.__name__,self.default))
...         return meth(node, **kw)
...
...     def default(self, node, **kw):
...         for child in node.getChildNodes():
...             return self.visit(child, **kw)
...
>>> class ConstEval(AbstractVisitor):
...     def visitConst(self, node, **kw):
...         return node.value
...
...     def visitName(self,node, **kw):
...         raise NameError, "Names are not resolved"
...
...     def visitDict(self,node,**kw):
...         return dict([(self.visit(k),self.visit(v)) for k,v in node.items])
...
...     def visitTuple(self,node, **kw):
...         return tuple(self.visit(i) for i in node.nodes)
...
...     def visitList(self,node, **kw):
...         return [self.visit(i) for i in node.nodes]
...
>>> ast = compiler.parse(source,"eval")
>>> walker = ConstEval()
>>> walker.visit(ast)
[1, 2, 'Joe Smith', 8237972883334L, {'Favorite fruits': ['apple', 'banana',
'pear']}, 'xyzzy', [3, 5, [3.1415899999999999, 2.71828, []]]]
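
The compiler package used above was removed in Python 3. A rough port of the same visitor idea to the ast module might look like the following, assuming Python 3.8 or later (where every literal parses to an ast.Constant node). It is a sketch rather than a drop-in replacement: negative numbers, for instance, parse as a unary operator and are rejected here.

```python
import ast


class ConstEval(ast.NodeVisitor):
    """Build a value from literal syntax only; refuse everything else."""

    def visit_Expression(self, node):
        return self.visit(node.body)

    def visit_Constant(self, node):
        return node.value

    def visit_List(self, node):
        return [self.visit(item) for item in node.elts]

    def visit_Tuple(self, node):
        return tuple(self.visit(item) for item in node.elts)

    def visit_Dict(self, node):
        return {self.visit(k): self.visit(v)
                for k, v in zip(node.keys, node.values)}

    def generic_visit(self, node):
        # Names, calls, operators and so on all end up here.
        raise ValueError("unsupported syntax: " + type(node).__name__)


source = """[1, 2, 'Joe Smith', 8237972883334, # comment
 {'Favorite fruits': ['apple', 'banana', 'pear']}, # another comment
 'xyzzy', [3, 5, [3.14159, 2.71828, []]]]"""

result = ConstEval().visit(ast.parse(source, mode="eval"))
print(result)
```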

Add sugar to taste

Regards

Michael

Fredrik Lundh

Jan 22, 2005, 5:13:50 PM
to pytho...@python.org
Alex Martelli wrote:

>> [1, 2, 'Joe Smith', 8237972883334L, # comment
>> {'Favorite fruits': ['apple', 'banana', 'pear']}, # another comment
>> 'xyzzy', [3, 5, [3.14159, 2.71828, []]]]
>>
>> I don't see what YAML accomplishes that something like the above wouldn't.
>>
>> Note that all the values in the above have to be constant literals.
>> Don't suggest using eval. That would be a huge security hole.
>
> I do like the idea of a parser that's restricted to "safe expressions"
> in this way. Once the AST branch merge is done, it seems to me that
> implementing it should be a reasonably simple exercise, at least at a
> "toy level".

for slightly more interop, you could plug in a modified tokenizer, and use
JSON:

http://www.crockford.com/JSON/xml.html
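
For what it's worth, a JSON reader has since been added to Python's standard library as the json module, so the interop route needs no custom tokenizer today. A minimal sketch, reusing the sample data from this thread with JSON's double-quoted strings:

```python
import json

text = """[1, 2, "Joe Smith", 8237972883334,
 {"Favorite fruits": ["apple", "banana", "pear"]},
 "xyzzy", [3, 5, [3.14159, 2.71828, []]]]"""

data = json.loads(text)   # parses data only; no code is evaluated
print(data[4]["Favorite fruits"])

# Unlike the Python-literal form, JSON has no comment syntax and no tuples.
round_trip = json.loads(json.dumps(data))
```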

> I wonder, however, if, as an even "toyer" exercise, one might not
> already do it easily -- by first checking each token (as generated by
> tokenize.generate_tokens) to ensure it's safe, and THEN eval _iff_ no
> unsafe tokens were found in the check. Accepting just square brackets,
> braces, commas, constant strings and numbers, and comments, should be
> pretty safe -- we'd no doubt want to also accept minus (for unary
> minus), plus (to make complex numbers), and specifically None, True,
> False

or you could use a RE to make sure the string only contains safe literals,
and pass the result to eval.

> but that, it appears to me, still leaves little margin for an attacker to prepare
> an evil string that does bad things when eval'd...

besides running out of parsing time or object memory, of course. unless
you check the size before/during the parse.
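
A sketch of the size check mentioned here, assuming a made-up limit MAX_CHARS and the standard library's ast.literal_eval as the parser. It only bounds the input length; it says nothing about nesting depth or parse time.

```python
import ast

MAX_CHARS = 10_000  # arbitrary limit chosen for this sketch


def parse_bounded(text, limit=MAX_CHARS):
    """Refuse oversized input before handing it to the parser."""
    if len(text) > limit:
        raise ValueError("input too large: %d characters" % len(text))
    return ast.literal_eval(text)


print(parse_bounded("{'a': [1, 2, 3]}"))
```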

</F>

Paul Rubin

Jan 22, 2005, 5:34:15 PM
ale...@yahoo.com (Alex Martelli) writes:
> I wonder, however, if, as an even "toyer" exercise, one might not
> already do it easily -- by first checking each token (as generated by
> tokenize.generate_tokens) to ensure it's safe, and THEN eval _iff_ no
> unsafe tokens were found in the check.

I don't trust that for one minute. It's like checking a gun to make
sure that it has no bullets, then putting it to your head and pulling
the trigger. Or worse, it's like checking the gun once, then putting
it to your head and pulling the trigger every day for the next N years
without checking again to see if someone has inserted some bullets
(this is what you basically do if you write your program to check if
the tokens are safe, and then let users keep running it without
re-auditing it, as newer versions of Python get released).

See the history of the pickle module to see how that kind of change
has already screwed people (some comments in SF bug #467384). "Don't
use eval" doesn't mean "check if it's safe before using it". It
means "don't use it".

Tim Parkin

Jan 22, 2005, 6:43:46 PM
to Doug Holton, pytho...@python.org
Doug Holton wrote:
> That is exactly why YAML can be improved. But XML proves that getting
> it "right" for developers has little to do with getting it right for
> users (or for saving bandwidth). What's right for developers is what
> requires the least amount of work. The problem is, that's what is right
> for end-users, too.

Having spent some time with YAML and its implementations (at least
pyyaml and the ruby/python versions of syck), I thought I should
comment. The only problems with syck we've encountered have been to do
with the python wrapper rather than syck itself. Syck seems to be used
widely without problems within the Ruby community and if anybody has
evidence of issues with it I'd really like to know about them. PyYAML is
a little inactive and doesn't conform to the spec in many ways and, as
such, we prefer the syck implementation.

In my opinion there have been some bad decisions made whilst creating
YAML, but for me they are acceptable given the advantages of a data
format that is simple to read and write. Perhaps judging the utility of
a project on its documentation is one of the problems, as most people
who have 'just used it' seem to be happy enough. These people include
non-technical clients of ours who manage some of their websites by
editing YAML files directly. That said, I don't think it would be the
best way to enter data for a life support machine, but I wouldn't like
to do that with XML either ;-)

One thing that should be pointed out is that there are no parsers
available that are built directly on the YAML pseudo BNF. Such work is
in progress in two different forms but don't expect anything soon. As I
understand it, Syck has been built to pass tests rather than conform to
a constantly changing BNF and it seems to have few warts.

Tim


Stephen Waterbury

Jan 22, 2005, 7:14:46 PM
to pytho...@python.org
Fredrik Lundh wrote:
> Stephen Waterbury wrote:
>>The premise that XML had a coherent design intent
>>stretches my credulity beyond its elastic limit.
>
> the design goals are listed in section 1.1 of the specification.
>
> see tim bray's annotated spec for additional comments by one
> of the team members:
>
> http://www.xml.com/axml/testaxml.htm
>
> (make sure to click on all (H)'s and (U)'s in that section for the
> full story).

Thanks, Fredrik, I hadn't seen that. My credulity has been restored
to its original shape. Whatever that was. :)

However, now that I have direct access to the documented design
goals (intent) of XML, it's interesting to note that the intent
Steve Holden imputed to it earlier is not explicitly among them:

Steve Holden wrote:
> It seems to me the misunderstanding here is that XML was ever intended
> to be generated directly by typing in a text editor. It was rather
> intended (unless I'm mistaken) as a process-to-process data interchange
> metalanguage that would be *human_readable*.

Not unless you interpret "XML shall support a wide variety of applications"
as "XML shall provide a process-to-process data interchange metalanguage".
It might have been a hidden agenda, but it certainly was not an
explicit design goal.

(The "human-readable" part is definitely there:
"6. XML documents should be human-legible and reasonably clear",
and Steve was also correct that generating XML directly by typing
in a text editor was definitely *not* a design intent. ;)

> if you think that the XML 1.0 team didn't know what they were
> doing, you're seriously mistaken. it's the post-1.0 standards that
> are problematic...

Agreed. And many XML-based standards.

- Steve

Peter Hansen

Jan 22, 2005, 8:18:56 PM
to
Stephen Waterbury wrote:
> it's interesting to note that the intent
> Steve Holden imputed to it earlier is not explicitly among them:
>
> Steve Holden wrote:
>
>> It seems to me the misunderstanding here is that XML was ever intended
>> to be generated directly by typing in a text editor. It was rather
>> intended (unless I'm mistaken) as a process-to-process data
>> interchange metalanguage that would be *human_readable*.
>
> Not unless you interpret "XML shall support a wide variety of applications"
> as "XML shall provide a process-to-process data interchange metalanguage".
> It might have been a hidden agenda, but it certainly was not an
> explicit design goal.

If merely thinking about the purpose of XML doesn't make it
clear where Steve got that idea, read up a little bit more in
the spec to the very first paragraph in the Introduction, and
click on the little M-in-a-circle next to the phrase "data objects".
I'll even quote it here for you, to save time:

"""What Do You Mean By "Data Object?"

Good question. The point is that an XML document is sometimes
a file, sometimes a record in a relational database, sometimes an
object delivered by an Object Request Broker, and sometimes a
stream of bytes arriving at a network socket.

These can all be described as "data objects".
"""

I would ask what part of that, or of the simple phrase
"data object", or even of the basic concept of a markup language,
doesn't cry out "data interchange metalanguage" to you?

-Peter

Stephen Waterbury

Jan 22, 2005, 9:37:41 PM
to pytho...@python.org
Peter Hansen wrote:
> If merely thinking about the purpose of XML doesn't make it
> clear where Steve got that idea ...

I meant no disparagement of Steve, and it is quite clear
where he got that (correct!) idea ...

It's also clear that the XML user community sees
that as part of *their* purpose in applying XML.
But here we are talking about intent of its designers,
and "merely thinking about the purpose of XML" won't
enable me to read their minds. ;)

> read up a little bit more in
> the spec [... in which it is stated rather explicitly!]
>
> I would ask what part of that, or of the simple phrase
> "data object", or even of the basic concept of a markup language,
> doesn't cry out "data interchange metalanguage" to you?

It does indeed -- my apologies for not reading the annotations
more carefully! I missed that one in particular. Okay, you've
dragged me, kicking and screaming, to agree that the actual,
published design intent of XML is to provide a "data
interchange metalanguage".

Thanks to Fredrik for the link he included (elsewhere
in the "YAML" thread) to JavaScript Object Notation (JSON).
JSON looks like a notable improvement over XML for data
objects that are more fine-grained (higher ratio of markup to
non-markup -- e.g., most relational data sets, RDF, etc.)
than those at the more traditional "document" end of the
spectrum (less markup, more text).

The latter types of data objects are the ones I happen to believe
are in the sweet spot of XML's design, regardless of its designers'
more sweeping pronouncements (and hopes, no doubt).

I should note that I have to deal with XML a lot, but always
kicking and screaming (though much less now because of Fredrik's
Elementtree package ;). Thanks, Fredrik and Peter, for the
references. ;)

Peace.
Steve

Leif K-Brooks

Jan 22, 2005, 11:00:06 PM
to
Bengt Richter wrote:
> I thought XML was a good idea, but IMO requiring quotes around
> even integer attribute values was an unfortunate decision.

I think it helps guard against incompetent authors who wouldn't
understand when they're required to use quotes and when they're not. I
see HTML pages all of the time where the author's done something like:

<img src=http://example.com/foo/bar/baz/spam/>

Sometimes it even has spaces in it. At least with a proper XML parser,
they would know where they went wrong right away.
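
A minimal sketch of that difference, assuming Python 3 and the standard library's xml.etree.ElementTree (my choice of parser for illustration; the post names none, and the helper name parses_as_xml is made up). The unquoted attribute value is rejected at parse time instead of being guessed at:

```python
import xml.etree.ElementTree as ET


def parses_as_xml(markup):
    """Return True if markup is well-formed XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Unquoted attribute value: not well-formed, so the parser refuses it.
unquoted = "<img src=http://example.com/foo/bar/baz/spam/>"
# Quoted value on a self-closing tag: well-formed.
quoted = '<img src="http://example.com/foo/bar/baz/spam/" />'

print(parses_as_xml(unquoted))
print(parses_as_xml(quoted))
```

A lenient HTML parser would typically try to recover from the first form, which is how such pages survive unnoticed.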

Steve Holden

Jan 22, 2005, 11:04:28 PM
to
Doug Holton wrote:

Yet again I will interject that XML was only ever intended to be written
by programs. Hence its moronic stupidity and excellent uniformity.

Paul Rubin

Jan 22, 2005, 11:08:56 PM
to
Stephen Waterbury <go...@comcast.net> writes:
> I should note that I have to deal with XML a lot, but always
> kicking and screaming (though much less now because of Fredrik's
> Elementtree package ;). Thanks, Fredrik and Peter, for the
> references. ;)

I love this old rant about XML:

http://groups-beta.google.com/group/comp.lang.lisp/msg/9a30c508201627ee

Doug Holton

Jan 23, 2005, 2:27:36 AM
to
Peter Hansen wrote:
> Good question. The point is that an XML document is sometimes
> a file, sometimes a record in a relational database, sometimes an
> object delivered by an Object Request Broker, and sometimes a
> stream of bytes arriving at a network socket.
>
> These can all be described as "data objects".
> """
>
> I would ask what part of that, or of the simple phrase
> "data object", or even of the basic concept of a markup language,
> doesn't cry out "data interchange metalanguage" to you?

Actually I don't see any explicit mention that XML was meant to be
limited to data interchange only.
"data object" has to do with more than data interchange. There is data
entry as well. And people are having to hand enter XML files all the
time for things like Ant, XHTML, etc.

I guess all those people who learned how to write web pages by hand were
violating some spec and so they have no cause to complain about any
difficulties doing so. Tim Berners-Lee never intended people to have to
type in URLs, either, but here we are.

Doug Holton

Jan 23, 2005, 2:27:38 AM
to
Steve Holden wrote:

> Yet again I will interject that XML was only ever intended to be written
> by programs. Hence its moronic stupidity and excellent uniformity.

Neither was HTML, neither were URLs, neither were many things used the
way they were intended. YAML, however, is specifically designed to be
easier for people to write and to read, as is Python.

Doug Holton

Jan 23, 2005, 2:33:35 AM
to
Daniel Bickett wrote:
> In my (brief) experience with YAML, it seemed like there were several
> different ways of doing things, and I saw this as one of its failures
> (since we're all comparing it to XML). However I maintain, in spite of
> all of that, that it can easily boil down to the fact that, for
> someone who knows the most minuscule amount of HTML (a very easy thing
> to do, not to mention most people have a tiny bit of experience to
> boot), the transition to XML is painless. YAML, however, is a brand
> new format with brand new semantics.

That's true and a very good point. Like you said, that's probably the
reason XML took off, because of our familiarity with HTML.

> As for the human read-and-write-ability, I don't know about you, but I
> have no trouble whatsoever reading and writing XML.

You might like programming in XML then: http://www.meta-language.net/
:)

Doug Holton

Jan 23, 2005, 2:38:32 AM
to
> You might like programming in XML then: http://www.meta-language.net/

Actually, the samples are hard to find, they are here:
http://www.meta-language.net/sample.html

Programming in XML makes Perl and PHP look like the cleanest languages
ever invented.

Stephen Waterbury

Jan 23, 2005, 3:52:14 AM
to pytho...@python.org

Yep, Erik Naggum is one of my heroes for that! :)

Alan Kennedy

Jan 23, 2005, 7:23:27 AM
to
[Effbot]
> ReST and YAML share the same deep flaw: both formats are marketed
> as simple, readable formats, and at a first glance, they look simple and read-
> able -- but in reality, they're messy as hell, and chances are that the thing
> you're looking at doesn't really mean what you think it means (unless you're
> the official ReST/YAML parser implementation). experienced designers
> know how to avoid that; the ReST/YAML designers don't even understand
> why they should.

I'm looking for a good textual markup language at the moment, for
capturing web and similar textual content.

I don't want to use XML for this particular usage, because this content
will be entered through a web interface, and I don't want to force users
through multiple rounds of
submit/check-syntax/generate-error-report/re-submit in order to enter
their content.

I have no strong feelings about YAML: If I want structured data, e.g.
lists, dictionaries, etc, I just use python.

However, I'm torn on whether to use ReST for textual content. On the one
hand, it looks pretty comprehensive and solidly implemented. But OTOH,
I'm concerned about complexity: I don't want to commit to ReST if it's
going to become a lot of hard work or highly-inefficient when I really
need to use it "in anger".

From what I've seen, pretty much every textual markup targeted for web
content, e.g. wiki markup, seems to have grown/evolved organically,
meaning that it is either underpowered or overpowered, full of special
cases, doesn't have a meaningful object model, etc.

So, I'm hoping that the learned folks here might be able to give me some
pointers to a markup language that has the following characteristics

1. Is straightforward for non-technical users to use, i.e. can be
(mostly) explained in a two to three page document which is
comprehensible to anyone who has ever used a simple word-processor or
text-editor.

2. Allows a wide variety of content semantics to be represented, e.g.
headings, footnotes, sub/superscript, links, etc, etc.

3. Has a complete (but preferably lightweight) object model into which
documents can be loaded, for transformation to other languages.

4. Is speed and memory efficient.

5. Obviously, has a solid python implementation.

Most useful would be a pointer to a good comparison/review page which
compares multiple markup languages, in terms of the above requirements.

If I can't find such a markup language, then I might instead end up
using a WYSIWYG editing component that gives the user a GUI and
generates (x)html.

htmlArea: http://www.htmlarea.com/
Editlet: http://www.editlet.com/

But I'd prefer a markup solution.

TIA for any pointers.

regards,

--
alan kennedy
------------------------------------------------------
email alan: http://xhaus.com/contact/alan

Paul Rubin

Jan 23, 2005, 7:34:39 AM
to
Alan Kennedy <ala...@hotmail.com> writes:
> However, I'm torn on whether to use ReST for textual content. On the
> one hand, it looks pretty comprehensive and solidly implemented.

It seemed both unnecessary and horrendously overcomplicated when I
looked at it. I'd stay away.

> So, I'm hoping that the learned folks here might be able to give me
> some pointers to a markup language that has the following
> characteristics

I'm a bit biased but I've been using Texinfo for a long time and have
been happy with it. It's reasonably lightweight to implement, fairly
intuitive to use, and doesn't get in the way too much when you're
writing. There are several implementations, none in Python at the
moment but that would be simple enough. It does all the content
semantics you're asking (footnotes etc). It doesn't have an explicit
object model, but is straightforward to convert into a number of
formats including high-quality printed docs (TeX); the original Info
hypertext browser that predates the web; and these days HTML.

> If I can't find such a markup language, then I might instead end up
> using a WYSIWYG editing component that gives the user a GUI and
> generates (x)html.... But I'd prefer a markup solution.

Yes, for heavy-duty users, markup is far superior to yet another
editor. Everyone has their favorite editor and doesn't want to have
to switch to another one, hence the Emacs vs. Vi wars etc.

Fredrik Lundh

Jan 23, 2005, 7:41:47 AM
to pytho...@python.org
Alan Kennedy wrote:

> From what I've seen, pretty much every textual markup targeted for web content, e.g. wiki markup,
> seems to have grown/evolved organically, meaning that it is either underpowered or overpowered,
> full of special cases, doesn't have a meaningful object model, etc.

I spent the eighties designing one textual markup language after another,
for a wide variety of projects (mainly for technical writing). I've since come
to the conclusion that they all suck (for exactly the reasons you mention above,
plus the usual "the implementation is the only complete spec we have" issue).

these days, I usually use HTML+custom classes for authoring (and run them
through a HTML->XHTML converter for processing).

the only markup language I've seen lately that isn't a complete mess is John
Gruber's markdown:

http://daringfireball.net/projects/markdown/

which has an underlying object model (HTML/XHTML) and doesn't have too
many warts. not sure if anyone has done a Python implementation yet, though
(for html->markdown, see http://www.aaronsw.com/2002/html2text/ ), and I
don't think it supports footnotes (HTML doesn't).

> If I can't find such a markup language, then I might instead end up using a WYSIWYG editing
> component that gives the user a GUI and generates (x)html.
>
> htmlArea: http://www.htmlarea.com/
> Editlet: http://www.editlet.com/
>
> But I'd prefer a markup solution.

some of these are amazingly usable. have you asked your users what they
prefer? (or maybe you are your user? ;-)

</F>

Daniel Bickett

Jan 23, 2005, 7:56:45 AM
to pytho...@python.org
Doug Holton wrote:
> You might like programming in XML then: http://www.meta-language.net/
> :)

http://www.meta-language.net/sample.html#class-metal

I'm not so sure ;-)

Daniel Bickett

rm

Jan 23, 2005, 11:46:24 AM
to
rm wrote:
> Paul Rubin wrote:
>
>> Reinhold Birkenfeld <reinhold-birk...@wolke7.net> writes:
>>
>>> For those of you who don't know what YAML is: visit http://yaml.org/!
>>> You will be amazed, and never think of XML again. Well, almost.
>>
>>
>>
>> Oh please no, not another one of these. We really really don't need it.
>
>
> well, I did look at it, and as a text format is more readable than XML
> is. Furthermore, XML's verbosity is incredible. This format is not.
> People are abusing the genericity of XML to put everything into it.
>
> Parsing and working with XML are highly optimized, so there's not really
> a problem in that sector. But to transfer the same data in a YAML
> format, rather than a XML format is much more economic. But networks are
> getting faster, right?
>
> Nowadays, people are trying to create binary XML, XML databases,
> graphics in XML (btw, I'm quite impressed by SVG), you have XSLT, you
> have XSL-FO, ... .
>
> And I think, YAML is a nice initiative.
>
> bye,
> rm

http://www.theinquirer.net/?article=20868 :-)

rm

Alan Kennedy

Jan 23, 2005, 2:06:11 PM
to
[Alan Kennedy]

>> From what I've seen, pretty much every textual markup targeted
>> for web content, e.g. wiki markup, seems to have grown/evolved
>> organically, meaning that it is either underpowered or overpowered,
>> full of special cases, doesn't have a meaningful object model, etc.

[Fredrik Lundh]


> I spent the eighties designing one textual markup language after
> another, for a wide variety of projects (mainly for technical
> writing). I've since come to the conclusion that they all suck
> (for exactly the reasons you mention above, plus the usual
> "the implementation is the only complete spec we have" issue).

Thanks Fredrik, I thought you might have a fair amount of experience in
this area :-)

[Fredrik Lundh]


> the only markup language I've seen lately that isn't a complete mess
> is John Gruber's markdown:
>
> http://daringfireball.net/projects/markdown/
>
> which has an underlying object model (HTML/XHTML) and doesn't have
> too many warts. not sure if anyone has done a Python implementation
> yet, though (for html->markdown, see
> http://www.aaronsw.com/2002/html2text/ ), and I don't think it
> supports footnotes (HTML doesn't).

Thanks for the pointer. I took a look at Markdown, and it does look
nice. But I don't like the dual syntax, e.g. switching into HTML for
tables, etc: I'm concerned that the syntax switch might be too much for
non-techies.

[Alan Kennedy]


>> If I can't find such a markup language, then I might instead end up
>> using a WYSIWYG editing component that gives the user a GUI and
>> generates (x)html.
>>
>> htmlArea: http://www.htmlarea.com/
>> Editlet: http://www.editlet.com/
>>
>> But I'd prefer a markup solution.

[Fredrik Lundh]


> some of these are amazingly usable. have you asked your users what
> they prefer? (or maybe you are your user? ;-)

Actually, I'm looking for a solution for both myself and for end-users
(who will take what they're given ;-).

For myself, I think I'll end up picking Markdown, ReST, or something
comparable from the wiki-wiki-world.

For the end-users, I'm starting to think that GUI is the only way to go.
The last time I looked at this area, a few years ago, the components
were fairly immature and pretty buggy. But the number of such components
and their quality seems to have greatly increased in recent times.

Particularly, many of them seem to address an important requirement that
I neglected to mention in my original list: unicode support. I'll be
processing all kinds of funny characters, e.g. math/scientific symbols,
european, asian and middle-eastern names, etc.

thanks-and-regards-ly-y'rs,

Alan Kennedy

Jan 23, 2005, 2:11:30 PM
to
[Alan Kennedy]

>>So, I'm hoping that the learned folks here might be able to give me
>>some pointers to a markup language that has the following
>>characteristics

[Paul Rubin]


> I'm a bit biased but I've been using Texinfo for a long time and have
> been happy with it. It's reasonably lightweight to implement, fairly
> intuitive to use, and doesn't get in the way too much when you're
> writing. There are several implementations, none in Python at the
> moment but that would be simple enough. It does all the content
> semantics you're asking (footnotes etc). It doesn't have an explicit
> object model, but is straightforward to convert into a number of
> formats including high-quality printed docs (TeX); the original Info
> hypertext browser that predates the web; and these days HTML.

Thanks Paul,

I took a look at texinfo, and it looks powerful and good ....... for
programmers.

Looks like a very steep learning curve for non-programmers though. It
seems to require just a few hundred kilobytes too much documentation ......

Istvan Albert

Jan 24, 2005, 9:27:01 AM
to
Paul Rubin wrote:

This is my favorite:

http://weblog.burningbird.net/archives/2002/10/08/the-parable-of-the-languages

"I’m considered the savior, the ultimate solution, the final word.
Odes are written to me, flowers strewn at my feet, virgins sacrificed at
my altar. Programmers speak my name with awe. Companies insist on using
me in all their projects, though they’re not sure why. And whenever a
problem occurs, someone somewhere says, “Let’s use XML", and miracles
occur and my very name has become a talisman against evil. And yet, all
I am is a simple little markup, from humble origins.
It’s a burden, being XML."

Istvan Albert

Jan 24, 2005, 9:49:43 AM
to
rm wrote:

> http://www.theinquirer.net/?article=20868 :-)

There's a lot of nonsense out there propagated by people who do not
understand XML. You can't possibly blame that on XML...

For me XSLT transformations are the main reason for using XML.
If I have an XML document I can turn it into other
formats with a few lines of code. Most importantly these
are much safer to run than a program.

I think of an XML document as a "mini-database" where one
can easily and efficiently access content via XPath. So there
is a lot more to XML than just markup and that's
why YAML vs XML comparisons make very little sense.
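
As a rough illustration of the "mini-database" idea, here is a sketch assuming Python 3 and xml.etree.ElementTree, which supports only a limited XPath subset (a full XPath or XSLT engine is a separate matter). The document and its element names are invented for the example:

```python
import xml.etree.ElementTree as ET

# Hypothetical document, made up for illustration.
doc = """
<library>
  <book lang="en"><title>Dive Into Python</title></book>
  <book lang="de"><title>Python kurz und gut</title></book>
  <book lang="en"><title>Learning Python</title></book>
</library>
"""

root = ET.fromstring(doc)

# Select the titles of all English-language books with one path expression.
english_titles = [t.text for t in root.findall("./book[@lang='en']/title")]
print(english_titles)
```

The query is a string, not code, which is part of why running it is a safer proposition than running an arbitrary program.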

Istvan.


Sion Arrowsmith

Jan 24, 2005, 10:03:16 AM
to
Paul Rubin <http://phr...@NOSPAM.invalid> wrote:
>YAML looks to me to be completely insane, even compared to Python
>lists. I think it would be great if the Python library exposed an
>interface for parsing constant list and dict expressions, e.g.:
> [1, 2, 'Joe Smith', 8237972883334L, # comment
> {'Favorite fruits': ['apple', 'banana', 'pear']}, # another comment
> 'xyzzy', [3, 5, [3.14159, 2.71828, []]]]
> [ ... ]

>Note that all the values in the above have to be constant literals.
>Don't suggest using eval. That would be a huge security hole.

I'm probably not thinking deviously enough here, but how are you
going to exploit an eval() which has very tightly controlled
globals and locals (eg. eval(x, {"__builtins__": None}, {}) ?

--
\S -- si...@chiark.greenend.org.uk -- http://www.chaos.org.uk/~sion/
___ | "Frankly I have no feelings towards penguins one way or the other"
\X/ | -- Arthur C. Clarke
her nu becomeþ se bera eadward ofdun hlæddre heafdes bæce bump bump bump

Doug Holton

Jan 24, 2005, 10:08:41 AM
to
rm wrote:

> Doug Holton wrote:
>
>> rm wrote:
>>
>>> this implementation of their idea. But I'd love to see a generic,
>>> pythonic data format.
>>
>>
>>
>> That's a good idea. But really Python is already close to that. A
>> lot of times it is easier to just write out a python dictionary than
>> using a DB or XML or whatever. Python is already close to YAML in
>> some ways.
>
> true, it's easy enough to separate the data from the functionality in
> python by putting the data in a dictionary/list/tuple, but it stays
> source code.

Check out JSON, an alternative to XML for data interchange. It is
basically just python dictionaries and lists:
http://www.crockford.com/JSON/example.html

I think I would like this better than YAML or XML, and it looks like it
already parses as valid Python code, except for the /* */ multiline
comments (which boo supports).
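
For what the overlap looks like in practice, a small sketch assuming Python 3 and its standard json module (which was not in the standard library when this was posted). The sample text is invented. Note that JSON's true, false and null are not Python literals, so the "parses as valid Python" observation only holds for documents that avoid them:

```python
import json

# Hypothetical JSON text, made up for illustration.
text = (
    '{"name": "Joe Smith", '
    '"fruits": ["apple", "banana", "pear"], '
    '"active": true, "spouse": null}'
)

data = json.loads(text)

# Objects and arrays come back as plain dicts and lists.
print(data["fruits"])
# true and null are mapped to True and None by the decoder.
print(data["active"], data["spouse"])
```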

It was mentioned in a story about JSON-RPC-Java:
http://developers.slashdot.org/article.pl?sid=05/01/24/125236

Fredrik Lundh

Jan 24, 2005, 10:11:43 AM
to pytho...@python.org
Sion Arrowsmith wrote:

> I'm probably not thinking deviously enough here, but how are you
> going to exploit an eval() which has very tightly controlled
> globals and locals (eg. eval(x, {"__builtins__": None}, {}) ?

try this:

eval("'*'*1000000*2*2*2*2*2*2*2*2*2")

(for more on eval and builtins, see the "Evaluating Python expressions"
section here: http://effbot.org/librarybook/builtin.htm )
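
A scaled-down sketch of the same point, in Python 3 syntax and with small multipliers substituted so it is harmless to run (the variable names are mine). The expression uses no names at all, so the emptied namespaces never come into play:

```python
# Same shape as the expression above, but with small numbers.
restricted_globals = {"__builtins__": None}

result = eval("'*' * 10 * 2 * 2 * 2", restricted_globals, {})

# The string was built even though no builtins were reachable;
# its size is governed by the literals in the source alone.
print(len(result))
```

Restricting globals and locals limits what the code can reach, not how much memory or time it can consume.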

</F>

Peter Hansen

Jan 24, 2005, 10:58:16 AM
to
Sion Arrowsmith wrote:
> Paul Rubin <http://phr...@NOSPAM.invalid> wrote:
>
>>YAML looks to me to be completely insane, even compared to Python
>>lists. I think it would be great if the Python library exposed an
>>interface for parsing constant list and dict expressions, e.g.:
>> [1, 2, 'Joe Smith', 8237972883334L, # comment
>> {'Favorite fruits': ['apple', 'banana', 'pear']}, # another comment
>> 'xyzzy', [3, 5, [3.14159, 2.71828, []]]]
>>[ ... ]
>>Note that all the values in the above have to be constant literals.
>>Don't suggest using eval. That would be a huge security hole.
>
>
> I'm probably not thinking deviously enough here, but how are you
> going to exploit an eval() which has very tightly controlled
> globals and locals (eg. eval(x, {"__builtins__": None}, {}) ?

See, for example, Alex Martelli's post in an old thread from 2001:
http://groups.google.ca/groups?selm=9db3oi01aph%40news2.newsguy.com

-Peter

Michael Spencer

Jan 24, 2005, 1:47:32 PM
to pytho...@python.org
Fredrik Lundh wrote:

> Sion Arrowsmith wrote:
>>I'm probably not thinking deviously enough here, but how are you
>>going to exploit an eval() which has very tightly controlled
>>globals and locals (eg. eval(x, {"__builtins__": None}, {}) ?
>
> try this:
>
> eval("'*'*1000000*2*2*2*2*2*2*2*2*2")
>

I updated the safe eval recipe I posted yesterday to add the option of reporting
unsafe source, rather than silently ignoring it. Is this completely safe? I'm
interested in feedback.

Michael

Some source to try:

>>> good_source = """[1, 2, 'Joe Smith', 8237972883334L, # comment
... {'Favorite fruits': ['apple', 'banana', 'pear']}, # another comment
... 'xyzzy', [3, 5, [3.14159, 2.71828, []]]]"""
...

Unquoted string literal
>>> bad_source = """[1, 2, JoeSmith, 8237972883334L, # comment
... {'Favorite fruits': ['apple', 'banana', 'pear']}, # another comment
... 'xyzzy', [3, 5, [3.14159, 2.71828, []]]]"""
...
Non-constant expression
>>> effbot = "'*'*1000000*2*2*2*2*2*2*2*2*2"

>>> safe_eval(good_source)
[1, 2, 'Joe Smith', 8237972883334L, {'Favorite fruits': ['apple', 'banana',
'pear']}, 'xyzzy', [3, 5, [3.1415899999999999, 2.71828, []]]]
>>> assert _ == eval(good_source)

>>> safe_eval(bad_source)
Traceback (most recent call last):
[...]
Unsafe_Source_Error: Line 1. Strings must be quoted: JoeSmith

>>> safe_eval(bad_source, fail_on_error = False)
[1, 2, None, 8237972883334L, {'Favorite fruits': ['apple', 'banana', 'pear']},
'xyzzy', [3, 5, [3.1415899999999999, 2.71828, []]]]

>>> safe_eval(effbot)
Traceback (most recent call last):
[...]
Unsafe_Source_Error: Line 1. Unsupported source construct: compiler.ast.Mul

>>> safe_eval(effbot, fail_on_error = False)
...
'*'
>>>

Source:

import compiler

class Unsafe_Source_Error(Exception):
    def __init__(self,error,descr = None,node = None):
        self.error = error
        self.descr = descr
        self.node = node
        self.lineno = getattr(node,"lineno",None)

    def __repr__(self):
        return "Line %d. %s: %s" % (self.lineno, self.error, self.descr)
    __str__ = __repr__

class AbstractVisitor(object):
    def __init__(self):
        self._cache = {} # dispatch table

    def visit(self, node,**kw):
        cls = node.__class__
        meth = self._cache.setdefault(cls,
            getattr(self,'visit'+cls.__name__,self.default))
        return meth(node, **kw)

    def default(self, node, **kw):
        for child in node.getChildNodes():
            return self.visit(child, **kw)
    visitExpression = default

class SafeEval(AbstractVisitor):

    def visitConst(self, node, **kw):
        return node.value

    def visitDict(self,node,**kw):
        return dict([(self.visit(k),self.visit(v)) for k,v in node.items])

    def visitTuple(self,node, **kw):
        return tuple(self.visit(i) for i in node.nodes)

    def visitList(self,node, **kw):
        return [self.visit(i) for i in node.nodes]

class SafeEvalWithErrors(SafeEval):

    def default(self, node, **kw):
        raise Unsafe_Source_Error("Unsupported source construct",
            node.__class__,node)

    def visitName(self,node, **kw):
        raise Unsafe_Source_Error("Strings must be quoted",
            node.name, node)

    # Add more specific errors if desired


def safe_eval(source, fail_on_error = True):
    walker = fail_on_error and SafeEvalWithErrors() or SafeEval()
    try:
        ast = compiler.parse(source,"eval")
    except SyntaxError, err:
        raise
    try:
        return walker.visit(ast)
    except Unsafe_Source_Error, err:
        raise
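
For readers on Python 3, where the compiler module no longer exists, the same idea can be sketched with the ast module. This is a rough equivalent under my own assumptions, not a port of the recipe: it needs Python 3.8 or later for ast.Constant, the names UnsafeSourceError and _convert are made up, the long-integer L suffix is dropped from the sample, and negative numbers are rejected because they parse as a unary operation. The standard library's ast.literal_eval covers much the same ground.

```python
import ast


class UnsafeSourceError(ValueError):
    """Raised when the source contains anything other than plain literals."""


def _convert(node):
    # Only constants and the three container displays are accepted.
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.List):
        return [_convert(item) for item in node.elts]
    if isinstance(node, ast.Tuple):
        return tuple(_convert(item) for item in node.elts)
    if isinstance(node, ast.Dict):
        return {_convert(k): _convert(v)
                for k, v in zip(node.keys, node.values)}
    # Names, operators, calls and everything else are refused.
    raise UnsafeSourceError(
        "Line %d. Unsupported source construct: %s"
        % (getattr(node, "lineno", 0), type(node).__name__))


def safe_eval(source):
    """Evaluate source made only of constants, lists, tuples and dicts."""
    tree = ast.parse(source, mode="eval")
    return _convert(tree.body)


good_source = """[1, 2, 'Joe Smith', 8237972883334,  # comment
 {'Favorite fruits': ['apple', 'banana', 'pear']},  # another comment
 'xyzzy', [3, 5, [3.14159, 2.71828, []]]]"""
effbot = "'*'*1000000*2*2*2*2*2*2*2*2*2"

print(safe_eval(good_source))
try:
    safe_eval(effbot)
except UnsafeSourceError as err:
    print(err)
```

The multiplication is never carried out: the source is only parsed, and the walker stops at the first node it does not recognise.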

Sion Arrowsmith

Jan 25, 2005, 7:19:34 AM
to
Fredrik Lundh <fre...@pythonware.com> wrote:
>Sion Arrowsmith wrote:
>> I'm probably not thinking deviously enough here, but how are you
>> going to exploit an eval() which has very tightly controlled
>> globals and locals (eg. eval(x, {"__builtins__": None}, {}) ?
>try this:
>
> eval("'*'*1000000*2*2*2*2*2*2*2*2*2")

No thanks.

I guess my problem is a tendency to view security issues from the
point of view of access to data rather than access to processing.

Aahz

Jan 25, 2005, 11:23:44 PM
to
In article <eEMId.46151$Z14....@news.indigo.ie>,

Alan Kennedy <ala...@hotmail.com> wrote:
>
>However, I'm torn on whether to use ReST for textual content. On the one
>hand, it looks pretty comprehensive and solidly implemented. But OTOH,
>I'm concerned about complexity: I don't want to commit to ReST if it's
>going to become a lot of hard work or highly-inefficient when I really
>need to use it "in anger".
>
> From what I've seen, pretty much every textual markup targeted for web
>content, e.g. wiki markup, seems to have grown/evolved organically,
>meaning that it is either underpowered or overpowered, full of special
>cases, doesn't have a meaningful object model, etc.

My perception is that reST is a lot like Python itself: it's easy to hit
the ground running, particularly if you restrict yourself to a specific
subset of features. It does give you a fair amount of power, and some
things are difficult or impossible.

Note that reST was/is *not* specifically aimed at web content. Several
people have used it for writing books; some people are using it instead
of PowerPoint.

>So, I'm hoping that the learned folks here might be able to give me some
>pointers to a markup language that has the following characteristics
>
>1. Is straightforward for non-technical users to use, i.e. can be
>(mostly) explained in a two to three page document which is
>comprehensible to anyone who has ever used a simple word-processor or
>text-editor.
>
>2. Allows a wide variety of content semantics to be represented, e.g.
>headings, footnotes, sub/superscript, links, etc, etc.

These two criteria seem to be in opposition. I certainly wouldn't
expect a three-page document to explain all these features, not for
non-technical users. reST fits both these criteria, but only for a
selected subset of features.
--
Aahz (aa...@pythoncraft.com) <*> http://www.pythoncraft.com/

"19. A language that doesn't affect the way you think about programming,
is not worth knowing." --Alan Perlis

Alan Kennedy

Jan 28, 2005, 5:07:59 PM
to
[Alan Kennedy]

>>However, I'm torn on whether to use ReST for textual content. On the one
>>hand, it looks pretty comprehensive and solidly implemented. But OTOH,
>>I'm concerned about complexity: I don't want to commit to ReST if it's
>>going to become a lot of hard work or highly-inefficient when I really
>>need to use it "in anger".
>>
>>From what I've seen, pretty much every textual markup targeted for web
>>content, e.g. wiki markup, seems to have grown/evolved organically,
>>meaning that it is either underpowered or overpowered, full of special
>>cases, doesn't have a meaningful object model, etc.

[Aahz]


> My perception is that reST is a lot like Python itself: it's easy to hit
> the ground running, particularly if you restrict yourself to a specific
> subset of features. It does give you a fair amount of power, and some
> things are difficult or impossible.
>
> Note that reST was/is *not* specifically aimed at web content. Several
> people have used it for writing books; some people are using it instead
> of PowerPoint.

Thanks, Aahz, that's a key point that I'll continue on below.

[Alan Kennedy]


>>So, I'm hoping that the learned folks here might be able to give me some
>>pointers to a markup language that has the following characteristics
>>
>>1. Is straightforward for non-technical users to use, i.e. can be
>>(mostly) explained in a two to three page document which is
>>comprehensible to anyone who has ever used a simple word-processor or
>>text-editor.
>>
>>2. Allows a wide variety of content semantics to be represented, e.g.
>>headings, footnotes, sub/superscript, links, etc, etc.

[Aahz]


> These two criteria seem to be in opposition. I certainly wouldn't
> expect a three-page document to explain all these features, not for
> non-technical users. reST fits both these criteria, but only for a
> selected subset of features.

The point is well made.

When I wrote my requirements, I did have a specific limited feature set
in mind: basically a print-oriented set of features with which anyone
who reads books would be familiar. I'm trying to capture scientific
abstracts, of the sort that you can see linked off this page.

http://www.paratuberculosis.org/proc7/

But I'm basically only interested in representation of the original
input text. I'll be capturing a lot of metadata as well, but most of
that will be captured outside the markup language, through a series of
form inputs which ask specific metadata questions. So, for example, the
relationships between authors and institutions, seen on the next page,
will not be recorded in the markup.

http://www.paratuberculosis.org/proc7/abst5_p2.htm

I think that is where a lot of markup languages fall down, in that they
end up trying to develop a sophisticated metadata model that can capture
that kind of information, and re-engineering the markup to support it.
This co-evolution of the markup and model can go horribly awry, if the
designers are inexperienced or don't know where they're headed.

Since ReST seems to do this stuff fairly well, I think I'll take a
closer look at it. From what I've seen of it, e.g. PEPs, python module
documentation (SQLObject, etc), it seems to be reasonably unobtrusive to
the author.

Aahz

Jan 29, 2005, 6:43:26 AM