
XML Considered Harmful


Michael F. Stemper

Sep 21, 2021, 2:12:33 PM
On the prolog thread, somebody posted a link to:
<https://dirtsimple.org/2004/12/python-is-not-java.html>

One thing that it tangentially says is "XML is not the answer."

I read this page right when I was about to write an XML parser
to get data into the code for a research project I'm working on.
It seems to me that XML is the right approach for this sort of
thing, especially since the data is hierarchical in nature.

Does the advice on that page mean that I should find some other
way to get data into my programs, or does it refer to some kind
of misuse/abuse of XML for something that it wasn't designed
for?

If XML is not the way to package data, what is the recommended
approach?
--
Michael F. Stemper
Life's too important to take seriously.

Jon Ribbens

Sep 21, 2021, 2:42:43 PM
I'd agree that you should not use XML unless the data is being supplied
already in XML format or perhaps if there is already a schema defined in
XML for exactly your purpose.

If there is nothing pre-existing to build upon then I'd suggest JSON.

If anyone suggests YAML, then you should just back slowly away while
speaking in a low calm voice until you have reached sufficient safe
distance, then turn and run.

alister

Sep 21, 2021, 2:49:42 PM
First, can I say: don't write your own XML parser. There are already a
number of existing parsers that should do everything you will need. This
is a wheel that does not need re-inventing.

Second, if you are not generating the data, then you have to use whatever
data format you are supplied.

As far as I can see, the main issue with XML is bloat: it tries to do too
many things and is a very verbose format; often the quantity of mark-up can
easily exceed the data contained within it.

Other formats such as JSON and CSV have far less overhead, although again
they are not always suitable.

As in all such cases, it is a matter of choosing the most appropriate tool
for the job at hand.






--
Antonym, n.:
The opposite of the word you're trying to think of.

Michael F. Stemper

Sep 21, 2021, 3:23:15 PM
On 21/09/2021 13.49, alister wrote:
> On Tue, 21 Sep 2021 13:12:10 -0500, Michael F. Stemper wrote:
>
>> On the prolog thread, somebody posted a link to:
>> <https://dirtsimple.org/2004/12/python-is-not-java.html>
>>
>> One thing that it tangentially says is "XML is not the answer."
>>
>> I read this page right when I was about to write an XML parser to get
>> data into the code for a research project I'm working on.
>> It seems to me that XML is the right approach for this sort of thing,
>> especially since the data is hierarchical in nature.
>>
>> Does the advice on that page mean that I should find some other way to
>> get data into my programs, or does it refer to some kind of misuse/abuse
>> of XML for something that it wasn't designed for?
>>
>> If XML is not the way to package data, what is the recommended approach?
>
> First, can I say: don't write your own XML parser. There are already a
> number of existing parsers that should do everything you will need. This
> is a wheel that does not need re-inventing.

I was going to build it on top of xml.etree.ElementTree.
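
Something like this minimal sketch, assuming a file layout I haven't
settled on yet (element and attribute names are placeholders):

import xml.etree.ElementTree as ET

# Hypothetical input file; the real schema is still to be designed.
tree = ET.parse("generators.xml")
for gen in tree.getroot().iter("generator"):
    fuel = gen.find("fuel")
    print(gen.get("name"), fuel.get("name"), fuel.get("price"))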

> Second, if you are not generating the data, then you have to use whatever
> data format you are supplied.

It's my own research, so I can give myself the data in any format that I
like.

> As far as I can see, the main issue with XML is bloat: it tries to do too
> many things and is a very verbose format; often the quantity of mark-up can
> easily exceed the data contained within it.
>
> Other formats such as JSON and CSV have far less overhead, although again
> they are not always suitable.

I've heard of JSON, but never done anything with it.

How does CSV handle hierarchical data? For instance, I have
generators[1], each of which has a name, a fuel and one or more
incremental heat rate curves. Each fuel has a name, UOM, heat content,
and price. Each incremental cost curve has a name, and a series of
ordered pairs (representing a piecewise linear curve).

Can CSV files model this sort of situation?

> As in all such cases, it is a matter of choosing the most appropriate tool
> for the job at hand.

Naturally. That's what I'm exploring.


[1] The kind made of tons of iron and copper, filled with oil, and
rotating at 1800 rpm.

--
Michael F. Stemper
This sentence no verb.

Pete Forman

Sep 21, 2021, 5:21:54 PM
"Michael F. Stemper" <michael...@gmail.com> writes:

> On 21/09/2021 13.49, alister wrote:
>> On Tue, 21 Sep 2021 13:12:10 -0500, Michael F. Stemper wrote:
> It's my own research, so I can give myself the data in any format that I
> like.
>
>> As far as I can see, the main issue with XML is bloat: it tries to do
>> too many things and is a very verbose format; often the quantity of
>> mark-up can easily exceed the data contained within it. Other formats
>> such as JSON and CSV have far less overhead, although again they are
>> not always suitable.
>
> I've heard of JSON, but never done anything with it.

Then you should certainly try to get a basic understanding of it. One
thing JSON shares with XML is that it is best left to machines to
produce and consume. Because both can be viewed in a text editor there
is a common misconception that they are easy to edit. Not so: commas are
a common bugbear in JSON, and non-trivial edits in (XML-unaware) text
editors are tricky.

Consider what overhead you should worry about. If you are concerned
about file sizes then XML, JSON and CSV should all compress to a similar
size.

> How does CSV handle hierarchical data? For instance, I have
> generators[1], each of which has a name, a fuel and one or more
> incremental heat rate curves. Each fuel has a name, UOM, heat content,
> and price. Each incremental cost curve has a name, and a series of
> ordered pairs (representing a piecewise linear curve).
>
> Can CSV files model this sort of situation?

The short answer is no. CSV files represent spreadsheet row-column
values with nothing fancier such as formulas or other redirections.

CSV is quite good as a lowest common denominator exchange format. I say
quite because I would characterize it by 8 attributes and you need to
pick a dialect such as MS Excel which sets out what those are. XML and
JSON are controlled much better. You can easily verify that you conform
to those and guarantee that *any* conformant parser can read your
content. XML is more powerful in that respect than JSON in that you can
define and enforce schemas. In your case the fuel name, UOM, etc. can be
validated with standard tools. In JSON all that checking is entirely
handled by the consuming program(s).

>> As in all such cases, it is a matter of choosing the most appropriate tool
>> for the job at hand.
>
> Naturally. That's what I'm exploring.

You might also like to consider HDF5. It is targeted at large volumes of
scientific data and its capabilities are well above what you need.
MATLAB, Octave and Scilab use it as their native format. PyTables and
h5py provide Python/NumPy bindings to it.
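
To give a feel for it, a minimal h5py sketch (file, group and dataset
names are invented for illustration):

import h5py
import numpy as np

with h5py.File("research.h5", "w") as f:
    # Groups give you the hierarchy; attributes hold small metadata.
    grp = f.create_group("generators/unit1")
    grp.attrs["fuel"] = "lignite"
    grp.create_dataset("heat_rate",
                       data=np.array([[100.0, 1000.0], [200.0, 2100.0]]))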

--
Pete Forman

alister

Sep 21, 2021, 6:30:52 PM
On Tue, 21 Sep 2021 14:22:52 -0500, Michael F. Stemper wrote:

> On 21/09/2021 13.49, alister wrote:
>> On Tue, 21 Sep 2021 13:12:10 -0500, Michael F. Stemper wrote:
>>
>>> On the prolog thread, somebody posted a link to:
>>> <https://dirtsimple.org/2004/12/python-is-not-java.html>
>>>
>>> One thing that it tangentially says is "XML is not the answer."
>>>
>>> I read this page right when I was about to write an XML parser to get
>>> data into the code for a research project I'm working on.
>>> It seems to me that XML is the right approach for this sort of thing,
>>> especially since the data is hierarchical in nature.
>>>
>>> Does the advice on that page mean that I should find some other way to
>>> get data into my programs, or does it refer to some kind of
>>> misuse/abuse of XML for something that it wasn't designed for?
>>>
>>> If XML is not the way to package data, what is the recommended
>>> approach?
>>
>> First, can I say: don't write your own XML parser. There are already a
>> number of existing parsers that should do everything you will need.
>> This is a wheel that does not need re-inventing.
>
> I was going to build it on top of xml.etree.ElementTree
>
So you're not writing a parser, just using one; that's OK.

>> Second, if you are not generating the data, then you have to use
>> whatever data format you are supplied.
>
> It's my own research, so I can give myself the data in any format that I
> like.
>
>> As far as I can see, the main issue with XML is bloat: it tries to do
>> too many things and is a very verbose format; often the quantity of
>> mark-up can easily exceed the data contained within it.
>>
>> Other formats such as JSON and CSV have far less overhead, although
>> again they are not always suitable.
>
> I've heard of JSON, but never done anything with it.
The Python json library makes it simple.
JSON was originally invented for JavaScript, and it looks very much like
the repr of a Python list/dictionary, but if you are using the standard
library you don't really need to know that except for academic interest.
>
> How does CSV handle hierarchical data?
It doesn't; if you have hierarchical data, it is not a suitable format.
> For instance, I have
> generators[1], each of which has a name, a fuel and one or more
> incremental heat rate curves. Each fuel has a name, UOM, heat content,
> and price. Each incremental cost curve has a name, and a series of
> ordered pairs (representing a piecewise linear curve).
>
> Can CSV files model this sort of situation?
>
>> As in all such cases, it is a matter of choosing the most appropriate
>> tool for the job at hand.
>
> Naturally. That's what I'm exploring.
>
>
> [1] The kind made of tons of iron and copper, filled with oil, and
> rotating at 1800 rpm.





--
Riches cover a multitude of woes.
-- Menander

Joe Pfeiffer

Sep 21, 2021, 6:50:07 PM
r...@zedat.fu-berlin.de (Stefan Ram) writes:
<snip>
> - S expressions (i.e., LISP notation)

If you're looking at hierarchical data and you don't have some good
reason to use something else, this is very likely to be your simplest
option.
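
For example, the generator data described up-thread might look roughly
like this as an S expression (names and numbers invented):

(generator "Coal Creek 1"
  (fuel "lignite" (uom "ton") (heat-content 13.610) (price 43.581))
  (curve "normal"
    (point 100 1000)
    (point 200 2100)))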

Jon Ribbens

Sep 21, 2021, 6:58:26 PM
On 2021-09-21, Pete Forman <petef4...@gmail.com> wrote:
> CSV is quite good as a lowest common denominator exchange format. I say
> quite because I would characterize it by 8 attributes and you need to
> pick a dialect such as MS Excel which sets out what those are. XML and
> JSON are controlled much better. You can easily verify that you conform
> to those and guarantee that *any* conformant parser can read your
> content. XML is more powerful in that respect than JSON in that you can
> define and enforce schemas. In your case the fuel name, UOM, etc. can be
> validated with standard tools. In JSON all that checking is entirely
> handled by the consuming program(s).

That's not true. You can use "JSON Schema" to create a schema
for validating JSON files, and there appear to be at least four
implementations in Python.
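
For instance, with the third-party 'jsonschema' package (the schema here
is invented for illustration):

import jsonschema  # pip install jsonschema

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
    },
    "required": ["name", "price"],
}

# Raises jsonschema.ValidationError if the document doesn't conform.
jsonschema.validate({"name": "lignite", "price": 43.581}, schema)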

Eli the Bearded

Sep 21, 2021, 8:30:46 PM
In comp.lang.python, Michael F. Stemper <michael...@gmail.com> wrote:
> I've heard of JSON, but never done anything with it.

You probably have used it inadvertently on a regular basis over the
past few years. Websites live on it.

> How does CSV handle hierarchical data? For instance, I have
> generators[1], each of which has a name, a fuel and one or more
> incremental heat rate curves. Each fuel has a name, UOM, heat content,
> and price. Each incremental cost curve has a name, and a series of
> ordered pairs (representing a piecewise linear curve).
>
> Can CSV files model this sort of situation?

Can a string of ones and zeros encode the sounds of Bach, the images
of his sheet music, the details to reproduce his bust in melted plastic
extruded from a nozzle under the control of machines?

Yes, CSV files can model that. But it would not be my first choice of
data format. (Neither would JSON.) I'd probably use XML.

I rather suspect that all (many) of those genomes that end up in
Microsoft Excel files get there via a CSV export from a command line
tool. Once you can model life in CSV, everything seems possible.

> [1] The kind made of tons of iron and copper, filled with oil, and
> rotating at 1800 rpm.

Those are rather hard to model in CSV, too, but I'm sure it could be
done.

Elijah
------
for bonus round, use punched holes in paper to encode the ones and zeros

Joe Pfeiffer

Sep 21, 2021, 9:27:51 PM
Eli the Bearded <*@eli.users.panix.com> writes:

> In comp.lang.python, Michael F. Stemper <michael...@gmail.com> wrote:
>> I've heard of JSON, but never done anything with it.
>
> You probably have used it inadvertently on a regular basis over the
> past few years. Websites live on it.

If the user has any interaction whatever with the formats being used to
transfer data then something is very, very wrong. Someone using a
website built on JSON isn't using JSON in any meaningful sense of the
term.

>> How does CSV handle hierarchical data? For instance, I have
>> generators[1], each of which has a name, a fuel and one or more
>> incremental heat rate curves. Each fuel has a name, UOM, heat content,
>> and price. Each incremental cost curve has a name, and a series of
>> ordered pairs (representing a piecewise linear curve).
>>
>> Can CSV files model this sort of situation?
>
> Can a string of ones and zeros encode the sounds of Bach, the images
> of his sheet music, the details to reproduce his bust in melted plastic
> extruded from a nozzle under the control of machines?
>
> Yes, CSV files can model that. But it would not be my first choice of
> data format. (Neither would JSON.) I'd probably use XML.
>
> I rather suspect that all (many) of those genomes that end up in
> Microsoft Excel files get there via a CSV export from a command line
> tool. Once you can model life in CSV, everything seems possible.

Whenever someone asks "can this be done?" in any sort of computer-related
question, the real question is "is this practical?" I have hazy memories
of seeing a Turing Machine implemented in an Excel spreadsheet, so
*anything* can be, with sufficiently ridiculous amounts of work. That's
not really helpful here.

>> [1] The kind made of tons of iron and copper, filled with oil, and
>> rotating at 1800 rpm.
>
> Those are rather hard to model in CSV, too, but I'm sure it could be
> done.

So let's try to point him at representations that are easy.

Ethan Furman

Sep 21, 2021, 10:36:36 PM
On 9/21/21 11:12 AM, Michael F. Stemper wrote:

> It seems to me that XML is the right approach for this sort of
> thing, especially since the data is hierarchical in nature.

If you're looking for a format that you can read (as a human) and possibly hand-edit,
check out NestedText:

https://nestedtext.org/en/stable/
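
For instance, the generator data from up-thread might look roughly like
this in NestedText (structure invented for illustration; note that
NestedText reads every leaf value back as a string):

generators:
    -
        name: Coal Creek 1
        fuel:
            name: lignite
            UOM: ton
            heat content: 13.610
            price: 43.581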

--
~Ethan~

Dan Stromberg

Sep 21, 2021, 10:46:44 PM
On Tue, Sep 21, 2021 at 7:26 PM Michael F. Stemper <
michael...@gmail.com> wrote:

> If XML is not the way to package data, what is the recommended
> approach?
>

I prefer both JSON and YAML over XML.

XML has both elements and tags, but it didn't really need both. This
results in more complexity than necessary. Also, XSLT and XPath are not
really all that simple.

But there's hope. If you're stuck with XML, you can use xmltodict, which
makes XML almost as easy as JSON.
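
A rough sketch of what xmltodict does (element names invented):

import xmltodict  # third-party: pip install xmltodict

doc = xmltodict.parse("<gen name='unit1'><fuel>lignite</fuel></gen>")
# Attributes are keyed with an '@' prefix; text content comes back as strings.
print(doc["gen"]["@name"], doc["gen"]["fuel"])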

HTH.

Pete Forman

Sep 22, 2021, 3:56:59 AM
Fair point. It has been a while since I looked at JSON schemas and they
were rather less mature then.

--
Pete Forman

Michael F. Stemper

Sep 22, 2021, 10:41:09 AM
On 21/09/2021 16.21, Pete Forman wrote:
> "Michael F. Stemper" <michael...@gmail.com> writes:
>> On 21/09/2021 13.49, alister wrote:
>>> On Tue, 21 Sep 2021 13:12:10 -0500, Michael F. Stemper wrote:
>> It's my own research, so I can give myself the data in any format that I
>> like.
>>
>>> As far as I can see, the main issue with XML is bloat: it tries to do
>>> too many things and is a very verbose format; often the quantity of
>>> mark-up can easily exceed the data contained within it. Other formats
>>> such as JSON and CSV have far less overhead, although again they are
>>> not always suitable.
>>
>> I've heard of JSON, but never done anything with it.
>
> Then you should certainly try to get a basic understanding of it. One
> thing JSON shares with XML is that it is best left to machines to
> produce and consume. Because both can be viewed in a text editor there
> is a common misconception that they are easy to edit. Not so: commas are
> a common bugbear in JSON, and non-trivial edits in (XML-unaware) text
> editors are tricky.

Okay, after playing around with the example in Lubanovic's book[1]
I've managed to create a dict of dicts of dicts and write it to a
json file. It seems to me that this is how json handles hierarchical
data. Is that understanding correct?

Is this then the process that I would use to create a *.json file
to provide data to my various programs? Copy and paste the current
hard-coded assignment statements into the REPL, use json.dump(data, fp)
to write it to a file, and then read the file into each program
with json.load(fp)? (Actually, I'd write a function to do that,
just as I would with XML.)
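
i.e., something like this, with a made-up structure standing in for my
real data:

import json

generators = {"unit1": {"fuel": {"name": "lignite", "price": 43.581},
                        "curves": {"normal": [[100, 1000], [200, 2100]]}}}

# Write once from the REPL...
with open("generators.json", "w") as fp:
    json.dump(generators, fp, indent=2)

# ...then each program reads it back.
with open("generators.json") as fp:
    generators = json.load(fp)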

> Consider what overhead you should worry about. If you are concerned
> about file sizes then XML, JSON and CSV should all compress to a similar
> size.

Not a concern at all for my current application.

>> How does CSV handle hierarchical data? For instance, I have
>> generators[1], each of which has a name, a fuel and one or more
>> incremental heat rate curves. Each fuel has a name, UOM, heat content,
>> and price. Each incremental cost curve has a name, and a series of
>> ordered pairs (representing a piecewise linear curve).
>>
>> Can CSV files model this sort of situation?
>
> The short answer is no. CSV files represent spreadsheet row-column
> values with nothing fancier such as formulas or other redirections.

Okay, that was what I suspected.

> CSV is quite good as a lowest common denominator exchange format. I say
> quite because I would characterize it by 8 attributes and you need to
> pick a dialect such as MS Excel which sets out what those are. XML and
> JSON are controlled much better. You can easily verify that you conform
> to those and guarantee that *any* conformant parser can read your
> content. XML is more powerful in that respect than JSON in that you can
> define and enforce schemas. In your case the fuel name, UOM, etc. can be
> validated with standard tools.

Yeah, validating against a DTD is pretty easy, since lxml.etree does all
of the work.
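
e.g. (file names are placeholders):

from lxml import etree  # third-party: pip install lxml

dtd = etree.DTD("generators.dtd")
root = etree.parse("generators.xml").getroot()
if not dtd.validate(root):
    print(dtd.error_log)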

> In JSON all that checking is entirely
> handled by the consuming program(s).
Well, the consumer's (almost) always going to need to do *some*
validation. For instance, as far as I can tell, a DTD can't specify
that there must be at least two of a particular item.

The designers of DTD seem to have taken the advice of MacLennan[2]:
"The only reasonable numbers are zero, one, or infinity."

Which is great until you need to make sure that you have enough
points to define at least one line segment.

>>> As in all such cases, it is a matter of choosing the most appropriate
>>> tool for the job at hand.
>>
>> Naturally. That's what I'm exploring.
>
> You might also like to consider HDF5. It is targeted at large volumes of
> scientific data and its capabilities are well above what you need.

Yeah, I won't be looking at more than five or ten generators at most. A
small number is enough to confirm or refute the behavior that I'm
testing.


[1] _Introducing Python: Modern Computing in Simple Packages_,
Second Release, (c) 2015, Bill Lubanovic, O'Reilly Media, Inc.
[2] _Principles of Programming Languages: Design, Evaluation,
and Implementation_, Second Edition, (c) 1987, Bruce J. MacLennan,
Holt, Rinehart, & Winston
--
Michael F. Stemper
No animals were harmed in the composition of this message.

Michael F. Stemper

Sep 22, 2021, 10:53:13 AM
On 21/09/2021 19.30, Eli the Bearded wrote:
> In comp.lang.python, Michael F. Stemper <michael...@gmail.com> wrote:
>> I've heard of JSON, but never done anything with it.
>
> You probably have used it inadvertently on a regular basis over the
> past few years. Websites live on it.

I used to use javascript when I was running Windows (up until 2009),
since it was the only programming language to which I had ready
access. Then I got a linux box and quickly discovered python. I
dropped javascript like a hot potato.

>> How does CSV handle hierarchical data? For instance, I have
>> generators[1], each of which has a name, a fuel and one or more
>> incremental heat rate curves. Each fuel has a name, UOM, heat content,
>> and price. Each incremental cost curve has a name, and a series of
>> ordered pairs (representing a piecewise linear curve).
>>
>> Can CSV files model this sort of situation?
>
> Can a string of ones and zeros encode the sounds of Bach, the images
> of his sheet music, the details to reproduce his bust in melted plastic
> extruded from a nozzle under the control of machines?
>
> Yes, CSV files can model that. But it would not be my first choice of
> data format. (Neither would JSON.) I'd probably use XML.

Okay. 'Go not to the elves for counsel, for they will say both no
and yes.' (I'm not actually surprised to find differences of opinion.)

>> [1] The kind made of tons of iron and copper, filled with oil, and
>> rotating at 1800 rpm.
>
> Those are rather hard to model in CSV, too, but I'm sure it could be
> done.

> for bonus round, use punched holes in paper to encode the ones and zeros

I've done cardboard.

Dennis Lee Bieber

Sep 22, 2021, 2:18:40 PM
On Tue, 21 Sep 2021 13:12:10 -0500, "Michael F. Stemper"
<michael...@gmail.com> declaimed the following:

>On the prolog thread, somebody posted a link to:
><https://dirtsimple.org/2004/12/python-is-not-java.html>
>
>One thing that it tangentially says is "XML is not the answer."
>
>I read this page right when I was about to write an XML parser
>to get data into the code for a research project I'm working on.
>It seems to me that XML is the right approach for this sort of
>thing, especially since the data is hierarchical in nature.
>
>Does the advice on that page mean that I should find some other
>way to get data into my programs, or does it refer to some kind
>of misuse/abuse of XML for something that it wasn't designed
>for?

There are some that try to use XML as a /live/ data /storage/ format
(such as http://www.drivehq.com/web/brana/pandora.htm which has to parse
XML files for all configuration data and filter definitions on start-up,
and update those files on any changes).

If you control both the data generation and the data consumption,
finding some format with less overhead than XML is probably to be
recommended. XML is more a self-documented (in theory) means of packaging
data for transport between widely disparate applications, which are likely
written by different teams, if not different companies, who only interface
via the definition of the data as seen by XML.

>
>If XML is not the way to package data, what is the recommended
>approach?

Again, if you control both generation and consumption... I'd probably
use an RDBMS. SQLite tends to be packaged with Python [Windows] or, at the
least, the DB-API adapter [Linux tends to expect SQLite as a standard
installed item]. SQLite is a "file server" model (as is the JET engine used
by M$ Access) -- each application (instance) is directly accessing the
database file; there is no server process mediating access.

Hierarchical (since you mention that in later posts) would be
represented by relations (terminology from relational theory -- a "table"
to most) linked by foreign keys.
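
A minimal sketch of that schema with the sqlite3 module (table and column
names are illustrative only):

import sqlite3

con = sqlite3.connect("generators.db")   # hypothetical file name
con.executescript("""
    CREATE TABLE generator(gen_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fuel(gen_id INTEGER REFERENCES generator,
                      name TEXT, uom TEXT, heat_content REAL, price REAL);
    CREATE TABLE curve(curve_id INTEGER PRIMARY KEY,
                       gen_id INTEGER REFERENCES generator, name TEXT);
    CREATE TABLE point(curve_id INTEGER REFERENCES curve, x REAL, y REAL);
""")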


--
Wulfraed Dennis Lee Bieber AF6VN
wlf...@ix.netcom.com http://wlfraed.microdiversity.freeddns.org/

Mark Lawrence

Sep 22, 2021, 3:41:14 PM
On Tuesday, September 21, 2021 at 7:12:33 PM UTC+1, Michael F. Stemper wrote:
> On the prolog thread, somebody posted a link to:
> <https://dirtsimple.org/2004/12/python-is-not-java.html>

Me, but as the moderators on this group/mailing list have no objection to some people slamming Python wherever and whenever they like, and when I object I get banned. Need I say more? Python doesn't discriminate against anybody unless you're on the autistic spectrum, in which case you can fuck off.

Michael F. Stemper

Sep 22, 2021, 4:57:11 PM
On 22/09/2021 14.41, Mark Lawrence wrote:
> On Tuesday, September 21, 2021 at 7:12:33 PM UTC+1, Michael F. Stemper wrote:
>> On the prolog thread, somebody posted a link to:
>> <https://dirtsimple.org/2004/12/python-is-not-java.html>
>
> Me, but as the moderators on this group/mailing list have no objection to some people slamming Python wherever and whenever they like, and when I object I get banned. Need I say more? Python doesn't discriminate against anybody unless you're on the autistic spectrum, in which case you can fuck off.

What on earth did I do to deserve this?


--
Michael F. Stemper
A preposition is something you should never end a sentence with.

Dennis Lee Bieber

Sep 22, 2021, 10:31:09 PM
On Wed, 22 Sep 2021 09:52:59 -0500, "Michael F. Stemper"
<michael...@gmail.com> declaimed the following:

>On 21/09/2021 19.30, Eli the Bearded wrote:
>> In comp.lang.python, Michael F. Stemper <michael...@gmail.com> wrote:
>>> How does CSV handle hierarchical data? For instance, I have
>>> generators[1], each of which has a name, a fuel and one or more
>>> incremental heat rate curves. Each fuel has a name, UOM, heat content,
>>> and price. Each incremental cost curve has a name, and a series of
>>> ordered pairs (representing a piecewise linear curve).
>>>
>>> Can CSV files model this sort of situation?
>>
<SNIP>
>> Yes, CSV files can model that. But it would not be my first choice of
>> data format. (Neither would JSON.) I'd probably use XML.
>
>Okay. 'Go not to the elves for counsel, for they will say both no
>and yes.' (I'm not actually surprised to find differences of opinion.)
>
You'd have to include a "level" field (and/or data type, if multiple objects
can be at the same level) as the first field in the CSV, which identifies
how to parse the rest of the CSV data (well, technically, the CSV module has
"parsed" it -- in terms of splitting at commas, handling quoted strings
(which may contain commas which are not split points), etc.).

1-generator, name
2-fuel, name, UOM, heat-content, price
2-curve, name
3-point, X, Y
3-point, X, Y
...
2-curve, name
3-point, X, Y
3-point, X, Y
...

You extract objects at each level; if the level is the same or "lower"
(numerically -- higher in the hierarchy) you attach the "previously"
extracted object to the parent object... whether list, dictionary, or
class instance(s):

class Point():
    # Point may be overkill; easier to just use a tuple (X, Y)
    def __init__(self, X, Y):
        self.X = X
        self.Y = Y


class Curve():
    def __init__(self, name):
        self.name = name
        self.points = []
    # use as aCurve.points.append(currentPoint)


class Fuel():
    # Fields as described up-thread: name, UOM, heat content, price.
    def __init__(self, name, uom, heat_content, price):
        self.name = name
        self.uom = uom
        self.heat_content = heat_content
        self.price = price


class Generator():
    def __init__(self, name):
        self.name = name
        self.fuel = None
        self.curves = []
    # aGenerator.fuel = currentFuel
    # aGenerator.curves.append(currentCurve)
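
The consuming loop might then look something like this (a sketch only; it
assumes the level tag is the first field and that Fuel takes the four
fields in order, still as strings):

import csv

generators = []
with open("generators.csv", newline="") as f:
    for row in csv.reader(f, skipinitialspace=True):
        kind = row[0]
        if kind == "1-generator":
            generators.append(Generator(row[1]))
        elif kind == "2-fuel":
            generators[-1].fuel = Fuel(*row[1:5])
        elif kind == "2-curve":
            generators[-1].curves.append(Curve(row[1]))
        elif kind == "3-point":
            generators[-1].curves[-1].points.append(
                (float(row[1]), float(row[2])))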

Christian Gollwitzer

Sep 23, 2021, 5:22:04 AM
On 22.09.21 at 16:52, Michael F. Stemper wrote:
> On 21/09/2021 19.30, Eli the Bearded wrote:
>> Yes, CSV files can model that. But it would not be my first choice of
>> data format. (Neither would JSON.) I'd probably use XML.
>
> Okay. 'Go not to the elves for counsel, for they will say both no
> and yes.' (I'm not actually surprised to find differences of opinion.)

That is wrong: CSV has no model of hierarchical data. A CSV file is a 2D
table, just like a database table or an Excel sheet.

You can /layer/ higher-dimensional data on top of a 2D table (there is
relational algebra theory behind this), but it is wrong (or misleading at
best) to say that CSV can model hierarchical data.

It's the same as saying "CSV supports images". Of course it doesn't; it's
a text file, but you could encode a JPEG as base64 and then put this
string into the cell of a CSV table. That definitely isn't what a sane
person would understand as "support".

Christian

Mats Wichmann

Sep 23, 2021, 8:54:20 AM
On 9/22/21 10:31, Dennis Lee Bieber wrote:

> If you control both the data generation and the data consumption,
> finding some format ...

This is really the key. I rant at people who seem to believe that CSV is
THE data interchange format; it's about as bad as it gets at that, if you
have a choice. XML is noisy but at least (potentially) self-documenting,
and ought to be able to recover from certain errors. The problem with CSV
is that a substantial chunk of the world seems to live inside Excel, so
data is commonly both generated in CSV so that it can be imported into
Excel, and generated in CSV as a result of exporting from Excel; the
parts often are *not* in your control.

Sigh.

Mostowski Collapse

Sep 23, 2021, 9:27:19 AM
I didn't slam Python. In the end I found PyPy, and had good results.
But there are, like, two kinds of lists, one moderated and one unmoderated.
For example my PyPy testing is not found here:

https://mail.python.org/pipermail/python-list/2021-September/thread.html#start

On the other hand this thread is intact:

https://groups.google.com/g/comp.lang.python/c/JrZ-Zywmzwg/m/wEArUPblAwAJ

Also on pipermail there is a strange chronological reordering. My
last recorded post is from 16 Sept; from then on I got censored,
whereas Joseph Schachner's post is from 14 Sept and follows

my post. So I guess the list is heavily manipulated. Which is a pity,
since Python has nothing to hide. It's a good language.

Chris Angelico

Sep 23, 2021, 9:27:40 AM
The only people who think that CSV is *the* format are people who
habitually live in spreadsheets. People who move data around the
internet, from program to program, are much more likely to assume that
JSON is the sole format. Of course, there is no single ultimate data
interchange format, but JSON is a lot closer to one than CSV is.

(Or to be more precise: any such thing as a "single ultimate data
interchange format" will be so generic that it isn't enough to define
everything. For instance, "a stream of bytes" is a universal data
interchange format, but that's not ultimately a very useful claim.)

ChrisA

Mostowski Collapse

Sep 23, 2021, 9:35:44 AM
Or it's a problem with the thread view on pipermail.
If I use the date view I see yet other stuff.

---- end excursion, won't interrupt again ---

Michael F. Stemper

Sep 23, 2021, 1:23:27 PM
On 22/09/2021 17.37, Dennis Lee Bieber wrote:
> On Wed, 22 Sep 2021 09:52:59 -0500, "Michael F. Stemper"
> <michael...@gmail.com> declaimed the following:
>> On 21/09/2021 19.30, Eli the Bearded wrote:
>>> In comp.lang.python, Michael F. Stemper <michael...@gmail.com> wrote:

>>>> How does CSV handle hierarchical data? For instance, I have

>>>> Can CSV files model this sort of situation?
>>>
> <SNIP>
>>> Yes, CSV files can model that. But it would not be my first choice of
>>> data format. (Neither would JSON.) I'd probably use XML.
>>
>> Okay. 'Go not to the elves for counsel, for they will say both no
>> and yes.' (I'm not actually surprised to find differences of opinion.)
>>
> You'd have to include a "level" field (and/or data type, if multiple objects
> can be at the same level) as the first field in the CSV, which identifies
> how to parse the rest of the CSV data (well, technically, the CSV module has
> "parsed" it -- in terms of splitting at commas, handling quoted strings
> (which may contain commas which are not split points), etc.).
>
> 1-generator, name
> 2-fuel, name, UOM, heat-content, price
> 2-curve, name
> 3-point, X, Y
> 3-point, X, Y
> ...
> 2-curve, name
> 3-point, X, Y
> 3-point, X, Y

This reminds me of how my (former) employer imported data models into
our systems from the 1970s until the mid-2000s. We had 80-column records
(called "card images") that would have looked like:

FUEL0 LIGNITE TON 13.610 043.581
UNIT1 COAL CREK1
UNIT2 ...

The specific columns for the start and end of each field on each record
were defined in a thousand-plus page document. (We modeled all of a
power system, not just economic data about generators.)

However, this doesn't seem like it would fit too well with the csv
module, since it requires a lot more logic on the part of the consuming
program.

Interesting flashback, though.

--
Michael F. Stemper
Deuteronomy 24:17

Eli the Bearded

Sep 23, 2021, 1:51:55 PM
In comp.lang.python, Christian Gollwitzer <auri...@gmx.de> wrote:
> On 22.09.21 at 16:52, Michael F. Stemper wrote:
>> On 21/09/2021 19.30, Eli the Bearded wrote:
>>> Yes, CSV files can model that. But it would not be my first choice of
>>> data format. (Neither would JSON.) I'd probably use XML.
>> Okay. 'Go not to the elves for counsel, for they will say both no
>> and yes.' (I'm not actually surprised to find differences of opinion.)

Well, I have a recommendation with my answer.

> It's the same as saying "CSV supports images". Of course it doesn't; it's
> a text file, but you could encode a JPEG as base64 and then put this
> string into the cell of a CSV table. That definitely isn't what a sane
> person would understand as "support".

I'd use one of the netpbm formats instead of JPEG. PBM for one bit
bitmaps, PGM for one channel (typically grayscale), PPM for three
channel RGB, and PAM for anything else (two channel gray plus alpha,
CMYK, RGBA, HSV, YCbCr, and more exotic formats). JPEG is tricky to
map to CSV since it is a three channel format (YCbCr), where the
channels are typically not at the same resolution. Usually Y is full
size and the Cb and Cr channels are one quarter size ("4:2:0 chroma
subsampling"). The unequal size of the channels does not lend itself
to CSV, but I can't say it's impossible.

But maybe you meant the whole JFIF or Exif JPEG file format base64
encoded with no attempt to understand the image. That sort of thing
is common in JSON, and I've seen it in YAML, too. It wouldn't surprise
me if people do that in CSV or XML, but I have so far avoided seeing
that. I used that method for sticking a tiny PNG in a CSS file just
earlier this month. The whole PNG was smaller than the typical headers
of an HTTP/1.1 request and response, so I figured "don't make it a
separate file".

Elijah
------
can at this point recognize a bunch of "magic numbers" in base64


Michael F. Stemper

Sep 23, 2021, 4:06:36 PM
On 23/09/2021 12.51, Eli the Bearded wrote:
>> On 22.09.21 at 16:52, Michael F. Stemper wrote:
>>> On 21/09/2021 19.30, Eli the Bearded wrote:
>>>> Yes, CSV files can model that. But it would not be my first choice of
>>>> data format. (Neither would JSON.) I'd probably use XML.
>>> Okay. 'Go not to the elves for counsel, for they will say both no
>>> and yes.' (I'm not actually surprised to find differences of opinion.)
>
> Well, I have a recommendation with my answer.

Sorry, didn't mean that to be disparaging.

--
Michael F. Stemper
This post contains greater than 95% post-consumer bytes by weight.

Avi Gross

Sep 23, 2021, 5:27:06 PM
Can we agree that there are way more general ways to store data than
anything currently in common use and that in some ways, CSV and cousins like
TSV are a subset of the others in a sense? There are trees and arbitrary
graphs and many complex data structures often encountered while a program is
running as in-memory objects. Many are not trivial to store.

But some are if all you see is table-like constructs including matrices and
data.frames.

I mean any rectangular data format with umpteen rows and N columns can
trivially be stored in many other formats, especially when it allows some
columns to have NA values. The other format would simply have major
categories that contain components with one per column, and if missing,
represents an NA. Is there any reason JSON or XML cannot include the
contents of any CSV with headers and without loss of info?

Going the other way is harder. Note that a data.frame type of structure
often imposes restrictions on a CSV and requires everything in a column to
be of the same type, or coercible to a common type. (well, not always true
as in using list columns in R.) But given some arbitrary structure in XML,
can you look at all possible labels and if it is not too complex, make a CSV
with one or more columns for every possible need? It can be a problem if say
a record for an Author allows multiple actual co-authors. Normal books may
let you get by with multiple columns (mostly containing an NA) with names
like author1, author2, author3, ...

But scientific papers seemingly allow oodles of authors and any time you
update the data, you may need yet another column. And, of course, processing
data where many columns have the same meaning is a bit of a pain. Data
structures can also often be nested multiple levels and at some point, CSV
is not a reasonable fit unless you play database games and make multiple
tables you can store and retrieve to make complex queries, as in many
relational database systems. Yes, each such table can be a CSV.

But if you give someone a hammer, they tend to stop using thumbtacks or
other tools. The real question is what kind of data makes good sense for an
application. If a nice rectangular format works, great. Even if not, the
Author problem above can fairly easily be handled by making the author
column something like a character string you compose as "Last1, First1;
Last2, First2; Last3, First3" and that fits fine in a CSV but can be taken
apart in your software if looking for any book by a particular author. Not
optimal, but a workaround I am sure is used.

But using the most abstract and complex storage method is very often
overkill and unless you are very good at it, may well be a fairly slow and
even error-prone way to solve a problem.

Julio Di Egidio

Sep 23, 2021, 5:42:14 PM
On Tuesday, 21 September 2021 at 20:12:33 UTC+2, Michael F. Stemper wrote:
> On the prolog thread, somebody posted a link to:
> <https://dirtsimple.org/2004/12/python-is-not-java.html>
>
> One thing that it tangentially says is "XML is not the answer."

In detail, I do not agree with many things he says, yet it is just true that XML has been and still is often abused: e.g. a classic is using XML to store arbitrary data in a DB field...

<snip>
> Does the advice on that page mean that I should find some other
> way to get data into my programs, or does it refer to some kind
> of misuse/abuse of XML for something that it wasn't designed
> for?

Just don't use XML for program state, ever, period. XML is an *exchange* format, for transmitting information to other programs, or for storing it on disk, though again most probably for later transmission, since even for storage there are better options. It's very formal (hence not user-friendly) in order to allow for strict programmatic validation and discovery...

And that's essentially it.

Have fun,

Julio

Avi Gross

Sep 23, 2021, 6:00:02 PM
What you are describing, Stefan, is what I meant by emulating a relational database with tables.

And, FYI, There is no guarantee that two authors with the same name will not be assumed to be the same person.

Besides the lack of any one official CSV format, there are oodles of features I have seen that are normally external to the CSV. For example, I have often read in data from a CSV or similar where you could tell the software to consider a blank or 999 to mean NA, what denotes a line in the file to be ignored as a comment, whether a separator is a space or any combination of whitespace, what quotes something so that you can hide a comma, how to handle escapes, whether to skip blank lines, and more.

Now a really good design might place some metadata into the file that can be used to set defaults for things like that or incorporate them into the format unambiguously. It might calculate the likely data type for various fields and store that in the metadata. So even if you stored rectangular data in a CSV file, perhaps the early lines would be in some format that can be read as comments and supply some info like the above.

Are any of the CSV variants more like that?

-----Original Message-----
From: Python-list <python-list-bounces+avigross=veriz...@python.org> On Behalf Of Stefan Ram
Sent: Thursday, September 23, 2021 5:43 PM
To: pytho...@python.org
Subject: Re: XML Considered Harmful

"Avi Gross" <avig...@verizon.net> writes:
>But scientific papers seemingly allow oodles of authors and any time
>you update the data, you may need yet another column.

You can use three CSV files: papers, persons, and authors:

papers.csv

1, "Is the accelerated expansion evidence of a change of signature?"

persons.csv

1, Marc Mars

authors.csv

1, 1

I.e., paper 1 is authored by person 1.

Now, when we learn that José M. M. Senovilla is also a
co-author of "Is the accelerated expansion evidence of a
forthcoming change of signature?", we only have to add
new rows, no new columns.

papers.csv

1, "Is the accelerated expansion evidence of a change of signature?"

persons.csv

1, "Marc Mars"
2, "José M. M. Senovilla"

authors.csv

1, 1
1, 2
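
The join is then easy with the csv module and dictionaries, e.g.:

import csv

def read(name):
    # Each file holds two-field rows; strip the space after the comma.
    with open(name, newline="") as f:
        return list(csv.reader(f, skipinitialspace=True))

papers  = dict(read("papers.csv"))
persons = dict(read("persons.csv"))

for paper_id, person_id in read("authors.csv"):
    print(persons[person_id], "wrote", papers[paper_id])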

The real problem with CSV is that there is no CSV.

This is not a specific data language with a specific
specification. Instead it is a vague designation for
a plethora of CSV dialects, which usually do not even
have a specification. Compare this with XML. XML has
a sole specification managed by the W3C.



Chris Angelico

Sep 23, 2021, 6:03:11 PM
On Fri, Sep 24, 2021 at 7:11 AM Eli the Bearded <*@eli.users.panix.com> wrote:
>
> In comp.lang.python, Christian Gollwitzer <auri...@gmx.de> wrote:
> > On 22.09.21 at 16:52, Michael F. Stemper wrote:
> >> On 21/09/2021 19.30, Eli the Bearded wrote:
> >>> Yes, CSV files can model that. But it would not be my first choice of
> >>> data format. (Neither would JSON.) I'd probably use XML.
> >> Okay. 'Go not to the elves for counsel, for they will say both no
> >> and yes.' (I'm not actually surprised to find differences of opinion.)
>
> Well, I have a recommendation with my answer.
>
> > It's the same as saying "CSV supports images". Of course it doesn't, its
> > a textfile, but you could encode a JPEG as base64 and then put this
> > string into the cell of a CSV table. That definitely isn't what a sane
> > person would understand as "support".
>
> I'd use one of the netpbm formats instead of JPEG. PBM for one bit
> bitmaps, PGM for one channel (typically grayscale), PPM for three
> channel RGB, and PAM for anything else (two channel gray plus alpha,
> CMYK, RGBA, HSV, YCbCr, and more exotic formats). JPEG is tricky to
> map to CSV since it is a three channel format (YCbCr), where the
> channels are typically not at the same resolution. Usually Y is full
> size and the Cb and Cr channels are one quarter size ("4:2:0 chroma
> subsampling"). The unequal size of the channels does not lend itself
> to CSV, but I can't say it's impossible.
>

Examine prior art, and I truly do mean art, from Matt Parker:

https://www.youtube.com/watch?v=UBX2QQHlQ_I

ChrisA

Jon Ribbens

Sep 23, 2021, 6:55:41 PM
On 2021-09-23, Stefan Ram <r...@zedat.fu-berlin.de> wrote:
> The real problem with CSV is that there is no CSV.
>
> This is not a specific data language with a specific
> specification. Instead it is a vague designation for
> a plethora of CSV dialects, which usually do not even
> have a specification.

Indeed. For example, at least at some points in its history,
Excel has been unable to import CSV written by itself, because
its importer was incompatible with its own exporter.

> Compare this with XML. XML has a sole specification managed
> by the W3C.

Other well-defined formats are also available ;-)

dn

Sep 23, 2021, 9:02:59 PM
On 22/09/2021 07.22, Michael F. Stemper wrote:
> On 21/09/2021 13.49, alister wrote:
>> On Tue, 21 Sep 2021 13:12:10 -0500, Michael F. Stemper wrote:
>>
>>> On the prolog thread, somebody posted a link to:
>>> <https://dirtsimple.org/2004/12/python-is-not-java.html>

Given the source, shouldn't one take any criticism of Python (or Java)
with at least the proverbial grain of salt?


>>> One thing that it tangentially says is "XML is not the answer."

"tangential" as in 'spinning off'?


...

> It's my own research, so I can give myself the data in any format that I
> like.
...
With that, why not code it as Python expressions, and include the module?
--
Regards,
=dn

Chris Angelico

Sep 23, 2021, 11:12:14 PM
On Fri, Sep 24, 2021 at 12:22 PM Stefan Ram <r...@zedat.fu-berlin.de> wrote:
>
> dn <Pytho...@DancesWithMice.info> writes:
> >With that, why not code it as Python expressions, and include the module?
>
> This might create a code execution vulnerability if such
> files are exchanged between multiple parties.
>
> If code execution vulnerabilities and human-readability are
> not an issue, then one could also think about using pickle.
>
> If one ignores security concerns for a moment, serialization into
> a text format and subsequent deserialization can be as easy as:
>
> |>>> eval( str( [1, (2, 3)] ))
> |[1, (2, 3)]
>

One good hybrid is to take a subset of Python syntax (so it still
looks like a Python script for syntax highlighting etc), and then
parse that yourself, using the ast module. For instance, you can strip
out comments, then look for "VARNAME = ...", and parse the value using
ast.literal_eval(), which will give you a fairly flexible file format
that's still quite safe.
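
A sketch of what I mean (the config format here is invented):

import ast

def load_config(text):
    """Parse lines of the form NAME = <python-literal>, ignoring comments."""
    config = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blanks
        if not line:
            continue
        name, _, value = line.partition("=")
        config[name.strip()] = ast.literal_eval(value.strip())
    return config

cfg = load_config('UNITS = ["unit1", "unit2"]  # generator names\nRPM = 1800')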

ChrisA

Dan Stromberg

Sep 23, 2021, 11:44:56 PM
On Thu, Sep 23, 2021 at 8:12 PM Chris Angelico <ros...@gmail.com> wrote:

> One good hybrid is to take a subset of Python syntax (so it still
> looks like a Python script for syntax highlighting etc), and then
> parse that yourself, using the ast module. For instance, you can strip
> out comments, then look for "VARNAME = ...", and parse the value using
> ast.literal_eval(), which will give you a fairly flexible file format
> that's still quite safe.
>

Restricting Python with the ast module is interesting, but I don't think
I'd want to bet my career on the actual safety of such a thing. Given that
Java bytecode was a frequent problem inside web browsers, imagine all the
messiness that could accidentally happen with a subset of Python syntax
from untrusted sources.

ast.literal_eval might be a little better - or a list of such, actually.

Better still to use JSON or ini format - IOW something designed for the
purpose.

Chris Angelico

Sep 23, 2021, 11:49:27 PM
Uhh, I specifically mention literal_eval in there :) Simple text
parsing followed by literal_eval for the bulk of it is a level of
safety that I *would* bet my career on.

> Better still to use JSON or ini format - IOW something designed for the purpose.

It all depends on how human-editable it needs to be. JSON has several
problems in that respect, including some rigidities, and a lack of
support for comments. INI format doesn't have enough data types for
many purposes. YAML might be closer, but it's not for every situation
either.

That's why we have options.

ChrisA

dn

Sep 24, 2021, 1:40:34 AM
On 24/09/2021 14.07, Stefan Ram wrote:
> dn <Pytho...@DancesWithMice.info> writes:
>> With that, why not code it as Python expressions, and include the module?
>
> This might create a code execution vulnerability if such
> files are exchanged between multiple parties.


The OP's spec, as quoted earlier(!), reads:

"It's my own research, so I can give myself the data in any format that
I like."

Whither "files are exchanged" and/or "multiple parties"? Are these
anticipations of problems that may/won't ever apply? aka YAGNI.

Concern about such an approach *is* warranted.

However, the preceding question to be considered during the design-stage
is: 'does such concern apply?'. The OP describes full and unique agency.
Accordingly, "KISS"!

NB my personal choice would likely be JSON or YAML, but see reservations
(eg @Chris) - and with greater relevance: shouldn't we consider the OP's
'learning curve'?
(such deduced only from OP's subsequent reactions/responses 'here' -
with any and all due apologies)
--
Regards,
=dn

Mike Dewhirst

Sep 24, 2021, 2:43:03 AM
I had to use XML once because that was demanded by the receiving machine
over which I had no say. I wouldn't use it otherwise because staring at it
makes you dizzy.

I would want to know how the data are derived from the multiple sources
and transmitted to the collating platform before pontificating. Then I
would ignore any potential future enhancements and choose the easiest
possible mechanism.

I have used JSON with Python and been delighted at the ease of converting
data into dicts, and even arbitrary nesting where data values can also be
dicts etc.

Good luck

--
(Unsigned mail from my phone)

Mostowski Collapse

Sep 24, 2021, 9:16:46 AM
Mark Lawrence is also not visible on pipermail? Ha ha, that's
the problem with this censoring: it's like cherry-picking, almost
like the alleged organ trade in the Republic of China. We now

have the case that somebody picked up some link via another
Python channel, like the uncensored Google Groups, but on the
pipermail channel even the original link is not visible.

The annoying thing nowadays is that you don't get an email anymore
telling you "We are sorry, but we have banned you"; this would
be too flattering and possibly make you famous. It all works now

under the hood, as reported here:

What Are Shadowbans?
Shadowbans block a user or individual pieces of content
without letting the offending user know they’ve been blocked.
https://builtin.com/marketing/shadowban

I must admit, I applauded when Donald Trump was
banned from Twitter, but this "management" of programming
language groups is getting ridiculous.

Mark Lawrence wrote on Wednesday, 22 September 2021 at 21:41:14 UTC+2:

Mostowski Collapse

Sep 24, 2021, 9:46:27 AM
BTW: I think it's problematic to associate Java with XML.

Michael F. Stemper wrote on Tuesday, 21 September 2021 at 20:12:33 UTC+2:
> On the prolog thread, somebody posted a link to:
> <https://dirtsimple.org/2004/12/python-is-not-java.html>

The above link is very old, from 2004, and might reflect
how Java presented itself back in those days. But since
the Jigsaw project, XML has practically left Java.

It is all no longer part of the javax.* or java.* namespaces;
Oracle got rid of the XML technologies housed in these
namespaces, and there is now the jakarta.* namespace.

Example JAXB:
Jakarta XML Binding (JAXB; formerly Java Architecture for XML Binding)
https://de.wikipedia.org/wiki/Jakarta_XML_Binding

If I remember correctly, XML also never went into the Java
Language Specification, unlike the Scala programming
language, where you can have XML literals:

XML literals in scala
https://tuttlem.github.io/2015/02/24/xml-literals-in-scala.html

An easy protection against tampered XML data vulnerabilities
is DTD or some other XML schema language. It can at least catch
problems that are in the scope of the schema language.

Mostowski Collapse

Sep 24, 2021, 10:55:45 AM

Alternatively, use cryptographic methods to protect your XML
file when in transit, like encryption and/or signatures.

Peter J. Holzer

Sep 24, 2021, 2:29:36 PM
On 2021-09-21 19:46:19 -0700, Dan Stromberg wrote:
> On Tue, Sep 21, 2021 at 7:26 PM Michael F. Stemper <
> michael...@gmail.com> wrote:
> > If XML is not the way to package data, what is the recommended
> > approach?
> >
>
> I prefer both JSON and YAML over XML.
>
> XML has both elements and tags, but it didn't really need both.

I think you meant "both elements and attributes". Tags are how you
denote elements, so they naturally go together.

I agree that for representing data (especially object-oriented data) the
distinction between (sub-)elements and attributes seems moot (should I
represent that field as an attribute or an element?), but don't forget that
XML was intended to replace SGML, and that SGML was intended to mark up
text, not represent arbitrary data.

Would you really want to write

<p>Mr. <party role="defendant">Smith</party>'s point was corroborated by
Ms. <witness>Jones</witness>' point that <quote>bla, bla</quote>, which
seemed more plausible than Mr. <party role="plaintiff">Willam</party>'s
claim that <quote>blub, blub</quote>.

as

<p>Mr. <party><defendant/>Smith</party>'s point was corroborated by
Ms. <witness>Jones</witness>' point that <quote>bla, bla</quote>, which
seemed more plausible than Mr. <party><plaintiff/>Willam</party>'s
claim that <quote>blub, blub</quote>.

or

<p>Mr. <party><defendant>Smith</defendant></party>'s point was
corroborated by Ms. <witness>Jones</witness>' point that <quote>bla,
bla</quote>, which seemed more plausible than Mr. <party>
<plaintiff>Willam</plaintiff></party>'s claim that <quote>blub,
blub</quote>.

?

I probably chose an example (no doubt influenced by the fact that SGML
was originally invented to digitize court decisions) which is too simple
(in HTML I often see many attributes on a single element, even with
CSS), but even here you can see that attributes add clarity.

hp

--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | h...@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"

Peter J. Holzer

Sep 24, 2021, 2:34:23 PM
On 2021-09-23 06:53:10 -0600, Mats Wichmann wrote:
> The problem with csv is that a substantial chunk of the world seems to
> live inside Excel,

This is made so much worse by Excel being exceptionally bad at reading
CSV.

Several hundred genes were recently renamed because Excel was unable to
read their names as simply strings and insisted on interpreting them as
something else (e.g. dates).

Peter J. Holzer

Sep 24, 2021, 2:59:35 PM
On 2021-09-21 13:12:10 -0500, Michael F. Stemper wrote:
> I read this page right when I was about to write an XML parser
> to get data into the code for a research project I'm working on.
> It seems to me that XML is the right approach for this sort of
> thing, especially since the data is hierarchical in nature.
>
> Does the advice on that page mean that I should find some other
> way to get data into my programs, or does it refer to some kind
> of misuse/abuse of XML for something that it wasn't designed
> for?
>
> If XML is not the way to package data, what is the recommended
> approach?

There are a gazillion formats and depending on your needs one of them
might be perfect. Or you may have to define your own bespoke format (I
mean, nobody (except Matt Parker) tries to represent images or videos as
CSVs: there's PNG and JPEG and WEBP and H.264 and AV1 and whatever for
that).

Of the three formats discussed here my take is:

CSV: Good for tabular data of a single data type (strings). As soon as
there's a second data type (numbers, dates, ...) you leave standard
territory and are into "private agreements".

JSON: Has a few primitive data types (bool, number, string) and two
compound types (list, dict(string -> any)). Still missing many
frequently used data types (e.g. dates) and has no standard way to
denote composite types. But it's simple, and if it's sufficient for your
needs, use it.

XML: Originally invented for text markup, and that shows. Can represent
different types (via tags), can define those types (via DTD and/or
schemas), can identify schemas in a globally-unique way and you can mix
them all in a single document (and there are tools available to validate
your files). But those features make it very complex (you almost
certainly don't want to write your own parser) and you really have to
understand the data model (especially namespaces) to use it.


You can of course represent any data in any format if you jump through
enough hoops, but the real question is "does the data I have fit
naturally within the data model of the format I'm trying to use". If it
doesn't, look for something else. For me, CSV, JSON and XML form a
hierarchy where each can naturally represent all the data of its
predecessors, but not vice versa.
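
To make the comparison concrete, here is a minimal Python sketch
(standard library only; the record is invented for the example) putting
one record into each format:

import csv, io, json
import xml.etree.ElementTree as ET

record = {"name": "lignite", "price": 21.96}

# CSV: both fields flatten to strings; the types are a private agreement.
buf = io.StringIO()
csv.writer(buf).writerow(record.values())

# JSON: string vs. number survives, but nothing says this is a fuel.
as_json = json.dumps(record)

# XML: the tag itself carries the type.
fuel = ET.Element("Fuel", name=record["name"])
ET.SubElement(fuel, "price").text = str(record["price"])
as_xml = ET.tostring(fuel)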

dn

unread,
Sep 24, 2021, 6:51:58 PM9/24/21
to
On 25/09/2021 06.59, Peter J. Holzer wrote:
> There are a gazillion formats and depending on your needs one of them
> might be perfect. Or you may have to define your own bespoke format (I
> mean, nobody (except Matt Parker) tries to represent images or videos as
> CSVs: There's PNG and JPEG and WEBP and H.264 and AV1 and whatever for
> that).
>
> Of the three formats discussed here my take is:
>
> CSV: Good for tabular data of a single data type (strings). As soon as
> there's a second data type (numbers, dates, ...) you leave standard
> territory and are into "private agreements".
>
> JSON: Has a few primitive data types (bool, number, string) and two
> compound types (list, dict(string -> any)). Still missing many
> frequently used data types (e.g. dates) and has no standard way to
> denote composite types. But it's simple, and if it's sufficient for your
> needs, use it.
>
> XML: Originally invented for text markup, and that shows. Can represent
> different types (via tags), can define those types (via DTD and/or
> schemas), can identify schemas in a globally-unique way and you can mix
> them all in a single document (and there are tools available to validate
> your files). But those features make it very complex (you almost
> certainly don't want to write your own parser) and you really have to
> understand the data model (especially namespaces) to use it.

and YAML?
--
Regards,
=dn

Chris Angelico

unread,
Sep 24, 2021, 7:00:33 PM9/24/21
to
Invented because there weren't enough markup languages, so we needed another?

ChrisA

David L Neil

unread,
Sep 24, 2021, 7:27:09 PM9/24/21
to
On 25/09/2021 11.00, Chris Angelico wrote:

> Invented because there weren't enough markup languages, so we needed another?

Anything You Can Do I Can Do Better
https://www.youtube.com/watch?v=_UB1YAsPD6U

--
Regards =dn

Jon Ribbens

unread,
Sep 24, 2021, 7:32:58 PM9/24/21
to
On 2021-09-24, Chris Angelico <ros...@gmail.com> wrote:
> On Sat, Sep 25, 2021 at 8:53 AM dn via Python-list
><pytho...@python.org> wrote:
>> On 25/09/2021 06.59, Peter J. Holzer wrote:
>> > CSV: Good for tabular data of a single data type (strings). As soon as
>> > there's a second data type (numbers, dates, ...) you leave standard
>> > territory and are into "private agreements".

CSV is not good for strings, as there is no one specification of how to
encode things like newlines and commas within the strings, so you may
find that your CSV data transfer fails or even silently corrupts data.

>> > JSON: Has a few primitive data types (bool, number, string) and two
>> > compound types (list, dict(string -> any)). Still missing many
>> > frequently used data types (e.g. dates) and has no standard way to
>> > denote composite types. But it's simple, and if it's sufficient for your
>> > needs, use it.

JSON Schema provides a way to denote composite types.

>> > XML: Originally invented for text markup, and that shows. Can represent
>> > different types (via tags), can define those types (via DTD and/or
>> > schemas), can identify schemas in a globally-unique way and you can mix
>> > them all in a single document (and there are tools available to validate
>> > your files). But those features make it very complex (you almost
>> > certainly don't want to write your own parser) and you really have to
>> > understand the data model (especially namespaces) to use it.
>>
>> and YAML?
>
> Invented because there weren't enough markup languages, so we needed
> another?

Invented as a drunken bet that got out of hand, and used by people who
don't realise this.

Greg Ewing

unread,
Sep 24, 2021, 7:46:28 PM9/24/21
to
On 25/09/21 6:29 am, Peter J. Holzer wrote:
> don't forget that
> XML was intended to replace SGML, and that SGML was intended to mark up
> text, not represent any data.

And for me this is the number one reason why XML is the wrong
tool for almost everything it's used for nowadays.

It's bizarre. It's as though there were a large community of
professional builders who insisted on using hammers to drive
screws, and extolled the advantages of doing so.

--
Greg

Greg Ewing

unread,
Sep 24, 2021, 8:01:26 PM9/24/21
to
On 25/09/21 6:34 am, Peter J. Holzer wrote:
> Several hundred genes were recently renamed because Excel was unable to
> read their names simply as strings and insisted on interpreting them as
> something else (e.g. dates).

Another fun one I've come across is interpreting phone numbers
as floating point and writing them out again with exponents...

--
Greg

Greg Ewing

unread,
Sep 24, 2021, 8:14:12 PM9/24/21
to
On 25/09/21 10:51 am, dn wrote:
>> XML: Originally invented for text markup, and that shows. Can represent
>> different types (via tags), can define those types (via DTD and/or
>> schemas), can identify schemas in a globally-unique way and you can mix
>> them all in a single document (and there are tools available to validate
>> your files). But those features make it very complex

And for all that complexity, it still doesn't map very well
onto the kinds of data structures used inside programs (lists,
structs, etc.), so you end up having to build those structures
on top of it, and everyone does that in a different way.

--
Greg

Greg Ewing

unread,
Sep 24, 2021, 8:16:22 PM9/24/21
to
There were *too many* markup languages, so we invented another!

--
Greg

Peter J. Holzer

unread,
Sep 25, 2021, 6:47:08 AM9/25/21
to
On 2021-09-24 23:32:47 -0000, Jon Ribbens via Python-list wrote:
> On 2021-09-24, Chris Angelico <ros...@gmail.com> wrote:
> > On Sat, Sep 25, 2021 at 8:53 AM dn via Python-list
> ><pytho...@python.org> wrote:
> >> On 25/09/2021 06.59, Peter J. Holzer wrote:
> >> > CSV: Good for tabular data of a single data type (strings). As soon as
> >> > there's a second data type (numbers, dates, ...) you leave standard
> >> > territory and are into "private agreements".
>
> CSV is not good for strings, as there is no one specification of how to
> encode things like newlines and commas within the strings, so you may
> find that your CSV data transfer fails or even silently corrupts data.

Those two cases are actually pretty straightforward: Just enclose the
field in quotes.

Handling quotes is less standardized. I think doubling quotes is much more
common than an escape character, but I've certainly seen both.

But if you get down to it, the problems with CSV start at a much lower
level:

1) The encoding is not defined. These days UTF-8 (with or without BOM)
is pretty common, but I still regularly get files in Windows-1252
encoding and occasionally something else.

2) The record separator isn't defined. CRLF is most common, followed by
LF. But just recently I got a file with CR (Does Eurostat still use
some Macs with MacOS 9?)

3) The field separator isn't defined. Officially the format is known as
"comma separated values", but in my neck of the woods it's actually
semicolon-separated in the vast majority of cases.

So even for the most simple files there are three parameters the sender
and the receiver have to agree on.
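
In Python's csv module those three agreements surface as parameters the
reader cannot guess for you. A sketch (file name and values invented):

import csv

# The three agreements made explicit: the encoding, the newline
# handling (left to the csv module), and the field separator.
with open("eurostat.csv", encoding="windows-1252", newline="") as f:
    for row in csv.reader(f, delimiter=";"):
        print(row)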


> >> > JSON: Has a few primitive data types (bool, number, string) and two
> >> > compound types (list, dict(string -> any)). Still missing many
> >> > frequently used data types (e.g. dates) and has no standard way to
> >> > denote composite types. But it's simple, and if it's sufficient for your
> >> > needs, use it.
>
> JSON Schema provides a way to denote composite types.

I probably wasn't clear what I meant. In XML, every element has a tag,
which is basically its type. So by looking at an XML file (without
reference to a schema) you can tell what each element is. And a
validator can say something like "expected a 'product' or 'service'
element here but found a 'person'".

In JSON everything is just an object or a list. You may guess that an
object with a field "product_id" is a product, but is one with "name":
"Billy" a person or a piece of furniture?

I'm not familiar with JSON schema (I know that it exists and I've read a
tutorial or two but I've never used it in a real project), but as far as
I know it doesn't change that. It describes the structure of a JSON
document but it doesn't add type information to that document. So a
validator can at best guess what the malformed thing it just found was
supposed to be.

Jon Ribbens

unread,
Sep 25, 2021, 7:50:09 AM9/25/21
to
JSON Schema absolutely does change that. You can create named types
and specify where they may appear in the document. With a well-defined
schema you do not need to make any guesses about what type something is.
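
For example, a minimal sketch with the third-party jsonschema package
(schema and data invented for the example): "$defs" declares the named
type and "$ref" states where it may appear:

import jsonschema  # third party: pip install jsonschema

schema = {
    "$defs": {
        "fuel": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "price": {"type": "number"},
            },
            "required": ["name", "price"],
        }
    },
    "type": "array",
    "items": {"$ref": "#/$defs/fuel"},
}

# Passes silently; raises jsonschema.ValidationError on a mismatch.
jsonschema.validate([{"name": "ton", "price": 21.96}], schema)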

Karsten Hilbert

unread,
Sep 25, 2021, 9:25:11 AM9/25/21
to
On Fri, Sep 24, 2021 at 08:59:23PM +0200, Peter J. Holzer wrote:

> JSON: Has a few primitive data types (bool, number, string) and two
> compound types (list, dict(string -> any)). Still missing many
> frequently used data types (e.g. dates)

But that (dates) at least has a well-known mapping to string,
which makes it usable within JSON.
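
E.g. the usual private agreement is ISO 8601 text, which round-trips
with the standard library. A sketch:

import json
import datetime as dt

# Dates go out as ISO 8601 strings...
text = json.dumps({"published": dt.date(2021, 9, 25).isoformat()})

# ...and come back as dates, by agreement on the field's meaning.
published = dt.date.fromisoformat(json.loads(text)["published"])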

Karsten
--
GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B

Michael F. Stemper

unread,
Sep 25, 2021, 4:20:37 PM9/25/21
to
On 21/09/2021 13.12, Michael F. Stemper wrote:

> If XML is not the way to package data, what is the recommended
> approach?

Well, there have been a lot of ideas put forth on this thread,
many more than I expected. I'd like to thank everyone who
took the time to contribute.

Most of the reasons given for avoiding XML appear to be along
the lines of "XML has all of these different options that it
supports."

However, it seems that I could ignore 99% of those things and
just use a teeny subset of its capabilities. For instance, if
I modeled a fuel like this:

<Fuel name="Montana Sub-Bituminous">
<uom>ton</uom>
<price>21.96</price>
<heat_content>18.2</heat_content>
</Fuel>

and a generating unit like this:

<Generator name="Skunk Creek 1">
<IHRcurve name="normal">
<point P="63" IHR="8.513"/>
<point P="105" IHR="8.907"/>
<point P="241" IHR="9.411"/>
<point P="455" IHR="10.202"/>
</IHRcurve>
<IHRcurve name="constrained">
<point P="63" IHR="8.514"/>
<point P="103" IHR="9.022"/>
<point P="223" IHR="9.511"/>
<point P="415" IHR="10.102"/>
</IHRcurve>
</Generator>

why would the fact that I could have chosen, instead, to model
the unit of measure as an attribute of the fuel, or its name
as a sub-element, matter? Once the modeling decision has been
made, all of the alternatives that might have been chosen would
seem to be irrelevant.
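
The reading code for such a teeny subset would stay correspondingly
teeny. A sketch of what I have in mind with xml.etree.ElementTree
(assuming a root element, say <Fuels>, wrapping the <Fuel> elements;
the float conversions are guesses at the intended types):

import xml.etree.ElementTree as ET

def fuels_from_xml(filename):
    """Return {name: (uom, price, heat_content)}, one entry per <Fuel>."""
    fuels = {}
    for fuel in ET.parse(filename).getroot().iter("Fuel"):
        fuels[fuel.get("name")] = (
            fuel.findtext("uom"),
            float(fuel.findtext("price")),
            float(fuel.findtext("heat_content")),
        )
    return fuels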

Some years back, IEC's TC57 came up with CIM[1]. This nailed down
a lot of decisions. The fact that other decisions could have been
made doesn't seem to keep utilities from going forward with it as
an enterprise-wide data model.

My current interests are not anywhere so expansive, but it seems
that the situations are at least similar:
1. Look at an endless range of options for a data model.
2. Pick one.
3. Run with it.

To clearly state my (revised) question:

Why does the existence of XML's many options cause a problem
for my use case?


Other reactions:

Somebody pointed out that some approaches would require that I
climb a learning curve. That's appreciated, although learning
new things is always good.

NestedText looks cool, and a lot like YAML. Having not gotten
around to playing with YAML yet, I was surprised to learn that YAML
tries to guess data types. This sounds as if it could lead to the
same type of problems that led to the names of some genes being
turned into dates.

It was suggested that I use an RDBMS, such as sqlite3, for the
input data. I've used sqlite3 for real-time data exchange between
concurrently-running programs. However, I don't see syntax like:

sqlite> INSERT INTO Fuels
...> (name,uom,price,heat_content)
...> VALUES ("Montana Sub-Bituminous", "ton", 21.96, 13.65);

as being nearly as readable as the XML that I've sketched above.
Yeah, I could write a program to do this, but that doesn't really
change anything, since I'd still need to get the data into the
program.

(Changing a value would be even worse, requiring the dreaded
UPDATE statement, instead of five seconds in vi.)

Many of the problems listed for CSV, which come from its lack of
standardization, seem similar to those given for XML. "Commas
or tabs?" "How are new-lines represented?" If I was to use CSV,
I'd be able to just pick answers. However, fitting hierarchical
data into rows/columns just seems wrong, so I doubt that I'll
end up going that way.

As far as disambiguating authors, I believe that most journals
are now expecting an ORCID[2] (which doesn't help with papers
published before that came around).

As far as use of XML to store program state, I wouldn't ever
consider that. As noted above, I've used an RDBMS to do so.
It handles all of the concurrency issues for me. The current use
case is specifically for raw, static input.

Fascinating to find out that XML was originally designed to
mark up text, especially legal text.

It was nice to be reminded of what Matt Parker looked like when
he had hair.


[1] <https://en.wikipedia.org/wiki/Common_Information_Model_(electricity)>
[2] <https://orcid.org/>
--
Michael F. Stemper
Psalm 82:3-4

Avi Gross

unread,
Sep 25, 2021, 5:39:46 PM9/25/21
to
Michael,

I don't care what you choose. Whatever works is fine for an internal use.

But is the data scheme you share representative of your actual application?

From what I see below, unless the number of "point" variables varies
(rather than always being exactly four), the application might be handled
well by any format that handles rectangular data, perhaps even CSV.

What you show is rectangular: anything like a data.frame can contain data
columns like p1,p2,p3,p4 and a categorical one like IHRcurve_name.

Or do you have a need for more variability such as an undetermined number of
similar units in ways that might require more flexibility or be more
efficient done another way?

MOST of the discussion I am seeing here seems peripheral to getting you what
you need for your situation and may require a learning curve to learn to use
properly. Are you planning on worrying about how to ship your data
encrypted, for example? Any file format you use for storage can presumably
be encrypted when sent and decrypted on receipt, if that matters.

So, yes, from an abstract standpoint we can discuss the merits of various
approaches. If it matters that humans can deal with your data in a file or
that it be able to be imported into a program like EXCEL, those are
considerations. But if not, there are quite a few relatively binary formats
where your program can save a snapshot of the data into a file and read it
back in next time. I often do that in another language that lets me share
variable including nested components such as the complex structures that
come out of a statistical analysis or the components needed to make one or
more graphs later. If you write the program that creates the darn things as
well as the one that later reads them back in, you can do what you want.

Or, did I miss something and others have already produced the data using
other tools, in which case you have to read it in at least once?


2QdxY4Rz...@potatochowder.com

unread,
Sep 25, 2021, 5:49:09 PM9/25/21
to
On 2021-09-25 at 15:20:19 -0500,
"Michael F. Stemper" <michael...@gmail.com> wrote:

> ... For instance, if
Disclaimer: I am not a big XML fan, for a number of reasons
already stated in this thread.

That said, please do include units in elements like heat_content,
whether or not it's Joules/kilogram/K, and price, even if it is the
local currency in the only country to which your data applies.
If there's a standard for your industry, or your company, or on
some other level, then at least document what it is and that
you're using it, so that the next person (which may be you a
year from now) doesn't have to guess.

You also never know when someone else on the other side of the
planet will notice your work and try to duplicate it and/or
update it (again, even if it's you). The fewer assumptions
that person has to make, the better.

dn

unread,
Sep 25, 2021, 5:56:30 PM9/25/21
to
On 26/09/2021 10.07, Stefan Ram wrote:
> "Michael F. Stemper" <michael...@gmail.com> writes:
>> fitting hierarchical
>> data into rows/columns just seems wrong
>
> There were hierarchical database management systems like
> IMS by IBM based on that point of view. Today, almost all
> hierarchical data that is stored in databases is stored
> in relational databases. Maybe, the relational model has
> proven superior to the hierarchical data model after all.


Back in the days of mainframes (and when the Flintstones was 'filmed
before a live studio audience') hierarchical DBs were considerably
faster than RDBMS. Because of this, we used to take a daily 'snapshot'
of the transaction DBs (in IMS) and make a 'copy' as DB2 relational DBs,
which were (supposedly) used for MIS (Management Information Systems -
as distinct from TPS (Transaction Processing Systems)).

These days RDBMS are (a lot!) faster - much of which would be better
expressed as: the hardware these days is a lot faster. Therefore an
RDBMS is sufficiently responsive, and we no-longer need to maintain
separate, 'parallel' systems (and multiple mainframes)!

Cue: NoSQL justifications...

Today's best example of an hierarchical DB is probably LDAP. It is most
commonly used within the 'directory' of communications systems, eg
email. Such waters were muddied considerably by MSFT's attempts to 'improve'
international 'standards' and integrate AD with Exchange (so don't go
there!).

There have been some well-engineered systems based on LDAP, eg
organisational/personnel and part/component break-downs.

That said, unless looking at something such as just-mentioned,
overlaying hierarchy onto 3NF and using an RDBMS would be my first
thought - but because of the recursive JOINs, I recommend something more
capable than SQLite.
--
Regards,
=dn

dn

unread,
Sep 25, 2021, 6:02:06 PM9/25/21
to
On 26/09/2021 10.48, 2QdxY4Rz...@potatochowder.com wrote:
> On 2021-09-25 at 15:20:19 -0500,
> "Michael F. Stemper" <michael...@gmail.com> wrote:
>
>> ... For instance, if
>> I modeled a fuel like this:
>>
>> <Fuel name="Montana Sub-Bituminous">
>> <uom>ton</uom>
>> <price>21.96</price>
>> <heat_content>18.2</heat_content>
>> </Fuel>
...


> Disclaimer: I am not a big XML fan, for a number of reasons
> already stated in this thread.
>
> That said, please do include units in elements like heat_content,
> whether or not it's Joules/kilogram/K, and price, even if it is the
> local currency in the only country to which your data applies.
> If there's a standard for your industry, or your company, or on
> some other level, then at least document what it is and that
> you're using it, so that the next person (which may be you a
> year from now) doesn't have to guess.

+1
*always* add unit attributes
--
Regards,
=dn

Eli the Bearded

unread,
Sep 25, 2021, 6:47:10 PM9/25/21
to
In comp.lang.python, Chris Angelico <ros...@gmail.com> wrote:
> Eli the Bearded <*@eli.users.panix.com> wrote:
>> I'd use one of the netpbm formats instead of JPEG. PBM for one bit
>> bitmaps, PGM for one channel (typically grayscale), PPM for three
>> channel RGB, and PAM for anything else (two channel gray plus alpha,
>> CMYK, RGBA, HSV, YCbCr, and more exotic formats). JPEG is tricky to
>> map to CSV since it is a three channel format (YCbCr), where the
>> channels are typically not at the same resolution. Usually Y is full
>> size and the Cb and Cr channels are one quarter size ("4:2:0 chroma
>> subsampling"). The unequal size of the channels does not lend itself
>> to CSV, but I can't say it's impossible.
> Examine prior art, and I truly do mean art, from Matt Parker:
> https://www.youtube.com/watch?v=UBX2QQHlQ_I

His spreadsheet is a PPM file, not a JPEG. You can tell because all of
the cells are the same size.

He also ignores vector graphics when considering digital images. Often
they are rendered in what he calls "spreadsheets" but not always. I have
a Vectrex, for example.

Elijah
------
then there's typewriter art with non-square "pixels"

Chris Angelico

unread,
Sep 25, 2021, 7:14:38 PM9/25/21
to
Ah, I remember playing around with line printer art. We mostly had
Epsons and IBMs that did have some measure of graphical capabilities,
but it was WAY faster to print text, so we sometimes did things the
hacky and elegant way instead.

ChrisA

Paul Rubin

unread,
Sep 25, 2021, 7:29:44 PM9/25/21
to
r...@zedat.fu-berlin.de (Stefan Ram) writes:
> Today, almost all hierarchical data that is stored in databases is
> stored in relational databases. Maybe, the relational model has
> proven superior to the hierarchical data model after all.

Meh, all the major relational databases now have ways to handle JSON
data inside columns, plus there is GraphQL, plus the many NoSQL
databases, etc. There is a mix of all kinds of things.

SQL databases are widespread and robust, so they are also often used for
things to which they aren't all that well suited. Think of all the
awful ORMs out there.

Michael F. Stemper

unread,
Sep 27, 2021, 11:40:56 AM9/27/21
to
On 25/09/2021 16.39, Avi Gross wrote:
> Michael,
>
> I don't care what you choose. Whatever works is fine for an internal use.

Maybe I should have taken the provoking article with a few more grains
of salt. At this point, I'm not seeing any issues that are applicable to
my use case.

> But is the data scheme you share representative of your actual application?
>
> From what I see below, unless the number of "point" variables varies
> (rather than always being exactly four), the application might be handled
> well by any format that handles rectangular data, perhaps even CSV.
>
> What you show is rectangular: anything like a data.frame can contain data
> columns like p1,p2,p3,p4 and a categorical one like IHRcurve_name.
>
> Or do you have a need for more variability such as an undetermined number of
> similar units in ways that might require more flexibility or be more
> efficient done another way?

As far as the number of points per IHR curve, the only requirement
is that there must be at least two. It's hard to define a line segment
with only one. The mock data that I have so far has curves ranging
from two to five points. I didn't notice that the snippet that I
posted had two curves with the same number of breakpoints, which was
misleading.

My former employer's systems had, IIRC, space for seven points per curve
in the database structures. Of all the sizing changes made over a long
career, I don't recall any customer ever requiring more than that. But,
it's cleanest to use python lists (with no inherent sizing limitations)
to represent the IHR (and incremental cost) curves.


> MOST of the discussion I am seeing here seems peripheral to getting you what
> you need for your situation and may require a learning curve to learn to use
> properly. Are you planning on worrying about how to ship your data
> encrypted, for example? Any file format you use for storage can presumably
> be encrypted and send and decrypted if that matters.

This work is intended to look at the feasibility of relaxing some
constraints normally required for the solution of Economic Dispatch.
So all of my data are hypothetical. Once I have stuff up and running,
I'll be making up data for lots of different generators.

Being retired, I don't have access to any proprietary information
about any specific generators, so all of the data is made up out
of my head. I still need a way to get it into my programs, of course.

> So, yes, from an abstract standpoint we can discuss the merits of various
> approaches. If it matters that humans can deal with your data in a file or
> that it be able to be imported into a program like EXCEL, those are
> considerations. But if not, there are quite a few relatively binary formats
> where your program can save a snapshot of the data into a file and read it
> back in next time.

Not needed here. I'm strictly interested in getting the models of
(generic) generating fleets in. Output of significant results will
probably be in CSV, which nicely replicates tabular displays that
I used through most of my career.

> Or, did I miss something and others have already produced the data using
> other tools, in which case you have to read it in at least once/

Well, the "tool" is vi, but this is a good description of what I'm
doing.

--
Michael F. Stemper
The FAQ for rec.arts.sf.written is at
<http://leepers.us/evelyn/faqs/sf-written.htm>
Please read it before posting.

Michael F. Stemper

unread,
Sep 27, 2021, 11:47:26 AM9/27/21
to
Since the units (dimensions) don't matter as long as they're consistent
between heat_content and the IHR value (MBTU and MBTU/MWh or GJ and
GJ/MWh), I was initially going to ignore this suggestion. However, it
seems that if I added attributes for the unit of measure of heat, that
would allow checking that the data provided are indeed consistent.

Thanks for the suggestion.

With respect to currency, I've had customers (back when I had to work
for a living) use dollars, pesetas, Euros, and pounds. In Wood and
Wollenberg[1], the authors use \cancel{R} to represent a generic
currency. But I might even add a currency attribute to the price
element.

> If there's a standard for your industry, or your company, or on
> some other level, then at least document what it is and that
> you're using it, so that the next person (which may be you a
> year from now) doesn't have to guess.

As far as power is concerned, this is utility-level generating fleets,
so it's always going to be MW -- even in the US, where we still use
BTUs for heat.



[1] _Power Generation, Operation, and Control_; Allen J. Wood and Bruce
F. Wollenberg; (c) 1984, John Wiley & Sons.

Chris Angelico

unread,
Sep 27, 2021, 12:49:33 PM9/27/21
to
On Tue, Sep 28, 2021 at 2:30 AM Michael F. Stemper
<michael...@gmail.com> wrote:
> As far as power is concerned, this is utility-level generating fleets,
> so it's always going to be MW -- even in the US, where we still use
> BTUs for heat.
>

It's easy for *you* to know, and therefore assume, that it's always
MW. But someone else coming along will appreciate some sort of
indication that it's MW and not (say) KW or GW.

I've spent a long time decoding other people's file formats, trying to
figure out what unit something is in. "Huh. The date seems to be
stored in........ hours?!?"

ChrisA

Avi Gross

unread,
Sep 27, 2021, 9:01:24 PM9/27/21
to
Michael,

Given your further explanation, reading a varying number of points via
CSV is indeed not workable, albeit someone might just make N columns
(maybe a few more than 7) to handle a hoped-for worst case. Definitely it
makes more sense to read into a list or other data structure.

You keep talking about generators, though. If the generators are outside of
your program, then yes, you need to read in whatever they produce. But if
your data generator is within your own program, that opens up other
possibilities. I am not saying you necessarily would want to use the usual
numpy/pandas modules and have some kind of data.frame. I do know other
languages (like R) where I have used columns that are lists.

My impression is you may not be using your set of data points for any other
purposes except when ready to draw a spline. Again, in some languages this
opens up many possibilities. A fairly trivial one is if you store your
points as something like "1.2:3.86:12:83.2" meaning a character string with
some divider. When ready to use that, it is fairly straightforward to
convert it to a list to use for your purpose.
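
In Python that conversion is a one-liner, e.g.:

points = [float(x) for x in "1.2:3.86:12:83.2".split(":")]
# [1.2, 3.86, 12.0, 83.2]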

Can I just ask if by a generator, you do NOT mean the more typical use of
"generator" as used in python, in which some code runs as needed to
keep generating the next item to work on? Do you mean something that creates
realistic test cases to simulate a real-world scenario? These often can
create everything at once and often based on random numbers. Again, if you
have or build such code, it is not clear it needs to be written to disk and
then read back. You may of course want to save it, perhaps as a log, to show
what your program was working on.




Peter J. Holzer

unread,
Sep 28, 2021, 3:26:02 AM9/28/21
to
On 2021-09-27 21:01:04 -0400, Avi Gross via Python-list wrote:
> You keep talking about generators, though. If the generators are outside of
> your program, then yes, you need to read in whatever they produce.

As I understood it, the "generators" don't generate the data, they are
the subject of the data: Devices that generate electricity by burning
fuel and he's modelling some aspect of their operation. Maybe efficiency
or power output or something like that (I tried to search for "IHR
curve", but couldn't find anything).

dn

unread,
Sep 28, 2021, 3:28:25 AM9/28/21
to
On 25/09/2021 11.26, David L Neil via Python-list wrote:
> On 25/09/2021 11.00, Chris Angelico wrote:
>
>> Invented because there weren't enough markup languages, so we needed another?
>
> Anything You Can Do I Can Do Better
> https://www.youtube.com/watch?v=_UB1YAsPD6U


Article (rather brief) introducing YAML, of possible interest:
https://opensource.com/article/21/9/intro-yaml

--
Regards,
=dn

Michael F. Stemper

unread,
Sep 28, 2021, 11:38:24 AM9/28/21
to
On 27/09/2021 20.01, Avi Gross wrote:
> Michael,
>
> Given your further explanation, indeed reading varying numbers of points in
> using a CSV is not valid, albeit someone might just make N columns (maybe a
> few more than 7) to handle a hopefully worst case. Definitely it makes more
> sense to read in a list or other data structure.
>
> You keep talking about generators, though. If the generators are outside of
> your program, then yes, you need to read in whatever they produce.

My original post (which is as the snows of yesteryear) made explicit the
fact that when I refer to a generator, I'm talking about something made
from tons of iron and copper that is oil-filled and rotates at 1800 rpm.
(In most of the world other than North America, they rotate at 1500 rpm.)

Nothing to do with the similarly-named python construct. Sorry for the
ambiguity.

> But if
> your data generator is within your own program,

The data is created in my mind, and approximates typical physical
characteristics of real generators.

> My impression is you may not be using your set of data points for any other
> purposes except when ready to draw a spline.

Nope, the points give a piecewise-linear curve, and values between two
consecutive points are found by linear interpolation. It's industry
standard practice.
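
In Python, the convention amounts to only a few lines. A sketch, using
the first IHR curve from the data I posted earlier:

def ihr(points, p):
    """Piecewise-linear lookup; points is a sorted list of (P, IHR) pairs."""
    for (p0, y0), (p1, y1) in zip(points, points[1:]):
        if p0 <= p <= p1:
            return y0 + (y1 - y0) * (p - p0) / (p1 - p0)
    raise ValueError("P lies outside the curve")

curve = [(63, 8.513), (105, 8.907), (241, 9.411), (455, 10.202)]
ihr(curve, 63)   # 8.513 (an endpoint)
ihr(curve, 173)  # 9.159 (halfway along the second segment)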


> Can I just ask if by a generator, you do NOT mean the more typical use of
> "generator" as used in python

Nope; I mean something that weighs 500 tons and rotates, producing
electrical energy.

> Do you mean something that creates
> realistic test cases to simulate a real-world scenario?

The thing that creates realistic test cases is my brain.

> These often can
> create everything at once and often based on random numbers.

I have written such, but not in the last thirty years. At that time, I
needed to make up data for fifty or one hundred generators, along with
tie lines and loads.

What I'm working on now only needs a handful of generators at a time;
just enough to test my hypothesis. (Theoretically, I could get by with
two, but that offends my engineering sensibilities.)

> create everything at once and often based on random numbers. Again, if you
> have or build such code, it is not clear it needs to be written to disk and
> then read back.

Well, I could continue to hard-code the data into one of the test
programs, but that would mean that every time that I wanted to look
at a different scenario, I'd need to modify a program. And when I
discover anomalous behavior, I'd need to copy the hard-coded data
into another program.

Having the data in a separate file means that I can provide a function
to read that file and return a list of generators (or fuels) to a
program. Multiple test cases are then just multiple files, all of which
are available to multiple programs.

> You may of course want to save it, perhaps as a log, to show
> what your program was working on.

That's another benefit of having the data in external files.

--
Michael F. Stemper
A preposition is something you should never end a sentence with.

Michael F. Stemper

unread,
Sep 28, 2021, 11:45:37 AM9/28/21
to
On 28/09/2021 02.25, Peter J. Holzer wrote:
> On 2021-09-27 21:01:04 -0400, Avi Gross via Python-list wrote:
>> You keep talking about generators, though. If the generators are outside of
>> your program, then yes, you need to read in whatever they produce.
>
> As I understood it, the "generators" don't generate the data, they are
> the subject of the data: Devices that generate electricity by burning
> fuel and he's modelling some aspect of their operation. Maybe efficiency
> or power output or something like that (I tried to search for "IHR
> curve", but couldn't find anything).

If you expand "IHR curve" to "incremental heat rate curve", you'll get
better results. When power engineers talk, we say the first; when we
publish papers, we write the second.

If you want to see the bigger picture, search on "Economic Dispatch".
In fact, doing so points me to something written by a guy I worked with
back in the 1980s:
<http://www2.econ.iastate.edu/classes/econ458/tesfatsion/EconomicDispatchIntroToOptimization.DKirschen2004.LTEdits.pdf>

Slide 3 even shows a piecewise-linear curve.

Michael F. Stemper

unread,
Sep 28, 2021, 1:54:15 PM9/28/21
to
On 28/09/2021 10.53, Stefan Ram wrote:
> "Michael F. Stemper" <michael...@gmail.com> writes:
>> Well, I could continue to hard-code the data into one of the test
>> programs
>
> One can employ a gradual path from a program with hardcoded
> data to an entity sharable by different programs.
>
> When I am hurried to rush to a working program, I often
> end up with code that contains configuration data spread
> (interspersed) all over the code. For example:

> 1st step: give a name to all the config data:

> 2nd: move all config data to the top of the source code,
> directly after all the import statements:

> 3rd: move all config data to a separate "config.py" module:
>
> import ...
> import config
> ...
>
> ...
> open( config.project_directory + "data.txt" )
> ...
>
>> but that would mean that every time that I wanted to look
>> at a different scenario, I'd need to modify a program.
>
> Now you just have to modify "config.py" - clearly separated
> from the (rest of the) "program".

Well, that doesn't really address what format to store the data
in. I was going to write a module that would read data from an
XML file:

import EDXML
gens = EDXML.GeneratorsFromXML( "gendata1.xml" )
fuels = EDXML.FuelsFromXML( "fueldata3.xml" )

(Of course, I'd really get the file names from command-line arguments.)

Then I read a web page that suggested use of XML was a poor idea,
so I posted here asking for a clarification and alternate suggestions.

One suggestion was that I use YAML, in which case, I'd write:

import EDfromYAML
gens = EDfromYAML.GeneratorsFromYAML( "gendata1.yaml" )
fuels = EDfromYAML.FuelsFromYAML( "fueldata3.yaml" )
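
(A sketch of how one of those functions might look, with the third-party
PyYAML package doing the parsing; mapping the plain dicts onto my
classes is left out:)

import yaml  # third party: pip install pyyaml

def GeneratorsFromYAML(filename):
    """Sketch: returns plain dicts/lists as loaded; no class mapping."""
    with open(filename) as f:
        return yaml.safe_load(f)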

>> And when I discover anomalous behavior, I'd need to copy the
>> hard-coded data into another program.
>
> Now you just have to import "config.py" from the other program.

This sounds like a suggestion that I hard-code the data into a
module. I suppose that I could have half-a-dozen modules with
different data sets and ln them as required:

$ rm GenData.py* FuelData.py*
$ ln gendata1.py GenData.py
$ ln fueldata3.py FuelData.py

It seems to me that a more thorough separation of code and data
might be useful.

--
Michael F. Stemper
The name of the story is "A Sound of Thunder".
It was written by Ray Bradbury. You're welcome.

Avi Gross

unread,
Sep 28, 2021, 2:23:50 PM9/28/21
to
I replied to Michael privately but am intrigued by his words here:

"The thing that creates realistic test cases is my brain."

I consider extensions to my brain to include using a language like Python on
my computer and in particular, to take a model I think of and instantiate
it. Lots of people have shared modules that can be tweaked to do all kinds
of simulations using a skeleton you provide that guides random number usage.
Some will generate lots of those and stare at them and use their brain to
further narrow them down to realistic ones. For example, in designing,
say, a car, characteristics like miles per gallon might randomly range
between 10 and 100 while engine size ranges from this to that, and so on;
it may turn out that large engines don't go well with large numbers for
miles per gallon.

I have worked on projects where a set of guides then created hundreds of
thousands of fairly realistic scenarios using every combination of an
assortment of categorical variables and the rest of the program sliced and
diced the results and did all kinds of statistical calculations and then
generated all kinds of graphs. There was no real data but there was a
generator that was based on the kinds of distributions previously published
in the field that helped guide parameters to be somewhat realistic.

In your case, I understand you will decide how to do it; I just note you
used language with multiple meanings that misled a few of us into
thinking you had a Python function in mind, i.e. one of the several
things Python calls generators, such as one that efficiently yields the
next prime number when asked. Clearly your explanation now shows you
plan on making a
handful of data sets by hand using an editor like vi. Fair enough. No need
to write complex software if your mind is easily able to just make half a
dozen variations in files. And, frankly, not sure why you need XML or much
of anything. It obviously depends on how much you are working with and how
variable it is. For simpler things, you can hard-code your data structure directly
into your program, run an analysis, change the variables to your second
simulation and repeat.

I am afraid that I, like a few others here, assumed a more abstract and much
more complex need to be addressed. Yours may be complex in other parts but
may need nothing much for the part we are talking about. It sounds like you
do want something easier to create while editing.


Avi Gross

unread,
Sep 28, 2021, 2:28:02 PM9/28/21
to
Well, Michael, if you want to go back to the eighties, and people you worked
with, I did my Thesis with a professor who later had an Erdős number of 1!
Too bad I never got around to publishing something with him or I could have
been a 2!

But that work, being so long ago, was not in Python but mainly in PASCAL.

Ah the good old days.


Karsten Hilbert

unread,
Sep 28, 2021, 2:31:14 PM9/28/21
to
On Tue, Sep 28, 2021 at 12:53:49PM -0500, Michael F. Stemper wrote:

> This sounds like a suggestion that I hard-code the data into a
> module. I suppose that I could have half-a-dozen modules with
> different data sets and ln them as required:
>
> $ rm GenData.py* FuelData.py*
> $ ln gendata1.py GenData.py
> $ ln fueldata3.py FuelData.py

vi data.py

generators = {}
generators['name1'] = {'fuel': ..., ...}
generators['name2'] = {...}
...

vi simulation.py

import sys
import data

generator = data.generators[sys.argv[1]]
run_simulation(generator)

or some such ?
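
(Invoked as, say, "python simulation.py name1", the argument picking the
generator out of data.py.)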

Your data "format" is ... Python code.

Michael F. Stemper

unread,
Sep 28, 2021, 2:41:29 PM9/28/21
to
On 28/09/2021 13.27, Avi Gross wrote:
> Well, Michael, if you want to go back to the eighties, and people you worked
> with, I did my Thesis with a professor who later had an Erdős number of 1!
> Too bad I never got around to publishing something with him or I could have
> been a 2!

Lucky you. If a paper that a friend of mine is submitting to various
journals gets accepted by one of them, I'll end up with a 4 or 5 through
him. However, as the months pass, it's looking more like mine will end
up NaN.

--
Michael F. Stemper
Isaiah 58:6-7

Avi Gross

unread,
Sep 28, 2021, 3:22:01 PM9/28/21
to
Not lucky at all, Michael. The problem is he published a number of things
with Paul Erdős a few years after I got my degrees and went to Bell
laboratories. I never met Erdős but he was prolific and had 507 people
publish with him as co-authors. I would have loved to as I also speak
languages he spoke including Hungarian and Math.

Well, time to get back to something remotely about Python. Is there any
concept of a Rossum Number where anyone who worked directly with Guido Van
Rossum is a 1 (or True or truthy) and ...

Hey I just realized my Berners-Lee number might be 1 but it was so long ago
we worked on what Hypertext should look like, ...


Chris Angelico

unread,
Sep 28, 2021, 6:08:30 PM9/28/21
to
On Wed, Sep 29, 2021 at 8:00 AM Stefan Ram <r...@zedat.fu-berlin.de> wrote:
> JSON is a kind of a subset of JavaScript for JavaScript
> programmers. In Python, we can use JSON too, or we can
> use Python itself.
>
> When some external requirement to use a data exchange
> notation like JSON should appear, one can still "translate"
> such Python modules to JSON. This path is not blocked.

JSON exists as a transport mechanism because it is restricted and
can't contain malicious code. A Python equivalent would be
ast.literal_eval - a strict subset of the language but restricted for
safety. For trusted code, yes, straight code can be used.

(And ast.literal_eval, unlike JSON, can handle comments.)
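
For example (the file name is invented; the file holds a single Python
literal):

import ast

# Accepts only literals -- no calls, no imports -- and, unlike JSON,
# the file may contain comments and trailing commas.
with open("gendata1.pydata") as f:
    gens = ast.literal_eval(f.read())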

ChrisA

Greg Ewing

unread,
Sep 28, 2021, 7:21:43 PM9/28/21
to
On 29/09/21 4:37 am, Michael F. Stemper wrote:
> I'm talking about something made
> from tons of iron and copper that is oil-filled and rotates at 1800 rpm.

To avoid confusion, we should rename them "electricity comprehensions".

--
Greg

dn

unread,
Sep 28, 2021, 7:49:36 PM9/28/21
to
Dear Michael,

May I suggest that you are right - and that he is right!
(which is a polite way of saying, also, that both are wrong. Oops!)
(with any and all due apologies)


There are likely cross-purposes here.


I am interpreting various clues, from throughout the thread (from when
the snowflakes were still falling!) that you and I were trained
way-back: to first consider the problem, state the requirements
("hypothesis" in Scientific Method), and work our way to a solution
on-paper. Only when we had a complete 'working solution', did we step up
to the machine (quite possibly a Card Punch, cf a 'computer') and
implement.

Also, that we thought in terms of a clear distinction between
"program[me]" and "data" - and the compiler and link[age]-editor
software technology of the time maintained such.


Whereas 'today', many follow the sequence of "Test-Driven Development"
(er, um, often omitting the initial test) of attempting some idea as
code, reviewing the result, and then "re-factoring" (improving), in a
circular progression - until it not only works, but works well.

This requires similar "stepwise decomposition" to what we learned, but
differs when it comes to code-composition. This approach is more likely
to accumulate a solution 'bottom-up' and component-wise, rather than
creating an entire (and close-to-perfect) solution first and as a whole.


Let's consider the Python REPL. Opening a terminal and starting the
Python interpreter, gives us the opportunity to write short "snippets"
of code and see the results immediately. This is VERY handy for ensuring
that an idea is correct, or to learn exactly how a particular construct
works. Thus, we can 'test' before we write any actual code (and can
copy-paste the successful 'prototype' into our IDE/editor!).

We didn't enjoy such luxury back in the good?bad old days. Young people
today - they just don't know how lucky they are!
(cue other 'grumpy old man' mutterings)


Other points to consider: 'terminals' (cf mainframes), interpreted
languages, and 'immediacy'. These have all brought "opportunities" and
thus "change" to the way developers (can) work and think! (which is why
I outlined what I think of as 'our training' and thus 'our thinking
process' when it comes to software design, above)

Another 'tectonic shift' is that in the old days 'computer time' was
hugely expensive and thus had to be optimised. Whereas these days (even
in retirement) programming-time has become the more expensive component
as computers (or compute-time in cloud-speak) have become cheaper - and
thus we reveal one of THE major attractive attributes of the Python
programming language!


Accordingly, (and now any apologies-due may be due to our colleague -
who was amplifying/making a similar comment to my earlier contribution):
if we decompose the wider-problem into (only) the aspects of collecting
the data, we can assume/estimate/document/refer to that, as a Python
function:

def fetch_operating_parameters():
"""Docstring!"""
pass


(yes, under TDD we would first write a test to call the function and
test its results, but for brevity (hah!) I'll ignore that and stick with
the dev.philosophy point)

Decomposing further, we decide there's a need to pull-in characteristics
of generators, fuel, etc. So, then we can similarly expect to need, and
thus declare, a bunch more functions - with the expectation that they
will probably be called from 'fetch_operating_parameters()'. (because
that was our decomposition hierarchy)


Now, let's return to the program[me] cf data contention. This can also
be slapped-together 'now', and refined/improved 'later'. So, our first
'sub' input function could be:

def fetch_generator_parameters() -> tuple[dict, ...]:
"""Another docstring."""
skunk_creek_1 = {
"IHRcurve_name" : "normal",
"63" : "8.513",
"105" : "8.907",
etc
}
...
return skunk_creek_1, ...


Accordingly, if we 'rinse-and-repeat' for each type of input parameter
and flesh-out the coding of the overall input-construct
(fetch_operating_parameters() ) we will be able to at least start
meaningful work on the ensuing "process" and "output" decompositions of
the whole.

(indeed, reverting to the Input-Process-Output overview, if you prefer
to stick with the way we were taught, there's no issue with starting at
'the far end' by writing an output routine and feeding it 'expected
results' as arguments (which you have first calculated on-paper) to
ensure it works, and continuing to work 'backwards' through 'Process' to
'Input'. Whatever 'works' for you!)


Note that this is a Python-code solution to the original post about
'getting data in there'. It is undeniably 'quick-and-dirty', but it is
working, and working 'now'! Secondly, because the total-system only
'sees' a function, you may come back 'later' and improve the
code-within, eg by implementing a JSON-file interface, one for XML, one
for YAML, or whatever your heart-desires - and that you can have the
entire system up-and-running before you get to the stage of 'how can I
make this [quick-and-dirty code] better?'.

(with an alternate possible-conclusion)

Here's where "skill" starts to 'count'. If sufficient forethought went
into constructing the (sub-)function's "signature", changing the code
within the function will not result in any 'ripple' of
consequent-changes throughout the entire system! Thus, as long as
'whatever you decide to do' (initially, and during any
improvements/changes) returns a tuple of dict-s (my example only), you
can keep (learning, experimenting, and) improving the function without
other-cost!

(further reading: the Single Responsibility Principle)


So, compared with our mouldy-old (initial) training, today's approach
seems bumbling and to waste time on producing a first-attempt which
(must) then require time to be improved (and watching folk work, I
regularly have to 'bite my tongue' rather than say something that might
generate philosophical conflict). However, when combined with TDD,
whereby each sub-component is known to be working before it is
incorporated into any larger component of the (and eventually the whole)
solution, we actually find a practical and workable, alternate-approach
to the business of coding!


Yes, things are not as cut-and-dried as the attempted-description(s)
here. It certainly pays to sit down and think about the problem first -
but 'they' don't keep drilling down to 'our' level of detail, before
making (some, unit) implementation. Indeed, as this thread shows, as
long as we have an idea of the inputs required by Process, we don't need
to detail the processes, we can attack the sub-problem of Input quite
separately. Yes, it is a good idea to enact a 'design' step at each
level of decomposition (rushing past which is too frequently a problem
exhibited - at least by some of my colleagues).

Having (some) working-code also enables learning - and in this case (but
not at all), is a side-benefit. Once some 'learning' or implementation
has been achieved, you may well feel it appropriate to improve the code
- even to trial some other 'new' technique. At which point, another
relevance arises (or should!): do I do it now, or do I make a ToDo note
to come back to it later?

(see also "Technical Debt", but please consider that the fleshing-out
the rest of the solution (and 'learnings' from those steps) may
(eventually) realise just as many, or even more, of the benefits of 'our
approach' of producing a cohesive overall-design first! Possibly even
more than the benefits we intended in 'our' approach(?).


Unfortunately, it is a difficult adjustment to make (as related), and
there are undoubtedly stories of how the 'fools rush in where angels
fear to tread' approach is but a road to disaster and waste. The 'trick'
is to "cherry pick" from today's affordances and modify our
training/habits and experience to take the best advantages from both...


Hope this helps to explain why you may have misunderstood some
contributions 'here', or felt like arguing-back. Taking a step back or a
'wider' view, as has been attempted here, may show the implicit and
intended value of (many) contributions.


I'll leave you with a quote from Donald Knuth in The Art of Computer
Programming, Volume 1: Fundamental Algorithms (which IIRC was first
published in the late-60s/early-70s): “Premature optimization is the
root of all evil.” So, maybe early-coding/prototyping and later
"optimisation" isn't all bad!
--
Regards,
=dn

Michael F. Stemper

unread,
Sep 29, 2021, 9:04:45 AM9/29/21
to
Hah!

--
Michael F. Stemper
If you take cranberries and stew them like applesauce they taste much
more like prunes than rhubarb does.

Avi Gross

unread,
Sep 29, 2021, 8:29:54 PM9/29/21
to
I think that to make electricity comprehend, you need a room temperature
superconductor. The Cooper Pairs took a while to comprehend but now ...

I think, seriously, we have established the problems with guessing that
others are using the language in a way we assume.

So how many comprehensions does Python have?

[] - list comprehension
{} - dictionary OR set comprehension
() - generator expression

Tuples are incomprehensible and I wonder if any other comprehensions might
make sense to add, albeit we may need new symbols.

