
Is anyone happy with csv module?


massimo s.

unread,
Dec 11, 2007, 2:14:29 PM12/11/07
to
Hi,

I'm struggling to use the built-in Python csv module, and I must say
I'm less than satisfied. Apart from being rather poorly documented, I
find it especially cumbersome to use, and also rather limited. What I
dislike most is that it seems to work by *rows* rather than by
*columns*.

So I have some questions:
1) Is there a good tutorial, example collection etc. on the csv module
that I'm missing?
2) Is there an alternative csv read/write module?
3) In case anyone else is as unhappy as me, and no tutorial etc.
enlighten us, and no alternative is present, anyone is interested in
an alternative csv module? I'd like to write one if it is the case.


Thanks,
Massimo

Guilherme Polo

unread,
Dec 11, 2007, 2:24:18 PM12/11/07
to massimo s., pytho...@python.org
2007/12/11, massimo s. <device...@gmail.com>:
> --
> http://mail.python.org/mailman/listinfo/python-list
>

Hello,

Post your actual problem so you can get more accurate help.
As for the questions you asked: google for them, and have a look at Python PEP 305.


--
-- Guilherme H. Polo Goncalves

Bruno Desthuilliers

unread,
Dec 11, 2007, 4:04:01 PM12/11/07
to
massimo s. wrote:

> Hi,
>
> I'm struggling to use the python in-built csv module, and I must say
> I'm less than satisfied. Apart from being rather poorly documented, I
> find it especially cumbersome to use, and also rather limited. What I
> dislike more is that it seems working by *rows* instead than by
> *columns*.

Indeed.

> So I have some questions:
> 1) Is there a good tutorial, example collection etc. on the csv module
> that I'm missing?

Not AFAIK

> 2) Is there an alternative csv read/write module?

Not AFAIK

> 3) In case anyone else is as unhappy as me, and no tutorial etc.
> enlighten us, and no alternative is present, anyone is interested in
> an alternative csv module? I'd like to write one if it is the case.

As far as I'm concerned, and having spent some time trying to write a
CSV module (before I noticed the existing one - duh...), I'm fully
satisfied with the existing one. But if you feel like doing better and
contributing it, by all means do it.

massimo s.

unread,
Dec 11, 2007, 4:31:11 PM12/11/07
to
On 11 Dic, 20:24, "Guilherme Polo" <ggp...@gmail.com> wrote:

>
> Post your actual problem so you can get more accurate help.

Hi Guilherme,
I don't have an actual problem. I'm just trying to use the CSV module,
and I can mostly get it working. I just think its interface is much
less than perfect. I'd like something where I can, say, pass in a whole
dictionary and get a CSV file as output, with each key of the
dictionary being a column in the CSV file. Or a row, if I prefer.
Something like:

dict={'First':[1,2,3,4],'Second':[10,20,30,40],'Third':
[100,200,300,400]}
f=open('test.csv','w')
try:
    csv_write_dict(f,dict,keys='columns',delimiter=',')
finally:
    f.close()

and obtaining:
First,Second,Third
1,10,100
2,20,200
3,30,300
4,40,400
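
(Just to make the idea concrete, here is a rough sketch of how such a
helper might be written on top of the existing csv.writer -- the
csv_write_dict name and the keys= argument are only my wish, nothing
that exists in the module:

import csv

def csv_write_dict(fileobj, data, keys='columns', delimiter=','):
    # data maps column names to equal-length lists of values
    writer = csv.writer(fileobj, delimiter=delimiter)
    names = sorted(data.keys())   # plain dicts are unordered, so sort
    if keys == 'columns':
        writer.writerow(names)    # header row
        writer.writerows(zip(*[data[n] for n in names]))
    else:  # keys == 'rows': one line per dictionary key
        for n in names:
            writer.writerow([n] + list(data[n]))

but of course I'd prefer something like this to be part of the module.)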

Doing the same thing with the current csv module is much more
cumbersome: see this example from http://www.oreillynet.com/onlamp/blog/2007/08/pymotw_csv.html

f = open(sys.argv[1], 'wt')
try:
    fieldnames = ('Title 1', 'Title 2', 'Title 3')
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    headers = {}
    for n in fieldnames:
        headers[n] = n
    writer.writerow(headers)
    for i in range(10):
        writer.writerow({ 'Title 1':i+1,
                          'Title 2':chr(ord('a') + i),
                          'Title 3':'08/%02d/07' % (i+1),
                        })
finally:
    f.close()


Another unrelated quirk I've found is that iterating over the rows read
by a csv reader object seems to erase the rows themselves; I have to
copy them into another list to use them again.
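
(A small illustration of what I mean, assuming a test.csv like the one
above:

import csv
reader = csv.reader(open('test.csv'))
first_pass = list(reader)    # consumes the reader
second_pass = list(reader)   # now empty: the rows seem to have vanished
print len(first_pass), len(second_pass)

so I keep the result of the first pass around and work on that.)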

Probably it's me not being a professional programmer, so I don't
understand why the csv module somehow *has* to be done this way. If
that's so, I'd like to know about it so I can learn something.

> For the questions you placed: google for them, look at python pep 305

I googled (before and after sending this post). I found mentions of
people writing a pure-Python csv module, but I didn't find their
code. As for PEP 305, thanks, but it seems to be just a description of
the current csv module (useful, anyway).

m.

Duncan Booth

unread,
Dec 11, 2007, 4:33:45 PM12/11/07
to
"massimo s." <device...@gmail.com> wrote:

> 1) Is there a good tutorial, example collection etc. on the csv module
> that I'm missing?

Yes, see http://docs.python.org/lib/csv-examples.html

> 2) Is there an alternative csv read/write module?

No but feel free to write your own

> 3) In case anyone else is as unhappy as me, and no tutorial etc.
> enlighten us, and no alternative is present, anyone is interested in
> an alternative csv module? I'd like to write one if it is the case.

I'm happy with the one in the standard library.

John Machin

unread,
Dec 11, 2007, 4:37:10 PM12/11/07
to
On Dec 12, 6:14 am, "massimo s." <deviceran...@gmail.com> wrote:
> Hi,
>
> I'm struggling to use the python in-built csv module, and I must say
> I'm less than satisfied. Apart from being rather poorly documented,

Patches are welcome :-)

> I
> find it especially cumbersome to use,

Can you be more specific? What are you trying to do with it?

> and also rather limited.

What extra facilities do you think there should be?

> What I
> dislike more is that it seems working by *rows* instead than by
> *columns*.

Perhaps you'd better explain what you mean by "working by". Here's my
take on it:

A CSV file is organised such that each line of the file represents a
row, and the nth field in the line relates to the nth column, so it's
natural for any CSV reader/writer to work by rows. *Additional*
functionality e.g. to suck the file into a list of lists that could be
accessed easily by column index is possible, but it is well within the
competence of average Python programmers to write that themselves:

data = [row for row in csv.reader(......)]
...
whatever = data[row_index][col_index]

Accessing the data by columns *instead* of by rows would definitely
not be appreciated by people who are using the csv module to read
millions of lines of data.

>
> So I have some questions:
> 1) Is there a good tutorial, example collection etc. on the csv module
> that I'm missing?

AFAIK, no.

> 2) Is there an alternative csv read/write module?

Is your googler broken?

> 3) In case anyone else is as unhappy as me, and no tutorial etc.
> enlighten us, and no alternative is present, anyone is interested in
> an alternative csv module?

-1

> I'd like to write one if it is the case.

In what language?

Cheers,
John

massimo s.

unread,
Dec 11, 2007, 4:49:27 PM12/11/07
to
On 11 Dic, 22:37, John Machin <sjmac...@lexicon.net> wrote:
> On Dec 12, 6:14 am, "massimo s." <deviceran...@gmail.com> wrote:
>
> > Hi,
>
> > I'm struggling to use the python in-built csv module, and I must say
> > I'm less than satisfied. Apart from being rather poorly documented,
>
> Patches are welcome :-)

Yes, but maybe I was in the wrong. I'm not bold enough to submit patches
to an official Python module without asking.
*I* run into trouble, but maybe it's just me being dense.

>
> > I
> > find it especially cumbersome to use,
>
> Can you be more specific? What are you trying to do with it?

See examples in previous post.

> > and also rather limited.
>
> What extra facilities do you think there should be?

The ability to work by columns as well as by rows, and maybe some
random-access facilities, would be nice. A more user-friendly interface too.

> A CSV file is organised such that each line of the file represents a
> row, and the nth field in the line relates to the nth column, so it's
> natural for any CSV reader/writer to work by rows.

Yes, but it's often natural for a spreadsheet-like thing to have
organized columns of data.
Often I want those columns to be read into lists, or to write lists
into columns. The current csv module doesn't allow this naturally;
writing in particular is a bit painful.

I just wanted to know whether there was something that allows this with
a simple command and that I had missed, or whether there simply wasn't.

> Accessing the data by columns *instead* of by rows would definitely
> not be appreciated by people who are using the csv module to read
> millions of lines of data.

I don't want anything *instead*, I would like *additional*. :)
(Btw: who is using csv to read >10**6 lines of data?)

> > So I have some questions:
> > 1) Is there a good tutorial, example collection etc. on the csv module
> > that I'm missing?
>
> AFAIK, no.

Ok. I found something on Google, but nothing answering my questions.

> > 2) Is there an alternative csv read/write module?
>
> Is your googler broken?

Apparently, yes. I googled but apart from some hint here and there of
someone thinking about writing a pure Python csv module, I found
nothing. I'm usually decent at googling, but maybe my skills are
wearing out.

> > 3) In case anyone else is as unhappy as me, and no tutorial etc.
> > enlighten us, and no alternative is present, anyone is interested in
> > an alternative csv module?
>
> -1

Ok. :)

> > I'd like to write one if it is the case.
>
> In what language?

Python.

m.

John Machin

unread,
Dec 11, 2007, 5:27:21 PM12/11/07
to
On Dec 12, 8:49 am, "massimo s." <deviceran...@gmail.com> wrote:
> On 11 Dic, 22:37, John Machin <sjmac...@lexicon.net> wrote:
>
> > On Dec 12, 6:14 am, "massimo s." <deviceran...@gmail.com> wrote:
>
> > > Hi,
>
> > > I'm struggling to use the python in-built csv module, and I must say
> > > I'm less than satisfied. Apart from being rather poorly documented,
>
> > Patches are welcome :-)
>
> Yes, but maybe I was in the wrong. I'm not so bold to submit patches
> to an official Python module without asking.
> *I* feel troubles, but maybe it's just me being dense.

Quite probably. You did however assert that it was poorly documented.

> > > I
> > > find it especially cumbersome to use,
>
> > Can you be more specific? What are you trying to do with it?
>
> See examples in previous post.

Definitions of cumbersome vary with the definer. IMHO the dictionary
interface in the csv module is an unnecessary gimmick. What you want
to do can be done easily using the bare-bones interface. If you are
doing that sort of thing often, it can be packaged up into a function.

# untested
# Allows for non-uniform column lengths.
maxlen = max(len(value) for value in adict.itervalues())
# The order in which columns appear is
# not easily predictable.
keys = adict.keys()
# ... or supply keys as an argument to the function.
writer.writerow(keys)
for rowx in xrange(maxlen):
    row = []
    for key in keys:
        try:
            value = adict[key][rowx]
        except IndexError:
            value = ''
        row.append(value)
    writer.writerow(row)
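
For instance, packaged up and called like this (equally untested; the
function and file names are invented):

import csv

def write_cols(fileobj, adict):
    writer = csv.writer(fileobj)
    keys = adict.keys()
    writer.writerow(keys)
    maxlen = max(len(values) for values in adict.itervalues())
    for rowx in xrange(maxlen):
        row = []
        for key in keys:
            try:
                row.append(adict[key][rowx])
            except IndexError:
                row.append('')
        writer.writerow(row)

f = open('out.csv', 'wb')
try:
    write_cols(f, {'First': [1, 2, 3, 4], 'Second': [10, 20, 30, 40]})
finally:
    f.close()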


>
> > > and also rather limited.
>
> > What extra facilities do you think there should be?
>
> Ability to work by columns together with rows and maybe some random
> access facilities would be nice. A more user-friendly interface too.

Your prospectus would have to be much less vague and woolly for anyone
to pay you much attention.

>
> > A CSV file is organised such that each line of the file represents a
> > row, and the nth field in the line relates to the nth column, so it's
> > natural for any CSV reader/writer to work by rows.
>
> Yes, but it's natural for a spreadsheet-like thing to have organized
> columns of data, often.
> Often I want those columns to be read into lists, or to write lists
> into columns. The actual csv doesn't allow this naturally. Especially
> writing is a bit painful.
>
> I just wanted to know if there was something allowing this with a
> simple command, that I missed, or if just there wasn't.

If you can't find it in the documentation, treat it as not existing.

>
> > Accessing the data by columns *instead* of by rows would definitely
> > not be appreciated by people who are using the csv module to read
> > millions of lines of data.
>
> I don't want anything *instead*, I would like *additional*. :)

You did say "working by *rows* instead than by *columns*"

> (Btw: who is using csv to read >10**6 lines of data?)

The folk who wrote what became the csv module. Me. Anybody who gets
query results from a big database and doesn't want the overhead of
(say) XML.

>
> > > So I have some questions:
> > > 1) Is there a good tutorial, example collection etc. on the csv module
> > > that I'm missing?
>
> > AFAIK, no.
>
> Ok. I found something on google but nothing answering to my questions.
>
> > > 2) Is there an alternative csv read/write module?
>
> > Is your googler broken?
>
> Apparently, yes. I googled but apart from some hint here and there of
> someone thinking about writing a pure Python csv module, I found
> nothing. I'm usually decent at googling, but maybe my skills are
> wearing out.
>
> > > 3) In case anyone else is as unhappy as me, and no tutorial etc.
> > > enlighten us, and no alternative is present, anyone is interested in
> > > an alternative csv module?
>
> > -1
>
> Ok. :)
>
> > > I'd like to write one if it is the case.
>
> > In what language?
>
> Python.

It wouldn't run fast enough (without psyco).

zug...@gmail.com

unread,
Dec 11, 2007, 5:31:32 PM12/11/07
to
On Dec 12, 10:49 am, "massimo s." <deviceran...@gmail.com> wrote:

> > Accessing the data by columns *instead* of by rows would definitely
> > not be appreciated by people who are using the csv module to read
> > millions of lines of data.
>
> I don't want anything *instead*, I would like *additional*. :)
> (Btw: who is using csv to read >10**6 lines of data?)
>

Well I am for one. Often around 3 million wide rows.
If you can slurp the whole thing into memory then you should be able
to pull out columns pretty easily?

Ben Finney

unread,
Dec 11, 2007, 5:50:28 PM12/11/07
to
"massimo s." <device...@gmail.com> writes:

> Yes, but maybe I was in the wrong. I'm not so bold to submit patches
> to an official Python module without asking.

Be bold. The worst that can happen is that your patch will be
rejected. Any discussion that happens can only improve your
understanding (and that of others).

> Ability to work by columns together with rows and maybe some random
> access facilities would be nice.

The CSV format doesn't organise naturally into columns, only rows.
However, Python's native constructs make it easy to get the column you
want:

csv_rows = [
    ["foo", 1, "spam"],
    ["bar", 2, "eggs"],
    ["baz", 3, "beans"],
    ]
column_1 = [row[1] for row in csv_rows]

> A more user-friendly interface too.

This isn't saying anything about what you want the interface to be
like. What specifically would you change, and what would the result
be?

> Yes, but it's natural for a spreadsheet-like thing to have organized
> columns of data, often.

Perhaps, but that's not relevant. CSV is a serialisation format for
tabular data, and is only "a spreadsheet-like thing" in its heritage.
The CSV data stream is not "spreadsheet-like" at all.

> Often I want those columns to be read into lists, or to write lists
> into columns. The actual csv doesn't allow this naturally.

Python provides those facilities, with easy-to-use syntax for
manipulating them. What more, specifically, do you want the csv module
to do?

> Especially writing is a bit painful.

Again, what specifically would you change, and what would the result
look like?

> (Btw: who is using csv to read >10**6 lines of data?)

Yes, quite often. The csv module provides exactly what I need to turn
a CSV data stream into Python-native data structures. From that point,
Python's own data structures do everything necessary, without needing
specific support in the csv module.

> I googled but apart from some hint here and there of someone
> thinking about writing a pure Python csv module, I found nothing.

Some of your questions need to be elaborated, as above.

--
\ "Reality must take precedence over public relations, for nature |
`\ cannot be fooled." —Richard P. Feynman |
_o__) |
Ben Finney

Gabriel Genellina

unread,
Dec 11, 2007, 6:08:21 PM12/11/07
to pytho...@python.org
On Tue, 11 Dec 2007 18:49:27 -0300, massimo s. <device...@gmail.com>
wrote:

Expanding on a previous example:

data = [row for row in csv.reader(......)]

col3 = [row[3] for row in data]

Pretty simple, isn't it? If you prefer to use field names instead of
indexes, try with a DictReader instead:

data = [row for row in csv.DictReader(......)]
price = [float(row['PRICE']) for row in data]

Note that all of the above (as does any operation involving a whole
*column*) requires reading the whole file into memory. Working by rows,
on the other hand, only requires holding ONE row at a time. For big
files this is significant.
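
For instance, to total one column of a huge file you never need more
than the current row in memory (reusing the PRICE column from above;
the file name is made up):

import csv

total = 0.0
f = open('big.csv', 'rb')
try:
    reader = csv.reader(f)
    header = reader.next()
    price_idx = header.index('PRICE')    # locate the column by its header
    for row in reader:                   # one row in memory at a time
        total += float(row[price_idx])
finally:
    f.close()
print total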

An example of writing data given in columns:

id = [1,2,3,4]
name = ['Moe','Larry','Curly','Shemp']
hair = ['black','red',None,'black']
writer = csv.writer(...)
writer.writerows(itertools.izip(id, name, hair))
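
Spelled out completely (the output file name is made up, and a header
row is added):

import csv, itertools

id = [1, 2, 3, 4]
name = ['Moe', 'Larry', 'Curly', 'Shemp']
hair = ['black', 'red', None, 'black']

f = open('stooges.csv', 'wb')
try:
    writer = csv.writer(f)
    writer.writerow(['id', 'name', 'hair'])    # header row
    writer.writerows(itertools.izip(id, name, hair))
finally:
    f.close()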

I think your problem is not with the csv module, but lack of familiarity
with the Python language itself and how to use it efficiently.

> (Btw: who is using csv to read >10**6 lines of data?)

Me, and many others AFAIK. 1M lines is not so big, btw.

--
Gabriel Genellina

massimo s.

unread,
Dec 11, 2007, 6:35:11 PM12/11/07
to
On 12 Dic, 00:08, "Gabriel Genellina" <gagsl-...@yahoo.com.ar> wrote:
> Note that all the above (as any operation involving a whole *column*)
> requires reading the whole file in memory. Working by rows, on the other
> hand, only requires holding ONE row at a time. For big files this is
> significant.
>
> An example of writing data given in columns:
>
> id = [1,2,3,4]
> name = ['Moe','Larry','Curly','Shemp']
> hair = ['black','red',None,'black']
> writer = csv.writer(...)
> writer.writerows(itertools.izip(id, name, hair))
>
> I think your problem is not with the csv module, but lack of familiarity
> with the Python language itself and how to use it efficiently.

Maybe. As stated at the beginning, I am not a professional programmer.
I am a scientist using Python at work. I've been using it for years and
I love it, but I surely miss many nuances.

For example, I never looked into itertools. I am also not so familiar
with iterators. Itertools seems fantastic, and I'll definitely look
into it; however, I can't help feeling it's a bit strange that someone
wanting quick csv parsing/writing has to dig into such apparently
unrelated stuff.

> > (Btw: who is using csv to read >10**6 lines of data?)
>
> Me, and many others AFAIK. 1M lines is not so big, btw.

It's clear that I am thinking of completely different uses for CSV than
most people in this thread. I use csv to export and import columns of
numerical data to and from spreadsheets. That's why I found 1M lines a
lot. I didn't know csv had other uses; now I see more clearly why the
module is as it is.

Thanks for your tips, I've learned quite a lot.

m.

John Machin

unread,
Dec 11, 2007, 7:19:41 PM12/11/07
to
On Dec 12, 10:35 am, "massimo s." <deviceran...@gmail.com> wrote:
>
> For example, I never ever looked into itertools. I am also not so
> familiar with iterators. Itertools seem fantastic, and I'll definitely
> look into them, however I can't but feel it's a bit strange that
> someone wanting a quick csv parsing/writing has to dig into those
> apparently unrelated stuff.

The idea is (should be) that each module should be a building block
which can be coupled with other modules as necessary to give the
desired result. A similar idea is found in *x command-line tools (e.g.
tr ... | sort ... | uniq ...). Otherwise each module would become a
"Swiss army knife" with its API littered with tiny functions calling
stuff in other modules.

> > > (Btw: who is using csv to read >10**6 lines of data?)
>
> > Me, and many others AFAIK. 1M lines is not so big, btw.
>
> It's clear that I am thinking to completely different usages for CSV
> than what most people in this thread. I use csv to export and import
> numerical data columns to and from spreadsheets.

For that purpose, CSV files are the utter pox and then some. Consider
using xlrd and xlwt (nee pyexcelerator) to read (resp. write) XLS
files directly.

Cheers,
John

Bruno Desthuilliers

unread,
Dec 11, 2007, 7:29:15 PM12/11/07
to
massimo s. wrote:

> On 11 Dic, 20:24, "Guilherme Polo" <ggp...@gmail.com> wrote:
>
>
>>Post your actual problem so you can get more accurate help.
>
>
> Hi Guilhermo,
> I have not an actual problem.

Yes you do - even if you don't realize it yet !-)

> I'm just trying to use the CSV module
> and I mostly can get it working. I just think its interface is much
> less than perfect. I'd like something I can, say, give a whole
> dictionary in input and obtain a CSV file in output, with each key of
> the dictionary being a column in the CSV file. Or a row, if I prefer.
> Something like:
>
> dict={'First':[1,2,3,4],'Second':[10,20,30,40],'Third':
> [100,200,300,400]}

<ot>
you're shadowing the builtin 'dict' type here, which is usually a bad idea
</ot>

> f=open('test.csv','w')
> try:
>     csv_write_dict(f,dict,keys='columns',delimiter=',')
> finally:
>     f.close()
>
> and obtaining:
> First,Second,Third
> 1,10,100
> 2,20,200
> 3,30,300
> 4,40,400

Doing the needed transformation (from a column:rows dict to the required
format) is close to trivial. So you could actually implement it
yourself, monkeypatch the relevant csv class, and submit a patch to the
maintainer of the module.

FWIW, I never had data structured that way to pass to the csv module -
to tell the truth, I think I never had a case where tabular data were
structured by columns.

> Doing the same thing with the current csv module is much more
> cumbersome: see this example from http://www.oreillynet.com/onlamp/blog/2007/08/pymotw_csv.html
>
> f = open(sys.argv[1], 'wt')
> try:
>     fieldnames = ('Title 1', 'Title 2', 'Title 3')
>     writer = csv.DictWriter(f, fieldnames=fieldnames)
>     headers = {}
>     for n in fieldnames:
>         headers[n] = n
>     writer.writerow(headers)

# same as the 4 lines above
writer.writerow(dict((item, item) for item in fieldnames))

>     for i in range(10):
>         writer.writerow({ 'Title 1':i+1,
>                           'Title 2':chr(ord('a') + i),
>                           'Title 3':'08/%02d/07' % (i+1),
>                         })

This one looks so totally unrealistic to me - I mean, wrt/ to real-life
use cases - that I won't even propose a rewrite.

> finally:
>     f.close()

A bit of a WTF, indeed. But most of the problem is with this example
code, not with the csv module (apologies to whoever wrote this snippet).

FWIW, here's a function that does what you want, at least for your first use case:

def csv_write_cols(writer, data):
    keys = data.keys()
    writer.writerow(dict(zip(keys,keys)))
    for row in zip(*data.values()):
        writer.writerow(dict(zip(keys, row)))
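
For instance (file name made up), using a DictWriter whose fieldnames
are the dict's keys:

import csv

data = {'First': [1, 2, 3, 4],
        'Second': [10, 20, 30, 40],
        'Third': [100, 200, 300, 400]}
f = open('test.csv', 'wb')
try:
    writer = csv.DictWriter(f, fieldnames=data.keys())
    csv_write_cols(writer, data)
finally:
    f.close()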

Now you do what you want, but as far as I'm concerned, I wouldn't start
a total rewrite of an otherwise working (and non-trivial) module just
for a trivial four (4) line function.

Also, have you considered that your columns may as well be rows, ie:

First, 1, 2, 3, 4
Second, 10, 20, 30, 40
Third, 100, 200, 300, 400
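
Writing that layout with a plain csv.writer is a one-liner per key
(a sketch, assuming the same column:rows dict as above):

for key, values in data.items():
    writer.writerow([key] + list(values))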

>
> Another unrelated quirk I've found is that iterating the rows read by
> a csv reader object seems to erase the rows themselves; I have to copy
> them in another list to use them.

It's not a "quirk", Sir, it's a feature !-)

The csv reader object - like file objects and a couple of others - is an
iterator. In this case, it means the csv reader is smart enough not to
read the whole file into memory - which is not necessarily what you
want, especially for huge files - but to iterate over lines as you ask
for them.

Note that if you need the whole thing in memory, "copying" the rows into
a list is a no-brainer:
rows = list(reader)


> Probably it's me not being a professional programmer,

<ot>
Not sure the professional status is key here - I mean, it just means
you're getting paid for it, but it says nothing about your competence.
</ot>

> so I don't
> understand that somehow the csv module *has* to be done this way. If
> it's so, I'd like to know about it so I can learn something.

As for why it's sometimes better not to read a whole file into memory
at once, try it with a multi-gigabyte file and watch your system crawl
to a halt. wrt/ csv being 'row-oriented', the fact is that 1/ it's by
far the most common use case for tabular data and 2/ it's a simple
mapping from lines to rows (and back) - which is important wrt/ perfs
and maintainability. Try to read a csv file "by columns", and you'll
find that you either need to read it all into memory, parse it line by
line, and then turn the lines into columns (the inverse operation of my
small function above), or rearrange your data the way I suggested
above. And let's not talk about writing...
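
FWIW, a rough sketch of that inverse operation (reading "by columns"
into a dict of lists) - note that it has to hold everything in memory:

import csv

def csv_read_cols(fileobj):
    reader = csv.reader(fileobj)
    header = reader.next()
    columns = dict((name, []) for name in header)
    for row in reader:
        for name, value in zip(header, row):
            columns[name].append(value)
    return columns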

Now I don't mean there's no room for improvement in the csv module -
there almost always is - but given the usefulness of this module in a
programmer's daily life, it would probably have been superseded by
something better if it wasn't at least perceived as good enough by its
users.

HTH

Bruno Desthuilliers

unread,
Dec 11, 2007, 7:34:03 PM12/11/07
to
massimo s. wrote:
(snip)

> (Btw: who is using csv to read >10**6 lines of data?)
>
Count me in. Never had to work on an RDBMS dump?

Istvan Albert

unread,
Dec 11, 2007, 9:22:33 PM12/11/07
to
On Dec 11, 2:14 pm, "massimo s." <deviceran...@gmail.com> wrote:

> dislike more is that it seems working by *rows* instead than by
> *columns*.

you can easily transpose the data to get your columns, for a data file
that looks like this:

---- data.txt ----
A,B,C
1,2,3
10,20,30
100,200,300

do the following:

--------------------
import csv
reader = csv.reader( file('data.txt', 'U') )
rows = list(reader)

print rows

cols = zip(*rows)

print cols[0]
print cols[1]
print cols[2]

this will print:

---------- Python ----------

[['A', 'B', 'C'], ['1', '2', '3'], ['10', '20', '30'], ['100', '200',
'300']]
('A', '1', '10', '100')
('B', '2', '20', '200')
('C', '3', '30', '300')

John Machin

unread,
Dec 11, 2007, 9:22:45 PM12/11/07
to massimo s., pytho...@python.org
massimo s. wrote:
> If your line of reasoning is "well, but you can write a function here
> and there", well, why bothering writing a csv parser at all? You can
> parse it yourself with a couple of Python lines! :)
>
>
I would be *very* interested to see those couple of lines.
Here is some test data for you:
file_contents = '"Tom, Dick, and Harry","He said: ""Hello!"""\r\n"'
expected output: 1 row, with two fields:
(1) 'Tom, Dick, and Harry'
(2) 'He said: "Hello!"'

Cliff Wells

unread,
Dec 11, 2007, 9:27:55 PM12/11/07
to Python List
On Wed, 2007-12-12 at 09:50 +1100, Ben Finney wrote:
> "massimo s." <device...@gmail.com> writes:

> > Yes, but it's natural for a spreadsheet-like thing to have organized
> > columns of data, often.
>
> Perhaps, but that's not relevant. CSV is a serialisation format for
> tabular data, and is only "a spreadsheet-like thing" in its heritage.
> The CSV data stream is not "spreadsheet-like" at all.

To add some weight to this point, if CSV *weren't* considered a
serialization format (or protocol) this module most likely would have
never been accepted into the standard library in the first place. There
was some debate when the PEP was submitted over whether CSV was to be
considered a file format (like an .XLS file) or a serialization protocol
(like XML or HTTP).
Fortunately the latter was agreed upon, and so the module was deemed
appropriate for inclusion in the standard library.

Whether or not anyone agrees with this point of view is now mostly
irrelevant, since *by definition* the Python csv module intends to
implement a protocol. Other implementations remain free to vary in
their definition of CSV.

Regards,
Cliff

Marc 'BlackJack' Rintsch

unread,
Dec 12, 2007, 3:23:45 AM12/12/07
to
On Tue, 11 Dec 2007 20:08:21 -0300, Gabriel Genellina wrote:

> data = [row for row in csv.reader(......)]

A bit shorter::

data = list(csv.reader(......))

Ciao,
Marc 'BlackJack' Rintsch

Message has been deleted

massimo s.

unread,
Dec 12, 2007, 5:22:25 AM12/12/07
to
Thanks to everyone in this thread. As always on this newsgroup, I
learned very much.

I'm also quite embarrassed by my ignorance. The only excuse I have is
that I learned programming and Python by myself, with no formal (or
informal) education in programming. So I am often clumsy.

On Dec 12, 1:29 am, Bruno Desthuilliers


<bdesth.quelquech...@free.quelquepart.fr> wrote:
> > I'm just trying to use the CSV module
> > and I mostly can get it working. I just think its interface is much
> > less than perfect. I'd like something I can, say, give a whole
> > dictionary in input and obtain a CSV file in output, with each key of
> > the dictionary being a column in the CSV file. Or a row, if I prefer.
> > Something like:
>
> > dict={'First':[1,2,3,4],'Second':[10,20,30,40],'Third':
> > [100,200,300,400]}
>
> <ot>
> you're shadowing the builtin 'dict' type here, which is usually a bad idea
> </ot>

Yes, this I know, I just overlooked it when improvising the example.

> > f=open('test.csv','w')
> > try:
> >     csv_write_dict(f,dict,keys='columns',delimiter=',')
> > finally:
> >     f.close()
>
> > and obtaining:
> > First,Second,Third
> > 1,10,100
> > 2,20,200
> > 3,30,300
> > 4,40,400
>
> Doing the needed transformation (from a column:rows dict to the required
> format) is close to trivial. So you could actually implement it
> yourself, monkeypatch the relevant csv class, and submit a patch to the
> maintainer of the module.
>
> FWIW, I never had data structured that way to pass to the csv module -
> to be true, I think I never had a case where tabular data were
> structured by columns.

FWIW, I never had data structured by row. At most, I had data
structured by *both* row and column.
Vive la différence. :)

> > Doing the same thing with the current csv module is much more

> > cumbersome: see this example fromhttp://www.oreillynet.com/onlamp/blog/2007/08/pymotw_csv.html


>
> > f = open(sys.argv[1], 'wt')
> > try:
> >     fieldnames = ('Title 1', 'Title 2', 'Title 3')
> >     writer = csv.DictWriter(f, fieldnames=fieldnames)
> >     headers = {}
> >     for n in fieldnames:
> >         headers[n] = n
> >     writer.writerow(headers)
>
> # same as the 4 lines above
> writer.writerow(dict((item, item) for item in fieldnames))
>
> >     for i in range(10):
> >         writer.writerow({ 'Title 1':i+1,
> >                           'Title 2':chr(ord('a') + i),
> >                           'Title 3':'08/%02d/07' % (i+1),
> >                         })
>
> This one looks so totally unrealistic to me - I mean, wrt/ to real-life
> use cases - that I won't even propose a rewrite.

I can frankly think of a lot of cases where this kind of pattern makes
a lot of sense, but in this case it was just for the sake of the example.

> > finally:
> >     f.close()
>
> A bit of a WTF, indeed. But most of the problem is with this example
> code, not with the csv module (apologies to whoever wrote this snippet).

Thank you. Let me say it was the *best* tutorial I found online - much
better than the official docs, IMHO. Maybe that's the reason I felt
dizzy when trying to use csv.

> FWIW, here's a function what you want, at least for your first use case:
>
> def csv_write_cols(writer, data):
>     keys = data.keys()
>     writer.writerow(dict(zip(keys,keys)))
>     for row in zip(*data.values()):
>         writer.writerow(dict(zip(keys, row)))

Thanks!

> Now you do what you want, but as far as I'm concerned, I wouldn't start
> a total rewrite of an otherwise working (and non-trivial) module just
> for a trivial four (4) lines function.

I fully agree. I would like to add a few other trivial functions, but
this is a *clear* example of csv writer usage, which I could not find
elsewhere.

> Also, have you considered that your columns may as well be rows, ie:
>
> First, 1, 2, 3, 4
> Second, 10, 20, 30, 40
> Third, 100, 200, 300, 400

Doesn't play well with my data for a number of reasons. For example,
the column vs. row limits in spreadsheets.

> > Another unrelated quirk I've found is that iterating the rows read by
> > a csv reader object seems to erase the rows themselves; I have to copy
> > them in another list to use them.
>
> It's not a "quirk", Sir, it's a feature !-)
>
> The csv reader object - like file objects and a couple others - are
> iterators. In this case, it means the csv reader is smart enough to not
> read the whole file into memory - which is not necessarily what you
> want, specially for huge files - but iterating over lines as long as you
> ask for them.
>
> Note that if you need the whole thing in memory, "copying" the rows in a
> list is a no-brainer:
> rows = list(reader)

I know. I just thought it odd that it was undocumented. But it's
self-evident now that I missed how iterators work.
I'll look into the issue.

> > Probably it's me not being a professional programmer,
>
> <ot>
> Not sure the professional status is key here - I mean, it just mean
> you're getting paid for it, but says nothing about your competences.
> </ot>

In the sense that I have no formal training in it, and the like.

> > so I don't
> > understand that somehow the csv module *has* to be done this way. If
> > it's so, I'd like to know about it so I can learn something.
>
> As about why it's sometimes better to not read a whole file into memory
> at once, try with multi-gigabytes and watch your system crawl to a halt.
> wrt/ csv being 'row-oriented', fact is that 1/ it's by far the most
> common use case for tabular data and 2/ it's a simple mapping from lines
> to rows (and back) - which is important wrt/ perfs and maintainability.
> Try to read a csv file "by columns", and you'll find out that you'll
> either need to read it all in memory, parse it line by line, then turn
> lines into columns (the inverse operation of my small function above),
> or to rearrange your data the way I suggested above. And let's not talk
> about writing...

Yes, I understand. I just didn't think about such usage cases. :)

Thanks to everyone,
Massimo

Neil Cerutti

unread,
Dec 12, 2007, 8:58:50 AM12/12/07
to
On 2007-12-11, massimo s. <device...@gmail.com> wrote:
> Hi,
>
> I'm struggling to use the python in-built csv module, and I
> must say I'm less than satisfied. Apart from being rather
> poorly documented, I find it especially cumbersome to use, and
> also rather limited. What I dislike more is that it seems
> working by *rows* instead than by *columns*.

It is very *thoroughly* documented, which is a style that won't
suit every purpose.

> So I have some questions:
> 1) Is there a good tutorial, example collection etc. on the csv
> module that I'm missing?

Just skip to 9.1.5 Examples, and you'll be on your way.

> 2) Is there an alternative csv read/write module?

There are other ways to tackle the data, for example, using a
CSV ODBC adaptor. That may or may not seem like an easy solution
to you. It certainly doesn't suit me.

> 3) In case anyone else is as unhappy as me, and no tutorial
> etc. enlighten us, and no alternative is present, anyone is
> interested in an alternative csv module? I'd like to write one
> if it is the case.

I was intimidated by it at first, implemented my own reader
(mostly as a fun parsing exercise), used that for a while, and
then threw it out.

I advise you to spend time staring at the examples, and use the
simplest example that suits your needs. Also search the archives
of this group for examples.

--
Neil Cerutti
The pastor will preach his farewell message, after which the choir will sing,
"Break Forth Into Joy." --Church Bulletin Blooper

Neil Cerutti

unread,
Dec 12, 2007, 8:58:50 AM12/12/07
to
On 2007-12-12, John Machin <sjma...@lexicon.net> wrote:
>> It's clear that I am thinking to completely different usages
>> for CSV than what most people in this thread. I use csv to
>> export and import numerical data columns to and from
>> spreadsheets.
>
> For that purpose, CSV files are the utter pox and then some.
> Consider using xlrd and xlwt (nee pyexcelerator) to read (resp.
> write) XLS files directly.

I can vouch for that advice. I was exporting .xls files to csv
text files for over a year before I tried the xlrd solution--the
whole process is less cumbersome now, though it was bewildering
at first working with Excel in Python. Actually, surprises still
crop up now and then, mostly to do with cell types. The advantage
of working with csv was that everything was a string.

--
Neil Cerutti
The world is more like it is now than it ever has been before. --Dwight
Eisenhower

massimo s.

unread,
Dec 12, 2007, 10:04:09 AM12/12/07
to
On Dec 12, 2:58 pm, Neil Cerutti <horp...@yahoo.com> wrote:

> On 2007-12-11, massimo s. <deviceran...@gmail.com> wrote:
>
> > Hi,
>
> > I'm struggling to use the python in-built csv module, and I
> > must say I'm less than satisfied. Apart from being rather
> > poorly documented, I find it especially cumbersome to use, and
> > also rather limited. What I dislike more is that it seems
> > working by *rows* instead than by *columns*.
>
> It is very *thoroughly* documented, which is a style that won't
> suit every purpose.
> > So I have some questions:
> > 1) Is there a good tutorial, example collection etc. on the csv
> > module that I'm missing?
>
> Just skip to 9.1.5 Examples, and you'll be on your way.

If by "thoroughly" you mean "it describes technically what it is and
does, but not how to really do things", then yes, it is thoroughly
documented.
The examples section is a joke. It gives good examples for the
simplest use cases (good), then it almost immediately digs into
details like the Unicode stuff, leaving aside the rest. DictWriter and
DictReader are absent from the examples. And so is the Sniffer.

And, as a side note, why put something useful like the Unicode
encoders/decoders in the examples section instead of including them
directly in the library?

I don't want to be mean to the author of csv and its docs. I now
understand there are excellent reasons for csv to be done the way it
is, and it's only my fault that I didn't see that before. I also know
first hand that documenting code is hard and boring work. Kudos to
anyone doing it, but in the Examples section there is surely room for
improvement. It's probably OK for people doing things row-by-row who
already know their way in and out of all that, but if this thread
teaches us anything, it is that the same module can be used for vastly
different purposes.

I will try to submit a patch to the documentation based on examples
coming from here and what I will learn by digging into csv.

> > 3) In case anyone else is as unhappy as me, and no tutorial
> > etc. enlighten us, and no alternative is present, anyone is
> > interested in an alternative csv module? I'd like to write one
> > if it is the case.
>
> I was intimidated by it at first, implemented my own reader
> (mostly as a fun parsing exercise), used that for a while, and
> then threw it out.
>
> I advise you to spend time staring at the examples, and use the
> simplest example the suits your needs. Also search this archives
> of this group for examples.

OK, thanks!

As for the people advising xlrd/xlwt: thanks for the useful tip, I
didn't know about it and it looks cool, but in this case there's no way
I'm throwing another dependency at the poor users of my software. The
csv module was good because it was built in.

m.

Marco Mariani

unread,
Dec 12, 2007, 10:37:03 AM12/12/07
to
John Machin wrote:

> For that purpose, CSV files are the utter pox and then some. Consider
> using xlrd and xlwt (nee pyexcelerator) to read (resp. write) XLS
> files directly.

xlwt is unreleased (though quite stable, they say) at the moment, so the
links are:

easy_install xlrd
svn co https://secure.simplistix.co.uk/svn/xlwt/trunk

Marco Mariani

unread,
Dec 12, 2007, 10:43:43 AM12/12/07
to
massimo s. wrote:

> As for people advicing xlrd/xlrwt: thanks for the useful tip, I didn't
> know about it and looks cool, but in this case no way I'm throwing
> another dependency to the poor users of my software. Csv module was
> good because was built-in.

The trouble with sending CSV files to Excel (or OpenOffice, or Gnumeric,
or whatever) is that there is no way to specify the data types.

Unless you're using a predefined worksheet (and refreshing its data from
the CSV file) the spreadsheet program _won't_ get the data types right.
Strings will become numbers (and stripped of precious leading zeroes),
dates may (or may not) become floats or something entirely different.
Fixed point decimals might grow eyes and bite you.


Message has been deleted

Neil Cerutti

unread,
Dec 12, 2007, 11:17:59 AM12/12/07
to
On 2007-12-12, je.s...@hehxduhmp.org <je.s...@hehxduhmp.org> wrote:

> John Machin <sjma...@lexicon.net> wrote:
>> For that purpose, CSV files are the utter pox and then some.
>> Consider using xlrd and xlwt (nee pyexcelerator) to read
>> (resp. write) XLS files directly.
>
> FWIW, CSV is a much more generic format for spreadsheets than
> XLS. For example, I deal almost exclusively in CSV files for
> similar situations as the OP because I also work with software
> that can't (or in some cases "can't easily") deal with XLS
> files. CSV files can be read in by basically anything.

When I have a choice, I use simple tab-delimited text files. The
usually irrelevant limitation is the inability to embed tabs or
newlines in fields. The relevant advantage is the simplicity.
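
(FWIW, the csv module reads those too, if you ever need it to - a
minimal sketch, with a made-up file name:

import csv
for row in csv.reader(open('data.txt', 'rb'), delimiter='\t'):
    print row

though plain str.split('\t') usually does the job as well.)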

--
Neil Cerutti
The recording I listened to had Alfred Brendel doing the dirty work of
performing this sonata (Liszt B minor) --Music Lit Essay

J. Clifford Dyer

unread,
Dec 12, 2007, 11:49:20 AM12/12/07
to je.s...@hehxduhmp.org
On Wed, Dec 12, 2007 at 10:08:38AM -0600, je.s...@hehxduhmp.org wrote regarding Re: Is anyone happy with csv module?:
>
> FWIW, CSV is a much more generic format for spreadsheets than XLS.
> For example, I deal almost exclusively in CSV files for similar situations
> as the OP because I also work with software that can't (or in some
> cases "can't easily") deal with XLS files. CSV files can be read in
> by basically anything.

Compatibility-wise, yes, CSV is much more generic. But spreadsheet != ledger paper. Spreadsheets incorporate many functions (as simple as summation!) that CSV cannot handle. Thus functionality-wise CSV is in fact a very specific subset of spreadsheets, and is not generic in the slightest.

But the software you are dealing with probably doesn't actually need spreadsheets. It just needs digital ledgers.

Cheers,
Cliff

Message has been deleted

Shane Geiger

unread,
Dec 12, 2007, 12:01:28 PM12/12/07
to Neil Cerutti, pytho...@python.org
Neil Cerutti wrote:
> On 2007-12-12, je.s...@hehxduhmp.org <je.s...@hehxduhmp.org> wrote:
>
>> John Machin <sjma...@lexicon.net> wrote:
>>
>>> For that purpose, CSV files are the utter pox and then some.
>>> Consider using xlrd and xlwt (nee pyexcelerator) to read
>>> (resp. write) XLS files directly.
>>>
>> FWIW, CSV is a much more generic format for spreadsheets than
>> XLS. For example, I deal almost exclusively in CSV files for
>> simialr situations as the OP because I also work with software
>> that can't (or in some cases "can't easily") deal with XLS
>> files. CSV files can be read in by basically anything.
>>
>
> When I have a choice, I use simple tab-delimited text files. The
> usually irrelevant limitation is the inability to embed tabs or
> newlines in fields. The relevant advantage is the simplicity.
>

That is very unnecessary. You can have your tabs and not eat them, too:

#!/usr/bin/python
"""
EXAMPLE USAGE OF PYTHON'S CSV.DICTREADER FOR PEOPLE NEW TO PYTHON AND/OR
CSV.DICTREADER

Python - Batteries Included(tm)

This file will demonstrate that when you use the python CSV module, you
don't have to remove the newline characters, as between "acorp_ Ac" and
"orp Foundation" and other parts of the data below.

It also demonstrates python's csv.DictReader, which allows you to read a
CSV record into a dictionary.

This will also demonstrate the use of lists ([]s) and dicts ({}s).

If this doesn't whet your appetite for getting ahold of a powertool
instead of sed for managing CSV data, I don't know what will.

"""

#### FIRST: CREATE A TEMPORARY CSV FILE FOR DEMONSTRATION PURPOSES
mycsvdata = """
"Category","0","acorp_ Ac
orp Foundation","","","Acorp Co","(480) 905-1906","877-462-5267 toll
free","800-367-2228","800-367-2228","in...@acorp.or
g","7895 East Drive","Scottsdale","AZ","85260-6916","","","","","","Pres
Fred & Linda ","0","0","1","3","4","1"

"Category","0","acorp_ Bob and Margaret Schwartz","","","","317-321-6030
her","317-352-0844","","","","321 North Butler Ave.","In
dianapolis","IN","46219","","","","","","Refrigeration
man","0","1","2","3","4","0"

"Category","0","acorp_ Elschlager,
Bob","","","","","702-248-4556","","","Tro...@aol.com","7950 W.
Flamingo Rd. #2032","Las Vega
s","NV","89117","","","","","","guy I met","0","1","2","3","4","1"

"""

## NOTE: IF YOU HAVE A RECORD SEPARATOR WITHIN QUOTES, IT WILL NOT BE
## TREATED LIKE A RECORD SEPARATOR!
## Beef|"P|otatos"|Dinner Roll|Ice Cream


import os, sys

def writefile(filename, filedata, perms=750):
    f = open(filename, "w")
    f.write(filedata)
    os.system("chmod "+str(perms)+" "+filename)
    f.close()

file2write = 'mycsvdata.txt'
writefile(file2write,mycsvdata)

# Check that the file exists
if not os.path.exists(file2write):
    print "ERROR: unable to write file:", file2write, " Exiting now!"
    sys.exit()

# ...so everything down to this point merely creates the
# temporary CSV file for the code to test (below).

#### SECOND: READ IN THE CSV FILE TO CREATE A LIST OF PYTHON DICTIONARIES,
# WHERE EACH DICTIONARY CONTAINS THE DATA FROM ONE ROW. THE KEYS OF THE
# DICTIONARY WILL BE THE FIELD NAMES AND THE VALUES OF THE DICTIONARY WILL
# BE THE VALUES CONTAINED WITHIN THE CSV FILE'S ROW.

import csv

### NOTE: Modify this list to match the fields of the CSV file.
header_flds = ['cat','num','name','blank1','blank2','company','phone1','phone2',
               'phone3','phone4','email','addr1','city','state','zip','blank3',
               'blank4','blank5','blank6','blank7','title','misc1','misc2','misc3',
               'mics4','misc5','misc6']

file2open = 'mycsvdata.txt'

reader = csv.DictReader(open(file2open), [], delimiter=",")
data = []
while True:
    try:
        # Read next "header" line (if there isn't one then exit the loop)
        reader.fieldnames = header_flds
        rdr = reader.next()
        data.append(rdr)
    except StopIteration:
        break


def splitjoin(x):
    """ This removes any nasty \n that might exist in a field
    (of course, if you want that in the field, don't use this)
    """
    return ''.join((x).split('\n'))


#### THIRD: ITERATE OVER THE LIST OF DICTS (IN WHICH EACH DICT IS A
# ROW/RECORD FROM THE CSV FILE)

# Example of accessing all the dictionaries once they are in the list 'data':
import string
for rec in data:            # for each CSV record
    itmz = rec.items()      # get the items from the dictionary
    print "- = " * 20
    for key,val in itmz:
        print key.upper()+": \t\t", splitjoin(val)
        # Note: splitjoin() allows a record to contain fields
        # with newline characters


--
Shane Geiger
IT Director
National Council on Economic Education
sge...@ncee.net | 402-438-8958 | http://www.ncee.net

Leading the Campaign for Economic and Financial Literacy

Neil Cerutti

unread,
Dec 12, 2007, 1:36:28 PM12/12/07
to
On 2007-12-12, Shane Geiger <sge...@ncee.net> wrote:
> Neil Cerutti wrote:
>> On 2007-12-12, je.s...@hehxduhmp.org <je.s...@hehxduhmp.org> wrote:
>>
>>> John Machin <sjma...@lexicon.net> wrote:
>>>
>>>> For that purpose, CSV files are the utter pox and then some.
>>>> Consider using xlrd and xlwt (nee pyexcelerator) to read
>>>> (resp. write) XLS files directly.
>>>>
>>> FWIW, CSV is a much more generic format for spreadsheets than
>>> XLS. For example, I deal almost exclusively in CSV files for
>>> simialr situations as the OP because I also work with software
>>> that can't (or in some cases "can't easily") deal with XLS
>>> files. CSV files can be read in by basically anything.
>>>
>>
>> When I have a choice, I use simple tab-delimited text files. The
>> usually irrelevant limitation is the inability to embed tabs or
>> newlines in fields. The relevant advantage is the simplicity.
>>
>
> That is very unnecessary. You can have your tabs and not eat them, too:
>
>
>
> #!/usr/bin/python
> """
> EXAMPLE USAGE OF PYTHON'S CSV.DICTREADER FOR PEOPLE NEW TO
> PYTHON AND/OR CSV.DICTREADER

I gladly use the csv module to generate valid csv data for others
or for myself. But I'm no longer comfortable using just anyone's
csv export feature. A commercial product I use every day creates
invalid csv files. How many more products have tried to "roll
their own" and botched it horribly? I wish more apps embedded
Python. ;-)

Thanks for posting the example code.

--
Neil Cerutti
You've got to take the sour with the bitter. --Samuel Goldwyn

J. Clifford Dyer

unread,
Dec 12, 2007, 2:51:43 PM12/12/07
to pytho...@python.org
On Wed, Dec 12, 2007 at 11:02:04AM -0600, je.s...@hehxduhmp.org wrote regarding Re: Is anyone happy with csv module?:

>
> J. Clifford Dyer <j...@sdf.lonestar.org> wrote:
> > But the software you are dealing with probably doesn't actually
> > need spreadsheets. It just needs digital ledgers.
>
> I saw someone else in this thread note that they considered CSV to
> be a serialization method and not a file format, which I thought was a
> brilliant way of looking at things.
>
> OTOH, from a practical perspective, most of the people I end up interacting
> with say "spreadsheet" to mean exactly what you're describing as a
> "ledger paper" ... sure they might want to do operations on those cells
> in the data, but it isn't key what actual format the data is presented
> to them in (XLS, CSV, tab delimited, etc). Oddly enough though, they
> almost always say "excel spreadsheet" even though that's not what they
> need (rather, its just a side effect of them not realizing there actually
> *is* other software that handles these things)

I know, I know. The last place I worked, we had timesheets that were excel spreadsheets, but you actually had to total up your own hours. I added to my job description the role of teaching people how to make the spreadsheet do the work for you.

Cheers,
Cliff

John Machin

unread,
Dec 12, 2007, 3:34:05 PM12/12/07
to
On Dec 13, 12:58 am, Neil Cerutti <horp...@yahoo.com> wrote:

> On 2007-12-12, John Machin <sjmac...@lexicon.net> wrote:
>
> >> It's clear that I am thinking to completely different usages
> >> for CSV than what most people in this thread. I use csv to
> >> export and import numerical data columns to and from
> >> spreadsheets.
>
> > For that purpose, CSV files are the utter pox and then some.
> > Consider using xlrd and xlwt (nee pyexcelerator) to read (resp.
> > write) XLS files directly.
>
> I can vouch for that advice. I was exporting .xls files to csv
> text files for over a year before I tried the xlrd solution--the
> whole process is less cumbersome now, though it was bewildering
> at first working with Excel in Python. Actually, surprises still
> crop up now and then, mostly to do with cell types.

Hi Neil, I'd be interested in hearing from you what caused the initial
bewilderment with xlrd, and could it have been reduced by better
documentation? What kinds of surprises?

> The advantage
> of working with csv was that everything was a string.

It depends on your point of view. I'd regard that as a
DISadvantage :-) With xlrd, if you have no expectation about the type
of data in a cell, but need/want to know, xlrd will tell you. If you
do have an expectation, you can check if actual == expected.

Here's an example. Create a tiny csv file with dates in a format
that's NOT appropriate to your locale (e.g. if you are in the USA,
like the ddmmyyyy_dates.csv below). Open it with Excel by double-
clicking on the name in Windows Explorer. Make the column twice its
initial width. You'll notice some of the data (about 60% in a large
dataset with approx. uniform distribution e.g. birth-dates) is LEFT-
justified (text) and the remainder is RIGHT-justified (date). If you
were given the xls file in that state, using xlrd the problem could be
detected and worked around. Alternatively, go into user emulation
mode: "fix" the problem with some formulas, forget immediately what
you did, save the result as a new csv file, pass that on to the next
user without a murmur, and delete the original csv file and the xls
file.

This is based on a true story, with one difference: the dates in the
original csv file were formatted correctly for the DD/MM/YYYY-using
locale; however Excel 97 would ignore the locale and assume MM/DD/YYYY
if you opened the file from within Excel instead of double-clicking in
Explorer (or vice versa; I forget which).

8<== ddmmyyyy_dates.csv
01/01/2007
31/01/2007
01/12/2007
31/12/2007
8<== mmddyyyy_dates.csv
01/01/2007
01/31/2007
12/01/2007
12/31/2007
8<==
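
FWIW, a rough sketch of how the resulting mess could be detected with
xlrd once the user has saved it as (say) dates.xls -- the file name is
just for illustration:

import xlrd

book = xlrd.open_workbook('dates.xls')
sheet = book.sheet_by_index(0)
for rowx in xrange(sheet.nrows):
    ctype = sheet.cell_type(rowx, 0)
    value = sheet.cell_value(rowx, 0)
    if ctype == xlrd.XL_CELL_DATE:
        print rowx, 'date:', xlrd.xldate_as_tuple(value, book.datemode)
    elif ctype == xlrd.XL_CELL_TEXT:
        print rowx, 'left as text:', value
    else:
        print rowx, 'something else:', ctype, value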

Cheers,
John

Neil Cerutti

unread,
Dec 13, 2007, 10:09:05 AM12/13/07
to
On 2007-12-12, John Machin <sjma...@lexicon.net> wrote:
> On Dec 13, 12:58 am, Neil Cerutti <horp...@yahoo.com> wrote:
>> On 2007-12-12, John Machin <sjmac...@lexicon.net> wrote:
>>
>> >> It's clear that I am thinking to completely different usages
>> >> for CSV than what most people in this thread. I use csv to
>> >> export and import numerical data columns to and from
>> >> spreadsheets.
>>
>> > For that purpose, CSV files are the utter pox and then some.
>> > Consider using xlrd and xlwt (nee pyexcelerator) to read
>> > (resp. write) XLS files directly.
>>
>> I can vouch for that advice. I was exporting .xls files to csv
>> text files for over a year before I tried the xlrd
>> solution--the whole process is less cumbersome now, though it
>> was bewildering at first working with Excel in Python.
>> Actually, surprises still crop up now and then, mostly to do
>> with cell types.
>
> Hi Neil, I'd be interested in hearing from you what caused the
> initial bewilderment with xlrd, and could it have been reduced
> by better documentation? What kinds of surprises?

The bewilderment had to do not with xlrd, but with learning the
structure of an Excel spreadsheet. My brain was extremely
resistant to what it didn't want to know. ;-)

The surprises are when a different data type gets returned for
something, like a zip code, whose cases I thought I had covered.
This is not a problem with xlrd either, but with my data source
providing slightly different data, resulting in errors. E.g., the
first time I got a zip+4 instead of a zip.

When I was exporting to csv, I handled those issues by manually
formatting the columns before exporting. This is what made it
cumbersome. I traded that in for the occasional pupu platter of
data. But by using Python directly on the spreadsheet, I have to
fix new problems only *once*.

>> The advantage of working with csv was that everything was a
>> string.
>
> It depends of your point of view. I'd regard that as a
> DISadvantage :-) With xlrd, if you have no expectation about
> the type of data in a cell, but need/want to know, xlrd will
> tell you. If you do have an expectation, you can check if
> actual == expected.

Sorry, my statement was nonsense. My data was only unified before
exporting because I unified it manually, e.g., making a ziptext
column and setting it to =TEXT(ZIP, "00000").

So I'd say the bewilderment and surprise I experience are coming
from my reluctance to learn, and my data, respectively--not from
any property of xlrd.

--
Neil Cerutti
To succeed in the world it is not enough to be stupid, you must also be well-
mannered. --Voltaire

Cliff Wells

unread,
Dec 14, 2007, 6:40:32 PM12/14/07
to massimo s., pytho...@python.org
On Wed, 2007-12-12 at 07:04 -0800, massimo s. wrote:

> If by "thoroughly" you mean "it actually describes technically what it
> is and does but not how to really do things", yes, it is thoroughly
> documented.
> The examples section is a joke.

Actually I rarely use the csv module these days, but I find a quick
glance at the examples to be sufficient in most cases.

> It gives good examples for the
> simplest usage cases (good), then it almost immediately digs into
> details like the Unicode stuff, leaving aside the rest. DictWriter and
> DictReader are absent from the examples. And also the Sniffer.

I take full responsibility for the lack of documentation on the Sniffer.
If you are interested in using it, I'd be happy to help you out and
maybe we could get some docs submitted. It's actually remarkably simple
to use, but I admit looking at the source won't make you think so ;-)

Regards,
Cliff
