SQL and CSV

Nick

unread,

May 5, 2009, 12:07:29 PM5/5/09

to

I have a requirement to read a CSV file. Normally, no problem, just
import CSV and slurp the file up.

However, in this case I want to filter out lines that have fields set
to particular values.

It would be neat to be able to do something like this.

select * from test.csv where status <> "Canceled"

Using adodb I can do this, so long as I don't have the where clause. :-
(

Is there a reasonable lightweight way of doing this in Python?

I could write some python code that is used to filter rows, and inport
that from config, but it's not quite as elegant as an SQL route.

Thanks

Nick

Tim Golden

unread,

May 5, 2009, 12:19:32 PM5/5/09

to pytho...@python.org

Not entirely clear what you are and aren't prepared to try here, but...
the most obvious Python-based way to do this is treating the csv reader
as an iterator and filtering there. Your last line suggests that's not
what you want but just in case I've misunderstood:

<test.csv>
id,code,status
1,"ONE","Active"
2,"TWO","Cancelled"
3,"THREE","Active"
</test.csv>

<code>
import csv

for row in csv.DictReader (open ("c:/temp/test.csv", "rb")):
if row['status'] != 'Cancelled':
print row

</code>

Doesn't seem too onerous, and could obviously be wrapped in
some useful class/module.

But if you really want to go the SQL route, I believe there are
ODBC adapters for CSV which, combined with PyODBC or CeODBC,
would probably take you where you want to go.

TJG

Nick

unread,

May 5, 2009, 12:25:42 PM5/5/09

to

Part of the problem is that the 'selection' needs to be in a config
file. I can put the if row['status'] != 'Cancelled': return True into
a config, read it and eval it, but its not quite as clean as an sql
route.

Nick

Matimus

unread,

May 5, 2009, 2:04:00 PM5/5/09

to

Well, if you are using 2.5.x you could always stuff it into a sqlite
in-memory database, and then execute a SQL query. Heck, you don't even
_need_ 2.5, but in 2.5 sqlite is part of the distribution.

Matt

Tim Golden

unread,

May 5, 2009, 3:27:37 PM5/5/09

to pytho...@python.org

Nick wrote:
> Part of the problem is that the 'selection' needs to be in a config
> file. I can put the if row['status'] != 'Cancelled': return True into
> a config, read it and eval it, but its not quite as clean as an sql
> route.

Still not clear what the restriction is. If you were writing
SQL you'd have to read *something* from your config file,
unless you're suggesting that the "config file" is in fact
a SQL file. Which is one way of doing it, but then you might
just as well have your config file as a Python file and
import it.

Have I missed the point somewhere here? Can you give an
example -- even a fictional one -- of what you couldn't
do using, say, the example I gave earlier?

TJG

Nick

unread,

May 7, 2009, 11:45:06 AM5/7/09

to

On May 5, 8:27 pm, Tim Golden <m...@timgolden.me.uk> wrote:
> Nick wrote:
> > Part of the problem is that the 'selection' needs to be in a config
> > file. I can put the if row['status'] != 'Cancelled': return True into
> > a config, read it and eval it, but its not quite as clean as ansql
> > route.
>

> Still not clear what the restriction is. If you were writingSQLyou'd have to read *something* from your config file,

> unless you're suggesting that the "config file" is in fact

> aSQLfile. Which is one way of doing it, but then you might

> just as well have your config file as a Python file and
> import it.
>
> Have I missed the point somewhere here? Can you give an
> example -- even a fictional one -- of what you couldn't
> do using, say, the example I gave earlier?
>
> TJG

Solution found. In the end I used SQLite to read from a csv file, and
now I can query the CSV file. The file is read using the csv module

First create a function

def fraction(p, denom):
num, frac = p.split ('-')
return float (num) + float (frac) / denom

for use within queries.

Now build the class.

self.filename = filename
self.dialect = dialect
self.query = query
reader = csv.reader (open (filename, 'r'))
self.connection = sqlite.connect(":memory:")
self.connection.create_function("fraction", 2, fraction) #
Adds in function
self.cursor = self.connection.cursor()
first = True
for row in reader:
if first:
headers = []
for r in row:
n = r.strip().replace (' ', '_').replace ('-','_')
headers.append (n)
command = 'create table csv (%s)' % ','.join (headers)
self.cursor.execute (command)
first = False
else:
command = 'insert into csv values ("%s")' % '","'.join
(row)
self.cursor.execute (command)

and then I can use this

self.cursor.execute (self.query)
rows = self.cursor.fetchall()
headers = []
for r in self.cursor.description:
headers.append (r[0])
results = Results.Results (headers, self.name, {})
i = 0
for row in rows:
results.add (row, i)
i = i + 1
return results

to query the results.

Results.Results is one of my classes that's reused in lots of places.

The query then looks somethign like this

select
Client_Reference_Number as TrdNbr,
Asset_Number as ISIN,
Quantity as Qty,
status
from
csv
where status in ("CANCELLED")

union

select
Client_Reference_Number as TrdNbr,
Asset_Number as ISIN,
Quantity as Qty,
status
from
csv
where status not in ("CANCELLED")

All incredibly neat and the first time I've used SQLite.

nick

John Machin

unread,

May 7, 2009, 10:42:04 PM5/7/09

to

The remaining lines of your SELECT statement are incredibly redundant
AFAICT. It seems you have pushed the contents of your csv file into a
data base and pulled them ALL out again ... not what I'd call a
"query". What's the point?

Message has been deleted

Peter Otten

unread,

May 8, 2009, 3:54:30 AM5/8/09

to

Nick wrote:

> self.cursor = self.connection.cursor()
> first = True
> for row in reader:
> if first:
> headers = []
> for r in row:
> n = r.strip().replace (' ', '_').replace ('-','_')
> headers.append (n)
> command = 'create table csv (%s)' % ','.join (headers)
> self.cursor.execute (command)
> first = False
> else:
> command = 'insert into csv values ("%s")' % '","'.join
> (row)
> self.cursor.execute (command)
>

You can simplify that a bit:

cursor = self.cursor = self.connection.cursor()

first_row = next(reader)
headers = [column.strip().replace(" ", "_").replace("-", "_") for column in
first_row]
cursor.execute("create table csv (%s)" % ", ".join(headers))

placeholders = ", ".join("?"*len(headers))
command = "insert into csv values (%s)" % placeholders
cursor.executemany(command, reader)

While it may not matter here using placeholders instead of manually escaping
user-provided values is a good habit to get into.

> self.cursor.execute (self.query)
> rows = self.cursor.fetchall()

rows = self.cursor.execute(self.query)

doesn't build an intermediate list.

> i = 0
> for row in rows:
> results.add (row, i)
> i = i + 1

This is written

for i, row in enumerate(rows):
results.add(row, i)

in idiomatic Python.

Peter

Lawrence D'Oliveiro

unread,

May 8, 2009, 7:11:52 AM5/8/09

to

In message <gu0ofm$oj9$00$1...@news.t-online.com>, Peter Otten wrote:

> While it may not matter here using placeholders instead of manually
> escaping user-provided values is a good habit to get into.

Until you hit things it can't deal with.

andrew cooke

unread,

May 8, 2009, 8:49:07 AM5/8/09

to pytho...@python.org

The post you are replying to was talking about using the SQL library's "?"
syntax that automatically escapes values. The usual reason this is
recommended (if I have understood correctly) is that the library code is
much more likely to foil injection attacks. I have seen this mentioned
often and assume it is good advice.

Can you expand on your comment? I assume you are thinking of how the
library might handle some strange class. But aren't the number of types
limited by SQL? In which case a "thing that can't be handled" could
presumably be managed by adding an appropriate __str__ or __float__ or
whatever? And you would still use the library to give safety with other
values.

Maybe you could give an example of the kind of problem you're thinking of?

Thanks,
Andrew

Nick

unread,

May 8, 2009, 10:22:02 AM5/8/09

to

On May 8, 1:49 pm, "andrew cooke" <and...@acooke.org> wrote:
> Lawrence D'Oliveiro wrote:

Injection attacks aren't an issue, its a local app.

It's part of a reconciliation system, where sometimes data is in csv
files. If you want the whole csv file, you can use csv module without
a problem.

In some cases, I need to manipulate the data.

The choices are hard code the manipulation, or load the data from a
config file.

So what I've got is the query in the config and I can process it.

Nick

andrew cooke

unread,

May 8, 2009, 12:38:33 PM5/8/09

to pytho...@python.org

even if you're not open to injection attacks, you're still less likely to
get escaping correct than a puprose written, widely used library.

my request for more information was directed to lawrence, who said "until
you hit things it can't deal with" which seemed to be some kind of cryptic
argument against parameters.

andrew

Nick wrote:

> On May 8, 1:49锟絧m, "andrew cooke" <and...@acooke.org> wrote:
>> Lawrence D'Oliveiro wrote:
>> > In message <gu0ofm$oj9$0...@news.t-online.com>, Peter Otten wrote:
>>
>> >> While it may not matter here using placeholders instead of manually
>> >> escaping user-provided values is a good habit to get into.
>>
>> > Until you hit things it can't deal with.
>>
>> The post you are replying to was talking about using the SQL library's
>> "?"

>> syntax that automatically escapes values. 锟絋he usual reason this is

>> recommended (if I have understood correctly) is that the library code is

>> much more likely to foil injection attacks. 锟絀 have seen this mentioned

>> often and assume it is good advice.
>>

>> Can you expand on your comment? 锟絀 assume you are thinking of how the
>> library might handle some strange class. 锟紹ut aren't the number of types
>> limited by SQL? 锟絀n which case a "thing that can't be handled" could

>> presumably be managed by adding an appropriate __str__ or __float__ or

>> whatever? 锟紸nd you would still use the library to give safety with other

>> values.
>>
>> Maybe you could give an example of the kind of problem you're thinking
>> of?
>>
>> Thanks,
>> Andrew
>
> Injection attacks aren't an issue, its a local app.
>
> It's part of a reconciliation system, where sometimes data is in csv
> files. If you want the whole csv file, you can use csv module without
> a problem.
>
> In some cases, I need to manipulate the data.
>
> The choices are hard code the manipulation, or load the data from a
> config file.
>
> So what I've got is the query in the config and I can process it.
>
> Nick

> --
> http://mail.python.org/mailman/listinfo/python-list
>
>

Lawrence D'Oliveiro

unread,

May 9, 2009, 12:48:48 AM5/9/09

to

In message <mailman.5285.1241786...@python.org>, andrew
cooke wrote:

> Lawrence D'Oliveiro wrote:
>> In message <gu0ofm$oj9$00$1...@news.t-online.com>, Peter Otten wrote:
>>
>>> While it may not matter here using placeholders instead of manually
>>> escaping user-provided values is a good habit to get into.
>>
>> Until you hit things it can't deal with.
>

> Can you expand on your comment? I assume you are thinking of how the
> library might handle some strange class.

Consider something as simple as a "like" clause.

Lawrence D'Oliveiro

unread,

May 9, 2009, 12:50:55 AM5/9/09

to

In message <mailman.5299.1241800...@python.org>, andrew
cooke wrote:

> even if you're not open to injection attacks, you're still less likely to
> get escaping correct than a puprose written, widely used library.

Escaping isn't really that hard to do. For example, the rules for MySQL
literals are clearly spelled out here
<http://dev.mysql.com/doc/refman/5.0/en/string-syntax.html>. You can easily
check that against my quoting routines here
<http://codecodex.com/wiki/index.php?title=Useful_MySQL_Routines#Quoting>.