So any recommendations of code, online tutorials, or books that might
address this file-processing/database question would be appreciated.
Thanks
Len
> I am in the process of learning Python. I have bought Learning Python
> by Mark Lutz, printed a copy of Dive into Python and various other
> books, and looked at several tutorials. I have started a stupid little
> project in Python and things are proceeding well. I am an old-time
> COBOL programmer from the IBM 360/370 era, and this ingrained idea of
> file processing using file definitions (FDs) is, I believe, causing me
> problems, because I think Python requires a different way of looking at
> data files and I haven't really gotten my brain around it yet. I would
> like to create a small sequential file, processing it at first to store a
> group id, name, amount, and date, which I can add to, delete from, and
> update. Could someone point me to some code that would show me how this
> is done in Python? Eventually, I intend to expand my little program to
> process this file as a flat comma-delimited file, move it to some type
> of indexed file, and finally to some RDBMS system. My little program
> started out at about 9 lines of code and is now at about 100, with 5 or
> 6 functions which I will eventually change to classes (I need to
> learn OOP too, but one step at a time).
What you're looking for isn't so much the Python way of doing things;
it's the Unix way of doing things. The OS doesn't present a file as
a sequence of records; it presents it as a sequence of bytes. Any
structure beyond that is provided by the application - possibly via a
library. This view of files is powerful enough that every modern OS
I'm familiar with uses it.
You might want to look at <URL: http://www.faqs.org/docs/artu/ >. It's
not really an answer to your question, but looks at Unix programming
in general. It uses fetchmail as an example application, including
examining the configuration editor written in Python.
A classic Unix approach to small databases is to use text files. When
you need to update the file, you just rewrite the whole thing. This
works well on Unix, because it comes with a multitude of tools for
processing text files. Such an approach is simple and easy to
implement, but not very efficient for large files. A classic example
is a simple phone book application: you have a simple tool for
updating the phone book, and use the "grep" command for searching
it. Works like a charm for small files, and allows for some amazingly
sophisticated queries.
To provide some (simple) code, assume your file is a list of lines,
with id, name, amount, date on each line, separated by spaces. Loading
this into a list in memory is trivial:
datafile = open("file", "r")
datalist = []
for line in data:
datalist.append(line.split())
datafile.close()
At this point, datalist is a list of lists. datalist[0] is a list of
[id, name, amount, date]. You could (for example) sum all the amounts
like so:
total = 0
for datum in datalist:
    total += float(datum[2])   # split() gives strings, so convert the amount
There are more concise ways to write this, but this is simple and
obvious.
Writing the list back out is also trivial:
datafile = open("file", "w")
datafile.writelines(" ".join(x) + "\n" for x in datalist)
datafile.close()
Note that that uses a 2.4 feature. The more portable - and obvious -
way to write this is:
datafile = open("file", "w")
for datum in datalist:
datafile.write(" ".join(datum) + "\n")
datafile.close()
For comma-delimited files, there's the csv module. It loads files
formatted as Comma Separated Values (a common interchange format for
spreadsheets) into memory, and writes them back out again. This is a
slightly more structured version of the simple text file approach. It
may be just what you're looking for.
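An untested sketch of the same load/save cycle with the csv module
(the file name "file.csv" is just an example):
import csv
datafile = open("file.csv", "r")        # on current Python, newline="" is recommended
datalist = list(csv.reader(datafile))   # each row becomes [id, name, amount, date]
datafile.close()
# ... work on datalist in memory, as above ...
datafile = open("file.csv", "w")
csv.writer(datafile).writerows(datalist)
datafile.close()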
If you want to store string objects selectable by a single string key,
the various Unix dbm libraries are just what the doctor ordered. The
underlying C libraries allow arbitrary memory chunks as keys/values,
but the Python wrappers use Python strings, and the resulting dbm
objects look like dictionaries. The shelve module is built on top of
these, allowing you to store arbitrary Python objects instead of just
strings.
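A rough sketch of what that looks like with shelve (the file name and
the record layout here are made up):
import shelve
db = shelve.open("ledger.db")
db["1001"] = {"name": "Smith", "amount": 42.50, "date": "2005-01-15"}  # add or update
record = db["1001"]    # fetch by key
del db["1001"]         # delete
db.close()
The keys have to be strings, but the values can be any picklable
Python object.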
Finally, for an RDBMS, you almost always get SQL these days. The options
run from small embedded databases built in Python to network
connections to full-blown SQL servers. I'd put that decision off until
you really need a database.
<mike
--
Mike Meyer <m...@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Yup, Python uses the same world view as C, C++, Unix etc.
> I would
> like to create a small sequential file, processing it at first to store a
> group id, name, amount, and date, which I can add to, delete from, and
> update. Could someone point me to some code that would show me how this
> is done in Python? Eventually, I intend to expand my little program to
> process this file as a flat comma-delimited file, move it to some type
> of indexed file, and finally to some RDBMS system. My little program
> started out at about 9 lines of code and is now at about 100, with 5 or
> 6 functions which I will eventually change to classes (I need to
> learn OOP too, but one step at a time).
I think it's much easier to go directly to SQL without those
diversions. In a way, an SQL database maps better to your idea
of files and records, and it takes much less code and effort
to use SQL than to twist "normal" (for me) files into behaving
like their mainframe counterparts.
I'd suggest that you download pysqlite. Then you have a small
embedded SQL database in your Python program and don't need to
bother with a server. See http://initd.org/tracker/pysqlite
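To give you an idea of how little code is involved, here's a rough,
untested DB-API sketch (written against the sqlite3 module that ships
with current Python; with pysqlite itself essentially only the import
line differs, and the file name, table, and data are all invented):
import sqlite3   # with pysqlite: from pysqlite2 import dbapi2 as sqlite3
conn = sqlite3.connect("ledger.db")
cur = conn.cursor()
cur.execute("""CREATE TABLE IF NOT EXISTS records
               (id TEXT, name TEXT, amount REAL, date TEXT)""")
cur.execute("INSERT INTO records VALUES (?, ?, ?, ?)",
            ("1001", "Smith", 42.50, "2005-01-15"))
cur.execute("UPDATE records SET amount = ? WHERE id = ?", (50.00, "1001"))
cur.execute("DELETE FROM records WHERE id = ?", ("1001",))
conn.commit()
conn.close()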
Another obvious solution, since you are saying that it's a
small file, is to always read the whole file into memory, and
to rewrite the whole file when you change things.
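Deleting a record in that style is just filtering the in-memory list
and rewriting the file. A sketch, assuming the space-separated layout
from the earlier examples and a made-up id:
datafile = open("file", "r")
datalist = [line.split() for line in datafile]
datafile.close()
datalist = [datum for datum in datalist if datum[0] != "1001"]   # delete by id
datafile = open("file", "w")
for datum in datalist:
    datafile.write(" ".join(datum) + "\n")
datafile.close()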
For info on CSV handling, see http://docs.python.org/lib/module-csv.html
For other non-SQL solutions, please have a look at
http://www-106.ibm.com/developerworks/library/l-pypers.html
and http://docs.python.org/lib/node77.html
I just read your response and will be taking your suggestion immediately.
Len Sumnler
I think you might be right. I have been playing around with Linux at
home. What I may have to do is switch my mindset from IBM/Microsoft to
a more Unix way of thinking.
Also, thanks for the code samples.
Len Sumnler
Everyone seems to be saying the same thing, which is to jump into some
RDBMS.
Len Sumnler
Welcome, Len.
> I would
> like to create a small sequential file, processing it at first to store a
> group id, name, amount, and date, which I can add to, delete from, and
> update
In addition to the suggestions already given, you might take a look at the
struct module. This will let you use fixed-width binary records.
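For instance, a small sketch of packing one record into a fixed-width
layout; the field widths are arbitrary choices of mine, not anything
the struct module dictates (on current Python the "s" fields take
bytes; on the 2.x of this thread they would be plain strings):
import struct
FORMAT = "=8s20sd10s"   # "=": standard sizes, no alignment padding
RECORD_SIZE = struct.calcsize(FORMAT)
packed = struct.pack(FORMAT, b"1001", b"Smith", 42.50, b"2005-01-15")
# string fields come back NUL-padded to their declared widths
group_id, name, amount, date = struct.unpack(FORMAT, packed)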
The concept of streams found in UNIX takes some getting used to. Many
files are maintained as text using delimited, variable-length fields
with a newline at the end of each record. Try 'cat /etc/passwd' on a
UNIX/Linux host to see such a file using a colon ':' as the delimiter.
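Splitting such a file apart in Python is one line per record; a small
sketch that maps each login name to its shell:
shells = {}
for line in open("/etc/passwd"):
    login, pw, uid, gid, gecos, home, shell = line.rstrip("\n").split(":")
    shells[login] = shell
(For this particular file the standard library also has a pwd module,
but the split() pattern is the general technique for delimited text.)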
I turn to the 'od' command when I want the truth. Use it to see what bytes
are -really- in the file. The following should work on Linux or under
Cygwin if you are still using Windows.
od -Ax -tcx1 thefile.dat
You can use od to look at data anywhere in a pipeline. Here the output
of the shell's print command is piped into od:
$ print "now"|od -Ax -tcx1
000000   n   o   w  \n
         6e  6f  77  0a
000004
Regards,
James