So any recommendations of code, online tutorials, or books that might
address this file-processing/database question would be appreciated.
Thanks
Len
> I am in the process of learning Python. I have bought Learning Python
> by Mark Lutz, printed a copy of Dive into Python and various other
> books, and looked at several tutorials. I have started a stupid little
> project in Python and things are proceeding well. I am an old-time
> COBOL programmer from the IBM 360/370 era, and this ingrained idea of
> file processing using file definitions (FDs) is, I believe, causing me
> problems, because I think Python requires a different way of looking at
> data files and I haven't really gotten my brain around it yet. I would
> like to create a small sequential file, processing it at first to store a
> group id, name, amount, and date, which I can add to, delete from, and
> update. Could someone point me to some code that would show me how this
> is done in Python? Eventually, I intend to expand my little program to
> process this file as a flat comma-delimited file, move it to some type
> of indexed file, and finally to some RDBMS system. My little program
> started out at about 9 lines of code and is now at about 100, with 5 or
> 6 functions which I will eventually change to classes (I need to
> learn OOP too, but one step at a time).
What you're looking for isn't so much the Python way of doing things;
it's the Unix way of doing things. The OS doesn't present a file as
a sequence of records; it presents it as a sequence of bytes. Any
structure beyond that is provided by the application - possibly via a
library. This view of files is powerful enough that every modern OS
I'm familiar with uses it.
You might want to look at <URL: http://www.faqs.org/docs/artu/ >. It's
not really an answer to your question, but looks at Unix programming
in general. It uses fetchmail as an example application, including
examining the configuration editor written in Python.
A classic Unix approach to small databases is to use text files. When
you need to update the file, you just rewrite the whole thing. This
works well on Unix, because it comes with a multitude of tools for
processing text files. Such an approach is simple and easy to
implement, but not very efficient for large files. A classic example
is a simple phone book application: you have a simple tool for
updating the phone book, and use the "grep" command for searching
it. Works like a charm for small files, and allows for some amazingly
sophisticated queries.
To provide some (simple) code, assume your file is a list of lines,
with id, name, amount, date on each line, separated by spaces. Loading
this into a list in memory is trivial:
datafile = open("file", "r")
datalist = []
for line in data:
datalist.append(line.split())
datafile.close()
At this point, datalist is a list of lists. datalist[0] is a list of
[id, name, amount, date]. You could (for example) sum all the amounts
like so:
total = 0
for datum in datalist:
    total += float(datum[2])   # split() gives strings, so convert the amount
There are more concise ways to write this, but this is simple and
obvious.
Writing the list back out is also trivial:
datafile = open("file", "w")
datafile.writelines(" ".join(x) + "\n" for x in datalist)
datafile.close()
Note that that uses a 2.4 feature. The more portable - and obvious -
way to write this is:
datafile = open("file", "w")
for datum in datalist:
datafile.write(" ".join(datum) + "\n")
datafile.close()
For comma-delimited files, there's the csv module. It loads files
formatted as Comma Separated Values (a common interchange format for
spreadsheets) into memory, and writes them back out again. This is a
slightly more structured version of the simple text file approach. It
may be just what you're looking for.
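An untested sketch of the same load/save cycle with the csv module
(the file name "file.csv" is just an example):
import csv
datafile = open("file.csv", "r")        # on current Python, newline="" is recommended
datalist = list(csv.reader(datafile))   # each row becomes [id, name, amount, date]
datafile.close()
# ... work on datalist in memory, as above ...
datafile = open("file.csv", "w")
csv.writer(datafile).writerows(datalist)
datafile.close()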
If you want to store string objects selectable by a single string key,
the various Unix dbm libraries are just what the doctor ordered. The
underlying C libraries allow arbitrary memory chunks as keys/values,
but the Python wrappers use Python strings, and the resulting dbm
objects look like dictionaries. The shelve module is built on top of
these, allowing you to store arbitrary Python objects instead of just
strings.
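A rough sketch of what that looks like with shelve (the file name and
the record layout here are made up):
import shelve
db = shelve.open("ledger.db")
db["1001"] = {"name": "Smith", "amount": 42.50, "date": "2005-01-15"}  # add or update
record = db["1001"]    # fetch by key
del db["1001"]         # delete
db.close()
The keys have to be strings, but the values can be any picklable
Python object.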
Finally, for an RDBMS, you almost always get SQL these days. The options
run from small embedded databases built in Python to network
connections to full-blown SQL servers. I'd put that decision off until
you really need a database.
<mike
--
Mike Meyer <m...@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Yup, Python uses the same world view as C, C++, Unix etc.
> I would
> like to create a small sequential file, processing it at first to store a
> group id, name, amount, and date, which I can add to, delete from, and
> update. Could someone point me to some code that would show me how this
> is done in Python? Eventually, I intend to expand my little program to
> process this file as a flat comma-delimited file, move it to some type
> of indexed file, and finally to some RDBMS system. My little program
> started out at about 9 lines of code and is now at about 100, with 5 or
> 6 functions which I will eventually change to classes (I need to
> learn OOP too, but one step at a time).
I think it's much easier to go directly to SQL without those
diversions. In a way, an SQL database maps better to your idea
of files and records, and it takes much less code and effort
to use SQL than to twist "normal" (for me) files into behaving
like their mainframe counterparts.
I'd suggest that you download pysqlite. Then you have a small
embedded SQL database in your Python program and don't need to
bother with a server. See http://initd.org/tracker/pysqlite
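To give you an idea of how little code is involved, here's a rough,
untested DB-API sketch (written against the sqlite3 module that ships
with current Python; with pysqlite itself essentially only the import
line differs, and the file name, table, and data are all invented):
import sqlite3   # with pysqlite: from pysqlite2 import dbapi2 as sqlite3
conn = sqlite3.connect("ledger.db")
cur = conn.cursor()
cur.execute("""CREATE TABLE IF NOT EXISTS records
               (id TEXT, name TEXT, amount REAL, date TEXT)""")
cur.execute("INSERT INTO records VALUES (?, ?, ?, ?)",
            ("1001", "Smith", 42.50, "2005-01-15"))
cur.execute("UPDATE records SET amount = ? WHERE id = ?", (50.00, "1001"))
cur.execute("DELETE FROM records WHERE id = ?", ("1001",))
conn.commit()
conn.close()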
Another obvious solution, since you are saying that it's a
small file, is to always read the whole file into memory, and
to rewrite the whole file when you change things.
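Deleting a record in that style is just filtering the in-memory list
and rewriting the file. A sketch, assuming the space-separated layout
from the earlier examples and a made-up id:
datafile = open("file", "r")
datalist = [line.split() for line in datafile]
datafile.close()
datalist = [datum for datum in datalist if datum[0] != "1001"]   # delete by id
datafile = open("file", "w")
for datum in datalist:
    datafile.write(" ".join(datum) + "\n")
datafile.close()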
For info on CSV handling, see http://docs.python.org/lib/module-csv.html
For other non-SQL solutions, please have a look at
http://www-106.ibm.com/developerworks/library/l-pypers.html
and http://docs.python.org/lib/node77.html
I just read your response and will be taking your suggestion immediately.
Len Sumnler
I think you might be right. I have been playing around with Linux at
home. What I may have to do is switch my mindset from IBM/Microsoft to
a more Unix way of thinking.
Also, thanks for the code samples.
Len Sumnler
Everyone seems to be saying the same thing, which is to jump into some
RDBMS.
Len Sumnler
Welcome, Len.
> I would
> like to create a small sequential file, processing it at first to store a
> group id, name, amount, and date, which I can add to, delete from, and
> update
In addition to the suggestions already given, you might take a look at the
struct module. This will let you use fixed-width binary records.
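For instance, a small sketch of packing one record into a fixed-width
layout; the field widths are arbitrary choices of mine, not anything
the struct module dictates (on current Python the "s" fields take
bytes; on the 2.x of this thread they would be plain strings):
import struct
FORMAT = "=8s20sd10s"   # "=": standard sizes, no alignment padding
RECORD_SIZE = struct.calcsize(FORMAT)
packed = struct.pack(FORMAT, b"1001", b"Smith", 42.50, b"2005-01-15")
# string fields come back NUL-padded to their declared widths
group_id, name, amount, date = struct.unpack(FORMAT, packed)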
The concept of streams found in UNIX takes some getting used to. Many
files are maintained as text using delimited, variable-length fields
with a newline at the end of each record. Try 'cat /etc/passwd' on a
UNIX/Linux host to see such a file using a colon ':' as the delimiter.
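Splitting such a file apart in Python is one line per record; a small
sketch that maps each login name to its shell:
shells = {}
for line in open("/etc/passwd"):
    login, pw, uid, gid, gecos, home, shell = line.rstrip("\n").split(":")
    shells[login] = shell
(For this particular file the standard library also has a pwd module,
but the split() pattern is the general technique for delimited text.)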
I turn to the 'od' command when I want the truth. Use it to see what bytes
are -really- in the file. The following should work on Linux or under
Cygwin if you are still using Windows.
od -Ax -tcx1 thefile.dat
You can use od to look at data anywhere in a pipeline. Here the output
of the shell's print command is piped into od:
$ print "now"|od -Ax -tcx1
000000   n   o   w  \n
         6e  6f  77  0a
000004
Regards,
James