I would like a pure Python solution to the extent reasonable.
Suggestions?
Thank you,
Alan Isaac
Gadfly?
The answer about which database depends on your target
platform but you could consider gadfly.
-Larry Bates
If you want really simple, look at the anydbm module. If nothing better
is available, anydbm will use dumbdbm. All of these are in the Python
build, so you do not need to fetch/read/install anything additional.
Doing the DB-API would be much stronger, but might be overkill in your
situation.
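A minimal sketch of the anydbm approach, assuming a current Python 3 where anydbm has been renamed dbm (and dumbdbm is dbm.dumb). The path and the stored notes are made up for illustration.

```python
import dbm
import os
import tempfile

# Hypothetical location for the database; any writable path works.
db_path = os.path.join(tempfile.mkdtemp(), "filenotes")

# "c" opens the database read/write and creates it if it does not exist.
# dbm picks the best available backend and falls back to dbm.dumb.
db = dbm.open(db_path, "c")
try:
    # Keys and values are stored as bytes.
    db[b"report.txt"] = b"quarterly summary, draft 2"
    db[b"photo.tif"] = b"scanned page 17"
finally:
    db.close()

# Reopen read-only and look things up by key.
db = dbm.open(db_path, "r")
try:
    note = db[b"report.txt"].decode("utf-8")
    known_files = sorted(key.decode("utf-8") for key in db.keys())
finally:
    db.close()

print(note)
print(known_files)
```

This gives you a persistent dictionary and nothing more: no queries, no secondary indexes.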
You might take a look at buzhug (a very Pythonic way to manipulate data in the
database).
http://buzhug.sourceforge.net/
Once Python 2.5 comes out, I recommend using sqlite because it avoids
the mess that dbm can cause.
--
Aahz (aa...@pythoncraft.com) <*> http://www.pythoncraft.com/
"LL YR VWL R BLNG T S" -- www.nancybuttons.com
and if you don't want to wait for 2.5, you can install pysqlite without too
much trouble - and it is *very* easy to use!
For SQLite design and data browsing, check out the SQLite Browser at
http://sqlitebrowser.sourceforge.net.
-- Paul
Yeah, just be sure to do this:
from pysqlite2 import dbapi2 as sqlite3
then you're ready for 2.5! :)
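Something along these lines should run unchanged on either setup. It is only a sketch: the table and column names are invented, and it assumes pysqlite 2.x, which installs under the module name pysqlite2.

```python
try:
    import sqlite3  # in the standard library from Python 2.5 on
except ImportError:
    # Fallback for older Pythons with pysqlite 2.x installed separately.
    from pysqlite2 import dbapi2 as sqlite3

# An in-memory database keeps the example self-contained;
# pass a filename instead to persist the data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE files (name TEXT PRIMARY KEY, author TEXT, pages INTEGER)"
)
conn.executemany(
    "INSERT INTO files VALUES (?, ?, ?)",
    [
        ("notes.txt", "alan", 3),
        ("thesis.pdf", "alan", 120),
        ("scan.tif", "larry", 45),
    ],
)
conn.commit()

# Use ? placeholders rather than string formatting for query parameters.
rows = conn.execute(
    "SELECT name FROM files WHERE pages > ? ORDER BY name", (10,)
).fetchall()
long_files = [name for (name,) in rows]
conn.close()

print(long_files)
```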
But sqlite is not "pure Python" because it's just a wrapper around
sqlite (which has to be installed separately)...
Thorsten
> But sqlite is not "pure Python" because it's just a wrapper around
> sqlite (which has to be installed separately)...
But that's the point. Once 2.5 is released, sqlite is built-in. Unless
there's more to it that I don't know, and something must still be
installed? But that makes no sense.
SQL : SELECT name FROM persons WHERE age > 20
strakell : [ r["name"] for r in persons if r["age"] > 20 ]
You can also create an index : persons.create_index("age")
and then use it like this : persons.age[20] = list of the records where
age = 20
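To show the idea without installing anything, here is the same query and a simple index in plain Python over a list of dicts. This is a stand-in for illustration, not the library's own API; the sample records and the age_index name are made up.

```python
from collections import defaultdict

# Sample records standing in for a "persons" table.
persons = [
    {"name": "Anna", "age": 31},
    {"name": "Ben", "age": 20},
    {"name": "Chloe", "age": 17},
    {"name": "Dev", "age": 20},
]

# Equivalent of: SELECT name FROM persons WHERE age > 20
older = [r["name"] for r in persons if r["age"] > 20]

# A hand-built index on "age": maps each age to the records having it,
# so an equality lookup does not need to scan the whole list.
age_index = defaultdict(list)
for r in persons:
    age_index[r["age"]].append(r)

aged_20 = [r["name"] for r in age_index[20]]

print(older)
print(aged_20)
```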
Other pure-Python databases : ZODB (probably overkill for a small
database) and Durus (I didn't test it)
As said in other answers, the inclusion of SQLite in the standard
distribution might make pure-Python solutions less attractive
Regards,
Pierre
I usually use anydbm when I want something quick and simple.
I could get all theoretical about why that's not so in most cases,
but there are plenty of cases where it is so (especially when the
person doing the DB doesn't get the idea behind all filesystems,
which is that they are themselves simplified databases), so
I won't*.
In this case, the filesystem may be the best place to
do the work, because it's the cheapest to implement
and maintain.
--Blair
* - okay, I will
1. Since the filesystem is a database, making accesses
to it after being directed there by a database means you're
using two database systems (and an intervening operating
system) to do one thing. Serious databases work from
disks with no filesystem to get rid of that extra layer entirely.
There are benefits to having things in files reachable by
ordinary tools, and to having the OS mediate access to
the data, but you need to be sure you need those benefits
and can afford the overhead. This is academic in most cases,
including the one that started this thread.
2. When using the filesystem as the database
you only get one kind of native association, and have to
use semantics in the directory and filenames to give you
hints as to the type stored at a particular location. You get a
few pieces of accounting data (mod times, etc.) in the
directory listing, but can't associate anything else with
the file directly, at least not unless you create another
file that has the associated data in it, or stuff the extra
data in the file itself, but then that makes each file
a database...see where it goes? Sometimes it's better
to come up with a schema you can extend rationally to
fit the problem you are trying to solve.
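A rough sketch of point 2, under made-up conventions: the directory name carries the category, the extension carries the type, the OS supplies size and mod time, and anything else has to live in a sidecar file. The ".meta.json" suffix and the describe() helper are assumptions for illustration only.

```python
import json
import os
import tempfile

# Hypothetical layout: <root>/<category>/<filename>
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "invoices"))
data_path = os.path.join(root, "invoices", "2006-05-acme.tif")
with open(data_path, "wb") as f:
    f.write(b"fake scan")

# Extra attributes the directory entry cannot hold go in a sidecar file.
with open(data_path + ".meta.json", "w", encoding="utf-8") as f:
    json.dump({"customer": "acme", "pages": 3}, f)


def describe(path):
    """Collect what the filesystem knows, plus the sidecar data if present."""
    info = os.stat(path)
    record = {
        "category": os.path.basename(os.path.dirname(path)),
        "extension": os.path.splitext(path)[1],
        "size": info.st_size,
        "modified": info.st_mtime,
    }
    sidecar = path + ".meta.json"
    if os.path.exists(sidecar):
        with open(sidecar, encoding="utf-8") as f:
            record.update(json.load(f))
    return record


record = describe(data_path)
print(record)
```

Every new attribute means another convention or another sidecar field, which is where a schema starts to look attractive.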
--Blair
2.5 will include the sqlite library itself on Windows (and Macs? I
forget) but you need to install the library separately on Linux
boxes, which is generally about as complicated as apt-get install
sqlite-dev.
Item 1 - The OP specifically said he wanted to store 100's
of files. You rarely need a database to store 100's of anything
and the overhead of installing and maintaining one isn't typically
worth the effort. Store the info in a text file and read the
entire file into memory and do linear searches. Python can search
100's of items in a list faster than you can even begin an SQL
query.
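For what it's worth, the text-file approach in Item 1 can be as small as this. The tab-separated layout, the file name, and the helper names are just one possible choice, not a prescribed format.

```python
import os
import tempfile

# Hypothetical catalog: one line per file, tab-separated fields.
catalog_path = os.path.join(tempfile.mkdtemp(), "catalog.txt")
with open(catalog_path, "w", encoding="utf-8") as f:
    f.write("report.txt\talan\tquarterly summary\n")
    f.write("scan.tif\tlarry\tscanned invoice\n")
    f.write("notes.txt\talan\tmeeting notes\n")


def load_catalog(path):
    """Read the whole catalog into memory as a list of field lists."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n").split("\t") for line in f if line.strip()]


def find_by_author(records, author):
    """Linear search: fine for hundreds of records."""
    return [name for name, owner, _description in records if owner == author]


records = load_catalog(catalog_path)
alan_files = find_by_author(records, "alan")
print(alan_files)
```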
Item 2 - You will note that I said "If you need multiple indexes
into these files, then use a database, but only for the indexes
that point to the files on the filesystem". You sometimes need
multiple indexes (which databases are GREAT at providing).
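A sketch of Item 2, using SQLite only because it is handy: the files stay on the filesystem and the database holds nothing but indexed pointers to them. The table, column, and index names, and the add_document() helper, are invented for the example.

```python
import os
import sqlite3
import tempfile

# The files themselves live in an ordinary directory.
store = tempfile.mkdtemp()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (path TEXT PRIMARY KEY, customer TEXT, doc_type TEXT)"
)
# Multiple indexes into the same set of files.
conn.execute("CREATE INDEX idx_customer ON documents (customer)")
conn.execute("CREATE INDEX idx_doc_type ON documents (doc_type)")


def add_document(filename, payload, customer, doc_type):
    """Write the file to disk and record only its path and attributes."""
    path = os.path.join(store, filename)
    with open(path, "wb") as f:
        f.write(payload)
    conn.execute(
        "INSERT INTO documents (path, customer, doc_type) VALUES (?, ?, ?)",
        (path, customer, doc_type),
    )


add_document("0001.tif", b"fake image bytes 1", "acme", "invoice")
add_document("0002.tif", b"fake image bytes 2", "acme", "contract")
add_document("0003.tif", b"fake image bytes 3", "globex", "invoice")
conn.commit()

# Look up by one index, then read the actual data from the filesystem.
invoice_paths = [
    path
    for (path,) in conn.execute(
        "SELECT path FROM documents WHERE doc_type = ? ORDER BY path",
        ("invoice",),
    )
]
invoice_names = [os.path.basename(p) for p in invoice_paths]
with open(invoice_paths[0], "rb") as f:
    first_invoice = f.read()

print(invoice_names)
```

Backing up or migrating the database then only moves the small index table, not the files.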
As far as "rational extension" is concerned, I think I can relate.
As a developer of imaging systems that store multiple-millions of
scanned pieces of paper online for customers, I can promise you
the file system is quite efficient at storing files (and that is
what the OP asked for in the original post) and way better than
storing in Oracle blobs. Can you store them in the database?
Absolutely. Is it efficient and manageable? It has been our
experience that it is not. Ever tried to upgrade Oracle 9 to
Oracle 10 with a Tb of blobs?
-Larry
I was under the impression that you still have to install the sqlite
executable but that's only for compiling from source: "If you're
compiling the Python source yourself, note that the source tree
doesn't include the SQLite code, only the wrapper module."
Thorsten
I keep hearing complaints about Oracle's blob handling and I don't
doubt they're true, but that sounds like an Oracle problem. I haven't
had any problems using blobs in MySQL though I've been a fairly
lightweight user.
You don't _need_ to install the SQlite executable[s] -- maybe the
_libraries_, unless they come bundled w/your Python distro (typically
the case on Win and Mac, but some "sumo distros" for other OSs may
choose to do the same).
Alex
For small numbers of blobs it works fine. The problem comes about,
more specifically, because Oracle's method for upgrading from one
version to another is Export, create new database, Import. Exporting
of a large number of blobs is slow, requires lots of disk space, etc.
If the blobs are on the filesystem with a pointer in the database,
upgrading is MUCH easier. Granted I'm talking about millions of
pages of scanned .TIF images here. Not a few files.
-Larry
> For small numbers of blobs it works fine. The problem comes about,
> more specifically, because Oracle's method for upgrading from one
> version to another is Export, create new database, Import.
Does "Pray" come before or after the steps you mentioned?
</F>
Since no one's mentioned it:
--
> - SnakeSQL : another SQL engine, less mature I think and very slow when
> I tested it
And strange bugs when I used it.
> - buzhug : Pythonic syntax (uses list comprehensions or methods like
> create(), select() on the db object), much faster than all the above.
> I'm obviously biased : I wrote it...
Looks cool! Apparently there are still mavericks who believe in "Python
first" while all others prefer referring to "standards" or what they
personally believe those standards to be [1]
Just one stupid remark since the limits of my language are the limits
of my world: I've not the slightest association with the seemingly
nonsense word "buzhug" and don't even know how to pronounce it
correctly. Would you have the kindness to enlighten me/us ?
> Just one stupid remark since the limits of my language are the limits
> of my world: I've not the slightest association with the seemingly
> nonsense word "buzhug" and don't even know how to pronounce it
> correctly. Would you have the kindness to enlighten me/us ?
I simply assumed it was "guhzub" backwards.
Cliff
--
Summarizing:
Those who were willing to consider a database suggested:
anydbm
Gadfly
SQLite (included with Python 2.5)
Schevo
Some preferred using the file system.
The core suggestion was to choose a directory structure
along with special naming conventions to indicate relationships.
Not all who suggested this said how to store info about the files.
One suggestion was:
Store the info in a text file and read the
entire file into memory and do linear searches. Python can search
100's of items in a list faster than you can even begin an SQL query.
Alan Isaac
Can't be any harder than switching between incompatible filesystems,
unless you assume it should "just work...".
--Blair
> Can't be any harder than switching between incompatible filesystems,
> unless you assume it should "just work...".
so what file systems are you using that don't support file names and
binary data ?
</F>
Buzhug means "earthworm", the big long brown worms that you find when
you dig ; the shape is the same as a python, only smaller and less
dangerous...
You pronounce it "buzuk", with the French "u" or German "ü"
Karrigell means "cart", and strakell means any sort of engine whose name you
don't know. Both rhyme with "hell" ; a and r like in French, g like
in goat
Now you know 3 words of Breton !
Regards,
Pierre
Thanks !!!
Mmmm, no.
I'm saying that the change from Oracle 9 to Oracle 10 is like changing
from ffs to fat32.
They have different structures related to the location and
identification of every stored object. Sometimes different storage
structures (block sizes, block organization, fragmentation rules, etc.)
for the insides of a file.
A filesystem is a specialized database that stores generalized data.
The value of a database program and its data storage system is that you
can get the filesystem out of the way, and deal only in one layer of
searching and retrieval.
A DB may be only trivially more efficient when the data are a
collection of very large objects with a few externally associated
attributes that can all be found in the average filesystem's directory
structures; but a DB doing raw accesses on a bare disk is a big
improvement in speed when dealing with a huge collection of relatively
small data, each with a relatively large number of inconsistently
associated attributes.
The tradeoff is that you end up giving your DB vendor the option of
making you have to offload and reload that disk if they change their
system between versions.
--Blair
> Mmmm, no.
>
> I'm saying that the change from Oracle 9 to Oracle 10 is like changing
> from ffs to fat32.
well, I'm quite sure that the people I know who are spending a lot of
their time moving stuff from Oracle N to Oracle N+1 (and sometimes
getting stuck, due to incompatibilities between SQL and SQL and a lack
of infinite resources) would say you're completely and utterly nuts.
</F>
You haven't provided enough requirements for us
to make any intelligent suggestions. Perhaps you
might learn something from reading through my old
EuroPython presentation.
http://www.thinkware.se/cgi-bin/thinki.cgi/DatabaseProgrammingWithPython
Relational databases with SQL syntax provide a convenient
way to store data with an appropriate structure. You can
always force a tool into handling things it wasn't designed
for, but SQL databases work best when you have strict, well
defined structures, such as in accounting systems, booking
systems etc. It gives you a declarative query language,
transaction handling, typically multi user support and
varying degrees of scalability and high availability
features.
For you, it's probably overkill, and if you have files
to start with, keeping them in the file system is the
natural thing to do. That means that you can use a lot
of standard tools to access, manipulate, backup and search
through them. Perhaps what you really need is a search engine for
the file system?
Do you intend to store information concerning how these
files relate to each other? Perhaps it's better in that
case to just keep that relationship information in some
small database system, and to keep the actual files in
the file system.
Perhaps it's enough to keep an XML file with the structure,
and to use something like ElementTree to manipulate that
XML structure.
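As a small illustration of the XML idea, assuming the standard library's xml.etree.ElementTree (the same API as the ElementTree package). The element and attribute names ("catalog", "file", "related", "kind") are made up for the example.

```python
import xml.etree.ElementTree as ET

# Build a small structure describing files and how they relate.
catalog = ET.Element("catalog")
report = ET.SubElement(catalog, "file", name="report.txt", author="alan")
ET.SubElement(report, "related", name="figures.tif", kind="illustrates")
ET.SubElement(catalog, "file", name="figures.tif", author="larry")

# Serialize to plain text; this is what you would write to disk.
xml_text = ET.tostring(catalog, encoding="unicode")

# Parse it back and query it.
reloaded = ET.fromstring(xml_text)
alan_files = [
    f.get("name") for f in reloaded.findall("file") if f.get("author") == "alan"
]
related_to_report = [
    r.get("name")
    for r in reloaded.findall("file[@name='report.txt']/related")
]

print(alan_files)
print(related_to_report)
```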
You gain a lot of power, robustness and flexibility by
using some kind of plain text format. Simple files play
well with configuration management systems, backup systems,
editors, standard search tools, etc. If you use XML, it's
also easier to transform your structural information to
some presentable layout through standard techniques such
as XSL.
You missed buzhug:
http://buzhug.sourceforge.net/
A very thorough pure Python database.
Maybe they'd just be hyperbolic from the frustration. Filesystems
/are/ databases, and incompatibilities /are/ incompatibilities. And
without ANSI, the SQL problem could be like incompatibilities in C.
Not unheard-of. Not at all.
--Blair