best small database?

David Isaac

Sep 11, 2006, 9:23:08 AM

I have no experience with database applications.
This database will likely hold only a few hundred items,
including both textfiles and binary files.

I would like a pure Python solution to the extent reasonable.

Suggestions?

Thank you,
Alan Isaac

Thorsten Kampe

Sep 11, 2006, 10:00:37 AM

* David Isaac (2006-09-11 14:23 +0100)

> I have no experience with database applications.
> This database will likely hold only a few hundred items,
> including both textfiles and binary files.
>
> I would like a pure Python solution to the extent reasonable.

Gadfly?

Larry Bates

Sep 11, 2006, 10:13:27 AM

If they are files, just use the filesystem; you don't need a
database. If you need multiple indexes into these files, then
use a database, but only for the indexes that point to the
files on the filesystem. The filesystem is almost always the
most efficient place to store files, not as blobs in a
database.
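
A minimal sketch of the "database only for the indexes" idea, not anything Larry posted. It assumes the sqlite3 module (standard from Python 2.5 onward) purely for convenience; the table name, column names and file paths are all hypothetical. The files themselves stay on the filesystem and the database holds only their paths plus the attributes you want to look them up by.

```python
import sqlite3

# Hypothetical schema: one row per file, with the path stored as text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE files (path TEXT PRIMARY KEY, kind TEXT, author TEXT)"
)
# Two secondary indexes, i.e. "multiple indexes into these files".
conn.execute("CREATE INDEX files_by_author ON files (author)")
conn.execute("CREATE INDEX files_by_kind ON files (kind)")

rows = [
    ("docs/report.txt", "text", "alan"),
    ("img/scan001.tif", "binary", "larry"),
    ("img/scan002.tif", "binary", "alan"),
]
conn.executemany("INSERT INTO files VALUES (?, ?, ?)", rows)
conn.commit()


def paths_for(column, value):
    """Return the paths of files whose `column` equals `value`."""
    if column not in ("kind", "author"):
        raise ValueError("unknown index column: %r" % column)
    query = "SELECT path FROM files WHERE %s = ? ORDER BY path" % column
    return [row[0] for row in conn.execute(query, (value,))]


alan_files = paths_for("author", "alan")
binary_files = paths_for("kind", "binary")
```

Each returned path can then be opened with the ordinary open() call, so backups and other file tools keep working on the data itself.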

The answer about which database depends on your target
platform but you could consider gadfly.

-Larry Bates

Paul Watson

Sep 11, 2006, 10:19:42 AM

If you want really simple, look at the anydbm module. If nothing better
is available, anydbm will use dumbdbm. All of these are in the Python
build, so you do not need to fetch/read/install anything additional.

Doing the DB-API would be much stronger, but might be overkill in your
situation.
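
A minimal sketch of the dbm approach, written for current Python, where anydbm has become the dbm package and dumbdbm has become dbm.dumb. The database name and keys are hypothetical. It uses the pure-Python dbm.dumb backend directly so the behaviour does not depend on which C backends happen to be installed.

```python
import dbm.dumb
import os
import tempfile

# dbm.dumb is the Python 3 name for dumbdbm, the pure-Python fallback.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "items")  # hypothetical database name

with dbm.dumb.open(path, "c") as db:
    db["readme.txt"] = "plain text is encoded to bytes on write"
    db["logo.png"] = b"\x89PNG\r\n"  # binary values are stored unchanged

with dbm.dumb.open(path, "r") as db:
    text_value = db["readme.txt"]   # comes back as bytes
    binary_value = db["logo.png"]
    keys = sorted(db.keys())        # keys also come back as bytes
```

A dbm file behaves like a persistent dictionary of byte strings: one key, one value, no queries and no secondary indexes.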

Laurent Pointal

Sep 11, 2006, 11:17:19 AM

David Isaac wrote:

You may want to take a look at buzhug (a very Pythonic way to manipulate
data in the database).

http://buzhug.sourceforge.net/

>
> Thank you,
> Alan Isaac
>
>

Aahz

Sep 11, 2006, 11:34:38 AM

In article <4505707E...@redlinepy.com>,

Paul Watson <pwa...@redlinepy.com> wrote:
>David Isaac wrote:
>>
>> I have no experience with database applications.
>> This database will likely hold only a few hundred items,
>> including both textfiles and binary files.
>>
>> I would like a pure Python solution to the extent reasonable.
>>
>> Suggestions?
>

>If you want really simple, look at the anydbm module. If nothing better
>is available, anydbm will use dumbdbm. All of these are in the Python
>build, so you do not need to fetch/read/install anything additional.
>
>Doing the DB-API would be much stronger, but might be overkill in your
>situation.

Once Python 2.5 comes out, I recommend using sqlite because it avoids
the mess that dbm can cause.
--
Aahz (aa...@pythoncraft.com) <*> http://www.pythoncraft.com/

"LL YR VWL R BLNG T S" -- www.nancybuttons.com

Paul McGuire

Sep 11, 2006, 12:13:58 PM

"Aahz" <aa...@pythoncraft.com> wrote in message
news:ee3vme$cjj$1...@panix3.panix.com...
> In article <4505707E...@redlinepy.com>,

> Once Python 2.5 comes out, I recommend using sqlite because it avoids
> the mess that dbm can cause.
> --
> Aahz (aa...@pythoncraft.com) <*>
> http://www.pythoncraft.com/

and if you don't want to wait for 2.5, you can install pysqlite without too
much trouble - and it is *very* easy to use!

For SQLite design and data browsing, check out the SQLite Browser at
http://sqlitebrowser.sourceforge.net.

-- Paul

John Salerno

Sep 11, 2006, 1:28:25 PM

Paul McGuire wrote:
> "Aahz" <aa...@pythoncraft.com> wrote in message
> news:ee3vme$cjj$1...@panix3.panix.com...
>> In article <4505707E...@redlinepy.com>,
>> Once Python 2.5 comes out, I recommend using sqlite because it avoids
>> the mess that dbm can cause.
>> --
>> Aahz (aa...@pythoncraft.com) <*>
>> http://www.pythoncraft.com/
>
> and if you don't want to wait for 2.5, you can install pysqlite without too
> much trouble - and it is *very* easy to use!

Yeah, just be sure to do this:

from pysqlite2 import dbapi2 as sqlite3

then you're ready for 2.5! :)
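
A minimal sketch of that import trick as a fallback, assuming the separately installed pysqlite 2.x package, whose module name is pysqlite2. The table and column names are hypothetical. The rest of the code only ever refers to the name sqlite3, so it runs unchanged whichever import succeeded.

```python
try:
    import sqlite3  # standard library from Python 2.5 onward
except ImportError:
    from pysqlite2 import dbapi2 as sqlite3  # separate pysqlite install

# Everything below uses only the DB-API, under the name sqlite3.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT, data BLOB)")
conn.execute("INSERT INTO items VALUES (?, ?)", ("logo.png", b"\x89PNG"))
stored = conn.execute("SELECT name, data FROM items").fetchall()
conn.close()
```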

Thorsten Kampe

Sep 11, 2006, 2:00:28 PM

* Aahz (2006-09-11 16:34 +0100)

> In article <4505707E...@redlinepy.com>,
> Paul Watson <pwa...@redlinepy.com> wrote:
>>David Isaac wrote:
>>>
>>> I have no experience with database applications.
>>> This database will likely hold only a few hundred items,
>>> including both textfiles and binary files.
>>>
>>> I would like a pure Python solution to the extent reasonable.
>>>
>>> Suggestions?
>>
>>If you want really simple, look at the anydbm module. If nothing better
>>is available, anydbm will use dumbdbm. All of these are in the Python
>>build, so you do not need to fetch/read/install anything additional.
>>
>>Doing the DB-API would be much stronger, but might be overkill in your
>>situation.
>
> Once Python 2.5 comes out, I recommend using sqlite because it avoids
> the mess that dbm can cause.

But sqlite is not "pure Python" because it's just a wrapper around
sqlite (which has to be installed separately)...

Thorsten

John Salerno

Sep 11, 2006, 2:58:47 PM

Thorsten Kampe wrote:

> But sqlite is not "pure Python" because it's just a wrapper around
> sqlite (which has to be installed separately)...

But that's the point. Once 2.5 is released, sqlite is built-in. Unless
there's more to it that I don't know, and something must still be
installed? But that makes no sense.

Pierre Quentel

Sep 11, 2006, 3:38:48 PM

Here are some pure-Python databases :
- gadfly : an SQL engine, mature and well tested, works in memory so
not fit for large data sets
- SnakeSQL : another SQL engine, less mature I think and very slow when
I tested it
- KirbyBase : stores data in a single file ; uses a more Pythonic
syntax (no SQL) ; no size limit but performance decreases very much
with the size. It looked promising but the last version is more than 1
year old and the author seems to focus on the Ruby version now
- buzhug : Pythonic syntax (uses list comprehensions or methods like
create(), select() on the db object), much faster than all the above.
I'm obviously biased : I wrote it...
- for a small set of data you could also try strakell, the recipe I
published on the Python Cookbook :
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/496770
With less than 200 lines of code, it's a very fast in-memory db
engine that also uses list comprehensions for requests :

SQL : SELECT name FROM persons WHERE age > 20
strakell : [ r["name"] for r in persons if r["age"] > 20 ]

You can also create an index : persons.create_index("age")
and then use it like this : persons.age[20] = list of the records where
age = 20
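
To illustrate the idea in plain Python (this is not strakell's own code or API, just a sketch with hypothetical data): the records are a list of dicts, the query is a list comprehension, and an index is a dict mapping each value of a field to the records that have it.

```python
from collections import defaultdict

persons = [
    {"name": "Anne", "age": 20},
    {"name": "Bob", "age": 35},
    {"name": "Chloe", "age": 20},
    {"name": "Dan", "age": 52},
]

# SQL: SELECT name FROM persons WHERE age > 20
older = [r["name"] for r in persons if r["age"] > 20]


def build_index(records, field):
    """Map each distinct value of `field` to the records that carry it."""
    index = defaultdict(list)
    for record in records:
        index[record[field]].append(record)
    return index


by_age = build_index(persons, "age")
aged_20 = [r["name"] for r in by_age[20]]  # lookup without scanning the list
```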

Other pure-Python databases : ZODB (probably overkill for a small
database) and Durus (I didn't test it)

As said in other answers, the inclusion of SQLite in the standard
distribution might make pure-Python solutions less attractive

Regards,
Pierre

Paul Rubin

Sep 11, 2006, 4:51:32 PM

"David Isaac" <ais...@verizon.net> writes:
> I have no experience with database applications.
> This database will likely hold only a few hundred items,
> including both textfiles and binary files.
>
> I would like a pure Python solution to the extent reasonable.

I usually use anydbm when I want something quick and simple.

Blair P. Houghton

Sep 11, 2006, 5:14:56 PM

Larry Bates wrote:
> The filesystem is almost always the
> most efficient place to store files, not as blobs in a
> database.

I could get all theoretical about why that's not so in most cases,
but there are plenty of cases where it is so (especially when the
person doing the DB doesn't get the idea behind all filesystems,
which is that they are themselves simplified databases), so
I won't*.

In this case, the filesystem may be the best place to
do the work, because it's the cheapest to implement
and maintain.

--Blair

* - okay, I will

1. Since the filesystem is a database, making accesses
to it after being directed there by a database means you're
using two database systems (and an intervening operating
system) to do one thing. Serious databases work from
disks with no filesystem to get rid of that extra layer entirely.
But there are benefits to having things in files reachable by
ordinary tools, and to having the OS mediating access to
the data, but you need to be sure you need those benefits
and can afford the overhead. Academic in most cases,
including the one that started this thread.

2. When using the filesystem as the database
you only get one kind of native association, and have to
use semantics in the directory and filenames to give you
hints as to the type stored at a particular location. You get a
few pieces of accounting data (mod times, etc.) in the
directory listing, but can't associate anything else with
the file directly, at least not unless you create another
file that has the associated data in it, or stuff the extra
data in the file itself, but then that makes each file
a database...see where it goes? Sometimes it's better
to come up with a schema you can extend rationally to
fit the problem you are trying to solve.

--Blair

Aahz

Sep 11, 2006, 5:37:34 PM

In article <HliNg.2734$No6....@news.tufts.edu>,
John Salerno <john...@NOSPAMgmail.com> wrote:
>Thorsten Kampe wrote:
>.

2.5 will include the sqlite library itself on Windows (and Macs? I
forget) but you need to install the library separately on Linux
boxes, which is generally about as complicated as apt-get install
sqlite-dev.

Larry Bates

Sep 11, 2006, 7:27:41 PM

Not quite sure why my response "bothered" you so much but it
appears it did. I'll admit that I was doing my best to read
the OP's mind in my answer.

Item 1 - The OP specifically said he wanted to store 100's
of files. You rarely need a database to store 100's of anything
and the overhead of installing and maintaining one isn't typically
worth the effort. Store the info in a text file and read the
entire file into memory and do linear searches. Python can search
100's of items in a list faster than you can even begin an SQL
query.
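
A minimal sketch of the text-file approach, assuming a tab-separated catalog with one line per stored file. The file name, column names and sample rows are hypothetical. The whole catalog is read into a list of dicts and every lookup is a linear scan.

```python
import csv
import os
import tempfile

# Hypothetical catalog: one tab-separated line of metadata per stored file.
catalog_path = os.path.join(tempfile.mkdtemp(), "catalog.tsv")
with open(catalog_path, "w", newline="") as handle:
    writer = csv.writer(handle, delimiter="\t")
    writer.writerow(["path", "kind", "title"])
    writer.writerow(["docs/report.txt", "text", "Quarterly report"])
    writer.writerow(["img/scan001.tif", "binary", "Scanned invoice"])
    writer.writerow(["img/scan002.tif", "binary", "Scanned contract"])

# Read the whole catalog into memory as a list of dicts.
with open(catalog_path, newline="") as handle:
    records = list(csv.DictReader(handle, delimiter="\t"))


def search(records, **criteria):
    """Linear scan: keep records whose fields equal every given criterion."""
    return [
        record for record in records
        if all(record.get(field) == value for field, value in criteria.items())
    ]


binary_paths = [r["path"] for r in search(records, kind="binary")]
invoice = search(records, title="Scanned invoice")
```

With a few hundred records the scan touches every row on each query, which is the trade-off being accepted here in exchange for having no database to install.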

Item 2 - You will note that I said "If you need multiple indexes
into these files, then use a database, but only for the indexes
that point to the files on the filesystem". You sometimes need
multiple indexes (which databases are GREAT at providing).

As far as "rational extension" is concerned, I think I can relate.
As a developer of imaging systems that store multiple-millions of
scanned pieces of paper online for customers, I can promise you
the file system is quite efficient at storing files (and that is
what the OP asked for in the original post) and way better than
storing in Oracle blobs. Can you store them in the database?
Absolutely. Is it efficient and manageable? It has been our
experience that it is not. Ever tried to upgrade Oracle 9 to
Oracle 10 with a Tb of blobs?

-Larry

Thorsten Kampe

Sep 11, 2006, 7:35:18 PM

* John Salerno (2006-09-11 19:58 +0100)

I was under the impression that you still have to install the sqlite
executable but that's only for compiling from source: "If you're
compiling the Python source yourself, note that the source tree
doesn't include the SQLite code, only the wrapper module."

Thorsten

Paul Rubin

Sep 11, 2006, 7:43:11 PM

Larry Bates <larry...@websafe.com> writes:
> As far as "rational extension" is concerned, I think I can relate.
> As a developer of imaging systems that store multiple-millions of
> scanned pieces of paper online for customers, I can promise you
> the file system is quite efficient at storing files (and that is
> what the OP asked for in the original post) and way better than
> storing in Oracle blobs. Can you store them in the database,
> absolutely. Is it efficient and manageable. It has been our
> experience that it is not. Ever tried to upgrade Oracle 9 to
> Oracle 10 with a Tb of blobs?

I keep hearing complaints about Oracle's blob handling and I don't
doubt they're true, but that sounds like an Oracle problem. I haven't
had any problems using blobs in MySQL though I've been a fairly
lightweight user.

Alex Martelli

Sep 12, 2006, 12:58:43 AM

Thorsten Kampe <thor...@thorstenkampe.de> wrote:

You don't _need_ to install the SQlite executable[s] -- maybe the
_libraries_, unless they come bundled w/your Python distro (typically
the case on Win and Mac, but some "sumo distros" for other OSs may
choose to do the same).

Alex

Larry Bates

Sep 12, 2006, 12:22:31 PM

For small numbers of blobs it works fine. The problem comes about,
more specifically, because Oracle's method for upgrading from one
version to another is Export, create new database, Import. Exporting
of a large number of blobs is slow, requires lots of disk space, etc.
If the blobs are on the filesystem with a pointer in the database,
upgrading is MUCH easier. Granted I'm talking about millions of
pages of scanned .TIF images here. Not a few files.

-Larry

Fredrik Lundh

Sep 12, 2006, 1:22:04 PM

to pytho...@python.org

Larry Bates wrote:

> For small numbers of blobs it works fine. The problem comes about,
> more specifically, because Oracle's method for upgrading from one
> version to another is Export, create new database, Import.

Does "Pray" come before or after the steps you mentioned?

</F>

Cliff Wells

Sep 12, 2006, 2:50:30 PM

to David Isaac, pytho...@python.org

On Mon, 2006-09-11 at 13:23 +0000, David Isaac wrote:
> I have no experience with database applications.
> This database will likely hold only a few hundred items,
> including both textfiles and binary files.
>
> I would like a pure Python solution to the extent reasonable.

Since no one's mentioned it:

http://schevo.org/trac/wiki

--

Kay Schluehr

Sep 12, 2006, 3:29:15 PM

Pierre Quentel wrote:

> - SnakeSQL : another SQL engine, less mature I think and very slow when
> I tested it

And strange bugs when I used it.

> - buzhug : Pythonic syntax (uses list comprehensions or methods like
> create(), select() on the db object), much faster than all the above.
> I'm obviously biaised : I wrote it...

Looks cool! Apparently there are still mavericks who believe in "Python
first" while all others prefer referring to "standards" or what they
personally believe those standards to be [1]

Just one stupid remark since the limits of my language are the limits
of my world: I've not the slightest association with the seemingly
nonsense word "buzhug" and don't even know how to pronounce it
correctly. Would you have the kindness to enlighten me/us ?

[1]
http://groups.google.com/group/comp.lang.python/browse_frm/thread/8ba5fb96d3117e46/6410ac0dbbf23931?hl=en#6410ac0dbbf23931

Cliff Wells

Sep 12, 2006, 3:55:34 PM

to pytho...@python.org

On Tue, 2006-09-12 at 12:29 -0700, Kay Schluehr wrote:

> Just one stupid remark since the limits of my language are the limits
> of my world: I've not the slightest association with the seemingly
> nonsense word "buzhug" and don't even know how to pronounce it
> correctly. Would you have the kindness to enlighten me/us ?

I simply assumed it was "guhzub" backwards.

Cliff

--

David Isaac

Sep 12, 2006, 10:11:41 PM

Thanks to all for the suggestions and much else
to think about.

Summarizing:

Those who were willing to consider a database suggested:
anydbm
Gadfly
SQLite (included with Python 2.5)
Schevo

Some preferred using the file system.
The core suggestion was to choose a directory structure
along with special naming conventions to indicate relationships.
Not all who suggested this said how to store info about the files.
One suggestion was:

Store the info in a text file and read the
entire file into memory and do linear searches. Python can search
100's of items in a list faster than you can even begin an SQL query.

Alan Isaac

Blair P. Houghton

Sep 13, 2006, 1:33:15 AM

Larry Bates wrote:
> As far as "rational extension" is concerned, I think I can relate.
> As a developer of imaging systems that store multiple-millions of
> scanned pieces of paper online for customers, I can promise you
> the file system is quite efficient at storing files (and that is
> what the OP asked for in the original post) and way better than
> storing in Oracle blobs. Can you store them in the database,
> absolutely. Is it efficient and manageable. It has been our
> experience that it is not. Ever tried to upgrade Oracle 9 to
> Oracle 10 with a Tb of blobs?

Can't be any harder than switching between incompatible filesystems,
unless you assume it should "just work...".

--Blair

Fredrik Lundh

Sep 13, 2006, 1:40:47 AM

to pytho...@python.org

Blair P. Houghton wrote:

> Can't be any harder than switching between incompatible filesystems,
> unless you assume it should "just work...".

so what file systems are you using that don't support file names and
binary data ?

</F>

Pierre Quentel

Sep 13, 2006, 6:24:08 AM

Buzhug (like Karrigell and Strakell) is a Breton word ; Breton is the
language spoken in Brittany, the westernmost part of France. Less and
less spoken, actually, but I do, like all my ancestors. It is a close
cousin of Welsh, and has common roots with Irish and Gaelic

Buzhug means "earthworm", the big long brown worms that you find when
you dig ; the shape is the same as a python, only smaller and less
dangerous...

You pronounce it "buzuk", with the French "u" or German "ü"

Karrigell means "cart" and strakell means any sort of engine whose
name you don't know. Both rhyme with "hell" ; a and r as in French, g as
in goat

Now you know 3 words of Breton !

Regards,
Pierre

Kay Schluehr

Sep 13, 2006, 10:12:30 AM

Thanks !!!

Blair P. Houghton

Sep 13, 2006, 9:49:52 PM

Mmmm, no.

I'm saying that the change from Oracle 9 to Oracle 10 is like changing
from ffs to fat32.

They have different structures related to the location and
identification of every stored object. Sometimes different storage
structures (block sizes, block organization, fragmentation rules, etc.)
for the insides of a file.

A filesystem is a specialized database that stores generalized data.

The value of a database program and its data storage system is that you
can get the filesystem out of the way, and deal only in one layer of
searching and retrieval.

A DB may be only trivially more efficient when the data are a
collection of very large objects with a few externally associated
attributes that can all be found in the average filesystem's directory
structures; but a DB doing raw accesses on a bare disk is a big
improvement in speed when dealing with a huge collection of relatively
small data, each with a relatively large number of inconsistently
associated attributes.

The tradeoff is that you end up giving your DB vendor the option of
making you have to offload and reload that disk if they change their
system between versions.

--Blair

Fredrik Lundh

Sep 14, 2006, 3:15:01 AM

to pytho...@python.org

Blair P. Houghton wrote:

> Mmmm, no.
>
> I'm saying that the change from Oracle 9 to Oracle 10 is like changing
> from ffs to fat32.

well, I'm quite sure that the people I know who are spending a lot of
their time moving stuff from Oracle N to Oracle N+1 (and sometimes
getting stuck, due to incompatibilities between SQL and SQL and a lack
of infinite resources) would say you're completely and utterly nuts.

</F>

Magnus Lycka

Sep 14, 2006, 7:47:06 AM

David Isaac wrote:
> I have no experience with database applications.
> This database will likely hold only a few hundred items,
> including both textfiles and binary files.
>
> I would like a pure Python solution to the extent reasonable.
>

> Suggestions?

You haven't provided enough requirements for us
to make any intelligent suggestions. Perhaps you
might learn something from reading through my old
EuroPython presentation.

http://www.thinkware.se/cgi-bin/thinki.cgi/DatabaseProgrammingWithPython

Relational databases with SQL syntax provide a convenient
way to store data with an appropriate structure. You can
always force a tool into handling things it wasn't designed
for, but SQL databases work best when you have strict, well
defined structures, such as in accounting systems, booking
systems etc. It gives you a declarative query language,
transaction handling, typically multi user support and
varying degrees of scalability and high availability
features.

For you, it's probably overkill, and if you have files
to start with, keeping them in the file system is the
natural thing to do. That means that you can use a lot
of standard tools to access, manipulate, backup and search
through them. Perhaps you rather need a search engine for
the file system?

Do you intend to store information concerning how these
files relate to each other? Perhaps it's better in that
case to just keep that relationship information in some
small database system, and to keep the actual files in
the file system.

Perhaps it's enough to keep an XML file with the structure,
and to use something like ElementTree to manipulate that
XML structure.
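
A minimal sketch of that idea with the standard library's xml.etree.ElementTree. The element names, attribute names and paths are hypothetical. The XML holds only the structure (which files exist and how they relate), while the files themselves stay in the file system.

```python
import xml.etree.ElementTree as ET

# Build the structure: each <file> names a path on disk and may list
# <related> files. The layout is an assumption for illustration.
root = ET.Element("collection")
report = ET.SubElement(root, "file", path="docs/report.txt", kind="text")
ET.SubElement(root, "file", path="img/scan001.tif", kind="binary")
ET.SubElement(report, "related", path="img/scan001.tif")

# Round-trip through text, as if the structure had been saved and reloaded.
xml_text = ET.tostring(root, encoding="unicode")
parsed = ET.fromstring(xml_text)

# Query by attribute with ElementTree's limited XPath support.
binary_paths = [f.get("path") for f in parsed.findall("file[@kind='binary']")]

# Follow the relationship information for one file.
report_node = next(
    f for f in parsed.findall("file") if f.get("path") == "docs/report.txt"
)
related_paths = [r.get("path") for r in report_node.findall("related")]
```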

You gain a lot of power, robustness and flexibility by
using some kind of plain text format. Simple files play
well with configuration management systems, backup systems,
editors, standard search tools, etc. If you use XML, it's
also easier to transform your structural information to
some presentable layout through standard techniques such
as XSL.

metaperl

Sep 14, 2006, 5:14:02 PM

David Isaac wrote:
> Thanks to all for the suggestions and much else
> to think about.
>
> Summarizing:
>
> Those who were willing to consider a database suggested:
> anydbm
> Gadfly
> SQLite (included with Python 2.5)
> Schevo

You missed buzhug:
http://buzhug.sourceforge.net/

A very thorough pure Python database.

Blair P. Houghton

Sep 14, 2006, 8:33:30 PM

Fredrik Lundh wrote:
> Blair P. Houghton wrote:

Maybe they'd just be hyperbolic from the frustration. Filesystems
/are/ databases, and incompatibilities /are/ incompatibilities. And
without ANSI, the SQL problem could be like incompatibilities in C.
Not unheard-of. Not at all.

--Blair