Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

[Q] Databases -- Gadfly vs Metakit vs SQLite

1 view
Skip to first unread message

ach...@easystreet.com

unread,
Mar 15, 2003, 2:31:47 PM3/15/03
to
I'm looking for a simple database for a single-user Windows desktop
application. The user's data will be text, maybe 10-20 data tables,
with identifying keys and a few short fields plus plenty of long
text memo-type fields. I should like to plan for databases up to
about 50k rows and 100 MB of data, but the typical will be far less
than this.

It looks like Python gives me 3 good choices for this, Gadfly,
SQLite, Metakit. I'd like comments that compare and contrast these
for such an application.

This is a desktop productivity app, so I'd like to get snappy response,
no 1 or 2 second delays for a single update or retrieval. This should
work to such a standard on a user's 300 MHz, 64 MB Windows machine.

Does Gadfly need to fetch the entire database into RAM? This would be
fine for me 99% of the time, but I don't want to worry about the user
with the big database and the small machine.

Other criteria:

1. Very high reliability -- Which of these can give us zero database
failures from 1000's of users?

2. Simple -- We want to hide all details of the database from the
end-users, and easy to program wouldn't hurt either. It looks like
all three will meet this criteria. Any traps?

3. Storage efficiency -- We don't want 10 MB of user data to take
100 MB of disk space. I know that disk is cheap, but we want the
data to be easy to back up, send, and transport.

The hierarchical non-SQL aspects of Metakit are probably OK.

Can anyone with experience with more than one of these advise?

Thanks much in advance.


Al

Tim Churches

unread,
Mar 15, 2003, 6:29:39 PM3/15/03
to
On Sun, 2003-03-16 at 06:31, ach...@easystreet.com wrote:
> I'm looking for a simple database for a single-user Windows desktop
> application. The user's data will be text, maybe 10-20 data tables,
> with identifying keys and a few short fields plus plenty of long
> text memo-type fields. I should like to plan for databases up to
> about 50k rows and 100 MB of data, but the typical will be far less
> than this.
>
> It looks like Python gives me 3 good choices for this, Gadfly,
> SQLite, Metakit. I'd like comments that compare and contrast these
> for such an application.

Don't forget Sleepycat's Berkeley DB (BSD DB)- see
http://www.sleepycat.com/ - which has a good reputation for robustness
and efficiency. It also has the advantage of being part of the standard
Python distribution on most platforms. The big problem is that the
ancient version of BSD DB (1.8) included in versions of Python up to
2.2 has been terribly broken on many platforms. This situation has
thankfully been resolved in Python 2.3, which includes a recent release
of BSD DB and the latest Python interface code for it
(http://pybsddb.sourceforge.net).

>
> This is a desktop productivity app, so I'd like to get snappy response,
> no 1 or 2 second delays for a single update or retrieval. This should
> work to such a standard on a user's 300 MHz, 64 MB Windows machine.

That might rule out Gadfly - it depends whether appropriate indexes will
be available to optimise all your queries.

>
> Does Gadfly need to fetch the entire database into RAM? This would be
> fine for me 99% of the time, but I don't want to worry about the user
> with the big database and the small machine.

Yes, it does, AFAIK.

> 1. Very high reliability -- Which of these can give us zero database
> failures from 1000's of users?

BSD DB is probably the most widely used, but all of the systems you have
identified are of production quality. But that doesn't mean you
shouldn't test, test, test in the context of your app.

> 3. Storage efficiency -- We don't want 10 MB of user data to take
> 100 MB of disk space. I know that disk is cheap, but we want the
> data to be easy to back up, send, and transport.

I think they all support variable length text fields.

> Can anyone with experience with more than one of these advise?

Metakit is great but performance drops dramatically when the number of
records rises above about 250,000 - this is clearly acknowledged in the
Metakit documentation, and it is true. BSD DB performance when using the
hash table format also falls off dramatically with more than a few
hundred thousand records (but there are lots of tuning parameters which
might improve this). I've only just started to evaluate SQLite, but it
looks like its performance holds up well with large numbers of records.
Its main drawback is that everything is stored as text - but that
doesn't sound like a problem for your application (nor mine). The SQLite
data structure, in which records are stored on the leaves of a b-tree in
which integer record IDs are the keys, is interesting. B-trees are also
used for indexes.

Please summarise the responses you receive - embedded persistence for
Python is a topic of ongoing interest for many people.

--

Tim C

PGP/GnuPG Key 1024D/EAF993D0 available from keyservers everywhere
or at http://members.optushome.com.au/tchur/pubkey.asc
Key fingerprint = 8C22 BF76 33BA B3B5 1D5B EB37 7891 46A9 EAF9 93D0

signature.asc

yaipa h.

unread,
Mar 15, 2003, 7:12:32 PM3/15/03
to
Al,

If you are not worried about cross platform, then why not
use the Jet (access) DB. I'm not sure if it comes with
Office or the OS these days, but it seems to be on just
about every uSoft machine I've looked at lately. It is
also dead simple to use.

For info on Access, search c.l.p for
"Access, DB, [Martelli | Holden | Hammond]"

In some weird coincidence Alex Martelli, Steve Holden and
Mark Hammond each have at least one very good book out on Python.


Cheers,

-Alan


ach...@easystreet.com wrote in message news:<3E737FA3...@easystreet.com>...

ach...@easystreet.com

unread,
Mar 15, 2003, 8:01:56 PM3/15/03
to
"yaipa h." wrote:
>
> If you are not worried about cross platform, then why not
> use the Jet (access) DB.

Yes, that's part of Windows, so it's almost safe to assume that it's
there, and you can get to it through com. However, I don't want to
worry one bit about what version is there or if MS will put in changes
in subsequent releases that will break my application. And I'd rather
embed the database so that I don't have to worry about the user changing
it with an MS desktop app to violate my rules of relational integrity
and crash my app.


Al

yaipa h.

unread,
Mar 16, 2003, 12:57:28 AM3/16/03
to
Al,

All fair statements and valid concerns. Just ask all the VB6
programmers looking down the barrel of VB.net.

-Al2


ach...@easystreet.com wrote in message news:<3E73CD04...@easystreet.com>...

Alex Martelli

unread,
Mar 16, 2003, 2:48:11 AM3/16/03
to
ach...@easystreet.com wrote:

Thus speaks the voice of experience!!!

Let me commend and second this attitude: unless you have specific
user requirements to the contrary, AVOID relying on closed-source
components if you can. By using open-source components, and bundling
them with your application if feasible, you may be able to make your
applications more long-lived and thus more valuable, without being
at the mercy of the closed-source supplier.


Alex

Alex Martelli

unread,
Mar 16, 2003, 3:04:32 AM3/16/03
to
ach...@easystreet.com wrote:

> I'm looking for a simple database for a single-user Windows desktop
> application. The user's data will be text, maybe 10-20 data tables,
> with identifying keys and a few short fields plus plenty of long
> text memo-type fields. I should like to plan for databases up to
> about 50k rows and 100 MB of data, but the typical will be far less
> than this.
>
> It looks like Python gives me 3 good choices for this, Gadfly,
> SQLite, Metakit. I'd like comments that compare and contrast these
> for such an application.

I see you've already received good answers, and I'd just like to
stress one factor: by far the best approach is to *benchmark* the
components under consideration for some simple simulation of your
intended workloads -- "typical", "maximum reasonable", "unreasonably
heavy but we don't want to break outright".

Benchmarking might take an unreasonable amount of effort in some
cases, but with Python that's not really true -- in fact benchmarking
will encourage you to write a simple "database-independence layer"
that exists in several implementations with the same interface,
and have all of your application-level code use that layer, which
is a *VERY* good thing. Strewing database-engine-dependent code
all over the place is something that happens VERY often (because
it looks "convenient"...) unless you take specific steps to help
yourself avoid it -- and developing to a "db independence layer"
helps a LOT.


> This is a desktop productivity app, so I'd like to get snappy response,
> no 1 or 2 second delays for a single update or retrieval. This should
> work to such a standard on a user's 300 MHz, 64 MB Windows machine.

Then this is the machine you should be running benchmarks on, and
you already know the targets for what you consider acceptable
performance -- this puts you WAY ahead of most projects at this
stage, where it's never very clear at the outset (in my experience)
whether "one second" counts as "snappy" or whether we need to deal
with typical machines sold today (over a GHz CPU, 128 MB or more
often 256 MB of DDR or rambus, latest and resource-hungry versions
of Win/XP) or legacy machines with a resale value under $100 such
as the one you describe.


> Does Gadfly need to fetch the entire database into RAM? This would be
> fine for me 99% of the time, but I don't want to worry about the user
> with the big database and the small machine.

Yes, this constraint is present (at least per-table -- not sure if
per-database also still applies, but if "not worrying" is the criterion
then that may not matter).


> Other criteria:
>
> 1. Very high reliability -- Which of these can give us zero database
> failures from 1000's of users?

I _think_ (but have no hard data) this holds for them all.


> 2. Simple -- We want to hide all details of the database from the
> end-users, and easy to program wouldn't hurt either. It looks like
> all three will meet this criteria. Any traps?

Ditto.


> 3. Storage efficiency -- We don't want 10 MB of user data to take
> 100 MB of disk space. I know that disk is cheap, but we want the
> data to be easy to back up, send, and transport.

I wouldn't worry about that when a simple zip or gzip will no
doubt help you solve it smoothly anyway.


> The hierarchical non-SQL aspects of Metakit are probably OK.

Then, as I see somebody else already said, BSD-DB (latest version)
should perhaps also be in your set of candidate DB components.
But take care: relational flexibility may save your bacon at
some point down the road...


> Can anyone with experience with more than one of these advise?

Guess I shouldn't have answered, since I haven't in fact deployed
end-user applications with ANY of these embedded-DB-engines. But
I think the "benchmark with a DB-independence layer" applies to
them just as well as it does to "heavy" engines such as PostgreSQL
or MS SQL Server...


Alex

David LeBlanc

unread,
Mar 16, 2003, 3:11:28 AM3/16/03
to

Aside from it's dubious parentage, Jet has a reputation for being
excruciatingly slow. I believe, but am not positive, that Outlook uses Jet
and there are times when Outlook is incredibly slow. With a lot of saved
mail message, and after a normal shutdown, Outlook can take several minutes
to startup. It can take a noticable time to switch from viewing one email to
another in the same folder!

Sqlite is my db of choice for it's rubustness, performance, licensing
(totally unencumbered!) and the fact that it comes from a developer with a
sterling reputation across no less than 3 language groups.

Dave LeBlanc
Seattle, WA USA


Chris Tavares

unread,
Mar 17, 2003, 2:30:18 AM3/17/03
to

"David LeBlanc" <whi...@oz.net> wrote in message
news:mailman.104780234...@python.org...

>
> Aside from it's dubious parentage, Jet has a reputation for being
> excruciatingly slow. I believe, but am not positive, that Outlook uses Jet
> and there are times when Outlook is incredibly slow. With a lot of saved
> mail message, and after a normal shutdown, Outlook can take several
minutes
> to startup. It can take a noticable time to switch from viewing one email
to
> another in the same folder!
>

Outlook does not use Jet for its storage. It'd actually be an improvement if
it did, I think. :-)

> Sqlite is my db of choice for it's rubustness, performance, licensing
> (totally unencumbered!) and the fact that it comes from a developer with a
> sterling reputation across no less than 3 language groups.
>

I also like Sqlite; I haven't had a need for it YET, but I really
appreciate the effort that the author has put into development AND
packaging. Sqlite is powerful, and incredibly easy to link into your app.
It's definately good stuff.

-Chris


Ganesan R

unread,
Mar 17, 2003, 3:25:41 AM3/17/03
to
>>>>> "Chris" == Chris Tavares <c...@tavaresstudios.com> writes:

> I also like Sqlite; I haven't had a need for it YET, but I really
> appreciate the effort that the author has put into development AND
> packaging. Sqlite is powerful, and incredibly easy to link into your app.
> It's definately good stuff.

I like it a lot too. PySQLite actually makes SQLite even more easy to
use. For e.g. while SQLite itself is untyped (all strings), PySQLite can
automatically convert the data to the real python type. You can also write
SQLite extension functions in Python. Finally, PySQLite sticks to the DB-API
standard very closely. I only had to make only a couple of changes to my
DB-API application to get it working with PySQLite.

I don't know much about Gadly. It doesn't seem to be very actively
maintained anymore. The fact that it maintains live databases in memory
makes it unsuitable for my use.

I don't know about much MetaKit either :-). I came across MetaKit after
PySQLite and I've been asking myself the same question that you asked in
this group :-). MetaKit API is quite interesting. I've not had any
programming experience with it.

Personally, I'd probably go with PySQLite if there is a possibility that the
application might grow beyond it's original "embedded" database. If that is
not a concern - I'd give a serious look at MetaKit; probably write the
application to both APIs as some one else suggested and take a final
decision after some benchmarking.

And let us know what you picked :-).

Ganesan

--
Ganesan R

Alex Martelli

unread,
Mar 17, 2003, 6:13:52 AM3/17/03
to
Ganesan R wrote:
...

> I don't know much about Gadly. It doesn't seem to be very actively
> maintained anymore. The fact that it maintains live databases in memory
> makes it unsuitable for my use.

The in-memory DB issue is one thing, but Gadfly (note the spelling)
does seem to be quite actively maintained to me. While the latest
release was 8 months ago, browsing the CVS trees shows bugs fixed
4 weeks ago, for example. That's gadfly.sourceforge.net and links
therefrom, of course.


Alex

A. Lloyd Flanagan

unread,
Mar 17, 2003, 9:44:57 AM3/17/03
to
"David LeBlanc" <whi...@oz.net> wrote in message news:<mailman.104780234...@python.org>...

> Aside from it's dubious parentage, Jet has a reputation for being


> excruciatingly slow. I believe, but am not positive, that Outlook uses Jet

Outlook doesn't use Jet AFAIK. Based on my experience, here's an
overview of Jet:
Pros:
1. Doesn't come with Windows, but does with Office, and it's easy
to obtain and install.
2. Fairly good SQL and referential integrity features.
3. Performance generally isn't bad, especially for databases of the
size you're describing.
4. You can set up the database and do a lot of data manipulation in
Access, for setting up test cases, evolving the database design, etc.
But see Con #4 below.

Cons:
1. Closed-source, Microsoft, yada yada yada...
2. Reliability: probably OK for a single-user app. Do NOT share a
database file over a network with more than one user doing updates, I
don't care what the documentation says. Unless you're perfectly OK
with rebuilding the database once a week or so and don't care if you
lose data.
3. A lot of access to Jet databases (including the python driver) is
through ODBC. As an early attempt at a general database driver, ODBC
isn't particularly well designed and often has problems if you stray
from basic database operations. Performance ain't great, either.
4. As you noted, anybody with Access can get into your database and
muck with it (although you can password-protect it). Of course,
anybody who does that deserves what they get.

Bottom line: I wouldn't recommend Jet for most uses. If you're
willing to limit your app to single-user access, on a Windows
platform, it will probably work for you. On the other hand, I've been
known to develop with Jet because it's convenient and later switch the
app to something more industrial-strength.

David LeBlanc

unread,
Mar 17, 2003, 5:12:31 PM3/17/03
to
There has been discussion on python-dev about merging the python only
version (no C coded kjbuckets) of Gadfly into the next distro of Python
(2.3), but I don't recall what was decided.

David LeBlanc
Seattle, WA USA

> -----Original Message-----
> From: python-l...@python.org
> [mailto:python-l...@python.org]On Behalf Of Alex Martelli
> Sent: Monday, March 17, 2003 3:14
> To: pytho...@python.org
> Subject: Re: [Q] Databases -- Gadfly vs Metakit vs SQLite
>
>

> --
> http://mail.python.org/mailman/listinfo/python-list


Jeremy Fincher

unread,
Mar 18, 2003, 12:02:33 AM3/18/03
to
"David LeBlanc" <whi...@oz.net> wrote in message news:<mailman.1047939260...@python.org>...

> There has been discussion on python-dev about merging the python only
> version (no C coded kjbuckets) of Gadfly into the next distro of Python
> (2.3), but I don't recall what was decided.

I believe they decide it was better off left as an external package,
since most people don't expect that "Batteries included" means "SQL
database included."

Jeremy

Michael Hudson

unread,
Mar 18, 2003, 7:04:40 AM3/18/03
to
tweed...@hotmail.com (Jeremy Fincher) writes:

And because the developers don't want to be landed with maintaining a
large wodge of code they don't understand.

Cheers,
M.

--
That's why the smartest companies use Common Lisp, but lie about it
so all their competitors think Lisp is slow and C++ is fast. (This
rumor has, however, gotten a little out of hand. :)
-- Erik Naggum, comp.lang.lisp

Cameron Laird

unread,
Mar 18, 2003, 3:52:54 PM3/18/03
to
In article <%_Vca.92242$zo2.2...@news2.tin.it>,
Alex Martelli <al...@aleax.it> wrote:
.
.
.

>Let me commend and second this attitude: unless you have specific
>user requirements to the contrary, AVOID relying on closed-source
>components if you can. By using open-source components, and bundling
>them with your application if feasible, you may be able to make your
>applications more long-lived and thus more valuable, without being
>at the mercy of the closed-source supplier.
>
>
>Alex
>

Alex' sermon might sound prosaic, or even tautologous,
to many of the readers here. What's remarkable is that
officials from Microsoft and many leading organizations
*still* say THE OPPOSITE in public, apparently in all
sincerity. Their claim is that ... I have trouble re-
peating this with a straight face ... customers want
software suppliers who understand their needs, and will
take care of them. Open-source, they say, supplies
poorly integrated pieces of unknown quality, while only
Microsoft or a comparable large-scale vendor can supply
the high-value dependability customers truly desire.

I conclude, therefore, that there's a need for Alex to
continue to speak the truth as precisely and as often as
he does.

Microsoft: it's a national security risk (according to
testimony in federal court).
--

Cameron Laird <Cam...@Lairds.com>
Business: http://www.Phaseit.net
Personal: http://phaseit.net/claird/home.html

0 new messages