Why not move Apt to a relational database

Justin Emmanuel

Jun 3, 2007, 6:20:06 AM

I am brand new to this mailing list; I joined because I have an idea
that I would like to have considered: moving apt to a relational
database, for several reasons.

Based on a relational database, apt would run faster, and more data
could be stored about the programs to make restoring a system easier.
The data should be backed up automatically and regularly, so that if the
database is stored on another computer and the first computer has a
hardware failure, the backup can be used to restore the computer
completely to its previous state. The database should
contain checksums of the compressed and uncompressed state of the files
that will be installed. Then, if there is a problem with the computer
and something is segfaulting, every file on the computer can be checked
against this information, including freshly downloaded files, to
find out whether any of them are corrupt and need to be replaced.
Apt can then download the file again automatically. Numerous times I have
had to manually edit the text database that apt writes to because
something had been changed to "." when it should have been ">". In a
good relational database, the version numbers can be kept separately
from the rest of the data. All of this would help avoid corruption and
improve scalability, both for individual machines and for networked
enterprise machines.
The data at every level can be split into different tables using
normalisation, which speeds up reads and ensures that
only the files that need to be parsed get parsed.
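
To make the normalisation idea concrete, here is a minimal sketch, assuming SQLite through Python's standard sqlite3 module. The table and column names (package, version, file_checksum) are hypothetical and are not taken from apt or dpkg; they only illustrate keeping version numbers and per-file checksums in separate tables.

```python
import sqlite3

# Hypothetical schema for illustration only; none of these names come from apt.
SCHEMA = """
CREATE TABLE package (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE version (
    id         INTEGER PRIMARY KEY,
    package_id INTEGER NOT NULL REFERENCES package(id),
    version    TEXT NOT NULL,
    UNIQUE (package_id, version)
);
CREATE TABLE file_checksum (
    version_id INTEGER NOT NULL REFERENCES version(id),
    path       TEXT NOT NULL,
    md5        TEXT NOT NULL,
    PRIMARY KEY (version_id, path)
);
"""


def open_db(path=":memory:"):
    """Create the three normalised tables in a new database."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_file(conn, name, version, path, md5):
    """Record one file checksum, creating package and version rows as needed."""
    conn.execute("INSERT OR IGNORE INTO package(name) VALUES (?)", (name,))
    (pkg_id,) = conn.execute(
        "SELECT id FROM package WHERE name = ?", (name,)
    ).fetchone()
    conn.execute(
        "INSERT OR IGNORE INTO version(package_id, version) VALUES (?, ?)",
        (pkg_id, version),
    )
    (ver_id,) = conn.execute(
        "SELECT id FROM version WHERE package_id = ? AND version = ?",
        (pkg_id, version),
    ).fetchone()
    conn.execute(
        "INSERT OR REPLACE INTO file_checksum(version_id, path, md5) "
        "VALUES (?, ?, ?)",
        (ver_id, path, md5),
    )


def checksums_for(conn, name, version):
    """Return {path: md5} for one package version, touching only its rows."""
    rows = conn.execute(
        "SELECT f.path, f.md5 FROM file_checksum f "
        "JOIN version v ON v.id = f.version_id "
        "JOIN package p ON p.id = v.package_id "
        "WHERE p.name = ? AND v.version = ? ORDER BY f.path",
        (name, version),
    )
    return dict(rows)
```

Whether such a layout would actually be faster than the existing files is exactly what the rest of the thread debates; the sketch only shows what "split into different tables" could mean.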

So what do you think? Is this the correct mailing list to send this idea
to?

Many thanks.

--
To UNSUBSCRIBE, email to debian-dev...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listm...@lists.debian.org

Joey Schulze

Jun 3, 2007, 6:30:06 AM

Justin Emmanuel wrote:
> I am brand new to this mailing list, I joined it because I had an idea
> that I would like to have considered. Moving apt to a relational
> database, for several reasons.

Before reading on, have you considered the following:

- using an RDB requires a running RDBMS
- running an RDBMS requires memory and CPU usage
- in case of an upgrade, RDBMSs often dump their databases and import
  them. What will happen when the upgrade fails during this? Having no
  package database would be really bad
- copying parts of the database to another system, or patching it,
  is not as easy as it is with plain files

Regards,

Joey

--
Testing? What's that? If it compiles, it is good, if it boots up, it is perfect.

Please always Cc to me when replying to me on the lists.

martin f krafft

Jun 3, 2007, 6:30:08 AM

thus spoke Justin Emmanuel <justine...@gmail.com> [2007.06.03.1155 +0200]:

> I am brand new to this mailing list, I joined it because I had an
> idea that I would like to have considered. Moving apt to
> a relational database, for several reasons.

Having something as essential as the dpkg database depend on
a system as complex as a relational database is asking for trouble.
There are other ways to optimise dpkg/APT without pulling in massive
complexity.

--
Please do not send copies of list mail to me; I read the list!

.''`. martin f. krafft <mad...@debian.org>
: :' : proud Debian developer, author, administrator, and user
`. `'` http://people.debian.org/~madduck - http://debiansystem.info
`- Debian - when you have better things to do than fixing systems

drink canada dry! you might not succeed, but it *is* fun trying.


Neil Williams

Jun 3, 2007, 6:40:06 AM

On Sun, 03 Jun 2007 10:55:01 +0100
Justin Emmanuel <justine...@gmail.com> wrote:

> I am brand new to this mailing list, I joined it because I had an idea
> that I would like to have considered. Moving apt to a relational
> database, for several reasons.

What about embedded systems that can barely run sqlite?

apt needs to be part of the debian-installer, why lumber the installer
with postgres or mysql or whatever?

> Based on a relational database it will run faster, also there should be
> some more data stored about the programs to facilitate system restoring.

That doesn't justify adding 10-20 MB of extra code to a rootfs -
especially when an Emdebian rootfs may need to be under 5 MB in total.

> The data should be backed up automatically and regularly, so that if the
> database is stored on another computer and first computer has a hardware
> failure, the data from the backup can be used to completely restore the
> computer to its status again.

The man page for apt does specify that the cache must not be treated as
permanent. It is just a cache, there is no need to back it up or store
it in another form. It should be regenerated from dpkg data.

> It should be a relational database that
> contains checksums of the compressed and uncompressed state of files
> that will be installed. So that if there is a problem with the computer
> and something is segfaulting, every file on the computer can be checked
> against this information, including freshly downloaded files, so that
> they can find out if any of them are corrupt and need to be replaced.

Packages already include md5sums and a segfault isn't usually down to a
corrupt ELF file, it is down to a bug in the source code.
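
The check that the existing md5sums already allow can be sketched as follows. This is a minimal illustration, assuming md5sum-style lines of the form "<hex digest>  <path relative to the root>", which is the layout dpkg keeps in /var/lib/dpkg/info/<package>.md5sums; the function names are made up for this example and the sketch ignores the "*" binary-mode marker that md5sum can emit.

```python
import hashlib
from pathlib import Path


def parse_md5sums(text):
    """Parse md5sum-style lines ('<hex digest>  <relative path>') into a dict."""
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, path = line.split(None, 1)
        entries[path.strip()] = digest.lower()
    return entries


def find_corrupt(md5sums_text, root):
    """Return the sorted relative paths that are missing or whose MD5 differs."""
    bad = []
    for rel, expected in parse_md5sums(md5sums_text).items():
        target = Path(root) / rel
        if not target.is_file():
            bad.append(rel)
            continue
        actual = hashlib.md5(target.read_bytes()).hexdigest()
        if actual != expected:
            bad.append(rel)
    return sorted(bad)
```

A mismatch found this way points at a damaged or modified file on disk; it says nothing about bugs in the source code, which is the more common cause of a segfault.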

> Then apt can automatically download the file.

Sorry, that won't work. The package will still segfault because the
source code has not been patched. Segfaults need bug reports which then
need patches and a new Debian release or a new upstream release. apt
can only fix a segfault in an application by downgrading to the
previous version and it can do that already.

> I have had to numerous
> times manually edit the text database that apt writes to because
> something had been changed to "." when it should have been ">".

Specific examples? Did you file a bug report?

Is there some reason why this change would only affect you? Editing a
cache file (that will subsequently be regenerated) seems a strange way
to "fix" anything. If this is a bug that affects other people, it
should be reported as a bug in the BTS.

Specifically which file are you talking about? There is rarely
any point writing to anything in /var/cache/apt.

> So what do you think? Is this the correct mailing list to send this idea
> to?

Right mailing list but, IMHO, not a particularly good idea. Sorry.

--

Neil Williams
=============
http://www.data-freedom.org/
http://www.nosoftwarepatents.com/
http://www.linux.codehelp.co.uk/

Mike Hommey

Jun 3, 2007, 6:40:10 AM

On Sun, Jun 03, 2007 at 12:24:04PM +0200, martin f krafft <mad...@debian.org> wrote:
> thus spoke Justin Emmanuel <justine...@gmail.com> [2007.06.03.1155 +0200]:
> > I am brand new to this mailing list, I joined it because I had an
> > idea that I would like to have considered. Moving apt to
> > a relational database, for several reasons.
>
> Having something as essential as the dpkg database depend on
> a system as complex as a relational database is asking for trouble.

It's enough to take a look at how often the rpm databases tend to get
corrupted (or at least used to, because it seems they have switched to sqlite).

Mike

Josselin Mouette

Jun 3, 2007, 7:20:12 AM

On Sunday 03 June 2007 at 10:55 +0100, Justin Emmanuel wrote:
> So what do you think? Is this the correct mailing list to send this
> idea
> to?

I'm so happy that people who send such posts to this mailing list are
not the ones developing our core software.

--
.''`.
: :' : We are debian.org. Lower your prices, surrender your code.
`. `' We will add your hardware and software distinctiveness to
`- our own. Resistance is futile.


martin f krafft

Jun 3, 2007, 7:40:08 AM

thus spoke Josselin Mouette <jo...@debian.org> [2007.06.03.1314 +0200]:

> > So what do you think? Is this the correct mailing list to send
> > this idea to?
>
> I'm so happy that people who send such posts to this mailing list
> are not the ones developing our core software.

Dude, there is no need to be offensive.

--
Please do not send copies of list mail to me; I read the list!

.''`. martin f. krafft <mad...@debian.org>
: :' : proud Debian developer, author, administrator, and user
`. `'` http://people.debian.org/~madduck - http://debiansystem.info
`- Debian - when you have better things to do than fixing systems

$complex->{'data'}[$structures][$in_perl] = @{$can{'be'}->[$painful]};


Christoph Haas

Jun 3, 2007, 7:50:11 AM

On Sun, Jun 03, 2007 at 11:31:18AM +0100, Neil Williams wrote:
> On Sun, 03 Jun 2007 10:55:01 +0100
> Justin Emmanuel <justine...@gmail.com> wrote:
> > I am brand new to this mailing list, I joined it because I had an idea
> > that I would like to have considered. Moving apt to a relational
> > database, for several reasons.
>
> What about embedded systems that can barely run sqlite?

Is sqlite really *that* heavyweight? Storing information about tens of
thousands of packages in plain text files is surely not the best idea.
It grew historically, fine, but it is still worth thinking about.

> apt needs to be part of the debian-installer, why lumber the installer
> with postgres or mysql or whatever?

Nobody wants to use pgsql or mysql as a prerequisite to a base
installation. They are not the right tool for the job. Perhaps there is other
software that is even more basic than sqlite.

> That doesn't justify adding 10-20Mb of extra code to a rootfs -
> especially when an Emdebian rootfs may need to be <5Mb in total.

The current sqlite package is ~80 KB uncompressed. I can imagine that
the database might even be smaller and waste fewer inodes than what apt
currently does.

> > So what do you think? Is this the correct mailing list to send this idea
> > to?
>
> Right mailing list but, IMHO, not a particularly good idea. Sorry.

Right mailing list. Very good idea IMHO and right to the point. We just
need a volunteer who knows enough about apt to make it use sqlite
without breaking everything. Or do we have to wait until Ubuntu sends us
a patch? ;)

When I complained about the slow package database (it turned out that
having that many files on ext3 makes reading the package cache take longer
than formatting a floppy disk on a 1541) someone pointed to:

http://people.debian.org/~seanius/dpkg-sqlite/

Christoph
--
Peer review means that you can feel better because someone else
missed the problem, too.


Neil Williams

Jun 3, 2007, 8:40:08 AM

On Sun, 3 Jun 2007 13:47:20 +0200
Christoph Haas <ha...@debian.org> wrote:

> > What about embedded systems that can barely run sqlite?
>
> Is sqlite really *that* heavyweight?

No, that's why it is used in some embedded systems. Even so, it has no
place in the rootfs for an embedded system, IMHO. I'd rather not have
to repackage apt to remove this change.

> Storing information about tens of
> thousands of packages in plain text files is surely not the best idea.

I think it's OK. Are there open bug reports arising from this method?

> Historically grown. Okay. But still worth to think about.

I disagree. Emdebian uses SQLite to store package data and just 1,000
packages take 1.5 MB. The entire /var/lib/dpkg/available file on this
full Debian system is only 1.5 MB.

> > apt needs to be part of the debian-installer, why lumber the
> > installer with postgres or mysql or whatever?
>
> Nobody wants to use pgsql or mysql as a prerequisite to a base
> installation. Not the right tool for the job. Perhaps there is other
> software that is even more basic than sqlite.

There are two problems with changing the apt/dpkg data storage method:
absolute file sizes and package sizes. Text file storage leads to the
smallest combination of data file size and rootfs size.

> > That doesn't justify adding 10-20Mb of extra code to a rootfs -
> > especially when an Emdebian rootfs may need to be <5Mb in total.
>
> The current sqlite package is ~80 KB uncompressed. I can imagine
> that the database might even be smaller and waste less inodes than
> what apt currently does.

The long descriptions appear to be the largest element.

> > Right mailing list but, IMHO, not a particularly good idea. Sorry.
>
> Right mailing list. Very good idea IMHO and right to the point. We
> just need a volunteer who knows enough about apt to make it use sqlite
> without breaking everything. Or do we have to wait until Ubuntu sends
> us a patch? ;)

I still disagree - I see no need for the change. The other problems
identified by Joey also apply to sqlite:

- using an RDB requires a running RDBMS
- running an RDBMS requires memory and CPU usage
- in case of an upgrade, RDBMSs often dump their databases and import
  them. What will happen when the upgrade fails during this? Having no
  package database would be really bad
- copying parts of the database to another system, or patching it,
  is not as easy as it is with plain files

> When I complained about the slow package database (it turned out to be
> that that many files on ext3 make reading the package cache take
> longer than formatting a floppy disc on a 1541) someone pointed to:
>
> http://people.debian.org/~seanius/dpkg-sqlite/

Umm, that uses python to create the database - if there are problems
putting sqlite into a rootfs, there is NO place for python!! Emdebian
is removing perl from essential, let alone python (and replacing bash
with dash/busybox too).

Neil Williams

Jun 3, 2007, 9:00:15 AM

On Sun, 3 Jun 2007 13:35:07 +0100
Neil Williams <li...@codehelp.co.uk> wrote:

> I disagree. Emdebian uses SQLite to store package data

on the emdebian.org server (running Etch), not on the devices,

> and just 1,000
> packages takes 1.5Mb. The entire /var/lib/dpkg/available file on this
> full Debian system is only 1.5Mb.

Sorry, I should have made that clear: sqlite is used behind our version of
packages.debian.org: http://www.emdebian.org/toolchains/search.php

Sam Hocevar

Jun 3, 2007, 9:10:08 AM

On Sun, Jun 03, 2007, Josselin Mouette wrote:
> On Sunday 03 June 2007 at 10:55 +0100, Justin Emmanuel wrote:
> > So what do you think? Is this the correct mailing list to send this
> > idea to?
>
> I'm so happy that people who send such posts to this mailing list are
> not the ones developing our core software.

I'm not, nor is anyone using an embedded platform. Given how much
memory dpkg uses needlessly, no suggestion to fix it deserves such a
dismissal without explanation.

Regards,
--
Sam.

Philippe Cloutier

Jun 3, 2007, 9:50:07 AM

>
> Is this the correct mailing list to send this idea to?

de...@lists.debian.org

Joerg Jaspert

Jun 3, 2007, 10:30:17 AM

On 11039 March 1977, Josselin Mouette wrote:

>> So what do you think? Is this the correct mailing list to send this
>> idea
>> to?
> I'm so happy that people who send such posts to this mailing list are
> not the ones developing our core software.

How about not posting if you don't have anything useful to say?

--
bye Joerg
<Wrecktum> Deine Größe macht mich klein
<@joerg> doll
<Wrecktum> du darfst mein Bestrafer sein
(!) Wrecktum was kicked from #german by joerg [ok]

sean finney

Jun 3, 2007, 3:20:08 PM

On Sunday 03 June 2007 14:35:07 Neil Williams wrote:

> > http://people.debian.org/~seanius/dpkg-sqlite/
>
> Umm, that uses python to create the database - if there are problems
> putting sqlite into a rootfs, there is NO place for python!! Emdebian
> is removing perl from essential, let alone python (and replacing bash
> with dash/busybox too).

Umm, that was a *proof of concept*. given that i didn't want to spend more
than 2-3 hours on something that might not be acceptable to the dpkg
maintainers anyway, i decided to spend the time writing the part that
actually mattered and took shortcuts on the rest.

and if you read the dpkg devel thread that spawned this (don't think it was
referenced yet in this thread, but it has been referenced the last time or
two dpkg has been brought up on -devel), you'll see that i'm not particularly
attached to sqlite3 as "the format"--i'm more pushing for the concept of
abstracting/outsourcing the data representation/retrieval/storage from the
handling thereof.

that said, i think that sqlite3 would be a pretty good candidate if we were to
go with anything other than plaintext.

sean


Josselin Mouette

Jun 3, 2007, 3:40:13 PM

On Sunday 03 June 2007 at 21:17 +0200, sean finney wrote:
> and if you read the dpkg devel thread that spawned this (don't think it was
> referenced yet in this thread, but it has been referenced the last time or
> two dpkg has been brought up on -devel), you'll see that i'm not particularly
> attached to sqlite3 as "the format"--i'm more pushing for the concept of
> abstracting/outsourcing the data representation/retrieval/storage from the
> handling thereof.
>
> that said, i think that sqlite3 would be a pretty good candidate if we were to
> go with anything other than plaintext.

Even if SQLite is more robust than Berkeley DB, I don't think you could
recover anything from a corrupt database. Plain text will always turn
out better in terms of disaster recovery. If performance is an issue, a
text file can - just like a bdb file - be indexed. Corrupt indexes can
be regenerated, but corrupt databases cannot.
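
The "indexed text file" idea can be sketched like this. It is a minimal illustration, assuming a flat file of blank-line-separated stanzas in which each stanza begins with a "Package:" field (as in /var/lib/dpkg/status); the function names are invented for the example. The index is only a map from package name to byte offset, so losing or corrupting it costs nothing but a rescan.

```python
def build_index(path):
    """Map each 'Package:' name to the byte offset where its stanza starts."""
    index = {}
    offset = 0
    with open(path, "rb") as fh:
        for line in fh:
            if line.startswith(b"Package:"):
                name = line.split(b":", 1)[1].strip().decode("utf-8")
                index[name] = offset
            offset += len(line)
    return index


def read_stanza(path, index, name):
    """Seek straight to one stanza and return its single-line fields."""
    fields = {}
    with open(path, "rb") as fh:
        fh.seek(index[name])
        for line in fh:
            text = line.decode("utf-8").rstrip("\n")
            if not text:
                break  # a blank line ends the stanza
            if text[0] in " \t":
                continue  # continuation of a multi-line field; skipped here
            key, _, value = text.partition(":")
            fields[key] = value.strip()
    return fields
```

If the index is ever suspect, calling build_index again regenerates it from the text file, which remains the only authoritative copy.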


Eduard Bloch

Jun 3, 2007, 4:00:13 PM

#include <hallo.h>
* Joey Schulze [Sun, Jun 03 2007, 12:22:46PM]:

> Justin Emmanuel wrote:
> > I am brand new to this mailing list, I joined it because I had an idea
> > that I would like to have considered. Moving apt to a relational
> > database, for several reasons.
>
> Before reading on, have you considered the following:

Interesting points, but IMHO it is not Apt having the described problems
but dpkg. Let's reconsider your objections on that:

> - using an RDB requires a running RDBMS
> - running an RDBMS requires memory and cpu usage

If that takes much less than reading all the files, why is that a bad
deal? Finally, this could be made configurable. Embedded systems having
less CPU power but much faster filesystem access times (running in
memory) may keep using the plain file based database.

This all is subjective talk, I didn't have a look at the particular dpkg
code yet.

> - in case of an upgrade RDBMSs often dump their databases and import
> them. What will happen when the upgrade fails during this? No
> package database would be really bad

What is the problem with having an internal embedded version of sqlite,
upgrading it only when needed, and making the upgrade path safer than
with regular packages?

> - copying parts of the database to another system, or patching it
> is not as easy as it is with plain files

How often do you copy and patch /var/lib/dpkg/?

Regards,
Eduard.

Alex Queiroz

Jun 3, 2007, 4:10:09 PM

Hello,

On 6/3/07, Eduard Bloch <e...@gmx.de> wrote:
>
> > - in case of an upgrade RDBMSs often dump their databases and import
> > them. What will happen when the upgrade fails during this? No
> > package database would be really bad
>
> What is the problem with having an internal embedded version of sqlite
> and only upgrade when needed and make the upgrade path more safe than
> with regular packages?
>

It is even possible to download SQLite as a single C source file
for easy embedding. I mean, SQLite was born for this kind of thing.

--
-alex
http://www.ventonegro.org/

sean finney

Jun 3, 2007, 4:50:06 PM

On Sunday 03 June 2007 21:30:26 Josselin Mouette wrote:
> Even if SQLite is more robust than Berkeley DB, I don't think you could
> recover anything from a corrupt database. Plain text will always turn
> out better in terms of disaster recovery. If performance is an issue, a
> text file can - just like a bdb file - be indexed. Corrupt indexes can
> be regenerated, but corrupt databases cannot.

i believe that i also stated in my last posting to dpkg-devel that a good
implementation would treat such a "db" as cache, and handle them being
corrupted/deleted:

http://lists.debian.org/debian-dpkg/2007/04/msg00015.html

there were two answers to the thread. first, andreas barth said "well,
there's the source, show us something and we'll talk about it", which is fair
enough. ian jackson also replied, (though he didn't cc me in spite of my
multiple requests, tsk tsk), somewhat skeptical--though i don't think he
has actually looked at the code i supplied given his arguments.

sean


Roger Leigh

Jun 3, 2007, 5:00:11 PM

Neil Williams <li...@codehelp.co.uk> writes:

> On Sun, 3 Jun 2007 13:47:20 +0200
> Christoph Haas <ha...@debian.org> wrote:
>
>> > What about embedded systems that can barely run sqlite?
>>
>> Is sqlite really *that* heavyweight?
>
> No, that's why it is used in some embedded systems. Even so, it has no
> place in the rootfs for an embedded system, IMHO. I'd rather not have
> to repackage apt to remove this change.

Why would it need to be on the root? Surely the binaries and data
would just go on /usr and /var as normal?

Perhaps just using sqlite as an (optional) cache for dpkg and/or apt
would bring sufficient improvements to systems which desire it.

Regards,
Roger

--
.''`. Roger Leigh
: :' : Debian GNU/Linux http://people.debian.org/~rleigh/
`. `' Printing on GNU/Linux? http://gutenprint.sourceforge.net/
`- GPG Public Key: 0x25BFB848 Please GPG sign your mail.

Wouter Verhelst

Jun 3, 2007, 8:30:08 PM

On Sun, Jun 03, 2007 at 11:31:18AM +0100, Neil Williams wrote:

> On Sun, 03 Jun 2007 10:55:01 +0100
> Justin Emmanuel <justine...@gmail.com> wrote:
> > Based on a relational database it will run faster, also there should be
> > some more data stored about the programs to facilitate system restoring.
>
> That doesn't justify adding 10-20Mb of extra code to a rootfs -

Eh, please run 'apt-cache show libsqlite3-0 | grep Installed-Size'
before making that assertion.

[...]
--
Shaw's Principle:
Build a system that even a fool can use, and only a fool will
want to use it.

Wouter Verhelst

Jun 3, 2007, 8:30:14 PM

On Sun, Jun 03, 2007 at 12:22:46PM +0200, Joey Schulze wrote:
> Justin Emmanuel wrote:
> > I am brand new to this mailing list, I joined it because I had an idea
> > that I would like to have considered. Moving apt to a relational
> > database, for several reasons.
>
> Before reading on, have you considered the following:
>
> - using an RDB requires a running RDBMS
> - running an RDBMS requires memory and cpu usage
> - in case of an upgrade RDBMSs often dump their databases and import
> them. What will happen when the upgrade fails during this? No
> package database would be really bad

Not so with sqlite (although perhaps the second point might be).

> - copying parts of the database to another system, or patching it
> is not as easy as it is with plain files

That, however, most certainly is true.

--
Shaw's Principle:
Build a system that even a fool can use, and only a fool will
want to use it.

Neil Williams

Jun 4, 2007, 3:40:09 AM

On Sun, 03 Jun 2007 21:50:24 +0100
Roger Leigh <rle...@whinlatter.ukfsn.org> wrote:

> > No, that's why it is used in some embedded systems. Even so, it has
> > no place in the rootfs for an embedded system, IMHO. I'd rather not
> > have to repackage apt to remove this change.
>
> Why would it need to be on the root? Surely the binaries and data
> would just go on /usr and /var as normal?

A rootfs is the base filesystem created for the installer and for
test environments like chroot. It is a normal filesystem with /usr/bin
etc.; it is just very, very small and designed only to achieve the most
minimal functionality before the rest of the system is installed.
apt/dpkg/busybox have to be part of that rootfs for any flavour of
Debian, as do their dependencies.

> Perhaps just using sqlite as an (optional) cache for dpkg and/or apt
> would bring sufficient improvements to systems which desire it

That could actually be quite difficult - how would you migrate from one
to the other? The installer will inevitably use the smallest possible
combination of packages, the finished installation might need to use
sqlite. Besides, you still have the same problems of trying to copy
package sets and having to run sqlite before anything else can be done.

Migrating from a busybox rootfs (without dpkg) would potentially cause
more problems and making busybox depend on sqlite is plain crazy.

Neil Williams

Jun 4, 2007, 3:50:09 AM

On Sun, 3 Jun 2007 22:45:52 +0200
sean finney <sea...@debian.org> wrote:

> On Sunday 03 June 2007 21:30:26 Josselin Mouette wrote:
> > Even if SQLite is more robust than Berkeley DB, I don't think you
> > could recover anything from a corrupt database. Plain text will
> > always turn out better in terms of disaster recovery. If
> > performance is an issue, a text file can - just like a bdb file -
> > be indexed. Corrupt indexes can be regenerated, but corrupt
> > databases cannot.
>
> i believe that i also stated in my last posting to dpkg-devel that a
> good implementation would treat such a "db" as cache, and handle them
> being corrupted/deleted:
>
> http://lists.debian.org/debian-dpkg/2007/04/msg00015.html

I like the idea that the flat files remain and that the db is "just a
cache". It would be fine if this cache is "disposable" in that way
because it does solve the issues of corruption, upgrade paths etc.

To me, the best solution would be for an option in /etc/apt/apt.conf (or
similar) to enable and disable the sqlite cache. This would solve my
problems because I could disable the sqlite during the initial stages
and only enable it if the system has sufficient resources to run sqlite
almost constantly during the rest of the installation.

My problem is with trying to replace the flat files with any kind of
database - I believe that the flat files should always exist on every
system and a disposable cache (just like the apt-cache) suits this
usage quite well.
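
A disposable cache of this kind could look roughly like the sketch below. It assumes a flat file of "Package:"/"Version:" stanzas as the authoritative data and an SQLite file that is thrown away and rebuilt whenever it is missing, older than the flat file, or unreadable. The paths, the pkg table and the function names are hypothetical; nothing here is dpkg's or apt's actual behaviour.

```python
import os
import sqlite3


def parse_stanzas(path):
    """Yield (package, version) pairs from a blank-line-separated stanza file."""
    name = version = None
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if line.startswith("Package:"):
                name = line.split(":", 1)[1].strip()
            elif line.startswith("Version:"):
                version = line.split(":", 1)[1].strip()
            elif not line.strip():
                if name is not None:
                    yield name, version
                name = version = None
    if name is not None:
        yield name, version


def rebuild_cache(flat_path, cache_path):
    """Discard any existing cache file and regenerate it from the flat file."""
    if os.path.exists(cache_path):
        os.remove(cache_path)
    conn = sqlite3.connect(cache_path)
    with conn:  # commits on success
        conn.execute("CREATE TABLE pkg (name TEXT PRIMARY KEY, version TEXT)")
        conn.executemany(
            "INSERT OR REPLACE INTO pkg VALUES (?, ?)", parse_stanzas(flat_path)
        )
    return conn


def open_cache(flat_path, cache_path):
    """Open the cache, rebuilding it when missing, stale or unreadable."""
    conn = None
    try:
        if os.path.getmtime(cache_path) >= os.path.getmtime(flat_path):
            conn = sqlite3.connect(cache_path)
            conn.execute("SELECT 1 FROM pkg LIMIT 1")  # cheap sanity check
            return conn
    except (OSError, sqlite3.DatabaseError):
        if conn is not None:
            conn.close()
    return rebuild_cache(flat_path, cache_path)
```

Because the flat file is never written by this code, a corrupt cache or a failed rebuild loses only time, not package data.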

Warren Turkal

Jun 4, 2007, 4:20:11 AM

On Monday 04 June 2007 01:34:01 Neil Williams wrote:
> That could actually be quite difficult - how would you migrate from one
> to the other?

Have the raw files and the sqlite cache on the mirrors. Give the local program
the option to use either. Then you could use the raw files if the sqlite
cache can't be used.

> The installer will inevitably use the smallest possible
> combination of packages, the finished installation might need to use
> sqlite. Besides, you still have the same problems of trying to copy
> package sets and having to run sqlite before anything else can be done.

I don't understand why you'd have to run sqlite before anything else. It is a
library, not an RDBMS like PostgreSQL.
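
The difference shows up in how a program uses it. In the minimal sketch below (Python's standard sqlite3 binding is used only for brevity, and the pkg table and function names are hypothetical), the process opens an ordinary file and runs queries itself; there is no server to start, connect to, or keep running.

```python
import sqlite3


def record(db_path, package, version):
    """Store one package version; the database is just a file on disk."""
    conn = sqlite3.connect(db_path)  # opens or creates the file, no daemon
    try:
        with conn:  # commits on success
            conn.execute(
                "CREATE TABLE IF NOT EXISTS pkg (name TEXT PRIMARY KEY, version TEXT)"
            )
            conn.execute(
                "INSERT OR REPLACE INTO pkg VALUES (?, ?)", (package, version)
            )
    finally:
        conn.close()


def installed_version(db_path, package):
    """Look up one package version entirely inside the calling process."""
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            "SELECT version FROM pkg WHERE name = ?", (package,)
        ).fetchone()
        return row[0] if row else None
    finally:
        conn.close()
```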

> Migrating from a busybox rootfs (without dpkg) would potentially cause
> more problems and making busybox depend on sqlite is plain crazy.

No need with the above approach, as the dpkg from busybox could still use the
raw files.

wt
--
Warren Turkal

Roger Leigh

Jun 4, 2007, 5:30:13 PM

Neil Williams <li...@codehelp.co.uk> writes:

> On Sun, 03 Jun 2007 21:50:24 +0100
> Roger Leigh <rle...@whinlatter.ukfsn.org> wrote:
>
>> > No, that's why it is used in some embedded systems. Even so, it has
>> > no place in the rootfs for an embedded system, IMHO. I'd rather not
>> > have to repackage apt to remove this change.
>>
>> Why would it need to be on the root? Surely the binaries and data
>> would just go on /usr and /var as normal?
>
> ? A rootfs is the base filesystem created for the installer and for
> test environments like chroot.

OK.

>> Perhaps just using sqlite as an (optional) cache for dpkg and/or apt
>> would bring sufficient improvements to systems which desire it
>
> That could actually be quite difficult - how would you migrate from one
> to the other?

If the database is "just a cache", then it should get transparently
rebuilt as soon as you change it.

> The installer will inevitably use the smallest possible combination
> of packages, the finished installation might need to use
> sqlite. Besides, you still have the same problems of trying to copy
> package sets and having to run sqlite before anything else can be
> done.

If it's an optional cache, then there's no need to actually build the
cache if it's not possible; you can just fall back to the real data.

> Migrating from a busybox rootfs (without dpkg) would potentially cause
> more problems and making busybox depend on sqlite is plain crazy.

Sorry, but I fail to see the connection between busybox and sqlite.
If enabled, sqlite would be part of dpkg, probably either statically
linked or dynamically loaded. I would think static, for safety.

Warren Turkal

Jun 4, 2007, 6:10:09 PM

On Monday 04 June 2007 15:23:54 Roger Leigh wrote:
> Sorry, but I fail to see the connection between busybox and sqlite.
> If enabled, sqlite would be part of dpkg, probably either statically
> linked or dynamically loaded. I would think static, for safety.

Doesn't Busybox include an implementation of dpkg?

wt
--
Warren Turkal

Daniel Burrows

Jun 4, 2007, 11:40:07 PM

I'm sorry I don't have more time to comment on this.

On Sun, Jun 03, 2007 at 10:55:01AM +0100, Justin Emmanuel <justine...@gmail.com> was heard to say:

> Based on a relational database it will run faster, also there should be
> some more data stored about the programs to facilitate system restoring.

Is this really true? I'll freely admit that I have only cursory
experience with RDBMSes. However, with the current apt cache code,
lookups are basically a pointer dereference (and maybe a page fault).
I don't see how an RDBMS could possibly improve on that. There might be
other benefits to an RDBMS, but I'm not convinced this is one.

One benefit which I didn't see listed in your mail is that it might
become easier to augment the cache with more information; a great deal
of slowness in aptitude's startup, for instance, comes from reading
tables that aren't included in apt's global cache.

Daniel

Oleg Verych

Jun 8, 2007, 11:00:11 PM

* From: Justin Emmanuel
* Date: Sun, 03 Jun 2007 10:55:01 +0100

Hello, Justin. I hope you are still here.

> I am brand new to this mailing list, I joined it because I had an idea
> that I would like to have considered. Moving apt to a relational
> database, for several reasons.
>

> Based on a relational database it will run faster,

The first reason is "faster". What if I say that, based on tmpfs and a
directory/file structure, it will run even faster?

> also there should be some more data stored about the programs to
> facilitate system restoring.

File size on UNIX systems is limited by two things:

- amount of memory (soft limit)
- architecture (hard limit; on AMD64 it practically approaches infinity)

> The data should be backed up automatically and regularly,

Periodic job:

# lock-db and unlock-db stand for whatever locking the database provides
lock-db && { tar -c -C /var/cache/db-tmpfs -f "/var/backup/db.$$" . ; rc=$?; unlock-db; [ "$rc" -eq 0 ]; } \
|| echo error | mail -s '[db] backup daemon' root

> so that if the database is stored on another computer and first
> computer has a hardware failure, the data from the backup can be used
> to completely restore the computer to its status again.

Clients on the failed machine: scp, curl, lftp, or whatever else can transfer a file.

> It should be a relational database that contains checksums of the
> compressed and uncompressed state of files that will be installed. So
> that if there is a problem with the computer and something is
> segfaulting, every file on the computer can be checked against this
> information, including freshly downloaded files, so that they can find
> out if any of them are corrupt and need to be replaced. Then apt can
> automatically download the file. I have had to numerous times manually
> edit the text database that apt writes to because something had been
> changed to "." when it should have been ">". In a good relational
> database, the version numbers can be kept separately from the rest of
> the data, this will all go to help avoid corruption and lead to
> scalability both for individual machines and networked enterprise
> machines. The data at every level can be split into different tables
> using normalisation, increasing the speed of the reading and making
> sure that only the files that need to be parsed get parsed.

I can't see more reasons here, only new features.
The question is: are they possible without an RDB, with the scheme I've proposed?
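
As one data point for that question, the checksum check described in the quoted text can be done with a plain manifest file and no relational database. A minimal sketch, assuming sha256sum from GNU coreutils; make_manifest and verify_manifest are hypothetical helper names, not apt or dpkg commands:

```shell
#!/bin/sh
# Sketch only: both helpers are illustrative, not part of apt or dpkg.
# MANIFEST must be an absolute path outside DIR.

# make_manifest DIR MANIFEST: record a SHA-256 checksum for every file in DIR.
make_manifest() {
    ( cd "$1" && find . -type f -exec sha256sum {} + ) > "$2"
}

# verify_manifest DIR MANIFEST: print only the files that fail the check;
# the exit status is non-zero if any file is missing or has changed.
verify_manifest() {
    ( cd "$1" && sha256sum -c --quiet "$2" )
}
```

Any file reported by verify_manifest would be a candidate for a fresh download. The prelink objection raised elsewhere in the thread still applies: binaries rewritten after installation will no longer match a manifest taken from the package.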

> So what do you think? Is this the correct mailing list to send this idea
> to?

I think we must take the FREE version of DB2 Express and take control of
our XML

(composed from sf.net's ads in mailing lists ;)

As I'm also new here, I'm just expressing my stupid (Linux-specific)
counter-``idea''.
____

Bastian Blank

Jun 9, 2007, 5:50:07 AM

On Sat, Jun 09, 2007 at 02:52:04AM +0000, Oleg Verych wrote:
> > Based on a relational database it will run faster,
> First reason is "faster". What if i'll say: based on tmpfs and
> directory/file structure it will run even faster?

tmpfs is not faster than a real disk. You need the memory anyway and the
data on a real disk should be in the cache anyway if possible.

> > so that if the database is stored on another computer and first
> > computer has a hardware failure, the data from the backup can be used
> > to completely restore the computer to its status again.
> clients on failed machine: scp, curl, lftp, whatever to transfer a file

Most db formats are not transfer formats and are incompatible between
different versions and architectures. You need to dump them for such
sort of backup.

> > It should be a relational database that contains checksums of the
> > compressed and uncompressed state of files that will be installed.

prelink changes this value.

> > So
> > that if there is a problem with the computer and something is
> > segfaulting, every file on the computer can be checked against this
> > information, including freshly downloaded files, so that they can find
> > out if any of them are corrupt and need to be replaced.

The database is written to much more often. Why do you think it is less
likely that this file gets corrupted?

Bastian

--
... The prejudices people feel about each other disappear when they get
to know each other.
-- Kirk, "Elaan of Troyius", stardate 4372.5

Oleg Verych

Jun 9, 2007, 8:30:11 AM

On 2007-06-09, Bastian Blank <wa...@debian.org> wrote:
> On Sat, Jun 09, 2007 at 02:52:04AM +0000, Oleg Verych wrote:
>> > Based on a relational database it will run faster,
>> First reason is "faster". What if i'll say: based on tmpfs and
>> directory/file structure it will run even faster?
>
> tmpfs is not faster than a real disk. You need the memory anyway and the
> data on a real disk should be in the cache anyway if possible.

For the first startup, yes. But after that, tmpfs will go to swap, and
unless swap is as fragmented as the hdd, and since not all parts of the
db will be needed immediately, the next startup will be far faster. Even
if the swap version is not plausible, the "untar" can run in _parallel_
with the (sometimes very slow) *downloads* of "apt-get update" and
"apt-get upgrade", so the db will be ready and in memory before further
package processing starts.
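
The parallel "untar" idea can be sketched roughly as follows. This is an illustration under assumptions, not an existing apt feature: unpack_while is a hypothetical helper, and the archive and target paths in the usage line are placeholders.

```shell
#!/bin/sh
# Sketch only: unpack_while is an illustrative helper, not part of apt.
# usage: unpack_while ARCHIVE TARGET_DIR CMD [ARGS...]
unpack_while() {
    archive=$1 target=$2
    shift 2
    # Start unpacking the saved db into TARGET_DIR in the background.
    tar -x -C "$target" -f "$archive" &
    untar_pid=$!
    # Meanwhile run the slow, network-bound command in the foreground.
    "$@"
    cmd_status=$?
    # Do not go on to package processing until the db is fully unpacked.
    wait "$untar_pid" || return 1
    return "$cmd_status"
}
```

For example: unpack_while /var/backup/db.tar /var/cache/db-tmpfs apt-get update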

>> > so that if the database is stored on another computer and first
>> > computer has a hardware failure, the data from the backup can be used
>> > to completely restore the computer to its status again.
>> clients on failed machine: scp, curl, lftp, whatever to transfer a file
>
> Most db formats are not transfer formats and are incompatible between
> different versions and architectures. You need to dump them for such
> sort of backup.

I've described a tar (ar, cpio, etc.) file as storage; I doubt it has
version/arch problems.
____