Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.

Dismiss

apt-get wrapper for maintaining Partial Mirrors

84 views

Skip to first unread message

sanket agarwal

unread,

Jun 9, 2009, 2:40:11 PM6/9/09

Hi all,

We all know that there are various distro's that build around Debian.
I had an idea in mind whereby the task of making mirrors for personal
distributions can be automated. This can be stated as: if a person
wants to keep a customised set of packages for usage with the
distribution, the tool should be able to develop dependencies, fetch
packages, generate appropriate documentation and then create the
corresponding directory structure in the target mirror! The task can
be extended to include packages which are currently not under one of
the standard mirrors!

I think the tool can have immense utility in helping people automate
the task of mantaining the repositories. Suggestions, positive and
negative are invited.

I have not included the impl details as I would first like to evaluate
the idea at a feasibility and utility level.

--
To UNSUBSCRIBE, email to debian-dev...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listm...@lists.debian.org

Joseph Rawson

unread,

Jun 9, 2009, 5:20:09 PM6/9/09

On Tuesday 09 June 2009 13:14:53 sanket agarwal wrote:
> Hi all,
>
> We all know that there are various distro's that build around Debian.
> I had an idea in mind whereby the task of making mirrors for personal
> distributions can be automated. This can be stated as: if a person
> wants to keep a customised set of packages for usage with the
> distribution, the tool should be able to develop dependencies, fetch
> packages, generate appropriate documentation and then create the
> corresponding directory structure in the target mirror! The task can
> be extended to include packages which are currently not under one of
> the standard mirrors!
>
> I think the tool can have immense utility in helping people automate
> the task of mantaining the repositories. Suggestions, positive and
> negative are invited.
>
> I have not included the impl details as I would first like to evaluate
> the idea at a feasibility and utility level.

I have been working on this idea myself for quite a while, but I haven't
messed with the problem recently. I was using reprepro to maintain partial
mirrors, but it required using the output from "dpkg --get-selections" from
almost every machine that I needed to mirror packages for. The reprepro
program is excellent for making partial mirrors, but it has a drawback in
that it doesn't help resolve dependencies. This means that you can't just
make a short list of packages and easily build a partial mirror that contains
those packages and their dependencies, rather you have to install a machine
with those packages and use the list of packages from that machine with
reprepro to get a decent mirror.

There is another application that will help with the dependencies. It's
called germinate, and it will take a short list of packages and a list of
repositories and build a bunch of different lists of packages and their
dependencies. Germinate will also determine build dependencies for those
packages and recursively build a list of builddeps and the builddeps'
builddeps.

I have thought of making an application that would get germinate and reprepro
to work together to help build a decent partial mirror that had the correct
set of packages, but the process was a bit time consuming. It's been a while
since I've worked on this, since my temporary solution to the problem was to
buy a larger hard drive. Currently, I have a full mirror that I keep
updated, and a repository of locally built packages next to it. I'm not
really happy with this solution, as it uses too much disk space and I'm
downloading packages that will never be used, but it's given me time to
tackle more important problems.

Before writing any code, I would recommend taking a look at both reprepro and
germinate, as each of these applications is good at solving half of the
problems you describe. I think that an ideal solution would be to write a
frontend program that takes a list of packages and upstream repositories,
feeds them into germinate, obtains the result from germinate, parse those
results and build a reprepro configuration from that, then get reprepro to
fetch the appropriate packages.

I would be happy to help with this, as I could use such an application, and I
already have a meager bit of python code that parses the output of germinate
(germinate uses a wiki-type markup in it's output files). I stopped working
on the code since I bought a new hard drive, since I just used the extra
space to solve the problem for me, but I can bring it back to life, as I
would desire to use a more correct solution.

--
Thanks:
Joseph Rawson

signature.asc

Goswin von Brederlow

unread,

Jun 18, 2009, 3:50:13 AM6/18/09

Joseph Rawson <umeb...@gmail.com> writes:

> There is another application that will help with the dependencies. It's
> called germinate, and it will take a short list of packages and a list of
> repositories and build a bunch of different lists of packages and their
> dependencies. Germinate will also determine build dependencies for those
> packages and recursively build a list of builddeps and the builddeps'
> builddeps.
>
> I have thought of making an application that would get germinate and reprepro
> to work together to help build a decent partial mirror that had the correct
> set of packages, but the process was a bit time consuming. It's been a while

Was it that bad? It only needs to run 4 times a day when the mirror
push comes in.

> since I've worked on this, since my temporary solution to the problem was to
> buy a larger hard drive. Currently, I have a full mirror that I keep
> updated, and a repository of locally built packages next to it. I'm not
> really happy with this solution, as it uses too much disk space and I'm
> downloading packages that will never be used, but it's given me time to
> tackle more important problems.
>
> Before writing any code, I would recommend taking a look at both reprepro and
> germinate, as each of these applications is good at solving half of the
> problems you describe. I think that an ideal solution would be to write a
> frontend program that takes a list of packages and upstream repositories,
> feeds them into germinate, obtains the result from germinate, parse those
> results and build a reprepro configuration from that, then get reprepro to
> fetch the appropriate packages.

Combining germinate and reprepro is the right thing to do. Or reprepro
and a new filter instead of germinate. But don't rewrite reprepro.

Given a little bit of care when writing the reprepro config this can
be completly done as part of the filtering. There is no need for a
seperate run that scanns all upstream repositories as long as you can
define a partial order between them, i.e. contrib needs things from
main but main never from contrib. That would also have the benefit
that you only need to process those packages files that have changed.

> I would be happy to help with this, as I could use such an application, and I
> already have a meager bit of python code that parses the output of germinate
> (germinate uses a wiki-type markup in it's output files). I stopped working
> on the code since I bought a new hard drive, since I just used the extra
> space to solve the problem for me, but I can bring it back to life, as I
> would desire to use a more correct solution.

Urgs, that sucks. It should take a Packages/Sources style input and
output the same format.

Maybe rewriting it using libapt would be better than wrapping germinate.

MfG
Goswin

Frank Lin PIAT

unread,

Jun 18, 2009, 4:20:12 AM6/18/09

On Tue, 2009-06-09 at 16:16 -0500, Joseph Rawson wrote:
> On Tuesday 09 June 2009 13:14:53 sanket agarwal wrote:
> > I had an idea in mind whereby the task of making mirrors for personal
> > distributions can be automated.

<lazy-way>
Depending on what you want to achieve, a caching proxy might be an easy
solution (there are a specialized in the archive already)
</lazy-way>

> > This can be stated as: if a person
> > wants to keep a customised set of packages for usage with the
> > distribution, the tool should be able to develop dependencies, fetch
> > packages, generate appropriate documentation and then create the
> > corresponding directory structure in the target mirror! The task can
> > be extended to include packages which are currently not under one of
> > the standard mirrors!

<lazy-way>
One don't have to merge the repositories, one can just declare multiple
sources in /etc/apt/*
</lazy-way>

> > I think the tool can have immense utility in helping people automate
> > the task of mantaining the repositories. Suggestions, positive and
> > negative are invited.
> >
> > I have not included the impl details as I would first like to evaluate
> > the idea at a feasibility and utility level.

If the scope of your project includes being able to bootstrap systems
from the mirror, resolving dependency is much more complex (some
packages aren't resolved by dependencies. For instance, the right kernel
is select by some logic in Debian-installer).
I found some interesting logic in debian-cd package.

Still, I don't consider that allowing bootstrapping is mandatory. Your
project would still be extremely valuable without it. [for those 95% of
the people that install from CD, as opposed to netboot].

Regards,

Franklin

Goswin von Brederlow

unread,

Jun 18, 2009, 5:50:12 AM6/18/09

Frank Lin PIAT <fp...@klabs.be> writes:

> On Tue, 2009-06-09 at 16:16 -0500, Joseph Rawson wrote:
>> On Tuesday 09 June 2009 13:14:53 sanket agarwal wrote:
>> > This can be stated as: if a person
>> > wants to keep a customised set of packages for usage with the
>> > distribution, the tool should be able to develop dependencies, fetch
>> > packages, generate appropriate documentation and then create the
>> > corresponding directory structure in the target mirror! The task can
>> > be extended to include packages which are currently not under one of
>> > the standard mirrors!
>
> <lazy-way>
> One don't have to merge the repositories, one can just declare multiple
> sources in /etc/apt/*
> </lazy-way>

Lets say I want to mirror xserver-xorg from experimental. Then I would
want it to include xserver-xorg-core (>= xyz) also from experimental
as the dependency dictates but not include libc6 from experimental as
the sid one is sufficient.

A key point here would be flexibility.

>> > I think the tool can have immense utility in helping people automate
>> > the task of mantaining the repositories. Suggestions, positive and
>> > negative are invited.
>> >
>> > I have not included the impl details as I would first like to evaluate
>> > the idea at a feasibility and utility level.
>
> If the scope of your project includes being able to bootstrap systems
> from the mirror, resolving dependency is much more complex (some
> packages aren't resolved by dependencies. For instance, the right kernel
> is select by some logic in Debian-installer).
> I found some interesting logic in debian-cd package.

You would include "linux-image-<type>" in your package list. That
isn't really a problem of the tool. Just of the input you need to provide.
Also you would include everything udeb and everything
essential/required for bootstraping purposes.

Again flexibility is the key.

> Still, I don't consider that allowing bootstrapping is mandatory. Your
> project would still be extremely valuable without it. [for those 95% of
> the people that install from CD, as opposed to netboot].
>
> Regards,
>
> Franklin

MfG
Goswin

PS: the essential/required packages can already easily be filtered
with grep-dctrl.

Joseph Rawson

unread,

Jun 19, 2009, 12:40:09 AM6/19/09

On Thursday 18 June 2009 02:46:42 Goswin von Brederlow wrote:
> Joseph Rawson <umeb...@gmail.com> writes:
> > There is another application that will help with the dependencies. It's
> > called germinate, and it will take a short list of packages and a list of
> > repositories and build a bunch of different lists of packages and their
> > dependencies. Germinate will also determine build dependencies for those
> > packages and recursively build a list of builddeps and the builddeps'
> > builddeps.
> >
> > I have thought of making an application that would get germinate and
> > reprepro to work together to help build a decent partial mirror that had
> > the correct set of packages, but the process was a bit time consuming.
> > It's been a while
>
> Was it that bad? It only needs to run 4 times a day when the mirror
> push comes in.
>

It wasn't the running that was time consuming, but the writing of all the code
to seed germinate, then try and use the results for reprepro. I'm sorry if I
wasn't clear on which part was consuming time.

> > since I've worked on this, since my temporary solution to the problem was
> > to buy a larger hard drive. Currently, I have a full mirror that I keep
> > updated, and a repository of locally built packages next to it. I'm not
> > really happy with this solution, as it uses too much disk space and I'm
> > downloading packages that will never be used, but it's given me time to
> > tackle more important problems.
> >
> > Before writing any code, I would recommend taking a look at both reprepro
> > and germinate, as each of these applications is good at solving half of
> > the problems you describe. I think that an ideal solution would be to
> > write a frontend program that takes a list of packages and upstream
> > repositories, feeds them into germinate, obtains the result from
> > germinate, parse those results and build a reprepro configuration from
> > that, then get reprepro to fetch the appropriate packages.
>
> Combining germinate and reprepro is the right thing to do. Or reprepro
> and a new filter instead of germinate. But don't rewrite reprepro.

I never intended to rewrite reprepro. It does it's job very well. It's not
reprepro's job to resolve dependencies, nor should it be, as a dependency
could lie in an entirely different repository.

I do think that since each program has it's specific area of responsibility,
that a program that glues them together would be appropriate, and help from
reinventing wheels when it's not necessary.

>
> Given a little bit of care when writing the reprepro config this can
> be completly done as part of the filtering. There is no need for a
> seperate run that scanns all upstream repositories as long as you can
> define a partial order between them, i.e. contrib needs things from
> main but main never from contrib. That would also have the benefit
> that you only need to process those packages files that have changed.
>
> > I would be happy to help with this, as I could use such an application,
> > and I already have a meager bit of python code that parses the output of
> > germinate (germinate uses a wiki-type markup in it's output files). I
> > stopped working on the code since I bought a new hard drive, since I just
> > used the extra space to solve the problem for me, but I can bring it back
> > to life, as I would desire to use a more correct solution.
>
> Urgs, that sucks. It should take a Packages/Sources style input and
> output the same format.
>

I don't like the output either, but I haven't taken much time to dig into the
germinate code very much.

> Maybe rewriting it using libapt would be better than wrapping germinate.

Germinate uses libapt. It imports apt_pkg from the python-apt package, which
is a python binding to libapt, AFAIK. It might be easier to just
add '/usr/lib/germinate' to the sys.path and control the Germinator object
directly, bypassing the way that the package lists are output from germinate.

Germinate does have an advantage in that it can recursively add the builddeps
for a package list, making a list for a partial, self-building mirror.

BTW, the subject of this thread is "apt-get wrapper for maintaining Partial
Mirrors". The solution I'm proposing is "a simple tool for maintaining
Partial Mirrors" (which could possibly be wrapped by apt-get later).

I think that just pursuing an "apt-get wrapper" leads to some complications
that could be avoided by creating the "partial mirror tool" first, then
looking at wrapping it later. One complication might be "how do handle
apt-get remove", and another might be "how to handle sid libraries that
disappear from official repository, yet local machines must have them".

>
> MfG
> Goswin

--
Thanks:
Joseph Rawson

signature.asc

Goswin von Brederlow

unread,

Jun 19, 2009, 1:30:16 AM6/19/09

Joseph Rawson <umeb...@gmail.com> writes:

> BTW, the subject of this thread is "apt-get wrapper for maintaining Partial
> Mirrors". The solution I'm proposing is "a simple tool for maintaining
> Partial Mirrors" (which could possibly be wrapped by apt-get later).
>
> I think that just pursuing an "apt-get wrapper" leads to some complications
> that could be avoided by creating the "partial mirror tool" first, then
> looking at wrapping it later. One complication might be "how do handle
> apt-get remove", and another might be "how to handle sid libraries that
> disappear from official repository, yet local machines must have them".

Ahh, so maybe I completly misread that part.

Do you mean a wrapper around apt-get so that "apt-get install foo" on
any client would automatically add "foo" to the list of packages being
mirrored on the server?

If so then you can configure a post invoke hook in apt that will copy
the dpkg status file of the host to the server [as status.$(hostname)]
and then use those on the server to generate the filter for
reprepro. I think I still have a script for that somewhere but it is
easy enough to rewrite.

Joseph Rawson

unread,

Jun 19, 2009, 3:10:19 AM6/19/09

On Friday 19 June 2009 00:27:06 Goswin von Brederlow wrote:
> Joseph Rawson <umeb...@gmail.com> writes:
> > BTW, the subject of this thread is "apt-get wrapper for maintaining
> > Partial Mirrors". The solution I'm proposing is "a simple tool for
> > maintaining Partial Mirrors" (which could possibly be wrapped by apt-get
> > later).
> >
> > I think that just pursuing an "apt-get wrapper" leads to some
> > complications that could be avoided by creating the "partial mirror tool"
> > first, then looking at wrapping it later. One complication might be "how
> > do handle apt-get remove", and another might be "how to handle sid
> > libraries that disappear from official repository, yet local machines
> > must have them".
>
> Ahh, so maybe I completly misread that part.
>

It was my fault for not making this point clear, as I should've done. FWIW, I
would be much more interested in making a tool that would make it easier to
manage local/partial debian mirrors (i.e. one that helped resolve the
dependencies), rather than have an apt-get wrapper. I also think that once
such a tool is made, it would make it easier to build an apt-get wrapper that
works with it. I don't think that viewing the problem with an "apt-get
wrapper" solution is the best way to approach it, but I do think that it
would be valuable once the underlying problems are solved.

> Do you mean a wrapper around apt-get so that "apt-get install foo" on
> any client would automatically add "foo" to the list of packages being
> mirrored on the server?
>

It was the original poster who mentioned the apt-get wrapper, but I took it to
mean exactly what you said above. The tool I was envisioning would take a
short list of packages (a text file with package names separated by newlines,
or a collection of such text files) combined with a list of apt sources and
generate the partial mirror from just that information. There are still some
things that should be explicitly included in those lists, such as either
gamin, fam, or both, as an example.

> If so then you can configure a post invoke hook in apt that will copy
> the dpkg status file of the host to the server [as status.$(hostname)]
> and then use those on the server to generate the filter for
> reprepro. I think I still have a script for that somewhere but it is
> easy enough to rewrite.

That's good for binaries, but I don't know about the source. It wasn't long
ago that I noticed a problem with reprepro not obtaining the corresponding
source packages when you use a filter list taken
from "dpkg --get-selections". I remember that the source for jigdo wasn't
in my partial mirror, because there were no binaries named "jigdo",
rather "jigdo-file" and "jigdo-lite". Since there were no sources with that
name, the jigdo source was never mirrored on my partial mirror. I don't know
if that behavior has been fixed now, since there is now a binary named jigdo,
instead of jigdo-lite.

Also, it's more difficult for the local repository to determine the difference
between the automatically selected and manually selected packages in this
type of setup, since you would be sending a longer list of "manually selected
packages", instead of distinguishing which ones are actually selected. I
guess that it doesn't matter much, as a package would only be removed from
the repository once it's not listed on any of the lists. There were times
when I didn't want certain packages to be removed from the repository,
regardless of whether they were installed or not, so I used to run xxdiff on
the packages files, so the newer ones were added.

In my way of thinking, I'm not looking to merge upstream repositories together
in one repository. Besides, there are already tools, such as apt-move that
would be better for this job. Long ago, apt-move was the primary tool that I
used to keep a local repository, and it worked pretty well, as long as all
the machines that were using it were on the same release.

I have found that reprepro is the absolute best tool for maintaining a debian
mirror. The only problem I have with it is when I want to maintain a partial
mirror, and I don't want a merged repository, is that I have to spread the
packages lists to different places, and when you start adding machines, you
start adding more lists to the configuration, when it would probably be
better to maintain a set of "master" lists that are generated from the many
lists that come from the machines.

signature.asc

Joseph Rawson

unread,

Jun 19, 2009, 3:30:22 AM6/19/09

On Thursday 18 June 2009 03:17:13 Frank Lin PIAT wrote:
> On Tue, 2009-06-09 at 16:16 -0500, Joseph Rawson wrote:
> > On Tuesday 09 June 2009 13:14:53 sanket agarwal wrote:
> > > I had an idea in mind whereby the task of making mirrors for personal
> > > distributions can be automated.
>
> <lazy-way>
> Depending on what you want to achieve, a caching proxy might be an easy
> solution (there are a specialized in the archive already)
> </lazy-way>
>

Or possibly apt-move called as a post-invoke action of apt-get.

> > > This can be stated as: if a person
> > > wants to keep a customised set of packages for usage with the
> > > distribution, the tool should be able to develop dependencies, fetch
> > > packages, generate appropriate documentation and then create the
> > > corresponding directory structure in the target mirror! The task can
> > > be extended to include packages which are currently not under one of
> > > the standard mirrors!
>
> <lazy-way>
> One don't have to merge the repositories, one can just declare multiple
> sources in /etc/apt/*
> </lazy-way>
>

Then it becomes harder to send the package to the appropriate local
repository, since they aren't merged. I would also prefer to not have to
deal with a merged repository, but keep separate upstream partial mirrors, as
they would probably be easier to manage.

> > > I think the tool can have immense utility in helping people automate
> > > the task of mantaining the repositories. Suggestions, positive and
> > > negative are invited.
> > >
> > > I have not included the impl details as I would first like to evaluate
> > > the idea at a feasibility and utility level.
>
> If the scope of your project includes being able to bootstrap systems
> from the mirror, resolving dependency is much more complex (some
> packages aren't resolved by dependencies. For instance, the right kernel
> is select by some logic in Debian-installer).
> I found some interesting logic in debian-cd package.
>
> Still, I don't consider that allowing bootstrapping is mandatory. Your
> project would still be extremely valuable without it. [for those 95% of
> the people that install from CD, as opposed to netboot].
>

The reason that I recommended tying germinate and reprepro together with a
tool was because the original post was discussing "personal distributions".
To me, this implies the ability to bootstrap, and also the need to have
a "self building" source/binary repository.

I have just made some other responses to Goswin that should help explain my
view on things a bit better.

> Regards,
>
> Franklin

--
Thanks:
Joseph Rawson

signature.asc

Joseph Rawson

unread,

Jun 19, 2009, 3:50:12 AM6/19/09

On Thursday 18 June 2009 04:47:45 Goswin von Brederlow wrote:
> Frank Lin PIAT <fp...@klabs.be> writes:
> > On Tue, 2009-06-09 at 16:16 -0500, Joseph Rawson wrote:
> >> On Tuesday 09 June 2009 13:14:53 sanket agarwal wrote:
> >> > This can be stated as: if a person
> >> > wants to keep a customised set of packages for usage with the
> >> > distribution, the tool should be able to develop dependencies, fetch
> >> > packages, generate appropriate documentation and then create the
> >> > corresponding directory structure in the target mirror! The task can
> >> > be extended to include packages which are currently not under one of
> >> > the standard mirrors!
> >
> > <lazy-way>
> > One don't have to merge the repositories, one can just declare multiple
> > sources in /etc/apt/*
> > </lazy-way>
>
> Lets say I want to mirror xserver-xorg from experimental. Then I would
> want it to include xserver-xorg-core (>= xyz) also from experimental
> as the dependency dictates but not include libc6 from experimental as
> the sid one is sufficient.
>
> A key point here would be flexibility.

This is something that I haven't considered yet. This would be one of the
problems that might occur with the "post invoke hook" that you mentioned
earlier using dpkg status. Actually this wouldn't be much of a problem, I
was confused. I was thinking you were meaning "--get-selections" which just
returns the name of the package and "install/deinstall", but status also
contains the version being used, and this could be matched to the appropriate
repository in the sources list (so you get the libc from main instead of
experimental, since the status file uses the version that's in main).

However, I don't know how to use that info with reprepro. With reprepro, I've
only sent "--get-selections" lists to it. In fact, this is how I used to
install new packages in sid, and make sure they came from the local
repository first.

------------------------------------
#!/bin/bash
packages=`grep-status " install ok not-installed" | grep Package |
gawk '{print $2}'`
#packages=`aptitude search ~N | grep ^.i | gawk '{print $2}'`
touch conf/list-uninstalled.tmp
for package in $packages
do echo -e "$package\t\tinstall" >> conf/list-uninstalled.tmp
done
cat conf/list-uninstalled.tmp | uniq | sort > conf/list-uninstalled
rm conf/list-uninstalled.tmp
------------------------------------

You may be able to tell by looking at the script that I'm still in the process
of getting used to aptitude, being a longtime dselect user. ;)
Anyway, I don't know much about determining (with reprepro) which upstream
repository holds the version of the package that I want installed.

>
> >> > I think the tool can have immense utility in helping people automate
> >> > the task of mantaining the repositories. Suggestions, positive and
> >> > negative are invited.
> >> >
> >> > I have not included the impl details as I would first like to evaluate
> >> > the idea at a feasibility and utility level.
> >
> > If the scope of your project includes being able to bootstrap systems
> > from the mirror, resolving dependency is much more complex (some
> > packages aren't resolved by dependencies. For instance, the right kernel
> > is select by some logic in Debian-installer).
> > I found some interesting logic in debian-cd package.
>
> You would include "linux-image-<type>" in your package list. That
> isn't really a problem of the tool. Just of the input you need to provide.
> Also you would include everything udeb and everything
> essential/required for bootstraping purposes.
>

I was also thinking along those lines, too. Same with fam/gamin and other
packages that have "drop-in" replacements.

> Again flexibility is the key.
>
> > Still, I don't consider that allowing bootstrapping is mandatory. Your
> > project would still be extremely valuable without it. [for those 95% of
> > the people that install from CD, as opposed to netboot].
> >
> > Regards,
> >
> > Franklin
>
> MfG
> Goswin
>
> PS: the essential/required packages can already easily be filtered
> with grep-dctrl.

--
Thanks:
Joseph Rawson

signature.asc

Tzafrir Cohen

unread,

Jun 19, 2009, 6:20:08 AM6/19/09

On Fri, Jun 19, 2009 at 01:52:43AM -0500, Joseph Rawson wrote:

> would be much more interested in making a tool that would make it easier to
> manage local/partial debian mirrors (i.e. one that helped resolve the
> dependencies), rather than have an apt-get wrapper. I also think that once
> such a tool is made, it would make it easier to build an apt-get wrapper that
> works with it. I don't think that viewing the problem with an "apt-get
> wrapper" solution is the best way to approach it, but I do think that it
> would be valuable once the underlying problems are solved.

And reprepro does not fit the bill because?

Joseph Rawson

unread,

Jun 19, 2009, 7:30:23 AM6/19/09

On Friday 19 June 2009 05:09:31 Tzafrir Cohen wrote:
> On Fri, Jun 19, 2009 at 01:52:43AM -0500, Joseph Rawson wrote:
> > would be much more interested in making a tool that would make it easier
> > to manage local/partial debian mirrors (i.e. one that helped resolve the
> > dependencies), rather than have an apt-get wrapper. I also think that
> > once such a tool is made, it would make it easier to build an apt-get
> > wrapper that works with it. I don't think that viewing the problem with
> > an "apt-get wrapper" solution is the best way to approach it, but I do
> > think that it would be valuable once the underlying problems are solved.
>
> And reprepro does not fit the bill because?
>

It fits part of the bill, as it's an excellent tool for maintaining a
repository, but it doesn't resolve dependencies (nor should it).

--
Thanks:
Joseph Rawson

signature.asc

Joseph Rawson

unread,

Jun 19, 2009, 7:40:10 AM6/19/09

On Friday 19 June 2009 00:27:06 Goswin von Brederlow wrote:

> Joseph Rawson <umeb...@gmail.com> writes:
> > BTW, the subject of this thread is "apt-get wrapper for maintaining
> > Partial Mirrors". The solution I'm proposing is "a simple tool for
> > maintaining Partial Mirrors" (which could possibly be wrapped by apt-get
> > later).
> >
> > I think that just pursuing an "apt-get wrapper" leads to some
> > complications that could be avoided by creating the "partial mirror tool"
> > first, then looking at wrapping it later. One complication might be "how
> > do handle apt-get remove", and another might be "how to handle sid
> > libraries that disappear from official repository, yet local machines
> > must have them".
>
> Ahh, so maybe I completly misread that part.
>
> Do you mean a wrapper around apt-get so that "apt-get install foo" on
> any client would automatically add "foo" to the list of packages being
> mirrored on the server?
>
> If so then you can configure a post invoke hook in apt that will copy
> the dpkg status file of the host to the server [as status.$(hostname)]
> and then use those on the server to generate the filter for
> reprepro. I think I still have a script for that somewhere but it is
> easy enough to rewrite.
>

When you mentioned the word "hook", I was reminded of reprepro's ability to
use hooks. I started testing using a ListHook script with reprepro. I'm
attaching the script so you can see the general idea. The script doesn't do
anything effective, but may be helpful in understanding more of the way I'm
approaching the idea. Please don't laugh too hard, I'm just playing with
ideas now.

Among other possible reasons, there are two main reasons why this particular
approach won't work. One reason is that the ListHook calls a script for each
list independently. So, if you have a package in contrib that depends on a
package in main, like many do, the dependency won't be resolved using this
method. Also, the germinator object only handles one arch at a time, so if
you are mirroring multiple arches, you need to use a germinator object for
each one. One way that this problem can be countered is by running a simple
server that holds the germinator object, and the script that ListHook
executes would communicate with that server. Then the server would "grow"
the seeds and create the filter lists that would be used by reprepro.

I tried this approach because I didn't see the sense in downloading the
packages lists more than necessary. The way I was thinking before was to
seed germinate (which would download the package lists), parse the output,
create filter lists from that output, send them to reprepro, and call
reprepro to update. This forces all of those package lists to be downloaded
twice, which was something I tried to avoid with this short experiment.

It also seems to be somewhat difficult to "plant the seeds" into germinate
manually. I'm sure that problem could be solved by looking through the code
a bit longer.

testgerm

signature.asc

Bernhard R. Link

unread,

Jun 19, 2009, 9:00:17 AM6/19/09

* Joseph Rawson <umeb...@gmail.com> [090619 13:23]:

> On Friday 19 June 2009 05:09:31 Tzafrir Cohen wrote:
> > On Fri, Jun 19, 2009 at 01:52:43AM -0500, Joseph Rawson wrote:
> > > would be much more interested in making a tool that would make it easier
> > > to manage local/partial debian mirrors (i.e. one that helped resolve the
> > > dependencies), rather than have an apt-get wrapper. I also think that
> > > once such a tool is made, it would make it easier to build an apt-get
> > > wrapper that works with it. I don't think that viewing the problem with
> > > an "apt-get wrapper" solution is the best way to approach it, but I do
> > > think that it would be valuable once the underlying problems are solved.
> >
> > And reprepro does not fit the bill because?
> >
> It fits part of the bill, as it's an excellent tool for maintaining a
> repository, but it doesn't resolve dependencies (nor should it).

Actually, I'm quite open to having some depedency handling in reprepro
and already have written some simple prototype for a related project.
The problem is that calculating a simple cover of selected packages in
the dependency graph is not enough:

Usually the cover is not unique but the existance of alternatives in
dependencies causes multiple solutions. For an initial checkout that
is no problem, as one can choose one some set by some pseudo-random
selection (like "packages with alphabetically lower names get the
first depedency in an alternative tried first" and similar things
for virtual packages). The problem is that no such criterion can be
stable against changes in the partially mirrored distribution.

So in this cases knowing what packages upstream has and what packages
are wanted is not enough but one has to take into account what packages
are currently selected. And a simply covering no longer is enough but
one needs a full resolver knowing which installed states can be easily
brought to which other installed states. (and things get even more
complicated if the currently mirrored packages allow multiple subsets
which clients using this repository might have installed)...

Hochachtungsvoll,
Bernhard R. Link

Goswin von Brederlow

unread,

Jun 19, 2009, 2:00:20 PM6/19/09

Joseph Rawson <umeb...@gmail.com> writes:

> On Friday 19 June 2009 00:27:06 Goswin von Brederlow wrote:
>> Joseph Rawson <umeb...@gmail.com> writes:
>> If so then you can configure a post invoke hook in apt that will copy
>> the dpkg status file of the host to the server [as status.$(hostname)]
>> and then use those on the server to generate the filter for
>> reprepro. I think I still have a script for that somewhere but it is
>> easy enough to rewrite.
> That's good for binaries, but I don't know about the source. It wasn't long
> ago that I noticed a problem with reprepro not obtaining the corresponding
> source packages when you use a filter list taken
> from "dpkg --get-selections". I remember that the source for jigdo wasn't
> in my partial mirror, because there were no binaries named "jigdo",
> rather "jigdo-file" and "jigdo-lite". Since there were no sources with that
> name, the jigdo source was never mirrored on my partial mirror. I don't know
> if that behavior has been fixed now, since there is now a binary named jigdo,
> instead of jigdo-lite.

My filter first converted the packages listed in the status file(s) to
source package names (packages with different name have a "Source:"
entry) and then output those for sources.

> Also, it's more difficult for the local repository to determine the difference
> between the automatically selected and manually selected packages in this
> type of setup, since you would be sending a longer list of "manually selected
> packages", instead of distinguishing which ones are actually selected. I
> guess that it doesn't matter much, as a package would only be removed from
> the repository once it's not listed on any of the lists. There were times
> when I didn't want certain packages to be removed from the repository,
> regardless of whether they were installed or not, so I used to run xxdiff on
> the packages files, so the newer ones were added.

Same problem here. Esspecially build-depends. There where a lot of
packages I only needed inside my build chroots and only for the time
of the build. So they never showed up on the mirror. Then I just
resized the mirror partition and mirrored all debs.

> In my way of thinking, I'm not looking to merge upstream repositories together
> in one repository. Besides, there are already tools, such as apt-move that
> would be better for this job. Long ago, apt-move was the primary tool that I
> used to keep a local repository, and it worked pretty well, as long as all
> the machines that were using it were on the same release.
>
> I have found that reprepro is the absolute best tool for maintaining a debian
> mirror. The only problem I have with it is when I want to maintain a partial
> mirror, and I don't want a merged repository, is that I have to spread the
> packages lists to different places, and when you start adding machines, you
> start adding more lists to the configuration, when it would probably be
> better to maintain a set of "master" lists that are generated from the many
> lists that come from the machines.

Or have a proxy that adds packages that are requested.

Joseph Rawson

unread,

Jun 19, 2009, 9:40:14 PM6/19/09

On Friday 19 June 2009 07:14:08 Bernhard R. Link wrote:

> Actually, I'm quite open to having some depedency handling in reprepro

That is interesting. I've been working on the assumption that there would
never be any dependency handling in reprepro, as I didn't consider it part of
it's function.

> and already have written some simple prototype for a related project.
> The problem is that calculating a simple cover of selected packages in
> the dependency graph is not enough:
>
> Usually the cover is not unique but the existance of alternatives in
> dependencies causes multiple solutions.

This is a problem across the board. Even aptitude seems to have problems in
automatically determining the most appropriate dependencies.

Let's use this example. Suppose you already have a system with apache2
installed, but no php yet. Next you try to install phpldapadmin, using
aptitude (from the command line). Aptitude will tell you that
libapache-mod-php5 is broken, and proceed to present some alternatives that
would resolve the dependencies.

--------------------------------------------------------------------
umeboshi@stdinstall:~$ sudo aptitude -s install phpldapadmin
Reading package lists... Done
Building dependency tree
Reading state information... Done
Reading extended state information
Initializing package states... Done
Reading task descriptions... Done
The following packages are BROKEN:
libapache-mod-php5
The following NEW packages will be installed:
php5-common{a} php5-ldap{a} phpldapadmin
0 packages upgraded, 4 newly installed, 0 to remove and 0 not upgraded.
Need to get 3821kB of archives. After unpacking 11.8MB will be used.
The following packages have unmet dependencies:
libapache-mod-php5: Depends: libdb4.4 which is a virtual package.
Depends: apache-common (>= 1.3.34) which is a virtual
package.
Depends: php5-common (= 5.2.0-10+lenny1) but
5.2.6.dfsg.1-1+lenny3 is to be installed.
The following actions will resolve these dependencies:

Install the following packages:
libapache2-mod-php5 [5.2.6.dfsg.1-1+lenny3 (stable)]

Keep the following packages at their current version:
libapache-mod-php5 [Not Installed]

Score is 50

Accept this solution? [Y/n/q/?] n
The following actions will resolve these dependencies:

Install the following packages:
php5-cgi [5.2.6.dfsg.1-1+lenny3 (stable)]

Keep the following packages at their current version:
libapache-mod-php5 [Not Installed]

Score is 50

Accept this solution? [Y/n/q/?] n
The following actions will resolve these dependencies:

Install the following packages:
libapache2-mod-php5 [5.2.6.dfsg.1-1+lenny2 (stable)]
php5-common [5.2.6.dfsg.1-1+lenny2 (stable)]
php5-ldap [5.2.6.dfsg.1-1+lenny2 (stable)]

Keep the following packages at their current version:
libapache-mod-php5 [Not Installed]

Score is -30

--------------------------------------------------------------------
etc, etc, etc .....

apt-get, on the other hand, seems to use the first dependency that's listed as
an alternative.

Here, since we already have apache2 on the system, libapache2-mod-php5 is
chosen (I'm guessing because it's the first one listed).

> For an initial checkout that
> is no problem, as one can choose one some set by some pseudo-random
> selection (like "packages with alphabetically lower names get the
> first depedency in an alternative tried first" and similar things
> for virtual packages).

I think that it should be up to the maintainer of the local mirror to
explicitly list the alternatives that are preferred. I don't think that
there is anyway that an automatic dependency resolver will ever be able to do
this. The automatic dependency resolver can make this easier by marking
those dependencies as "automatically selected, alternative available" or
something similar. One of the nice things about germinate, is that it has
a "why" column in it's output that tells why a package was selected (although
it doesn't make it clear that it's one of many alternatives).

> The problem is that no such criterion can be
> stable against changes in the partially mirrored distribution.

I'm not sure what you mean here. Are you talking about an alternative that's
selected for the local mirror, but removed from the official mirror?

>
> So in this cases knowing what packages upstream has and what packages
> are wanted is not enough but one has to take into account what packages
> are currently selected. And a simply covering no longer is enough but
> one needs a full resolver knowing which installed states can be easily
> brought to which other installed states. (and things get even more
> complicated if the currently mirrored packages allow multiple subsets
> which clients using this repository might have installed)...
>

I used to have to keep outdated libraries in my filter list when I was using a
partial sid mirror, as some packages would become uninstallable without them.
I've learned over the course of years that you can't run from a snapshot of
sid, but rather have to use it for a few months to get the dependencies to
work out, even though many of those dependencies have changed versions in the
official repository.
But really, that last paragraph is me trying to understand what you were
saying. You went a bit above my head, and I'm having trouble following you.

> Hochachtungsvoll,
> Bernhard R. Link

--
Thanks:
Joseph Rawson

signature.asc

Tzafrir Cohen

unread,

Jun 19, 2009, 10:00:13 PM6/19/09

On Fri, Jun 19, 2009 at 06:23:08AM -0500, Joseph Rawson wrote:
> On Friday 19 June 2009 05:09:31 Tzafrir Cohen wrote:
> > On Fri, Jun 19, 2009 at 01:52:43AM -0500, Joseph Rawson wrote:
> > > would be much more interested in making a tool that would make it easier
> > > to manage local/partial debian mirrors (i.e. one that helped resolve the
> > > dependencies), rather than have an apt-get wrapper. I also think that
> > > once such a tool is made, it would make it easier to build an apt-get
> > > wrapper that works with it. I don't think that viewing the problem with
> > > an "apt-get wrapper" solution is the best way to approach it, but I do
> > > think that it would be valuable once the underlying problems are solved.
> >
> > And reprepro does not fit the bill because?
> >
> It fits part of the bill, as it's an excellent tool for maintaining a
> repository, but it doesn't resolve dependencies (nor should it).

Just in case it might help, here's a script we used internally (at the
Sarge time) to maintain a dummy repository that would help us eventually
resolve an original list of packages to a complete list of packages we
ask a reprepro source to update.

Tzafrir Cohen

unread,

Jun 19, 2009, 10:00:16 PM6/19/09

While it's a good queastion, the interface I'm used to use is apt-get /
aptitude. Thus the interface I had in mind is "a list of packages to
install" (in a single installation). Using some tweaking this allows you
to get exactly what you want.

If you want your repository to include conflicting options, you should
allow the interface to include multiple such entries. In our case we had
multiple files. Each file was a list of packages, and each file was
basically a "single apt-get command".

Joseph Rawson

unread,

Jun 19, 2009, 10:20:11 PM6/19/09

On Friday 19 June 2009 20:54:28 Tzafrir Cohen wrote:
> On Fri, Jun 19, 2009 at 06:23:08AM -0500, Joseph Rawson wrote:
> > On Friday 19 June 2009 05:09:31 Tzafrir Cohen wrote:
> > > On Fri, Jun 19, 2009 at 01:52:43AM -0500, Joseph Rawson wrote:
> > > > would be much more interested in making a tool that would make it
> > > > easier to manage local/partial debian mirrors (i.e. one that helped
> > > > resolve the dependencies), rather than have an apt-get wrapper. I
> > > > also think that once such a tool is made, it would make it easier to
> > > > build an apt-get wrapper that works with it. I don't think that
> > > > viewing the problem with an "apt-get wrapper" solution is the best
> > > > way to approach it, but I do think that it would be valuable once the
> > > > underlying problems are solved.
> > >
> > > And reprepro does not fit the bill because?
> >
> > It fits part of the bill, as it's an excellent tool for maintaining a
> > repository, but it doesn't resolve dependencies (nor should it).
>
> Just in case it might help, here's a script we used internally (at the
> Sarge time) to maintain a dummy repository that would help us eventually
> resolve an original list of packages to a complete list of packages we
> ask a reprepro source to update.
>

Did you forget to attach it? :)

--
Thanks:
Joseph Rawson

signature.asc

Joseph Rawson

unread,

Jun 19, 2009, 10:20:10 PM6/19/09

That was my ultimate solution to the problem. I bought one of the new
terabyte usb external drives and just mirrored the whole repository. I had
been satisfied to just call the problem solved at that point, but this thread
resparked my interest in obtaining a better solution. Before I bought the
hard drive, I was seriously looking into getting germinate and reprepro
working together, but once I bought the drive, I just set it all aside.
Still, this external drive isn't portable, and my small portable drive is
only 80G (which is more than enough for a partial mirror of source, i386, and
amd64), so I do still need to solve the problem. Besides, a month after I
bought the drive, I discovered that I have a monthly cap on my transfers so
it would be better, all around, to stop mirroring the complete repository.

> > In my way of thinking, I'm not looking to merge upstream repositories
> > together in one repository. Besides, there are already tools, such as
> > apt-move that would be better for this job. Long ago, apt-move was the
> > primary tool that I used to keep a local repository, and it worked pretty
> > well, as long as all the machines that were using it were on the same
> > release.
> >
> > I have found that reprepro is the absolute best tool for maintaining a
> > debian mirror. The only problem I have with it is when I want to
> > maintain a partial mirror, and I don't want a merged repository, is that
> > I have to spread the packages lists to different places, and when you
> > start adding machines, you start adding more lists to the configuration,
> > when it would probably be better to maintain a set of "master" lists that
> > are generated from the many lists that come from the machines.
>
> Or have a proxy that adds packages that are requested.

When I woke up this morning, I was thinking that it might be interesting to
have an apt method that talks directly to reprepro. It's just a vague idea
now, but I'll give it some more thought later.

signature.asc

Tzafrir Cohen

unread,

Jun 19, 2009, 11:00:14 PM6/19/09

Actually attaching the file this time...

apter

Goswin von Brederlow

unread,

Jun 20, 2009, 4:20:12 AM6/20/09

Joseph Rawson <umeb...@gmail.com> writes:

Way too much latency to mirror a deb when requested and you need to
run apt-get update for it to show up.

The best you can do is add the package to the filter list and then
fetch it directly. Then the next night the mirror will pick it up for
future updates.

But now you made me think about this too. So here is what I think:

- My bandwidth at home is fast enough to fetch packages directly. No
need to mirror at all.

- I don't want to download a package multiple times (once per host) so
some shared proxy would be good.

- Bootstraping a chroot still benefits from local packages but a
shared proxy would do there too.

- When I'm not at home I might not have network access or only a slow
one so then I need a mirror. And my parents computer has a Linux that
only I use and that needs a major update every time I vistit.

So the ideal setup would be an apt proxy that stores the packages in
the normal pool structure and has a simple command to create
Packages.gz, Sources.gz, Release and Release.gpg files so the cache
directory can be copied onto a USB disk and used as a repository of
its own.

Optional the apt proxy could prefetch package versions but for me that
wouldn't be a high priority.

Nice would be that it fetches sources along with binaries. When I find
a bug in some software while traveling I would hate to not have the
source available to fix it. But then it also needs to fetch
Build-depends and their depends. So that would complicate matters a
lot.

Joseph Rawson

unread,

Jun 21, 2009, 1:30:12 AM6/21/09

On Saturday 20 June 2009 03:16:33 Goswin von Brederlow wrote:
> Joseph Rawson <umeb...@gmail.com> writes:
> > On Friday 19 June 2009 12:57:25 Goswin von Brederlow wrote:
> >> Or have a proxy that adds packages that are requested.
> >
> > When I woke up this morning, I was thinking that it might be interesting
> > to have an apt method that talks directly to reprepro. It's just a vague
> > idea now, but I'll give it some more thought later.
>
> Way too much latency to mirror a deb when requested and you need to
> run apt-get update for it to show up.
>
> The best you can do is add the package to the filter list and then
> fetch it directly. Then the next night the mirror will pick it up for
> future updates.
>

What I had in mind would eliminate a large part of the latency, and also keep
from downloading the deb twice.

Use a server application (I'll call it repserve for now) on the machine that
hosts the reprepro repository.

apt-get update
The apt method talks to repserve, then repserve tells reprepro to run either
update or checkupdate, then repserve feeds the appropriate files from the
reprepro lists/ director(y/ies) back to the apt-get process on the local
machine. This would probably use a bit more bandwidth (at least for the
first update) since apt-get will download .pdiff files, where reprepro just
grabs the whole Packages.gz files.

apt-get install, upgrade, build-dep
The apt method determines which source in it's apt lists to retrieve the
package from, then sends that info to repserve. Repserve looks in it's
repositor(y/ies) to determine where those packages are (or if they aren't yet
mirrored), probably by scanning the filter lists. Repserve then tells
reprepro to update in the appropriate repositories (if necessary). Then
repserve signals the local client (or local client polls repserve), and the
debs are then transferred from reprepro repos to local client. After that,
the repserve process could instruct reprepro to retrieve the sources, if it's
configured to do that. Also, it could try and determine build deps for those
packages, and retrieve them and the sources, if it's configured to do that as
well. With retrieving builddeps enabled, there might be a problem in having
to explicitly list preferred alternatives, but this is mainly for packages
that have drop-in replacements for libfoo-dev, like libgamin-dev provides
libfam-dev.

This is still just a rough idea. One of the interesting things about using an
idea like this, is that it can still allow reprepro to be used in the normal
way, so you can have a couple of machines that instruct repserve to help
maintain the repository, and other machines on the network can just use
reprepro directly through apache, ftp, etc. The "controlling" machines would
have a sources.list like:

deb repserve://myhost/debrepos/debian lenny main contrib non-free

The repserve method on the client would send that line to the repserve server.
The server would parse the line and match it to the appropriate repository
from its configuration.

The other hosts would just have this in sources.list:

deb http://myhost/debrepos/debian lenny main contrib non-free

The hosts using repserve could be the only ones with filter lists in reprepro,
but it may be desired to have filter lists from the other machines, also.
This would help keep packages from disappearing from the pool when they are
still needed. It may also be nice to use reprepro's snapshotting each time a
repserve method updates a repository, although this may require using those
snapshot urls on the hosts that aren't using repserve.

>
> But now you made me think about this too. So here is what I think:
>
> - My bandwidth at home is fast enough to fetch packages directly. No
> need to mirror at all.
>
> - I don't want to download a package multiple times (once per host) so
> some shared proxy would be good.
>

My idea would keep that from happening, at the expense of latency. The
latency would be minimal, as it would just be dependant on reprepro
retrieving the package(s) and signalling the client that the package is
ready. Using reprepro to add extra packages to the repository from upstream
without doing a full update may not be possible, but if it were, the latency
would certainly be minimum, and the bandwidth to the internet would also be
minimum. I just looked at the manpage again, and this may be possible by
using the --nolistsdownload option with the update/checkupdate command.

> - Bootstraping a chroot still benefits from local packages but a
> shared proxy would do there too.
>
> - When I'm not at home I might not have network access or only a slow
> one so then I need a mirror. And my parents computer has a Linux that
> only I use and that needs a major update every time I vistit.
>
> So the ideal setup would be an apt proxy that stores the packages in
> the normal pool structure and has a simple command to create
> Packages.gz, Sources.gz, Release and Release.gpg files so the cache
> directory can be copied onto a USB disk and used as a repository of
> its own.
>

Getting reprepro to do this would save a lot of the hassle, but getting
reprepro to act as an apt proxy is also tricky. The current cache and proxy
methods in the apt-proxy and apt-cache packages don't work as well in making
a good repository, as opposed to reprepro.

The Release could be signed using an rsign method with the machine(s) that
manage the repository, or it could be done locally on the server using
gpg-agent, or an unencrypted private key, depending on how the administrator
prefers to manage it.

> Optional the apt proxy could prefetch package versions but for me that
> wouldn't be a high priority.
>
> Nice would be that it fetches sources along with binaries. When I find
> a bug in some software while traveling I would hate to not have the
> source available to fix it. But then it also needs to fetch
> Build-depends and their depends. So that would complicate matters a
> lot.

I mentioned that part above.
>
> MfG
> Goswin

Overall, I think that reprepro does a good job of maintaining a local
repository, and we shouldn't reimplement what it does. Reprepro also seems
flexible enough to implement most of the backend with simple commands and
options. I've never tried to implement a new apt-method before, so I think
that would take a bit more research from me.

My uses:

- I have an automated installer that I test and improve frequently. Using a
local mirror is a requirement for this. A partial mirror would help to keep
me from using as much space, and keep from downloading packages I'll never
use.

- I've been using full mirrors, but I need a partial mirror that I can carry
with me, so I can use the installer on location, instead of having to bring a
machine back with me.

- I have a mirror of lenny-backports (source only). When I need to backport a
package, I install a builder machine (using the automated installer) with
virtualbox, and send a .dsc from that mirror to the builder machine using
cowpoke, then send the package to the local repository (in this case,
separate from the source mirror, where the packages are set for auto-install,
instead of having to use the -t option in apt). It's also separate, since
there are a few packages from sid in there as well, that aren't at
backports.org.

--
Thanks:
Joseph Rawson

signature.asc

Goswin von Brederlow

unread,

Jun 21, 2009, 4:40:09 AM6/21/09

Joseph Rawson <umeb...@gmail.com> writes:

The simplest implementation would be a tiny proxy applet that, when a
deb file is requested, checks if the file is in the local
archive. If it is then send it. If not then request file from
upstream and pipe it to apt (no latency) and a tempfile. When the
download has finished then reprepro --include suite deb. Doing the
same for source is a little more tricky as you needs the dsc and
related files as a group.

>> Optional the apt proxy could prefetch package versions but for me that
>> wouldn't be a high priority.
>>
>> Nice would be that it fetches sources along with binaries. When I find
>> a bug in some software while traveling I would hate to not have the
>> source available to fix it. But then it also needs to fetch
>> Build-depends and their depends. So that would complicate matters a
>> lot.
> I mentioned that part above.
>>
>> MfG
>> Goswin
>
> Overall, I think that reprepro does a good job of maintaining a local
> repository, and we shouldn't reimplement what it does. Reprepro also seems
> flexible enough to implement most of the backend with simple commands and
> options. I've never tried to implement a new apt-method before, so I think
> that would take a bit more research from me.

I totally agree that reprepro as the cache/storage backend would be
great use of existing software.

The problem I have with it being an apt method is that the apt method
runs on a different host than the reprepro. That would require ssh
logins from all participating clients or something to alter the
reprepro filter.

Joseph Rawson

unread,

Jun 25, 2009, 6:50:15 AM6/25/09

On Sunday 21 June 2009 03:33:33 Goswin von Brederlow wrote:
<snip>

> > The Release could be signed using an rsign method with the machine(s)
> > that manage the repository, or it could be done locally on the server
> > using gpg-agent, or an unencrypted private key, depending on how the
> > administrator prefers to manage it.
>
> The simplest implementation would be a tiny proxy applet that, when a
> deb file is requested, checks if the file is in the local
> archive. If it is then send it. If not then request file from
> upstream and pipe it to apt (no latency) and a tempfile. When the
> download has finished then reprepro --include suite deb. Doing the
> same for source is a little more tricky as you needs the dsc and
> related files as a group.
>

I don't understand the tempfile part. Otherwise, that's a better idea, since
my idea depended on running reprepro update, then sending the appropriate
debs.

> >> Optional the apt proxy could prefetch package versions but for me that
> >> wouldn't be a high priority.
> >>
> >> Nice would be that it fetches sources along with binaries. When I find
> >> a bug in some software while traveling I would hate to not have the
> >> source available to fix it. But then it also needs to fetch
> >> Build-depends and their depends. So that would complicate matters a
> >> lot.
> >
> > I mentioned that part above.
> >
> >> MfG
> >> Goswin
> >
> > Overall, I think that reprepro does a good job of maintaining a local
> > repository, and we shouldn't reimplement what it does. Reprepro also
> > seems flexible enough to implement most of the backend with simple
> > commands and options. I've never tried to implement a new apt-method
> > before, so I think that would take a bit more research from me.
>
> I totally agree that reprepro as the cache/storage backend would be
> great use of existing software.
>

This is where I'm starting the code. Since regardless of how the partial
mirror(s) will be managed, we agree that using reprepro as the backend is the
best choice, I decided to start making a "frontend" or more
appropriately "middle-layer" for this. Making this part simple enough to use
with the most likely used configuration, while keeping the option to be
almost as flexible as reprepro is has been a quite bit of work and thought.

I have been working from the assumption that the local repository won't be a
merged repository, but will be a set of partial mirrors. By this I mean
that "debian.org" doesn't have to be merged with "backports.org",
but "sid/debian.org" may be in the same repository as "lenny/debian.org"
(although even this could be separate, even if not recommended). What I'm
saying is that I'm trying to allow either separate or merged repositories to
be used where they make the most sense.

> The problem I have with it being an apt method is that the apt method
> runs on a different host than the reprepro. That would require ssh
> logins from all participating clients or something to alter the
> reprepro filter.

I didn't stop to think about authentication, but I agree that it adds another
level of work. I took a bit of time to try and read up on how apt transport
methods work, but I didn't get very far. The only two transport methods that
are available now are https and debtorrent. Both of those are written in C,
which I'm not very good at using.

I think that I'm just going to work on the basics of controlling reprepro, and
adding/merging/removing filterlists, and when I'm satisfied that's working
properly it'll be easier to decide how to control/manage it. I think that it
will be better to work in that direction first, since it will be needed
anyway.

I have a small amount of code that I've started on. It doesn't do anything
yet, but create the distribution and updates files in the conf/
directory(ies). I also have a bit of code to help merge filterlists, but I
don't have any code that actually creates the lists and uses them in the
reprepro config. Once I figure out where to upload the code, I'll let you
know.

--
Thanks:
Joseph Rawson

signature.asc

Goswin von Brederlow

unread,

Jun 25, 2009, 7:50:11 AM6/25/09

Joseph Rawson <umeb...@gmail.com> writes:

> On Sunday 21 June 2009 03:33:33 Goswin von Brederlow wrote:
> <snip>
>> > The Release could be signed using an rsign method with the machine(s)
>> > that manage the repository, or it could be done locally on the server
>> > using gpg-agent, or an unencrypted private key, depending on how the
>> > administrator prefers to manage it.
>>
>> The simplest implementation would be a tiny proxy applet that, when a
>> deb file is requested, checks if the file is in the local
>> archive. If it is then send it. If not then request file from
>> upstream and pipe it to apt (no latency) and a tempfile. When the
>> download has finished then reprepro --include suite deb. Doing the
>> same for source is a little more tricky as you needs the dsc and
>> related files as a group.
>>
> I don't understand the tempfile part. Otherwise, that's a better idea, since
> my idea depended on running reprepro update, then sending the appropriate
> debs.

A tempfile so after download the proxy can run:
reprepro include sid foo.deb

Joseph Rawson

unread,

Jul 3, 2009, 3:00:21 PM7/3/09

On Sunday 21 June 2009 03:33:33 Goswin von Brederlow wrote:
<snip>

> > Overall, I think that reprepro does a good job of maintaining a local
> > repository, and we shouldn't reimplement what it does. Reprepro also
> > seems flexible enough to implement most of the backend with simple
> > commands and options. I've never tried to implement a new apt-method
> > before, so I think that would take a bit more research from me.
>
> I totally agree that reprepro as the cache/storage backend would be
> great use of existing software.
>
> The problem I have with it being an apt method is that the apt method
> runs on a different host than the reprepro. That would require ssh
> logins from all participating clients or something to alter the
> reprepro filter.
>
> MfG
> Goswin

I've started on writing the reprepro frontend part of the program. The
frontend isn't really complete, but I think it's a pretty good start.

I decided to make a system user "repserve" that will control reprepro. This
makes it easier to generate and use a gpg key with an unencrypted private
key. Since the key should only be readable by the repserve user, and since
the application is designed to make personal, partial mirrors, I think that
this strategy should be sufficient for how it will be mainly used. The
repserve user's home is at /var/lib/repserve

Running "repserve intialize" should create the gpg key, export the key to
/var/www/repserve/archive.gpg, and create the initial repserve.conf file. I
don't know what to do about gathering entropy so that the key can be made
more automatically.

Instead of directly configuring reprepro, I've put the main configuration in
a python config file (read by ConfigParser). I've made code that parses this
config file (~/repserve.conf), and generates the reprepro configuration from
this. This should make it easier to generate some of the more commonly used
configurations for reprepro.

I've made it so that the configuration can be filled from the contents of a
sources.list file. Every 'deb' line uses dpkg --print-architecture to
determine the arch to use. The deb-src lines use the "source" arch in
reprepro. There is an --arch option that will let you specify the arch to
use for the 'deb' entries, so it can be used like so:

repserve addsources /etc/apt/sources.list
repserve --arch=i386 addsources /etc/apt/sources.list

Sources can also be fed from stdin:
cat /etc/apt/sources.list | repserve --arch=c64 addsources

cat /etc/apt/sources.list | ssh repserve@mirror repserve addsources

I have made a simple function that tries to guess the name of the repository
from the method. So a method like http://security.debian.org/ gets the
name "security", etc. This function doesn't work all that well yet, at it
doesn't try to look at the names of the official mirrors to figure out if
it's a debian mirror, and use the name "debian" for those. If a name isn't
guessed, it tries to use "repos1", "repos2", etc. Adding additional sources
will use the same repository name for each method already contained in the
config file, so once you set a name, it should stay set. Changing the
upstream mirror in the sources.list may cause an extra repository to be made
if the mirror isn't identified in the guessing function.

For each source in the sources.list, the Release file is retrieved and parsed.
From the Release file the extra options such as Origin, Version, etc. are
used. This make a better reprepro configuration without having to manually
fill out all those fields. The release files are currently being retrieved
using urllib2, but should be using python-apt. I haven't had time to mess
with this yet, as I wanted to get other parts working first.

The reprepro configuration isn't created automatically after adding sources,
in case some of those repository names need to be changed. The reprepro
configuration is created by:

repserve reconfigure

Each unique url in the sources list defines a separate repository. Each
section in the repserve.conf file corresponds to a repository and the dist
(codename). Each repository is split between the basedir and outdir, which
makes it easier to use the outdirs in apache, or maybe ftpd (the default
outdir parent is /var/www/repserve). The basedirs are located
in /var/lib/repserve/repos-db/ .

I have started making the bare minimum code to help manage filterlists. Since
it hasn't really been decided how those lists are to be generated, and which
repository is going to use which filterlist, I'm somewhat stuck here. I've
tried to keep things flexible, so once something is decided, it should be
relatively easy to implement.

Reprepro isn't really being used yet. Only the configuration is being managed
so far, which has been the most difficult part. There is some code that
handles running reprepro, but it hasn't really been used yet. Only update
and export are handled now, but it shouldn't be too difficult get this part
going. I have been more concerned with getting reprepro configured in a way
that makes it easy to use as a backend with a simple frontend configuration.

I'm not really happy with the name "repserve", but I picked it out of the air,
because I needed to start with something. I would like to use another name,
but I can't think of one that will work. I'm open to suggestion here. I'm
also open to suggestion concerning anything that I've written above, although
some suggestions should be accompanied by a patch or example. I would really
like to gather suggestions on how to name the repositories. I think that my
guessing function is a good start, but it could use a lot of improvement.
When the guessing gets good enough, the function could raise a warning when a
name couldn't be guessed, so the user can then edit the repserve.conf to fix
the problem.

All the code is here:
svn://svn.berlios.de/paella/repserve/trunk

I would like to move the code to it's own project space, but I need to name it
something before that happens.

I have been playing around with germinate a bit, in case we want to make a
short list of manually selected packages, and use germinate to resolve the
dependencies and create the filterlists. I don't expect this part to be
working properly anytime soon.

--
Thanks:
Joseph Rawson

signature.asc

0 new messages