can we reduce python-daemon's dependencies?

hun...@napofearth.com

unread, 20 Oct 2014, 19:51:59
to wa...@googlegroups.com
Hi,

I've been using WAL-E for a few months for backups and recovery. It's
great!

WAL-E does, however, pull in 20 Python dependencies.* For an infrastructure tool,
that's almost an unmanageably large number.

And thus it was a bit of a disappointment to see the new release
candidate (thanks for WALE_S3_ENDPOINT, btw!) introduce three more
dependencies through the use of python-daemon:

    python-daemon>=1.5.2
    setuptools (through python-daemon)
    lockfile>=0.7 (through python-daemon)

The dependency on setuptools is particularly pernicious, since pretty
much all our virtualenvs are using distribute.

It seems like WAL-E only uses python-daemon in one place during
operator.backup.wal_restore(). Is there any chance that could be
refactored to use one of the various vendored daemon implementations?

Or, is there any chance you all could persuade the python-daemon
maintainer to update his install_requires so that it doesn't include
setuptools?

Thanks much,

HJB

* Although the readme says "It is possible to use WAL-E without the
dependencies of back-end storage one does not use installed: the imports
for those are only performed if the storage configuration demands their
use," I've never found an operationalizable way to install only the
packages I'd need. So I end up with half the OpenStack API libraries
and Azure, just in order to pull things from S3.

The list as of 0.7.2 was:

Babel==1.3
argparse==1.2.1
azure==0.8.2
boto==2.32.1
certifi==14.05.14
gevent==1.0.1
greenlet==0.4.3
iso8601==0.1.10
lxml==3.3.6
netaddr==0.7.12
oslo.config==1.3.0
pbr==0.10.0
prettytable==0.7.2
python-keystoneclient==0.10.1
python-swiftclient==2.2.0
pytz==2014.4
requests==2.4.0
simplejson==3.6.3
six==1.7.3
stevedore==0.14.1
wal-e==0.7.2

Daniel Farina

unread, 20 Oct 2014, 20:10:47
to hun...@napofearth.com, wa...@googlegroups.com
On Mon, Oct 20, 2014 at 4:51 PM, <hun...@napofearth.com> wrote:
> Hi,
>
> I've been using WAL-E for a few months for backups and recovery. It's
> great!
>
> WAL-E does, however, pull in 20 Python dependencies.* For an infrastructure
> tool,
> that's almost an unmanageably large number.

Regrettably a ton of these are from OpenStack. As it would turn out,
WAL-E does enough lazy-loading that not having all the dependencies
for all backends is not a problem.

One person suggested making all backends optional, but I deep-sixed
this idea on the basis that, unrefined, it means that the
default-installation of WAL-E would be exactly useless.

A better suggestion that would require some work is to have other
packages like "wal-e-{s3,openstack,azure}" that would do the obvious
and not install the other junk.
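A single-repo way to approximate those split packages, sketched here with setuptools extras (the extras names and version pins are my assumption, not anything WAL-E actually ships), would keep only shared deps in install_requires and make each backend an extra, so "pip install 'wal-e[s3]'" pulls in just boto:

```python
# Hypothetical setup.py fragment: per-backend optional dependencies via
# setuptools extras_require. Backend names and pins are illustrative.
EXTRAS_REQUIRE = {
    's3': ['boto>=2.32.1'],
    'azure': ['azure>=0.8.2'],
    'openstack': [
        'python-swiftclient>=2.2.0',
        'python-keystoneclient>=0.10.1',
    ],
}

# This dict would be wired into setup() alongside the shared core deps:
#   setup(..., install_requires=['gevent>=1.0.1'],
#         extras_require=EXTRAS_REQUIRE)
```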

> And thus it was a bit of a disappointment to see the new release
> candidate (thanks for WALE_S3_ENDPOINT, btw!) introduce three more
> dependencies through the use of python-daemon:
>
> python-daemon>=1.5.2
> setuptools (through python-daemon)
> lockfile>=0.7 (through python-daemon)
>
> The dependency on setuptools is particularly pernicious, since pretty
> much all our virtualenvs are using distribute.

I think it'd be reasonable to take stewardship of the daemonization
parts to avoid this, yes. Consider that I decided to own subprocess
because of unbackpatched bug fixes :/

> It seems like WAL-E only uses python-daemon in one place during
> operator.backup.wal_restore(). Is there any chance that could be
> refactored to use one of the various vendored daemon implementations?

Yes. Do you have a recommendation of one that works very reliably?

> Or, is there any chance you all could persuade the python-daemon
> maintainer to update his install_requires so that it doesn't include
> setuptools?

Maybe? In this case, given that cutting a dependency is tractable, I'd
be okay absorbing one under my maintenance for now.

Do you have any interest in submitting a patch that does as you say
about the daemon bits?

(You might wonder why that's even there: it's for the
apparently-very-useful pipelined and parallel WAL download on restore)

Daniel Farina

unread, 21 Oct 2014, 15:13:27
to Hunter Blanks, wa...@googlegroups.com
On Mon, Oct 20, 2014 at 11:24 PM, Hunter Blanks <hun...@napofearth.com> wrote:
> Daniel,
>
> Thanks for writing! Indeed, most of the deps are OpenStack. The ones that
> aren't are basically gevent, but I think others have already touched on
> the long-term goal of using multiprocessing as an alternative.
>
> So far as limiting backend deps, I'd agree that wal-e-s3, etc. packages are
> probably the way to go, though it would take a little care to make those
> packages work out of the same repo.
>
> As for python-daemon, your reckoning may differ, but my own list of
> preferences would be:
>
> If you don't have the requirements in
> http://legacy.python.org/dev/peps/pep-3143/#correct-daemon-behaviour, then
> just use subprocess to farm out the fetches. My own reckoning, though, is
> that you must need them or else you wouldn't have gone to the trouble.
> (Maybe to prevent shared network FD's, maybe to prevent polluting stdout /
> stderr; the rationale is fairly clear, but I'm either too casual or slow a
> reader to say.)

Ah. I'll itemize the rationale:

Postgres does a system() call to run the archive fetch command, and
blocks until it is complete, then applies the fetched WAL. If one
only does parallelism, the result is that, say, 8 WAL segments will be
downloaded in parallel and then Postgres will apply them...something
that can take some time. And in that time, no downloads are
happening, which is a big loss.

So, to get around this synchronous API, it's necessary to detach from
the parent process and keep downloading even after the process invoked
by Postgres returns with its WAL segment in place. A small added cost
is that each new parent process first looks aside at the prefetch
directory to see if it can promote a segment downloaded in the
background this way.

Do something like "watch find pg_xlog/.wal-e" to see the directories
backing this dance in action while a database catches up in
"wal-fetch".
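A rough sketch of the look-aside half of that dance, with invented names (wal_fetch, prefetch_dir, fetch_segment) rather than WAL-E's actual internals:

```python
# Sketch only: illustrates the promote-from-prefetch-or-download logic
# described above, not WAL-E's real implementation.
import os
import shutil


def wal_fetch(segment_name, dest_path, prefetch_dir, fetch_segment):
    """Serve Postgres's blocking restore_command without idling downloads.

    1. If a background worker already downloaded this segment into the
       prefetch directory, promote it into place.
    2. Otherwise, download it synchronously while Postgres waits.

    A real implementation would also detach a background worker here to
    keep filling prefetch_dir with upcoming segments after this call
    returns -- that is what the daemonization is for.
    """
    prefetched = os.path.join(prefetch_dir, segment_name)
    if os.path.exists(prefetched):
        # Fast path: the background downloader got here first.
        shutil.move(prefetched, dest_path)
        return 'promoted'
    # Slow path: fetch it ourselves; Postgres blocks until we return.
    fetch_segment(segment_name, dest_path)
    return 'downloaded'
```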

> If you do require everything that is "daemonization", and you're willing to
> maintain it yourself, Alex Martelli's daemonization example is pretty
> straightforward and where I usually end up. On the one hand, it is sad that
> this stuff never has made it into the standard library. On the other,
> daemonization has just enough differences of opinion that the "one way to do
> it" may never make it in. (For a hint of all that, see the
> in-depth comments and example Alex refers to at
> http://code.activestate.com/recipes/278731/. Mr. Finney also has quite a
> good discussion in his PEP from 2009.) In the rare cases where I had to do
> such a thing, I've just worked off of Alex's example, taking the parts I
> needed to take.
>
> If you still require daemonization and don't want to write it yourself,
> daemonize seems to be a fairly similar implementation that lacks
> dependencies.
> Else, you could talk to Ben Finney about altering his install_requires and
> maybe removing the lockfile dependency.
>
> Well, sorry for the long story there. Please let me know which of those you
> find amenable. None of them are particularly hard, and I'm happy to do a
> little legwork on any of them.

Sure, pick your favorite, given that you know my basic requirement,
stated above: background processes that can run after the parent exits.

I'm a bit reticent to do this in the release candidate part of the
release cycle, but we can get on it pronto for 0.9dev and apply the
patch first before more interesting changes go in. I'd recommend,
then, using that.

Finally, I think I did things this way because one can do "apt-get
install python-daemon" and get the dependency they need on Ubuntus.

Daniel Farina

unread, 21 Oct 2014, 20:46:57
to Hunter Blanks, wa...@googlegroups.com
On Tue, Oct 21, 2014 at 4:59 PM, Hunter Blanks <hun...@napofearth.com> wrote:
>> I'm a bit reticent to do this in the release candidate part of the
>> release cycle, but we can get on it pronto for 0.9dev and apply the
>> patch first before more interesting changes go in. I'd recommend,
>> then, using that.
>>
>
> That's fine by me. I've already got a package archive containing 0.8c1 and
> its dependencies, so I don't think you should hold up the release while I
> putter this out.

I hope you stick it out and make it happen anyway :)

> That is a considerate choice, and indeed, when I first operationalized
> WAL-E, I, too, reached for the Ubuntu system packages as a means for
> installing WAL-E's deps. Sadly, some of the OpenStack dependencies didn't
> line up with what was in Ubuntu 12.04 / 14.04. So it made more sense to
> materialize all the dependencies into python packages (bdist_eggs. 2014 and
> all, but really) and just follow a similar path to how we deploy other
> Python environments.

Yeah, I totally got when the new backends started coming in that it
was going to screw up the debian-install-ability a bit fierce, but
made the call to absorb this solvable loss in favor of multi-backend.
I still think that is clearly the best for the longevity of this bit
of infrastructure.

Hunter Blanks

unread, 21 Oct 2014, 22:44:56
to Daniel Farina, wa...@googlegroups.com
Daniel,

Thanks for writing.

Ah. I'll itemize the rationale:

Postgres does a system() call to run the archive fetch command, and
blocks until it is complete, then applies the fetched WAL.  If one
only does parallelism, the result is that, say, 8 WAL segments will be
downloaded in parallel and then Postgres will apply them...something
that can take some time.  And in that time, no downloads are
happening, which is a big loss.


Indeed. When I woke up this morning, it all made sense.
 
I'm a bit reticent to do this in the release candidate part of the
release cycle, but we can get on it pronto for 0.9dev and apply the
patch first before more interesting changes go in. I'd recommend,
then, using that.


That's fine by me. I've already got a package archive containing 0.8c1 and its dependencies, so I don't think you should hold up the release while I putter this out.
 
Finally, I think I did things this way because one can do "apt-get
install python-daemon" and get the dependency they need on Ubuntus.


That is a considerate choice, and indeed, when I first operationalized WAL-E, I, too, reached for the Ubuntu system packages as a means for installing WAL-E's deps. Sadly, some of the OpenStack dependencies didn't line up with what was in Ubuntu 12.04 / 14.04. So it made more sense to materialize all the dependencies into Python packages (bdist_eggs. 2014 and all, but really) and just follow a similar path to how we deploy other Python environments.

-HJB

Hunter Blanks

unread, 21 Oct 2014, 22:44:56
to Daniel Farina, wa...@googlegroups.com
Daniel,

Thanks for writing! Indeed, most of the deps are OpenStack. The ones that aren't are basically gevent, but I think others have already touched on the long-term goal of using multiprocessing as an alternative.

So far as limiting backend deps, I'd agree that wal-e-s3, etc. packages are probably the way to go, though it would take a little care to make those packages work out of the same repo.

As for python-daemon, your reckoning may differ, but my own list of preferences would be:
  1. If you don't have the requirements in http://legacy.python.org/dev/peps/pep-3143/#correct-daemon-behaviour, then just use subprocess to farm out the fetches. My own reckoning, though, is that you must need them or else you wouldn't have gone to the trouble. (Maybe to prevent shared network FD's, maybe to prevent polluting stdout / stderr; the rationale is fairly clear, but I'm either too casual or slow a reader to say.)

  2. If you do require everything that is "daemonization", and you're willing to maintain it yourself, Alex Martelli's daemonization example is pretty straightforward and where I usually end up. On the one hand, it is sad that this stuff never has made it into the standard library. On the other, daemonization has just enough differences of opinion that the "one way to do it" may never make it in. (For a hint of all that, see the in-depth comments and example Alex refers to at http://code.activestate.com/recipes/278731/. Mr. Finney also has quite a good discussion in his PEP from 2009.) In the rare cases where I had to do such a thing, I've just worked off of Alex's example, taking the parts I needed to take.

  3. If you still require daemonization and don't want to write it yourself, daemonize seems to be a fairly similar implementation that lacks dependencies.

  4. Else, you could talk to Ben Finney about altering his install_requires and maybe removing the lockfile dependency.

Well, sorry for the long story there. Please let me know which of those you find amenable. None of them are particularly hard, and I'm happy to do a little legwork on any of them.
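For reference, the classic double-fork recipe that Alex's example builds on looks roughly like this (a generic sketch of the Stevens/Martelli technique, not WAL-E or python-daemon code):

```python
# Minimal double-fork daemonization sketch. This is the bare skeleton of
# the technique; PEP 3143 adds more (signal handling, pidfile locking,
# closing inherited file descriptors, etc.).
import os


def daemonize():
    """Detach the calling process from its controlling terminal."""
    if os.fork() > 0:
        os._exit(0)  # First parent exits; caller's shell gets control back.
    os.setsid()      # Become session leader, shedding the controlling tty.
    if os.fork() > 0:
        os._exit(0)  # First child exits; grandchild can never reacquire a tty.
    os.chdir('/')    # Don't pin any mounted filesystem.
    os.umask(0)
    # Point the standard streams at /dev/null so nothing leaks back to
    # the invoking process's stdout/stderr.
    devnull = os.open(os.devnull, os.O_RDWR)
    for fd in (0, 1, 2):
        os.dup2(devnull, fd)
```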

-HJB

Daniel Farina

unread, 29 Dec 2014, 18:21:38
to Daniel Farina, Hunter Blanks, wal-e
On Tue, Oct 21, 2014 at 5:46 PM, Daniel Farina <dan...@fdr.io> wrote:
> On Tue, Oct 21, 2014 at 4:59 PM, Hunter Blanks <hun...@napofearth.com> wrote:
>>> I'm a bit reticent to do this in the release candidate part of the
>>> release cycle, but we can get on it pronto for 0.9dev and apply the
>>> patch first before more interesting changes go in. I'd recommend,
>>> then, using that.
>>>
>>
>> That's fine by me. I've already got a package archive containing 0.8c1 and
>> its dependencies, so I don't think you should hold up the release while I
>> putter this out.
>
> I hope you stick it out and make it happen anyway :)

I made it happen, for reasons seen in the commit message. Do you
think you can help test it? I fixed some other bugs, too...a new
release candidate is around the corner.

(https://github.com/wal-e/wal-e/commit/35b19fb94310de42a08076796596f269053a99d1)

Hunter Blanks

unread, 29 Dec 2014, 19:33:36
to Daniel Farina, Daniel Farina, wal-e
Daniel,

You bet! I'm out of town until Monday but can take a look at it next week.

I left one comment on the commit. Vendoring looks like a reasonable choice to me, and https://github.com/fdr/pep3143daemon/commit/5c091da7f5a912bc35e349fa4d0249915c233813 makes sense as well.

-HJB

Daniel Farina

unread, 29 Dec 2014, 19:41:38
to Hunter Blanks, Daniel Farina, wal-e
On Mon, Dec 29, 2014 at 4:33 PM, Hunter Blanks <hun...@napofearth.com> wrote:
> Daniel,
>
> You bet! I'm out of town until Monday but can take a look at it next week.
>
> I left one comment on the commit. Vendoring looks like a reasonable choice
> to me, and
> https://github.com/fdr/pep3143daemon/commit/5c091da7f5a912bc35e349fa4d0249915c233813
> makes sense as well.

That'd be great! I'll be looking to make another release candidate
from what's current in master around that time (modulo finding bugs).
Right now I think I have all the outstanding bugs/minor patches I
wanted fixed in, so what's left is, once again, verification.