appeal for advice re: pip provider, multiple packages with same name, etc.

Jason Antman

Jan 14, 2014, 8:57:59 PM
to puppe...@googlegroups.com
Greetings, oh wizards of puppet internals. (and other titles and
salutations as appropriate)

I've been doing some work lately on the `pip` package provider and,
despite some assistance on my PR
(https://github.com/puppetlabs/puppet/pull/2170) and IRC, I think I've
hit a brick wall.

(Aside, for those not familiar with Python pip/virtualenv: pip is
Python's next-gen package management tool. It can work either
system-wide (installing packages under /usr/lib/python or some such
path), or by operating on packages in python virtual environments, which
are totally* isolated python environments rooted at a specific,
arbitrary directory, containing their own copy of the python interpreter
and other binaries, their own libraries/modules/packages, etc. *To
complicate things a bit, a virtualenv can optionally include system-wide
packages. I'm not sure if there's an exact parallel in other languages -
I assume it's roughly analogous to a combination of bundler and
rvm/rbenv, if they were built-in parts of ruby.)

My plan was to add virtualenv (aka virtual environment / venv) support
to the `pip` provider (this is, IMO, the most common use case of pip
these days). I got to a somewhat-working point by adding a "prefix"
parameter to the package type and, if specified, having the pip provider
call #{prefix}/bin/pip instead of the system-wide pip, which has the
effect of operating only on the virtualenv rooted at #{prefix}.
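
To illustrate the path selection described above, here is a minimal Ruby
sketch. The helper name `pip_command_for` is hypothetical and this is not
the code from PR 2170; it only shows the idea of falling back to the
system-wide pip when no prefix is given.

```ruby
# Hypothetical helper (an assumption for illustration, not Puppet's API):
# pick which pip executable to call.
def pip_command_for(prefix = nil)
  # With no prefix, fall back to whatever `pip` is found on the PATH.
  return 'pip' if prefix.nil? || prefix.empty?

  # A virtualenv rooted at prefix keeps its own pip under bin/.
  File.join(prefix, 'bin', 'pip')
end

puts pip_command_for('/venvs/staging') # prints /venvs/staging/bin/pip
puts pip_command_for                   # prints pip
```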

I got to the point that I thought I had working code (that's PR 2170
linked above, now closed) when I hit a few major bumps in the road. Or,
the road disappeared.
1) The provider prefetches. As such, if a package "foo" is installed
system-wide (i.e. under /usr/lib/python2.x/site-packages) it's seen as
installed and never passed on to the provider, so my pip-path-munging to
install in a virtualenv is never called.
2) In the case where the package isn't already present system-wide, and
execution does make it to the provider, my patch will manage it fine,
but the package can still only be managed in one virtualenv on a given
system. A composite namevar seems to be the only way to handle this, but
that doesn't seem possible for a single provider, just for a type.
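
The first problem can be modeled outside Puppet. The sketch below is plain
Ruby, not the provider API; `InstalledPackage` and both matching helpers
are assumptions made for illustration. It contrasts matching on name alone
(where a system-wide install appears to satisfy a virtualenv resource) with
matching on the pair of prefix and name.

```ruby
# Simplified stand-in for what a prefetching provider discovers on a host.
# A nil prefix means "installed system-wide".
InstalledPackage = Struct.new(:name, :prefix)

# Name-only matching: any installed package with the same name counts.
def matched_by_name(installed, wanted)
  wanted.select { |res| installed.any? { |pkg| pkg.name == res[:name] } }
end

# Composite matching: both the name and the prefix have to agree.
def matched_by_prefix_and_name(installed, wanted)
  wanted.select do |res|
    installed.any? { |pkg| pkg.name == res[:name] && pkg.prefix == res[:prefix] }
  end
end

installed = [InstalledPackage.new('foo', nil)]        # foo, system-wide only
wanted = [{ name: 'foo', prefix: '/venvs/staging' }]  # foo wanted in a venv

puts matched_by_name(installed, wanted).length            # prints 1
puts matched_by_prefix_and_name(installed, wanted).length # prints 0
```

With name-only matching the venv resource looks already installed, which
is the behaviour described in 1); the composite key is what 2) asks for.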

Given all this, and the pending discussion about tiering providers, can
anyone provide some advice on whether it's even worth pursuing this path
vs developing a python/virtualenv module to publish on the forge,
presumably which would contain an awfully-named type (pippackage)? Or,
if the former, is there something I'm missing about how types and
providers work (and the package type specifically) that would make this
easier? I work with one of the pip/virtualenv maintainers, and there's a
strong desire to have a canonical way to manage them in Puppet, whether
it be in core or a module.

Thanks in advance for any advice, guidance, or explanations of my
insanity. This is my first real attempt at type/provider development (or
ruby, for that matter).
-Jason Antman

Dustin J. Mitchell

Jan 14, 2014, 9:25:07 PM
to puppe...@googlegroups.com
I could have sworn I've said some of this before, but I'm not sure
where. So sorry if this is repetitive.

First, I think what your implementation is missing is a resource name
that's globally unique to the host. Let's say I want to install
different versions of sqlalchemy in /venvs/staging and /venvs/prod. I
need some resource name to distinguish those two resources. The
best option I can think of is to munge the virtualenv and package name
together into a string:

# The package type pins a version through ensure; it has no version attribute.
Package { provider => 'pip' }
package {
  '/venvs/staging::sqlalchemy':
    ensure => '0.8.6';
  '/venvs/prod::sqlalchemy':
    ensure => '0.7.9';
}

You could introduce a new type, or maybe shoe-horn this into the
existing type, where an unqualified package name would be installed
systemwide. You'd need to pick your delimiter carefully -- don't take
'::' as a recommendation :)
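
As a rough sketch of how such a munged title could be taken apart again,
assuming the '::' delimiter from the example (which, as noted, is not a
recommendation) and a hypothetical helper name `split_title`:

```ruby
# Delimiter taken from the example above purely for illustration.
DELIMITER = '::'

# Hypothetical helper: split "virtualenv::package" into [prefix, name].
# An unqualified title yields a nil prefix, meaning system-wide.
def split_title(title)
  # rpartition splits at the LAST delimiter, so a delimiter inside the
  # virtualenv path is tolerated, but one inside the package name is not.
  prefix, sep, name = title.rpartition(DELIMITER)
  return [nil, title] if sep.empty?

  [prefix, name]
end

p split_title('/venvs/staging::sqlalchemy') # prints ["/venvs/staging", "sqlalchemy"]
p split_title('sqlalchemy')                 # prints [nil, "sqlalchemy"]
```

The delimiter has to be something that cannot appear in a package name,
which is why it needs to be picked carefully.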

But another, entirely different approach may be to think of the
virtualenvs as the resources, with the packages just a property:

python::virtualenv {
  "/path/to/virtualenv":
    python   => "/path/to/python",
    packages => [ "package==version", "mock==0.6.0",
                  "buildbot==0.8.0" ];
}

In fact, that's exactly what the python module in PuppetAgain does[1].

More generally, I find it helpful to design puppety things from the
front back: start by asking "how do I want to use this from my other
manifests", then work from there to the best and simplest means of
implementing it.

Dustin

[1] https://wiki.mozilla.org/ReleaseEngineering/Puppet/Modules/python
/ https://github.com/mozilla/build-puppet/tree/master/modules/python

Jason Antman

Jan 14, 2014, 10:25:23 PM
to puppe...@googlegroups.com
Dustin,

Yeah, you sure have. I reached out to the puppetagain/releng list (and
maybe also puppet-users) a few months ago when I first started this
project. Your advice was probably pretty much the same, so I hope my
replies are.

On 01/14/2014 09:25 PM, Dustin J. Mitchell wrote:
> I could have sworn I've said some of this before, but I'm not sure
> where. So sorry if this is repetitive.
>
> First, I think what your implementation is missing is a resource name
> that's globally unique to the host. Let's say I want to install
> different versions of sqlalchemy in /venvs/staging and /venvs/prod. I
> need some resource name to distinguish those two resources. The
> best option I can think of is to munge the virtualenv and package name
> together into a string:
>
> Package { provider => 'pip' }
> package {
>   '/venvs/staging::sqlalchemy':
>     ensure => '0.8.6';
>   '/venvs/prod::sqlalchemy':
>     ensure => '0.7.9';
> }
This is certainly technically feasible, and simple, but I find it to
be... for lack of a better term, ugly. Putting something like this in a
manifest, let alone a forge module, makes me shiver. And, more to the
point, I'm not sure I'd stand up and recommend that it be endorsed by pypa.
>
> You could introduce a new type, or maybe shoe-horn this into the
> existing type, where an unqualified package name would be installed
> systemwide. You'd need to pick your delimiter carefully -- don't take
> '::' as a recommendation :)
Unless someone pipes up and says, "hey, you could ... ", I think a new
type in a Forge module is where this is headed. If I go down that
route, the module will only manage virtualenvs and the packages inside
them; it'll leave the python env up to something else.
>
> But another, entirely different approach may be to think of the
> virtualenvs as the resources, with the packages just a property:
>
> python::virtualenv {
>   "/path/to/virtualenv":
>     python   => "/path/to/python",
>     packages => [ "package==version", "mock==0.6.0",
>                   "buildbot==0.8.0" ];
> }
>
> In fact, that's exactly what the python module in PuppetAgain does[1].
That's actually what our current module is based on, albeit with a
handful of changes.

This works fine within a controlled environment. However, (a) it only
works if you know all the packages that belong in the venv at one time,
in one place. It doesn't work if you (not saying this is the best idea)
need multiple classes to install packages in the same venv, or if you
need to determine dependencies based on other classes. More importantly
from my point of view though, (b) it loses Puppet's native ability to
log and report on changes to individual resources. The ability to easily
see when a given package is upgraded, or what version it's at, across an
entire infrastructure is incredibly powerful. Wrapping all of that up in
one defined type loses the data that comes along with native types, and
is so wonderfully easy to extract from PuppetDB. Also, though it's a bit
of an oddball issue, (c) I currently have use cases where puppet needs
to install packages in an already-existing, non-puppet-managed virtualenv.

>
> More generally, I find it helpful to design puppety things from the
> front back: start by asking "how do I want to use this from my other
> manifests", then work from there to the best and simplest means of
> implementing it.
>
> Dustin
>
> [1] https://wiki.mozilla.org/ReleaseEngineering/Puppet/Modules/python
> / https://github.com/mozilla/build-puppet/tree/master/modules/python
>
Thanks for the feedback. The RelEng python module provided the base of
what I'm using today, and was wonderful inspiration. But I'm trying to
move to a native type, in core or a module, that will hopefully become
the canonical way to manage pip/venv through Puppet.

-Jason

Dustin J. Mitchell

Jan 15, 2014, 11:01:38 AM
to puppe...@googlegroups.com
Hah, so I was looking in the wrong mail account. I'm glad I'm not going crazy!

I agree with all of the limitations you've outlined. I don't see any
good fixes, sadly, but hopefully someone else does.

Dustin

Felix Frank

Jan 18, 2014, 11:17:04 AM
to puppe...@googlegroups.com
Hi,

fascinating discussion. I think I can add to Jason's point of view.

On 01/15/2014 04:25 AM, Jason Antman wrote:
> it loses Puppet's native ability to
> log and report on changes to individual resources. The ability to easily
> see when a given package is upgraded, or what version it's at, across an
> entire infrastructure is incredibly powerful. Wrapping all of that up in
> one defined type loses the data that comes along with native types, and
> is so wonderfully easy to extract from PuppetDB. Also, though it's a bit

Right, structuring venv support like this would be saying "no, we don't
manage packages in venvs. We manage whole venvs with all aspects".

This would encourage workflows that spawn lots of venvs, potentially up
to one dedicated venv per task. This is wasteful of course, and...

> of an oddball issue, (c) I currently have use cases where puppet needs
> to install packages in an already-existing, non-puppet-managed virtualenv.

...would be very inconvenient.

When I read this exchange, I felt reminded of [1] and its predecessor
[2]. Reviewing [2], I stumbled upon [3] and [4], which is apparently
close to what you're trying to do.

If I were to pick, I'd see [1] solved first. This might make your case
easier to implement. As far as I can see, it still won't address the
issue of the shared namevar across venvs.

Isn't this similar to installing both a dpkg/rpm package and a gem
called, say, "puppet"? Is that currently possible? Generically, this
would likely be approached by a "name" parameter that allows the
resource title to be different from the package name.

[1] https://tickets.puppetlabs.com/browse/PUP-1183
[2] http://projects.puppetlabs.com/issues/4113
[3] http://projects.puppetlabs.com/issues/18029
[4] https://tickets.puppetlabs.com/browse/PUP-1071

Andy Parker

Jan 21, 2014, 1:20:31 PM
to puppe...@googlegroups.com
On Sat, Jan 18, 2014 at 8:17 AM, Felix Frank <Felix...@alumni.tu-berlin.de> wrote:
> Hi,
>
> fascinating discussion. I think I can add to Jason's point of view.
>
> On 01/15/2014 04:25 AM, Jason Antman wrote:
> > it loses Puppet's native ability to
> > log and report on changes to individual resources. The ability to easily
> > see when a given package is upgraded, or what version it's at, across an
> > entire infrastructure is incredibly powerful. Wrapping all of that up in
> > one defined type loses the data that comes along with native types, and
> > is so wonderfully easy to extract from PuppetDB. Also, though it's a bit
>
> Right, structuring venv support like this would be saying "no, we don't
> manage packages in venvs. We manage whole venvs with all aspects".
>
> This would encourage workflows that spawn lots of venvs, potentially up
> to one dedicated venv per task. This is wasteful of course, and...

I've been watching this thread with interest. Sorry that I didn't chime in sooner.

I think that what we are hitting here is just a limitation of puppet's ability to describe systems. What I think is missing is any idea of an independent container. Within a given container everything needs to be unique, but between containers you can have duplication. Each container has certain properties that describe the container and would need to be accessible from the managed resource while puppet executes. Containers could be used to model the python virtual environments, different gem install locations, etc. I think they would also be useful for modeling different hosts where each host is a container and then you have a catalog that includes the entire infrastructure.

This is just an idea that I'm throwing out there right now.
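
To make the container idea a little more concrete, here is a toy model in
plain Ruby. Nothing like `ContainerCatalog` exists in Puppet; the class and
its methods are assumptions for illustration. It only shows the uniqueness
rule: names must be unique within a container, but may repeat across
containers.

```ruby
# Toy model (hypothetical, not a Puppet class): resources are keyed by
# the pair [container, name] instead of by name alone.
class ContainerCatalog
  def initialize
    @resources = {}
  end

  # container could be a virtualenv path, a gem install location, a host...
  def add(container, name, attrs = {})
    key = [container, name]
    # Duplicates are only rejected inside the same container.
    raise ArgumentError, "duplicate #{name} in #{container}" if @resources.key?(key)

    @resources[key] = attrs
  end

  def size
    @resources.size
  end
end

catalog = ContainerCatalog.new
catalog.add('/venvs/staging', 'sqlalchemy', ensure: '0.8.6')
catalog.add('/venvs/prod', 'sqlalchemy', ensure: '0.7.9')
puts catalog.size # prints 2
```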
 
> > of an oddball issue, (c) I currently have use cases where puppet needs
> > to install packages in an already-existing, non-puppet-managed virtualenv.
>
> ...would be very inconvenient.
>
> When I read this exchange, I felt reminded of [1] and its predecessor
> [2]. Reviewing [2], I stumbled upon [3] and [4], which is apparently
> close to what you're trying to do.
>
> If I were to pick, I'd see [1] solved first. This might make your case
> easier to implement. As far as I can see, it still won't address the
> issue of the shared namevar across venvs.
>
> Isn't this similar to installing both a dpkg/rpm package and a gem
> called, say, "puppet"? Is that currently possible? Generically, this
> would likely be approached by a "name" parameter that allows the
> resource title to be different from the package name.
>
> [1] https://tickets.puppetlabs.com/browse/PUP-1183
> [2] http://projects.puppetlabs.com/issues/4113
> [3] http://projects.puppetlabs.com/issues/18029
> [4] https://tickets.puppetlabs.com/browse/PUP-1071

--
Andrew Parker
Freenode: zaphod42
Twitter: @aparker42
Software Developer

Join us at PuppetConf 2014, September 23-24 in San Francisco

David Schmitt

Jan 22, 2014, 3:31:33 AM
to puppe...@googlegroups.com
On 21.01.2014 19:20, Andy Parker wrote:
> I think that what we are hitting here is just a limitation of puppet's
> ability to describe systems. What I think is missing is any idea of an
> independent container. Within a given container everything needs to be
> unique, but between containers you can have duplication. Each container
> has certain properties that describe the container and would need to be
> accessible from the managed resource while puppet executes. Containers
> could be used to model the python virtual environments, different gem
> install locations, etc. I think they would also be useful for modeling
> different hosts where each host is a container and then you have a
> catalog that includes the entire infrastructure.
>
> This is just an idea that I'm throwing out there right now.

Well, it's basically core support for what one would already implement
in a manifest, working around the limitation with "structured" resource
names as already described:

define virtualenv($package) {
  package { "${name}/${package}": ensure => installed }
}

Once upon a time there were some patches/talk floating around to parse
complex titles into parameters and vice versa. That was deemed way too
complex at that time, but the world seems to have caught up ;-)


Regards, David