Module critique

Bai Shen

Sep 4, 2012, 11:26:07 AM
to puppet...@googlegroups.com
I've gotten an install of solr working, but it's pretty much a hack job at the moment.  If y'all could give me your thoughts on how to improve my setup, I'd appreciate it.  apache-tomcat is an rpm of Tomcat 7 that references the oracle jdk instead of openjdk.

Thanks.


class solr {
        service { 'iptables' :
                ensure => stopped,
        }

        file { '/opt/apache-tomcat/conf/Catalina':
                ensure => directory,
        }

        file { '/opt/apache-tomcat/conf/Catalina/localhost':
                ensure => directory,
        }

        file { '/opt/apache-tomcat/conf/Catalina/localhost/solr.xml':
                source => 'puppet:///modules/solr/solr.xml',
                owner => 'tomcat',
                group => 'tomcat',
                mode => '644',
                notify => Service['apache-tomcat'],
                require => Package['apache-tomcat'],
        }

        file { '/opt/apache-tomcat/conf/server.xml':
                source => 'puppet:///modules/solr/server.xml',
                owner => 'tomcat',
                group => 'tomcat',
                mode => '644',
                notify => Service['apache-tomcat'],
                require => Package['apache-tomcat'],
        }

        file { '/opt/solr':
                ensure => directory,
                recurse => true,
                purge => true,
                source => 'puppet:///modules/solr/solr',
                owner => 'tomcat',
                group => 'tomcat',
                mode => '644',
                notify => Service['apache-tomcat'],
                require => Package['apache-tomcat'],
        }

        file { '/opt/solr/solr.war':
                ensure => 'link',
                target => '/opt/solr/apache-solr-3.6.1.war',
        }

        file { '/solr':
                ensure => directory,
                owner => 'tomcat',
                group => 'tomcat'
        }
}

jcbollinger

Sep 5, 2012, 9:48:27 AM
to puppet...@googlegroups.com


On Tuesday, September 4, 2012 10:26:14 AM UTC-5, Bai Shen wrote:
I've gotten an install of solr working, but it's pretty much a hack job at the moment.  If y'all could give me your thoughts on how to improve my setup, I'd appreciate it.  apache-tomcat is an rpm of Tomcat 7 that references the oracle jdk instead of openjdk.

[...]


class solr {


Since Package['apache-tomcat'] is apparently declared in a different class, you should 'include' that class here.
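
For illustration, a minimal sketch of that advice. The class name `tomcat` is hypothetical; the thread never says which class declares Package['apache-tomcat'].

```puppet
class solr {
  # Hypothetical class name: substitute whichever class actually declares
  # Package['apache-tomcat'] (and, presumably, Service['apache-tomcat']).
  include tomcat

  # ... the existing file resources follow here; their references to
  # Package['apache-tomcat'] and Service['apache-tomcat'] now resolve
  # whenever class solr is assigned to a node.
}
```

Since 'include' is idempotent, it does no harm if the node also includes the same class elsewhere.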



        service { 'iptables' :
                ensure => stopped,
        }


It's not a Puppet problem that your class stops iptables, but I sure find it questionable in a broader sense.  If you're turning it off because you have a different local firewall installed (or because you have no IPv4 configured), then it would be more appropriate to manage that somewhere else.  On the other hand, if you're turning it off because it interferes with SOLR, then you should address the problem by adding the appropriate firewall rules, not by shutting down your firewall.
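
As a rough sketch of the "add the appropriate firewall rules" option, assuming the puppetlabs-firewall module is installed (it is not mentioned in the thread) and that Tomcat listens on its default HTTP port 8080 (also an assumption about this setup):

```puppet
# Assumes the puppetlabs-firewall module. Port 8080 is an assumption;
# use whatever connector port server.xml actually configures.
firewall { '100 allow tomcat for solr':
  proto  => 'tcp',
  dport  => 8080,
  action => 'accept',
}
```

Parameter names differ between releases of that module (older releases use `action`, recent ones express the same thing with `jump`), so check the documentation for the installed version before relying on this.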
 

        file { '/opt/apache-tomcat/conf/Catalina':
                ensure => directory,
        }

        file { '/opt/apache-tomcat/conf/Catalina/localhost':
                ensure => directory,
        }


Supposing that directory /opt/apache-tomcat/conf belongs to Package['apache-tomcat'], File['/opt/apache-tomcat/conf/Catalina'] should 'require' that package or the class that declares it.  The */localhost file will automatically require the File managing its parent directory, however, so you don't need an explicit relationship there.
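
A minimal sketch of the relationships described above, using the resource titles from the original post:

```puppet
file { '/opt/apache-tomcat/conf/Catalina':
  ensure  => directory,
  # Explicit relationship: the parent directory comes from the package,
  # not from a File resource, so nothing is autorequired here.
  require => Package['apache-tomcat'],
}

file { '/opt/apache-tomcat/conf/Catalina/localhost':
  ensure => directory,
  # No 'require' needed: Puppet autorequires the File resource that
  # manages the parent directory.
}
```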

Does neither of those directories belong to the RPM, though?  If they do, then you're gaining nothing but complexity and cycle burn by declaring them as you do above.
 


I strongly recommend that you build a native package for SOLR, put it in a local repository, and ensure it is installed via a Puppet Package resource.  Recursive directory management will bite you, especially if there are many files or large ones, plus using packages is in general a major win.  You can package up your custom SOLR configuration files along with it, or manage just those via File resources as you are now doing; either is fine.
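
A sketch of what the package-based approach might look like. The package name `solr` is hypothetical, and the node is assumed to be already configured to use the local repository that serves it.

```puppet
# 'solr' is a hypothetical name for a locally built RPM.
package { 'solr':
  ensure  => installed,
  require => Package['apache-tomcat'],
}

# Site-specific configuration can still be shipped as an individual file,
# replacing the recursive management of /opt/solr.
file { '/opt/apache-tomcat/conf/Catalina/localhost/solr.xml':
  source  => 'puppet:///modules/solr/solr.xml',
  owner   => 'tomcat',
  group   => 'tomcat',
  mode    => '0644',
  require => Package['solr'],
  notify  => Service['apache-tomcat'],
}
```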
 

        file { '/solr':
                ensure => directory,
                owner => 'tomcat',
                group => 'tomcat'
        }


No software should require its own subdirectory of the filesystem root.  I'm not sure what that directory is for, but the appropriate place for it is likely to be under one of /var/lib, /usr/share, or /opt/solr.


John

Bai Shen

Sep 6, 2012, 1:08:50 PM
to puppet...@googlegroups.com
Thanks for the comments.  My responses are below.

 
class solr {


Since Package['apache-tomcat'] is apparently declared in a different class, you should 'include' that class here.


Will do.  I hadn't thought about that.

 

        service { 'iptables' :
                ensure => stopped,
        }


It's not a Puppet problem that your class stops iptables, but I sure find it questionable in a broader sense.  If you're turning it off because you have a different local firewall installed (or because you have no IPv4 configured), then it would be more appropriate to manage that somewhere else.  On the other hand, if you're turning it off because it interferes with SOLR, then you should address the problem by adding the appropriate firewall rules, not by shutting down your firewall.

I did that because I was testing.  I haven't messed with iptables, so it was easier just to turn it off while I was getting things to work.  But you are correct, I need to add the appropriate rules to it and turn it back on.
 

        file { '/opt/apache-tomcat/conf/Catalina':
                ensure => directory,
        }

        file { '/opt/apache-tomcat/conf/Catalina/localhost':
                ensure => directory,
        }


Supposing that directory /opt/apache-tomcat/conf belongs to Package['apache-tomcat'], File['/opt/apache-tomcat/conf/Catalina'] should 'require' that package or the class that declares it.  The */localhost file will automatically require the File managing its parent directory, however, so you don't need an explicit relationship there.


When I tried it with just the second entry, it gave me an error saying that the parent didn't exist.  According to some of the people on the irc, you have to declare the directories like this in order to have them created.
 
Does neither of those directories belong to the RPM, though?  If they do, then you're gaining nothing but complexity and cycle burn by declaring them as you do above.

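I'm not sure what you mean by this.

I strongly recommend that you build a native package for SOLR, put it in a local repository, and ensure it is installed via a Puppet Package resource.  Recursive directory management will bite you, especially if there are many files or large ones, plus using packages is in general a major win.  You can package up your custom SOLR configuration files along with it, or manage just those via File resources as you are now doing; either is fine.

I'll take a look at that.  I'm not really familiar with how to build rpms yet.  The tomcat one was built using a file I found online.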

 

        file { '/solr':
                ensure => directory,
                owner => 'tomcat',
                group => 'tomcat'
        }


No software should require its own subdirectory of the filesystem root.  I'm not sure what that directory is for, but the appropriate place for it is likely to be under one of /var/lib, /usr/share, or /opt/solr.

SOLR is a search engine.  It requires a data directory to store its index files.  Since they can grow quite big, I was putting them on / and making it the largest partition.  What would be the reasoning behind putting it in each of the locations you mentioned?  I'm still learning the whys of how things are laid out on the filesystem.

Garrett Honeycutt

Sep 6, 2012, 2:41:19 PM
to puppet...@googlegroups.com
On 9/6/12 7:08 PM, Bai Shen wrote:
> SOLR is a search engine. It requires a data directory to store its
> index files. Since they can grow quite big, I was putting them on / and
> making it the largest partition. What would be the reasoning behind
> putting it in each of the locations you mentioned? I'm still learning
> the whys of how things are laid out on the filesystem.

If you are really interested in filesystem layout or need a sleeping
aid, check out the FHS.

http://www.pathname.com/fhs/pub/fhs-2.3.html#PURPOSE

-g
--
Garrett Honeycutt

206.414.8658
http://puppetlabs.com

Andreas Ntaflos

Sep 6, 2012, 5:48:31 PM
to puppet...@googlegroups.com
On 2012-09-04 17:26, Bai Shen wrote:
> I've gotten an install of solr working, but it's pretty much a hack job
> at the moment. If y'all could give me your thoughts on how to improve
> my setup, I'd appreciate it. apache-tomcat is an rpm of Tomcat 7 that
> references the oracle jdk instead of openjdk.

I don't know Solr so I am not exactly sure how it is set up and run, so
take the following only as guidelines, not as explicit instructions. It
seems Solr requires Tomcat, so you could either implicitly manage Tomcat
within your Solr module (bad), or have a separate Tomcat module with
which your Solr module can interface (good). You would then bring the
modules together in a separate profile or role class. Regarding that,
have a look at this insightful blog post by Craig Dunn:
http://www.craigdunn.org/2012/05/239/

> class solr {

I recommend you follow best practices and not manage every aspect of
your Solr resource in a single class, but split it up into subclasses,
probably at least: solr::install (install.pp), solr::config (config.pp),
solr::service (service.pp). The solr class (init.pp) then includes all
subclasses and explicitly declares their dependencies among each other,
like so:

class solr {
  include 'solr::install'
  include 'solr::config'
  include 'solr::service'

  Class['solr::install']
  -> Class['solr::config']
  ~> Class['solr::service']
}

This makes it easier to manage and change later on.
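
As an illustration of one such subclass, here is a sketch of what solr::config might contain. The file layout is an assumption; only the resource itself comes from the original post.

```puppet
# manifests/config.pp (hypothetical layout)
class solr::config {
  file { '/opt/apache-tomcat/conf/Catalina/localhost/solr.xml':
    source => 'puppet:///modules/solr/solr.xml',
    owner  => 'tomcat',
    group  => 'tomcat',
    mode   => '0644',
    # No 'require' or 'notify' here: ordering and refresh come from the
    # class-level chain declared in init.pp, which applies to every
    # resource declared in this class.
  }
}
```

The trade-off is that the relationships are no longer visible on the resource itself; they have to be looked up in the solr class.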

> service { 'iptables' :
> ensure => stopped,
> }

This is strange, as John has noted. Why should Solr have anything to say
about the iptables service on the machine? At most it should interface
with an iptables module to, e.g., open or close ports.

> file { '/opt/apache-tomcat/conf/Catalina':
> ensure => directory,
> }
>
> file { '/opt/apache-tomcat/conf/Catalina/localhost':
> ensure => directory,
> }

These directories are fairly standard for Tomcat, are they not? Thus
they should be created by your apache-tomcat package, not managed
explicitly by Puppet.

> file { '/opt/apache-tomcat/conf/Catalina/localhost/solr.xml':
> source => 'puppet:///modules/solr/solr.xml',
> owner => 'tomcat',
> group => 'tomcat',
> mode => '644',
> notify => Service['apache-tomcat'],
> require => Package['apache-tomcat'],
> }

This would go into solr::config.

It is better, IMHO, to use explicit dependencies here as well, i.e. not
"notify" and "require", but something like this:

Package['apache-tomcat']
-> File['/opt/apache-tomcat/conf/Catalina/localhost/solr.xml']
~> Service['apache-tomcat']

I guess that depends on your preferences.

But that points out the problem that your module mixes quite a few
resources, but not wholly implicitly. Again I refer you to the blog post
above.

> file { '/opt/apache-tomcat/conf/server.xml':
> source => 'puppet:///modules/solr/server.xml',
> owner => 'tomcat',
> group => 'tomcat',
> mode => '644',
> notify => Service['apache-tomcat'],
> require => Package['apache-tomcat'],
> }

Also something that should go into a config class. Either solr::config
or apache-tomcat::config.

> file { '/opt/solr':
> ensure => directory,
> recurse => true,
> purge => true,
> source => 'puppet:///modules/solr/solr',
> owner => 'tomcat',
> group => 'tomcat',
> mode => '644',
> notify => Service['apache-tomcat'],
> require => Package['apache-tomcat'],
> }
> file { '/opt/solr/solr.war':
> ensure => 'link',
> target => '/opt/solr/apache-solr-3.6.1.war',
> }
>
> file { '/solr':
> ensure => directory,
> owner => 'tomcat',
> group => 'tomcat'
> }
> }

Again solr::config. Also, see above.

Well-designed modules are rare and take time and experience to create. I
myself have written around 60 modules over the past nine months but I
would never dare publish any of them on Puppetforge or even Github. It's
hard to make a module of publishable quality (i.e. one that can be used
by someone else without having to look at or change the code).

Andreas

Ryan Coleman

Sep 7, 2012, 1:07:26 AM
to puppet...@googlegroups.com
On Thu, Sep 6, 2012 at 2:48 PM, Andreas Ntaflos <da...@pseudoterminal.org> wrote:
> Well-designed modules are rare and take time and experience to create. I
> myself have written around 60 modules over the past nine months but I
> would never dare publish any of them on Puppetforge or even Github. It's
> hard to make a module of publishable quality (i.e. one that can be used
> by someone else without having to look at or change the code).

All the more reason to get your modules on GitHub and the Puppet
Forge! It's far easier to get to this point through collaboration with
others in the community. No one expects your modules to be perfect,
but many will help you iterate on them till they're awesome.

If you'd like help with a module or would like to pair, send me an email! :-)

jcbollinger

Sep 7, 2012, 9:48:00 AM
to puppet...@googlegroups.com


On Thursday, September 6, 2012 4:48:40 PM UTC-5, Andreas Ntaflos wrote:
On 2012-09-04 17:26, Bai Shen wrote:
> I've gotten an install of solr working, but it's pretty much a hack job
> at the moment.  If y'all could give me your thoughts on how to improve
> my setup, I'd appreciate it.  apache-tomcat is an rpm of Tomcat 7 that
> references the oracle jdk instead of openjdk.

[...] 
>         file { '/opt/apache-tomcat/conf/Catalina/localhost/solr.xml':
>                 source => 'puppet:///modules/solr/solr.xml',
>                 owner => 'tomcat',
>                 group => 'tomcat',
>                 mode => '644',
>                 notify => Service['apache-tomcat'],
>                 require => Package['apache-tomcat'],
>         }

This would go into solr::config.

It is better, IMHO, to use explicit dependencies here as well, i.e. not
"notify" and "require", but something like this:

Package['apache-tomcat']
-> File['/opt/apache-tomcat/conf/Catalina/localhost/solr.xml']
~> Service['apache-tomcat']

I guess that depends on your preferences.


Indeed, I would place that firmly in the category of style.  Myself, I tend to try to judge whether the relationship is inherent in the nature of the resource (which is usually the case for me) vs. whether it's relevant only to some larger context.  I use resource parameters in the former case.  Admittedly, that's a pretty fuzzy distinction, but in the example above I personally would write the relationships just as the OP did.


John

jcbollinger

Sep 7, 2012, 10:21:43 AM
to puppet...@googlegroups.com


On Thursday, September 6, 2012 12:08:56 PM UTC-5, Bai Shen wrote:

        file { '/opt/apache-tomcat/conf/Catalina':
                ensure => directory,
        }

        file { '/opt/apache-tomcat/conf/Catalina/localhost':
                ensure => directory,
        }


Supposing that directory /opt/apache-tomcat/conf belongs to Package['apache-tomcat'], File['/opt/apache-tomcat/conf/Catalina'] should 'require' that package or the class that declares it.  The */localhost file will automatically require the File managing its parent directory, however, so you don't need an explicit relationship there.


When I tried it with just the second entry, it gave me an error saying that the parent didn't exist.  According to some of the people on the irc, you have to declare the directories like this in order to have them created.


You misunderstood me.  I was speaking there not about whether the parent directory's File resource was needed, but rather about whether a relationship was needed between the subdirectory and the parent directory (it isn't, so you're fine in that respect).

 
 
Does neither of those directories belong to the RPM, though?  If they do, then you're gaining nothing but complexity and cycle burn by declaring them as you do above.

I'm not sure what you mean by this.


I mean that the apache-tomcat package probably ought to be responsible at least for directory /opt/apache-tomcat/conf/Catalina, and maybe for both of those directories.  If it provides them, then you should rely on the package to get it right instead of managing those directories via Puppet as well.

 


No software should require its own subdirectory of the filesystem root.  I'm not sure what that directory is for, but the appropriate place for it is likely to be under one of /var/lib, /usr/share, or /opt/solr.

SOLR is a search engine.  It requires a data directory to store its index files.  Since they can grow quite big, I was putting them on / and making it the largest partition.  What would be the reasoning behind putting it in each of the locations you mentioned?  I'm still learning the whys of how things are laid out on the filesystem.


The reasoning is basically to place programs, data, configuration, etc. in consistent places so that people know where to look for them, so that people can make intelligent choices about partitioning and filesystems, and also to permit certain kinds of configuration (such as /usr being a shared, read-only filesystem).

Garrett pointed you to the FHS, which goes into it in far more detail.  Basically, though, you have it exactly backwards when you say you put the directory under / because its contents may get big.  Instead, you should ensure from the beginning that whatever filesystem contains /var (which may be the root filesystem) is big, because that is the designated place for applications' variable runtime files, some of which can grow very large.  Then when your applications are set up to put such files there, everything works out as it should.

The proper place for a search engine's index files would probably be under /var/lib (e.g. /var/lib/solr); with solr installed in /opt, however, rigorous compliance with the FHS would require use of /var/opt/solr instead.  The FHS docs for the various top-level directories are not very complicated, though you can get bogged down reading all the options and details.  They're worth reading at least once.
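
In manifest terms, the change amounts to something like the following sketch. The mode is an assumption, and Solr would also have to be told to use this path (for example through the dataDir setting in solrconfig.xml), which is not shown.

```puppet
# Replaces File['/solr'] from the original class. Use /var/opt/solr
# instead if strict FHS compliance for /opt installs is wanted.
file { '/var/lib/solr':
  ensure => directory,
  owner  => 'tomcat',
  group  => 'tomcat',
  mode   => '0755',  # assumption; tighten as appropriate
}
```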


John

Tim Mooney

Sep 7, 2012, 3:14:41 PM
to puppet...@googlegroups.com
In regard to: Re: [Puppet Users] Re: Module critique, Bai Shen said (at...:

>> I strongly recommend that you build a native package for SOLR, put it in a
>> local repository, and ensure it is installed via a Puppet Package resource.
>> Recursive directory management will bite you, especially if there are many
>> files or large ones, plus using packages is in general a major win. You
>> can package up your custom SOLR configuration files along with it, or manage
>> just those via File resources as you are now doing; either is fine.
>>
>
> I'll take a look at that. I'm not really familiar with how to build rpms
> yet. The tomcat one was built using a file I found online.

With Tomcat WAR files it's generally pretty easy. We're actually not
packaging SOLR but I have packaged other WAR files and could provide
an example that you could adapt to SOLR pretty easily.

Our lead developer just finished a SOLR deploy with puppet about a month
ago, I'll see if I can get permission to share his module for you to
compare to. One thing he did that I didn't see on first inspection with
your solr module was he has a define so that we can have multiple solr
"cores". Right now it's all just using file shipping, though. We looked
at templating some of the config but decided that was more advanced than
what we wanted for our first solr roll-out.
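
The module mentioned above is not shown in the thread, so the following is only a guess at the general shape of such a define, using plain file shipping. The name solr::core, the $solr_home parameter, the paths and the core names are all hypothetical.

```puppet
# Hypothetical defined type: one instance per Solr core.
define solr::core (
  $solr_home = '/opt/solr',  # assumed Solr home directory
) {
  file { "${solr_home}/${name}":
    ensure => directory,
    owner  => 'tomcat',
    group  => 'tomcat',
  }

  # Ship the core's configuration directory from the module's files.
  file { "${solr_home}/${name}/conf":
    ensure  => directory,
    recurse => true,
    source  => "puppet:///modules/solr/cores/${name}/conf",
    owner   => 'tomcat',
    group   => 'tomcat',
    notify  => Service['apache-tomcat'],
  }
}

# Usage: declare one resource per core (core names are made up).
solr::core { ['products', 'articles']: }
```

Registering the cores with Solr itself (the list of cores in Solr's own solr.xml) is deliberately left out of this sketch.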

Tim
--
Tim Mooney Tim.M...@ndsu.edu
Enterprise Computing & Infrastructure 701-231-1076 (Voice)
Room 242-J6, IACC Building 701-231-8541 (Fax)
North Dakota State University, Fargo, ND 58105-5164

Alessandro Franceschi

Sep 14, 2012, 5:50:22 PM
to puppet...@googlegroups.com
On Thursday, September 6, 2012 11:48:40 PM UTC+2, Andreas Ntaflos wrote:
On 2012-09-04 17:26, Bai Shen wrote:
> class solr {

I recommend you follow best practices and not manage every aspect of
your Solr resource in a single class, but split it up into subclasses,
probably at least: solr::install (install.pp), solr::config (config.pp),
solr::service (service.pp). The solr class (init.pp) then includes all
subclasses and explicitly declares their dependencies among each other,
like so:

class solr {
  include 'solr::install'
  include 'solr::config'
  include 'solr::service'

  Class['solr::install']
  -> Class['solr::config']
  ~> Class['solr::service']
}

This makes it easier to manage and change later on.

If I may add my very personal 2 cents to this approach, I have to say that this is IMHO the worst "best practice" ever suggested for Puppet modules, even if it's written on Puppet Pro and has been originally suggested by a giant like R.I.P., if I remember well.

In my opinion it has 2 major defects:
- It multiplies the number of objects needed to manage the same things (at scale you feel it) without giving any great advantage beyond somewhat more comfortable dependency management.
- Most of all, it makes any attempt to override some of the resources' parameters using class inheritance a real PITA (yes, the more you avoid class inheritance the better, but if your module doesn't provide a way to (re)define the behaviour of most of the resources defined in these classes, trying to change them without changing the module becomes almost impossible).
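
To make the second point concrete, here is a sketch of an override through class inheritance. The class name solr::config::site and the replacement source path are hypothetical.

```puppet
# Overriding a parameter of a resource declared in solr::config.
# The override only works from a class that inherits from the exact
# subclass declaring the resource, so the user has to know the module's
# internal layout.
class solr::config::site inherits solr::config {
  File['/opt/apache-tomcat/conf/Catalina/localhost/solr.xml'] {
    source => 'puppet:///modules/site/solr/solr.xml',
  }
}
```

With a single solr class there would be one parent to inherit from; with the split layout, each subclass whose resources need changing requires its own inheriting class.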

No flames intended :-)

Alessandro Franceschi 