Meeting notes for 2012-09-17: puppet-autoami, lxc, filesystems, continuous integration, external devices, PuppetDB, etc.

Igal Koshevoy

Sep 18, 2012, 5:43:26 AM
to pdxdevops
We had another great pdxdevops meeting. Although we started with no agenda, we ended up with about three hours of awesome content. This sort of thing is normal and another reason not to skip meetings just because they don't have an agenda. However, if you have a talk or discussion topic, I still urge you to send something to the list so we can add you to the agenda for an upcoming meeting.

Below are some of the topics we discussed. Feel free to post comments and corrections.

Carl Caum of Puppet Labs talked about AWS EC2 image management with Puppet
  • Using configuration management tools like Puppet is a great way to configure VMs for the cloud. However, it can sometimes take a long time to configure a new instance because many packages may need to be installed. This is problematic if you want to quickly add capacity because the new machines won't be ready for a while.
  • The "autoami" module speeds things up by creating a new AWS EC2 disk image (AMI) after a Puppet run finishes, but only if the run made changes. The latest image therefore always has all the Puppet changes applied, and instances launched from it can be started within seconds. This approach combines the fast startup of a gold master with all the configuration management we expect.
  • Blog post by presenter: http://puppetlabs.com/blog/rapid-scaling-with-auto-generated-amis-using-puppet/
  • Code: https://github.com/ccaum/puppet-autoami
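
The "create an image only if Puppet made changes" idea can be sketched roughly as below. This is not the autoami module's actual code. It assumes the agent is run with `--detailed-exitcodes`, where exit code 2 means changes were applied without failures, and `create_image` is a hypothetical callback standing in for whatever EC2 image-creation call you use.

```python
from typing import Callable, Optional

# Exit codes reported by `puppet agent --detailed-exitcodes`.
NO_CHANGES = 0
CHANGES_APPLIED = 2
FAILURES = 4
CHANGES_AND_FAILURES = 6


def should_bake_image(puppet_exit_code: int) -> bool:
    """Bake only when the run changed something and nothing failed."""
    return puppet_exit_code == CHANGES_APPLIED


def bake_if_changed(puppet_exit_code: int,
                    create_image: Callable[[], str]) -> Optional[str]:
    """Call the hypothetical create_image() callback when a bake is warranted.

    Returns the new image ID, or None when no image was created.
    """
    if should_bake_image(puppet_exit_code):
        return create_image()
    return None
```

Skipping the bake when there were failures (codes 4 and 6) is a design choice in this sketch, so that a half-configured machine never becomes the new base image.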

Ben Kero (bkero), who works on devops at Mozilla, talked about a bunch of topics
  • LXC
    • "LXC (Linux Containers) is an operating system-level virtualization method for running multiple isolated Linux systems (containers) on a single control host"
    • Conceptually similar to chroot and BSD jails, where all the VM-like containers share the host's kernel.
    • Great for development and integration testing because it's easy to spin up a bunch of containers quickly.
    • Works only with Linux.
    • Very little overhead for CPU and IO, so apps run at nearly native speed.
    • Containers aren't entirely isolated, e.g. dmesg output is shared between all containers, the host machine can see all the processes running in the containers, the host machine can access the disks of all containers, etc.
    • Blog post by presenter: http://bke.ro/running-512-containers-on-a-laptop/
    • Overview: http://en.wikipedia.org/wiki/LXC
    • Homepage: http://lxc.sourceforge.net/
  • btrfs
    • "btrfs" is a next-generation Linux filesystem with many fancy features, some inspired by ZFS.
    • Not stable, DO NOT USE for data you aren't prepared to lose.
    • Has copy-on-write, which saves disk space when storing duplicated data. Ideal for launching many lxc containers sharing common content, e.g. 256 containers consumed only 480MB of disk space at launch because almost all their data was the same.
    • Overview: http://en.wikipedia.org/wiki/Btrfs
    • Homepage: https://btrfs.wiki.kernel.org/index.php/Main_Page
  • Kernel SamePage Merging (KSM)
    • Linux kernel feature that lets a hypervisor share identical memory pages among processes, containers or VMs. Great for lxc when launching many identical instances because much of the memory is the same and can be shared, which reduces memory usage and allows more containers to run.
    • Overview: http://en.wikipedia.org/wiki/Kernel_SamePage_Merging_(KSM)
  • cgroups
    • cgroups is a Linux kernel feature to limit, account and isolate resource usage (CPU, memory, disk I/O, etc.) of process groups.
    • Used by LXC because it provides a way to limit resources given to individual containers.
    • Overview: http://en.wikipedia.org/wiki/Cgroups
  • dnsmasq
  • bonnie++
  • Other filesystems
    • ext4: Still the best option for an adequately fast and very reliable Linux filesystem.
    • btrfs: Linux-only filesystem that has much promise and amazing features, but isn't stable.
    • zfs: Has amazing features, but works well only on Solaris and its derivatives; it is also supported by some BSD variants. Its license (CDDL) is incompatible with the Linux kernel's GPL, so it can't be merged into the kernel. Linux users can use ZFS-FUSE, a FUSE (user space filesystem) driver that is fairly stable but rather slow, at half the speed of ext4. Linux users can also use ZFS On Linux, a kernel module port that is faster but less stable.
    • xfs: Has good performance and features useful for servers, like supporting huge files and growing a filesystem while it is mounted. However, it has had some ugly bugs and behaves badly on power loss, losing data. It should be used with a UPS (uninterruptible power supply) and/or a fancy battery-backed RAID controller to store the XFS external journal so it isn't lost.
    • Red Hat's Scalable File System: Improved version of xfs. http://www.redhat.com/products/enterprise-linux-add-ons/file-systems/
  • phoronix: Great website with lots of detailed information and benchmarks on filesystems, graphics hardware and such. http://www.phoronix.com/
  • SATA controllers for Linux with lots of ports
    • The IBM BR10i and Intel SASUC8I are 8-port SATA II controllers with good Linux support.
    • Surplus controllers are available on eBay at low prices.
    • Gotcha: They don't support disks larger than 2TB.
  • buildbot
    • "BuildBot is a software development continuous integration tool which automates the compile/test cycle required to validate changes to the project code base. It began as a light-weight alternative to the Mozilla project's Tinderbox, and is now used at Mozilla, Chromium and many other projects."
    • Homepage: http://trac.buildbot.net/
  • Continuous integration at Mozilla
    • Developers commit code with Mercurial to a special branch.
    • The commit triggers the buildbot continuous integration system, which then deploys and tests the code on all supported platforms, e.g. various versions of Linux, Windows, OS X, Android, etc.
    • Test results are reported so devs can see what worked and view logs of what didn't.
  • Server inventory at Mozilla
  • Getting additional server inventory information
    • Use PuppetDB to collect facts about nodes. It's fast and de-duplicates data. Future releases plan to also store historical data.
    • Query PuppetDB for facts to describe nodes that have checked in.
  • Monitoring, reporting and on-call at Mozilla
    • Support staff set themselves as on-call using a webapp.
    • Alerts from Nagios monitoring go to this person and also to an IRC channel.
    • An admin can use IRC to acknowledge that they're working on the issue. If the on-call person doesn't acknowledge the alert in a timely manner, the issue is escalated up the chain, and could eventually get escalated to the CEO.
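
The acknowledge-or-escalate flow described for on-call alerts can be sketched as below. The notes don't say how Mozilla's system is implemented, so the function name, the chain contents and the 15-minute timeout are all assumptions made up for illustration.

```python
from typing import List, Optional


def who_to_notify(chain: List[str],
                  minutes_unacknowledged: int,
                  ack_timeout_minutes: int = 15) -> Optional[str]:
    """Pick the person to alert for an issue nobody has acknowledged.

    chain is ordered from the on-call person up to the top of the org.
    Every full timeout period without an acknowledgement moves the alert
    one step up the chain, stopping at the last entry.
    """
    if not chain:
        return None
    step = minutes_unacknowledged // ack_timeout_minutes
    return chain[min(step, len(chain) - 1)]
```

For example, with a hypothetical chain of `["oncall", "manager", "ceo"]`, an alert goes to the on-call person first, to the manager after one missed timeout, and stays with the CEO once the top of the chain is reached.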

Nan Liu of Puppet Labs talked about managing F5 network switches and load balancers as external devices with Puppet
  • Newer versions of Puppet have a native concept of external devices that can be seen as nodes. Examples of external devices are network routers, firewalls, UPSes, etc. A Puppet node acts as a proxy to manage the external devices.
  • PuppetDB is very useful in conjunction with devices. In Nan's demo, Puppet manifests used R.I.'s PuppetDB tool to get configuration information for nodes and use it to generate the configuration data submitted to the F5 device.
  • The presenter will give a full talk on the topic at PuppetConf: http://puppetconf.com/schedule/
  • Blog post: http://puppetlabs.com/blog/managing-f5-big-ip-network-devices-with-puppet/
  • Latest code: https://github.com/puppetlabs/puppetlabs-f5
  • Igal Koshevoy talked about a simple way to do something similar with old versions of Puppet, or with configuration management tools like Chef that don't support the concept of external devices: implement a type/provider, add the resource to a node that will act as a proxy, and have the type/provider interact with the external device over SSH, HTTPS, etc. When the proxy node applies the resource, it downloads the external device's current configuration, checks whether it's in the desired state, and updates it if not.
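
The proxy approach boils down to a fetch, compare, update loop. Here is a minimal sketch in Python rather than as a real Puppet type/provider. `fetch_current` and `push_changes` are hypothetical callbacks that would talk to the device over SSH, HTTPS, etc., and the flat key/value configuration is an assumption for illustration.

```python
from typing import Callable, Dict

Config = Dict[str, str]


def converge(desired: Config,
             fetch_current: Callable[[], Config],
             push_changes: Callable[[Config], None]) -> Config:
    """Bring an external device to the desired state and return what changed.

    fetch_current and push_changes are hypothetical callbacks that would
    talk to the device over SSH, HTTPS, etc.
    """
    current = fetch_current()
    # Only keys whose value differs from the desired state need an update.
    changes = {key: value for key, value in desired.items()
               if current.get(key) != value}
    if changes:
        push_changes(changes)
    return changes
```

Because nothing is pushed when the device already matches, running this repeatedly is safe, which is the same idempotent behaviour expected from any other resource.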

-igal