Addressing buildpack size


Mike Dalessio

Apr 8, 2015, 11:10:33 AM
to vcap...@cloudfoundry.org

Hello vcap-dev!

This email details a proposed change to how Cloud Foundry buildpacks are packaged, with respect to the ever-increasing number of binary dependencies being cached within them.

This proposal's permanent residence is here:

https://github.com/cloudfoundry-incubator/buildpack-packager/issues/4

Feel free to comment there or reply to this email.


Buildpack Sizes

Where we are today

Many of you have seen, and possibly been challenged by, the enormous sizes of some of the buildpacks that are currently shipping with cf-release.

Here's the state of the world right now, as of cf-release v205:

 php-buildpack:    1.1G
 ruby-buildpack:   922M
 go-buildpack:     675M
 python-buildpack: 654M
 nodejs-buildpack: 403M
 ----------------------
 total:            3.7G

These enormous sizes are the result of the current policy of packaging every-version-of-everything-ever-supported ("EVOEES") within the buildpack.

Most recently, this problem was exacerbated by the fact that buildpacks now contain binaries for two rootfses.

Why this is a problem

If this continues, buildpacks will only keep growing, leading to longer and longer build and deploy times, longer test times, slower feedback loops, and therefore less frequent buildpack releases.

This also means that we're shipping versions of interpreters, web servers, and libraries that are deprecated, insecure, or both. Feedback from CF users has made it clear that many companies view this as an unnecessary security risk.

This policy is clearly unsustainable.

What we can do about it

There are many things being discussed to ameliorate the impact that buildpack size is having on the operations of CF.

Notably, Onsi has proposed a change to buildpack caching, to improve Diego staging times (link to proposal).

However, there is an immediate solution available that addresses both the size concern and the security concern: packaging fewer binary dependencies within the buildpack.

The proposal

I'm proposing that we reduce the binary dependencies in each buildpack in a very specific way.

Aside on terms I'll use below:

  • Versions of the form "1.2.3" are broken down as MAJOR.MINOR.TEENY. Many language ecosystems call TEENY "PATCH" (the terms are interchangeable), but we're going to use "TEENY" in this proposal.
  • We'll assume that TEENY gets bumped for API/ABI-compatible changes.
  • We'll assume that MINOR and MAJOR get bumped when there are API/ABI-incompatible changes.
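Under these assumptions, the MAJOR.MINOR pair identifies a compatibility line, and only TEENY varies within it. A minimal Python sketch of that rule follows; the `Version` class and `api_compatible` function are hypothetical illustrations, not part of buildpack-packager or any buildpack:

```python
from typing import NamedTuple


class Version(NamedTuple):
    """A MAJOR.MINOR.TEENY version, using the terms defined above."""
    major: int
    minor: int
    teeny: int

    @classmethod
    def parse(cls, text: str) -> "Version":
        # Assumes exactly three numeric components, e.g. "0.10.38".
        major, minor, teeny = (int(part) for part in text.split("."))
        return cls(major, minor, teeny)

    def line(self) -> tuple[int, int]:
        """The MAJOR.MINOR release line this version belongs to."""
        return (self.major, self.minor)


def api_compatible(a: Version, b: Version) -> bool:
    # Only TEENY bumps are assumed to preserve API/ABI compatibility,
    # so two versions are compatible exactly when MAJOR and MINOR match.
    return a.line() == b.line()


print(api_compatible(Version.parse("0.10.37"), Version.parse("0.10.38")))  # True
print(api_compatible(Version.parse("2.1.5"), Version.parse("2.2.1")))      # False
```

Parsing into integers matters when ranking versions: compared as plain strings, "0.10.9" would sort above "0.10.38".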

I'd like to move forward soon with the following changes:

  1. For language interpreters/compilers, we'll package the two most-recent TEENY versions on each MAJOR.MINOR release.
  2. For all other dependencies, we'll package only the single most-recent TEENY version on each MAJOR.MINOR release.
  3. We will discontinue packaging versions of dependencies that have been deprecated.
  4. We will no longer provide "EVOEES" buildpack releases.
  5. We will no longer provide "online" buildpack releases, which download dependencies from the public internet.
  6. We will document the process, and provide tooling, for CF operators to build their own buildpacks, choosing the dependencies that their organization wants to support or creating "online" buildpacks at operators' discretion.

An example for #1 is that we'll go from packaging 34 versions of node v0.10.x to only packaging two: 0.10.37 and 0.10.38.

An example for #2 is that we'll go from packaging 3 versions of nginx 1.5 in the PHP buildpack to only packaging one: 1.5.12.

An example for #3 is that we'll discontinue packaging ruby 1.9.3, which reached end-of-life in February 2015, in the ruby-buildpack.
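Rules 1 to 3 can be read as one filter applied per dependency: group versions by MAJOR.MINOR, drop deprecated lines, and keep the newest N TEENY releases of each line (N = 2 for interpreters/compilers, N = 1 otherwise). A minimal Python sketch, assuming plain three-part numeric versions; `versions_to_package` and its parameters are hypothetical names, and the input lists below are made-up stand-ins chosen to mirror the examples above, not the buildpacks' real manifests:

```python
from collections import defaultdict


def parse(version: str) -> tuple[int, int, int]:
    # Assumes MAJOR.MINOR.TEENY with numeric parts, e.g. "1.5.12".
    major, minor, teeny = (int(p) for p in version.split("."))
    return (major, minor, teeny)


def versions_to_package(
    available: list[str],
    keep_per_line: int,
    deprecated_lines: frozenset[tuple[int, int]] = frozenset(),
) -> list[str]:
    """Keep the newest `keep_per_line` TEENY versions of each MAJOR.MINOR line.

    keep_per_line is 2 for interpreters/compilers (rule 1) and 1 for other
    dependencies (rule 2); MAJOR.MINOR lines in deprecated_lines are dropped
    entirely (rule 3).
    """
    by_line: dict[tuple[int, int], list[tuple[int, int, int]]] = defaultdict(list)
    for v in available:
        parsed = parse(v)
        if parsed[:2] not in deprecated_lines:
            by_line[parsed[:2]].append(parsed)

    kept = []
    for versions in by_line.values():
        # Numeric sort, newest first, so 0.10.38 ranks above 0.10.9.
        kept.extend(sorted(versions, reverse=True)[:keep_per_line])
    return [".".join(map(str, v)) for v in sorted(kept)]


node_0_10 = [f"0.10.{teeny}" for teeny in range(5, 39)]  # 34 hypothetical versions
print(versions_to_package(node_0_10, keep_per_line=2))  # ['0.10.37', '0.10.38']
print(versions_to_package(["1.5.10", "1.5.11", "1.5.12"], keep_per_line=1))  # ['1.5.12']
print(versions_to_package(["1.9.3", "2.0.0", "2.1.5", "2.2.1"], keep_per_line=2,
                          deprecated_lines=frozenset({(1, 9)})))  # ['2.0.0', '2.1.5', '2.2.1']
```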

Outcomes

With these changes, the total buildpack size will be reduced greatly. As an example, we expect the ruby-buildpack size to go from 922M to 338M.

We also want to set the expectation that, as new interpreter versions are released, either for new features or (more urgently) for security fixes, we'll release new buildpacks much more quickly than we do today. My hope is that we'll be able to do it within 24 hours of a new release.

Planning

These changes will be relatively easy to make, since all the buildpacks are now using a manifest.yml file to declare what's being packaged. We expect to be able to complete this work within the next two weeks.

Stories are in the Tracker backlog under the Epic named "skinny-buildpacks", which you can see here:

https://www.pivotaltracker.com/epic/show/1747328


Please let me know how these changes will impact you and your organizations, and let me know of any counter-proposals or variations you'd like to consider.

Thanks,

-mike


Patrick Mueller

Apr 8, 2015, 10:38:11 PM
to vcap-dev

--
You received this message because you are subscribed to the Google Groups "Cloud Foundry Developers" group.
To view this discussion on the web visit https://groups.google.com/a/cloudfoundry.org/d/msgid/vcap-dev/CAGeQLZwDbON2B6cAynyJY12tCWXO8XPKSCmhCc%3D%3DBu4KsHe%3DhA%40mail.gmail.com.

To unsubscribe from this group and stop receiving emails from it, send an email to vcap-dev+u...@cloudfoundry.org.



--
Patrick Mueller
http://muellerware.org

Onsi Fakhouri

Apr 8, 2015, 11:36:34 PM
to vcap...@cloudfoundry.org
Hey Patrick,

Sorry about that - the diego-dev-notes is an internal documentation repo that the Diego team uses to stay on the same page and toss ideas around.

There isn't much that's terribly interesting at that link - just some ideas on how to extend diego's existing caching capabilities to avoid copying cached artifacts into containers (we'd mount them in directly instead).

Happy to share more detail if there is interest.

Onsi

Jack Cai

Apr 13, 2015, 1:59:29 PM
to vcap...@cloudfoundry.org
> We will no longer provide "online" buildpack releases, which download dependencies from the public internet.

I think it would make sense to retain the ability to download additional runtime versions on demand (those not packaged in the buildpack) if the user explicitly requests it. So basically it would be a hybrid model, where the most recent versions are "cached" while old versions remain available.
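The hybrid model amounts to a lookup order: use a packaged binary when one exists, and fall back to a download only when explicitly allowed. A minimal Python sketch; `resolve`, `allow_download`, `DependencyUnavailable`, and the cache mapping are hypothetical names, not part of any buildpack's actual code, and the download step is only reported, not performed:

```python
class DependencyUnavailable(Exception):
    """Raised when a requested version is neither packaged nor downloadable."""


def resolve(
    name: str,
    version: str,
    cached: dict[tuple[str, str], str],
    allow_download: bool = False,
) -> tuple[str, str]:
    """Decide where a dependency would come from under a hybrid model."""
    key = (name, version)
    if key in cached:
        # Packaged ("cached") binary: no network access needed.
        return ("cached", cached[key])
    if allow_download:
        # Explicit opt-in: a real buildpack would fetch the binary here.
        return ("download", f"{name}-{version}")
    raise DependencyUnavailable(f"{name} {version} is not packaged in this buildpack")


cached = {("ruby", "2.2.1"): "ruby-2.2.1.tgz"}  # hypothetical packaged file
print(resolve("ruby", "2.2.1", cached))                       # ('cached', 'ruby-2.2.1.tgz')
print(resolve("ruby", "1.9.3", cached, allow_download=True))  # ('download', 'ruby-1.9.3')
```

Defaulting `allow_download` to False keeps the download path opt-in, so an unpackaged version fails loudly unless the user asks for it.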

Jack


Mike Dalessio

Apr 13, 2015, 3:08:38 PM
to vcap...@cloudfoundry.org
Hi Jack,

Thanks so much for your feedback!

Based on my conversations with CF users to date, this is definitely something that we would want to be "opt-in" behavior; the consensus-desired default appears to be to disallow the downloading of old/deprecated versions.

Notably, though, what we'll continue to support is the specification of a buildpack using the `cf push` `-b` option:

```
   -b Custom buildpack by name (e.g. my-buildpack) or GIT URL 
```

Buildpacks used in this manner will behave in "online" mode, meaning they'll attempt to download dependencies from the public internet. Does that satisfy your needs, at least in the short-term?

-m


hair...@gmail.com

Apr 15, 2015, 10:07:41 AM
to vcap...@cloudfoundry.org
Hi Mike

You sent me here from the php-buildpack.

Personally, I feel that assuming a MINOR bump breaks backward compatibility (BC) is a bit too much.

Wouldn't it be easier to just follow semver (http://semver.org/) on the MAJOR.MINOR parts? I think semver does a great job of defining version semantics. If everyone just stuck to semver, life would be better for everyone who needs to do software lifecycle management.

Regards,
Lucas

Mike Dalessio

Apr 15, 2015, 10:30:50 AM
to vcap...@cloudfoundry.org
Hi Lucas,

Thanks for commenting!

I agree completely that if everyone followed semver, it would be easier to approach this systematically. However, since projects often interpret semver differently (and sometimes people make mistakes), I'm worried that if we're too aggressive in removing older-but-still-supported versions from the buildpacks, we'll increase the burden on application developers unnecessarily.

In the Ruby buildpack, which I'm most familiar with personally, we definitely *do* need to treat MINOR bumps as possibly-API-incompatible; Ruby 2.0.x and 2.2.x are different enough that we need to support all of them. As an example, Ruby 2.2 removed some deprecated C API calls that were present in 2.1; any library that used these deprecated APIs would stop working in 2.2. Application developers may not want to upgrade to Ruby 2.2.x just yet, and I don't want that upgrade pain to be a blocker to CF adoption.

In the PHP buildpack, if we assumed that MINOR bumps were still API/ABI compatible, then we'd be able to drop 5.4.x and 5.5.x completely. I don't have enough experience in the PHP ecosystem to know whether this is desired or not. Do you think it's acceptable to include only 5.6.6 and 5.6.7 in the buildpack? Or would we run into similar issues?

I'd prefer to make pessimistic assumptions about interpreter compatibility for now, for this first cut of "skinny buildpacks". We can definitely optimize the support matrix as we go forward, based on feedback from you and other experts.

Does all that make sense?

-m



Daniel Mikusa

Apr 15, 2015, 10:50:19 AM
to vcap...@cloudfoundry.org
On Wed, Apr 15, 2015 at 10:30 AM, Mike Dalessio <mdal...@pivotal.io> wrote:
> In the PHP buildpack, if we assumed that MINOR bumps were still API/ABI compatible, then we'd be able to drop 5.4.x and 5.5.x completely. I don't have enough experience in the PHP ecosystem to know whether this is desired or not. Do you think it's acceptable to include only 5.6.6 and 5.6.7 in the buildpack? Or would we run into similar issues?

I'm curious to hear what others think, but I don't think this would be acceptable for PHP users.  While almost all PHP apps use 5+, usage of the minor versions is still pretty diverse, with the majority using older versions [1].  The relevance of these stats could be debated, since I suspect the trend is driven by Linux distros that don't keep up with the latest versions (people tend to use whatever their distro packages), but I think it shows that most people aren't thinking about using PHP 5.6 yet.

My thought for support is that we should continue including binaries for a PHP 5.x version as long as it's being maintained.  In other words, support the latest 5.4.x release until the PHP devs stop publishing them.  Since they seem to kill off the oldest release when the next PHP is stable, that generally leaves the buildpack with three versions to support.

On a slightly different note, I think we can help adoption of newer PHP versions by bumping up the default version of PHP used by the buildpack.  It's currently 5.4.  Perhaps we should consider going to 5.5?

Dan

 

Mike Dalessio

Apr 15, 2015, 12:47:51 PM
to vcap...@cloudfoundry.org
On Wed, Apr 15, 2015 at 10:49 AM, Daniel Mikusa <dmi...@pivotal.io> wrote:
> On a slightly different note, I think we can help adoption of newer PHP versions by bumping up the default version of PHP used by the buildpack.  It's currently 5.4.  Perhaps we should consider going to 5.5?

This is a good idea. I've created a Tracker story here to update the default to 5.5:



 

aa...@hubernet.net

Apr 20, 2015, 2:15:03 PM
to vcap...@cloudfoundry.org
Are the new buildpacks still expected to be released sometime soon?

Aaron Huber
Intel Corporation

JT Archie

Apr 21, 2015, 10:38:13 AM
to vcap...@cloudfoundry.org, aa...@hubernet.net
Hi, Aaron!

Thanks for asking.
Our progress can be followed on the Buildpacks Tracker project, under the "skinny-buildpacks" epic.

Let us know if you have any questions.

Thanks,

JT

Jack Cai

Apr 23, 2015, 4:41:20 PM
to vcap...@cloudfoundry.org
Sorry for the late response. So you mean the git version will still have the online dependency download support, while the "offline" (I think there is a new name for them) packages will NOT. Is that correct?

Jack


Jan Dubois

Apr 23, 2015, 5:29:08 PM
to vcap-dev
On Thu, Apr 23, 2015 at 1:41 PM, Jack Cai <green...@gmail.com> wrote:
> Sorry for the late response. So you mean the git version will still have the
> online dependency download support, while the "offline" (I think there is a
> new name for them) packages will NOT. Is that correct?

They do, except "git version" is a slightly misleading name. You
cannot just use the Github tarball from the releases page e.g.:

https://github.com/cloudfoundry/ruby-buildpack/archive/v1.3.1.tar.gz

You actually have to package an "uncached" version yourself:

$ git clone https://github.com/cloudfoundry/ruby-buildpack.git
$ cd ruby-buildpack
$ git checkout v1.3.0
$ git submodule update --init
$ gem install bundler
$ BUNDLE_GEMFILE=cf.Gemfile bundle install
$ BUNDLE_GEMFILE=cf.Gemfile bundle exec buildpack-packager uncached

The same goes for nodejs, go, etc., except for PHP, which requires:

$ git clone https://github.com/cloudfoundry/php-buildpack.git
$ cd php-buildpack/
$ git checkout v3.1.0
$ ./bin/cf-buildpack-build uncached

The java-buildpack team already publishes an uncached buildpack, so
you don't have to build it yourself:

https://github.com/cloudfoundry/java-buildpack/releases/download/v3.0/java-buildpack-v3.0.zip

The terms "cached" and "uncached" are the new spellings of "offline"
and "online". :)

Cheers,
-Jan