Rails and Docker

Tim Uckun

Feb 2, 2020, 7:09:47 PM
to Ruby or Rails Oceania
Hey guys, I have been out of the loop in Rails development for a few years, so I am quite out of date with my knowledge. I need some advice on packaging up a complex Rails app in a Docker container and shipping it to Kubernetes. I am using GitLab CI for all of this.

What I have done so far works, but the CI takes a very long time and the images are HUGE. I have several questions about this process.

1. The CI takes a long time because bundle install takes a very long time. I have tried some tricks, like splitting up the Gemfile so the "base" gems are installed first and the rest are installed later. The thinking is that you are not going to upgrade rails or pg etc. all the time, so that layer can be cached in the image. This does work, but I don't know if it is ideal. I am about to try creating a base image with rails and the other "base" gems in it, but it's going to be a hassle to keep the versions synchronised with the main app.
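
Something like this, roughly (the image name and files here are placeholders, not something I've built yet):

# base.Dockerfile -- rebuilt only when the "base" gems change
FROM ruby:2.6-alpine
WORKDIR /app
RUN apk add --no-cache build-base postgresql-dev
COPY Gemfile_base.rb /app/
RUN bundle install --gemfile Gemfile_base.rb

# Dockerfile -- the app then builds on top of the prebuilt base
FROM registry.example.com/myapp-base:latest
COPY Gemfile Gemfile.lock /app/
RUN bundle install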

2. I do both a yarn install and a rails assets:precompile. Do I need to do both? The assets precompile is a pain because it wants to connect to the database. I am using the nulldb adapter to get around this, but is there a better way to deal with it?
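
For reference, the workaround looks roughly like this (the DATABASE_URL value is illustrative, and it assumes the activerecord-nulldb-adapter gem is in the bundle so the nulldb:// scheme resolves to that adapter):

RUN RAILS_ENV=production \
    SECRET_KEY_BASE=dummy \
    DATABASE_URL=nulldb://localhost/fake \
    bundle exec rails assets:precompile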

3. The resultant image is 1.2 GB. This is using Alpine and deleting all the crap left over from the apk and bundle installs. I tried both a three-stage and a two-stage docker build, and I can reduce the size by a few hundred megs, but the CI time shoots up because only the first stage gets cached by docker (I pull the image and use --cache-from).
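
For what it's worth, the caching dance in CI is basically this (using GitLab's built-in registry variable):

docker pull $CI_REGISTRY_IMAGE:latest || true
docker build --cache-from $CI_REGISTRY_IMAGE:latest -t $CI_REGISTRY_IMAGE:latest .
docker push $CI_REGISTRY_IMAGE:latest

As I understand it, only one stage's layers survive in the pushed image, so the other stages have nothing to cache from on the next run unless each stage is also built with --target, tagged, and pushed separately.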



Any tips or tricks to reduce the final image size and the build time would be much appreciated.

Thanks.

Simon Russell

Feb 2, 2020, 7:41:45 PM
to rails-...@googlegroups.com
How often is your Gemfile changing?

One approach I've used in the past is copying just the Gemfile (and related files) in and creating that as a layer, then copying the rest of the project in later.

Also, I use .dockerignore basically in reverse, to whitelist just the stuff that I want in there; not (mainly) for size purposes, but so that it doesn't recreate layers when nothing important has changed.
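
Something like this (the whitelist entries are illustrative):

*
!Gemfile
!Gemfile.lock
!package.json
!yarn.lock
!app/
!bin/
!config/
!db/
!lib/
!public/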


Tim Uckun

Feb 2, 2020, 9:09:47 PM
to Ruby or Rails Oceania


On Monday, February 3, 2020 at 1:41:45 PM UTC+13, Simon Russell wrote:
How often is your Gemfile changing?


It's not that the Gemfile is changing; it's that Gemfile.lock is changing, because we have gems we are pulling from git.
 
One approach I've used in the past is copying just the Gemfile (and related files) in and creating that as a layer, then copying the rest of the project in later.

That's what I am doing, but in two phases, like this:


COPY package.json yarn.lock  /app/
RUN yarn install

# This is done to speed up CI builds. The base gemfile should not change often, so this whole layer stays cached.

COPY Gemfile_base.rb /app/
# Install the base gems; don't clean the cache or delete the gemfile,
# because the next step still needs them.

# (path value assumed from the BUNDLE_PATH ENV set earlier in the Dockerfile)
RUN bundle config path "$BUNDLE_PATH" \
&& bundle install --gemfile Gemfile_base.rb

# Install gems
COPY Gemfile /app/
COPY Gemfile.lock /app/

# (path value assumed from the BUNDLE_PATH ENV set earlier in the Dockerfile)
RUN bundle config path "$BUNDLE_PATH" \
&& bundle config --global frozen 1 \
&& bundle install --gemfile Gemfile \
# Remove unneeded files (cached *.gem, *.o, *.c)
&& rm -rf $BUNDLE_PATH/cache/ \
&& find $BUNDLE_PATH/gems/ -name "*.c" -delete \
&& find $BUNDLE_PATH/gems/ -name "*.o" -delete
 


The Gemfile has this in it:

# The last argument to eval makes exceptions raised in Gemfile_base.rb show up with the correct filename.
eval File.read('Gemfile_base.rb'), nil, 'Gemfile_base.rb'


This lets me add the base gems as one layer; the second bundle install then reuses the already-installed gems from the base layer, so it goes faster. It also keeps all my gems defined in the same project (no base image to pull from), but the list is split across two files, so it's not ideal.

 
Also, I use .dockerignore basically in reverse, to whitelist just the stuff that I want in there; not (mainly) for size purposes, but so that it doesn't recreate layers when nothing important has changed.

I have a pretty robust .dockerignore (I can post it if you want), but the build still takes a long time and, as I said, the docker image is HUGE: 1.2 GB. I did a three-stage docker build once and got the image to just a hair under 600 MB, but that still seems insanely large for a rails app.


What about the asset precompilation? Can I skip it if I have done a yarn install? How about vice versa: can I skip the yarn install if I do an assets precompile? And is there another way to avoid a database connection during the asset precompile, other than using the nulldb adapter?


 

Simon Russell

Feb 3, 2020, 4:05:14 AM
to rails-...@googlegroups.com
It's not that the Gemfile is changing; it's that Gemfile.lock is changing, because we have gems we are pulling from git.

Okay, sure, that would complicate things a bit; it's kind of the same issue -- there are frequent changes to the gems. I was just trying to understand why it was needing rebuilding so often.

That's what I am doing, but in two phases, like this.
... 
This lets me add the base gems as one layer; the second bundle install then reuses the already-installed gems from the base layer, so it goes faster. It also keeps all my gems defined in the same project (no base image to pull from), but the list is split across two files, so it's not ideal.

That all looks okay to me. 
 
I have a pretty robust .dockerignore (I can post it if you want), but the build still takes a long time and, as I said, the docker image is HUGE: 1.2 GB. I did a three-stage docker build once and got the image to just a hair under 600 MB, but that still seems insanely large for a rails app.

Yeah it seems like the issue isn't that layers are being rebuilt when they shouldn't be -- if you're updating the lock file then you do need to reinstall.

Nothing you're doing looks wrong to me. There might be more you can clear out of the gem build process, but that probably comes down to looking at the Gemfile and stepping through things one by one to understand where that 1.2 GB comes from. If the gems that take ages to install are the ones being updated frequently, then nothing springs to mind about what you might do.
 
What about the asset precompilation? Can I skip it if I have done a yarn install? How about vice versa: can I skip the yarn install if I do an assets precompile? And is there another way to avoid a database connection during the asset precompile, other than using the nulldb adapter?

Deliberately didn't respond to this bit because I haven't used that stuff, sorry :)

Oleg Ivanov

Feb 3, 2020, 4:54:28 AM
to rails-...@googlegroups.com

We're doing yarn install and asset precompilation for our docker build, and we run these steps on CI to prepare the source code, which is then packaged into a docker image.

This allows us to fully leverage CI's own caches (we're doing it on GitLab as well, btw). With bundler we basically do what you already discussed here: cache the bundle plus Gemfile/Gemfile.lock. Then we cache node_modules plus package.json/yarn.lock, and we also keep tmp/assets between builds to speed up the asset precompilation. Can't comment on the database thing; we just let our build connect to the db, and it didn't really bother us much.

If no dependencies changed between builds, the installation steps are fast; it's only the asset compilation itself that takes some time.

It also removes the need for us to have yarn or raw assets in the docker image at all: we just add the Rails bundle as a separate layer to cache it, and then copy in the project code with precompiled assets but WITHOUT any raw assets. It speeds up the process and reduces the final image size (no node_modules and all that in the image).
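
The tail of our Dockerfile ends up looking roughly like this (paths illustrative, not our exact file):

# The installed bundle as its own layer; stays cached while Gemfile.lock is stable
COPY vendor/bundle /bundle

# Project code with public/assets already precompiled by the CI job;
# node_modules and raw asset sources are excluded via .dockerignore
COPY . /app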

Hope this helps.




--
With best regards,
Oleg Ivanov.
ICQ #69991809
Jabber: morh...@jabber.org

Nigel Sheridan-Smith

Feb 3, 2020, 6:30:08 AM
to rails-...@googlegroups.com

Tim Uckun

Feb 3, 2020, 7:02:04 PM
to rails-...@googlegroups.com
I have thought about this, but I have shied away from it because the development systems are Macs and the deployment target is Linux. Most of the gems are pure ruby and should be OK, but there are bound to be some that use C libs and/or have platform-specific versions.

Maybe bundler is smart enough to manage all of that, though; I am just not sure.

Tim Uckun

Feb 3, 2020, 7:02:05 PM
to rails-...@googlegroups.com
Hi Oleg.

Thanks for the response. Is there any way you can share the Dockerfile with us?

Oleg Ivanov

Feb 3, 2020, 7:14:53 PM
to rails-...@googlegroups.com
> Maybe bundler is smart enough to manage all of that, though; I am just not sure.

It is, give it a go :) I've been doing this for years on multiple projects, and it's really nice not to depend on rubygems, especially if you rely heavily on gems checked out from source branches: you're shielded from upstream rebases, people deleting their stuff and all that, and your deploys run happily even when rubygems is down.
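
From memory the whole thing is just this (flags from memory; double-check them against your bundler version):

bundle config set cache_all true   # include git gems in vendor/cache
bundle package                     # fill vendor/cache from the lockfile
bundle install --local             # install strictly from vendor/cache

And if the Mac-vs-Linux thing worries you, bundler can record the Linux platform in the lockfile too:

bundle lock --add-platform x86_64-linux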

> Is there any way you can share the Dockerfile with us?

sure, I'll ping you directly a bit later today

Tim Uckun

Feb 24, 2020, 11:33:26 PM
to rails-...@googlegroups.com
Hey Everybody. I wanted to follow up on this and give an update.

So far, nothing I do seems to be working very well. The basic problem is that bundler is not using the gems in vendor/cache, except for the gems that come from git branches. Clearly it knows to look there, because it sees that those gems have been checked out into the cache and uses them; all the other gems are installed fresh even when they exist in vendor/cache.

I thought the problem was that the gems were installed on a Mac while the dockerfile uses Alpine, so I wrote a script to tar up vendor/cache (and the other directories used by the asset precompile) and push the tarballs to Google storage. You can see it here


The dockerfile sets up the environment, which looks like this:

ARG BUNDLE_WITHOUT='development test'
ARG BUNDLER_VERSION='2.1.0'
ARG RUBYGEMS_VERSION='3.1.1'
ARG BUNDLE_INSTALL_PATH='/bundle'

ENV RAILS_ENV='production' \
    RAILS_LOG_TO_STDOUT=true \
    SECRET_KEY_BASE='foo' \
    LANG='C.UTF-8' \
    PATH="/app/bin:${BUNDLE_INSTALL_PATH}/bin:$PATH" \
    GEM_HOME="${BUNDLE_INSTALL_PATH}" \
    GEM_PATH="${BUNDLE_INSTALL_PATH}" \
    BUNDLE_WITHOUT=${BUNDLE_WITHOUT} \
    BUNDLE_JOBS=4 \
    BUNDLE_RETRY=3 \
    BUNDLE_PATH="${BUNDLE_INSTALL_PATH}" \
    BUNDLE_APP_CONFIG="${BUNDLE_INSTALL_PATH}" \
    BUNDLE_BIN="bin" \
    BUNDLE_ALLOW_OFFLINE_INSTALL="true" \
    BUNDLE_CLEAN="true" \
    BUNDLE_DEPLOYMENT="true" \
    BUNDLE_CACHE_ALL="true"

Despite all this, bundler installs fresh gems every time the dockerfile builds. The asset precompile does see the cache and runs a lot faster, but for some strange reason it runs yarn install twice.

BTW, the bundler documentation is outdated, so some of the flags listed there don't work.




Oleg Ivanov

Feb 24, 2020, 11:51:14 PM
to rails-...@googlegroups.com
Hi Tim,


> Despite all this the bundler installs fresh gems every time the dockerfile builds

Does it download them from rubygems, or does it use local copies from `vendor/cache`? I assume the latter, but you can confirm it just by looking at whether there's a download step before the installation.

If it does use the gems from `vendor/cache` to install them, then everything's working well. If you want to cache not just the gems themselves but also the fully installed bundle environment, so that bundler doesn't install anything at all between runs, then you need to store not just `vendor/cache` but also your bundle install path, which seems to be `/bundle` in your example. (Actually that's the only thing you need to store, unless you're unhappy for some reason with what comes into `vendor/cache` from your repository.)

Tim Uckun

Feb 25, 2020, 5:07:25 AM
to rails-...@googlegroups.com
I'll have to look more closely, but it uses the term "installing" rather than "using" (which is what it says when the gems are already installed), and some of the gems take quite a while to install (httparty, nokogiri, etc.).

As for tarring the entire install from /bundle: it's HUGE. I am going to try that next and see if the act of downloading, untarring, installing, tarring, and uploading takes less time than just installing the gems outright.

Oleg Ivanov

Feb 25, 2020, 5:16:47 AM
to rails-...@googlegroups.com
> I'll have to look more closely but it uses the term "installing" rather than "using"

The steps are: downloading -> installing -> using

You're caching `vendor/cache` now, so you're skipping the downloading step and going straight to installing. If you want to skip that as well, you need to cache the `/bundle` directory too.

> As for tarring the entire install from /bundle it's HUGE

Tar it and put it into GitLab's cache; don't send it to your own cloud storage. It's not lightning fast, but it is much faster than installing all the binary gems on every build.
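
In .gitlab-ci.yml terms it's something like this (key and paths illustrative; note GitLab only caches paths inside the project directory, so the bundle would need to live under the checkout, e.g. vendor/bundle rather than /bundle):

cache:
  key: "$CI_COMMIT_REF_SLUG-bundle"
  paths:
    - vendor/bundle/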

Tim Uckun

Feb 26, 2020, 4:44:39 AM
to rails-...@googlegroups.com
That's interesting.

The downloading of the gems is pretty fast, but the installation takes a long time even when they are cached in the docker container.

I'll try the GitLab cache, but I am using the public GitLab shared runners, so I wonder how that's going to work out. The script does work now; it just moves a lot of data around. My current method of building two containers accomplishes the same thing: I pull the base container, build it using the pulled one as cache, then pull the main container (which builds on the base) and build that. The end result is again moving a lot of data around to save some time during the build process.

I guess when you don't have control over your build infrastructure, you have to resort to these kinds of tricks.