Reasoning on S3 bucket balancing over CDN

Luis Lavena

Oct 28, 2009, 11:45:22 PM
to gemcutter
Hello Nick and fellow Gemcutter developers,

While digging through the code today, I realized that Hostess contains this
particular line:

    if redirect
      redirect File.join("http://s3.amazonaws.com",
                         VaultObject.current_bucket, request.path_info)
    else

Is there a reason for using the bucket directly instead of
CloudFront?

In my experience, using Amazon's CDN increased download speed for
big files (ranging from 600KB to 1.3MB) without increasing running
costs too much.

There are several "fatty" gems (mostly the ones that include
binaries) that would benefit from it.

I wanted to ask rather than submit this as a "feature", since I know
you guys are busy with the subdomain stuff ;-)

Regards,
--
Luis Lavena

Nick Quaranto

Oct 28, 2009, 11:49:16 PM
to gemc...@googlegroups.com
I haven't had time to look into CloudFront yet, but I would like to move there. The biggest priority right now is the rubygems.org transition; if you'd like to help out with this feature, please let me know. :)

-Nick

Trevor Turk

Oct 29, 2009, 11:29:12 AM
to gemcutter
On Oct 28, 10:49 pm, Nick Quaranto <n...@quaran.to> wrote:
> I haven't had time to look into CloudFront yet, I would like to move there.
> The biggest priority right now is the rubygems.org transition, if you'd like
> to help out with this feature please let me know. :)

Using Cloudfront is a good idea, but it's something that you'll
probably have to set up yourself, because it requires access to your
DNS manager and the AWS Console. I'll try to cover the basics here, so
maybe it will save some time.

You'll want to decide on a CNAME to use like "s3.gemcutter.org" or
whatever.

Then, you can use the AWS console to "create a distribution", which
means hooking a bucket up to Cloudfront.

https://console.aws.amazon.com/cloudfront/home

Here's an example: http://grab.by/d6i

Then you'll need to set the CNAME up in your DNS, using the value you
get from the AWS Console.

Another example: http://grab.by/d6k

The last step would be to change the code so that it switches to use
Cloudfront in production, I suppose.
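A minimal sketch of that last step, assuming a hypothetical `DOWNLOAD_HOST` setting that holds the CloudFront CNAME in production; when it is unset, the URL falls back to the direct S3 form the Hostess excerpt builds today. `download_url` and `DOWNLOAD_HOST` are illustrative names, not Gemcutter's actual code.

```ruby
# Hypothetical helper: build the URL a gem download is redirected to.
# DOWNLOAD_HOST is an assumed setting (not part of Gemcutter) holding the
# CloudFront CNAME in production, e.g. "cdn.gemcutter.org".
def download_url(bucket, path, env = ENV)
  cdn_host = env["DOWNLOAD_HOST"]
  if cdn_host && !cdn_host.empty?
    # The distribution's origin is the bucket, so the bucket name is
    # not repeated in the path.
    File.join("http://#{cdn_host}", path)
  else
    # Current behavior: path-style URL pointing straight at S3.
    File.join("http://s3.amazonaws.com", bucket, path)
  end
end

puts download_url("gemcutter_production", "/gems/rake-0.8.7.gem", {})
# http://s3.amazonaws.com/gemcutter_production/gems/rake-0.8.7.gem
puts download_url("gemcutter_production", "/gems/rake-0.8.7.gem",
                  { "DOWNLOAD_HOST" => "cdn.gemcutter.org" })
# http://cdn.gemcutter.org/gems/rake-0.8.7.gem
```

Reading the host from the environment keeps staging and development on plain S3 while production switches over without a code change.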

I'll watch this thread, but let me know if you'd like any other help
setting this up.

Thanks for your work on Gemcutter. It's great stuff.

- Trevor

Nick Quaranto

Oct 29, 2009, 11:34:20 AM
to gemc...@googlegroups.com
Thanks Trevor. Does it take any longer to get files up on CloudFront than on S3? That's my primary concern with switching to it. Currently the gems are uploaded to S3 when you do `gem push`, and then gem indexing is kicked off in the background; the index usually takes around a minute to rebuild.

-Nick

Trevor Turk

Oct 29, 2009, 12:01:12 PM
to gemcutter
On Oct 29, 10:34 am, Nick Quaranto <n...@quaran.to> wrote:
> Thanks Trevor. Is the time to get files up on Cloudfront take any longer
> than S3? That's my primary concern with switching to it. Currently the gems
> are uploaded to S3 when you do `gem push` and then gem indexing is kicked
> off in the background, and that usually takes around a minute to rebuild.

In my experience, using Cloudfront can simply be thought of as a
"faster way to access files on S3" -- files you put on S3 are
immediately accessible on Cloudfront in an invisible way. I believe
there can be a small lag while an asset is copied from S3 to a
Cloudfront edge network for an initial request, but subsequent
requests should be much faster.

To answer your question (I think): it appears that there can be a delay in
updating the edge servers, but it's supposedly very short. You can
read the part about Eventual Consistency here:

http://docs.amazonwebservices.com/AmazonCloudFront/2009-04-02/GettingStartedGuide/

So, it looks like Cloudfront might serve a "stale" version of a gem
for longer than S3 would. Do gems get new file names with new
versions, though? If so, I think that'd sidestep this potential
problem. If not, then I suppose you'd just have to decide if the
supposed speed gains of using Cloudfront would outweigh the chance of
serving stale gems for longer.
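RubyGems names each package file after the gem's name and version, with the platform appended for platform-specific gems, so a new release is always a new S3 key and never overwrites an object an edge location may already have cached. A minimal sketch of that naming convention; the helper and the `mygem` example are hypothetical.

```ruby
# Illustrative helper following RubyGems' file naming convention:
# "name-version.gem", or "name-version-platform.gem" for gems built
# for a specific platform (for example, ones shipping binaries).
def gem_file_name(name, version, platform = "ruby")
  parts = [name, version]
  parts << platform unless platform == "ruby"
  "#{parts.join('-')}.gem"
end

puts gem_file_name("rake", "0.8.6")                 # rake-0.8.6.gem
puts gem_file_name("rake", "0.8.7")                 # rake-0.8.7.gem
puts gem_file_name("mygem", "1.0.0", "x86-mingw32") # mygem-1.0.0-x86-mingw32.gem
```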

There are more details about how it works in general here:

http://aws.amazon.com/cloudfront/#details

No problem at all if you want to table this until after the
transition, but I'm happy to help if you're interested in
investigating this more.

- Trevor

Luis Lavena

Oct 29, 2009, 12:24:03 PM
to gemcutter
On Oct 29, 12:34 pm, Nick Quaranto <n...@quaran.to> wrote:
> Thanks Trevor. Is the time to get files up on Cloudfront take any longer
> than S3? That's my primary concern with switching to it. Currently the gems
> are uploaded to S3 when you do `gem push` and then gem indexing is kicked
> off in the background, and that usually takes around a minute to rebuild.
>

Gems will still be uploaded to S3. CloudFront serves HTTP requests for
those files in the background, through either the hostname Amazon
assigns or the CNAME you provide.

Since gems are individual, versioned files, there will be no problem
for these.

On the other hand, if you're uploading the Marshal indexes to S3,
they reuse the same filenames, so updates to them will take a bit
longer to replicate to the edges of the CDN.

That is basic CDN functionality.
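One way to act on that difference is to upload the two kinds of objects with different cache lifetimes. The sketch below assumes the CDN honors the `Cache-Control` header that S3 returns (it may still enforce its own minimum lifetime); the `.gem` rule and both max-age values are illustrative assumptions, and `specs.4.8.gz` stands in for the index files that are rewritten on every push.

```ruby
# Sketch: pick a Cache-Control header for each S3 key at upload time.
# Versioned .gem files never change after a push, so edge locations may
# keep them for a long time; index files are overwritten under the same
# name on every push, so they get a short lifetime.
# The rule and both numbers are assumptions for illustration.
IMMUTABLE_MAX_AGE = 365 * 24 * 60 * 60 # one year, in seconds
MUTABLE_MAX_AGE   = 60                 # one minute

def cache_control_for(key)
  max_age = key.end_with?(".gem") ? IMMUTABLE_MAX_AGE : MUTABLE_MAX_AGE
  "public, max-age=#{max_age}"
end

puts cache_control_for("gems/rake-0.8.7.gem") # public, max-age=31536000
puts cache_control_for("specs.4.8.gz")        # public, max-age=60
```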

Either way, I've been using CloudFront to handle big files (1.5MB or
so), and updates to those take a penalty hit on the first request
(similar to S3 performance); subsequent requests are served
from the CDN.

So, in theory there will be no performance loss or extra delay compared
to the current situation.

Also, since it would use our own CNAME, such as "cdn.gemcutter.org" or
"cloud.gemcutter.org", it can work around some companies' firewall
policies that block S3 buckets (s3.amazonaws.com being blacklisted).

Cheers,
--
Luis Lavena

richardiux

Nov 23, 2009, 1:45:09 PM
to gemcutter
The corporation I work with has s3.amazonaws.com blacklisted as well.
I'm guessing this is quite common.
Now that Gemcutter has completed the migration, it is almost impossible
for us to install any gems.

Here is a gist with extra info:
http://gist.github.com/241258
> "cloud.gemcutter.org" can workaround some of the concerns about firewall policies of some companies with S3 buckets (s3.amazonaws.com

John Barnette

Nov 23, 2009, 3:58:34 PM
to gemc...@googlegroups.com
On Mon, Nov 23, 2009 at 10:45 AM, richardiux <richa...@gmail.com> wrote:
> The corporation I work with has s3.amazonaws.com blacklisted as well.
> I'm guessing this is quite common.
> Now that Gemcutter completed the migration, it is almost impossible
> for us to install any gems.
>
> Here is a gist with extra info:
> http://gist.github.com/241258

We can fix this with CNAMEs if Tom and Nick are up for it. If we do a
bucket called, say, "files.rubygems.org", we can add a CNAME record to
DNS that maps files.rubygems.org to
files.rubygems.org.s3.amazonaws.com and be in business.
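This works because S3 takes the bucket name from the request's Host header when the host is not an s3.amazonaws.com name, so the bucket must be named exactly like the hostname. A small sketch that produces the zone-file line and rejects a mismatched bucket; `s3_cname_record` is a hypothetical helper.

```ruby
# Hypothetical helper: the zone-file line that lets a custom hostname
# serve an S3 bucket directly. S3 takes the bucket name from the Host
# header, so the bucket must be named exactly like the hostname.
def s3_cname_record(hostname, bucket)
  unless hostname == bucket
    raise ArgumentError, "bucket #{bucket.inspect} must be named #{hostname.inspect}"
  end
  "#{hostname}. IN CNAME #{bucket}.s3.amazonaws.com."
end

puts s3_cname_record("files.rubygems.org", "files.rubygems.org")
# files.rubygems.org. IN CNAME files.rubygems.org.s3.amazonaws.com.
```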


~ j.

Luis Lavena

Nov 23, 2009, 7:43:33 PM
to gemcutter
On Nov 23, 5:58 pm, John Barnette <jbarne...@gmail.com> wrote:
Actually you map the CNAME to a CloudFront host, like this one:
dx36u3tp8d8kn.iad2.cloudfront.net (cdn.rubyinstaller.org)
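To check where a hostname currently points, Ruby's standard `resolv` library can look up its CNAME record. A sketch with the resolver passed in so it can be swapped out; `cloudfront_cname?` is a hypothetical helper, and treating any `*.cloudfront.net` target as a distribution is an assumption based on the example host above.

```ruby
require "resolv"

# Returns true when +host+ has a CNAME record pointing at a
# *.cloudfront.net distribution hostname.
def cloudfront_cname?(host, resolver = Resolv::DNS.new)
  record = resolver.getresource(host, Resolv::DNS::Resource::IN::CNAME)
  record.name.to_s.end_with?(".cloudfront.net")
rescue Resolv::ResolvError
  false # no CNAME record found (or the lookup failed)
end

# Usage (needs network access):
#   cloudfront_cname?("cdn.rubyinstaller.org")
```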

--
Luis Lavena

John Barnette

Nov 24, 2009, 9:49:01 AM
to gemc...@googlegroups.com
Sorry, yes, I was talking specifically about solving the
"s3.amazonaws.com is blocked" problem, not moving the files to
CloudFront.


~ j.

RIlindo Foster

Dec 13, 2009, 12:30:29 AM
to gemcutter
I take it that there is no update on this yet. :( Is Gemcutter
moving from the standard S3 bucket URL to a Gemcutter
CNAME or a CDN?

Nick Quaranto

Dec 13, 2009, 12:50:48 AM
to gemc...@googlegroups.com
I tried to look into this last weekend, but CloudFront didn't like that the gemcutter bucket was named gemcutter_production (with an underscore), and I got no further. This week I hope to get a simple CNAME set up until that can be figured out.

Tom Copeland

Dec 14, 2009, 2:03:54 PM
to gemc...@googlegroups.com
This sounds good to me. What CNAME do we need created? I can get Rich to make the DNS change...

Yours,

Tom

Trevor Turk

Dec 14, 2009, 2:29:14 PM
to gemcutter
On Dec 14, 1:03 pm, Tom Copeland <t...@infoether.com> wrote:
> This sounds good to me. What CNAME do we need created? I can get Rich to make the DNS change...

I provided a quick overview with some examples from my Cloudfront
setup in this post:

http://groups.google.com/group/gemcutter/msg/c08441b18c20f225

It's a quick process, but still a bit of a pain :)

- Trevor

Tom Copeland

Dec 14, 2009, 3:34:17 PM
to gemc...@googlegroups.com
Thanks Trevor - but Nick, wasn't there some problem with our bucket name and Cloudfront?

Yours,

Tom

Nick Quaranto

Dec 14, 2009, 3:36:23 PM
to gemc...@googlegroups.com
Yeah, it didn't like the fact that our buckets have underscores (gemcutter_production, gemcutter_staging) and stopped me from adding a cloudfront distribution.
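CloudFront needs an origin bucket whose name is also a valid DNS hostname, which rules out underscores. A simplified check, assuming S3's DNS-compatible naming rules (3 to 63 characters; lowercase letters, digits, hyphens and dots; each label starting and ending with a letter or digit; not shaped like an IPv4 address); `dns_compatible_bucket?` is a hypothetical helper, not an AWS API.

```ruby
# Simplified check for a DNS-compatible S3 bucket name, the kind
# CloudFront can use as an origin. Rules encoded here (an assumption,
# simplified from S3's naming guidelines):
#   * 3 to 63 characters in total
#   * dot-separated labels of lowercase letters, digits and hyphens
#   * each label starts and ends with a letter or digit
#   * not formatted like an IPv4 address
BUCKET_LABEL = /[a-z0-9](?:[a-z0-9-]*[a-z0-9])?/

def dns_compatible_bucket?(name)
  return false unless name.length.between?(3, 63)
  return false if name.match?(/\A\d+\.\d+\.\d+\.\d+\z/)
  name.match?(/\A#{BUCKET_LABEL}(?:\.#{BUCKET_LABEL})*\z/)
end

%w[gemcutter_production gemcutter-production files.rubygems.org].each do |n|
  puts "#{n}: #{dns_compatible_bucket?(n)}"
end
# gemcutter_production: false
# gemcutter-production: true
# files.rubygems.org: true
```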

I haven't gotten around to copying all of the gems to a different bucket yet. If you're willing to help me out with that, ideally running it on EC2 (it could just be a tiny Heroku app), I'd appreciate it.

-Nick

Trevor Turk

Dec 14, 2009, 4:00:35 PM
to gemcutter
On Dec 14, 2:36 pm, Nick Quaranto <n...@quaran.to> wrote:
> I haven't gotten around to copying all of the gems to a different bucket
> yet, if you're willing to help me out with that and hopefully running it on
> ec2 (could just be a tiny Heroku app) I'd appreciate it.

Sorry I missed that message, Tom. That's weird about the Cloudfront no-
underscores rule...

A quick Google turned up this post, which has a bit of Ruby code that
utilizes the S3 "copy" functionality. I think this technique could
probably be used pretty well:

http://www.lakedenman.com/2009/10/27/copying-files-between-s3-buckets.html
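The copy can be done server-side with S3's copy operation, so no gem data passes through the machine running the script. A sketch assuming the modern `aws-sdk-s3` gem (newer than this thread; the linked post's code may use a different library): `client` is expected to be an `Aws::S3::Client`, passed in rather than created here, and the bucket names in the usage comment are hypothetical.

```ruby
# Copy every object from +source+ to +target+ using server-side copies.
# +client+ is assumed to be an Aws::S3::Client (aws-sdk-s3 gem); any
# object with the same two methods works. Notes (assumptions):
#   * a source bucket whose name is not DNS-compatible may need
#     Aws::S3::Client.new(force_path_style: true)
#   * ACLs are not copied; add acl: "public-read" if objects must be public
#   * keys with special characters would need URL-encoding in copy_source
def copy_bucket(client, source, target)
  copied = 0
  token = nil
  loop do
    params = { bucket: source }
    params[:continuation_token] = token if token
    page = client.list_objects_v2(**params)
    page.contents.each do |object|
      client.copy_object(bucket: target, key: object.key,
                         copy_source: "#{source}/#{object.key}")
      copied += 1
    end
    break unless page.is_truncated
    token = page.next_continuation_token
  end
  copied
end

# Usage (hypothetical bucket names, real credentials required):
#   require "aws-sdk-s3"
#   client = Aws::S3::Client.new(force_path_style: true)
#   copy_bucket(client, "gemcutter_production", "gemcutter-production")
```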

I'm happy to help out with the work in getting this done but it would
require having the keys and such, so perhaps it's best left to you
guys. Please let me know if I can help, though.

- Trevor

Shane

Dec 15, 2009, 4:37:41 PM
to gemcutter
I just wanted to chime in and say how crucial it is that a different
hostname is used as part of your move to CloudFront. After debugging
gem downloads from behind my employer's firewall, I found out
that they block ANY Amazon S3 content.

Thanks,
Shane



vanillabean

Dec 16, 2009, 5:36:03 AM
to gemcutter
I'd also like to add my voice to this. Same issue: our corporate
firewall uses WebSense (should be WebNonsense) to blacklist sites, and S3 is
on the list. This has seriously hurt my productivity, as
I'm currently forced to download the gems I need on an unprotected
network and then manually install them on all of the
machines behind the firewall.