Gem file size limits

Evan Phoenix

Jan 17, 2012, 6:10:32 PM
to RubyGems developers mailing list
I believe that rubygems.org needs to limit the max size of a .gem file which will be allowed.

This serves two purposes:
1) It protects users from themselves. 19 of the 20 largest gems are huge only because they accidentally packaged all of their previous versions inside themselves. This also needs to be fixed on the gem build side, but there is no reason to allow these gems.
2) Cost. Rubygems.org is becoming increasingly expensive to run and thus we need to begin thinking of ways to keep it mean and lean.

I think we can all agree that some kind of limit makes sense. At the moment, there is nothing preventing a user from using rubygems.org as their personal backup and pushing terabytes in a .gem file. Clearly we can't operate if people do that.

So the natural question I have for all of you is: what makes sense as the size limit? To help you with this decision, here is some data for you to chew on:

1) The top 1000 gems, sorted by size: https://gist.github.com/1629309
2) A histogram of gem sizes by megabyte: https://gist.github.com/1629435

You can see from the histogram that 96% of gems are less than one megabyte, and 98% are 3 megs or less. It seems like that fact should inform our decision.
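For anyone re-running these numbers, here is a minimal Python sketch of the two calculations behind that statement: bucketing .gem file sizes by whole megabyte, and the share of gems at or under a size bound. The function names and sample sizes are hypothetical, and it assumes a megabyte is 2**20 bytes; it is not the script that produced the gists above.

```python
from collections import Counter

MB = 1024 * 1024  # assumption: "megabyte" means 2**20 bytes


def histogram_by_mb(sizes):
    """Bucket .gem sizes (bytes) by whole megabyte: 0 => under 1MB, 1 => 1-2MB, ..."""
    return Counter(size // MB for size in sizes)


def percent_at_or_below(sizes, limit_bytes):
    """Percentage of gems whose .gem file is no larger than limit_bytes."""
    if not sizes:
        return 0.0
    within = sum(1 for size in sizes if size <= limit_bytes)
    return 100.0 * within / len(sizes)


# Hypothetical sample data, not real rubygems.org numbers.
sizes = [40_000, 250_000, 900_000, 1_500_000, 2_900_000, 12_000_000]
print(histogram_by_mb(sizes))
print(percent_at_or_below(sizes, 1 * MB))
print(percent_at_or_below(sizes, 3 * MB))
```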

To start the decision, let me throw out a starting point: 10 megs.

Looking at the biggest non-accidental gems, they're almost all jruby related and contain huge .jar files. We've pinged others about removing the impediment to pushing gems with maven deps, so that devs would use that functionality rather than packaging the jars within the gems themselves.

Comments and Criticisms Required.

- Evan

--
Evan Phoenix // ev...@phx.io


_______________________________________________
RubyGems-Developers mailing list
http://rubyforge.org/projects/rubygems
RubyGems-...@rubyforge.org
http://rubyforge.org/mailman/listinfo/rubygems-developers

Luis Lavena

Jan 17, 2012, 7:13:13 PM
to RubyGems developers mailing list
On Tue, Jan 17, 2012 at 8:10 PM, Evan Phoenix <ev...@phx.io> wrote:
> I believe that rubygems.org needs to limit the max size of a .gem file which will be allowed.
>
> ...

>
> You can see from the histogram that 96% of gems are less than one megabyte, and 98% are 3 megs or less. It seems like that fact should inform our decision.
>
> To start the decision, let me throw out a starting point: 10 megs.
>
> Looking at the biggest non-accidental gems, they're almost all jruby related and contain huge .jar files. We've pinged others about removing the impediment to pushing gems with maven deps and thusly devs would use that functionality rather than packaging the jars within the gems themselves.
>

That leaves out gems with native dependencies like qtbindings.

Qt is massive, and asking users to compile it just to install it
on Windows is very problematic.

qtbindings pre-compiled for Windows weighs ~43MB; the limit you
propose would block them from publishing gems.

There are other gems out there with pre-compiled bindings like
qtbindings, such as gtk2 (12MB).

--
Luis Lavena
AREA 17
-
Perfection in design is achieved not when there is nothing more to add,
but rather when there is nothing more to take away.
Antoine de Saint-Exupéry

Nick Quaranto

Jan 17, 2012, 7:21:08 PM
to RubyGems developers mailing list
For those willing to do more data analysis, here's everything we have on S3
right now with file sizes.

https://gist.github.com/1629174


Ryan Davis

Jan 17, 2012, 8:38:51 PM
to RubyGems developers mailing list

On Jan 17, 2012, at 16:13 , Luis Lavena wrote:

> That leaves out gems with native dependencies like qtbindings.
>
> QT is massive and to ask users to compile it to be able to install it
> on Windows is very problematic.
>
> qtbindings pre-compiled for Windows weight ~43MB, the limit you
> comment will block them from publishing gems.
>
> There are other gems with pre-compiled bindings like qtbindings or
> gtk2 (12MB) out.

Maybe we should have a second gem source that doesn't have this restriction, isn't run by Ruby Central, and is sponsored by someone else (even if just with bandwidth)?

After seeing a number of exponentially growing projects I do think _something_ needs to be done. My insistence on using hand checked manifests will only go so far. :P

Eric Hodel

Jan 17, 2012, 8:41:53 PM
to RubyGems developers mailing list
On Jan 17, 2012, at 3:10 PM, Evan Phoenix wrote:
> I believe that rubygems.org needs to limit the max size of a .gem file which will be allowed.
>
> This serves two purposes:
> 1) It protects users from themselves. The top 19 of 20 gems sorted by size are all huge because they accidentally packaged all previous versions within themselves. This issue needs to be fixed on the gem build side also, but there is no reason to allow these gems.
> 2) Cost. Rubygems.org is becoming increasingly expensive to run and thus we need to begin thinking of ways to keep it mean and lean.
>
> I think we can all agree that some kind of limit makes sense. At the moment, there is nothing from preventing a user from using rubygems.org as their personal backup and pushing terabytes in a .gem file. Clearly we can't operate if people do that.
>
> So the natural question I have for all of you is: what makes sense as the size limit? To help you with this decision, here is some data for you to chew on:
>
> 1) The top 1000 gems, sorted by size: https://gist.github.com/1629309
> 2) A histogram of gem sizes by megabyte: https://gist.github.com/1629435

For additional data, here's the sum of space consumed by each gem for all its releases:

http://paste.segment7.net/qh.html

> You can see from the histogram that 96% of gems are less than one megabyte, and 98% are 3 megs or less. It seems like that fact should inform our decision.
>
> To start the decision, let me throw out a starting point: 10 megs.

Most of the gems listed in the top 10 contain embedded third-party code, SDKs, etc.

At position 726 in the list the total consumption for a gem reaches 10MB, so 98% of authors use less than 10MB total.

> Looking at the biggest non-accidental gems, they're almost all jruby related and contain huge .jar files. We've pinged others about removing the impediment to pushing gems with maven deps and thusly devs would use that functionality rather than packaging the jars within the gems themselves.

For some perspective on possible limits, here's the list of gems that used more than 5MB for any release:

http://paste.segment7.net/qi.html

For 34264 releases:

A 10MB limit would block 0.7% of gems
A 5MB limit would block 1.3% of gems
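A rough sketch of how figures like these could be derived from a listing of releases: sum each gem's releases, count releases over a limit, and list gems with any release over it. The record layout `(name, version, size_in_bytes)`, the helper names, and the sample data are assumptions, and the sketch assumes the percentages are taken over releases rather than over distinct gems.

```python
from collections import defaultdict

MB = 1024 * 1024  # assumption: 1MB = 2**20 bytes


def total_per_gem(releases):
    """Total bytes consumed by all releases of each gem."""
    totals = defaultdict(int)
    for name, _version, size in releases:
        totals[name] += size
    return dict(totals)


def percent_releases_blocked(releases, limit_bytes):
    """Percentage of releases whose .gem file is larger than limit_bytes."""
    if not releases:
        return 0.0
    blocked = sum(1 for _name, _version, size in releases if size > limit_bytes)
    return 100.0 * blocked / len(releases)


def gems_with_release_over(releases, limit_bytes):
    """Names of gems that have at least one release larger than limit_bytes."""
    return sorted({name for name, _version, size in releases if size > limit_bytes})


# Hypothetical sample: one small gem and one that accidentally bundled old builds.
releases = [
    ("tiny", "1.0.0", 200_000),
    ("tiny", "1.0.1", 210_000),
    ("bloated", "0.1.0", 1 * MB),
    ("bloated", "0.2.0", 12 * MB),
]
print(total_per_gem(releases))
print(percent_releases_blocked(releases, 10 * MB))
print(gems_with_release_over(releases, 5 * MB))
```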

On IRC Gregory Brown suggested we cross-reference this list with the popularity of the gem, but I don't have download counts handy.

From the most-downloaded-today list on rubygems.org:

mime-types uses 238KB total
multi_json uses 134KB total
treetop uses 2MB total
json uses 32MB total
thor uses 20MB total

Neither thor nor json uses more than 5MB for any release.

Some other known-popular gems:

rake uses 29MB total
rails uses 20MB total
activerecord uses 48MB total
actionpack uses 68MB total
actionmailer uses 9MB total
activeresource uses 5MB total
activesupport uses 28MB total
bundler uses 10MB total

There was one anomalous release of actionpack, 2.3.6, which was 17MB due to garbage in tmp/test. 2.3.10 and 2.3.11 were 1MB each; the rest were smaller. The largest release in the 3 series is 600KB and the 3.2 RC is 374272 bytes.

Eric Hodel

Jan 17, 2012, 8:46:14 PM
to RubyGems developers mailing list
On Jan 17, 2012, at 5:38 PM, Ryan Davis wrote:
> On Jan 17, 2012, at 16:13 , Luis Lavena wrote:
>> That leaves out gems with native dependencies like qtbindings.
>>
>> QT is massive and to ask users to compile it to be able to install it
>> on Windows is very problematic.
>>
>> qtbindings pre-compiled for Windows weight ~43MB, the limit you
>> comment will block them from publishing gems.
>>
>> There are other gems with pre-compiled bindings like qtbindings or
>> gtk2 (12MB) out.
>
> Maybe we should have a second gem source that doesn't have this restriction but isn't run by rubycentral and is sponsored by someone else (even if just by bandwidth)?
>
> After seeing a number of exponentially growing projects I do think _something_ needs to be done. My insistence on using hand checked manifests will only go so far. :P

Four of the exponential growth gems have been traced back to jeweler, which does not clean the pkg/ dir before building gems and builds out of the source directory rather than from a copy in the pkg/ directory. I filed the following issue:

https://github.com/technicalpickles/jeweler/issues/216

actionpack-2.3.6 had a similar issue, where test/tmp was not cleaned before packaging, resulting in a 17MB gem.

I will double check the package code to ensure that only files listed in the spec are added to a packaged gem which should limit this problem, provided the build-tool authors do the right thing.
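The guard described above amounts to packaging from the spec's explicit file list rather than from whatever happens to be on disk. Here is a language-neutral sketch of that idea in Python; it is not RubyGems' actual Ruby packaging code, and the function name and paths are hypothetical. Anything not listed, such as an old build under pkg/ or test/tmp output, is left out and reported.

```python
def select_files(files_on_disk, spec_files):
    """Split the build directory into files to package and files to skip.

    Returns (included, skipped, missing):
      included - on disk and listed in the spec
      skipped  - on disk but not listed (e.g. old .gem files under pkg/)
      missing  - listed in the spec but not found on disk
    """
    on_disk = set(files_on_disk)
    listed = set(spec_files)
    included = sorted(on_disk & listed)
    skipped = sorted(on_disk - listed)
    missing = sorted(listed - on_disk)
    return included, skipped, missing


files_on_disk = [
    "lib/foo.rb",
    "README.txt",
    "pkg/foo-0.9.0.gem",   # a previous build left behind
    "test/tmp/cache.log",  # test scratch output
]
spec_files = ["lib/foo.rb", "README.txt"]

included, skipped, missing = select_files(files_on_disk, spec_files)
for path in skipped:
    print(f"warning: {path} is not in the spec and will not be packaged")
```

Because only `included` would go into the archive, a stale pkg/ directory can no longer make each release contain all of the previous ones.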

Charles Oliver Nutter

Jan 18, 2012, 5:36:35 PM
to RubyGems developers mailing list
Checking in on this one...

On Tue, Jan 17, 2012 at 5:10 PM, Evan Phoenix <ev...@phx.io> wrote:
> I believe that rubygems.org needs to limit the max size of a .gem file which will be allowed.

I agree, but having a fairly important gem that's over the 10MB limit
(jruby-jars, fetched by anyone deploying JRuby to a Java webapp
server) I have concerns :)

> This serves two purposes:
> 1) It protects users from themselves. The top 19 of 20 gems sorted by size are all huge because they accidentally packaged all previous versions within themselves. This issue needs to be fixed on the gem build side also, but there is no reason to allow these gems.
> 2) Cost. Rubygems.org is becoming increasingly expensive to run and thus we need to begin thinking of ways to keep it mean and lean.

Limiting gem sizes may stall the cost issue, but it's not going to
eliminate it. Eventually you'll be right back up there with "well
behaved" gems that have released many versions over time.

> To start the decision, let me throw out a starting point: 10 megs.
>
> Looking at the biggest non-accidental gems, they're almost all jruby related and contain huge .jar files. We've pinged others about removing the impediment to pushing gems with maven deps and thusly devs would use that functionality rather than packaging the jars within the gems themselves.

What about a GitHub approach? Everyone can get an account with some
amount of free space, and if you go beyond that you have to pay or
move stuff off to your own server (I will get to the "federated"
thread in a minute).

The problem with choosing a limit based on individual gem size is that
the worst offenders may be below that limit but push lots of
revisions, compared to JRuby which has maybe a dozen jruby-jars
versions around 10MB each. Basing the limit on a per-account "free"
tier makes more sense to me.
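A minimal sketch of the per-account "free tier" idea. It assumes a hypothetical 100MB allowance and hypothetical class and method names; nothing here reflects how rubygems.org actually tracks accounts. The point it illustrates is that the check is on cumulative usage across all pushes, so a gem that releases many mid-sized versions reaches the cap just like one with a single huge .gem.

```python
from dataclasses import dataclass

MB = 1024 * 1024
FREE_TIER_BYTES = 100 * MB  # hypothetical free allowance per account


@dataclass
class Account:
    name: str
    quota_bytes: int = FREE_TIER_BYTES  # a paid tier would simply raise this
    used_bytes: int = 0

    def can_push(self, gem_size: int) -> bool:
        """True if this push keeps the account within its quota."""
        return self.used_bytes + gem_size <= self.quota_bytes

    def push(self, gem_size: int) -> None:
        """Record a push, refusing it if the quota would be exceeded."""
        if not self.can_push(gem_size):
            raise ValueError(
                f"{self.name}: pushing {gem_size} bytes would exceed "
                f"the {self.quota_bytes}-byte quota"
            )
        self.used_bytes += gem_size


# Ten ~10MB releases fill the free tier; the eleventh would be refused.
free = Account("example-free")
for _ in range(10):
    free.push(10 * MB)
print(free.can_push(10 * MB))  # False

paid = Account("example-paid", quota_bytes=500 * MB)
print(paid.can_push(10 * MB))  # True
```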

It's also possible (if not likely) that people interested in pushing
big gems and not hosting them would happily pay for larger tiers of
service. That would immediately start blunting costs without any
complicated multi-home or federation work.

Another mad-cap idea: multihome *using* GitHub. Make it possible for
people to host their gems on GitHub and use that as the store for gems
associated with their account. Then you don't even have to manage the
paid tiers; they just do that themselves.

There are many better options than just shutting the door at 10MB.

- Charlie

Trans

Jan 19, 2012, 2:48:50 AM
to rubygems-...@googlegroups.com, RubyGems developers mailing list
Allow developers to mark old versions as "unmaintained" and then delete
such versions after a specific period of time without download.

If there is a desire to keep them for posterity or perhaps for just-in-case
support reasons, these deleted gems could be archived to someone's
dedicated local machine for an extended period.
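One way to express that retention rule, as a sketch: a release becomes eligible for archiving only if its author marked it unmaintained and it has gone a full grace period without a download. The one-year period, the `Release` fields, and falling back to the release date when there has never been a download are all assumptions, since the proposal leaves the "specific period" open.

```python
from dataclasses import dataclass
from datetime import date, timedelta

GRACE_PERIOD = timedelta(days=365)  # hypothetical; the proposal leaves this open


@dataclass
class Release:
    name: str
    version: str
    unmaintained: bool   # set by the gem's author
    last_activity: date  # last download, or the release date if never downloaded


def releases_to_archive(releases, today, grace=GRACE_PERIOD):
    """Releases marked unmaintained that have not been downloaded within `grace`."""
    return [
        r for r in releases
        if r.unmaintained and today - r.last_activity > grace
    ]


releases = [
    Release("foo", "0.1.0", True, date(2010, 6, 1)),   # unmaintained and stale
    Release("foo", "0.2.0", True, date(2011, 12, 1)),  # unmaintained, but downloaded recently
    Release("foo", "1.0.0", False, date(2009, 1, 1)),  # old, but still maintained
]
for r in releases_to_archive(releases, today=date(2012, 1, 19)):
    print(f"archive {r.name}-{r.version}")
```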

Charles Oliver Nutter

Jan 19, 2012, 10:57:39 AM
to rubygems-...@googlegroups.com, RubyGems developers mailing list
It would also be good to allow re-pushing an existing (identical) gem
with a new remote URL, to allow users to actively offload current
giants rather than waiting for new releases. That could immediately
reduce the size and bandwidth load for RG.org.

For example, JRuby team could re-push all jruby-jars gems using S3
URLs at a moment's notice.
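The key safety property of such a re-push is that only the download location changes, never the gem's contents. A sketch of that check, assuming a hypothetical index keyed by (name, version) that stores a SHA-256 digest and a download URL for each release; the index layout, function names, and URLs are illustrative, not the rubygems.org API.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def repoint(index, name, version, gem_bytes, new_url):
    """Move an existing release to an external URL if the re-pushed gem is identical."""
    entry = index.get((name, version))
    if entry is None:
        raise KeyError(f"{name}-{version} was never pushed")
    if sha256_hex(gem_bytes) != entry["sha256"]:
        raise ValueError(f"{name}-{version}: re-pushed gem differs from the original")
    entry["url"] = new_url


# Hypothetical index with one release; the bytes stand in for a real .gem file.
original = b"pretend this is a .gem file"
index = {
    ("example-jars", "1.0.0"): {
        "sha256": sha256_hex(original),
        "url": "https://rubygems.example/gems/example-jars-1.0.0.gem",
    },
}
repoint(index, "example-jars", "1.0.0", original,
         "https://downloads.example.com/example-jars-1.0.0.gem")
print(index[("example-jars", "1.0.0")]["url"])
```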

- Charlie

Daniel DeLeo

Jan 19, 2012, 12:04:15 PM
to RubyGems developers mailing list, rubygems-...@googlegroups.com
The chef-solr gem, which embeds Solr, is also one of the offenders. We wouldn't have a problem hosting the gem on S3 ourselves. The only issue I foresee is that many of our users are new to Ruby and its ecosystem, so I hope that the transition can be made as painless as possible for end users.


Thanks,

--
Dan DeLeo

