This serves two purposes:
1) It protects users from themselves. The top 19 of 20 gems sorted by size are all huge because they accidentally packaged all previous versions within themselves. This issue needs to be fixed on the gem build side also, but there is no reason to allow these gems.
2) Cost. Rubygems.org is becoming increasingly expensive to run and thus we need to begin thinking of ways to keep it mean and lean.
I think we can all agree that some kind of limit makes sense. At the moment, there is nothing preventing a user from using rubygems.org as their personal backup and pushing terabytes in a .gem file. Clearly we can't operate if people do that.
So the natural question I have for all of you is: what makes sense as the size limit? To help you with this decision, here is some data for you to chew on:
1) The top 1000 gems, sorted by size: https://gist.github.com/1629309
2) A histogram of gem sizes by megabyte: https://gist.github.com/1629435
You can see from the histogram that 96% of gems are less than one megabyte, and 98% are 3 megs or less. It seems like that fact should inform our decision.
To start the decision, let me throw out a starting point: 10 megs.
Looking at the biggest non-accidental gems, they're almost all jruby related and contain huge .jar files. We've pinged others about removing the impediment to pushing gems with maven deps and thusly devs would use that functionality rather than packaging the jars within the gems themselves.
Comments and Criticisms Required.
- Evan
--
Evan Phoenix // ev...@phx.io
_______________________________________________
RubyGems-Developers mailing list
http://rubyforge.org/projects/rubygems
RubyGems-...@rubyforge.org
http://rubyforge.org/mailman/listinfo/rubygems-developers
That leaves out gems with native dependencies like qtbindings.
Qt is massive, and asking users to compile it in order to install the gem
on Windows is very problematic.
qtbindings pre-compiled for Windows weighs ~43MB, so the limit you
propose would block it from being published.
There are other gems with pre-compiled bindings out there, like qtbindings or
gtk2 (12MB).
--
Luis Lavena
AREA 17
-
Perfection in design is achieved not when there is nothing more to add,
but rather when there is nothing more to take away.
Antoine de Saint-Exupéry
https://gist.github.com/1629174
On Tue, Jan 17, 2012 at 6:10 PM, Evan Phoenix <ev...@phx.io> wrote:
> I believe that rubygems.org needs to limit the max size of a .gem file
> which will be allowed.
>
> This serves two purposes:
> 1) It protects users from themselves. The top 19 of 20 gems sorted by size
> are all huge because they accidentally packaged all previous versions
> within themselves. This issue needs to be fixed on the gem build side also,
> but there is no reason to allow these gems.
> 2) Cost. Rubygems.org is becoming increasingly expensive to run and thus
> we need to begin thinking of ways to keep it mean and lean.
>
> I think we can all agree that some kind of limit makes sense. At the
> moment, there is nothing preventing a user from using rubygems.org
> as their personal backup and pushing terabytes in a .gem file. Clearly we
> can't operate if people do that.
> That leaves out gems with native dependencies like qtbindings.
>
> Qt is massive, and asking users to compile it in order to install the gem
> on Windows is very problematic.
>
> qtbindings pre-compiled for Windows weighs ~43MB, so the limit you
> propose would block it from being published.
>
> There are other gems with pre-compiled bindings out there, like qtbindings or
> gtk2 (12MB).
Maybe we should have a second gem source that doesn't have this restriction, but isn't run by Ruby Central and is sponsored by someone else (even if just with bandwidth)?
After seeing a number of exponentially growing projects I do think _something_ needs to be done. My insistence on using hand-checked manifests will only go so far. :P
For additional data, here's the sum of space consumed by each gem for all its releases:
http://paste.segment7.net/qh.html
> You can see from the histogram that 96% of gems are less than one megabyte, and 98% are 3 megs or less. It seems like that fact should inform our decision.
>
> To start the decision, let me throw out a starting point: 10 megs.
Most of the gems listed in the top 10 contain embedded third-party code, SDKs, etc.
At position 726 in the list the total consumption for a gem reaches 10MB, so 98% of authors use less than 10MB total.
> Looking at the biggest non-accidental gems, they're almost all jruby related and contain huge .jar files. We've pinged others about removing the impediment to pushing gems with maven deps and thusly devs would use that functionality rather than packaging the jars within the gems themselves.
For some perspective on possible limits, here's the list of gems that used more than 5MB for any release:
http://paste.segment7.net/qi.html
For 34264 releases:
A 10MB limit would block 0.7% of gems
A 5MB limit would block 1.3% of gems
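As a rough sketch of how percentages like these can be derived from a list of release sizes: the method name and the sample array below are made up for illustration and are not the data behind the numbers above.

```ruby
MEGABYTE = 1024 * 1024

# Returns the percentage of releases whose size exceeds limit_bytes.
# release_sizes is a hypothetical array of .gem sizes in bytes.
def percent_blocked(release_sizes, limit_bytes)
  return 0.0 if release_sizes.empty?

  blocked = release_sizes.count { |size| size > limit_bytes }
  (100.0 * blocked / release_sizes.size).round(1)
end

# Made-up sample, not real rubygems.org data.
sample = [200_000, 900_000, 3 * MEGABYTE, 6 * MEGABYTE, 12 * MEGABYTE]

puts percent_blocked(sample, 10 * MEGABYTE) # 1 of the 5 sample releases is over 10MB
puts percent_blocked(sample, 5 * MEGABYTE)  # 2 of the 5 sample releases are over 5MB
```

Counting per release (rather than per gem name) matches the "for 34264 releases" framing; counting per gem would need the sizes grouped by name first.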
On IRC Gregory Brown suggested we cross-reference this list with the popularity of the gem, but I don't have download counts handy.
From the most-downloaded-today list on rubygems.org:
mime-types uses 238KB total
multi_json uses 134KB total
treetop uses 2MB total
json uses 32MB total
thor uses 20MB total
Neither thor nor json uses more than 5MB for any release.
Some other known-popular gems:
rake uses 29MB total
rails uses 20MB total
activerecord uses 48MB total
actionpack uses 68MB total
actionmailer uses 9MB total
activeresource uses 5MB total
activesupport uses 28MB total
bundler uses 10MB total
There was one anomalous release of actionpack, 2.3.6, which was 17MB due to garbage in tmp/test. 2.3.10 and 2.3.11 were 1MB each, and the rest were below that. The largest release in the 3 series is 600KB and the 3.2 RC is 374272 bytes.
Four of the exponential growth gems have been traced back to jeweler not cleaning the pkg/ dir before building gems and building out of the source directory, not a copy in the pkg/ directory. I filed the following issue:
https://github.com/technicalpickles/jeweler/issues/216
actionpack-2.3.6 had a similar issue where test/tmp was not cleaned before packaging resulting in a 17MB gem.
I will double check the package code to ensure that only files listed in the spec are added to a packaged gem which should limit this problem, provided the build-tool authors do the right thing.
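A minimal sketch of the kind of sanity check a build tool could run over a gem's file list before packaging. The patterns and the method name are assumptions for illustration; this is not what RubyGems or jeweler actually does.

```ruby
# Patterns for paths that usually should not ship inside a gem.
# These are illustrative assumptions, not RubyGems' own rules.
SUSPICIOUS_PATTERNS = [
  %r{\Apkg/},     # previously built packages
  /\.gem\z/,      # a .gem nested inside the gem
  %r{(\A|/)tmp/}  # temporary test output
].freeze

# Returns the entries of a file list that match any suspicious pattern.
def suspicious_files(files)
  files.select do |path|
    SUSPICIOUS_PATTERNS.any? { |pattern| pattern.match?(path) }
  end
end

# Hypothetical file list for a gem called "foo".
files = ["lib/foo.rb", "pkg/foo-0.1.0.gem", "test/tmp/output.log", "README"]
puts suspicious_files(files)
```

In practice the list would come from the gemspec's `files` attribute, so a check like this only helps if the packager adds nothing beyond what the spec lists.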
On Tue, Jan 17, 2012 at 5:10 PM, Evan Phoenix <ev...@phx.io> wrote:
> I believe that rubygems.org needs to limit the max size of a .gem file which will be allowed.
I agree, but having a fairly important gem that's over the 10MB limit
(jruby-jars, fetched by anyone deploying JRuby to a Java webapp
server) I have concerns :)
> This serves two purposes:
> 1) It protects users from themselves. The top 19 of 20 gems sorted by size are all huge because they accidentally packaged all previous versions within themselves. This issue needs to be fixed on the gem build side also, but there is no reason to allow these gems.
> 2) Cost. Rubygems.org is becoming increasingly expensive to run and thus we need to begin thinking of ways to keep it mean and lean.
Limiting gem sizes may stall the cost issue, but it's not going to
eliminate it. Eventually you'll be right back up there with "well
behaved" gems that have released many versions over time.
> To start the decision, let me throw out a starting point: 10 megs.
>
> Looking at the biggest non-accidental gems, they're almost all jruby related and contain huge .jar files. We've pinged others about removing the impediment to pushing gems with maven deps and thusly devs would use that functionality rather than packaging the jars within the gems themselves.
What about a GitHub approach? Everyone can get an account with some
amount of free space, and if you go beyond that you have to pay or
move stuff off to your own server (I will get to the "federated"
thread in a minute).
The problem with choosing a limit based on individual gem size is that
the worst offenders may be below that limit but push lots of
revisions, compared to JRuby which has maybe a dozen jruby-jars
versions around 10MB each. Basing the limit on a per-account "free"
tier makes more sense to me.
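To make the difference concrete, here is a small sketch comparing the two rules. The limits, method names, and sample accounts are all hypothetical, picked only to show how the two rules disagree.

```ruby
MB = 1024 * 1024

# Hypothetical limits, chosen only for illustration.
PER_GEM_LIMIT = 10 * MB
ACCOUNT_QUOTA = 200 * MB

# Per-gem rule: looks only at the size of the release being pushed.
def allowed_by_gem_limit?(push_size)
  push_size <= PER_GEM_LIMIT
end

# Per-account rule: looks at everything the account already stores.
def allowed_by_account_quota?(stored_sizes, push_size)
  stored_sizes.sum + push_size <= ACCOUNT_QUOTA
end

# An account with a dozen releases of about 10MB each.
few_big = Array.new(12, 10 * MB)
# An account with hundreds of small releases.
many_small = Array.new(300, 1 * MB)

# A new 12MB release: rejected per gem, accepted per account.
puts allowed_by_gem_limit?(12 * MB)
puts allowed_by_account_quota?(few_big, 12 * MB)

# A new 1MB release: accepted per gem, rejected per account.
puts allowed_by_gem_limit?(1 * MB)
puts allowed_by_account_quota?(many_small, 1 * MB)
```

The second account stores more than twice as much as the first, yet only the per-account rule notices.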
It's also possible (if not likely) that people interested in pushing
big gems and not hosting them would happily pay for larger tiers of
service. That would immediately start blunting costs without any
complicated multi-home or federation work.
Another mad-cap idea: multihome *using* GitHub. Make it possible for
people to host their gems on GitHub and use that as the store for gems
associated with their account. Then you don't even have to manage the
paid tiers; they just do that themselves.
There are many better options than just shutting the door at 10MB.
- Charlie
Thanks,
--
Dan DeLeo
On Thursday, January 19, 2012 at 7:57 AM, Charles Oliver Nutter wrote:
> It would also be good to allow re-pushing an existing (identical) gem
> with a new remote URL, to allow users to actively offload current
> giants rather than waiting for new releases. That could immediately
> reduce the size and bandwidth load for RG.org.
>
> For example, JRuby team could re-push all jruby-jars gems using S3
> URLs at a moment's notice.
>
> - Charlie
>
> On Thu, Jan 19, 2012 at 1:48 AM, Trans <tran...@gmail.com> wrote:
> > Allow developers to mark old versions as "unmaintained" and then delete
> > such versions after a specific period of time without download.
> >
> > If there is a desire to keep them for posterity or perhaps for just-in-case
> > support reasons, these deleted gems could be archived to someone's
> > dedicated local machine for an extended period.