---
title: Fetching Source Index for http://rubygems.org/
teaser: Why Bundler is slow.
tags: web,ruby,performance
author: Nick Quaranto
published_on: 2011-01-13
---

Like you, I've sat at my terminal watching [Bundler](http://gembundler.com) emit
this post's title and do nothing for quite a while. Imagine what we could be
doing instead of waiting for dependencies to resolve! I'm out of ideas already;
I love resolving dependencies.

## Why it's slow

It's actually not Bundler that is slow...it's RubyGems itself. To understand why
this process takes a long time, you need a bit of a history lesson on how
RubyGems handles its index of gems. There are three indexes available:

* Latest index (newest versions for a given gem on a given platform)
* Big index (all versions for all gems on all platforms)
* Prerelease index (only prerelease gems for all gems on all platforms)

Usually RubyGems only needs to request the "latest" index when you `gem install`
something. Bundler, however, needs the big index, and the size difference is
serious:

    wget http://rubygems.org/latest_specs.4.8.gz
    wget http://rubygems.org/specs.4.8.gz
    du -h *
    172K    latest_specs.4.8.gz
    436K    specs.4.8.gz

These indexes are big gzipped and `Marshal`'d arrays, with one entry per gem
name, version and platform. Our first slowdown is actually in parsing this huge
array.

    irb -rubygems -rbenchmark
    >> Benchmark.bmbm { |x| x.report { Marshal.load(Gem.gunzip(File.read("specs.4.8.gz"))) } }
    Rehearsal
    ---------------------------
    2.250000   0.050000   2.300000 (  2.321536)
    ---------------------------
    total: 2.300000sec

    user     system      total        real
    2.280000   0.030000   2.310000 (  2.299291)

Once unzipped/unpacked, the entries in that array usually look like:

    ["rails", Gem::Version.new("3.0.3"), "ruby"]
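To make that format concrete, here's a minimal sketch that packs and unpacks a tiny index the same way. The sample entries and the helper names (`pack_index`, `unpack_index`, `versions_of`) are made up for illustration, and it uses `Zlib` from the standard library rather than the RubyGems helpers shown above.

```ruby
require "zlib"
require "rubygems"

# A tiny stand-in for specs.4.8: an array of [name, version, platform] tuples.
# These entries are sample data, not the real index.
SAMPLE_INDEX = [
  ["rails", Gem::Version.new("3.0.2"), "ruby"],
  ["rails", Gem::Version.new("3.0.3"), "ruby"],
  ["nokogiri", Gem::Version.new("1.4.4"), "java"],
  ["nokogiri", Gem::Version.new("1.4.4"), "ruby"],
].freeze

# Pack the index the way the post describes: Marshal'd, then gzipped.
def pack_index(entries)
  Zlib.gzip(Marshal.dump(entries))
end

# Reverse the process: gunzip, then unmarshal back into tuples.
def unpack_index(blob)
  Marshal.load(Zlib.gunzip(blob))
end

# Every distinct version of one gem across all platforms, oldest first.
def versions_of(entries, name)
  entries.select { |gem_name, _version, _platform| gem_name == name }
         .map { |_name, version, _platform| version }
         .uniq
         .sort
end

index = unpack_index(pack_index(SAMPLE_INDEX))
puts versions_of(index, "rails").map(&:to_s).inspect
```

Even a simple lookup like `versions_of` means unpacking the whole array first, which is where the parsing cost in the benchmark comes from.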

Bundler also needs a given gem's dependencies. If you haven't noticed already,
those dependencies aren't in the index at all; they're in the gemspecs, which
are stored individually at a completely different location, also gzipped and
`Marshal`'d.

    irb -rubygems -ropen-uri -rpp
    >> compressed = open("http://rubygems.org/quick/Marshal.4.8/rails-3.0.0.gemspec.rz").read
    >> inflated = Gem.inflate(compressed)
    >> unmarshalled = Marshal.load(inflated)
    >> pp unmarshalled.dependencies
    [Gem::Dependency.new("activesupport", Gem::Requirement.new(["= 3.0.0"]), :runtime),
     Gem::Dependency.new("actionpack", Gem::Requirement.new(["= 3.0.0"]), :runtime),
     Gem::Dependency.new("activerecord", Gem::Requirement.new(["= 3.0.0"]), :runtime),
     Gem::Dependency.new("activeresource", Gem::Requirement.new(["= 3.0.0"]), :runtime),
     Gem::Dependency.new("actionmailer", Gem::Requirement.new(["= 3.0.0"]), :runtime),
     Gem::Dependency.new("railties", Gem::Requirement.new(["= 3.0.0"]), :runtime),
     Gem::Dependency.new("bundler", Gem::Requirement.new(["~> 1.0.0"]), :runtime)]

So that's basically how RubyGems resolves dependencies N levels deep: it has to
make a separate request for each gemspec and keep following dependencies until
all possibilities are exhausted. The next time you `gem install` a gem, add
`-V` and you'll see all of these requests happening.

Those requests obviously take a lot of time, no matter how good Bundler's
resolver algorithm gets. I think we've pushed this system to its limits, and the
fact that it does complete resolves in a reasonable amount of time is
impressive.
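Here's a rough sketch of that crawl, with an in-memory hash standing in for the gemspec files so it runs without the network. The `FAKE_GEMSPECS` table is simplified, made-up data (no versions, trimmed dependency lists), and `crawl` is a hypothetical helper, not anything from RubyGems or Bundler.

```ruby
# Hypothetical in-memory stand-in for the per-gem gemspec files. Each key is a
# gem name and each value lists that gem's runtime dependencies. Versions are
# left out to keep the sketch short.
FAKE_GEMSPECS = {
  "rails"         => %w[activesupport actionpack activerecord railties bundler],
  "actionpack"    => %w[activesupport rack],
  "activerecord"  => %w[activesupport arel],
  "railties"      => %w[activesupport actionpack rake],
  "activesupport" => [],
  "rack"          => [],
  "arel"          => [],
  "rake"          => [],
  "bundler"       => [],
}.freeze

# Walks the dependency graph one gemspec at a time and returns the names in
# the order they were requested. Each entry stands for one HTTP round trip.
def crawl(name, specs = FAKE_GEMSPECS, requested = [])
  return requested if requested.include?(name)

  requested << name # one "request" for this gemspec
  specs.fetch(name).each { |dep| crawl(dep, specs, requested) }
  requested
end

requests = crawl("rails")
puts "#{requests.size} gemspec requests: #{requests.join(', ')}"
```

Even in this toy graph, every gem costs one round trip, and a real resolve also has to consider multiple versions of each gem.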

## What you can do

So it's still slow. My general advice is to:

* Check in your `vendor/cache` directory with your `.gem` files. If `bundle
  install` doesn't make one, force it with `bundle pack`.
* On new installs, CI runs, and deploys, use `bundle --local`, which will attempt
  to resolve using only `vendor/cache`.
* Lock down to specific versions (or use the [pessimistic
  operator](https://thoughtbot.com/blog/post/2508037841/rubys-pessimistic-operator))
  in your Gemfile
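If you're not sure what a constraint actually allows, you can ask RubyGems directly. This is a small sketch using `Gem::Requirement` and `Gem::Version`; the `allowed?` helper is just a name I made up for the example, and the requirement strings are the ones from the Rails gemspec above.

```ruby
require "rubygems"

# True when the version string falls inside the requirement.
def allowed?(requirement, version)
  Gem::Requirement.new(requirement).satisfied_by?(Gem::Version.new(version))
end

puts allowed?("~> 1.0.0", "1.0.9") # patch releases are accepted
puts allowed?("~> 1.0.0", "1.1.0") # the next minor release is not
puts allowed?("= 3.0.0", "3.0.1")  # an exact pin accepts only one version
```

The tighter the constraint, the fewer candidate versions the resolver has to consider.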

## What we have done about it

From the RubyGems side, I think we've done a good thing by making the long
requests go out to CloudFront, so big gems get a CDN boost. However, all
requests are still being made to the Gemcutter server at Rackspace before
being redirected to S3/CloudFront, so the network latency of that first request
doesn't help those outside of the US get their gems faster.

At [Cape Code](https://thoughtbot.com/blog/recap-capeco-de),
[Matt](http://thoughtbot.com/about/#mmongeau) and I worked on [a new resolver
endpoint](https://github.com/rubygems/rubygems.org/compare/ba52664824905361288e...bc15cd0c9d482df66ed0)
for Bundler. The idea was that Bundler could make a request to this new <abbr
title="Application Programming Interface">API</abbr> that would return one level
of dependencies for a given set of gems. We can't move the entire Bundler
resolver algorithm to the server side, but this could cut down the number of
requests it needs to make out for gemspecs.
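To show why one level per request helps, here's a sketch that walks a dependency graph breadth-first and sends a single batched lookup per level. Everything in it is an assumption for illustration: `DEPENDENCY_TABLE` is made-up data, `fetch_dependencies` stands in for the endpoint with a plain hash lookup, and the real API's request and response shapes may look nothing like this.

```ruby
# Hypothetical dependency data, standing in for what the server would know.
DEPENDENCY_TABLE = {
  "rails"         => %w[activesupport actionpack activerecord railties bundler],
  "actionpack"    => %w[activesupport rack],
  "activerecord"  => %w[activesupport arel],
  "railties"      => %w[activesupport actionpack rake],
  "activesupport" => [],
  "rack"          => [],
  "arel"          => [],
  "rake"          => [],
  "bundler"       => [],
}.freeze

# Stand-in for the endpoint: takes a set of gem names and returns one level of
# dependencies for each of them in a single response.
def fetch_dependencies(names, table = DEPENDENCY_TABLE)
  names.map { |name| [name, table.fetch(name)] }.to_h
end

# Breadth-first walk that sends one batched request per level of the graph.
def resolve_in_batches(roots, table = DEPENDENCY_TABLE)
  seen = []
  pending = roots.uniq
  round_trips = 0

  until pending.empty?
    response = fetch_dependencies(pending, table)
    round_trips += 1
    seen.concat(pending)
    # Next level: everything just discovered that hasn't been asked about yet.
    pending = response.values.flatten.uniq - seen
  end

  { gems: seen, round_trips: round_trips }
end

result = resolve_in_batches(["rails"])
puts "#{result[:gems].size} gems in #{result[:round_trips]} round trips"
```

The number of round trips now grows with the depth of the graph instead of the number of gems in it.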

This will speed things up a bit, but it doesn't solve the root problem here.

## What needs to happen

What we really need is:

1. A better indexing scheme
1. A mirroring system that isn't horrible (read: round robin DNS)

RubyGems definitely needs a better indexing scheme, but this is difficult since
making the client support it is going to be rough (and we have to worry about
backwards compatibility!).

Thankfully, our server is now in Ruby (one of the first goals of the Gemcutter
project) so we can iterate rapidly and drop the changes into a gem plugin (think
`gem fast_install rails`). I've been talking to some fellow robots here about
some possibilities (differential indices for one) but we need to bang some code
out soon.
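One way to picture a differential index: the client keeps its cached copy and only downloads what changed since last time. The sketch below is my guess at the general shape, not a design anyone has agreed on. `index_diff`, `apply_diff`, and the sample entries are all hypothetical, and versions are plain strings to keep it short.

```ruby
# Two made-up index snapshots of [name, version, platform] entries.
OLD_INDEX = [
  ["rails", "3.0.2", "ruby"],
  ["rake", "0.8.7", "ruby"],
  ["badgem", "0.0.1", "ruby"],
].freeze

NEW_INDEX = [
  ["rails", "3.0.2", "ruby"],
  ["rails", "3.0.3", "ruby"],
  ["rake", "0.8.7", "ruby"],
].freeze

# Server side: compare two snapshots and return only what changed.
def index_diff(old_entries, new_entries)
  { added: new_entries - old_entries, removed: old_entries - new_entries }
end

# Client side: bring a cached index up to date from a diff.
def apply_diff(cached_entries, diff)
  (cached_entries - diff[:removed]) + diff[:added]
end

diff = index_diff(OLD_INDEX, NEW_INDEX)
updated = apply_diff(OLD_INDEX, diff)
puts "#{diff[:added].size} added, #{diff[:removed].size} removed"
puts updated.sort == NEW_INDEX.sort
```

The diff here is two entries, while the full index keeps growing with every release.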

I'm looking into getting a mirroring system set up, but as always, we need
contributors to help. My first stop has been with
[MirrorBrain](http://mirrorbrain.org), but I'm open to anything that works and
will be easy to set up. My only real requirement is that it takes less than 1
minute to get a gem distributed. Perhaps we need BitTorrent? The gem files are
small (most are way under 1MB) so I can't see that as being hard to accomplish.

My goal is to get rid of at least one of these problems in 2011. Want to help?
Hop on IRC ([#rubygems on irc.freenode.net](irc://irc.freenode.net/rubygems))
and the [Gemcutter mailing list](http://groups.google.com/group/gemcutter) as
well.
