---
title: Simple Test Metrics in Your Rails App, and What They Mean
teaser:
tags: web,rails,testing
author: Chad Pytel
published_on: 2008-10-22
---

There are two low-barrier-to-entry ways to get some quick metrics about your
application's test code and the coverage it provides. There are others, but
today we're going to focus on the two that are easiest to run and on what they
mean: `rake stats` and `rcov`.

The first tool available to us comes built into Rails, and that's `rake stats`.

## rake stats

If you haven't used it before, `rake stats` outputs a quick summary of the
lines of code, lines of test code, number of classes, number of methods, the
number of methods per class, and the lines of code per method.

Let's take a look at the output from the application Joe, Mike, Micah, and I
built for the Rails Rumble, Where's the Milk At?.

    +----------------------+-------+-------+---------+---------+-----+-------+
    | Name                 | Lines |   LOC | Classes | Methods | M/C | LOC/M |
    +----------------------+-------+-------+---------+---------+-----+-------+
    | Controllers          |   176 |   149 |      10 |      18 |   1 |     6 |
    | Helpers              |    38 |    35 |       0 |       4 |   0 |     6 |
    | Models               |   183 |   147 |       5 |      20 |   4 |     5 |
    | Libraries            |     0 |     0 |       0 |       0 |   0 |     0 |
    | Integration tests    |     0 |     0 |       0 |       0 |   0 |     0 |
    | Functional tests     |   855 |   686 |       9 |       3 |   0 |   226 |
    | Unit tests           |   684 |   568 |       7 |       0 |   0 |     0 |
    +----------------------+-------+-------+---------+---------+-----+-------+
    | Total                |  1936 |  1585 |      31 |      45 |   1 |    33 |
    +----------------------+-------+-------+---------+---------+-----+-------+
      Code LOC: 331     Test LOC: 1254     Code to Test Ratio: 1:3.8

When looking at the output from `rake stats`, the most important bits of
information to look at first are all in the final summary line, in this case:

* Lines of Code (Excluding test code): 331
* Lines of Test Code: 1254
* Code to Test Ratio: 1:3.8

A Code to Test Ratio of 1 to 3.8 is somewhat ridiculous.  It's incredibly high,
and when you see something like this, it's important to ask _why?_  That's pretty
much the entire usefulness of the output of `rake stats` as a metric.  Here are
some guidelines I've devised, based on the experience of looking at a bunch of
applications I consider well tested and poorly tested.

* Anything less than 1:1 and the code probably lacks sufficient tests.
* Anything more than 1:2 is suspect and worth questioning, _but upon
  investigation could be found to be perfectly reasonable_.
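As a rough illustration, the two guidelines can be written as a small Ruby
check.  The method name `assess_ratio` and the symbols it returns are
hypothetical, chosen only for this sketch; the thresholds are the ones stated
above and the figures come from the summary line.

```ruby
# Hypothetical helper: applies the two rule-of-thumb thresholds above to the
# "Code LOC" and "Test LOC" figures from the rake stats summary line.
def assess_ratio(code_loc, test_loc)
  ratio = test_loc.to_f / code_loc   # lines of test code per line of code

  if ratio < 1.0
    :probably_undertested            # less than 1:1
  elsif ratio > 2.0
    :worth_investigating             # more than 1:2, may still be reasonable
  else
    :unremarkable
  end
end

ratio = (1254.0 / 331).round(1)      # figures from the summary line above
puts "1:#{ratio} => #{assess_ratio(331, 1254)}"
```

For Where's the Milk At? this lands in the "worth investigating" bucket, which
is a prompt to ask questions and nothing more.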

There are a few other nice things in the output from `rake stats` that are
helpful for a bird's-eye view of the application.  For example, you can tell
that we didn't write integration tests, and our application has 5 models and 10
controllers.

Let's investigate why we have the 1:3.8 ratio in Where's the Milk At?.  Going
in, and before doing any actual investigation, I have some initial hunches as to
why the application has the ratio it does.  Those are:

* Given a rapid development schedule of 48 hours, we didn't have any opportunity
  to refactor tests
* Our [Shoulda macros](https://github.com/thoughtbot/shoulda) are being counted
  as lines of test code (LOTC)
* We have several complex named scopes that count as 1 to 3 lines of code, but
  have many more lines of test code

### We didn't have any opportunity to refactor tests

We were on a 48 hour clock.

Refactoring tests, like refactoring code, is an essential part of real <abbr
title="Test Driven Development">TDD</abbr>.  Without taking this step, it'd only
be natural that our tests would be repetitive, and the lines of test code would
be inflated.  It's difficult to present a brief example, but here are some
typical things to look for in your tests that would be candidates for
refactoring:

* Duplicated setup code that can be moved into a common context
* Multiple contexts that do the same thing
* Unnecessary tests
* Duplicated test code that can be moved into a macro

Upon inspection of the Where's the Milk At? test code, I actually found very
few, if any, instances of any of the above.  In fact, I found that we made
extensive use of the macros Shoulda provides, we wrote our own
application-specific macros, such as `should_have_map` and `should_display`, and
we made good use of shared contexts.

So, I put this aside as a possible cause, but now that I've started to review
the test code, I've started to develop some new ideas about our code to test
ratio that I'll come back to later on.

### Our Shoulda macros are being counted as LOTC

We used several helpful Shoulda test macros to speed up development.  My initial
suspicion was that these macros were being counted as lines of test code.  After
investigating, I was able to determine that `rake stats` _only_ looks in
test/unit, test/functional, and test/integration, so this isn't the case.  I'll
put this aside for now, and pocket the info about how `rake stats` works
internally for possible future use.

### We have several complex named scopes

The last of my initial assumptions about our ratio (the astute reader will
notice I'm 0 for 2 now) is that we have several complex named scopes that are
only 1 to 3 lines of code, but have many more lines of test code.  Upon
inspection, this is the case.  Let's take a look at an example.

We have a named scope which returns all of the Purchases that were made in a
specific set of stores.  Here's what it looks like:

    named_scope :in_stores, lambda {|stores|
      { :conditions => ['purchases.store_id IN(?)', stores] }
    }

And here is the accompanying test (this was pure <abbr title="Test Driven
Development">TDD</abbr>: the tests were written a little bit at a time, before
the named scope itself was written).

    context "looking for purchases in stores" do
      setup do
        @stores = [Factory(:store), Factory(:store)]

        @in_store_purchases = []
        @stores.each do |store|
          2.times do
            @in_store_purchases << Factory(:purchase, :store => store)
          end
        end

        Factory(:purchase) # purchase at another store

        @result = Purchase.in_stores(@stores)
      end

      should "not return any purchases for other stores" do
        assert_all @result do |purchase|
          @stores.include?(purchase.store)
        end
      end

      should "return every purchase for the specified stores" do
        assert_all @in_store_purchases do |purchase|
          @result.include?(purchase)
        end
      end
    end

You can see that for our 3-line `named_scope`, we have 23 lines of test code.
That's a ratio of roughly 1:8, and this is one of the simpler named scopes in
the application (`assert_all` is an assertion we wrote).

Additionally, we could make this ratio slightly worse (or better, depending on
how you're looking at it) by putting the named scope all on one line, instead of
3.
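To make the formatting point concrete, here is a minimal sketch that counts
lines of code for both layouts of the scope.  The `loc` helper is hypothetical
and only approximates a LOC count by skipping blank and comment-only lines; it
is an assumption for illustration, not the code `rake stats` actually runs.

```ruby
# Hypothetical LOC counter: skips blank lines and comment-only lines.
# This approximates what a LOC metric counts; it is not rake stats itself.
def loc(source)
  source.lines.count do |line|
    stripped = line.strip
    !stripped.empty? && !stripped.start_with?("#")
  end
end

three_lines = <<~'RUBY'
  named_scope :in_stores, lambda {|stores|
    { :conditions => ['purchases.store_id IN(?)', stores] }
  }
RUBY

one_line = <<~'RUBY'
  named_scope :in_stores, lambda {|stores| { :conditions => ['purchases.store_id IN(?)', stores] } }
RUBY

test_loc = 23 # lines of test code in the context shown above

puts "3-line scope: 1:#{(test_loc.to_f / loc(three_lines)).round(1)}"
puts "1-line scope: 1:#{(test_loc.to_f / loc(one_line)).round(1)}"
```

The behavior under test is identical in both cases; only the layout of the
code changes, and the ratio moves from roughly 1:8 to 1:23.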

There are quite a few of these finders and accompanying tests, and I feel
confident after investigating that this is one of the reasons for the ratio.

### Other causes

In reviewing the test code, I started to notice a few other things that
contribute to the ratio.

Take the following test, for example:

    logged_in_user_context do
      context "with at least one purchase" do
        setup do
          @purchases = paginate([Factory(:purchase)])
          @store     = @purchases.last.store

          @user.     stubs(:purchases).returns(@purchases)
          @purchases.stubs(:latest).   returns(@purchases)
          @purchases.stubs(:paginate). returns(@purchases)
        end

        context "on GET to index" do
          setup do
            get :index
          end

          before_should "find the user's purchases" do
            @user.expects(:purchases).with().returns(@purchases)
          end

          before_should "find the latest purchases" do
            @purchases.expects(:latest).with().returns(@purchases)
          end

          before_should "paginate the purchases" do
            @purchases.expects(:paginate).returns(@purchases)
          end

When you use stubbing for tests, it's best practice to write the stubs and then
write expectations for what you've stubbed.  We're doing this in the above code
by putting the stubs in the setup (3 lines of test code) and then using
Shoulda's `before_should` to declare the expectations (9 lines of test code).
That's 12 lines of test code for what is ultimately 1 line of code.

Now, there isn't anything necessarily wrong with this; again, we're only
investigating causes of the ratio here.  But it's something to note and perhaps
consider, either for test refactoring or for somehow incorporating into your
test framework.

Finally, I also noticed a lot of tests like this:

    should "crown the best store" do
      assert_select 'a', "#{assigns(:stores)[0].name}" do
        assert_select 'span[class=crown]'
      end
    end

    should "rerender the purchase form" do
      assert_select_rjs :replace, 'new_purchase' do
        assert_select '#purchase_store_id[value=?]', @store.id
        assert_match @focus_quantity, @response.body
      end
    end

    should "remove the purchase from the list" do
      assert_match /new Effect.Fade\("#{dom_id(@purchase)}"/,
                   @response.body
    end

In short, we're testing the views, markup, JavaScript (some of it), and RJS, as
we should be.  And we're doing it quite extensively: there are 45 calls to
`assert_select` and `assert_select_rjs` in the functional tests.  `rake stats`
doesn't count the lines in the views.  If you consider that most of the calls to
`assert_select` and its ilk will be surrounded by a `should` and an `end`,
that's 3 lines of test code each, for view code that isn't showing up as lines
of code at all in our `rake stats`.

If we modify the `rake stats` task to include the views (which we can't
seriously do without taking other things into account, like JavaScript, but bear
with me here), here is the new output of `rake stats`:

    +----------------------+-------+-------+---------+---------+-----+-------+
    | Name                 | Lines |   LOC | Classes | Methods | M/C | LOC/M |
    +----------------------+-------+-------+---------+---------+-----+-------+
    | Controllers          |   176 |   149 |      10 |      18 |   1 |     6 |
    | Helpers              |    38 |    35 |       0 |       4 |   0 |     6 |
    | Models               |   183 |   147 |       5 |      20 |   4 |     5 |
    | Views                |   605 |   545 |       0 |       0 |   0 |     0 |
    | Libraries            |     0 |     0 |       0 |       0 |   0 |     0 |
    | Integration tests    |     0 |     0 |       0 |       0 |   0 |     0 |
    | Functional tests     |   852 |   683 |       9 |       3 |   0 |   225 |
    | Unit tests           |   684 |   568 |       7 |       0 |   0 |     0 |
    +----------------------+-------+-------+---------+---------+-----+-------+
    | Total                |  2538 |  2127 |      31 |      45 |   1 |    45 |
    +----------------------+-------+-------+---------+---------+-----+-------+
      Code LOC: 876     Test LOC: 1251     Code to Test Ratio: 1:1.4

I've spent a lot of time talking about `rake stats`, but here's the rub. **It
can't tell you the really important metric: how _good_ your test code is**.
Or, said differently, how much coverage your tests provide for your actual code.
You _really_ only want to use `rake stats` for a high level assessment of your
code and as one tool in the arsenal you'll use for investigating how to
improve your tests.

The guidelines I outlined above are the extent of how you should use
`rake stats` for _judging_ your test code.  And as I've illustrated here, your
assumptions about your test code, and even my guidelines, may be wrong or may
need to flex.

In fact, based on what I've uncovered about the view <abbr title="Lines Of
Code">LOC</abbr> and the stub/expectations, I may begin to reevaluate my 1:2
guideline.

The second tool you can get up and running with easily, and one that is even
more valuable than `rake stats`, is `rcov`.

## rcov

rcov executes your tests and does the best job it can of telling which lines of
code were executed by your tests.  The theory is that if a line of code is
executed, then there was a test for it.  rcov provides C0 coverage, so it cannot
tell whether both parts of a conditional were hit; a line being executed simply
means that the line counts as covered.
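Here is a minimal sketch of that blind spot.  The `price_for` method is
hypothetical and not taken from the application; it simply puts both branches
of a conditional on one line.

```ruby
# Hypothetical method: both branches of the conditional share a single line.
def price_for(total, member)
  member ? total * 0.5 : total
end

# The only "test": it exercises the member branch and nothing else.
raise "unexpected price" unless price_for(100, true) == 50.0
```

Every line of `price_for` runs during that single check, so a line-level
coverage tool would count the method as fully covered even though the
non-member branch was never exercised.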

You should get the [latest rcov from
GitHub](http://github.com/spicycode/rcov/tree/master), because it crashes less.
To easily run rcov on your Rails app, you can use [this rake
task](http://github.com/thoughtbot/limerick_rake/tree/master/tasks/coverage.rake),
which is included in our plugin that provides standard tasks,
[limerick\_rake](http://github.com/thoughtbot/limerick_rake), which is in turn
included in our Rails application template,
[Suspenders](https://github.com/thoughtbot/suspenders).

Running rcov on Where's the Milk At? provides the following information:

<pre>
+----------------------------------------------------+-------+-------+--------+
|                  File                              | Lines |  LOC  |  COV   |
+----------------------------------------------------+-------+-------+--------+
|app/controllers/application.rb                      |    14 |    11 | 100.0% |
|app/controllers/confirmations_controller.rb         |     3 |     3 | 100.0% |
|app/controllers/items_controller.rb                 |    15 |    11 | 100.0% |
|app/controllers/openid_controller.rb                |    27 |    25 | 100.0% |
|app/controllers/passwords_controller.rb             |     3 |     3 | 100.0% |
|app/controllers/purchases_controller.rb             |    48 |    40 | 100.0% |
|app/controllers/sessions_controller.rb              |     7 |     6 | 100.0% |
|app/controllers/stores_controller.rb                |    21 |    18 | 100.0% |
|app/controllers/users_controller.rb                 |    28 |    23 | 100.0% |
|app/helpers/application_helper.rb                   |    38 |    35 | 100.0% |
|app/models/item.rb                                  |    22 |    17 | 100.0% |
|app/models/purchase.rb                              |    55 |    43 | 100.0% |
|app/models/quantity.rb                              |    28 |    27 | 100.0% |
|app/models/store.rb                                 |    10 |     7 | 100.0% |
|app/models/user.rb                                  |    63 |    49 | 100.0% |
|app/models/user_mailer.rb                           |     5 |     4 | 100.0% |
+----------------------------------------------------+-------+-------+--------+
|Total                                               |   387 |   322 | 100.0% |
+----------------------------------------------------+-------+-------+--------+
100.0%   16 file(s)   387 Lines   322 LOC
</pre>

This shows us that, according to rcov, 100% of the lines of code in our
application were executed when our tests were run.  This is great, but as with
most things, it isn't the whole story and should be taken with a grain of salt.
Here are some guidelines/principles you should take into consideration for rcov.

* As we discovered with `rake stats`, rcov doesn't check coverage on the
  views (this includes JavaScript!), so it's very possible to have 100% coverage
  and still have functionality that is uncovered.
* Since rcov only provides C0 coverage reports, 100% doesn't mean that you don't
  have bugs or that you're even perfectly tested.
* If you're doing real, actual,
  [TATFT](http://smartic.us/2008/8/15/tatft-i-feel-a-revolution-coming-on) TDD,
  then reaching 100% coverage (as reported by rcov) should be a reachable goal;
  in fact, if you have less than 80% and you think you've been doing <abbr
  title="Test Driven Development">TDD</abbr>, something is not right and you
  should investigate.

The most important lesson we can take away from rcov is that it's not perfect,
but it provides a good benchmark.  When it's not reporting 100%, you can click
through and see exactly which lines of code were not executed by your tests.
So, in short, it's great at identifying deficiencies in your test suite, but
_should not_ be treated as a safety net.  Don't assume that with 90-100%
coverage you're all good, because there can be big holes in your coverage while
rcov is still reporting 100%.

### What All This Means

Hopefully you've gotten a good idea of what to look for and how to use these two
simple tools to investigate the quality of your tests.  The benchmarks and
guidelines I've presented here are based on my experience developing over 30
Rails applications and reviewing the different stats and coverage reports I've
seen from them, but that doesn't mean they are inflexible or infallible.

Also, these metrics, these tools, and the others that exist out there are meant
to assist, not replace, your role as a developer: to correctly understand the
problem domain, to have confidence in the code itself and the test suite, and to
realize the obvious fact that these tools do not analyze the logical correctness
of anything you've done.

Here are the guidelines again, in summary.

* Anything less than 1:1 code to test ratio from `rake stats` probably lacks
  sufficient tests.
* Anything more than 1:2 is suspect and worth questioning, but upon
  investigation could be found to be perfectly reasonable.
* Neither rcov nor `rake stats` checks the views (this includes JavaScript!), so
  it's very possible to have 100% coverage and still have functionality that is
  uncovered, or to have a very high code to test ratio.
* Since rcov only provides C0 coverage reports, 100% doesn't mean that you don't
  have bugs or that you're even perfectly tested.
* If you're doing real, actual,
  [TATFT](http://smartic.us/2008/8/15/tatft-i-feel-a-revolution-coming-on) TDD,
  then reaching 100% coverage (as reported by rcov) should be a reachable goal;
  in fact, if you have less than 80% and you think you've been doing <abbr
  title="Test Driven Development">TDD</abbr>, something is not right and you
  should investigate.