There are two low-barrier-to-entry ways to get some quick metrics about your
application’s test code and the coverage it provides. There are others, but
today we’re going to focus on the two that are easiest to run and on what they
mean: rake stats and rcov.
The first tool available to us comes built into Rails, and that’s rake stats.
rake stats
If you haven’t used it before, rake stats, when run, outputs a quick summary
of the lines of code, lines of test code, number of classes, number of methods,
the ratio of methods to classes, and the ratio of lines of code per method.
Let’s take a look at the output from the application Joe, Mike, Micah, and I built for the Rails Rumble, Where’s the Milk At?.
+----------------------+-------+-------+---------+---------+-----+-------+
| Name | Lines | LOC | Classes | Methods | M/C | LOC/M |
+----------------------+-------+-------+---------+---------+-----+-------+
| Controllers | 176 | 149 | 10 | 18 | 1 | 6 |
| Helpers | 38 | 35 | 0 | 4 | 0 | 6 |
| Models | 183 | 147 | 5 | 20 | 4 | 5 |
| Libraries | 0 | 0 | 0 | 0 | 0 | 0 |
| Integration tests | 0 | 0 | 0 | 0 | 0 | 0 |
| Functional tests | 855 | 686 | 9 | 3 | 0 | 226 |
| Unit tests | 684 | 568 | 7 | 0 | 0 | 0 |
+----------------------+-------+-------+---------+---------+-----+-------+
| Total | 1936 | 1585 | 31 | 45 | 1 | 33 |
+----------------------+-------+-------+---------+---------+-----+-------+
Code LOC: 331 Test LOC: 1254 Code to Test Ratio: 1:3.8
When looking at the output from rake stats, there are a few important bits of
information that you should look at first, all on the final summary line; in
this case:
- Lines of Code (Excluding test code): 331
- Lines of Test Code: 1254
- Code to Test Ratio: 1:3.8
A Code to Test Ratio of 1 to 3.8 is somewhat ridiculous. It’s incredibly high,
and when you see something like this, it’s important to ask why. That’s pretty
much the entire usefulness of the output of rake stats as a metric. Here are
some guidelines I’ve devised, based on the experience of looking at a bunch of
applications I consider well tested and poorly tested.
- Anything less than 1:1 means the code probably lacks sufficient tests.
- Anything more than 1:2 is suspect and worth questioning, but upon investigation could be found to be perfectly reasonable.
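The ratio on the summary line is nothing more than test LOC divided by code LOC. A quick sketch, using the numbers from the table above:

```ruby
# Compute the code-to-test ratio the way rake stats reports it,
# using the Code LOC and Test LOC from the summary line above.
code_loc = 331
test_loc = 1254

ratio = (test_loc.to_f / code_loc).round(1)
puts "Code to Test Ratio: 1:#{ratio}"  # => Code to Test Ratio: 1:3.8
```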
There are a few other nice things in the output from rake stats that are
helpful for a bird’s-eye view of the application. For example, you can tell that
we didn’t write integration tests, and that our application has 5 models and 10
controllers.
Let’s investigate the 1:3.8 ratio we have in Where’s the Milk At?. Going in, before doing any actual investigation, I have some initial hunches as to why the application has the ratio it does. Those are:
- Given a rapid development schedule of 48 hours, we didn’t have any opportunity to refactor tests
- Our Shoulda macros are being counted as lines of test code (LOTC)
- We have several complex named scopes that count as 1 to 3 lines of code, but have many more lines of test code
We didn’t have any opportunity to refactor tests
We were on a 48-hour clock.
Refactoring tests, like refactoring code, is an essential part of real TDD. Without taking this step, it’d only be natural for our tests to be repetitive and for the lines of test code to be inflated. It’s difficult to present a brief example, but here are some typical things to look for in your tests that would be candidates for refactoring:
- Duplicated setup code that can be moved into a common context
- Multiple contexts that do the same thing
- Unnecessary tests
- Duplicated test code that can be moved into a macro
Upon inspection of the Where’s the Milk At? test code, I actually found very
few, if any, instances of any of the above. In fact, I found that we made
extensive use of the macros Shoulda provides, we wrote our own
application-specific macros, such as should_have_map and should_display, and we
made good use of shared contexts.
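To illustrate the idea behind such macros (the real should_have_map and should_display aren’t shown here), a macro is just a class method that generates a check, so one call stands in for several repeated lines of test code. A framework-free sketch of the mechanism; this is a hypothetical simplification, not Shoulda’s actual DSL:

```ruby
# A stripped-down illustration of the Shoulda macro idea: a class
# method that generates a named check, so repeated test code is
# written once. (Hypothetical simplification; the app's real macros
# used Shoulda's should/context DSL.)
class ViewCase
  def self.checks
    @checks ||= {}
  end

  # The "macro": one call at class-definition time defines a check.
  def self.should_display(text)
    checks["displays #{text}"] = ->(body) { body.include?(text) }
  end
end

class StoresViewCase < ViewCase
  should_display "Where's the Milk At?"
end

# One macro call produced one generated check:
check = StoresViewCase.checks["displays Where's the Milk At?"]
puts check.call("<h1>Where's the Milk At?</h1>")  # => true
```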
So, I put this aside as a possible cause, but now that I’ve started to review the test code, I’ve started to develop some new ideas about our code to test ratio that I’ll come back to later on.
Our Shoulda macros are being counted as LOTC
We used several helpful Shoulda test macros to speed up development. My initial suspicion was that these macros were being counted as lines of test code. After investigating, I was able to determine that rake stats only looks in test/unit, test/functional, and test/integration, so this isn’t the case. I put this aside for now and pocket the info about how rake stats works internally for possible use down the road.
We have several complex named scopes
The last of my initial assumptions about our ratio (the astute reader will notice I’m 0 for 2 now) is that we have several complex named scopes that are only 1 to 3 lines of code but have many more lines of test code. Upon inspection, this is the case. Let’s take a look at an example.
We have a named scope which returns all of the Purchases that were made in a specific set of stores. Here’s what it looks like:
named_scope :in_stores, lambda {|stores|
{ :conditions => ['purchases.store_id IN(?)', stores] }
}
And here is the accompanying test (pure TDD: the tests were written a little bit at a time before the named scope itself was written).
context "looking for purchases in stores" do
  setup do
    @stores = [Factory(:store), Factory(:store)]
    @in_store_purchases = []
    @stores.each do |store|
      2.times do
        @in_store_purchases << Factory(:purchase, :store => store)
      end
    end
    Factory(:purchase) # purchase at another store
    @result = Purchase.in_stores(@stores)
  end
  should "not return any purchases for other stores" do
    assert_all @result do |purchase|
      @stores.include?(purchase.store)
    end
  end
  should "return every purchase for the specified stores" do
    assert_all @in_store_purchases do |purchase|
      @result.include?(purchase)
    end
  end
end
You can see that for our 3-line named_scope, we have 23 lines of test code. That’s a ratio of 1:8, and this is an example of one of the simpler named scopes in the application (assert_all is an assertion we wrote).
Additionally, we could make this ratio slightly worse (or better, depending on how you’re looking at it) by putting the named scope all on one line, instead of 3.
There are quite a few of these finders and accompanying tests, and I feel confident after investigating that this is one of the reasons for the ratio.
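For readers less familiar with named_scope, the SQL IN(?) condition above amounts to nothing more than this in-memory filter. A sketch with hypothetical Struct stand-ins in place of ActiveRecord objects:

```ruby
# What the in_stores scope's IN(?) condition amounts to, sketched in
# plain Ruby. Store and Purchase here are hypothetical Struct
# stand-ins, not the app's ActiveRecord models.
Store    = Struct.new(:id)
Purchase = Struct.new(:store_id)

def in_stores(purchases, stores)
  store_ids = stores.map(&:id)
  purchases.select { |p| store_ids.include?(p.store_id) }
end

a, b = Store.new(1), Store.new(2)
purchases = [Purchase.new(1), Purchase.new(2), Purchase.new(3)]

# Only purchases made at stores a and b come back:
in_stores(purchases, [a, b]).map(&:store_id)  # => [1, 2]
```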
Other causes
In reviewing the test code, I started to notice a few other things that contribute to the ratio.
Take the following test, for example:
logged_in_user_context do
  context "with at least one purchase" do
    setup do
      @purchases = paginate([Factory(:purchase)])
      @store = @purchases.last.store
      @user.stubs(:purchases).returns(@purchases)
      @purchases.stubs(:latest).returns(@purchases)
      @purchases.stubs(:paginate).returns(@purchases)
    end
    context "on GET to index" do
      setup do
        get :index
      end
      before_should "find the user's purchases" do
        @user.expects(:purchases).with().returns(@purchases)
      end
      before_should "find the latest purchases" do
        @purchases.expects(:latest).with().returns(@purchases)
      end
      before_should "paginate the purchases" do
        @purchases.expects(:paginate).returns(@purchases)
      end
    end
  end
end
When you use stubbing in tests, it’s best practice to write the stubs and then
write expectations for what you’ve stubbed. We’re doing this in the above code
by putting the stubs in the setup (3 lines of test code) and then using
Shoulda’s before_should to declare the expectations (9 lines of test code).
That’s 12 lines of test code for what is ultimately 1 line of code.
Now, there isn’t anything necessarily wrong with this; again, we’re only investigating causes of the ratio here. But it’s something to note and perhaps consider for either test refactoring or incorporating into your test framework somehow.
Finally, I also noticed a lot of tests like this:
should "crown the best store" do
  assert_select 'a', "#{assigns(:stores)[0].name}" do
    assert_select 'span[class=crown]'
  end
end

should "rerender the purchase form" do
  assert_select_rjs :replace, 'new_purchase' do
    assert_select '#purchase_store_id[value=?]', @store.id
    assert_match @focus_quantity, @response.body
  end
end

should "remove the purchase from the list" do
  assert_match /new Effect.Fade\("#{dom_id(@purchase)}"/, @response.body
end
In short, we’re testing the views, markup, JavaScript (some of it), and RJS, as
we should be. And we’re doing it quite extensively: there are 45 calls to
assert_select and assert_select_rjs in the functional tests. rake stats
doesn’t count the lines in the views. If you consider that most of the calls to
assert_select and its ilk will be surrounded by a should and an end, that’s 3
lines of test code for view code that isn’t showing up at all as lines of code
in our rake stats.
If we modify the rake stats task to include the views (which we can’t
seriously do without taking other things into account, like JavaScript, but
bear with me here), here is the new output of rake stats:
+----------------------+-------+-------+---------+---------+-----+-------+
| Name | Lines | LOC | Classes | Methods | M/C | LOC/M |
+----------------------+-------+-------+---------+---------+-----+-------+
| Controllers | 176 | 149 | 10 | 18 | 1 | 6 |
| Helpers | 38 | 35 | 0 | 4 | 0 | 6 |
| Models | 183 | 147 | 5 | 20 | 4 | 5 |
| Views | 605 | 545 | 0 | 0 | 0 | 0 |
| Libraries | 0 | 0 | 0 | 0 | 0 | 0 |
| Integration tests | 0 | 0 | 0 | 0 | 0 | 0 |
| Functional tests | 852 | 683 | 9 | 3 | 0 | 225 |
| Unit tests | 684 | 568 | 7 | 0 | 0 | 0 |
+----------------------+-------+-------+---------+---------+-----+-------+
| Total | 2538 | 2127 | 31 | 45 | 1 | 45 |
+----------------------+-------+-------+---------+---------+-----+-------+
Code LOC: 876 Test LOC: 1251 Code to Test Ratio: 1:1.4
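The modification itself can be quite small. A sketch assuming Rails 2-era internals, where the stats task builds its table from the STATS_DIRECTORIES constant (the filename below is hypothetical):

```ruby
# lib/tasks/stats_with_views.rake (hypothetical filename)
#
# Rails 2's stats task builds its table from the STATS_DIRECTORIES
# constant, so views can be folded in by appending an entry before
# the task runs. Counting ERB templates as Ruby LOC is only
# approximate, so treat the resulting numbers as a rough signal.
if defined?(STATS_DIRECTORIES)
  STATS_DIRECTORIES << ["Views", "app/views"]
end
```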
I’ve spent a lot of time talking about rake stats, but here’s the rub: it’s
worthless for telling you the really important metric, how good your test code
is. Or, said differently, how much coverage your tests provide for your actual
code. You really only want to use rake stats for a high-level assessment of
your code and as one tool in the arsenal you’ll use to investigate how to
improve your tests.
The guidelines I outlined above are the extent of how you should use rake stats for judging your test code. And as I’ve illustrated here, your assumptions about your test code, and even my guidelines may be wrong or flexible.
In fact, based on what I’ve uncovered about the view LOC and the stub/expectations, I may begin to reevaluate my 1:2 guideline.
The second tool you can get up and running with easily, and one that is even
more valuable than rake stats, is rcov.
rcov
rcov executes your tests and does the best job it can of telling which lines of code were executed by your tests. The theory is that if a line of code is executed, then there was a test for it. rcov provides C0 coverage, so it cannot tell whether both branches of a conditional were hit; a line being executed means that line counts as covered.
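A one-line conditional makes the C0 limitation concrete. In this sketch (a hypothetical method, not from the app), a single call executes the line, so line-based coverage reads 100% even though one branch never ran:

```ruby
# C0 coverage is line-based: executing this line once marks it
# covered, even though only one arm of the ternary actually ran.
def stock_label(quantity)
  quantity > 0 ? "in stock" : "out of stock"
end

stock_label(3)  # => "in stock"; the "out of stock" branch never
                # executed, yet line coverage still reads 100%
```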
You should get the latest rcov from GitHub; it crashes less. To easily run rcov on your Rails app, you can use the rake task included in our plugin that provides standard tasks, limerick_rake, which is in turn included in our Rails application template, Suspenders.
Running rcov on Where’s the Milk At? provides the following information:
+----------------------------------------------------+-------+-------+--------+
| File                                               | Lines | LOC   | COV    |
+----------------------------------------------------+-------+-------+--------+
|app/controllers/application.rb                      |    14 |    11 | 100.0% |
|app/controllers/confirmations_controller.rb         |     3 |     3 | 100.0% |
|app/controllers/items_controller.rb                 |    15 |    11 | 100.0% |
|app/controllers/openid_controller.rb                |    27 |    25 | 100.0% |
|app/controllers/passwords_controller.rb             |     3 |     3 | 100.0% |
|app/controllers/purchases_controller.rb             |    48 |    40 | 100.0% |
|app/controllers/sessions_controller.rb              |     7 |     6 | 100.0% |
|app/controllers/stores_controller.rb                |    21 |    18 | 100.0% |
|app/controllers/users_controller.rb                 |    28 |    23 | 100.0% |
|app/helpers/application_helper.rb                   |    38 |    35 | 100.0% |
|app/models/item.rb                                  |    22 |    17 | 100.0% |
|app/models/purchase.rb                              |    55 |    43 | 100.0% |
|app/models/quantity.rb                              |    28 |    27 | 100.0% |
|app/models/store.rb                                 |    10 |     7 | 100.0% |
|app/models/user.rb                                  |    63 |    49 | 100.0% |
|app/models/user_mailer.rb                           |     5 |     4 | 100.0% |
+----------------------------------------------------+-------+-------+--------+
|Total                                               |   387 |   322 | 100.0% |
+----------------------------------------------------+-------+-------+--------+
100.0%  16 file(s)  387 Lines  322 LOC
This shows us that, according to rcov, 100% of the lines of code in our application were executed when our tests were run. This is great but, as with most things, isn’t the whole story and should be taken with a grain of salt. Here are some guidelines/principles you should take into consideration for rcov.
- Like we discovered with our rake stats, rcov doesn’t check coverage on the views (this includes JavaScript!), so it’s very possible to have 100% coverage and still have functionality that is uncovered.
- Since rcov only provides C0 coverage reports, 100% doesn’t mean that you don’t have bugs or that you’re even perfectly tested.
- If you’re doing real, actual, TATFT TDD, then reaching 100% coverage (as reported by rcov) should be a reachable goal; in fact, if you have less than 80% and you think you’ve been doing TDD, something is not right and you should investigate.
The most important lesson we can take away from rcov is that it’s not perfect, but it provides a good benchmark. When it’s not reporting 100%, you can click through and see exactly which lines of code were not executed by your tests. In short, it’s great at identifying deficiencies in your test suite, but it should not be mistaken for a safety net: with 90-100% coverage you might think you’re all good, yet there can be big holes in your coverage while rcov still reports 100%.
What All This Means
Hopefully you’ve gotten a good idea of what to look for and how to use these two simple tools to investigate the quality of your tests. The benchmarks and guidelines I’ve presented here are based on my experience developing over 30 Rails applications and reviewing the different stats and coverage reports I’ve seen from them, but that doesn’t mean they are inflexible or infallible.
Also, these metrics, these tools, and the others that exist out there are meant to assist, not replace, your role as a developer: to correctly understand the problem domain, to have confidence in the code itself and the test suite, and to recognize the obvious fact that these tools do not analyze the logical correctness of anything you’ve done.
Here are the guidelines again, in summary.
- Anything less than a 1:1 code to test ratio from rake stats probably lacks sufficient tests.
- Anything more than 1:2 is suspect and worth questioning, but upon investigation could be found to be perfectly reasonable.
- Both rcov and rake stats don’t check the views (this includes JavaScript!), so it’s very possible to have 100% coverage and still have functionality that is uncovered, or to have a very high code to test ratio.
- Since rcov only provides C0 coverage reports, 100% doesn’t mean that you don’t have bugs or that you’re even perfectly tested.
- If you’re doing real, actual, TATFT TDD, then reaching 100% coverage (as reported by rcov) should be a reachable goal; in fact, if you have less than 80% and you think you’ve been doing TDD, something is not right and you should investigate.