The Self-Contained Test

Edward Loveall

A good test tells a story. It can serve not only as verification that software is behaving predictably, but also as documentation for other developers.

I often see this kind of test:

before do
  # Instance variables, so the examples further down can see this setup
  @model_a = create(:model_a, status: :draft)
  @model_b = create(:model_b, created_at: Date.today)
  @model_a.run_operation(with: @model_b)
end

# Many lines of other tests here

describe("ModelA") do
  before do
    double(:external_system_1, ...)
    double(:external_system_2, ...)
  end

  it("has run the operation") do
    expect(@model_a.valid).to be(true)
  end

  it("associates to a ModelB") do
    expect(@model_a.relations).to include(an_instance_of(ModelB))
  end
end

A few things are tested on an object and each test needs the same or similar setup. One of the earliest pieces of wisdom we are given as programmers is to not write duplicate code: Don’t Repeat Yourself (or DRY if you prefer). Identical blocks of code to set up a test sure do look like repetition, so we extract them into a before block.

This is a mistake for tests.

Optimize For Understanding, Not Code Brevity

It feels like an optimization. Extracting those setup lines makes each ModelA test 75% shorter! But the tradeoff is that to understand what is required of ModelA, you have to go looking all over the file. This makes it difficult to remember all the necessary setup to make a valid model.

This is like taking a novel and grouping all the world building and character development up front in one large chapter, followed by a series of one-sentence chapters where each character has a payoff moment, or the story comes to a dramatic climax.

Repeating similar setup over and over again feels like it’s against all the rules. As counterintuitive as that may be, the best way to make a test readable is to put all the context right next to the expectation, every single time. If that means repeating tons of code, that’s fine. After all, the time it takes to understand code is expensive, and adding lines of code to a text file is very cheap.
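As a rough sketch of what that looks like, here is the earlier ModelA example with each check carrying its own context. The ModelA and ModelB classes below are hypothetical stand-ins, and the behavior of run_operation is an assumption made up for illustration. The checks are plain Ruby methods rather than RSpec examples so the snippet runs without any gems; only the shape of the tests is the point.

```ruby
# Hypothetical stand-ins for the models in the earlier example.
ModelB = Struct.new(:created_at, keyword_init: true)

class ModelA
  attr_reader :status, :relations

  def initialize(status:)
    @status = status
    @relations = []
  end

  # Assumed behavior: the operation links the other model and
  # moves this one out of the draft state.
  def run_operation(with:)
    @relations << with
    @status = :processed
  end

  def valid?
    status == :processed
  end
end

# Each check builds everything it needs, right next to the expectation.
def test_operation_makes_model_a_valid
  model_a = ModelA.new(status: :draft)
  model_b = ModelB.new(created_at: Time.now)

  model_a.run_operation(with: model_b)

  model_a.valid?
end

def test_operation_associates_model_b
  model_a = ModelA.new(status: :draft)
  model_b = ModelB.new(created_at: Time.now)

  model_a.run_operation(with: model_b)

  model_a.relations.include?(model_b)
end

puts test_operation_makes_model_a_valid
puts test_operation_associates_model_b
```

The first three lines of each check are identical, and that is the point: a reader can understand either one without scrolling anywhere else.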

Here’s a real test from an internal repository at thoughtbot:

describe Person, ".with_downtime_next_week" do
  it "returns people with downtime" do
    inactive_person = create(:person, name: "exclude inactive person")
    unbillable_person = create(:person, name: "exclude unbillable person")
    active_billable_unbooked_person = create(
      :person,
      name: "include person with downtime",
    )
    vacation_person = create(:person, name: "exclude vacation person")
    create(
      :position,
      person: inactive_person,
      ends_on: 2.months.ago,
    )
    create(
      :position,
      person: unbillable_person,
      department: create(:department, billable: false),
    )
    create(
      :position,
      person: active_billable_unbooked_person,
      department: create(:department, billable: true),
      starts_on: 1.month.ago,
    )
    create(
      :position,
      person: vacation_person,
      department: create(:department, billable: true),
      starts_on: 1.month.ago,
    )
    create(
      :reservation,
      person: vacation_person,
      starts_on: Date.current.next_week,
      ends_on: 3.months.from_now,
    )

    people = Person.with_downtime_next_week.pluck(:name)

    expect(people).to contain_exactly("include person with downtime")
  end
end

The entire test file contains 88 more tests, many of which also create people and link them to other objects. There are plenty of opportunities to extract common functionality here, but doing so risks making the tests more difficult to understand.

A good test tells a story. Because this test is self-contained, the story here is easy to read:

  • If there is an inactive person,
  • an unbillable person,
  • an active and billable person who is unbooked,
  • and a person on vacation,
  • then with_downtime_next_week will return only the active and billable person who is unbooked.

This will not win us a Pulitzer, but it is invaluable to someone trying to understand the particulars of the with_downtime_next_week method. It also shows how to use other models (positions and reservations) to set up different types of people. This extra setup might seem distracting, but it could also be the breakthrough someone needs to figure out how to fix their bug or write their feature.
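To make the rules in that story concrete, here is a minimal sketch of the filtering logic in plain Ruby. It is not the real implementation, which the test above does not show. The Struct models, the in-memory with_downtime_next_week function, the helper names, and the exact meaning of "active", "billable", and "booked" are all assumptions inferred from the test data.

```ruby
require "date"

# Hypothetical in-memory stand-ins for the models used in the test.
Person      = Struct.new(:name, keyword_init: true)
Position    = Struct.new(:person, :billable, :starts_on, :ends_on, keyword_init: true)
Reservation = Struct.new(:person, :starts_on, :ends_on, keyword_init: true)

# Monday through Sunday of the week after `today`.
def next_week_range(today)
  days_until_monday = (1 - today.wday) % 7
  days_until_monday = 7 if days_until_monday.zero?
  monday = today + days_until_monday
  monday..(monday + 6)
end

# True when the period touches the range; a nil ends_on means open-ended.
def overlaps?(range, starts_on, ends_on)
  starts_on <= range.end && (ends_on.nil? || ends_on >= range.begin)
end

# Assumed rule: a person has downtime next week if they hold a billable
# position that is active during that week and no reservation overlaps it.
def with_downtime_next_week(people, positions, reservations, today: Date.today)
  week = next_week_range(today)
  people.select do |person|
    working = positions.any? do |position|
      position.person == person && position.billable &&
        overlaps?(week, position.starts_on, position.ends_on)
    end
    booked = reservations.any? do |reservation|
      reservation.person == person &&
        overlaps?(week, reservation.starts_on, reservation.ends_on)
    end
    working && !booked
  end
end

today = Date.new(2024, 1, 10) # fixed date so the example is repeatable
inactive   = Person.new(name: "exclude inactive person")
unbillable = Person.new(name: "exclude unbillable person")
unbooked   = Person.new(name: "include person with downtime")
vacation   = Person.new(name: "exclude vacation person")

positions = [
  Position.new(person: inactive, billable: true, starts_on: today << 12, ends_on: today << 2),
  Position.new(person: unbillable, billable: false, starts_on: today << 1),
  Position.new(person: unbooked, billable: true, starts_on: today << 1),
  Position.new(person: vacation, billable: true, starts_on: today << 1),
]
reservations = [
  Reservation.new(person: vacation, starts_on: next_week_range(today).begin, ends_on: today >> 3),
]

people = [inactive, unbillable, unbooked, vacation]
names = with_downtime_next_week(people, positions, reservations, today: today).map(&:name)
p names # => ["include person with downtime"]
```

Each person in the data exists to rule out one way the method could be wrong, which is the same story the test tells.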

When Extracting Setup Is Okay

There are some times when extracting setup is not only okay, but preferred. Your test might need a database to run. Or perhaps some factories to generate that data. Define these outside of your test.

The difference here is that they are meta concerns and not part of the business logic that you’re testing. Anything that is related to the setup of your environment or a workaround needed to run tests in a separate environment should live outside the tests.
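For instance, a factory can live in a support file because it only knows how to build a valid record, not what any particular test is trying to prove. The sketch below is a hand-rolled stand-in in plain Ruby, assuming a hypothetical Person model and Factories module; a real project would more likely lean on a factory library.

```ruby
# Hypothetical model, used only for illustration.
Person = Struct.new(:name, :email, :active, keyword_init: true)

# Defined outside the tests: a meta concern that produces valid records.
module Factories
  PERSON_DEFAULTS = {
    name: "Test Person",
    email: "person@example.com",
    active: true,
  }.freeze

  # Builds a person, letting the caller override any default attribute.
  def self.person(**overrides)
    Person.new(**PERSON_DEFAULTS.merge(overrides))
  end
end

# Inside a test, only the attribute that matters to the story is spelled out.
inactive_person = Factories.person(active: false)
puts inactive_person.active # false
puts inactive_person.name   # Test Person
```

The test still states the one detail it cares about (active: false), while the uninteresting defaults stay out of the way.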

Striking A Balance

Sometimes the setup needed is genuinely too many lines of code. In these cases, a middle ground would be to extract the smallest possible pieces into functions and call those from your test. An example:

describe ApiRequest do
  it "calls the api with arguments" do
    api_request = mock_web_request(
      url: "https://...",
      verb: :get,
      params: {...}

      # lots of lines
      # specifying params
      # and return values
    )

    expect(api_request).to have_received(...)
  end
end

This isn’t really a complex piece of setup, but it is long, and the mocked request itself can’t be as descriptive as we’d like. Consider this refactor:

describe ApiRequest do
  it "calls the api with arguments" do
    params = {...}
    api_request = user_data_api_call(params)

    expect(api_request).to have_received(...)
  end

  def user_data_api_call(params)
    mock_web_request(
      url: "https://...",
      verb: :get,
      params: params

      # lots of lines
      # specifying params
      # and return values
    )
  end
end

This user_data_api_call method can now keep your test more readable, without hiding the fact that there is some setup. It’s clear what that setup is and where to find it, as opposed to hiding it in a before block. These extracted methods can still make a test convoluted and hard to understand, so use them sparingly. But done well, these small, focused, extracted functions can improve the readability of the test. In this example they even give the mocked web request a descriptive name!

One thing to consider is that if the setup is genuinely overwhelming, it might be a sign that it’s overwhelming for your users too. It could also be a sign that your code is highly coupled and poorly factored. It’s worth keeping an eye on this as your system grows, but that’s a topic for another time.

Next time you find yourself writing the same setup over and over, think about it like you’re writing a story, a piece of prose for a future developer. That future developer might even be you, and they will thank you for it.
