---
title: How to Apply What You've Learned from TDD to Writing Data Migrations
teaser:
tags: web,ruby,testing
author: Joe Ferris
published_on: 2012-02-13
---

After doing <abbr title="Test Driven Development">TDD</abbr> full time for
years, I have a hard time writing code without a test. One example that I find
particularly difficult is writing data migrations.

Some schema changes require more than just setting a default value for all
existing rows. For example, let's say you have this schema:

```ruby
create_table :users do |table|
  table.string :email
  table.string :encrypted_password
end

create_table :posts do |table|
  table.integer :user_id
  table.boolean :published
  table.string :message
end
```

If you want to find the top ten users based on the number of published posts,
you can do a `JOIN` with a `COUNT` and a `GROUP BY` clause, but over time that
could get slow or just annoying, so you decide to add a cache column:

```ruby
add_column :users, :published_posts_count, :integer, :default => 0, :null => false
```

You add code to populate the column and all the tests pass, but of course
there's a big problem: every existing user in production will report zero
published posts. That means it's time to break out a data migration. Running
migrations over and over with different data or comparing lots of queries before
and after migrating production data is tedious and error-prone, so let's write a
throw-away test for this migration. You can save this as spec/migration_spec.rb:

```ruby
require 'spec_helper'
require Dir.glob(Rails.root.join("db", "migrate", "*_set_published_posts_for_existing_users.rb")).first

describe SetPublishedPostsForExistingUsers do
  it "counts only published posts" do
    user = FactoryBot.create(:user)
    FactoryBot.create_list(:post, 3, :user => user, :published => true)
    reset_cache_and_run_migration
    user.reload.published_posts_count.should == 3
  end

  def reset_cache_and_run_migration
    User.update_all("published_posts = 0")
    SetPublishedPostsForExistingUsers.new.up
  end
end
```

Add an empty migration, and the test fails because the user is reporting no
published posts. We can get this test passing with a simple migration:

```ruby
class SetActivatedFlagForExistingUsers < ActiveRecord::Migration
  def up
    connection.update(<<-SQL)
      UPDATE users
      SET published_posts_count = (
        SELECT COUNT(*) FROM posts
      )
    SQL
  end

  def down
    # No problem
  end
end
```

Next up, we need to make sure it's only counting published posts:

```ruby
it "counts only published posts" do
  user = FactoryBot.create(:user)
  FactoryBot.create_list(:post, 3, :user => user, :published => true)
  FactoryBot.create(:post, :user => user, :published => false)
  reset_cache_and_run_migration
  user.reload.published_posts_count.should == 3
end
```

That will fail because the migration counts the published posts, ending up with
a total of four. We can fix that easily:

```ruby
def up
  connection.update(<<-SQL)
    UPDATE users
    SET published_posts_count = (
      SELECT COUNT(*) FROM posts
      WHERE posts.published = true
    )
  SQL
end
```

Next up, we need to make sure each user only counts their own posts, so we add a
post for a different user:

```ruby
it "counts only published posts" do
  user = FactoryBot.create(:user)
  FactoryBot.create_list(:post, 3, :user => user, :published => true)
  FactoryBot.create(:post, :user => user, :published => false)
  other_user = FactoryBot.create(:user)
  FactoryBot.create(:post, :user => other_user, :published => true)
  reset_cache_and_run_migration
  user.reload.published_posts_count.should == 3
end
```

The test fails again with a count of four, since it picked up the other user's
post. Getting this test to pass leads to our final migration:

```ruby
def up
  connection.update(<<-SQL)
    UPDATE users
    SET published_posts_count = (
      SELECT COUNT(*) FROM posts
      WHERE posts.published = true
      AND posts.user_id = users.id
    )
  SQL
end
```

At this point, I just delete the spec. Since [migrations should never be edited
after they run](http://guides.rubyonrails.org/migrations.html#changing-migrations),
there's little reason to test for regressions. Inevitably the schema will change,
which will mean the spec no longer applies.

The spec provides no value after the migration is committed, but I still find
writing specs like these well worth the time. It's easier for me to think like
I'm used to, by writing tests first, and it makes me confident that the
migration actually covers the cases it's supposed to.
