After doing TDD full time for years, I have a hard time writing code without a test. One example that I find particularly difficult is writing data migrations.
Some schema changes require more than just setting a default value for all existing rows. For example, let’s say you have this schema:
create_table :users do |table|
table.string :email
table.string :encrypted_password
end
create_table :posts do |table|
table.integer :user_id
table.boolean :published
table.string :message
end
If you want to find the top ten users based on the number of published posts,
you can do a JOIN
with a COUNT
and a GROUP BY
clause, but over time that
could get slow or just annoying, so you decide to add a cache column:
add_column :users, :published_posts_count, :integer, :default => 0, :null => false
You add code to populate the column and all the tests pass, but of course there’s a big problem: every existing user in production will report zero published posts. That means it’s time to break out a data migration. Running migrations over and over with different data or comparing lots of queries before and after migrating production data is tedious and error-prone, so let’s write a throw-away test for this migration. You can save this as spec/migration_spec.rb:
require 'spec_helper'
require Dir.glob(Rails.root.join("db", "migrate", "*_set_published_posts_for_existing_users.rb")).first
describe SetPublishedPostsForExistingUsers do
it "counts only published posts" do
user = FactoryBot.create(:user)
FactoryBot.create_list(:post, 3, :user => user, :published => true)
reset_cache_and_run_migration
user.reload.published_posts_count.should == 3
end
def reset_cache_and_run_migration
User.update_all("published_posts = 0")
SetPublishedPostsForExistingUsers.new.up
end
end
Add an empty migration, and the test fails because the user is reporting no published posts. We can get this test passing with a simple migration:
class SetActivatedFlagForExistingUsers < ActiveRecord::Migration
def up
connection.update(<<-SQL)
UPDATE users
SET published_posts_count = (
SELECT COUNT(*) FROM posts
)
SQL
end
def down
# No problem
end
end
Next up, we need to make sure it’s only counting published posts:
it "counts only published posts" do
user = FactoryBot.create(:user)
FactoryBot.create_list(:post, 3, :user => user, :published => true)
FactoryBot.create(:post, :user => user, :published => false)
reset_cache_and_run_migration
user.reload.published_posts_count.should == 3
end
That will fail because the migration counts the published posts, ending up with a total of four. We can fix that easily:
def up
connection.update(<<-SQL)
UPDATE users
SET published_posts_count = (
SELECT COUNT(*) FROM posts
WHERE posts.published = true
)
SQL
end
Next up, we need to make sure each user only counts their own posts, so we add a post for a different user:
it "counts only published posts" do
user = FactoryBot.create(:user)
FactoryBot.create_list(:post, 3, :user => user, :published => true)
FactoryBot.create(:post, :user => user, :published => false)
other_user = FactoryBot.create(:user)
FactoryBot.create(:post, :user => other_user, :published => true)
reset_cache_and_run_migration
user.reload.published_posts_count.should == 3
end
The test fails again with a count of four, since it picked up the other user’s post. Getting this test to pass leads to our final migration:
def up
connection.update(<<-SQL)
UPDATE users
SET published_posts_count = (
SELECT COUNT(*) FROM posts
WHERE posts.published = true
AND posts.user_id = users.id
)
SQL
end
At this point, I just delete the spec. Since migrations should never be edited after they run, there’s little reason to test for regressions. Inevitably the schema will change, which will mean the spec no longer applies.
The spec provides no value after the migration is committed, but I still find writing specs like these well worth the time. It’s easier for me to think like I’m used to, by writing tests first, and it makes me confident that the migration actually covers the cases it’s supposed to.