Referential integrity is a relational database concept that states implied relationships among data should be enforced. Referential integrity ensures that the relationship between rows in two tables will remain synchronized during all updates and deletes.
Rails allows us to easily set up these implied relationships, but does nothing to help us enforce referential integrity. It’s very simple to accidentally or intentionally break referential integrity in most Rails applications.
An Example Scenario
Consider the following minimal set of models describing a blogging engine:
class User < ActiveRecord::Base
has_many :posts
validates :name, presence: true
end
class Post < ActiveRecord::Base
belongs_to :user
validates :user, presence: true
end
Our blogging platform has taken off, but we’ve received requests from some users to delete their accounts. We add an interface for administrators to delete users and everything works fine.
A few days later we receive a report that we’re getting 500s on our “Popular Posts” page. Looking into it, we find that we’re getting:
undefined method `name' for nil:NilClass
This is happening when we render the name of the user associated with each post.
Somehow we’ve got a post that has no associated user even though Post
has a
validation that requires a user
.
We quickly realize that we allowed administrators to delete users but never cleaned up the deleted users’ posts. We manually clean the data and make the following change to our model to prevent this in the future:
class User < ActiveRecord::Base
has_many :posts, dependent: :destroy
end
The addition of dependent: :destroy
means when a user is
destroyed their posts will be as well. Administrators can now delete users
without fear of orphaned posts causing problems.
Months pass and our now-venture-backed blogging engine has attracted millions of
users. Unfortunately, lots of those users are spammers. We’re told we’ll be
given a daily list of user_id
s corresponding to spammers and need to write a
job to delete them. We know this list could include thousands of ids on any
given day, so we write the following code to avoid instantiating those objects
and issuing thousands of queries to destroy them:
user_ids = CSV.read(csv_path).flatten
User.where(id: user_ids).delete_all
We soon receive a call telling us we’re getting 500s on the “Recent Posts” page.
You guessed it; we violated referential integrity once again and we’re seeing
the same NoMethodError
as before.
Why didn’t dependent: :destroy
save us here? Well, delete_all
doesn’t
instantiate the objects it is deleting and thus does not fire any
after_destroy
callbacks. The dependent
options work via that callback.
Add Foreign Key Constraints
Fool me once, shame on me. Fool me thrice and I gotta find a new job. We can’t let this happen again. Rails can’t be trusted to maintain referential integrity, but you know what’s really good at doing that? Our relational database.
We can add foreign key constraints at the database level and ensure that the database will reject any operation that would violate referential integrity. Until Rails 4.2 ships with native support for foreign keys, we’ll need to add the Foreigner gem in order to do this. We add Foreigner and run the following migration:
def change
add_foreign_key :posts, :users
end
This will run the following SQL if you’re using Postgres and Foreigner:
ALTER TABLE `posts`
ADD CONSTRAINT `posts_user_id_fk`
FOREIGN KEY (`user_id`) REFERENCES `users`(id);
With the foreign key in place, any operation that causes a post to point to a
non-existent user will fail. It’s important to realize that a user_id
of
NULL
is allowed, so we still need appropriate presence validations and NOT
NULL
constraints.
Now our nightly job is failing due to the foreign key constraint. The database is preventing us from deleting any users that still have associated posts. Does this mean we have to go back to the dreaded N+1 query scenario to destroy individual users?
Cascading Deletes
With a slight tweak to our foreign key, we can have the database, rather than ActiveRecord callbacks, handle the cascading deletes. Let’s change our foreign key just a bit:
def change
remove_foreign_key :posts, :users
add_foreign_key :posts, :users, dependent: :delete
# or in the upcoming native support in Rails 4.2
# add_foreign_key :posts, :users, on_delete: :cascade
end
This will run the following SQL when creating the foreign key:
ALTER TABLE `posts`
ADD CONSTRAINT `posts_user_id_fk`
FOREIGN KEY (`user_id`) REFERENCES `users`(id)
ON DELETE CASCADE;
With the dependent
option functionality now moved to our foreign key, the
database can now handle cleaning up the associated records. We no longer need to
rely on callbacks for this behavior, so let’s remove the option.
class User
# Old association:
# has_many :posts, dependent: :destroy
has_many :posts
end
Foreigner and the native support in Rails 4.2 both support options that cascade, nullify, and restrict changes. See the documentation for Foreigner and Rails 4.2.
Adding Foreign Key Constraints to an Existing Application
With immigrant, you can automatically generate a migration that will add any
foreign key constraints your application is missing. With immigrant
added to
your Gemfile
, run rails generate immigration add_foreign_keys
to
create the migration.
If you’re working with an application of any substantial size that has been running for some time, you are very likely to encounter errors applying this migration to your production data. Foreign key constraints cannot be applied if they are not valid for all current data.
I suggest downloading a copy of your production data and trying to run the migration on that data to surface any issues you will have at deployment time. Once the data is fixed and your migration applied in production the actions that were causing the invalid data will result in errors, which you can then target for fixes.
Caveats
Polymorphic associations are maintained by Rails; the database knows nothing about them. Foreign key constraints cannot help you here so you must keep logic in your Rails application to try to maintain referential integrity.
The Lesson
Foreign key constraints help us maintain valid data and are yet another way of
helping us to avoid unexpected nil
values in our applications. It’s
unrealistic to expect application logic alone to provide the same level of
protection.
Enforcing referential integrity is another job relational databases are better prepared to handle than Rails application code. Be a Juke Box Hero and check out Foreigner (or Rails 4.2) today.