Referential integrity is a relational database concept that states implied relationships among data should be enforced. Referential integrity ensures that the relationship between rows in two tables will remain synchronized during all updates and deletes.
Rails allows us to easily set up these implied relationships, but does nothing to help us enforce referential integrity. It’s very simple to accidentally or intentionally break referential integrity in most Rails applications.
Consider the following minimal set of models describing a blogging engine:
class User < ActiveRecord::Base has_many :posts validates :name, presence: true end class Post < ActiveRecord::Base belongs_to :user validates :user, presence: true end
Our blogging platform has taken off, but we’ve received requests from some users to delete their accounts. We add an interface for administrators to delete users and everything works fine.
A few days later we receive a report that we’re getting 500s on our “Popular Posts” page. Looking into it, we find that we’re getting:
undefined method `name' for nil:NilClass
This is happening when we render the name of the user associated with each post.
Somehow we’ve got a post that has no associated user even though
Post has a
validation that requires a
We quickly realize that we allowed administrators to delete users but never cleaned up the deleted users’ posts. We manually clean the data and make the following change to our model to prevent this in the future:
class User < ActiveRecord::Base has_many :posts, dependent: :destroy end
The addition of
dependent: :destroy means when a user is
destroyed their posts will be as well. Administrators can now delete users
without fear of orphaned posts causing problems.
Months pass and our now-venture-backed blogging engine has attracted millions of
users. Unfortunately, lots of those users are spammers. We’re told we’ll be
given a daily list of
user_ids corresponding to spammers and need to write a
job to delete them. We know this list could include thousands of ids on any
given day, so we write the following code to avoid instantiating those objects
and issuing thousands of queries to destroy them:
user_ids = CSV.read(csv_path).flatten User.where(id: user_ids).delete_all
We soon receive a call telling us we’re getting 500s on the “Recent Posts” page.
You guessed it; we violated referential integrity once again and we’re seeing
NoMethodError as before.
dependent: :destroy save us here? Well,
instantiate the objects it is deleting and thus does not fire any
after_destroy callbacks. The
dependent options work via that callback.
Fool me once, shame on me. Fool me thrice and I gotta find a new job. We can’t let this happen again. Rails can’t be trusted to maintain referential integrity, but you know what’s really good at doing that? Our relational database.
We can add foreign key constraints at the database level and ensure that the database will reject any operation that would violate referential integrity. Until Rails 4.2 ships with native support for foreign keys, we’ll need to add the Foreigner gem in order to do this. We add Foreigner and run the following migration:
def change add_foreign_key :posts, :users end
This will run the following SQL if you’re using Postgres and Foreigner:
ALTER TABLE `posts` ADD CONSTRAINT `posts_user_id_fk` FOREIGN KEY (`user_id`) REFERENCES `users`(id);
With the foreign key in place, any operation that causes a post to point to a
non-existent user will fail. It’s important to realize that a
NULL is allowed, so we still need appropriate presence validations and
Now our nightly job is failing due to the foreign key constraint. The database is preventing us from deleting any users that still have associated posts. Does this mean we have to go back to the dreaded N+1 query scenario to destroy individual users?
With a slight tweak to our foreign key, we can have the database, rather than ActiveRecord callbacks, handle the cascading deletes. Let’s change our foreign key just a bit:
def change remove_foreign_key :posts, :users add_foreign_key :posts, :users, dependent: :delete # or in the upcoming native support in Rails 4.2 # add_foreign_key :posts, :users, on_delete: :cascade end
This will run the following SQL when creating the foreign key:
ALTER TABLE `posts` ADD CONSTRAINT `posts_user_id_fk` FOREIGN KEY (`user_id`) REFERENCES `users`(id) ON DELETE CASCADE;
dependent option functionality now moved to our foreign key, the
database can now handle cleaning up the associated records. We no longer need to
rely on callbacks for this behavior, so let’s remove the option.
class User # Old association: # has_many :posts, dependent: :destroy has_many :posts end
With immigrant, you can automatically generate a migration that will add any
foreign key constraints your application is missing. With
immigrant added to
rails generate immigration add_foreign_keys to
create the migration.
If you’re working with an application of any substantial size that has been running for some time, you are very likely to encounter errors applying this migration to your production data. Foreign key constraints cannot be applied if they are not valid for all current data.
I suggest downloading a copy of your production data and trying to run the migration on that data to surface any issues you will have at deployment time. Once the data is fixed and your migration applied in production the actions that were causing the invalid data will result in errors, which you can then target for fixes.
Polymorphic associations are maintained by Rails; the database knows nothing about them. Foreign key constraints cannot help you here so you must keep logic in your Rails application to try to maintain referential integrity.
Foreign key constraints help us maintain valid data and are yet another way of
helping us to avoid unexpected
nil values in our applications. It’s
unrealistic to expect application logic alone to provide the same level of