don't be normal

Jared Carroll

Database normalization, the database world’s equivalent of DRY, can lead you to write more code than you really need.

For example, say we have the following classes:

class Group < ActiveRecord::Base
  belongs_to :group_type
end

class GroupType < ActiveRecord::Base
end

And their corresponding tables:

groups (id, name, group_type_id)
group_types (id, name)
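To make the cost of the normalized design concrete, here is a minimal plain-Ruby sketch that runs without Rails. The `GROUP_TYPES` and `GROUPS` arrays, the two Struct types and the `type_name_for` helper are hypothetical stand-ins for the two tables and for what ActiveRecord does behind `group.group_type.name`. The point it shows is that displaying a group's type means following `group_type_id` into a second table.

```ruby
# Hypothetical in-memory stand-ins for the two normalized tables.
GroupTypeRow = Struct.new(:id, :name)
GroupRow     = Struct.new(:id, :name, :group_type_id)

# The application is bootstrapped with exactly these 3 group types.
GROUP_TYPES = [
  GroupTypeRow.new(1, "Public"),
  GroupTypeRow.new(2, "Private"),
  GroupTypeRow.new(3, "Formal")
]

GROUPS = [
  GroupRow.new(1, "news", 1),
  GroupRow.new(2, "business", 2)
]

# Showing a group's type requires a second lookup by foreign key,
# the in-memory analogue of a join (or an extra query) against group_types.
def type_name_for(group)
  GROUP_TYPES.find { |type| type.id == group.group_type_id }.name
end

GROUPS.each { |group| puts "#{group.name}: #{type_name_for(group)}" }
```

In a real Rails app, `group.group_type` issues its own query against `group_types` unless the association is eager-loaded.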

In our application, there are 3 kinds of group types:

  • Public
  • Private
  • Formal

These types exist for display purposes: whenever we show a Group on our site, we display its GroupType’s name. These 3 group types are the only ones the client wants for now. The application is bootstrapped with them, and we offer no interface to CRUD them or to add any new ones. The Group class, on the other hand, is justified because the client wants to CRUD Groups.

This design results in a normalized database, because the group type name is not repeated for each group.

But what is the purpose of the GroupType class? It has no behavior, only state. What about the CRUD behavior it gets for free from ActiveRecord::Base? We don’t need it: remember, the client said these 3 group types are fine for now, and we didn’t add any way to CRUD GroupTypes in the application anyway. Creating classes based on state is not object-oriented; objects are about behavior. The GroupType class has complicated our design by bending our objects to fit our database schema. We should be thinking about objects and their behavior, not about database tables and normalization.

Let’s refactor and get rid of GroupType:

class Group < ActiveRecord::Base
end

And our tables:

groups (id, name, group_type)

That’s much better, and it more accurately reflects the client’s expectations. Remember, the client said they were fine with those 3 group types and don’t see changing them or adding any more in the near future. Therefore, we don’t need a class; those 3 group types can simply be a constant defined in the application, like so:

class Group < ActiveRecord::Base
  TYPES = %w(Public Private Formal)
end
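With the list living in code, the model itself can guard it. Below is a minimal sketch, assuming we want to reject unknown type names. So that it runs on its own, `Group` here is a plain Ruby class rather than an ActiveRecord model, and its `valid?` method is a hypothetical stand-in. In a Rails model, the same check would typically be written as `validates_inclusion_of :group_type, :in => TYPES`.

```ruby
# Plain-Ruby stand-in for the ActiveRecord model (hypothetical; no Rails needed).
class Group
  TYPES = %w(Public Private Formal).freeze

  attr_reader :name, :group_type

  def initialize(name, group_type)
    @name = name
    @group_type = group_type
  end

  # Accept only the type names listed in the constant, mirroring what
  # validates_inclusion_of :group_type, :in => TYPES does in Rails.
  def valid?
    TYPES.include?(group_type)
  end
end

puts Group.new("news", "Public").valid?  # prints true
puts Group.new("misc", "Secret").valid?  # prints false
```

The same constant could also feed the options of a select box on the Group form, so the list is defined in exactly one place.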

What do we end up with? A denormalized database. But who cares about a little duplication in the groups table:

groups (id, name, group_type)
(1, news, Public)
(2, business, Private)
(3, sports, Private)
(4, health, Public)
(5, weather, Formal)
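With the type name stored in each row, displaying it is a plain attribute read with no second lookup. A minimal sketch, using a hypothetical in-memory array (`DENORMALIZED_GROUPS`) in place of the groups table and the five rows listed above:

```ruby
# Hypothetical in-memory stand-in for the denormalized groups table.
GroupRecord = Struct.new(:id, :name, :group_type)

DENORMALIZED_GROUPS = [
  GroupRecord.new(1, "news", "Public"),
  GroupRecord.new(2, "business", "Private"),
  GroupRecord.new(3, "sports", "Private"),
  GroupRecord.new(4, "health", "Public"),
  GroupRecord.new(5, "weather", "Formal")
]

# The type name is read straight off the row; no group_types lookup is needed.
DENORMALIZED_GROUPS.each { |group| puts "#{group.name}: #{group.group_type}" }

# The repeated strings are harmless: counting groups per type still works.
counts = DENORMALIZED_GROUPS.group_by(&:group_type).transform_values(&:size)
counts.each { |type, n| puts "#{type}: #{n}" }  # Public: 2, Private: 2, Formal: 1
```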

The important point is that we didn’t bend our objects to fit our database schema. Designers should be thinking in objects, not about how those objects will ultimately be mapped to tables in a database.

One thing I haven’t mentioned is that the denormalized design should also perform better, because you no longer need to join to group_types to get a Group’s type. To me, though, denormalization is not about performance; it’s about design and building only what the client currently wants. Performance is a nice side effect, not the thing that drives me to denormalize an application.