Database normalization, the database world’s equivalent of DRY, can lead you to write more code than you really need.
For example, say we have the following classes:
class Group < ActiveRecord::Base
  belongs_to :group_type
end
class GroupType < ActiveRecord::Base
end
And their corresponding tables:
groups (id, name, group_type_id)
group_types (id, name)
In our application, you can have 3 kinds of group types:
- Public
- Private
- Formal
These are for display purposes. Whenever we show a Group on our site, we display its GroupType’s name. These 3 group types are the only ones the client wants for now. The application is bootstrapped with them, and we offer no interface to CRUD them or any additional group types. The Group class is justified because the client wants to CRUD Groups.
This design results in a normalized database, because the group type name is not repeated for each group.
But what is the purpose of the GroupType class? It has no behavior, only state. What about the CRUD behavior it gets for free from ActiveRecord::Base? We don’t need it. Remember, the client said these 3 group types are fine for now, and we didn’t add any way to CRUD GroupTypes in the application anyway. Creating classes based on state is not object-oriented; objects are about behavior. The GroupType class has complicated our design by bending our objects to fit our database schema. We should be thinking about objects and their behavior, not about database tables and normalization.
Let’s refactor and get rid of GroupType:
class Group < ActiveRecord::Base
end
And our tables:
groups (id, name, group_type)
That’s much better and more accurately reflects the client’s expectations. Remember, the client said they were fine with those 3 group types, and they don’t see changing them or adding any additional group types in the near future. Therefore, we don’t need a class and those 3 group types should just be a constant defined in the application like so:
class Group < ActiveRecord::Base
  TYPES = %w(Public Private Formal)
end
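To make the constant a little more concrete, here is a minimal sketch of how it could guard the group_type column. It uses a plain Ruby class as a stand-in for the ActiveRecord model so that it runs without Rails, and the valid? method is a hypothetical helper, not something from the original application.

```ruby
# Plain-Ruby stand-in for the ActiveRecord model (an assumption made so the
# sketch runs on its own, without Rails or a database).
class Group
  TYPES = %w(Public Private Formal).freeze

  attr_reader :name, :group_type

  def initialize(name, group_type)
    @name = name
    @group_type = group_type
  end

  # Hypothetical check. In a real Rails model the same rule would typically
  # be declared as: validates :group_type, inclusion: { in: TYPES }
  def valid?
    TYPES.include?(group_type)
  end
end

puts Group.new("news", "Public").valid?  # prints true
puts Group.new("misc", "Secret").valid?  # prints false
```

The list of allowed types lives in one place in the code, which is all the client has asked for so far.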
What do we end up with? A denormalized database. But who cares if there’s duplication in the groups table:
groups (id, name, group_type)
(1, news, Public)
(2, business, Private)
(3, sports, Private)
(4, health, Public)
(5, weather, Formal)
The important point is that we didn’t bend our objects to our database schema. Designers should be thinking in objects, not about how those objects will ultimately be mapped to tables in a database.
One thing I did not mention is that the denormalized design will perform better than the normalized one, because you no longer need to join to group_types to get a Group’s GroupType. To me, denormalization is not about performance; it’s about design and only building what the client currently wants. Performance is just a nice side effect, not one that drives me to denormalize an application.
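As a rough illustration of the extra lookup the normalized design needs, here is a small in-memory sketch. It models rows as hashes rather than touching a real database; the group type ids (1, 2, 3) and the two label helpers are assumptions made for the example.

```ruby
# Normalized: each group row stores a foreign key into group_types.
# The ids 1..3 are assumed for illustration only.
GROUP_TYPES = { 1 => "Public", 2 => "Private", 3 => "Formal" }.freeze
NORMALIZED_GROUPS = [
  { id: 1, name: "news",     group_type_id: 1 },
  { id: 2, name: "business", group_type_id: 2 }
].freeze

# Denormalized: each group row carries the type name directly.
DENORMALIZED_GROUPS = [
  { id: 1, name: "news",     group_type: "Public" },
  { id: 2, name: "business", group_type: "Private" }
].freeze

# Hypothetical helper: needs a second lookup, the in-memory analogue of a join.
def normalized_label(group, group_types)
  "#{group[:name]} (#{group_types.fetch(group[:group_type_id])})"
end

# Hypothetical helper: reads everything it needs from the one row.
def denormalized_label(group)
  "#{group[:name]} (#{group[:group_type]})"
end

puts normalized_label(NORMALIZED_GROUPS[0], GROUP_TYPES)  # prints news (Public)
puts denormalized_label(DENORMALIZED_GROUPS[0])           # prints news (Public)
```

Both versions display the same thing; the denormalized one simply has one less structure to consult.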