Too often, I use structs and maps exclusively to model domains in Elixir. You might do so too. I think the habit comes from modeling domains in object-oriented languages and from having a one-to-one mapping between structs and database records. But lately, I have found sum types to be a powerful domain modeling technique that can help rid projects of bugs caused by invalid states.
Let’s look at an example.
The problem
Some errors in our applications are caused by invalid state – state we thought impossible in our application but which is nevertheless present when we find a bug.
Suppose we are modeling a chess game:
defmodule Game do
defstruct [:status, :players, :winning_player]
@type status :: :not_started | :in_progress | :finished
@type t :: %__MODULE__{
status: status(),
players: {Player.t(), Player.t()} | nil,
winning_player: Player.t() | nil
}
end
We have a Game
struct with three fields:
The
status
field has three valid values::not_started
,:in_progress
, and:finished
.We have a tuple representing the two players, along with the possibility of
nil
if the game has not started.And we have a field for the
winning_player
when the game finishes, which will benil
otherwise.
Elsewhere, we have a function that determines the message to be displayed at the top of a player’s screen:
def status_message(game) do
case game.status do
:not_started ->
"Waiting for players to join..."
:in_progress ->
{player1, player2} = game.players
"Game on: #{player1.username} vs #{player2.username}."
:finished ->
"Player #{game.winning_player.username} wins!"
end
end
One day we receive a bug report saying that when two users finished the game,
they got a 500 error. Looking at our error tracking, we see the exception **
(UndefinedFunctionError) function nil.username/0 is undefined.
Looking at the
game state, we find this:
%Game{
status: :finished,
players: {%Player{username: "gandalf"}, %Player{username: "aragorn"}},
winning_player: nil
}
“But that’s impossible!” we say. Somehow the game is :finished
without a
winning_player
.
Another day our error tracker alerts us to an exception: ** (MatchError) no
match of right hand side value: nil
. Then we get a report saying that two
players cannot start a game. Looking at the game state, we find the following:
%Game{
status: :in_progress,
players: nil,
winning_player: nil
}
Somehow the game is :in_progress
, but the players are not assigned. So the
code {player1, player2} = game.players
in the status_message/1
function is
throwing a match error.
What can be done? Should we pattern match on more fields?
def status_message(game) do
case {game.status, game.players, game.winning_player} do
{:not_started, _, _} ->
"Waiting for players to join..."
{:in_progress, players, _} when not is_nil(players) ->
{player1, player2} = players
"Game on: #{player1.username} vs #{player2.username}."
{:finished, _, winning_player} when not is_nil(winning_player) ->
"Player #{winning_player.username} wins!"
_ ->
""
end
end
No. There is too much defensive programming, and we have a strange catch-all clause at the end that returns an empty message. Those seem like code smells. After all, we know that a finished game should have a winning player. And we know that a game in progress should have two players. That is the business logic of our chess game. The states we’re seeing are invalid, so let’s represent them differently.
Using sum types
Restructuring :finished
Let us first restructure the case when the game is finished. We know a winning
player is only present if the game finishes, so let’s enrich the status
to use
a tagged tuple for the finished case. The tag :finished
will continue to mark
the finished status, but the second element of the tuple will now hold the
winning player struct:
@type status :: :not_started | :in_progress | {:finished, Player.t()}
Now let’s remove the winning_player
from the Game
struct:
defmodule Game do
defstruct [:status, :players]
@type status :: :not_started | :in_progress | {:finished, Player.t()}
@type t :: %__MODULE__{
status: status(),
players: {Player.t(), Player.t()} | nil
}
end
And we can improve our status_message/1
function:
def status_message(game) do
case game.status do
:not_started ->
"Waiting for players to join..."
:in_progress ->
{player1, player2} = game.players
"Game on: #{player1.username} vs #{player2.username}."
- :finished ->
- "Player #{game.winning_player.username} wins!"
+ {:finished, winning_player} ->
+ "Player #{winning_player.username} wins!"
end
end
Restructuring :in_progress
Now let’s turn :in_progress
into another tagged tuple, where the second
element holds the two player structs:
@type players :: {Player.t(), Player.t()}
@type status :: :not_started | {:in_progress, players()} | {:finished, Player.t()}
We can now remove the players
field from the Game
struct:
defmodule Game do
defstruct [:status]
@type players :: {Player.t(), Player.t()}
@type status :: :not_started | {:in_progress, players()} | {:finished, Player.t()}
@type t :: %__MODULE__{status: status()}
end
Finally, let’s improve our status_message/1
function again:
def status_message(game) do
case game.status do
:not_started ->
"Waiting for players to join..."
- :in_progress ->
- {player1, player2} = game.players
+ {:in_progress, {player1, player2}} ->
"Game on: #{player1.username} vs #{player2.username}."
{:finished, winning_player} ->
"Player #{winning_player.username} wins!"
end
end
By making our status
field a sum type, we have modeled our domain more
accurately and removed several invalid states that were causing bugs. Take a
look at our final status_message/1
function:
def status_message(game) do
case game.status do
:not_started ->
"Waiting for players to join..."
{:in_progress, {player1, player2}} ->
"Game on: #{player1.username} vs #{player2.username}."
{:finished, winning_player} ->
"Player #{winning_player.username} wins!"
end
end
Theoretical explanation
Both product types and sum types are called algebraic data types – structs falling under the umbrella of product types. So the set of all possible values for a struct is the cartesian product of the values of its fields.
In practice that means that a struct could contain any combination of each of
the possible values of each of its fields. So in order to enumerate all
the possible values of our original Game
struct, we have to look at all
possible combinations of its fields:
# not started
%Game{status: :not_started, players: nil, winning_player: nil}
%Game{status: :not_started, players: {player1, player2}, winning_player: nil}
%Game{status: :not_started, players: {player1, player2}, winning_player: player}
%Game{status: :not_started, players: nil, winning_player: player1}
# in progress
%Game{status: :in_progress, players: nil, winning_player: nil}
%Game{status: :in_progress, players: {player1, player2}, winning_player: nil}
%Game{status: :in_progress, players: {player1, player2}, winning_player: player}
%Game{status: :in_progress, players: nil, winning_player: player}
# finished
%Game{status: :finished, players: nil, winning_player: nil}
%Game{status: :finished, players: {player1, player2}, winning_player: nil}
%Game{status: :finished, players: {player1, player2}, winning_player: player}
%Game{status: :finished, players: nil, winning_player: player}
Sum types are also algebraic data types. But the set of all possible values of a
sum type is the disjoint union of all of its variants. In practice that means
that our final Game
struct, with its restructured status
, could only be one
of the following:
%Game{status: :not_started}
# OR
%Game{status: {:in_progress, {player1, player2}}
# OR
%Game{status: {:finished, winning_player}}
Using a sum type to model our domain reduced the number of potential (and invalid) states considerably!
Parting thoughts
If you find the idea that domain modeling can remove certain invalid states interesting, you might enjoy seeing the same concept applied in other languages. Richard Feldman gave a great talk about it in Elm. Scott Wlaschin showed the concept in F#. And Yaron Minsky, who coined the term “make illegal states unrepresentable”, explained the concept in OCaml.