Models That Match Reality

Data modeling is a big part of software design and good models should accurately approximate reality. Ideas from modeling in other disciplines, sets, and an aphorism from the Elm community can help us design better models.

Inspired by a discussion on narrowing types from the Elm discourse.

What is a model?

When we think of the word “model” in software, we often think of the M in model-view-controller (MVC). If you’re from the Elm community, you might think of the Model type that stores an application’s state in the Elm Architecture (TEA).

More broadly, models are simplified constructions that try to approximate and describe reality. A model might be physical, such as a model train or a scale model of a city. It can also be logical, such as a weather model which is a bunch of mathematical equations that try to describe weather systems.

Photo of a scale model of an 18th century city. In the foreground are
    some docks along a river and the lower town containing many row-houses.
    Towering above the lower town on top of a large bluff is the upper town,
    surrounded by stone walls. Two large churches feature prominently. In the
    background, farmland can be seen stretching away to the horizon.
Duberger scale model of Québec City, created in 1806 for the military authorities. Jeangagnon, CC BY-SA 3.0 via Wikimedia Commons.

Models in software

What about our models in the software world? They also try to approximate reality. For example a Registration might try to describe some aspects of a business process in a vacation booking application.

Real-life customers and processes can be infinitely complex and we can’t capture all of that in our modeling. Instead, we approximate and focus on the characteristics that matter in the context of this particular application.

Model accuracy

Just because all models are approximations doesn’t mean they can’t be accurate.

I find it helpful to think of reality and our model as sets. If we had a perfect model, the set of values our model can describe and the set of values in reality would be identical. If we drew this as a Venn diagram, the two circles would perfectly overlay each other.

Venn diagram with a green circle labeled 'reality' perfectly overlapped by a blue octagon labeled 'our model'
A model that accurately describes reality

In practice, our models often exclude important details, or more commonly, include values that are not part of the domain we are trying to describe. As programmers, we often label these as “invalid values”.

Venn diagram with a green circle labeled 'reality', partly overlapped by a blue octagon labeled 'our model'. The part of the octogon that does not overlap the circle is labeled 'invalid values'.
An inaccurate model that describes a lot of values not part of reality

Modeling flights

For example, as part of modeling a registration process we might ask a customer if they want a one-way or round-trip flight. We might describe this process with two booleans. I’m going to use Elm types to show concrete modeling examples below but the concept applies regardless of your your modeling language.

type alias Flight =
  { roundTrip : Bool
  , oneWay : Bool
  }

This model is too broad. Reality says there are only 2 kinds of flights: one-way or round-trip and that these two choices are mutually exclusive. However, our model describes 4 different kinds of flights including 2 that are nonsensical (both round-trip and one-way, and neither round-trip nor one-way).

In set terms, we would say that the “reality” set has a cardinality of 2, while the “model” set has a cardinality of 4. Remember, we’re trying to get both sets to be identical so this tells us that our model is too permissive. If our program gets into one of these states, we will likely get garbage output.

We can try and create a different model that better describes the reality of flight choices. The following can be read as “a Flight is OneWay OR RoundTrip

type Flight = OneWay | RoundTrip

Now, both our “reality” and “model” sets have a cardinality of 2 (and both contain the same 2 values). Our model is much more accurate.

Not just types

While the examples in the section above used Elm types, these modeling ideas are not restricted to typed languages. We could do something similar with a Ruby on Rails model as well:

class Flight < ApplicationRecord
  enum direction: [:one_way, :round_trip]
end

Making impossible states impossible

Elm programmers often refer to this kind of modeling improvement as “making impossible states impossible”. You’ll often hear this brought up in discussions about what types to use, however the big idea behind it is much broader: align your modeling with the reality you are trying to describe.

More on data modeling

Want to learn more about data modeling? Here are some helpful resources from our blog and around the internet.