---
title: Shaping Values with Types
teaser: 'Elm types like `String` can represent an infinite number of possible values.
  Let''s use types to reduce that number and better declare intent.

  '
tags: elm,types
author: Josh Clayton
published_on: 2018-04-13
---

On a client project recently, I was putting fake data together as a first pass
for seeding a UI in Elm.

The domain model looked fairly straightforward:

```elm
module Data.Employee
    exposing
        ( Employee
        , Name(..)
        , EmployeeId(..)
        )


type alias Employee =
    { id : EmployeeId
    , fullName : Name
    }


type EmployeeId
    = EmployeeId String


type Name
    = Name String
```

In my module to generate fake data, I defined an `employee` value:

```elm
employee : Employee
employee =
    { id = EmployeeId "A-1234-jane-doe"
    , fullName = Name "Jane Doe"
    }
```

With this (and other data) wrapped up, I submitted a pull request to gather
feedback. Another developer commented that the value of the `EmployeeId`
didn't reflect reality (employee IDs are four- or five-digit codes, which may
include leading zeroes).

## Types without Reality

While wrapping the underlying `String` in an `EmployeeId` [prevents argument
order bugs] (e.g. a signature `String -> String -> Employee` allows calling
both `Employee "id" "name"` and `Employee "name" "id"`), it doesn't reflect
real-world usage and possible values.

[prevents argument order bugs]: https://thoughtbot.com/blog/lessons-learned-avoiding-primitives-in-elm

While there are only 110,000 valid four- and five-digit employee IDs, our data
model for an employee ID uses the underlying type of `String`, which can
represent an infinite number of values. Our data model does not reflect
reality. By reducing the number of possible values captured in a type, it's
less likely that an incorrect value sneaks in.
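
The 110,000 figure comes from counting both lengths: ten choices for each of
four digits, plus ten choices for each of five. As a quick sanity check (the
name `validEmployeeIdCount` is made up for illustration):

```elm
-- Hypothetical helper, only here to check the arithmetic.
validEmployeeIdCount : Int
validEmployeeIdCount =
    -- 10,000 four-digit IDs plus 100,000 five-digit IDs = 110,000
    10 ^ 4 + 10 ^ 5
```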

## Dissecting Types and Value Surface Area

### `String`

The `String` type (independent of any memory or storage limitations) can hold
any sequence of characters of any length, so it has an infinite number of
possible values. Values like the one I submitted in my pull request
(`A-1234-jane-doe`) have a type of `String`, but the type is too permissive.

For example, `"12357"` is valid, but `"made-up-id"`, `""`, and `"-----"` are
not. All have the type `String`.

### `List Int`

A `List Int` type better describes that we expect a list of numbers, but the
list can be arbitrarily long, so the number of possible values is still
infinite.

For example, `[1, 2, 3, 4]` is correct, but `[1, 2, 3, 4, 5, 6, 7, 8, 9]` is
not. Both have the type `List Int`.

### `(Int, Int, Int, Int)` and `(Int, Int, Int, Int, Int)`

This is closer to what we'd actually expect: the tuple sizes put an explicit
limit on the number of digits. However, valid `Int`s include negative numbers
and numbers greater than 9.

For example, `(1, 2, 3, 4)` is correct, but `(-100, 15, 2, 295001)` is
not. Both have the type `(Int, Int, Int, Int)`.
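
To make the comparison concrete, here is a sketch in which every annotation
type-checks, even though the second value in each pair could never be a real
employee ID. The names are invented for illustration, and the four-element
tuples assume Elm 0.18 (which the `toString` call later in this post
suggests); newer Elm versions only allow tuples of two or three elements.

```elm
plausibleString : String
plausibleString =
    "12357"


implausibleString : String
implausibleString =
    "made-up-id"


plausibleList : List Int
plausibleList =
    [ 1, 2, 3, 4 ]


implausibleList : List Int
implausibleList =
    [ 1, 2, 3, 4, 5, 6, 7, 8, 9 ]


plausibleTuple : ( Int, Int, Int, Int )
plausibleTuple =
    ( 1, 2, 3, 4 )


implausibleTuple : ( Int, Int, Int, Int )
implausibleTuple =
    ( -100, 15, 2, 295001 )
```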

## Constructor Validation

Let's take a quick tangent and discuss ways to guarantee correct values even
with less-than-ideal types.

Recall the `EmployeeId` type:

```elm
type EmployeeId
    = EmployeeId String
```

Instead of exposing the `EmployeeId` data constructor (the function of type
`String -> EmployeeId`), we can define a function to build an employee ID that
might fail:

```elm
parseEmployeeId : String -> Result String EmployeeId
parseEmployeeId value =
    case intResults value of
        [ Ok d1, Ok d2, Ok d3, Ok d4 ] ->
            Ok <| buildEmployeeIdFromSafeInts [ d1, d2, d3, d4 ]

        [ Ok d1, Ok d2, Ok d3, Ok d4, Ok d5 ] ->
            Ok <| buildEmployeeIdFromSafeInts [ d1, d2, d3, d4, d5 ]

        _ ->
            Err "Employee ID is not in the correct format"


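-- Split the input into single-character strings and try to read each as an Int.
intResults : String -> List (Result String Int)
intResults =
    List.map String.toInt << String.split ""


-- Rebuild the ID string from digits that have already been validated.
buildEmployeeIdFromSafeInts : List Int -> EmployeeId
buildEmployeeIdFromSafeInts =
    EmployeeId << String.concat << List.map toString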
```

With some [property testing], we could achieve a high level of confidence that
this function protects the system from invalid data; coupled with the fact that
we don't expose `EmployeeId : String -> EmployeeId`, we're all but guaranteed
that the system won't be fed bad data.

[property testing]: http://package.elm-lang.org/packages/elm-community/elm-test/4.2.0/Fuzz
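
Hiding the constructor happens in the module's `exposing` list: listing
`EmployeeId` without `(..)` exports the type but not its data constructor. A
minimal sketch of that boundary follows; the module name `Data.EmployeeId`, the
function names, and the simplified digit check are assumptions for
illustration, not the project's actual code.

```elm
module Data.EmployeeId exposing (EmployeeId, fromString, toRawString)

import Char


-- The type is exported, but its constructor is not: `EmployeeId` appears in
-- the exposing list without `(..)`.
type EmployeeId
    = EmployeeId String


-- The only way for other modules to obtain an EmployeeId.
fromString : String -> Result String EmployeeId
fromString value =
    if isFourOrFiveDigits value then
        Ok (EmployeeId value)

    else
        Err "Employee ID is not in the correct format"


-- Simplified stand-in for the digit-by-digit check described above.
isFourOrFiveDigits : String -> Bool
isFourOrFiveDigits value =
    String.all Char.isDigit value
        && List.member (String.length value) [ 4, 5 ]


-- Read-only access to the validated value, e.g. for rendering.
toRawString : EmployeeId -> String
toRawString (EmployeeId value) =
    value
```

Outside this module, `EmployeeId "A-1234-jane-doe"` no longer compiles because
the constructor isn't in scope; callers have to go through `fromString` and
handle the `Err` case.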

This safety is provided at *runtime* instead of *compile-time*, however; the
underlying data (the value of type `String`) will fulfill the business
requirements but doesn't help clarify what those requirements are. From a
communication perspective, readers of our code can't understand the business
requirements behind an `EmployeeId` only by reading the type because it's still
wrapping a `String`.

## A Long-Winded (and Theoretically "Correct") Type

How can we model `EmployeeId` to reflect reality?

```elm
type Digit
    = D0
    | D1
    | D2
    | D3
    | D4
    | D5
    | D6
    | D7
    | D8
    | D9


type EmployeeId
    = FourDigitEmployeeId Digit Digit Digit Digit
    | FiveDigitEmployeeId Digit Digit Digit Digit Digit
```

This greatly reduces the number of values that can represent an employee ID
(now 110,000, where previous types like `String` and `List Int` allowed
infinitely many). More importantly, the type enforces that the value
represented is valid.

With a couple of boilerplate functions:

```elm
digitFromChar : Char -> Result String Digit
digitFromChar char =
    case char of
        '0' ->
            Ok D0

        '1' ->
            Ok D1

        '2' ->
            Ok D2

        '3' ->
            Ok D3

        '4' ->
            Ok D4

        '5' ->
            Ok D5

        '6' ->
            Ok D6

        '7' ->
            Ok D7

        '8' ->
            Ok D8

        '9' ->
            Ok D9

        v ->
            Err <| String.fromChar v


parseDigitsFromString : String -> List (Result String Digit)
parseDigitsFromString =
    List.map digitFromChar << String.toList
```

We can now build out the same constructor function to parse values and
generate correct `EmployeeId`s:

```elm
parseEmployeeId : String -> Result String EmployeeId
parseEmployeeId value =
    case parseDigitsFromString value of
        [ Ok d1, Ok d2, Ok d3, Ok d4 ] ->
            Ok <| FourDigitEmployeeId d1 d2 d3 d4

        [ Ok d1, Ok d2, Ok d3, Ok d4, Ok d5 ] ->
            Ok <| FiveDigitEmployeeId d1 d2 d3 d4 d5

        _ ->
            Err "Employee ID is not in the correct format"
```
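
For rendering or encoding, the structured value usually has to be turned back
into a `String` at some point. One possible sketch of that direction follows,
with the types repeated so the snippet stands alone; `digitToChar` and
`employeeIdToString` are hypothetical names, not part of the original module.

```elm
type Digit
    = D0
    | D1
    | D2
    | D3
    | D4
    | D5
    | D6
    | D7
    | D8
    | D9


type EmployeeId
    = FourDigitEmployeeId Digit Digit Digit Digit
    | FiveDigitEmployeeId Digit Digit Digit Digit Digit


-- Total function: every Digit maps to exactly one character, so no Result
-- is needed in this direction.
digitToChar : Digit -> Char
digitToChar digit =
    case digit of
        D0 ->
            '0'

        D1 ->
            '1'

        D2 ->
            '2'

        D3 ->
            '3'

        D4 ->
            '4'

        D5 ->
            '5'

        D6 ->
            '6'

        D7 ->
            '7'

        D8 ->
            '8'

        D9 ->
            '9'


-- Leading zeroes survive because each position is stored as its own Digit.
employeeIdToString : EmployeeId -> String
employeeIdToString employeeId =
    case employeeId of
        FourDigitEmployeeId d1 d2 d3 d4 ->
            String.fromList (List.map digitToChar [ d1, d2, d3, d4 ])

        FiveDigitEmployeeId d1 d2 d3 d4 d5 ->
            String.fromList (List.map digitToChar [ d1, d2, d3, d4, d5 ])
```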

This safety is now provided at *compile-time*. In the first example, the data
is correct because of `parseEmployeeId`, while in this example, we need
`parseEmployeeId` because the data is correct. The relationship is flipped:
the need to parse is the *cause* of correctness in the first example, while
in the second, the need to parse is *caused by* correctness.
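
Returning to the fake data from the start of the post, here's a sketch of what
`employee` might look like against the stricter type (types repeated so the
snippet stands alone; the specific ID `0427` is made up). A value like
`A-1234-jane-doe` simply has no representation here.

```elm
type Digit
    = D0
    | D1
    | D2
    | D3
    | D4
    | D5
    | D6
    | D7
    | D8
    | D9


type EmployeeId
    = FourDigitEmployeeId Digit Digit Digit Digit
    | FiveDigitEmployeeId Digit Digit Digit Digit Digit


type Name
    = Name String


type alias Employee =
    { id : EmployeeId
    , fullName : Name
    }


-- Fake data for seeding the UI; the ID 0427 is an invented example.
employee : Employee
employee =
    { id = FourDigitEmployeeId D0 D4 D2 D7
    , fullName = Name "Jane Doe"
    }
```

Leaving off a digit (`FourDigitEmployeeId D0 D4 D2`) produces a function of
type `Digit -> EmployeeId` rather than an `EmployeeId`, so the compiler rejects
the record.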

## Practical Application

Is this stricter approach viable? Useful? Flexible? It depends on the
application, the likelihood of the domain being "correct", and the risks of
introducing values where the types are correct but the data isn't.

I'd avoid this approach in cases where the domain is evolving rapidly or when
the data structure requirements are less rigid, instead relying on the
["newtype" technique] of wrapping primitives (e.g.
`type Example = Example String`).

["newtype" technique]: https://thoughtbot.com/blog/lessons-learned-avoiding-primitives-in-elm

The benefits of this approach are twofold: types introduce improved safety
when working with data, and we're able to communicate business rules. Improved
safety results in a more accurate system, assuming types properly encode the
structures. Communicating business rules means other developers understand
possible values and states the information can exist in, allowing for improved
reasoning across the codebase.
