A Broader Take on Parsing

Joël Quenneville

We usually think of “parsing” as turning strings into richer data structures. Broadening the definition gives us a useful mental model for thinking about many kinds of transformations.

Inspired by a discussion on narrowing types from the Elm discourse.

What is parsing?

In prose, we might say: “Parsing is transforming a broader type into a narrower type, with potential failures”.

Described as an Elm type signature, we can say that parsing is a function:

parse : broaderType -> Result error narrowerType

Typically, not all values of the broader type can successfully be transformed into the narrower type, hence the need for Result. For example, the string "1" can be transformed into an integer but the string "hello" cannot.
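Elm’s String.toInt is a minimal example of this shape (it returns a Maybe rather than a Result, but the idea is the same: not every string parses):

```elm
String.toInt "1"     -- Just 1
String.toInt "hello" -- Nothing
```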

Parsing non-strings

Under this definition, one can “parse” rich data types into other rich data types. This often comes up when capturing user input. I often define both a broader type that captures user input with a lot of uncertainty and a narrower type that accurately describes the data I want to capture.

-- Broad type with a lot of uncertainty

type alias ProfileForm =
  { age : String
  , occupation : Maybe Occupation
  }

-- Narrow type

type alias Profile =
  { age : Int
  , occupation : Occupation
  }

Then, in response to some action such as the user clicking a submit button, I can try to parse the narrow Profile out of the ProfileForm with something like:

parseProfile : ProfileForm -> Maybe Profile
parseProfile form =
  Maybe.map2 Profile
    (String.toInt form.age)
    form.occupation

On a fancier form, you might use a dedicated form-decoder library to parse these records.

Outputs can also be inputs

One can transform data in several passes, with the parsed output at each step becoming the raw input of the next step. The types get narrower and narrower at every step and the pipeline acts as a funnel.

For example, when getting data from an API, we might:

  1. Try to parse the string body of the response as a JSON value. This might fail because not all strings are valid JSON (note that the elm/json library does this automatically for us)
  2. Try to parse the JSON value into a broad UserSubmission record. This might fail because not all JSON values are valid user submissions.
  3. Combine this UserSubmission with some user input and try to parse it into a narrower User type. This might fail because not all user submissions are valid users.

[Diagram: four rectangles stacked in a funnel, each narrower than the one above, with “parse” arrows pointing from each to the next: String → Json.Decode.Value → UserSubmission → User]

Narrowing types like this is the core idea of parse, don’t validate.

Note that these transformations don’t need to happen all at once. Each level has its own domain-relevant operations you may want to use, and you may not need to parse down to the next level until certain events happen.

Not just for types

This mental model doesn’t just apply to statically-typed functional languages. Consider how a Ruby program might build value objects out of the payload from a third-party API.

require "json"

Movie = Struct.new(:name, :director)

def build_movies(response)
  json = JSON.parse(response.body) # parse string into array of hashes

  # parse hashes into movie objects
  json.map do |obj|
    Movie.new(obj.fetch("movie_name"), obj.fetch("dir"))
  end
end

Here, failure happens via an exception but the concept is the same. We slowly move through a funnel from less structured to more structured data as we parse from strings to hashes and finally into movie objects.
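If we’d rather surface the failure as a value (closer to Elm’s Result) than as an exception, one sketch wraps the same parsing in a rescue. The [:ok, value] / [:error, message] tuple shape here is just an illustration, not a convention from the original example:

```ruby
require "json"

Movie = Struct.new(:name, :director)

# Returns [:ok, movies] on success or [:error, message] on failure,
# loosely mirroring Elm's Result type.
def parse_movies(body)
  json = JSON.parse(body) # may raise JSON::ParserError

  movies = json.map do |obj|
    # fetch raises KeyError when a key is missing
    Movie.new(obj.fetch("movie_name"), obj.fetch("dir"))
  end

  [:ok, movies]
rescue JSON::ParserError, KeyError => e
  [:error, e.message]
end
```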

Thinking in transformations

Once you have this broader mental model of parsing, you will start to see it everywhere. So much of our work as software developers is transforming data. On the web, we are constantly dealing with unstructured input from APIs and from users.

Thinking in terms of parsing can help us become more conscious of the boundaries within our systems and be more intentional in setting them. Because we know that the transformation at each layer can result in errors, this mental model can guide us to the hot spots where we need better test coverage and error-handling code.