We usually think of “parsing” as turning strings into richer data structures. Broadening that definition gives us a useful mental model for thinking about many kinds of transformations.
Inspired by a discussion on narrowing types from the Elm discourse.
What is parsing?
In prose, we might say: “Parsing is transforming a broader type into a narrower type, with potential failures”.
Described as an Elm type signature, we can say that parsing is a function:
parse : broaderType -> Result error narrowerType
Typically, not all values of the broader type can successfully be transformed into the narrower type, hence the need for Result. For example, the string "1" can be transformed into an integer but the string "hello" cannot.
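As a concrete sketch of the same idea in Ruby, where a failed parse surfaces as an exception rather than a Result, we can wrap Kernel#Integer so failure becomes nil (the function name here is hypothetical):

```ruby
# "Parse" a String (broad) into an Integer (narrow).
# Returns nil when the string is not a valid integer,
# standing in for Elm's Result/Maybe.
def parse_int(string)
  Integer(string)
rescue ArgumentError, TypeError
  nil
end

parse_int("1")     # => 1
parse_int("hello") # => nil
```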
Parsing non-strings
Under this definition, one can “parse” rich data types into other rich data types. This often comes up when capturing user input. I often define both a broader type that captures user input with a lot of uncertainty and a narrower type that accurately describes the data I want to capture.
-- Broad type with a lot of uncertainty
type alias ProfileForm =
    { age : String
    , occupation : Maybe Occupation
    }

-- Narrow type
type alias Profile =
    { age : Int
    , occupation : Occupation
    }
Then, in response to some action such as the user clicking a submit button, I can try to parse the narrow Profile out of the ProfileForm with something like:
parseProfile : ProfileForm -> Maybe Profile
parseProfile form =
    Maybe.map2 Profile
        (String.toInt form.age)
        form.occupation
On a fancier form, you might use a dedicated form-decoder library to parse these records.
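The same two-type pattern can be sketched in Ruby, with a raw hash standing in for the broad form type and a struct for the narrow one (the names here are hypothetical):

```ruby
# Narrow type: age is an Integer and occupation is required.
Profile = Struct.new(:age, :occupation)

# Broad input: a hash of raw, possibly-missing strings.
# Returns nil when any field fails to narrow.
def parse_profile(form)
  age = Integer(form[:age]) rescue nil
  occupation = form[:occupation]
  return nil if age.nil? || occupation.nil?

  Profile.new(age, occupation)
end

parse_profile({ age: "42", occupation: "Engineer" }) # => a Profile
parse_profile({ age: "?",  occupation: "Engineer" }) # => nil
```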
Outputs can also be inputs
One can transform data in several passes, with the parsed output at each step becoming the raw input of the next step. The types get narrower and narrower at every step and the pipeline acts as a funnel.
For example, when getting data from an API, we might:
- Try to parse the string body of the response as a JSON value. This might fail because not all strings are valid JSON (note that the elm/json library does this automatically for us).
- Try to parse the JSON value into a broad UserSubmission record. This might fail because not all JSON values are valid user submissions.
- Combine this UserSubmission with some user input and then try to parse it into a narrower User type. This might fail because not all user submissions are valid users.
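The funnel above can be sketched in Ruby. The UserSubmission and User types are hypothetical stand-ins, and the final narrowing step is simplified to a bare email check:

```ruby
require "json"

UserSubmission = Struct.new(:name, :email) # broad
User           = Struct.new(:name, :email) # narrow

def parse_user(body)
  # Step 1: String -> JSON value (raises JSON::ParserError on invalid JSON)
  json = JSON.parse(body)

  # Step 2: JSON value -> broad UserSubmission (raises KeyError on missing fields)
  submission = UserSubmission.new(json.fetch("name"), json.fetch("email"))

  # Step 3: UserSubmission -> narrow User (nil when the email is invalid)
  return nil unless submission.email.include?("@")

  User.new(submission.name, submission.email)
end

parse_user('{"name": "Ada", "email": "ada@example.com"}') # => a User
parse_user('{"name": "Ada", "email": "nope"}')            # => nil
```

Each step narrows the type further, and each has its own distinct failure mode.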
Narrowing types like this is the core idea of parse, don’t validate.
Note that these transformations don’t need to happen all at once. Each level has its own domain-relevant operations that you might want to use, and you may not need to parse down to the next level until certain events happen.
Not just for types
This mental model doesn’t just apply to statically-typed functional languages. Consider how a Ruby program might build value objects out of the payload from a third-party API.
require "json"

Movie = Struct.new(:name, :director)

def build_movies(response)
  # parse string into an array of hashes
  json = JSON.parse(response.body)

  # parse hashes into movie objects
  json.map do |obj|
    Movie.new(obj.fetch("movie_name"), obj.fetch("dir"))
  end
end
Here, failure happens via an exception but the concept is the same. We slowly move through a funnel from less structured to more structured data as we parse from strings to hashes and finally into movie objects.
Thinking in transformations
Once you have this broader mental model of parsing in your mind, you will start to see it everywhere. So much of our work as software developers is transforming data. On the web, we are constantly dealing with unstructured inputs from APIs and from users.
Thinking in terms of parsing can help us become more conscious of the boundaries within our systems and be more intentional in setting them. Because we know that the transformation at each layer can result in errors, this mental model can guide us to the hot spots where we need better test coverage and error-handling code.