Comment your regular expressions

Summer ☀️

Regular expressions have a reputation for being cryptic and arcane, and with good reason: their syntax is dense and non-obvious. Unfortunately, that leads many people not to treat them as real code. They copy and paste regexes without checking how they actually behave, or they skim past them in code reviews. This isn’t ideal. Is there a way to help ensure that regexes are treated with the care they warrant?

Yes: we can comment them! For example, let’s take this regex for a USA postal code in Ruby:

usa_postal_code_pattern = /\A\d{5}(-\d{4})?\z/

That’s pretty hard to read; no wonder we want to gloss over it. Ruby’s “extended mode” for regexes, enabled with the x flag, ignores unescaped whitespace and treats everything from a # to the end of the line as a comment. Combined with a %r{⋯} percent literal, whose matching braces read more clearly across multiple lines than slashes, that lets us split the pattern into parts and comment each one:

usa_postal_code_pattern = %r{
  \A # Beginning of string
  \d{5} # 5 digits
  ( # ZIP+4
    - # Hyphen
    \d{4} # 4 digits
  )? # ZIP+4 is optional
  \z # End of string
}x
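Because extended mode only strips unescaped whitespace and comments, the commented version should behave exactly like the one-liner. Here is a minimal sketch that checks this by running both patterns against the same inputs. The variable names and sample strings are made up for illustration.

```ruby
# The original one-line pattern
compact_pattern = /\A\d{5}(-\d{4})?\z/

# The same pattern in extended mode, with comments
commented_pattern = %r{
  \A # Beginning of string
  \d{5} # 5 digits
  ( # ZIP+4
    - # Hyphen
    \d{4} # 4 digits
  )? # ZIP+4 is optional
  \z # End of string
}x

# Hypothetical sample inputs, valid and invalid.
# "12345\n" fails because \z means the very end of the string.
samples = ["12345", "12345-6789", "1234", "12345-678", "123456789", "12345\n"]

# Both patterns give the same answer for every sample
samples.all? { |s| compact_pattern.match?(s) == commented_pattern.match?(s) } # => true

# Only the five-digit and ZIP+4 forms are accepted
samples.select { |s| commented_pattern.match?(s) } # => ["12345", "12345-6789"]

# The group still captures the "+4" part, hyphen included
commented_pattern.match("12345-6789")[1] # => "-6789"
```

Running a quick comparison like this is a cheap way to confirm that reformatting a regex didn’t change what it matches.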

Beware that because whitespace is deliberately ignored in this mode, you must escape it with a backslash when you want to match a literal whitespace character. The same goes for #, which would otherwise start a comment. For example, here’s a pattern for UK postal codes:

uk_postal_code_pattern = %r{
  \A # Beginning of string
  [A-Z]{1,2} # 1–2 capital letters
  \d # Digit
  [A-Z\d]? # Optional capital letter or digit

  (\ ) # Single space

  \d # Digit
  [A-Z]{2} # 2 capital letters
  \z # End of string
}x
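To see the escaping rule in action, here is a sketch that exercises the UK pattern and then shows what happens when a space or # is left unescaped. The sample postcodes and the helper patterns (unescaped, accidental_comment, escaped_hash) are illustrative only.

```ruby
uk_postal_code_pattern = %r{
  \A # Beginning of string
  [A-Z]{1,2} # 1 or 2 capital letters
  \d # Digit
  [A-Z\d]? # Optional capital letter or digit

  (\ ) # Single space

  \d # Digit
  [A-Z]{2} # 2 capital letters
  \z # End of string
}x

# Sample postcodes in the expected shapes all match
["SW1A 1AA", "M1 1AE", "B33 8TH", "DN55 1PT"].all? { |code| uk_postal_code_pattern.match?(code) } # => true

# The escaped space is required, exactly once
uk_postal_code_pattern.match?("SW1A1AA")   # => false (no space)
uk_postal_code_pattern.match?("SW1A  1AA") # => false (two spaces)

# Without the backslash, a space is silently ignored
unescaped = %r{\A a b \z}x
unescaped.match?("ab")  # => true
unescaped.match?("a b") # => false

# An unescaped # starts a comment, so "sharp" and the # itself vanish
accidental_comment = %r{
  \A C# sharp
  \z
}x
accidental_comment.match?("C#") # => false
accidental_comment.match?("C")  # => true

# Escaping the # makes it a literal character
escaped_hash = %r{
  \A C\# # C followed by a literal hash sign
  \z
}x
escaped_hash.match?("C#") # => true
```

Note that the parentheses around the escaped space in the UK pattern create a capture group; the backslash alone is what makes the space literal, so a bare escaped space would match just as well.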

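This is possible in other languages too! Perl supports it with the /x modifier; Python calls it the “verbose” flag (re.VERBOSE); JavaScript has no such flag, but you can build the pattern with string concatenation, commenting each piece, and pass the result to new RegExp().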

About thoughtbot

We've been helping engineering teams deliver exceptional products for over 20 years. Our designers, developers, and product managers work closely with teams to solve your toughest software challenges through collaborative design and development.