Comment your regular expressions

Summer ☀️

Regular expressions have a reputation for being cryptic and arcane, and with good reason: their syntax is dense and non-obvious. Unfortunately, that leads many people not to treat them as real code: they copy and paste them without checking how they actually behave, or they skim past them in code reviews. This isn’t ideal; is there a way to help ensure that regexes get the scrutiny they warrant?

Yes: we can comment them! For example, let’s take this regex for a USA postal code in Ruby:

usa_postal_code_pattern = /\A\d{5}(-\d{4})?\z/

That’s pretty hard to read; no wonder we want to gloss over it. Using Ruby’s “extended mode” for regexes via the x flag (and a %r{⋯} symmetrical percent literal for better readability across multiple lines), we can split that into parts and add comments explaining them:

usa_postal_code_pattern = %r{
  \A # Beginning of string
  \d{5} # 5 digits
  ( # ZIP+4
    - # Hyphen
    \d{4} # 4 digits
  )? # ZIP+4 is optional
  \z # End of string
}x
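As a quick sanity check, here is a minimal sketch that compares the compact and commented patterns on a few inputs. The variable names and sample strings are made up for illustration. Extended mode only strips whitespace and comments, so both patterns should agree on every input. Note that `\z` anchors at the very end of the string, so a trailing newline is rejected.

```ruby
# Compact form from above.
compact = /\A\d{5}(-\d{4})?\z/

# Extended (x flag) form: whitespace and # comments are ignored by the engine.
commented = %r{
  \A      # Beginning of string
  \d{5}   # 5 digits
  (       # ZIP+4
    -     # Hyphen
    \d{4} # 4 digits
  )?      # ZIP+4 is optional
  \z      # End of string
}x

# Hypothetical sample inputs, including some near misses.
samples = ["12345", "12345-6789", "1234", "12345-678", "12345 6789", "12345\n"]

samples.each do |s|
  # match? returns true/false without setting the $~ match globals.
  puts "#{s.inspect}: compact=#{compact.match?(s)} commented=#{commented.match?(s)}"
end
```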

Beware: because this mode deliberately ignores whitespace (and treats an unescaped # as the start of a comment), you must escape a space when you want it to match a literal space character. For example, here’s a pattern for UK postal codes:

uk_postal_code_pattern = %r{
  \A # Beginning of string
  [A-Z]{1,2} # 1–2 capital letters
  \d # Digit
  [A-Z\d]? # Optional capital letter or digit

  (\ ) # Single space

  \d # Digit
  [A-Z]{2} # 2 capital letters
  \z # End of string
}x
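To see why the escape matters, the sketch below pairs the pattern above with a hypothetical variant (`forgot_escape`, an illustrative name) where the backslash was left out. In extended mode the bare space is dropped, leaving an empty group `()` that matches nothing, so the broken pattern quietly rejects correctly spaced postcodes and accepts unspaced ones. The postcodes are sample inputs chosen for illustration.

```ruby
# The pattern from above: the space is escaped as "\ " so x mode keeps it.
uk_postal_code_pattern = %r{
  \A         # Beginning of string
  [A-Z]{1,2} # 1-2 capital letters
  \d         # Digit
  [A-Z\d]?   # Optional capital letter or digit

  (\ )       # Single space

  \d         # Digit
  [A-Z]{2}   # 2 capital letters
  \z         # End of string
}x

# Hypothetical mistake: the same pattern with the backslash forgotten.
# In x mode the bare space is ignored, leaving an empty group "()".
forgot_escape = %r{
  \A [A-Z]{1,2} \d [A-Z\d]? ( ) \d [A-Z]{2} \z
}x

p uk_postal_code_pattern.match?("SW1A 1AA") # => true
p uk_postal_code_pattern.match?("SW1A1AA")  # => false
p forgot_escape.match?("SW1A 1AA")          # => false
p forgot_escape.match?("SW1A1AA")           # => true
```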

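This is possible in other languages too! Perl supports it with the same x modifier; Python calls it the “verbose” flag (re.VERBOSE); JavaScript has no such flag, but you can build the pattern by concatenating commented string fragments and passing the result to the RegExp constructor.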
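For comparison, here is a rough sketch of that string-concatenation approach, written in Ruby to keep one language throughout rather than as a JavaScript reference. Each fragment carries an ordinary code comment, and the joined string is compiled with `Regexp.new`. The name `usa_postal_code_source` is hypothetical. Single-quoted strings are used so backslashes reach the regex engine unchanged.

```ruby
# Each fragment is single-quoted: only \\ and \' are escapes there,
# so sequences like \A and \d pass through to the regex engine as written.
usa_postal_code_source = [
  '\A',        # Beginning of string
  '\d{5}',     # 5 digits
  '(-\d{4})?', # Optional ZIP+4: hyphen and 4 digits
  '\z'         # End of string
].join

# Compile the concatenated source; no x flag is needed.
usa_postal_code_pattern = Regexp.new(usa_postal_code_source)

p usa_postal_code_pattern.match?("12345-6789") # => true
p usa_postal_code_pattern.match?("12345-67")   # => false
```

In Ruby itself the %r{⋯}x form is usually the tidier choice; this style is mainly useful in languages without an extended mode.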