---
title: Comment your regular expressions
teaser: "(Don't Fear) The Regex"
tags: regex,ruby
author: Summer ☀️
published_on: 2024-09-09
---

Regular expressions have a reputation for being cryptic and arcane, and with
good reason: their syntax is dense and non-obvious. Unfortunately, that leads
many people to treat them as something other than real code: they copy and
paste regexes without verifying their behavior, or skim past them in code
reviews. This isn't ideal. Is there a way to help ensure that regexes get the
scrutiny they warrant?

Yes: we can comment them! For example, let's take this regex for a
[USA postal code](https://en.wikipedia.org/wiki/ZIP_Code) in Ruby:

```ruby
usa_postal_code_pattern = /\A\d{5}(-\d{4})?\z/
```

That's pretty hard to read; no wonder we want to gloss over it. Using
[Ruby's "extended mode" for regexes via the `x` flag](https://docs.ruby-lang.org/en/3.3/Regexp.html#class-Regexp-label-Extended+Mode)
(and a
[`%r{⋯}` symmetrical percent literal](https://docs.ruby-lang.org/en/3.3/syntax/literals_rdoc.html#label-25r-3A+Regexp+Literals)
for better readability across multiple lines), we can split that into parts and
add comments explaining them:

```ruby
usa_postal_code_pattern = %r{
  \A # Beginning of string
  \d{5} # 5 digits
  ( # ZIP+4
    - # Hyphen
    \d{4} # 4 digits
  )? # ZIP+4 is optional
  \z # End of string
}x
```
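
Since the commented version is meant to be a drop-in replacement, it's worth
checking that it behaves the same as the one-liner. Here's a minimal sketch of
such a check; the names `compact_pattern`, `commented_pattern`, and `samples`
are made up for illustration, and the sample inputs are only a starting point:

```ruby
# Hypothetical names and sample inputs, for illustration only.
compact_pattern = /\A\d{5}(-\d{4})?\z/

commented_pattern = %r{
  \A        # Beginning of string
  \d{5}     # 5 digits
  (         # ZIP+4
    -       # Hyphen
    \d{4}   # 4 digits
  )?        # ZIP+4 is optional
  \z        # End of string
}x

samples = ["12345", "12345-6789", "1234", "12345-678", "12345 6789", "12345\n"]

# Run every sample through both patterns and record the outcomes.
results = samples.map do |sample|
  [sample, compact_pattern.match?(sample), commented_pattern.match?(sample)]
end

results.each do |sample, compact, commented|
  puts "#{sample.inspect}: compact=#{compact} commented=#{commented}"
end
```

If the two columns ever disagree for an input, the rewrite changed the
pattern's behavior somewhere.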

Beware: because extended mode deliberately ignores whitespace in the pattern,
you must escape any whitespace that you want to match literally. For example,
here's a pattern for
[UK postal codes](https://en.wikipedia.org/wiki/Postcodes_in_the_United_Kingdom):

```ruby
uk_postal_code_pattern = %r{
  \A # Beginning of string
  [A-Z]{1,2} # 1–2 capital letters
  \d # Digit
  [A-Z\d]? # Optional capital letter or digit

  (\ ) # Single space

  \d # Digit
  [A-Z]{2} # 2 capital letters
  \z # End of string
}x
```
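
To see what goes wrong without the escape, here's a small sketch comparing two
variants of that pattern. The names `unescaped_space_pattern` and
`escaped_space_pattern` are hypothetical, and the patterns are condensed onto
fewer lines than the one above:

```ruby
# Hypothetical names, for illustration only.
unescaped_space_pattern = %r{
  \A
  [A-Z]{1,2} \d [A-Z\d]?  # First half
                          # All of the plain spaces in this pattern are ignored
  \d [A-Z]{2}             # Second half
  \z
}x

escaped_space_pattern = %r{
  \A
  [A-Z]{1,2} \d [A-Z\d]?  # First half
  \                       # Escaped space: matches one literal space
  \d [A-Z]{2}             # Second half
  \z
}x

puts unescaped_space_pattern.match?("SW1A 1AA") # => false
puts unescaped_space_pattern.match?("SW1A1AA")  # => true
puts escaped_space_pattern.match?("SW1A 1AA")   # => true
puts escaped_space_pattern.match?("SW1A1AA")    # => false
```

The unescaped variant silently matches the code with no space at all, which is
an easy bug to miss in review.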

<aside class="info">
  <p>
    In the above examples, every line is commented in order to be illustrative.
    That's probably not necessary for most regexes.
  </p>
</aside>

This is possible in other languages too!
[Perl supports it](https://perldoc.perl.org/perlre#/x-and-/xx);
[Python calls it the "verbose" flag](https://docs.python.org/3/howto/regex.html#using-re-verbose);
and [in JavaScript, which has no such flag, you can approximate it with string concatenation](https://stackoverflow.com/a/15463297/16330198).
