---
title: How We Used a Custom Enumerator to Fix a Production Problem
teaser: 'Let''s look at an example of how custom enumerators can help on your day-to-day
  work.

  '
tags: ruby,enumerator
author: Thiago Araújo Silva
published_on: 2024-05-13
---

[Enumerators] are a powerful Ruby feature. Did you know that you can
create custom enumerators with custom logic? Today, I will tell you
how we fixed a production problem with a custom enumerator.

## Problem

We had a method like the following:

```ruby
def customer_property_ids
  sql = "SELECT DISTINCT PropertyId from AddressMatch"
  client.execute(sql).map { _1["PropertyId"] }
end
```

It runs a Microsoft SQL Server query that returns an array of property
IDs. In Ruby, arrays implement the [Enumerable] interface by default,
so we can use the rich assortment of `Enumerable` methods to iterate
over them.

Other methods from the same class relied on the array returned by
`customer_property_ids`; we had three occurrences of the following
call throughout the class:

```ruby
customer_property_ids.each_slice(BATCH_SIZE) do |batch_ids|
  # ...
end
```

We were able to pinpoint the cause of the crash: the SQL Server query
returned more than 5 million property IDs, which made the Ruby process
core dump due to memory exhaustion.

We needed to push a quick fix to production to get the application
back up while limiting the scope of the changes for maximum safety.
Getting back to business was a matter of the utmost urgency. What to
do?

## Solution

We would avoid the core dump if we kept fewer IDs in memory at a time,
and that's where our custom enumerator comes in. The first thing we
did was figure out how to paginate a query in SQL Server. Here's how
we do it:

```ruby
SELECT DISTINCT PropertyId
FROM AddressMatch
ORDER BY PropertyId ASC
OFFSET 0 ROWS FETCH FIRST 100000 ROWS ONLY
```

This query retrieves the first 100,000 IDs. We can interpolate an
`OFFSET` and `FETCH FIRST` into the SQL string to get the subsequent
pages.

Finally, we limited the changes to a single spot: the
`customer_property_ids` method. Putting everything together, the
enumerator code looked as follows:

```ruby
def customer_property_ids
  Enumerator.new do |yielder|
    limit = 100_000
    offset = 0

    loop do
      sql = <<~SQL
        SELECT DISTINCT PropertyId
        FROM AddressMatch
        ORDER BY PropertyId ASC
        OFFSET #{offset} ROWS FETCH FIRST #{limit} ROWS ONLY
      SQL

      ids = client.execute(sql)
        .map { |row| yielder << row["PropertyId"] }

      break if ids.none?

      offset += limit
    end
  end
end
```

In the enumerator, we manage a `limit` and an `offset` and execute the
query within a loop until it returns an empty result. Each time the
query returns the results for a page, we give the property IDs to the
`yielder`.

With that scoped change, we avoided further changes to the methods
that called `customer_property_ids.each_slice`. We built our own
custom enumerator that paginates the query results and avoids memory
bloat! However, pagination is an internal detail encapsulated by the
enumerator.

## Explanation

Let's explain the solution from a mental model perspective, which
doesn't necessarily correspond to the internal Ruby implementation
details.

The magic of enumerators is that they use [Fibers] underneath, so in
our example, whenever we shove a value into the `yielder`, it pauses
execution and returns control to the `each_slice` method. If
`each_slice` does not have enough values to work with, it resumes the
execution and asks for more. Here's a brief example:

```ruby
enum = Enumerator.new do |yielder|
  puts "Starting the custom enumerator..."

  [1, 2].each do |n|
    puts "Giving #{n} to the yielder"
    yielder << n
  end

  puts "each_slice is still asking for more values..."

  [3, 4].each do |n|
    puts "Giving #{n} to the yielder"
    yielder << n
  end
end

enum.each_slice(3) do |slice|
  puts "We have enough, let's take a slice: #{slice}"
end
```

And the output:

```
Starting the custom enumerator...
Giving 1 to the yielder...
Giving 2 to the yielder...
each_slice is still asking for more values...
Giving 3 to the yielder
We have enough, let's take a slice: [1, 2, 3]
Giving 4 to the yielder
We have enough, let's take a slice: [4]
```

Back to the `customer_property_ids` enumerator, that continuous
context switching guarantees that the values are processed as the
yielder gets fed with them, which allows the garbage collector to
discard the results already used. Compare that to eager loading all
the 5 million IDs in memory!

While working on that fix, we discovered that the code was not optimal
and could benefit from other refactors to make it scale better, but
that simple change quickly put our client back in business.

[Enumerators]: https://docs.ruby-lang.org/en/master/Enumerator.html
[Enumerable]: https://docs.ruby-lang.org/en/master/Enumerable.html
[Fibers]: https://docs.ruby-lang.org/en/master/Fiber.html
