Modeling a Paginated API as a Lazy Stream

Joël Quenneville

You are integrating with a third-party API that provides statistics on the most popular baby names for a given year.

You have both high-level stats and per-name information you’d like to display. It’d be nice if you could write the code like this:

class NamesController < ApplicationController
  def index
    @names = Names::Client.new.all_names
  end

  def show
    @name = Names::Client.new.find_name(params[:name])
  end
end

The Client

You could write a simple client like this:

module Names
  class Client
    Name = Struct.new(:id, :name, :births)
    BASE_URL = "http://name-service.com"

    def all_names
      fetch_data("/users").
        map { |data| convert_to_name(data) }
    end

    private

    def fetch_data(path)
      HTTParty.get(BASE_URL + path)
    end

    def convert_to_name(data)
      Name.new(data["id"], data["name"], data["births"])
    end
  end
end

Seems straightforward enough. Almost too easy. You’re about to hit your first roadblock.

Pagination

As you start using the API, you notice that some results seem to be missing. You take a closer look and notice that you’re always getting exactly 10 results from the API. The same 10 results. Aha! Looks like pagination!

Like many APIs, this one paginates its data for performance since the full data set is quite large. The number of items per page appears to be hard-coded at 10.

You could write a method that fetches the 10 results for a given page number but that’s not how your application uses the data. You would like to be able to deal with the data as a single list. Breaking the data up into pages is an implementation detail of the API.
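For contrast, here's what that rejected page-at-a-time approach might look like. This is a hypothetical sketch — `PageClient` and `names_on_page` are illustrative names, and the `PAGES` hash stands in for the remote API:

```ruby
# Hypothetical page-at-a-time client: pagination leaks into every call site.
class PageClient
  PAGES = { 1 => %w[Emma Olivia], 2 => %w[Noah Liam] } # stands in for the API

  def names_on_page(page)
    PAGES.fetch(page, []) # the real method would make an HTTP request here
  end
end

# Every caller now has to know about page numbers and loop over them:
client = PageClient.new
names = (1..2).flat_map { |page| client.names_on_page(page) }
# => ["Emma", "Olivia", "Noah", "Liam"]
```

The pagination bookkeeping ends up duplicated at every call site, which is exactly what you want to avoid.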

It would be nice to model the data as a stream instead. Specifically, a lazy stream, so that you make only the minimum number of HTTP requests. Enter the Enumerator.
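To see why laziness matters, here's the core idea in isolation: a lazy Enumerator only produces values when a consumer asks for them. The page numbers below stand in for HTTP requests:

```ruby
# An infinite lazy stream of "page numbers". Nothing runs until consumed.
pages = Enumerator.new do |yielder|
  page = 1
  loop do
    yielder << page # in the real client, this is where an HTTP request happens
    page += 1
  end
end.lazy

# Only the first three values are ever generated, despite the infinite loop.
pages.first(3) # => [1, 2, 3]
```

Because the block only runs as values are pulled, an "infinite" stream is safe to define up front.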

Enumerator

You add a new method to the client to work with paginated results. This fetches a page and then yields the results one at a time until it runs out of local results. Then it makes a request for the next page and starts the process over again. The enumeration ends once an HTTP request responds with a non-200 response.

def fetch_paginated_data(path)
  Enumerator.new do |yielder|
    page = 1

    loop do
      results = fetch_data("#{path}?page=#{page}")

      if results.success?
        results.each { |item| yielder << item }
        page += 1
      else
        raise StopIteration
      end
    end
  end.lazy
end

Note that appending ?page=#{page} to the end of the path is a bit naive and will only work with URLs that don’t have any other query parameters. For more complex URLs, you will want to use Ruby’s URI library.
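A sketch of that more robust approach using the standard URI library (the helper name `with_page` is made up for illustration) merges the page number into any existing query string:

```ruby
require "uri"

# Append a page parameter to a path that may already carry a query string.
# `with_page` is an illustrative helper, not part of the client above.
def with_page(base_url, path, page)
  uri = URI.join(base_url, path)
  params = URI.decode_www_form(uri.query.to_s) << ["page", page.to_s]
  uri.query = URI.encode_www_form(params)
  uri.to_s
end

with_page("http://name-service.com", "/users", 2)
# => "http://name-service.com/users?page=2"
with_page("http://name-service.com", "/users?sort=births", 2)
# => "http://name-service.com/users?sort=births&page=2"
```

`URI.decode_www_form` and `URI.encode_www_form` handle escaping and the `?`/`&` bookkeeping for you.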

The client’s public all_names method doesn’t change much. The only difference is that it calls fetch_paginated_data instead of fetch_data.

The API you’re integrating against returns an HTTP 404 response code for pages with no results, so the Enumerator stops iterating when it gets a non-successful status code. For other API implementations, it may make sense to check for empty results instead. Some APIs provide a link to the “next” page, so you would check for that. The Bootic client has an example of this approach.
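Here's a sketch of the empty-results variant. The `PAGES` array and the stubbed `fetch_data` stand in for the remote API, which is assumed here to return 200 with an empty body for past-the-end pages:

```ruby
# Stop on an empty page instead of a non-200 response.
PAGES = [%w[Emma Olivia], %w[Noah Liam], []] # stands in for the remote API

def fetch_data(page)
  PAGES.fetch(page - 1, []) # the real method would make an HTTP request
end

names = Enumerator.new do |yielder|
  page = 1
  loop do
    results = fetch_data(page)
    raise StopIteration if results.empty? # loop rescues this and exits
    results.each { |item| yielder << item }
    page += 1
  end
end.lazy

names.to_a # => ["Emma", "Olivia", "Noah", "Liam"]
```

Only the stopping condition changes; the rest of the enumerator is identical.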

module Names
  class Client
    Name = Struct.new(:id, :name, :births)
    BASE_URL = "http://name-service.com"

    def all_names
      fetch_paginated_data("/users").
        map { |data| convert_to_name(data) }
    end

    # fetch_data, convert_to_name, and the new fetch_paginated_data
    # private methods omitted for brevity
  end
end

The show page

Going back to our controller implementation:

class NamesController < ApplicationController
  def index
    @names = Names::Client.new.all_names
  end

  def show
    @name = Names::Client.new.find_name(params[:name])
  end
end

Getting all names now works the way you’d expect. But what about that show action? The API doesn’t provide a way to search. You could fetch all the results and then filter them in Ruby, but that would cause a lot of unnecessary HTTP requests. How can you make the minimum number of requests to get the name you want?

This is where the lazy Enumerator really pays off. This code does the minimum work needed to get us a result.

def find_name(name)
  all_names.detect { |n| n.name == name }
end

Too simple? Time to try it out! Sofia is the 28th name on the list (and therefore should be on page 3). If all works the way you expect, the client should make requests only for pages 1, 2, and 3 and stop once it finds Sofia.
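You can simulate the try-out with a stand-in client that records which pages it "fetches". `FakeClient` and its 100 stubbed names (with Sofia in position 28) are assumptions for this demo, not part of the real client:

```ruby
# Stand-in client: 100 stubbed names, 10 per page, recording each page fetch.
class FakeClient
  NAMES = (1..100).map { |i| i == 28 ? "Sofia" : "Name#{i}" }

  attr_reader :pages_fetched

  def initialize
    @pages_fetched = []
  end

  def all_names
    Enumerator.new do |yielder|
      page = 1
      loop do
        @pages_fetched << page # the real client would make an HTTP request here
        results = NAMES.slice((page - 1) * 10, 10) || []
        raise StopIteration if results.empty?
        results.each { |name| yielder << name }
        page += 1
      end
    end.lazy
  end

  def find_name(name)
    all_names.detect { |n| n == name }
  end
end

client = FakeClient.new
client.find_name("Sofia")  # => "Sofia"
client.pages_fetched       # => [1, 2, 3] -- pages 4 through 10 never requested
```

`detect` pulls names one at a time from the lazy stream and stops at the first match, so the generator block never gets far enough to ask for page 4.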

[GIF: trying out the code]

Success!

Extra

Want to play around with this concept? The code for the client as well as a sample server can be found on GitHub.

The list of names used came from the US Social Security Administration’s list of most popular names of 2015.

Check out this article on lazy refactoring for a different use case of lazy enumerators.