---
title: Make Phoenix Even Faster with a GenServer-backed Key Value Store
teaser: 'Skip adding another dependency and leverage GenServer and Erlang Term Storage
  to build an in-memory key-value store for even faster Phoenix applications.

  '
tags: web,elixir,phoenix
author: Josh Clayton
published_on: 2016-07-13
---

As developers, many of us build breakable toys in various programming
languages when learning something new; the goal is to learn the language and
not the intricacies of a new product, after all! Some of us reach for todo
apps, others a messaging clone. If you're like me, though, it's URL
shorteners.

I built my most recent URL shortener in [Elixir], and decided to see how much raw
speed I could get out of the application by using [`GenServer`] - **Gen**eric
**Server** - to build out a key-value store to bypass the database when
possible.

[Elixir]: http://elixir-lang.org/
[`GenServer`]: http://elixir-lang.org/docs/stable/elixir/GenServer.html

## State in Elixir

A GenServer is one mechanism for leveraging state in Elixir, so it's the
perfect use for a transient, in-memory cache. In this example, I'll be caching
the [Struct] straight from [Ecto] and using the slug from the URL to determine
the cache key.

[Ecto]: https://hexdocs.pm/ecto/Ecto.html
[Struct]: http://elixir-lang.org/getting-started/structs.html

## Cache Interface

Now that we understand the underlying mechanism we'll be using, the cache
behavior is the next step:

```elixir
# lib/link_cache/cache.ex

defmodule LinkCache.Cache do
  def fetch(slug, default_value_function) do
    case get(slug) do
      {:not_found} -> set(slug, default_value_function.())
      {:found, result} -> result
    end
  end
end
```

We'll use `LinkCache.Cache.fetch/2` to provide both a slug for lookup and a
function to execute to populate the cache if no value is found. While
`LinkCache.Cache.get/1` and `LinkCache.Cache.set/2` don't exist yet, we can
infer behavior: `LinkCache.Cache.get/1` will either return a tuple of
`{:not_found}` (at which point we'll call `LinkCache.Cache.set/2` to assign the
value and return it), or `{:found, result}`, at which point we return the
result directly.

To leverage this, let's change `RedirectTo.RedirectController`:

```diff
--- a/web/controllers/redirect_controller.ex
+++ b/web/controllers/redirect_controller.ex
@@ -3,7 +3,9 @@ defmodule RedirectTo.RedirectController do
   alias RedirectTo.Queries

   def show(conn, %{"slug" => slug}) do
-    link = Queries.Link.by_slug(slug) |> Repo.one
+    link = LinkCache.Cache.fetch(slug, fn ->
+      Queries.Link.by_slug(slug) |> Repo.one
+    end)

     redirect conn, external: link.long_url
   end
```

## Digging into our cache

With our usage out of the way, let's build out the rest of `LinkCache.Cache`.

```elixir
# lib/link_cache/cache.ex

defmodule LinkCache.Cache do
  use GenServer

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, [
      {:ets_table_name, :link_cache_table},
      {:log_limit, 1_000_000}
    ], opts)
  end

  def fetch(slug, default_value_function) do
    case get(slug) do
      {:not_found} -> set(slug, default_value_function.())
      {:found, result} -> result
    end
  end

  defp get(slug) do
    case GenServer.call(__MODULE__, {:get, slug}) do
      [] -> {:not_found}
      [{_slug, result}] -> {:found, result}
    end
  end

  defp set(slug, value) do
    GenServer.call(__MODULE__, {:set, slug, value})
  end

  # GenServer callbacks

  def handle_call({:get, slug}, _from, state) do
    %{ets_table_name: ets_table_name} = state
    result = :ets.lookup(ets_table_name, slug)
    {:reply, result, state}
  end

  def handle_call({:set, slug, value}, _from, state) do
    %{ets_table_name: ets_table_name} = state
    true = :ets.insert(ets_table_name, {slug, value})
    {:reply, value, state}
  end

  def init(args) do
    [{:ets_table_name, ets_table_name}, {:log_limit, log_limit}] = args

    :ets.new(ets_table_name, [:named_table, :set, :private])

    {:ok, %{log_limit: log_limit, ets_table_name: ets_table_name}}
  end
end
```

We've already seen `LinkCache.Cache.fetch/2`, so let's move on to
`LinkCache.Cache.get/1` and `LinkCache.Cache.set/2`.

```elixir
defp get(slug) do
  case GenServer.call(__MODULE__, {:get, slug}) do
    [] -> {:not_found}
    [{_slug, result}] -> {:found, result}
  end
end
```

[`GenServer.call/3`] is a synchronous function we use to send a request to the
server (`__MODULE__`, which is equivalent to `LinkCache.Cache`). The 2-tuple
`{:get, slug}` (which we'll pattern-match on) tells the server to retrieve the
value with the key `slug`, and will return zero or one results due to the
underlying storage mechanism we're using.

[`GenServer.call/3`]: http://elixir-lang.org/docs/stable/elixir/GenServer.html#call/3

```elixir
defp set(slug, value) do
  GenServer.call(__MODULE__, {:set, slug, value})
end
```

Here, we're performing another synchronous action to cache the `value` for
the specific `slug`, returning the 2-tuple `{slug, value}`.

Both `LinkCache.Cache.get/1` and `LinkCache.Cache.set/2` are sending requests
to `__MODULE__`, as discussed before; we need to hook into these behaviors,
pattern-matching on the requests we're sending, to trigger underlying
behavior.

Above, we'd seen `GenServer.call/3` used, with the appropriate tuples being
passed; let's look at the hooks we've written to handle these requests:

```elixir
def handle_call({:get, slug}, _from, state) do
  %{ets_table_name: ets_table_name} = state
  result = :ets.lookup(ets_table_name, slug)
  {:reply, result, state}
end

def handle_call({:set, slug, value}, _from, state) do
  %{ets_table_name: ets_table_name} = state
  true = :ets.insert(ets_table_name, {slug, value})
  {:reply, value, state}
end
```

[`GenServer.handle_call/3`] manages the request (either `{:get, slug}` or
`{:set, slug, value}`), a 2-tuple of the request sender (PID and descriptor),
which we're ignoring here (`_from`), and the server's internal state (`state`).
Both reply to the request with a 3-tuple `{:reply, link_struct, state}` - with
the link Struct available, we can then retrieve the URL and send the user on
their way.

[`GenServer.handle_call/3`]: http://elixir-lang.org/docs/stable/elixir/GenServer.html#c:handle_call/3

## Erlang Term Storage

`LinkCache.Cache.get/1` and `LinkCache.Cache.set/2` each reference `:ets`, an
Erlang module referencing the underlying [ETS (Erlang Term Storage)]
architecture, an incredibly performant in-memory store with `O(1)` lookups for
"set" storage, which we're using.

[ETS (Erlang Term Storage)]: http://erlang.org/doc/man/ets.html

In both definitions, we pattern-match on `state` to retrieve the table name
for our link cache and perform both insert (for `LinkCache.Cache.set/2`) and
lookup (for `LinkCache.Cache.get/1`) operations.

We configure `:ets` in `LinkCache.Cache.start_link/1` and
`LinkCache.Cache.init/1`:

```elixir
def start_link(opts \\ []) do
  GenServer.start_link(__MODULE__, [
    {:ets_table_name, :link_cache_table},
    {:log_limit, 1_000_000}
  ], opts)
end

def init(args) do
  [{:ets_table_name, ets_table_name}, {:log_limit, log_limit}] = args

  :ets.new(ets_table_name, [:named_table, :set, :private])

  {:ok, %{log_limit: log_limit, ets_table_name: ets_table_name}}
end
```

These functions prepare `LinkCache.Cache` to be supervised
([`GenServer.start_link/3`]) with configuration (a table named
`:link_cache_table` with room for 1 million 2-tuples), as well as create the
table ([`:ets.new/2`]), configured for a named table with "set" storage.

[`GenServer.start_link/3`]: http://elixir-lang.org/docs/stable/elixir/GenServer.html#start_link/3
[`:ets.new/2`]: http://erlang.org/doc/man/ets.html#new-2

## Fault-tolerance with a supervisor

With the underlying cache built, we now want to ensure Phoenix keeps the cache
running. Elixir promotes a strategy of letting things crash - and ensuring
recovery is quick and straightforward - with [Supervisors] at the helm.

[Supervisors]: http://elixir-lang.org/docs/stable/elixir/Supervisor.html

Let's build out a supervisor to ensure `LinkCache.Cache` is monitored and runs
correctly:

```elixir
# lib/link_cache/supervisor.ex

defmodule LinkCache.Supervisor do
  use Supervisor

  def start_link do
    Supervisor.start_link(__MODULE__, :ok, name: __MODULE__)
  end

  def init(:ok) do
    children = [
      worker(LinkCache.Cache, [[name: LinkCache.Cache]])
    ]

    supervise(children, strategy: :one_for_one)
  end
end
```

Our `LinkCache.Supervisor` supervises only `LinkCache.Cache`, with a
`:one_for_one` strategy.

We'll also need to update our application to include this supervisor:

```elixir
# lib/redirect_to.ex

defmodule RedirectTo do
  use Application

  def start(_type, _args) do
    import Supervisor.Spec, warn: false

    children = [
      supervisor(RedirectTo.Endpoint, []),
      supervisor(RedirectTo.Repo, []),
      supervisor(LinkCache.Supervisor, []),
    ]

    opts = [strategy: :one_for_one, name: RedirectTo.Supervisor]
    Supervisor.start_link(children, opts)
  end

  # ...
end
```

So, in addition to our `RedirectTo.Endpoint` and `RedirectTo.Repo`, our
`LinkCache.Supervisor` will also be managed by `RedirectTo`; this ensures if
anything happens to the cache, our application will restart it.

You can find a more thorough walkthrough of supervisors, supervision trees,
and how to tie various components together on [Elixir's discussion of OTP
topics].

[Elixir's discussion of OTP topics]: http://elixir-lang.org/getting-started/mix-otp/supervisor-and-application.html

## Testing `LinkCache.Cache`

To test the cache, I wrote a small test ensuring the cache set the value based
on return value of a function, and reused that value on a subsequent lookup.

```elixir
# test/lib/link_cache/cache_test.exs

defmodule LinkCacheTest do
  use ExUnit.Case

  test "caches and finds the correct data" do
    assert LinkCache.Cache.fetch("A", fn ->
      %{id: 1, long_url: "http://www.example.com"}
    end) == %{id: 1, long_url: "http://www.example.com"}

    assert LinkCache.Cache.fetch("A", fn -> "" end) == %{id: 1, long_url: "http://www.example.com"}
  end
end
```

Again, the cache can store Structs, Maps, or all sorts of data structures. As
is the case with any cache, invalidation might be tricky. The architecture of
this application is such that the links themselves cannot be updated, so I'm
not concerned about invalid data. Your mileage may vary.

## Memory usage with 1 million 2-tuples

I'd admittedly chosen a `log_limit` of 1 million pairs without knowing the
memory implications of that particular cache size. And while lookups are
`O(1)` (constant time, regardless of the number of pairs in the cache), I did
want to verify I wouldn't be running out of memory.

How do we verify this?

First, fire up IEx running the Phoenix application:

```sh
iex -S mix phoenix.server
```

Next, run `:observer.start()` within IEx to see a GUI outlining various metrics
of the running application.

Assuming ETS doesn't make any sort of optimizations around similar values:

```elixir
link = RedirectTo.Queries.Link.by_slug("A") |> RedirectTo.Repo.one
1..1_000_000 |> Enum.each(fn(i) -> LinkCache.Cache.fetch((i |> to_string), fn -> link end) end)
```

This inserts one million pairs into the cache, with the keys being string
versions of each integer (`"1"`, `"1250"`, etc.) and the values being the
Struct directly from Ecto.

![Memory Usage Graph of Full Struct](https://images.thoughtbot.com/blog-vellum-image-uploads/JmmT4LKeRv6v7e8CbEqA_memusage-fullgraph.png)

Over the course of ~40s, all 1 million pairs were stored in-memory (insertion
of ~25,000/s) with a result size of around 856MB, or just under 1kB per pair.

Storing a subset of the data results in significantly less memory usage:

```elixir
link = RedirectTo.Queries.Link.by_slug("A") |> RedirectTo.Repo.one |> Map.take([:slug, :long_url])
```

![Memory Usage Graph of Small Map](https://images.thoughtbot.com/blog-vellum-image-uploads/qyRuZM9uSn6nrIdRHOvl_memusage-smallmapgraph.png)

## Benchmarks on Heroku

With the cache in place, I deployed the cache- and database-read versions of
the application to [Heroku], running on the free server and database.

[Heroku]: https://www.heroku.com/

In both tests, I used [Siege] against the HTTP version of the application,
configured not to follow redirects, against the same slug with 500 concurrent
users over 60s. In both cases, the Heroku dyno was running (no spin-up time).

[Siege]: https://github.com/JoeDog/siege

### DB read results

    Transactions:                  29508 hits
    Availability:                 100.00 %
    Elapsed time:                  59.75 secs
    Data transferred:               2.67 MB
    Response time:                  0.70 secs
    Transaction rate:             493.86 trans/sec
    Throughput:                     0.04 MB/sec
    Concurrency:                  343.44
    Successful transactions:       29508
    Failed transactions:               0
    Longest transaction:            5.98
    Shortest transaction:           0.04

I was impressed by these numbers; almost 30,000 requests per minute on free
hardware, with an average response time of roughly 700ms, and this was hitting
the database on each request. No connections were dropped.

### Cache read results

    Transactions:                  44829 hits
    Availability:                 100.00 %
    Elapsed time:                  59.38 secs
    Data transferred:               4.06 MB
    Response time:                  0.38 secs
    Transaction rate:             754.95 trans/sec
    Throughput:                     0.07 MB/sec
    Concurrency:                  284.19
    Successful transactions:       44829
    Failed transactions:               0
    Longest transaction:            8.39
    Shortest transaction:           0.03

The throughput when reading from the cache was significantly better, serving 50%
more requests.  Additionally, the average response time was about half that of
the run hitting the database.

## Wrapping up

Even with Elixir's incredible speed and concurrency support out of the box
(30k RPM on free Heroku hardware with sub-second response times is pretty
great), there are always ways to leverage certain technologies for what
they're great at. By leveraging Erlang's ETS, GenServer, and supervisors, we
were able to build a fault-tolerant, scalable in-memory cache to improve both
performance and throughput.
