As developers, many of us build breakable toys in various programming languages when learning something new; the goal is to learn the language and not the intricacies of a new product, after all! Some of us reach for todo apps, others a messaging clone. If you’re like me, though, it’s URL shorteners.
I built my most recent URL shortener in Elixir, and decided to see how much raw speed I could get out of the application by using GenServer (Generic Server) to build out a key-value store to bypass the database when possible.
State in Elixir
A GenServer is one mechanism for managing state in Elixir, which makes it a perfect fit for a transient, in-memory cache. In this example, I’ll be caching the Struct straight from Ecto and using the slug from the URL as the cache key.
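If you haven’t used GenServer before, here’s a minimal sketch of the pattern (a hypothetical counter, unrelated to the link cache) showing how a GenServer holds state between calls:

# A hypothetical, minimal GenServer holding a counter in its state
defmodule Counter do
  use GenServer

  # Client API
  def start_link(initial \\ 0) do
    GenServer.start_link(__MODULE__, initial, name: __MODULE__)
  end

  def increment do
    GenServer.call(__MODULE__, :increment)
  end

  # Server callbacks
  def init(initial), do: {:ok, initial}

  def handle_call(:increment, _from, count) do
    {:reply, count + 1, count + 1}
  end
end

Because every call is serialized through the process, the state stays consistent without any explicit locking.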
Cache Interface
Now that we understand the underlying mechanism we’ll be using, the cache behavior is the next step:
# lib/link_cache/cache.ex
defmodule LinkCache.Cache do
  def fetch(slug, default_value_function) do
    case get(slug) do
      {:not_found} -> set(slug, default_value_function.())
      {:found, result} -> result
    end
  end
end
We’ll use LinkCache.Cache.fetch/2 to provide both a slug for lookup and a function to execute to populate the cache if no value is found. While LinkCache.Cache.get/1 and LinkCache.Cache.set/2 don’t exist yet, we can infer their behavior: LinkCache.Cache.get/1 will either return a tuple of {:not_found} (at which point we’ll call LinkCache.Cache.set/2 to assign the value and return it), or {:found, result}, at which point we return the result directly.

To leverage this, let’s change RedirectTo.RedirectController:
--- a/web/controllers/redirect_controller.ex
+++ b/web/controllers/redirect_controller.ex
@@ -3,7 +3,9 @@ defmodule RedirectTo.RedirectController do
   alias RedirectTo.Queries
 
   def show(conn, %{"slug" => slug}) do
-    link = Queries.Link.by_slug(slug) |> Repo.one
+    link = LinkCache.Cache.fetch(slug, fn ->
+      Queries.Link.by_slug(slug) |> Repo.one
+    end)
 
     redirect conn, external: link.long_url
   end
Digging into our cache
With our usage out of the way, let’s build out the rest of LinkCache.Cache.
# lib/link_cache/cache.ex
defmodule LinkCache.Cache do
  use GenServer

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, [
      {:ets_table_name, :link_cache_table},
      {:log_limit, 1_000_000}
    ], opts)
  end

  def fetch(slug, default_value_function) do
    case get(slug) do
      {:not_found} -> set(slug, default_value_function.())
      {:found, result} -> result
    end
  end

  defp get(slug) do
    case GenServer.call(__MODULE__, {:get, slug}) do
      [] -> {:not_found}
      [{_slug, result}] -> {:found, result}
    end
  end

  defp set(slug, value) do
    GenServer.call(__MODULE__, {:set, slug, value})
  end

  # GenServer callbacks

  def handle_call({:get, slug}, _from, state) do
    %{ets_table_name: ets_table_name} = state
    result = :ets.lookup(ets_table_name, slug)
    {:reply, result, state}
  end

  def handle_call({:set, slug, value}, _from, state) do
    %{ets_table_name: ets_table_name} = state
    true = :ets.insert(ets_table_name, {slug, value})
    {:reply, value, state}
  end

  def init(args) do
    [{:ets_table_name, ets_table_name}, {:log_limit, log_limit}] = args
    :ets.new(ets_table_name, [:named_table, :set, :private])
    {:ok, %{log_limit: log_limit, ets_table_name: ets_table_name}}
  end
end
We’ve already seen LinkCache.Cache.fetch/2, so let’s move on to LinkCache.Cache.get/1 and LinkCache.Cache.set/2.
defp get(slug) do
  case GenServer.call(__MODULE__, {:get, slug}) do
    [] -> {:not_found}
    [{_slug, result}] -> {:found, result}
  end
end
GenServer.call/3 is a synchronous function we use to send a request to the server (__MODULE__, which is equivalent to LinkCache.Cache). The 2-tuple {:get, slug} (which we’ll pattern-match on) tells the server to retrieve the value with the key slug, and will return zero or one results due to the underlying storage mechanism we’re using.
defp set(slug, value) do
  GenServer.call(__MODULE__, {:set, slug, value})
end
Here, we’re performing another synchronous action to cache the value for the specific slug; the server stores the 2-tuple {slug, value} in ETS and replies with the value itself.
Both LinkCache.Cache.get/1 and LinkCache.Cache.set/2 are sending requests to __MODULE__, as discussed before; we need to hook into these requests, pattern-matching on the messages we’re sending, to trigger the underlying behavior.

Above, we saw GenServer.call/3 used with the appropriate tuples being passed; let’s look at the hooks we’ve written to handle these requests:
def handle_call({:get, slug}, _from, state) do
  %{ets_table_name: ets_table_name} = state
  result = :ets.lookup(ets_table_name, slug)
  {:reply, result, state}
end

def handle_call({:set, slug, value}, _from, state) do
  %{ets_table_name: ets_table_name} = state
  true = :ets.insert(ets_table_name, {slug, value})
  {:reply, value, state}
end
GenServer.handle_call/3 receives the request (either {:get, slug} or {:set, slug, value}), a 2-tuple identifying the request sender (PID and tag), which we’re ignoring here (_from), and the server’s internal state (state). Both callbacks reply with a 3-tuple of the form {:reply, reply, state}: the :set callback replies with the link Struct itself, while the :get callback replies with the raw ETS lookup result that get/1 pattern-matches on. With the link Struct available, we can then retrieve the URL and send the user on their way.
Erlang Term Storage
LinkCache.Cache.get/1 and LinkCache.Cache.set/2 each reference :ets, the Erlang module that exposes ETS (Erlang Term Storage), an incredibly performant in-memory store with O(1) lookups for the “set” storage type we’re using.
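As a quick illustration of “set” semantics (and why get/1 only ever sees zero or one results), here’s a hypothetical IEx session against a throwaway table; the table name and keys are made up:

# Hypothetical IEx session: a :set table holds at most one object per key
table = :ets.new(:scratch_table, [:set])
:ets.lookup(table, "missing")            # => []
:ets.insert(table, {"abc", %{id: 1}})    # => true
:ets.insert(table, {"abc", %{id: 2}})    # => true (overwrites the previous object)
:ets.lookup(table, "abc")                # => [{"abc", %{id: 2}}]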
In both definitions, we pattern-match on state to retrieve the table name for our link cache and perform both insert (for LinkCache.Cache.set/2) and lookup (for LinkCache.Cache.get/1) operations.

We configure :ets in LinkCache.Cache.start_link/1 and LinkCache.Cache.init/1:
def start_link(opts \\ []) do
  GenServer.start_link(__MODULE__, [
    {:ets_table_name, :link_cache_table},
    {:log_limit, 1_000_000}
  ], opts)
end

def init(args) do
  [{:ets_table_name, ets_table_name}, {:log_limit, log_limit}] = args
  :ets.new(ets_table_name, [:named_table, :set, :private])
  {:ok, %{log_limit: log_limit, ets_table_name: ets_table_name}}
end
These functions prepare LinkCache.Cache to be supervised (GenServer.start_link/3), pass along configuration (a table named :link_cache_table and a log_limit of 1 million 2-tuples, carried in the server’s state), and create the table itself (:ets.new/2) as a named, private table with “set” storage.
Fault-tolerance with a supervisor
With the underlying cache built, we now want to ensure Phoenix keeps the cache running. Elixir promotes a strategy of letting things crash - and ensuring recovery is quick and straightforward - with Supervisors at the helm.
Let’s build out a supervisor to ensure LinkCache.Cache is monitored and runs correctly:
# lib/link_cache/supervisor.ex
defmodule LinkCache.Supervisor do
  use Supervisor

  def start_link do
    Supervisor.start_link(__MODULE__, :ok, name: __MODULE__)
  end

  def init(:ok) do
    children = [
      worker(LinkCache.Cache, [[name: LinkCache.Cache]])
    ]

    supervise(children, strategy: :one_for_one)
  end
end
Our LinkCache.Supervisor supervises only LinkCache.Cache, with a :one_for_one strategy: if the cache process dies, only that child is restarted.
We’ll also need to update our application to include this supervisor:
# lib/redirect_to.ex
defmodule RedirectTo do
  use Application

  def start(_type, _args) do
    import Supervisor.Spec, warn: false

    children = [
      supervisor(RedirectTo.Endpoint, []),
      supervisor(RedirectTo.Repo, []),
      supervisor(LinkCache.Supervisor, [])
    ]

    opts = [strategy: :one_for_one, name: RedirectTo.Supervisor]
    Supervisor.start_link(children, opts)
  end

  # ...
end
So, in addition to our RedirectTo.Endpoint and RedirectTo.Repo, our LinkCache.Supervisor will also be managed by RedirectTo; this ensures if anything happens to the cache, our application will restart it.
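To see the supervisor do its job, you can kill the cache process from IEx and confirm it comes back; a rough sketch follows. Note that the restart also recreates the ETS table, so a crash empties the cache.

# Hypothetical IEx session: the supervisor restarts a killed cache
old_pid = Process.whereis(LinkCache.Cache)
Process.exit(old_pid, :kill)

# Give the supervisor a moment to restart the child
Process.sleep(100)
Process.whereis(LinkCache.Cache) != old_pid
# => true, with the cache registered under a fresh process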
You can find a more thorough walkthrough of supervisors, supervision trees, and how to tie various components together in Elixir’s discussion of OTP topics.
Testing LinkCache.Cache
To test the cache, I wrote a small test ensuring the cache sets the value based on the return value of a function and reuses that value on a subsequent lookup.
# test/lib/link_cache/cache_test.exs
defmodule LinkCacheTest do
  use ExUnit.Case

  test "caches and finds the correct data" do
    assert LinkCache.Cache.fetch("A", fn ->
      %{id: 1, long_url: "http://www.example.com"}
    end) == %{id: 1, long_url: "http://www.example.com"}

    assert LinkCache.Cache.fetch("A", fn -> "" end) ==
      %{id: 1, long_url: "http://www.example.com"}
  end
end
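For a bit more coverage, a second, hypothetical test in the same module could confirm that different slugs are cached independently:

test "caches different slugs independently" do
  assert LinkCache.Cache.fetch("B", fn -> %{id: 2, long_url: "http://www.example.org"} end) ==
    %{id: 2, long_url: "http://www.example.org"}

  assert LinkCache.Cache.fetch("C", fn -> %{id: 3, long_url: "http://www.example.net"} end) ==
    %{id: 3, long_url: "http://www.example.net"}
end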
Again, the cache can store Structs, Maps, or all sorts of data structures. As is the case with any cache, invalidation might be tricky. The architecture of this application is such that the links themselves cannot be updated, so I’m not concerned about invalid data. Your mileage may vary.
Memory usage with 1 million 2-tuples
I’d admittedly chosen a log_limit of 1 million pairs without knowing the memory implications of that particular cache size. And while lookups are O(1) (constant time, regardless of the number of pairs in the cache), I did want to verify I wouldn’t be running out of memory.
How do we verify this?
First, fire up IEx running the Phoenix application:
iex -S mix phoenix.server
Next, run :observer.start() within IEx to see a GUI outlining various metrics of the running application.
Assuming ETS doesn’t make any sort of optimizations around similar values:
link = RedirectTo.Queries.Link.by_slug("A") |> RedirectTo.Repo.one
1..1_000_000 |> Enum.each(fn(i) -> LinkCache.Cache.fetch((i |> to_string), fn -> link end) end)
This inserts one million pairs into the cache, with the keys being string versions of each integer (“1”, “1250”, etc.) and the values being the Struct directly from Ecto.
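If you’d rather skip the GUI, here’s a rough sketch of checking memory from IEx (assuming table metadata stays readable even though the table is :private, which matches my understanding of :ets.info/2):

# Total bytes the VM has allocated for all ETS tables
:erlang.memory(:ets)

# Size of our table in machine words; multiply by the word size for bytes
:ets.info(:link_cache_table, :memory) * :erlang.system_info(:wordsize)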
Over the course of ~40s, all 1 million pairs were stored in-memory (insertion of ~25,000/s) with a result size of around 856MB, or just under 1kB per pair.
Storing a subset of the data results in significantly less memory usage, since the full Ecto Struct carries every column plus Ecto’s own metadata:
link = RedirectTo.Queries.Link.by_slug("A") |> RedirectTo.Repo.one |> Map.take([:slug, :long_url])
Benchmarks on Heroku
With the cache in place, I deployed the cache- and database-read versions of the application to Heroku, running on the free server and database.
In both tests, I used Siege against the HTTP version of the application, configured not to follow redirects, against the same slug with 500 concurrent users over 60s. In both cases, the Heroku dyno was running (no spin-up time).
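For reference, the invocation looked roughly like the following (the URL and slug are hypothetical, flags may vary by Siege version, and redirect-following was disabled via Siege’s configuration file rather than on the command line):

# Hypothetical Siege run: 500 concurrent users for 60 seconds against one slug
siege -c500 -t60S http://your-app.herokuapp.com/A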
DB read results
Transactions: 29508 hits
Availability: 100.00 %
Elapsed time: 59.75 secs
Data transferred: 2.67 MB
Response time: 0.70 secs
Transaction rate: 493.86 trans/sec
Throughput: 0.04 MB/sec
Concurrency: 343.44
Successful transactions: 29508
Failed transactions: 0
Longest transaction: 5.98
Shortest transaction: 0.04
I was impressed by these numbers; almost 30,000 requests per minute on free hardware, with an average response time of roughly 700ms, and this was hitting the database on each request. No connections were dropped.
Cache read results
Transactions: 44829 hits
Availability: 100.00 %
Elapsed time: 59.38 secs
Data transferred: 4.06 MB
Response time: 0.38 secs
Transaction rate: 754.95 trans/sec
Throughput: 0.07 MB/sec
Concurrency: 284.19
Successful transactions: 44829
Failed transactions: 0
Longest transaction: 8.39
Shortest transaction: 0.03
The throughput when reading from the cache was significantly better, serving 50% more requests. Additionally, the average response time was about half that of the run hitting the database.
Wrapping up
Even with Elixir’s incredible speed and concurrency support out of the box (30k RPM on free Heroku hardware with sub-second response times is pretty great), there are always ways to leverage certain technologies for what they’re great at. By leveraging Erlang’s ETS, GenServer, and supervisors, we were able to build a fault-tolerant, scalable in-memory cache to improve both performance and throughput.