Instrumenting your Phoenix application using telemetry

Instrumenting an application can be a huge undertaking, even before trying to decide what to measure and when to measure it.

Getting started with instrumentation can be intimidating because we often try to group together three separate concerns:

  1. How will we collect measurements?
  2. What should we measure?
  3. Where will we store those measurements?

In reality, building a good foundation around How will we collect measurements? is all you need to get started.

Just as it would be foolhardy to try to map out all the features of our application without understanding what our users need, trying to answer the question of What should we measure? upfront will lead us to measure the wrong things. We can table this question for now, and let our future performance smells and analytics needs inform our answer when we are ready.

And we don’t need to answer Where will we store those measurements? right away, because that’s a storage implementation detail.

A good foundation around how we collect measurements will allow us to easily add and remove measurements in the future, and switch out the storage implementation with minimal code changes.

Let’s build that foundation for a Phoenix application. Enter: telemetry.

What is telemetry?

Telemetry is a dynamic dispatching library for metrics and instrumentations. It is lightweight, small and can be used in any Erlang or Elixir project.

In practice, the :telemetry module allows us to emit events (optionally with a data payload), and register functions to be called when events are emitted.

The Telemetry project aims to standardize the process of introspection by allowing libraries and frameworks to expose a set of telemetry events which can then be captured for the purposes of debugging, measurement, or logging.

Some common libraries which expose telemetry events include:

Our application can also emit custom events, which can be captured in the same way as events emitted by libraries that use telemetry.

Up and running with Phoenix

Since Phoenix 1.5, applications come preconfigured with telemetry. For older applications, it can be manually added.

First, add the following ad dependencies in mix.exs:

{:telemetry_metrics, "~> 0.4"},
{:telemetry_poller, "~> 0.4"}

Next, create a telemetry supervisor at lib/app_web/telemetry.ex; the below is identical to what is generated by default since Phoenix 1.5:

defmodule AppWeb.Telemetry do
  use Supervisor
  import Telemetry.Metrics

  def start_link(arg) do
    Supervisor.start_link(__MODULE__, arg, name: __MODULE__)
  end

  @impl true
  def init(_arg) do
    children = [
      # Telemetry poller will execute the given period measurements
      # every 10_000ms. Learn more here: https://hexdocs.pm/telemetry_metrics
      {:telemetry_poller, measurements: periodic_measurements(), period: 10_000}
      # Add reporters as children of your supervision tree.
      # {Telemetry.Metrics.ConsoleReporter, metrics: metrics()}
    ]

    Supervisor.init(children, strategy: :one_for_one)
  end

  def metrics do
    [
      # Phoenix Metrics
      summary("phoenix.endpoint.stop.duration",
        unit: {:native, :millisecond}
      ),
      summary("phoenix.router_dispatch.stop.duration",
        tags: [:route],
        unit: {:native, :millisecond}
      ),

      # Database Metrics
      summary("app.repo.query.total_time", unit: {:native, :millisecond}),
      summary("app.repo.query.decode_time", unit: {:native, :millisecond}),
      summary("app.repo.query.query_time", unit: {:native, :millisecond}),
      summary("app.repo.query.queue_time", unit: {:native, :millisecond}),
      summary("app.repo.query.idle_time", unit: {:native, :millisecond}),

      # VM Metrics
      summary("vm.memory.total", unit: {:byte, :kilobyte}),
      summary("vm.total_run_queue_lengths.total"),
      summary("vm.total_run_queue_lengths.cpu"),
      summary("vm.total_run_queue_lengths.io")
    ]
  end

  defp periodic_measurements do
    [
      # A module, function and arguments to be invoked periodically.
      # This function must call :telemetry.execute/3 and a metric must be added above.
      # {AppWeb, :count_users, []}
    ]
  end
end

Be sure to replace AppWeb in the above with your application module name.

Finally, add the telemetry module to your application’s supervision tree in lib/app/application.ex:

children = [
  MyApp.Repo,
  MyAppWeb.Telemetry,
  MyAppWeb.Endpoint,
  . . .
]

After installing dependencies and restarting the application, we’re ready to get started.

Collecting metrics

The AppWeb.Telemetry module can capture telemetry events as they are emitted and turn them into metrics, or it can periodically build custom metrics on a set interval.

A metric — as defined by the telemetry package — is an aggregation of events with a specific name. Exactly how they are aggregated depends on which of the five available metric types we utilize from Telemetry.Metrics:

  • counter/2 - counts the total number of emitted events
  • sum/2 - sums the selected measurement
  • last_value/2 - the value of the selected measurement from the most recent event
  • summary/2 - the statistical mean, maximum, and percentiles for the selected measurement
  • distribution/2 - a histogram of the selected measurement

The module Phoenix generates for us captures a handful of telemetry events that are available out of the box from Phoenix and Ecto as summary metrics. Metrics are defined in the module’s metrics/0 method:

def metrics do
  [
    # Phoenix Metrics
    summary("phoenix.endpoint.stop.duration",
      unit: {:native, :millisecond}
    ),
    summary("phoenix.router_dispatch.stop.duration",
      tags: [:route],
      unit: {:native, :millisecond}
    ),

    # Database Metrics
    summary("app.repo.query.total_time", unit: {:native, :millisecond}),
    summary("app.repo.query.decode_time", unit: {:native, :millisecond}),
    summary("app.repo.query.query_time", unit: {:native, :millisecond}),
    summary("app.repo.query.queue_time", unit: {:native, :millisecond}),
    summary("app.repo.query.idle_time", unit: {:native, :millisecond}),

    # VM Metrics
    summary("vm.memory.total", unit: {:byte, :kilobyte}),
    summary("vm.total_run_queue_lengths.total"),
    summary("vm.total_run_queue_lengths.cpu"),
    summary("vm.total_run_queue_lengths.io")
  ]
end

We can look at the events emitted by phoenix itself and other libraries we use, and capture them as different metrics by adding them here to the metrics/0 method.

The definitions of these metrics alone don’t do much though — we’ll need to send them somewhere.

Reporting metrics

Metrics defined in AppWeb.Telemetry.metrics/0 need to be reported in order for us to make use of them, and so Telemetry.Metrics has the concept of a reporter which does exactly that.

Out of the box, Telemetry.Metrics.ConsoleReporter is provided, which when enabled will output the metrics we define to the phoenix log.

Reporters can be added by specifying them as children to our supervisor in AppWeb.Telemetry.init/1. Let’s enable the console reporter by uncommenting it:

def init(_arg) do
  children = [
    # Telemetry poller will execute the given period measurements
    # every 10_000ms. Learn more here: https://hexdocs.pm/telemetry_metrics
    {:telemetry_poller, measurements: periodic_measurements(), period: 10_000},
    # Add reporters as children of your supervision tree.
    {Telemetry.Metrics.ConsoleReporter, metrics: metrics()}
  ]

  Supervisor.init(children, strategy: :one_for_one)
end

After restarting your application, you should start seeing messages similar to the following in your application logs:

[Telemetry.Metrics.ConsoleReporter] Got new event!
Event name: phoenix.router_dispatch.stop
All measurements: %{duration: 2109000}
All metadata: %{conn: %Plug.Conn{. . .}, . . .} 

[Telemetry.Metrics.ConsoleReporter] Got new event!
Event name: vm.memory
All measurements: %{atom: 729321, atom_used: 702036, binary: 2443936, code: 14887818, ets: 1701368, processes: 8723368, processes_used: 8723368, system: 40073656, total: 48797024}
All metadata: %{}

There are many telemetry_metrics_* packages available via hex which facilitate reporting metrics to common storages such as statsd, Appsignal, or telegraf.

Writing your own reporter is also an option, using the ConsoleReporter as a reference implementation.

Emitting custom telemetry events

So far we’ve looked at using events emitted by third party libraries, but much in the same way, we can emit our own telemetry events and turn them into metrics.

For example, if we wanted to emit a telemetry event when a particular controller action was executed, we could emit the event directly from the controller using the :telemetry module:

defmodule AppWeb.SplashPageController do
  use AppWeb, :controller

  def index(conn, _params) do
    :telemetry.execute(
      [:web, :controller, :action],
      %{},
      %{
        controller: :splash_page,
        action: :index
      }
    )

    render(conn, "index.html")
  end
end

The :telemetry.execute/3 method takes an event name (a string or list of atoms), a map of measurements, and a map of event metadata.

There is nothing special about the event name or the name of the keys in the measurements or metadata maps. They are the same names that consumers of the event will use to listen for and access information about the event, but they can be anything that makes sense for your custom event.

In the above example, we’re not adding any measurements, so consumers of this event would only be able to count the number of times it occured.

We are adding some event metadata though — a :controller and :action key — to allow consumers of the [:web, :controller, :action] event to know which controller and action this event was emitted from.

We could have instead emitted an event that was very specific to this controller action, passing no measurements or metadata:

:telemetry.execute([:web, :controller, :splash_page, :index], %{}, %{})

This would have the same effect, but it is less flexible because it doesn’t allow for consumers to easily capture all controller actions or all events from the `:splashpage` controller_. Metadata gives us a flexible way to keep our event names generic and multipurpose, while filtering down the events to the context we care about.

As for passing measurements to :telemetry.execute/3 - what values you pass depends on what (if anything) you want to measure.

However, you should not use :telemetry.execute/3 to measure code execution time.

For that purpose, :telemetry exposes span/3:

defmodule AppWeb.PageController do
  use AppWeb, :controller

  def index(conn, _params) do
    :telemetry.span(
      [:web, :controller, :action],
      %{},
      fn ->
        result = render(conn, "index.html")
        {result, %{}}
      end
    )
  end
end

The span/3 function in the above example will emit two events instead of just one:

  • In the case the code block succeeded: [:web, :controller, :action, :start] and [:web, :controller, :action, :stop]
  • In the case the code block failed: [:web, :controller, :action, :start] and [:web, :controller, :action, :exception]

Additionally, these events include some default measurements:

  • The start event includes :system_time
  • The end event includes :duration
  • The exception event includes :error

Encapsulating custom metrics

In the previous examples, we just inline the calls to the :telemetry module into our controller, but often it is more readable and reusable to define our events in a separate module.

For that purpose, let’s define a AppWeb.TelemetryEvents module at app/lib/app_web/telemetry_events.ex:

defmodule AppWeb.TelemetryEvents do
  def controller_action(controller, action) do
    :telemetry.execute(
      [:web, :controller, :action],
      %{},
      %{
        controller: controller,
        action: action
      }
    )
  end
end

Then, when we want to emit that event from our controller, we can write:

defmodule AppWeb.PageController do
  use AppWeb, :controller

  def index(conn, _params) do
    AppWeb.TelemetryEvents.controller_action(:splash_page, :index)

    render(conn, "index.html")
  end
end

This method of defining events as methods in a dedicated module allows us to ensure that our event names are consistent when we use them throughout our application, and that changes to our events don’t require changes within our application code.

Final thoughts

Having built a good foundation for collecting application metrics, it is straightforward to add additional metrics as we need them.

When it comes to the next step — What should we measure? — we should avoid rushing to answer that question.

We can add metrics when we notice performance smells in our application, or when a requirement comes along to capture additional data for analytics purposes.

It’s important to only measure what we need. Measuring too much will have an impact on application performance, and result in more data we need to sort through when trying to diagnose a problem.

The question of Where will we store those measurements? is now an implementation detail. Our foundation uses swappable metrics reporters, and we can have one or multiple of them active at one time!

As a bonus: be sure to check out Phoenix LiveDashboard which can utilize these metrics in the same way and visualize them into a LiveView dashboard hosted by your Phoenix app.