Instrumenting an application can be a huge undertaking, even before trying to decide what to measure and when to measure it.
Getting started with instrumentation can be intimidating because we often try to group together three separate concerns:
- How will we collect measurements?
- What should we measure?
- Where will we store those measurements?
In reality, building a good foundation around How will we collect measurements? is all you need to get started.
Just as it would be foolhardy to try to map out all the features of our application without understanding what our users need, trying to answer the question of What should we measure? upfront will lead us to measure the wrong things. We can table this question for now, and let our future performance smells and analytics needs inform our answer when we are ready.
And we don’t need to answer Where will we store those measurements? right away, because that’s a storage implementation detail.
A good foundation around how we collect measurements will allow us to easily add and remove measurements in the future, and switch out the storage implementation with minimal code changes.
Let’s build that foundation for a Phoenix application. Enter: `telemetry`.
What is telemetry?
Telemetry is a dynamic dispatching library for metrics and instrumentations. It is lightweight, small and can be used in any Erlang or Elixir project.
In practice, the `:telemetry` module allows us to emit events (optionally with a data payload) and register functions to be called when those events are emitted.
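As a minimal sketch of those two halves, here is what attaching a handler and emitting an event looks like (the event name, handler id, and payload below are made up for illustration; in a real application you would attach a captured, named function rather than an anonymous one):

# Register a handler for the hypothetical [:app, :example] event.
:telemetry.attach(
  "log-app-example",
  [:app, :example],
  fn _event_name, measurements, metadata, _config ->
    IO.inspect({measurements, metadata}, label: "app.example")
  end,
  nil
)

# Emit the event with a measurement map and a metadata map.
:telemetry.execute([:app, :example], %{count: 1}, %{source: :example})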
The Telemetry project aims to standardize the process of introspection by allowing libraries and frameworks to expose a set of telemetry events which can then be captured for the purposes of debugging, measurement, or logging.
Some common libraries which expose telemetry events include:
- Phoenix
- Ecto
Our application can also emit custom events, which can be captured in the same way as events emitted by libraries that use telemetry.
Up and running with Phoenix
Since Phoenix 1.5, applications come preconfigured with telemetry. For older applications, it can be added manually.
First, add the following as dependencies in `mix.exs`:
{:telemetry_metrics, "~> 0.4"},
{:telemetry_poller, "~> 0.4"}
Next, create a telemetry supervisor at `lib/app_web/telemetry.ex`; the code below is identical to what is generated by default since Phoenix 1.5:
defmodule AppWeb.Telemetry do
  use Supervisor
  import Telemetry.Metrics

  def start_link(arg) do
    Supervisor.start_link(__MODULE__, arg, name: __MODULE__)
  end

  @impl true
  def init(_arg) do
    children = [
      # Telemetry poller will execute the given period measurements
      # every 10_000ms. Learn more here: https://hexdocs.pm/telemetry_metrics
      {:telemetry_poller, measurements: periodic_measurements(), period: 10_000}
      # Add reporters as children of your supervision tree.
      # {Telemetry.Metrics.ConsoleReporter, metrics: metrics()}
    ]

    Supervisor.init(children, strategy: :one_for_one)
  end

  def metrics do
    [
      # Phoenix Metrics
      summary("phoenix.endpoint.stop.duration",
        unit: {:native, :millisecond}
      ),
      summary("phoenix.router_dispatch.stop.duration",
        tags: [:route],
        unit: {:native, :millisecond}
      ),
      # Database Metrics
      summary("app.repo.query.total_time", unit: {:native, :millisecond}),
      summary("app.repo.query.decode_time", unit: {:native, :millisecond}),
      summary("app.repo.query.query_time", unit: {:native, :millisecond}),
      summary("app.repo.query.queue_time", unit: {:native, :millisecond}),
      summary("app.repo.query.idle_time", unit: {:native, :millisecond}),
      # VM Metrics
      summary("vm.memory.total", unit: {:byte, :kilobyte}),
      summary("vm.total_run_queue_lengths.total"),
      summary("vm.total_run_queue_lengths.cpu"),
      summary("vm.total_run_queue_lengths.io")
    ]
  end

  defp periodic_measurements do
    [
      # A module, function and arguments to be invoked periodically.
      # This function must call :telemetry.execute/3 and a metric must be added above.
      # {AppWeb, :count_users, []}
    ]
  end
end
Be sure to replace `AppWeb` in the above with your application module name.
Finally, add the telemetry module to your application’s supervision tree in `lib/app/application.ex`:
children = [
  App.Repo,
  AppWeb.Telemetry,
  AppWeb.Endpoint,
  . . .
]
After installing dependencies and restarting the application, we’re ready to get started.
Collecting metrics
The `AppWeb.Telemetry` module can capture telemetry events as they are emitted and turn them into metrics, or it can periodically build custom metrics on a set interval.
A metric, as defined by the `telemetry_metrics` package, is an aggregation of events with a specific name. Exactly how events are aggregated depends on which of the five available metric types from `Telemetry.Metrics` we use (a brief sketch follows the list):
- `counter/2` - counts the total number of emitted events
- `sum/2` - sums the selected measurement
- `last_value/2` - the value of the selected measurement from the most recent event
- `summary/2` - the statistical mean, maximum, and percentiles for the selected measurement
- `distribution/2` - a histogram of the selected measurement
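For example, the same events captured by the generated module could be aggregated differently simply by swapping the metric type. A purely illustrative sketch of what an alternative `metrics/0` could contain:

def metrics do
  [
    # Count every request handled by the Phoenix endpoint.
    counter("phoenix.endpoint.stop.duration"),
    # Keep only the most recent total memory reading from the VM.
    last_value("vm.memory.total", unit: {:byte, :kilobyte}),
    # Sum the total time spent running database queries.
    sum("app.repo.query.total_time", unit: {:native, :millisecond})
  ]
end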
The module Phoenix generates for us captures a handful of telemetry events that are available out of the box from Phoenix and Ecto as summary metrics. Metrics are defined in the module’s `metrics/0` function:
def metrics do
  [
    # Phoenix Metrics
    summary("phoenix.endpoint.stop.duration",
      unit: {:native, :millisecond}
    ),
    summary("phoenix.router_dispatch.stop.duration",
      tags: [:route],
      unit: {:native, :millisecond}
    ),
    # Database Metrics
    summary("app.repo.query.total_time", unit: {:native, :millisecond}),
    summary("app.repo.query.decode_time", unit: {:native, :millisecond}),
    summary("app.repo.query.query_time", unit: {:native, :millisecond}),
    summary("app.repo.query.queue_time", unit: {:native, :millisecond}),
    summary("app.repo.query.idle_time", unit: {:native, :millisecond}),
    # VM Metrics
    summary("vm.memory.total", unit: {:byte, :kilobyte}),
    summary("vm.total_run_queue_lengths.total"),
    summary("vm.total_run_queue_lengths.cpu"),
    summary("vm.total_run_queue_lengths.io")
  ]
end
We can look at the events emitted by Phoenix itself and by the other libraries we use, and capture them as additional metrics by adding entries to the `metrics/0` function, as in the sketch below.
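For example, assuming our application also used Phoenix LiveView (an assumption for illustration), we could capture the duration of LiveView mounts by adding an entry like this to the list returned by `metrics/0`:

# Hypothetical entry, assuming phoenix_live_view is a dependency; it captures
# the duration measurement from the [:phoenix, :live_view, :mount, :stop] event.
summary("phoenix.live_view.mount.stop.duration", unit: {:native, :millisecond})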
The definitions of these metrics alone don’t do much though — we’ll need to send them somewhere.
Reporting metrics
Metrics defined in `AppWeb.Telemetry.metrics/0` need to be reported in order for us to make use of them, and so `Telemetry.Metrics` has the concept of a reporter, which does exactly that.
Out of the box, `Telemetry.Metrics.ConsoleReporter` is provided, which, when enabled, will output the metrics we define to the application log.
Reporters can be added by specifying them as children of our supervisor in `AppWeb.Telemetry.init/1`. Let’s enable the console reporter by uncommenting it:
def init(_arg) do
  children = [
    # Telemetry poller will execute the given period measurements
    # every 10_000ms. Learn more here: https://hexdocs.pm/telemetry_metrics
    {:telemetry_poller, measurements: periodic_measurements(), period: 10_000},
    # Add reporters as children of your supervision tree.
    {Telemetry.Metrics.ConsoleReporter, metrics: metrics()}
  ]

  Supervisor.init(children, strategy: :one_for_one)
end
After restarting your application, you should start seeing messages similar to the following in your application logs:
[Telemetry.Metrics.ConsoleReporter] Got new event!
Event name: phoenix.router_dispatch.stop
All measurements: %{duration: 2109000}
All metadata: %{conn: %Plug.Conn{. . .}, . . .}
[Telemetry.Metrics.ConsoleReporter] Got new event!
Event name: vm.memory
All measurements: %{atom: 729321, atom_used: 702036, binary: 2443936, code: 14887818, ets: 1701368, processes: 8723368, processes_used: 8723368, system: 40073656, total: 48797024}
All metadata: %{}
There are many `telemetry_metrics_*` packages available on Hex which facilitate reporting metrics to common backends such as StatsD, AppSignal, or Telegraf.
Writing your own reporter is also an option, using `Telemetry.Metrics.ConsoleReporter` as a reference implementation.
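A hand-rolled reporter is essentially a process that receives the metric definitions, attaches a telemetry handler for each event they reference, and does something with the matching events. The sketch below gives a rough idea of the shape; the module name `AppWeb.LogReporter` is made up, and detaching and error handling are omitted for brevity:

defmodule AppWeb.LogReporter do
  use GenServer
  require Logger

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    metrics = Keyword.fetch!(opts, :metrics)

    # Group the metric definitions by the event they listen to and
    # attach one telemetry handler per event.
    for {event, metrics} <- Enum.group_by(metrics, & &1.event_name) do
      id = {__MODULE__, event, self()}
      :telemetry.attach(id, event, &__MODULE__.handle_event/4, metrics)
    end

    {:ok, opts}
  end

  # Called by :telemetry every time one of the attached events fires.
  def handle_event(_event, measurements, metadata, metrics) do
    for metric <- metrics do
      value = extract_measurement(metric, measurements, metadata)
      Logger.info("#{Enum.join(metric.name, ".")}: #{inspect(value)}")
    end
  end

  # A metric's measurement can be a key into the measurements map
  # or a function that derives a value from it.
  defp extract_measurement(metric, measurements, metadata) do
    case metric.measurement do
      fun when is_function(fun, 2) -> fun.(measurements, metadata)
      fun when is_function(fun, 1) -> fun.(measurements)
      key -> measurements[key]
    end
  end
end

Such a reporter could then be added alongside (or instead of) the console reporter in `AppWeb.Telemetry.init/1`, for example as `{AppWeb.LogReporter, metrics: metrics()}`.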
Emitting custom telemetry events
So far we’ve looked at using events emitted by third party libraries, but much in the same way, we can emit our own telemetry events and turn them into metrics.
For example, if we wanted to emit a telemetry event when a particular controller action was executed, we could emit the event directly from the controller using the `:telemetry` module:
defmodule AppWeb.SplashPageController do
  use AppWeb, :controller

  def index(conn, _params) do
    :telemetry.execute(
      [:web, :controller, :action],
      %{},
      %{
        controller: :splash_page,
        action: :index
      }
    )

    render(conn, "index.html")
  end
end
The `:telemetry.execute/3` function takes an event name (a list of atoms), a map of measurements, and a map of event metadata.
There is nothing special about the event name or the name of the keys in the measurements or metadata maps. They are the same names that consumers of the event will use to listen for and access information about the event, but they can be anything that makes sense for your custom event.
In the above example, we’re not adding any measurements, so consumers of this event would only be able to count the number of times it occurred.
We are adding some event metadata though, a `:controller` and an `:action` key, to allow consumers of the `[:web, :controller, :action]` event to know which controller and action the event was emitted from.
We could have instead emitted an event that was very specific to this controller action, passing no measurements or metadata:
:telemetry.execute([:web, :controller, :splash_page, :index], %{}, %{})
This would have the same effect, but it is less flexible because it doesn’t allow consumers to easily capture all controller actions or all events from the `:splash_page` controller. Metadata gives us a flexible way to keep our event names generic and multipurpose, while filtering the events down to the context we care about.
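To actually turn this custom event into a metric, we could add an entry to `AppWeb.Telemetry.metrics/0` that counts it, broken down by the metadata we attached. A sketch (the trailing `.count` segment is just a naming choice; counters only count events):

# Counts [:web, :controller, :action] events, tagged by the
# controller and action keys from the event metadata.
counter("web.controller.action.count", tags: [:controller, :action])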
As for passing measurements to `:telemetry.execute/3`: which values you pass depends on what (if anything) you want to measure.
However, you should not use `:telemetry.execute/3` to measure code execution time. For that purpose, `:telemetry` exposes `span/3`:
defmodule AppWeb.PageController do
  use AppWeb, :controller

  def index(conn, _params) do
    :telemetry.span(
      [:web, :controller, :action],
      %{},
      fn ->
        result = render(conn, "index.html")
        {result, %{}}
      end
    )
  end
end
The `span/3` function in the above example will emit two events instead of just one:
- If the code block succeeds: `[:web, :controller, :action, :start]` and `[:web, :controller, :action, :stop]`
- If the code block fails: `[:web, :controller, :action, :start]` and `[:web, :controller, :action, :exception]`
Additionally, these events include some default measurements, which we can feed straight back into our metrics (sketched below):
- The start event includes `:system_time`
- The stop event includes `:duration`
- The exception event includes `:duration`, with the error’s kind, reason, and stacktrace available in the event metadata
Encapsulating custom metrics
In the previous examples, we inlined the calls to the `:telemetry` module in our controllers, but it is often more readable and reusable to define our events in a separate module.
For that purpose, let’s define an `AppWeb.TelemetryEvents` module at `lib/app_web/telemetry_events.ex`:
defmodule AppWeb.TelemetryEvents do
  def controller_action(controller, action) do
    :telemetry.execute(
      [:web, :controller, :action],
      %{},
      %{
        controller: controller,
        action: action
      }
    )
  end
end
Then, when we want to emit that event from our controller, we can write:
defmodule AppWeb.PageController do
  use AppWeb, :controller

  def index(conn, _params) do
    AppWeb.TelemetryEvents.controller_action(:splash_page, :index)
    render(conn, "index.html")
  end
end
Defining events as functions in a dedicated module ensures that our event names stay consistent wherever we use them, and that changes to an event are contained in one place rather than scattered throughout our application code.
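The same module is a natural home for span-based events too. A sketch of a hypothetical helper wrapping `:telemetry.span/3`:

defmodule AppWeb.TelemetryEvents do
  # Hypothetical helper: wraps a block of code in a telemetry span so the
  # start/stop/exception events carry the controller and action as metadata.
  def span_controller_action(controller, action, fun) do
    metadata = %{controller: controller, action: action}

    :telemetry.span([:web, :controller, :action], metadata, fn ->
      {fun.(), metadata}
    end)
  end
end

A controller could then call `AppWeb.TelemetryEvents.span_controller_action(:splash_page, :index, fn -> render(conn, "index.html") end)`.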
Final thoughts
Having built a good foundation for collecting application metrics, it is straightforward to add additional metrics as we need them.
When it comes to the next step — What should we measure? — we should avoid rushing to answer that question.
We can add metrics when we notice performance smells in our application, or when a requirement comes along to capture additional data for analytics purposes.
It’s important to only measure what we need. Measuring too much will have an impact on application performance, and result in more data we need to sort through when trying to diagnose a problem.
The question of Where will we store those measurements? is now an implementation detail. Our foundation uses swappable metrics reporters, and we can have one or multiple of them active at one time!
As a bonus: be sure to check out Phoenix LiveDashboard, which can consume these same metrics and visualize them in a LiveView dashboard hosted by your Phoenix app.