---
title: Advanced Postgres Performance Tips
teaser: What do you do when indexes aren't enough?
tags: postgresql,databases,sql,performance
author: Caleb Hearth
published_on: 2016-07-29
---

You've [added the `INDEX`es], both partial and covering. You've `VACCUUM ANALYZE`d.
You `JOIN`ed and `INNER JOIN`ed everything to a single query. And yet, your
report is still taking too long. What do you do when the [low-hanging fruit] has
been harvested?

[added the `INDEX`es]: <https://thoughtbot.com/blog/postgres-index-types>
  "Postgres Index Types"
[low-hanging fruit]: <https://thoughtbot.com/blog/postgresql-performance-considerations>
  "Postgresql Performance Considerations"

It's time to get down and dirty with some of the lesser known SQL constructs in
Rails land.

## EXPLAIN what's happening

If you've gotten this far, you're probably [familiar] with using `EXPLAIN` and
`EXPLAIN ANALYZE` to get insight into what approach Postgres is taking to
execute queries and the actual performance of those approaches. You know that an
[Index Scan is preferable to a Seq Scan], but you'll settle for a Heap Scan as a
Join Cond.

[Index Scan is preferable to a Seq Scan]: <https://thoughtbot.com/blog/why-postgres-wont-always-use-an-index>
  "Why Postgres Won't Always Use an Index"
[familiar]: <https://thoughtbot.com/blog/reading-an-explain-analyze-query-plan>
  "Reading a Postgres EXPLAIN ANALYZE Query Plan"

Query plans aren't the easiest thing to read, however. They're packed with
information and it's closer to being machine parsable than human readable.

### Visualize the problem

[Postgres Explain Viewer (PEV)] is a tool to simplify reading query plans. It
provides a horizontal tree with each node representing a node in the query plan.
It includes timing information, the error amount in the planned versus actual
times, and badges for interesting nodes like "costliest" or "bad estimate".

[Postgres Explain Viewer (PEV)]: <http://tatiyants.com/pev/#/plans/new>
  "Pev is designed to make Postgres query plans easier to grok."

![PEV Example](https://images.thoughtbot.com/blog-vellum-image-uploads/PbDCVAVRWCyg0dnx4KwC_PEV.png)

It's pretty useful to have at-a-glance feedback about your query. But knowing
how long queries take is just the beginning: what can we do to speed these up?

## Materialize your desires

Views are a tool for storing "partial queries". That is, they allow the database
to store a parsed query that you can treat as a table in most respects later.
You can SELECT (and sometimes UPDATE or DELETE) from a view with identical
syntax as if you were executing the statement against a table. They're useful
when you'd like to perform a complex statement repeatedly and don't want to deal
with cluttering your Ruby files with arcane ARel or strung-together scopes.

> Making liberal use of views is a key aspect of good SQL database design. Views
> allow you to encapsulate the details of the structure of your tables, which
> might change as your application evolves, behind consistent interfaces.

-- [PostgreSQL Documentation - Advanced Features - Views](<https://www.postgresql.org/docs/9.5/static/tutorial-views.html>)

A view can be materialized, which means the results are stored by Postgres at
`CREATE MATERIALIZED VIEW` and `REFRESH MATERIALIZED VIEW` time. The cost of the
partial query is paid at these times, so we can benefit from that over and over,
especially in read-heavy situations (most situations are read-heavy in my
experience).

Materialized views are especially helpful when your select list includes a
subset of columns, you perform identical operations such as COUNT or SUM or
extracting part of a jsonb object on every query, and when JOINing additional
tables. When it comes time to actually retrieve rows, these rows can be queried
against to return only relevant data. This is often cheaper to execute than a
full statement. These can be further beneficial by creating indices on the
materialized views themselves (such as on the column you're JOINed by, or on a
data column for a reporting query).

As an aside, Ruby on Rails does not have first-class support for views, despite
that from a usage perspective they're very similar to tables. We maintain a
Rails extension, [Scenic], which [helps with versioning, migrating, and
maintaining SQL views].

[Scenic]: <https://github.com/thoughtbot/scenic>
  "Versioned database views for Rails"
[helps with versioning, migrating, and maintaining SQL views]: <https://thoughtbot.com/blog/announcing-scenic--versioned-database-views-for-rails>
  "Announcing Scenic - Versioned Database Views for Rails"

The 37 minute query plan above can't be improved on by a materialized view,
unfortunately, because there aren't any good candidates for caching partial
results. This was an exceptional case and I was surprised to find that this
feature wouldn't be helpful to me.

## Common Table Expressions and Subqueries

[Common Table Expressions] (CTEs) such as `WITH expression_name AS (...) SELECT
...` and Subqueries such as `SELECT ... FROM (SELECT ...) AS subquery_name` are
tools for breaking up complex SQL queries, and sometimes the only way to achieve
a goal. While CTEs are arguably easier to read than subqueries, in Postgres they
are an "optimization fence", preventing the query optimizer from rewriting
queries by moving constraints into or out of the CTE. For example this query:

[Common Table Expressions]: <https://www.postgresql.org/docs/current/static/queries-with.html>
  "WITH Queries (Common Table Expressions) in the Postgres documentation"

```sql
SELECT
  x, count
FROM (SELECT x, count(x) count FROM big_table GROUP BY x) aggregates
WHERE x = 42;
```

The query could be optimized by "pushing" the WHERE clause into the subquery
to avoid performing the read and count operations for every row in big\_table:

```sql
SELECT
  x, count
FROM (SELECT x, count(x) count FROM big_table GROUP BY x WHERE x = 42) aggregates;
```

However, a CTE such as this would prevent the optimization, causing the entire
table to be read, aggregated, and materialized to disk before being re-scanned
from the materialized table for the `x = 42` constraint:

```sql
WITH aggregates AS (SELECT x, count(x) count FROM big_table GROUP BY x)
SELECT
  x, count
FROM aggregates
WHERE x = 42;
```

There are also times when the optimization fence is useful, such as when using
data-modifying statements (INSERT, UPDATE, or DELETE) in WITH. As the CTE is
only executed once, the result is the same where a subquery is allowed to be
called multiple times by the planner and would not return information from
deleted or updated rows.

Both Common Table Expressions and subqueries are useful, and one or the other
may be more performant in a specific case. This is one example where subqueries
are the better option, but I usually find that a CTE is as faster or better than
a subquery and lean on them most of the time. Experiment with both forms in
EXPLAIN ANALYZE to find the right tool for your query.

## Control yourself; Take only what you need from it

The common answer to why we should `SELECT col1, ..., colN` rather than `SELECT
*` is that we reduce network traffic. In most cases, for reasonably normalized
databases, and with today's high-speed network connections, this is unlikely to
be a huge deal. In [one example] involving 326/89K rows, the difference in
network traffic caused by selecting a single column versus all columns was about
10%.

[one example]: <https://weblogs.asp.net/jongalloway/the-real-reason-select-queries-are-bad-index-coverage>
  "The real reason SELECT \* queries are bad: index coverage"

A better reason to limit columns to only what's needed is index lookups. We want
the planner to hit indexes instead of doing a sequential scan of the table
itself. We generally optimize this by creating indexes on important columns such
as foreign keys and of course primary keys come with an index by default. These
are great, helping to identify the disk location of rows that match whatever
constraints we have in a query for easier lookup or ordering.

We can also use covering indexes, which include the specific columns and
expressions useful to a query, to store all of the relevant information. This
prevents the second step of reading from the table itself and can provide a big
performance gain. Using `*` as the select list in a query would likely not make
use of a covering index unless you index the entire table and keep it up-to-date
with added columns.

## Preparation is key

[Prepared statements] split the work of parsing, analyzing, rewriting, planning,
and executing a statement in the same way that materialized views split the work
of preparing parts of queries that don't change by caching their results. When a
statement is prepared, Postgres parses, analyzes, and rewrites it. It generally
uses placeholders for values being provided at EXECUTE time.

[Prepared statements]: <https://www.postgresql.org/docs/9.2/static/sql-prepare.html>
  "PREPARE in PostgreSQL documentation'

PREPARE is an optimization for the very specific use-case of similar statements
being executed many times in a single session. Prepared statements are not
persisted or shared between sessions, so they're not something you can use to
optimize a general use without setting up session connect events externally.

From the documentation:

> Prepared statements have the largest performance advantage when a single
> session is being used to execute a large number of similar statements. The
> performance difference will be particularly significant if the statements are
> complex to plan or rewrite, for example, if the query involves a join of many
> tables or requires the application of several rules. If the statement is
> relatively simple to plan and rewrite but relatively expensive to execute, the
> performance advantage of prepared statements will be less noticeable.

Since we previously mentioned that Rails does not support views, it's only fair
to point out that since version 3.1 it _does_ [make use of prepared statements].

[make use of prepared statements]: <http://patshaughnessy.net/2011/10/22/show-some-love-for-prepared-statements-in-rails-3-1>
  "Show some love for prepared statements in Rails 3.1"

## Do more in the database

The ultimate Postgres performance tip is to do more in the database. Postgres is
optimized to be very efficient at data storage, retrieval, and complex
operations such as aggregates, JOINs, etc. Let your web application deal with
displaying data and your database with manipulating and converting data.
Becoming comfortable with using Postgres for even the most complex reports
starts with familiarity with using it, from its syntax to its functions and
types. Read the Postgres documentation as you do the Ruby, Go, or Elixir docs.