---
title: Why Postgres Won't Always Use an Index
teaser: Use SQL EXPLAIN to understand when and why indexes are used (or not).
tags: web,performance,postgresql,sql
author: Simon Taranto
published_on: 2014-07-25
---

We were analyzing slow responses on a project recently and found ourselves
questioning the performance of some PostgreSQL queries. As is typical when
digging into queries, we had some questions around what the query was doing. So,
we used our handy `EXPLAIN` command to shed some light on the database's
behavior. Upon inspecting the query it turned out an index we had created was
not being used. We wanted to know why.

## EXPLAIN explained

Let's first look at how the `EXPLAIN` command works.

The [Postgres docs](http://www.postgresql.org/docs/9.3/static/sql-explain.html)
have a helpful article for learning about `EXPLAIN`. The basics are that in your
`psql` console prepend the word `EXPLAIN` in front of your query and you'll get
a query plan.

### In SQL

Run `psql <name of your database>`, or `rails dbconsole` if you are inside of a
Rails project. Prepend `EXPLAIN` to your query to see an explanation of how
Postgres is going to break down your problem in the most efficient way possible.

```sql
EXPLAIN SELECT * FROM posts;
```

                            QUERY PLAN
    ----------------------------------------------------------
     Seq Scan on posts  (cost=0.00..53.84 rows=281 width=704)
    (1 row)

### In Rails

If you don't feel like dropping down to the database you can get explain
statements from `ActiveRecord::Relation`s by calling the method `explain`.
Calling the following command will give you the same result you'll see from the
Postgres console:

```ruby
Post.all.explain
```

### How to interpret the output

Now we have the query plan. What is it telling us? Query plans are read bottom
to top and display the following information for each step of the query:

- estimated statement execution cost: an arbitrary unit of measuring cost
- estimated number of rows: the number of rows the query will produce
- estimated width of rows: the size of the rows in bytes

For a more thorough explanation of how to analyze these query plans' costs
and number of rows check out the [Postgres docs on using
explain](http://www.postgresql.org/docs/current/static/using-explain.html).

What's important to our question about indexes is the type of query being
performed and whether an index is being used or not. If we see a step containing
the words `Index Scan` we're using the index.

### A side note about query costs

As noted in the Postgres docs, the query plans are based on a sample of
statistics about the tables in the query. These statistics are kept up to date
by Postgres when your database is
[`VACUUM`](http://www.postgresql.org/docs/9.3/static/sql-vacuum.html)ed. As a
result, even when the underlying data is identical, you may see slightly
different values for the same
query run at different times.

## An Example

It doesn't take much digging to find interesting results in the query plans.
Let's use a database with a `posts` table containing a couple hundred post
records. `id` is the primary key and by default has an index.
Let's see what the query looks like to get the first 10 posts.

```sql
EXPLAIN SELECT * FROM POSTS WHERE id < 10;
```

Our plan is:

     Bitmap Heap Scan on posts  (cost=2.07..16.39 rows=9 width=704)
       Recheck Cond: (id < 10)
       ->  Bitmap Index Scan on posts_pkey  (cost=0.00..2.07 rows=9 width=0)
             Index Cond: (id < 10)
    (4 rows)

This query is using the index because we see `Index Scan` in the plan. The
meaning of the `Recheck Cond:` statement is beyond the scope of this article,
but the [postgres-performance mailing
list](http://www.postgresql.org/message-id/464F3C5D.2000700@enterprisedb.com)
has a helpful explanation.

Now, let's see what it looks like to get the rest of the posts in the table.

```sql
EXPLAIN SELECT * FROM POSTS WHERE id > 10;
```

Our plan is:

     Seq Scan on posts  (cost=0.00..53.98 rows=272 width=704)
       Filter: (id > 10)
    (2 rows)

This query is going to perform a sequential scan (`Seq Scan`) and not use the
index. A sequential scan? This plan means that Postgres is going to read the
whole table to find what we're looking for. That approach seems inefficient
given the index we already have. What's going on here? To answer this question
it's helpful to think about what an index is.  An index is a specialized
representation in memory of the contents of a particular column (or multiple
columns).

## How indexes are used

As we saw above, running a couple of queries on our `posts` table reveals that
even given an index to use, Postgres will not always choose to use it. The
reason why this is the case is that indexes have a cost to create and maintain
(on writes) and use (on reads).

When an index is used in a `SELECT` query, first the position of the requested
rows is fetched from the index (instead of from the table directly). Then, the
requested rows (a subset of the total) are actually fetched from the table. It's
a two step process. Given a large enough number of requested rows (such as all
the posts with an id greater than 10), it is actually more efficient to go fetch
all those rows from the table directly rather than to go through the extra step
of asking the index for the position of those rows and next fetch those specific
rows from the table.  If the percentage of the rows is smaller than 5-10% of all
rows in the table the benefit of using the information stored in the index
outweighs the additional intermediate step.

## What's next

Hear more about Postgres indexes, caching, and performance from Harold Giménez,
head of Heroku Postgres, on [A Beautiful
Thing](http://podcasts.thoughtbot.com/giantrobots/77), an episode of the *Giant
Robots Smashing Into Other Giants* podcast. Or, watch the [Improving Rails
Performance](https://thoughtbot.com/upcase/improving-rails-performance)
screencast from Joe Ferris, our CTO.
