---
title: PostgreSQL Performance Considerations
teaser: |
  Know the Postgres index types such as multicolumn indexes and partial indexes,
  don't over-index,
  use more joins,
  prefer INNER JOINs,
  understand the EXPLAIN output,
  and don't try to optimize on your development machine.
tags: web,postgresql,performance
author: Harold Giménez
published_on: 2011-01-07
---

There are a number of variables that allow a DBA to tune a PostgreSQL database
server for specific loads, disk types and hardware. These are fondly called the
GUCS (Global Unified Configuration Settings) and you can take a look via the
`pg_settings` view. There are also a few of things that you can do in your
application to get the most out of Postgres:

### Know the Postgres index types

By default `CREATE INDEX` will create B-tree indexes which will serve well for
most cases where we use equality, inequality and range operators. However there
are cases where you can build different indexing strategies with GiST
(Generalized Search Tree) indexes. For example, Postgres ships with built in
GiST operator classes for [geometric
operators](http://www.postgresql.org/docs/9.0/static/functions-geometry.html)
&#8212; for dealing with the geometric types like point, box, polygon, circle,
and others. There are more interesting [GiST index
examples](http://www.postgresql.org/docs/9.0/static/gist-examples.html) in the
contrib packages for things like textual search, tree structures, and more.

### Consider multicolumn indexes, when it makes sense

When your query filters the data by more than one column, be it with the `WHERE`
clause or `JOINs`, multicolumn indexes may prove useful. If you create an index
on columns (a, b), the Postgres planner can use it for queries

    WHERE a = 1
    WHERE a = 1 AND b = 2

However, it will not use it for queries using:

    WHERE a = 1 OR b = 2
    WHERE b = 2

But Postgres also has the ability to use multiple indexes in a single query.
This may come in handy if you are using the `OR` operator, but will also make
use of it for `AND` queries. So it boils down to what the most common case is
according to your application's read patterns and optimize for that, either with
an an index on (a, b) and another on (b), or two separate single column indexes.

### Partial indexes

Simply put, a partial index is an index with a `WHERE` clause. It will only
index rows that match the supplied predicate. You can use them to exclude values
from an index that you hardly query against.

For example, you have an `orders` table with a `completed` flag. The sales
people want to know what orders over $100,000.00 haven't been completed because
they want to collect their bonuses, so you build a view in your app to show them
just that (and negotiate a cut on the bonus). You could create the following
index:

    CREATE INDEX orders_incomplete_amount_index
       on orders (amount) WHERE complete is not true;

Which will be used by queries of the form:

    SELECT * FROM orders
      where amount > 100000 AND complete is not true;

### Don't over index

Part of maintaining a healthy database is going back and making sure you don't
have any unused indexes. It's common to add indexes to address a specific
performance issue for a particular query, but in many cases indexes start to
pile up becoming dead weight. Remember that the more indexes you have, the
slower INSERTs will become because more writes will need to happen to keep the
indexes updated.

### Keep statistics updated

Make sure to run `VACUUM ANALYZE` to keep data statistics up to date &#8212; as
well as recover disk space. In addition, Postgres ships with a built in
`auto-vacuum daemon` whose purpose is to automate the execution of `VACUUM
ANALYZE`. You should [read
up](http://www.postgresql.org/docs/9.0/static/routine-vacuuming.html) on
considerations for setting the auto-vacuum daemon's frequency according to your
database size and usage characteristics.

Make sure you ANALYZE when creating a new index, otherwise Postgres will not
have analyzed the data and determined that the new index may help for the query.

### Use more joins

Postgres is perfectly capable of joining multiple tables in a single query. In a
running app, queries with five joins are completely acceptable, and will help
bring in the data required by your app, reducing the number of trips to the
database. In most cases, joins are also a better solution than subqueries
&#8212; Postgres will even internally "rewrite" a subquery, creating a join,
whenever possible, but this of course increases the time it takes to come up
with the query plan. So be a pal and use joins instead of subselects.

### Prefer INNER JOINs

If the cardinality of both tables in a join is guaranteed to be equal for your
result set, always prefer doing an `INNER JOIN` instead of a `LEFT OUTER JOIN`.
A lot of research and code has gone into optimizing outer joins in Postgres over
the years. But the reality is that especially as more joins are added to a
query, left joins limit the planner's ability optimize the join order.

### Know how to understand the EXPLAIN output

Paste the output of `explain analyze [some query]` into
[explain.depesz.com](http://explain.depesz.com/) to help identify the most
costly nodes in the query plans.

Understanding `EXPLAIN` output is a very extensive topic, but these are some
general guidelines when reading plans:

* Are the cost estimates vs. actuals close, or are there discrepancies?
  Typically a sign of not having ANALYZEd recently.
* Is an index not being used? The planner may be choosing not to use it for good
  reason.
* Is there query using the `some_string LIKE pattern`? If so, make sure the
  pattern is anchored at the beginning of the string. Postgres can use an index
  when doing `some_string LIKE 'pattern%'` but not for `some_string LIKE
  '%pattern%'`
* Have you vacuumed recently? Have you indexed foreign keys?
* Are there table scans that should use an index instead? Not all table scans
  are bad &#8212; there are cases where it will perform better than an index
  scan
* Good database schema design yields better query plans. Read up on [database
  normalization.](http://en.wikipedia.org/wiki/Database_normalization)

### Never try to optimize queries on your development machine

The Postgres planner collects statistics about your data that help identify the
best possible execution plan for your query. In fact, it will just use
heuristics to determine the query plan if the table has little to no data in it.
Not only do you need realistic production data in order to analyze reasonable
query plans, but also the Postgres server's configuration has a big effect. For
this reason it's required that you run your analysis on either the production
box, or on a staging box that is configured just like production, and where
you've restored production data.

### Experimentation is key

There are no hard and fast rules to a perfectly optimized system. The best
advice is to try out different configurations, use a tool like
[NewRelic](http://newrelic.com/) to find out what the bottlenecks are, and
liberally try out different combinations of indexes and queries that yield best
results for your particular situation.
