---
title: Postgres Window Functions
teaser: Take your queries to the next level with window functions.
tags: web,performance,postgresql,sql
author: Simon Taranto
published_on: 2014-09-16
---

We recently ran into a case where a join was getting out of hand and we were
struggling with how to rein in the size of it. Then we found [window
functions](http://www.postgresql.org/docs/9.3/static/tutorial-window.html).
Window functions (Oracle calls them [analytic
functions](http://docs.oracle.com/cd/E11882_01/server.112/e26088/functions004.htm#SQLRF06174))
are a part of the <abbr title="Structured Query Language">SQL</abbr> standard
and this post will explore how to use them in Postgres. Let's see how they work
and what kind of problems they can help us solve.

## Our problem

Let's say we have a blogging platform where we have `posts` that have many
`comments`. Here's our setup:

```sql
CREATE TABLE posts (
  id integer PRIMARY KEY,
  body varchar,
  created_at timestamp DEFAULT current_timestamp
);

CREATE TABLE comments (
 id INTEGER PRIMARY KEY,
 post_id integer NOT NULL,
 body varchar,
 created_at timestamp DEFAULT current_timestamp
);

/* make two posts */
INSERT INTO posts VALUES (1, 'foo');
INSERT INTO posts VALUES (2, 'bar');

/* make 4 comments for the first post */
INSERT INTO comments VALUES (1, 1, 'foo old');
INSERT INTO comments VALUES (2, 1, 'foo new');
INSERT INTO comments VALUES (3, 1, 'foo newer');
INSERT INTO comments VALUES (4, 1, 'foo newest');

/* make 4 comments for the second post */
INSERT INTO comments VALUES (5, 2, 'bar old');
INSERT INTO comments VALUES (6, 2, 'bar new');
INSERT INTO comments VALUES (7, 2, 'bar newer');
INSERT INTO comments VALUES (8, 2, 'bar newest');
```

Here's a [SQL Fiddle](http://sqlfiddle.com/#!15/076c7/12) with our
example if you'd like to query the data.

### Our Goal

We want to get each `post`'s three most recent `comment`s. We could start by
writing a query such as the one below to pull all the posts and comments. We
could then filter this result in whatever application language we're using
(Ruby, <abbr title="PHP HyperText Preprocessor">PHP</abbr>, Go, whatever) on top
of our database:

```sql
SELECT posts.id AS post_id, comments.id AS comment_ids, comments.body AS body
FROM posts LEFT OUTER JOIN comments ON posts.id = comments.post_id;
```

resulting in:

    | POST_ID | COMMENT_IDS |       BODY |
    |---------|-------------|------------|
    |       1 |           1 |    foo old |
    |       1 |           2 |    foo new |
    |       1 |           3 |  foo newer |
    |       1 |           4 | foo newest |
    |       2 |           5 |    bar old |
    |       2 |           6 |    bar new |
    |       2 |           7 |  bar newer |
    |       2 |           8 | bar newest |

This solution may work for awhile but what happens when we have 10's of
thousands of posts that each have 10's of thousands of comments (it's a really
popular blog)? That resulting join table just got really big.

We want to see just these results without requiring any further filtering:

    | POST_ID | COMMENT_ID |       BODY |
    |---------|------------|------------|
    |       1 |          4 | foo newest |
    |       1 |          3 |  foo newer |
    |       1 |          2 |    foo new |
    |       2 |          8 | bar newest |
    |       2 |          7 |  bar newer |
    |       2 |          6 |    bar new |

To get this subset we need to some how rank or order the comments for each post
and then limit the set. How do we go about this more targeted task in SQL? Enter
window functions.

## What are window functions

Window functions are a tool to perform advanced sorting and limiting on a subset
of a joined table of data (hence, the name `window`). We'll look at one
particular function, `dense_rank()`, but all built-in (`sum`, for example) and
user-defined aggregate functions can act as window functions by calling the
`OVER` keyword. Some other popular functions include `row_number()`, `rank()`,
and `percent_rank()`. A complete list of available window functions can be found
[here](http://www.postgresql.org/docs/9.3/static/functions-window.html).

## Back to our problem

How can we use this new knowledge to write our query? We want the three most
recent comments so we know we'll need to do some sort of sorting by `created_at`
and then limit the number of results.

Here's how we can get the ranking information we need using `dense_rank()`:

```sql
SELECT posts.id AS post_id, comments.id AS comment_id, comments.body AS body,
  dense_rank() OVER (
    PARTITION BY post_id
    ORDER BY comments.created_at DESC
  ) AS comment_rank
FROM posts LEFT OUTER JOIN comments ON posts.id = comments.post_id;
```

Resulting in:

    | POST_ID | COMMENT_ID |       BODY | COMMENT_RANK |
    |---------|------------|------------|--------------|
    |       1 |          4 | foo newest |            1 |
    |       1 |          3 |  foo newer |            2 |
    |       1 |          2 |    foo new |            3 |
    |       1 |          1 |    foo old |            4 |
    |       2 |          8 | bar newest |            1 |
    |       2 |          7 |  bar newer |            2 |
    |       2 |          6 |    bar new |            3 |
    |       2 |          5 |    bar old |            4 |

We have the same query from before but we've added an additional field in our
`SELECT` statement:

```sql
dense_rank() OVER (
  PARTITION BY post_id
  ORDER BY comments.created_at DESC
) AS comment_rank
```

`OVER` is the keyword that triggers the use of the window function. Inside the
`OVER` statement we're saying:

- rank the comments by `created_at` (`ORDER BY comments.created_at`) and
- scope the ranking to each post (`PARTITION BY post_id`)

This result is still returning the full set but now we have the added
`comment_rank` field showing us the comments for each post in order of age.
Given this ranking, now we need to fetch just those comments that have a
`comment_rank` less than 4 (since we want only the three most recent comments).
We can achieve this goal by using our new ranking query as a sub-select
statement like so:

```sql
SELECT comment_id, post_id, body FROM (
  SELECT posts.id AS post_id, comments.id AS comment_id, comments.body AS body,
    dense_rank() OVER (
      PARTITION BY post_id
      ORDER BY comments.created_at DESC
    ) AS comment_rank
  FROM posts LEFT OUTER JOIN comments ON posts.id = comments.post_id
) AS ranked_comments
WHERE comment_rank < 4;
```

resulting in:

    | POST_ID | COMMENT_ID |       BODY |
    |---------|------------|------------|
    |       1 |          4 | foo newest |
    |       1 |          3 |  foo newer |
    |       1 |          2 |    foo new |
    |       2 |          8 | bar newest |
    |       2 |          7 |  bar newer |
    |       2 |          6 |    bar new |

And that's our answer. Our query has become pretty hard to read though.

## Refactoring using Common Table Expressions

We can gain back some of the readability by using a [Common Table
Expression](http://www.craigkerstiens.com/2013/11/18/best-postgres-feature-youre-not-using/)
(or CTE). Our query would become:

```sql
WITH ranked_comments AS (
  SELECT posts.id AS post_id, comments.id AS comment_id, comments.body AS body,
    dense_rank() OVER (
      PARTITION BY post_id
      ORDER BY comments.created_at DESC
    ) AS comment_rank
  FROM posts LEFT OUTER JOIN comments ON posts.id = comments.post_id
)

SELECT
  post_id,
  comment_id,
  body
FROM ranked_comments
WHERE comment_rank < 4;
```

Window functions introduce some additional syntax but are a really powerful tool
when we need more complex sorting or limiting behavior.

## Want to learn more

For some more help on the syntax and options for using these functions check out
[these
docs](http://www.postgresql.org/docs/9.3/static/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS)
or [this guide](http://postgresguide.com/tips/window.html). There are some
gotchas around dealing with `NULL` values and how ordering works so be sure to
read through [this
section](http://www.postgresql.org/docs/9.3/static/queries-table-expressions.html#QUERIES-WINDOW)
and test your code. Happy querying.
