---
title: PostgreSQL's Foreign Data Wrapper
teaser: Write queries that combine data from multiple sources.
tags: postgresql
author: George Brocklehurst
published_on: 2016-10-28
---

Have you ever needed to write a query that combined data from two PostgreSQL
databases? Our client [Healthify][healthify] has.

Since Healthify work with healthcare and store some patient information, they're
very careful not to send any personally identifiable information to third party
services.

One such service is [Segment][segment], which Healthify use to track user
activity on the site.
Conveniently,
[Segment can send your data to a PostgreSQL data warehouse][segment-warehouse],
so Healthify has two PostgeSQL databases full of useful information:

* Their app database, which their Rails app uses, and
* a reporting database, with anonymised data from Segment.

To better understand how people are using the application, we wanted to answer
some questions that needed data from both sources. We wanted to filter the
reporting data using attributes from the app data, without having to pass those
attributes to Segment. This blog post explains what we did.

## Foreign Data Wrapper

PostgreSQL has a useful feature called <dfn>Foreign Data Wrapper</dfn>, which
lets you create <dfn>foreign tables</dfn> in a PostgreSQL database that are
proxies for some other data source. When you make a query against a
foreign table, the Foreign Data Wrapper will query the external data source
and return the results as if they were coming from a table in your database.

There are two foreign data wrappers that ship with PostgreSQL:

* `file_fdw` to create foreign tables that represent flat files
  (Postgres 9.1 and later), and
* `postgres_fdw` to create foreign tables that represent tables
  in another PostgreSQL database (Postgres 9.3 and later).

You can also define your own wrapper,
or use an third party wrapper
(we've used [`tds_fdw`][tds_fdw] on other thoughtbot projects),
but for Healthify's situation `postgres_fdw` was exactly what we needed.

Before we could create foreign tables
in our reporting database
to proxy the tables in our app database,
we needed to do a bit of setup in the reporting database.

1.  Install the `postgres_fdw` extension:

    ```sql
    CREATE EXTENSION postgres_fdw;
    ```

2.  Create a <dfn>server</dfn>:

    ```sql
    CREATE SERVER app_database_server
      FOREIGN DATA WRAPPER postgres_fdw
      OPTIONS (host 'postgres.example.com', dbname 'my_app');
    ```

3.  Create a <dfn>user mapping</dfn>,
    which defines the credentials that a user on the local server
    will use to make queries against the remote server:

    ```sql
    CREATE USER MAPPING FOR CURRENT_USER
      SERVER app_database_server
      OPTIONS (user 'reporting', password 'secret123');
    ```

    Since we're using this for reporting,
    we chose to create a separate user in our app database
    specifically for Foreign Data Wrapper to use.
    That user only has read permissions,
    so a badly written reporting query can't accidentally `INSERT` or
    `UPDATE` data.

Now we've told Foreign Data Wrapper where the server is,
and the credentials it can use to connect,
we're ready to define the foreign tables.

We could use `CREATE FOREIGN TABLE` for this,
but we'd end up specifying all of the fields
for each table we want to access.
We'd effectively end up repeating our whole schema.
Fortunately,
Postgres 9.5 introduced the `IMPORT FOREIGN SCHEMA` command:

```sql
CREATE SCHEMA app;

IMPORT FOREIGN SCHEMA public
  FROM SERVER app_database_server
  INTO app;
```

This will create foreign tables for all of the tables
from our app database's `public` schema
into our reporting database's `app` schema.

A <dfn>schema</dfn> in PostgreSQL is just a namespace for grouping tables.
If you haven't explicitly specified a schema
then you're implicitly using the `public` schema.

We could import from the app database's `public` schema
into the reporting database's `public` schema,
but keeping all of the foreign tables in a separate schema
means we don't have to worry about naming collisions.

Segment also recommends
that if you're going to put any additional tables
in a PostgreSQL database that you're using as a Segment warehouse
you should make sure it's in a separate schema:

<blockquote cite="https://segment.com/docs/faqs/warehouses/custom-data/">
<p>
  If you want to insert custom data into your warehouse, create new schemas that
  are not associated with an existing source, since these may be deleted upon a
  reload of the Segment data in the cluster.
</p>
</blockquote>

Now we've imported our schema,
we can query the foreign tables
as if they were normal, local tables.

```sql
SELECT COUNT(*) FROM app.users;
```

We can even `JOIN` between a local table and a foreign table.

```sql
SELECT COUNT(*)
FROM
  production.events
  JOIN app.users
  ON production.events.user_id::INT = app.users.id
WHERE
  app.users.company_id = 10;
```

## It ain't all good

### Handling schema changes

When we ran `IMPORT FOREIGN SCHEMA`
we created foreign tables
based on the current structure of the app database.
If the app database changes—if we add a table, or drop a column—our
foreign tables will need updating too.

To work around this
we added some code to update our reporting database's foreign tables
every time we run the migrations on the production app.

```ruby
Rake::Task["db:migrate"].enhance do
  if ENV["UPDATE_FOREIGN_TABLES_AFTER_MIGRATE"]
    Rake::Task["db:export_schema_for_reporting"].invoke
  end
end
```

The `UPDATE_FOREIGN_TABLES_AFTER_MIGRATE` environment variable
lets us control which environments use this task.
It's always going to be useful in production,
but rarely in development.

The `db:export_schema_for_reporting` task
runs all of the queries we've previously seen,
slightly modified to make them idempotent:

```sql
BEGIN;
CREATE EXTENSION IF NOT EXISTS postgres_fdw;
DROP SERVER IF EXISTS app CASCADE;
CREATE SERVER … ;
CREATE USER MAPPING … ;
DROP SCHEMA IF EXISTS app;
CREATE SCHEMA app;
IMPORT FOREIGN SCHEMA … ;
COMMIT;
```

Rebuilding the server config
and foreign tables
every time we migrate
keeps our reporting server up-to-date.
Wrapping the whole thing in a transaction
ensures queries against the foreign tables
that are run while we're running the `db:export_schema_for_reporting` task
will be blocked until the task finishes,
but won't fail.

There were some fringe benefits from running this from our Rails application:
we can work out what arguments to pass to `CREATE SERVER`
by reading our application's database configuration.

### Performance considerations

Once we had everything set up
we learnt that we couldn't always treat foreign tables
exactly like local tables.
In particular,
the performance of queries that `JOIN` over multiple foreign tables
was very poor.

Behind the scenes,
PostgreSQL was making separate queries to the app database
and then `JOIN`ing them after the fact in the reporting database.

Re-writing our `JOIN`s as sub-queries improved things significantly.

Now that Postgres 9.6 has been released,
this should be less of a problem.
The [Postgres 9.6 release notes][postgres-9.6] say:

<blockquote cite="https://www.postgresql.org/docs/devel/static/release-9-6.html#AEN131345">
<p>
  <code>postgres_fdw</code> now supports remote joins, sorts,
  <code>UPDATE</code>s, and <code>DELETE</code>s
</p>
</blockquote>

[healthify]: https://www.healthify.us
[segment]: https://segment.com
[segment-warehouse]: https://segment.com/warehouses
[tds_fdw]: https://github.com/tds-fdw/tds_fdw
[postgres-9.6]: https://www.postgresql.org/docs/devel/static/release-9-6.html#AEN131345
