Have you ever needed to write a query that combined data from two PostgreSQL databases? Our client Healthify has.
Since Healthify works in healthcare and stores some patient information, they’re very careful not to send any personally identifiable information to third-party services.
One such service is Segment, which Healthify uses to track user activity on the site. Conveniently, Segment can send your data to a PostgreSQL data warehouse, so Healthify has two PostgreSQL databases full of useful information:
- Their app database, which their Rails app uses, and
- a reporting database, with anonymised data from Segment.
To better understand how people are using the application, we wanted to answer some questions that needed data from both sources. We wanted to filter the reporting data using attributes from the app data, without having to pass those attributes to Segment. This blog post explains what we did.
PostgreSQL has a useful feature called Foreign Data Wrapper, which lets you create foreign tables in a PostgreSQL database that are proxies for some other data source. When you make a query against a foreign table, the Foreign Data Wrapper will query the external data source and return the results as if they were coming from a table in your database.
There are two foreign data wrappers that ship with PostgreSQL:
- file_fdw, to create foreign tables that represent flat files (Postgres 9.1 and later), and
- postgres_fdw, to create foreign tables that represent tables in another PostgreSQL database (Postgres 9.3 and later).
You can also define your own wrapper, or use a third-party wrapper (we’ve used tds_fdw on other thoughtbot projects), but for Healthify’s situation postgres_fdw was exactly what we needed.
Before we could create foreign tables in our reporting database to proxy the tables in our app database, we needed to do a bit of setup in the reporting database.
First, enable the postgres_fdw extension:

CREATE EXTENSION postgres_fdw;
Create a server, which defines the connection to the remote database:
CREATE SERVER app_database_server
  FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host 'postgres.example.com', dbname 'my_app');
Create a user mapping, which defines the credentials that a user on the local server will use to make queries against the remote server:
CREATE USER MAPPING FOR CURRENT_USER
  SERVER app_database_server
  OPTIONS (user 'reporting', password 'secret123');
Since we’re using this for reporting, we chose to create a separate user in our app database specifically for the Foreign Data Wrapper to use. That user only has read permissions, so a badly written reporting query can’t accidentally modify our application’s data.
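For illustration, setting up that kind of read-only user in the app database might look something like this (the role name, password, and schema here are assumptions for the example, not Healthify’s actual configuration):

-- Hypothetical read-only role for the Foreign Data Wrapper connection.
CREATE ROLE reporting WITH LOGIN PASSWORD 'secret123';

-- Allow the role to see the schema and read every table in it,
-- including tables created in the future.
GRANT USAGE ON SCHEMA public TO reporting;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO reporting;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO reporting;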
Now that we’ve told the Foreign Data Wrapper where the server is and the credentials it can use to connect, we’re ready to define the foreign tables.
We could use CREATE FOREIGN TABLE for this, but we’d end up specifying all of the fields for each table we want to access. We’d effectively end up repeating our whole schema.
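To see why, here’s what a single hand-written foreign table definition looks like (the column list is a made-up subset of a users table, just for illustration):

-- Manually defining one foreign table means repeating its columns.
CREATE FOREIGN TABLE app_users (
  id integer,
  email text,
  company_id integer
)
SERVER app_database_server
OPTIONS (schema_name 'public', table_name 'users');

Multiply that by every table in the app database and it quickly becomes unmanageable.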
Postgres 9.5 introduced the IMPORT FOREIGN SCHEMA command:
CREATE SCHEMA app;

IMPORT FOREIGN SCHEMA public
  FROM SERVER app_database_server
  INTO app;
This will create foreign tables in our reporting database’s app schema for all of the tables in our app database’s public schema.
A schema in PostgreSQL is just a namespace for grouping tables.
If you haven’t explicitly specified a schema, then you’re implicitly using the public schema. We could import from the app database’s public schema into the reporting database’s public schema, but keeping all of the foreign tables in a separate schema means we don’t have to worry about naming collisions.
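As a quick illustration of that default (assuming a standard search_path), these two queries refer to the same table:

-- An unqualified table name resolves to the public schema by default.
SELECT COUNT(*) FROM users;
SELECT COUNT(*) FROM public.users;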
Segment also recommends that if you’re going to put any additional tables in a PostgreSQL database that you’re using as a Segment warehouse, you should make sure they’re in a separate schema:
> If you want to insert custom data into your warehouse, create new schemas that are not associated with an existing source, since these may be deleted upon a reload of the Segment data in the cluster.
Now that we’ve imported our schema, we can query the foreign tables as if they were normal, local tables:
SELECT COUNT(*) FROM app.users;
We can even JOIN between a local table and a foreign table:
SELECT COUNT(*)
FROM production.events
JOIN app.users ON production.events.user_id::INT = app.users.id
WHERE app.users.company_id = 10;
When we ran IMPORT FOREIGN SCHEMA, we created foreign tables based on the current structure of the app database. If the app database changes (if we add a table, or drop a column), our foreign tables will need updating too.
To work around this, we added some code to update our reporting database’s foreign tables every time we run the migrations on the production app.
Rake::Task["db:migrate"].enhance do
  if ENV["UPDATE_FOREIGN_TABLES_AFTER_MIGRATE"]
    Rake::Task["db:export_schema_for_reporting"].invoke
  end
end
The UPDATE_FOREIGN_TABLES_AFTER_MIGRATE environment variable lets us control which environments use this task. It’s always going to be useful in production, but rarely in development.
The db:export_schema_for_reporting task runs all of the queries we’ve previously seen, slightly modified to make them idempotent:
BEGIN;

CREATE EXTENSION IF NOT EXISTS postgres_fdw;

DROP SERVER IF EXISTS app CASCADE;
CREATE SERVER … ;
CREATE USER MAPPING … ;

DROP SCHEMA IF EXISTS app;
CREATE SCHEMA app;
IMPORT FOREIGN SCHEMA … ;

COMMIT;
Rebuilding the server config and foreign tables every time we migrate keeps our reporting server up-to-date. Wrapping the whole thing in a transaction ensures that queries against the foreign tables that are run while the task is in progress will be blocked until the task finishes, but won’t fail.
There were some fringe benefits from running this from our Rails application: we can work out what arguments to pass to CREATE SERVER and CREATE USER MAPPING by reading our application’s database configuration.
Once we had everything set up, we learnt that we couldn’t always treat foreign tables exactly like local tables. In particular, the performance of queries that JOIN over multiple foreign tables was very poor.
Behind the scenes, PostgreSQL was making separate queries to the app database and JOINing them after the fact in the reporting database. Rewriting our JOINs as sub-queries improved things significantly.
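To make that concrete, here’s a sketch of the kind of rewrite that helped (the companies table and its plan column are invented for the example):

-- Joining two foreign tables directly caused PostgreSQL to pull
-- both tables across and join them locally in the reporting database.
SELECT COUNT(*)
FROM app.users
JOIN app.companies ON app.users.company_id = app.companies.id
WHERE app.companies.plan = 'enterprise';

-- Rewriting the JOIN as a sub-query performed noticeably better for us.
SELECT COUNT(*)
FROM app.users
WHERE company_id IN (
  SELECT id FROM app.companies WHERE plan = 'enterprise'
);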
Now that Postgres 9.6 has been released, this should be less of a problem. The Postgres 9.6 release notes say:

> postgres_fdw now supports remote joins, sorts, UPDATEs, and DELETEs
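If you want to check what postgres_fdw is actually sending to the remote server, EXPLAIN VERBOSE shows the generated queries (again using the invented companies table from above):

-- Look for "Remote SQL:" lines in the plan output; on Postgres 9.6,
-- a join that gets pushed down appears as a single remote query
-- containing the JOIN, rather than one query per foreign table.
EXPLAIN VERBOSE
SELECT COUNT(*)
FROM app.users
JOIN app.companies ON app.users.company_id = app.companies.id;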