Joining Postgres tables using arrays of IDs

I’m working on a project with a slightly abnormal Postgres database structure. We’re using a lovely tool called Sequin to sync data into Postgres from Airtable.

In Airtable “linked fields” are represented as arrays of IDs: each ID pointing to a record in another table. That means that our Postgres database doesn’t have standard many-to-many join tables. On a many-to-many relationship (e.g., Vendors <-> Industries) each row in the Vendors table will have an array of Industry IDs, and each row in Industries will have an array of Vendor IDs.

CREATE TABLE "industries" (
    "id" text NOT NULL,
    "name" text,
    "vendors" text[],
    PRIMARY KEY ("id")
);

CREATE TABLE "vendors" (
    "id" text NOT NULL,
    "name" text,
    "industries" text[],
    PRIMARY KEY ("id")
);

I’ve found a couple ways to do joins on this schema.

Joining with the ANY array operator

The ANY operator will match any item in the array. You use it in a join like this.

SELECT
    vendors.id,
    vendors.name,
    industries.name AS industry_name
FROM vendors
JOIN industries ON industries.id = ANY (vendors.industries);

This will output a row for each vendor/industry combination.

id name industry_name
recYDX6fBzZebC0ZC Gumbs Partners HR Services
recjPKDyJMfu0Y6Ri Lily of the Valley Floral Design Floral Design
recjPKDyJMfu0Y6Ri Lily of the Valley Floral Design Event Services
recdGYB0t9U7LozYj NXTevent, Inc. Event Services
receoJe6XpK7Sk8TZ We Grow Microgreens, LLC Food & Beverage
receoJe6XpK7Sk8TZ We Grow Microgreens, LLC Event Services
recm9iAPWPJSNhf4S Manifested Events LLC Event Services

Joining with a Postgres view

Most ORMs will struggle to create a join on an array. I’m using Prisma which doesn’t understand that an array of strings can be used as a join to another table. It requires a join table for many-to-many relationships.

We can make a “virtual” join table using a database view, with Postgres’s unnest() function.

Credit goes to the folks at Sequin for suggesting this technique for working with Airtable data.

unnest(array_column) turns an array into rows, where each element of the array gets a row.

Here we’ll use Prisma’s naming format for join tables.

CREATE VIEW _IndustryToVendor AS
SELECT
  vendors.id AS "B",
  unnest(vendors.industries) AS "A"
FROM
  vendors;

This is the resulting join table:

A B
rechodXXUzIyLWG1E rec1nNDyfSeshQhVM
reczbFISIU5fu5Fda recihNylMXO81fDgs
recyz3Dn3r8ML9UR3 rec1yOytCf7VM5nZA
recyfeS5rXGahgh7H rec1nNDyfSeshQhVM
reclBbzlO9pyaE1BV recihNylMXO81fDgs
rec4cAq4w5EYg8GId rec1nNDyfSeshQhVM

And you can use it for a join like this:

SELECT
    vendors.id,
    vendors.name,
    industries.name AS industry_name
FROM vendors
JOIN "_IndustryToVendor"
ON "_IndustryToVendor"."B" = vendors.id
JOIN industries
ON "_IndustryToVendor"."A" = industries.id

This SQL query gives the same result as the ANY query above.

Prisma prefers for its join table to have foreign keys, but it’s not absolutely required. Since views can’t have foreign keys, we’ve done without.

Success!

Using the “JOIN ON ANY” technique I can easily write ad-hoc many-to-many join queries.

And using the “view as a join table” technique I can set up many-to-many joins in the Prisma ORM.

A note on performance

I’m not a DB performance expert, but I doubt these techniques will be as performant as a real many-to-many join table with indexes and foreign keys.

An EXPLAIN shows that the first technique (joining on ANY) needs to make a sequential scan of the industries table. I know you can add a GIN index on an array column, but I don’t know if that would improve anything.

According to EXPLAIN the second technique (using a join view) needs to do a sequential scan on the industries table (because it’s referenced in the view), but it’s able to do an index scan on the industries table.

I’m curious if using a materialized view would improve performance.

Regardless, we haven’t encountered issues with slow queries yet, so I’m happy with these solutions.