I’m working on a project with a slightly abnormal Postgres database structure. We’re using a lovely tool called Sequin to sync data into Postgres from Airtable.
In Airtable “linked fields” are represented as arrays of IDs: each ID pointing to a record in another table. That means that our Postgres database doesn’t have standard many-to-many join tables. On a many-to-many relationship (e.g., Vendors <-> Industries) each row in the Vendors table will have an array of Industry IDs, and each row in Industries will have an array of Vendor IDs.
CREATE TABLE "industries" (
"id" text NOT NULL,
"name" text,
"vendors" text[],
PRIMARY KEY ("id")
);
CREATE TABLE "vendors" (
"id" text NOT NULL,
"name" text,
"industries" text[],
PRIMARY KEY ("id")
);
I’ve found a couple ways to do joins on this schema.
Joining with the ANY
array operator
The ANY
operator will match any item in the array. You use it in a join like this.
SELECT
vendors.id,
vendors.name,
industries.name AS industry_name
FROM vendors
JOIN industries ON industries.id = ANY (vendors.industries);
This will output a row for each vendor/industry combination.
id | name | industry_name |
---|---|---|
recYDX6fBzZebC0ZC | Gumbs Partners | HR Services |
recjPKDyJMfu0Y6Ri | Lily of the Valley | Floral Design Floral Design |
recjPKDyJMfu0Y6Ri | Lily of the Valley | Floral Design Event Services |
recdGYB0t9U7LozYj | NXTevent, Inc. | Event Services |
receoJe6XpK7Sk8TZ | We Grow Microgreens, | LLC Food & Beverage |
receoJe6XpK7Sk8TZ | We Grow Microgreens, | LLC Event Services |
recm9iAPWPJSNhf4S | Manifested Events | LLC Event Services |
Joining with a Postgres view
Most ORMs will struggle to create a join on an array. I’m using Prisma which doesn’t understand that an array of strings can be used as a join to another table. It requires a join table for many-to-many relationships.
We can make a “virtual” join table using a database view, with Postgres’s unnest()
function.
Credit goes to the folks at Sequin for suggesting this technique for working with Airtable data.
unnest(array_column)
turns an array into rows, where each element of the array gets a row.
Here we’ll use Prisma’s naming format for join tables.
CREATE VIEW _IndustryToVendor AS
SELECT
vendors.id AS "B",
unnest(vendors.industries) AS "A"
FROM
vendors;
This is the resulting join table:
A | B |
---|---|
rechodXXUzIyLWG1E | rec1nNDyfSeshQhVM |
reczbFISIU5fu5Fda | recihNylMXO81fDgs |
recyz3Dn3r8ML9UR3 | rec1yOytCf7VM5nZA |
recyfeS5rXGahgh7H | rec1nNDyfSeshQhVM |
reclBbzlO9pyaE1BV | recihNylMXO81fDgs |
rec4cAq4w5EYg8GId | rec1nNDyfSeshQhVM |
And you can use it for a join like this:
SELECT
vendors.id,
vendors.name,
industries.name AS industry_name
FROM vendors
JOIN "_IndustryToVendor"
ON "_IndustryToVendor"."B" = vendors.id
JOIN industries
ON "_IndustryToVendor"."A" = industries.id
This SQL query gives the same result as the ANY
query above.
Prisma prefers for its join table to have foreign keys, but it’s not absolutely required. Since views can’t have foreign keys, we’ve done without.
Success!
Using the “JOIN ON ANY” technique I can easily write ad-hoc many-to-many join queries.
And using the “view as a join table” technique I can set up many-to-many joins in the Prisma ORM.
A note on performance
I’m not a DB performance expert, but I doubt these techniques will be as performant as a real many-to-many join table with indexes and foreign keys.
An EXPLAIN
shows that the first technique (joining on ANY
) needs to make a sequential scan of the industries table. I know you can add a GIN index on an array column, but I don’t know if that would improve anything.
According to EXPLAIN
the second technique (using a join view) needs to do a sequential scan on the industries table (because it’s referenced in the view), but it’s able to do an index scan on the industries table.
I’m curious if using a materialized view would improve performance.
Regardless, we haven’t encountered issues with slow queries yet, so I’m happy with these solutions.