Want to see the full-length video right now for free?
ActiveRecord can write a lot of your SQL for you. It can figure out a lot of your joins and do a lot of things without you ever having to leave the comfort of your Ruby home.
However, there will be situations where you need to do something that ActiveRecord doesn't know how to do. Fortunately, ActiveRecord exposes holes where you can, when necessary, plug in bits of your own SQL. This lets you combine ActiveRecord's nice, composable, Ruby-like separation of concerns with all the raw power of SQL.
This frees you up to start with the clean and simple Ruby/ActiveRecord querying that we've practiced in the last two lessons, with the confidence that you can break out into custom SQL if you really need to. We can get the best of both worlds by using... Custom Joins!
This time, we're going to focus on just one model, Person
, and add a self-
referential association:
class Person < ActiveRecord::Base
belongs_to :manager, class_name: "Person", foreign_key: :manager_id
has_many :employees, class_name: "Person", foreign_key: :manager_id
end
The current state of affairs in the database is:
People.all
id | name | role_id | location_id | manager_id |
---|---|---|---|---|
1 | Eve | 2 | 2 | NULL |
2 | Bill | 2 | 1 | NULL |
3 | Wendell | 1 | 1 | 1 |
4 | Christie | 1 | 1 | 1 |
5 | Sandy | 1 | 3 | 2 |
If you look at the list of people, you'll notice we have two people with no manager (Eve and Bill), two people managed by Eve (Wendell and Christie), and one person managed by Bill (Sandy).
belongs_to
challengeLet's start by finding all the people who are not managed by "Eve." That includes Eve and Bill, since they do not have a manager at all.
joins
methodWe can start out with what we know by joining a named association with ActiveRecord:
Person.joins(:manager)
ActiveRecord can figure out that it needs to join the people
table to itself:
SELECT "people".*
FROM "people"
INNER JOIN "people" "managers_people"
ON "managers_people"."id" = "people"."manager_id"
You can see that ActiveRecord made up the alias managers_people
to keep
things straight, since we're joining a table to itself. The resulting data
looks like this:
people | managers_people | ||||||||
---|---|---|---|---|---|---|---|---|---|
id | name | role_id | location_id | manager_id | id | name | role_id | location_id | manager_id |
3 | Wendell | 1 | 1 | 1 | 1 | Eve | 2 | 2 | NULL |
4 | Christie | 1 | 1 | 1 | 1 | Eve | 2 | 2 | NULL |
5 | Sandy | 1 | 3 | 2 | 2 | Bill | 2 | 1 | NULL |
Again, the part on the left side gets sent back to ActiveRecord and built into objects, while the right side is available while in the database to query against.
You'll notice that we're missing Eve and Bill on the left side. This is because
the ActiveRecord joins
method uses a SQL "inner" join. We're not going to go
into the ins and outs of inner and outer joins in SQL too much (you can read
more about them here), but a good thing to know is that with an inner join,
you need to have a match on both sides in order to get a row back.
"Outer" joins can be used when you want to get back a row even if it doesn't have a corresponding record in the joined table. In our case, we want people to show up on the left side even if they don't have a corresponding manager on the right side. To achieve this, we can use a "left outer" join.
not
methodFirst, let's narrow down our results a bit: we were interested in finding people not managed by Eve, not just anybody with a manager. We can do this cleanly with ActiveRecord's not method:
Person.
joins(:manager).
where.
not(managers_people: { id: Person.find_by!(name: "Eve") })
which generates SQL like this:
SELECT "people".*
FROM "people"
INNER JOIN "people" "managers_people"
ON "managers_people"."id" = "people"."manager_id"
WHERE ("managers_people"."id" != 1);
and retrieves data like this:
people | managers_people | ||||||||
---|---|---|---|---|---|---|---|---|---|
id | name | role_id | location_id | manager_id | id | name | role_id | location_id | manager_id |
5 | Sandy | 1 | 3 | 2 | 2 | Bill | 2 | 1 | NULL |
You can see that we are now just left with Sandy, who is managed by Bill. So far so good, but we're still missing Eve and Bill, both of whom are also not managed by Eve.
To get Eve and Bill as well, we need a left outer join rather than an inner join. Unfortunately, ActiveRecord doesn't provide a built-in method to perform a left outer join. However, we can solve this with a Custom Join:
Person.
joins(<<-SQL).
LEFT JOIN people managers
ON managers.id = people.manager_id
SQL
where.
not(managers: { id: Person.find_by!(name: "Eve") })
A couple of notes on the above:
joins
method, but passing it a string rather
than a symbol; and so rather than looking for an association to join, it
executes the raw SQL provided.joins
method still returns an ActiveRecord relation, which means we can
still keep the rest of our query the same, which is nice!That said, this is the SQL that is generated:
SELECT "people".*
FROM "people"
LEFT JOIN "people" "managers"
ON "managers"."id" = "people"."manager_id"
WHERE ("managers"."id" != 1);
which retrieves data like this:
people | managers_people | ||||||||
---|---|---|---|---|---|---|---|---|---|
id | name | role_id | location_id | manager_id | id | name | role_id | location_id | manager_id |
5 | Sandy | 1 | 3 | 2 | 2 | Bill | 2 | 1 | NULL |
Unfortunately, our output still hasn't changed! Frustrating. We still don't get Eve and Bill back. This is due to an oddity with comparisons in SQL:
In SQL, a row is returned if the condition inside the where clause is truthy --
e.g., not false
and not NULL
. This is what we are used to in Ruby, with
everything being truthy except for false
and nil
.
However, there is a slight difference between how Ruby handles comparisons with
nil
and how SQL handles comparisons with NULL
:
Ruby Expression | Result | SQL Expression | Result | |
---|---|---|---|---|
1 == 1 |
true |
1 = 1 |
true |
|
1 == 2 |
false |
1 = 2 |
false |
|
1 != 2 |
true |
1 != 2 |
true |
|
1 == nil |
false |
1 = NULL |
NULL |
|
1 != nil |
true |
1 != NULL |
NULL |
The last comparison in SQL might be surprising, coming from a Ruby world. In
SQL, any comparison to NULL
results in NULL
-- even NULL = NULL
:
Ruby expression | Result | SQL expression | Result | |
---|---|---|---|---|
nil == nil |
true |
NULL = NULL |
NULL |
This is a problem for us, because it means when we ask "Is this person's manager ID not equal to 1?" (for Eve and Bill, the manager ID is NULL) in order to check whether to return a row, the answer is falsy, and so the row isn't being returned.
In other words,
Person.where("manager_id != null").all
generates SQL like this:
SELECT "people".* FROM "people" WHERE (manager_id != null)
but returns an empty set:
=> []
Meanwhile,
Person.where("manager_id != 5").all
generates SQL like this:
SELECT "people".* FROM "people" WHERE (manager_id != 5)
and returns people that have a non-NULL
value for manager_id
that is not
equal to 5:
=> [#<Person:0x005575ad16f6c0 id: 3, name: "Wendell", role_id: 1, location_id: 1, manager_id: 1, salary: 35000>,
#<Person:0x005575ad16f3f0 id: 4, name: "Christie", role_id: 1, location_id: 1, manager_id: 1, salary: 30000>,
#<Person:0x005575ad16f1e8 id: 5, name: "Sandy", role_id: 1, location_id: 3, manager_id: 2, salary: 40000>]
but still no Eve or Bill!
There are several solutions to this problem, but here is one we like:
Person.
joins(<<-SQL).
LEFT JOIN people managers
ON managers.id = people.manager_id
SQL
where(
"managers.id != ? OR managers.id IS NULL",
Person.find_by!(name: "Eve")
)
As with joins
, the where
method can accept a string rather than a hash, and
it will know to expect raw SQL to execute. We need to use custom SQL here
because ActiveRecord doesn't have built-in support for SQL's OR
(yet).
Our custom string includes an extra comparison, OR managers.id IS NULL
, which
behaves as you would expect:
SELECT "people".*
FROM "people"
LEFT JOIN "people" "managers"
ON "managers"."id" = "people"."manager_id"
WHERE ("managers"."id" != 1 OR "managers"."id" IS NULL);
Notice that even though we are passing the where
method a Ruby string, we are
not interpolating the value that we want into that string. Instead, we put
a question mark in the string where we want the value to go, and then pass the
value as a subsequent argument to where
.
This is because if someday a user was somehow able to supply that value, we would be vulnerable to a SQL injection attack. So one thing to always keep in mind while working with custom SQL is to never let potentially user-supplied values come near your raw SQL strings. By using the placeholder (question mark) syntax, we can allow ActiveRecord to sanitize the string for us and then embed it.
Finally, the output that we get from our query:
id | name | role_id | location_id | manager_id | id | name | role_id | location_id | manager_id |
---|---|---|---|---|---|---|---|---|---|
1 | Eve | 2 | 2 | NULL | NULL | NULL | NULL | NULL | NULL |
2 | Bill | 2 | 1 | NULL | NULL | NULL | NULL | NULL | NULL |
5 | Sandy | 1 | 3 | 2 | 2 | Bill | 2 | 1 | NULL |
At last, we have successfully found all people not managed by Eve, including Eve and Bill. And we did it all in the database, with one efficient query. Success!
includes
methodNow, we said earlier that there isn't a way to have ActiveRecord perform a left
join for us, since the joins
method uses inner joins exclusively. But that's
not completely true -- while there isn't a way to tell ActiveRecord explicitly
that we want a left join, there is another method that, coincidentally, uses
a left join: [includes][]
.
includes
is intended to be used for eager loading data, and if you also use
[references][]
to specify conditions on the eager loaded data, then it just
so happens that ActiveRecord will use a left join:
Person.
includes(:manager).
references(:manager).
where(
"managers_people.id != ? OR managers_people.id IS NULL",
Person.find_by!(name: "Eve")
)
generates SQL like this:
SELECT "people"."id" AS t0_r0, "people"."name" AS t0_r1, "people"."role_id" AS t0_r2,
"people"."location_id" AS t0_r3, "people"."manager_id" AS t0_r4, "managers_people"."id" AS t1_r0,
"managers_people"."name" AS t1_r1, "managers_people".role_id AS t1_r2,
"managers_people"."location_id" AS t1_r3, "managers_people"."manager_id" AS t1_r4
FROM "people"
LEFT OUTER JOIN "people" "managers_people"
ON "managers_people"."id" = "people"."managers_id"
WHERE ("managers_people"."id" != 1 OR "managers_people"."id" IS NULL)
and retrieves data like this:
t0_r0 | t0_r1 | t0_r2 | t0_r3 | t0_r4 | t1_r0 | t1_r1 | t1_r2 | t1_r3 | t1_r4 |
---|---|---|---|---|---|---|---|---|---|
1 | Eve | 2 | 2 | NULL | NULL | NULL | NULL | NULL | NULL |
2 | Bill | 2 | 1 | NULL | NULL | NULL | NULL | NULL | NULL |
5 | Sandy | 1 | 3 | 2 | 2 | Bill | 2 | 1 | NULL |
But we think it's a bad idea to depend on this, for a number of reasons:
includes
and
references
and break our query.t0_r0
, t0_r1
, etc) makes the query hard to read and debug.One final challenge: let's find all distinct locations with at least one person who belongs to a billable role, ordered by region name, then location name.
Wait, this sounds familiar! We've done this before in our has_many
lesson.
If you recall, we had something like
Location.
joins(people: :role).
where(roles: { billable: true }).
distinct.
joins(:region).
merge(Region.order(:name)).
order(:name)
which caused Postgres to choke:
SELECT DISTINCT "locations".*
FROM "locations"
INNER JOIN "people"
ON "people"."location_id" = "locations"."id"
INNER JOIN "roles"
ON "roles"."id" = "people"."role_id"
INNER JOIN "regions"
ON "regions"."id" = "locations"."region_id"
WHERE "roles"."billable" = 't'
ORDER BY "regions"."name" ASC, "locations"."name" ASC;
PG::InvalidColumnReference: ERROR: for SELECT DISTINCT, ORDER BY expressions must
appear in select list
LINE 1: ...gion_id" WHERE "roles"."billable" = 't' ORDER BY "regions"....
The way we resolved this issue previously was with a sub-query using ActiveRecord's from
method method:
Location.
from(
Location.
joins(people: :role).
where(roles: { billable: true }).
distinct,
:locations
).
joins(:region).
merge(Region.order(:name)).
order(:name)
which generates SQL like this:
SELECT "locations".*
FROM (
SELECT DISTINCT "locations".*
FROM "locations"
INNER JOIN "people"
ON "people"."location_id" = "locations"."id"
INNER JOIN "roles"
ON "roles"."id" = "people"."role_id"
WHERE "roles"."billable" = 't'
) locations
INNER JOIN "regions"
ON "regions"."id" = "locations"."region_id"
ORDER BY "regions"."name" ASC, "locations"."name" ASC;
and retrieves data like this:
locations | regions | |||
---|---|---|---|---|
id | name | region_id | id | name |
1 | Boston | 1 | 1 | East |
3 | Denver | 2 | 2 | West |
There is another way, which involves using a custom join:
Location.
joins(
"INNER JOIN (" +
Location.
joins(people: :role).
where(roles: { billable: true }).
distinct.
to_sql +
") billable_locations " \
"ON locations.id = billable_locations.id"
).
joins(:region).
merge(Region.order(:name)).
order(:name)
This time, rather than selecting from a sub-query, we are joining on to a sub- query. This would generate SQL like this:
SELECT "locations".*
FROM "locations"
INNER JOIN "regions"
ON "regions"."id" = "locations"."region_id"
INNER JOIN (
SELECT DISTINCT "locations".*
FROM "locations"
INNER JOIN "people"
ON "people"."location_id" = "locations"."id"
INNER JOIN "roles"
ON "roles"."id" = "people"."role_id"
WHERE "roles"."billable" = 't'
) billable_locations
ON "locations"."id" = "billable_locations"."id"
ORDER BY "regions"."name" ASC, "locations"."name" ASC;
and retrieves data like this:
locations | regions | billable_locations | |||||
---|---|---|---|---|---|---|---|
id | name | region_id | id | name | id | name | region_id |
1 | Boston | 1 | 1 | East | 1 | Boston | 1 |
3 | Denver | 2 | 2 | West | 3 | Denver | 2 |
So, which one to use? It's mostly a matter of preference. There are situations where one is more performant than the other, and even in these cases, Postgres is frequently smart enough to rewrite the query to use the better option.
So the most important thing is to use the one that makes the most sense to you. We often find joins to be easier to visualize than sub-queries because they are two-dimensional. But both are good options.
to_sql
methodThe trick that makes this work is ActiveRecord's to_sql
method, which you can
call on any ActiveRecord relation, and which will return a string of SQL rather
than execute that SQL in the database.
This is very handy for debugging, but also like in this case, for generating SQL to use inside of custom joins.
This might seem like an exception to our previous rule about never interpolating
or concatenating SQL, but since to_sql
gets its SQL from ActiveRecord, it
should be safe. But it's good that you're being vigilant!
So now we have a sense of how we can use custom joins to, when all else fails, plug in our own raw SQL into the middle of a chain of ActiveRecord methods. This way we can still achieve our goal of letting the database do as much work as possible even when ActiveRecord doesn't have a specific built-in method for what we need.
We have an exercise for you to practice custom joins, and then we'll see you in the final lesson of this series: Aggregations.
Good luck!