Want to see the full-length video right now for free?
In this lesson, we're going to learn advanced techniques for querying our
has_many
associations.
We'll start with the same domain model that we had in the previous lesson, but
this time add Location
into the mix:
class Person < ActiveRecord::Base
belongs_to :location
belongs_to :role
end
class Role < ActiveRecord::Base
has_many :people
end
class Location < ActiveRecord::Base
has_many :people
end
Locations and people have the same kind of 1-N relationship that we previously had between roles and people.
Here's the current state of affairs in our database:
Role.all
id | name | billable |
---|---|---|
1 | Developer | t |
2 | Manager | f |
3 | Unassigned | f |
Location.all
id | name | billable |
---|---|---|
1 | Boston | 1 |
2 | New York | 1 |
3 | Denver | 2 |
Person.all
id | name | role_id | location_id |
---|---|---|---|
1 | Wendell | 1 | 1 |
2 | Christie | 1 | 1 |
3 | Sandy | 1 | 3 |
4 | Eve | 2 | 2 |
This time, we want to find all distinct locations with at least one person who belongs to a billable role.
In other words, find billable locations, where a billable location is a location that has at least one billable person, and a billable person is somebody whose role is billable. The word "distinct" is in there for reasons that will soon become clear.
joins
methodJust like with a belongs_to
association, we can [tell ActiveRecord to join a has_many
association][]:
Location.joins(:people)
which generates similar SQL (but with the primary and foreign keys flipped):
SELECT "locations".*
FROM "locations"
INNER JOIN "people"
ON "people"."location_id" = "locations"."id";
and retrieves data like this:
locations | people | |||||
---|---|---|---|---|---|---|
id | name | region_id | id | name | role_id | location_id |
1 | Boston | 1 | 1 | Wendell | 1 | 1 |
1 | Boston | 1 | 2 | Christie | 1 | 1 |
3 | Denver | 2 | 3 | Sandy | 1 | 3 |
2 | New York | 1 | 4 | Eve | 2 | 2 |
The output is similar, but the place where it gets interesting is that we end
up with Boston more than once. This is because we joined all the people that
belonged to a location onto the location, and there is more than one person
with a location_id
of 1, which corresponds to Boston.
If you're used to thinking about your associated collections more like a tree than a table, then this takes a little getting used to; but having everything in this two-dimensional world of joins and tables is actually very useful and lets us efficiently reason about and query our data.
[tell ActiveRecord to join a has_many
association]: http://guides.rubyonrails.org/active_record_querying.html#using-array-hash-of-named-associations
However, in addition to joining a direct association, ActiveRecord allows us to
go even further and join indirect associations. This is almost like doing a
has_many
/through
on the fly:
Location.joins(people: :role)
which generates SQL like this:
SELECT "locations".*
FROM "locations"
INNER JOIN "people"
ON "people"."location_id" = "locations"."id"
INNER JOIN "roles"
ON "roles"."id" = "people"."role_id";
locations | people | roles | |||||||
---|---|---|---|---|---|---|---|---|---|
id | name | region_id | id | name | role_id | location_id | id | name | billable |
1 | Boston | 1 | 1 | Wendell | 1 | 1 | 1 | Developer | t |
1 | Boston | 1 | 2 | Christie | 1 | 1 | 1 | Developer | t |
3 | Denver | 2 | 3 | Sandy | 1 | 3 | 1 | Developer | t |
2 | New York | 1 | 4 | Eve | 2 | 2 | 2 | Manager | f |
You can see that we are now gluing three tables together: we join roles
to
people
(which doesn't result in duplicated rows since each person only belongs
to one role), and then we join role-enhanced people to each location
.
So currently the result set would be four ActiveRecord Location objects, two of which are Boston; but we have all of the attributes of a location's people and their roles available to query against should we wish to.
where
methodWe can now filter the way we want to, with [ActiveRecord's where
method][]:
Location.joins(people: :role).where(roles: { billable: true })
which generates SQL like this:
SELECT "locations".*
FROM "locations"
INNER JOIN "people"
ON "people"."location_id" = "locations"."id"
INNER JOIN "roles"
ON "roles"."id" = "people"."role_id"
WHERE "roles"."billable" = 't';
and retrieves data like this:
id | name | region_id | id | name | role_id | location_id | id | name | billable |
---|---|---|---|---|---|---|---|---|---|
1 | Boston | 1 | 1 | Wendell | 1 | 1 | 1 | Developer | t |
1 | Boston | 1 | 2 | Christie | 1 | 1 | 1 | Developer | t |
3 | Denver | 2 | 3 | Sandy | 1 | 3 | 1 | Developer | t |
people | roles | locations | |||||||
---|---|---|---|---|---|---|---|---|---|
id | name | region_id | id | name | role_id | location_id | id | name | billable |
1 | Boston | 1 | 1 | Wendell | 1 | 1 | 1 | Developer | t |
1 | Boston | 1 | 2 | Christie | 1 | 1 | 1 | Developer | t |
3 | Denver | 2 | 3 | Sandy | 1 | 3 | 1 | Developer | t |
but, again, we're seeing Boston twice. Let's fix that.
[ActiveRecord's where
method]: http://guides.rubyonrails.org/active_record_querying.html#specifying-conditions-on-the-joined-tables
distinct
methodFortunately, ActiveRecord has a nifty method called distinct
, which throws in
the DISTINCT
keyword (at least, in Postgres; other databases might work
differently, but ActiveRecord will handle it):
Location.joins(people: :role).where(roles: { billable: true }).distinct
which generates SQL like this:
SELECT DISTINCT "locations".*
FROM "locations"
INNER JOIN "people"
ON "people"."location_id" = "locations"."id"
INNER JOIN "roles"
ON "roles"."id" = "people"."role_id"
WHERE "roles"."billable" = 't';
and retrieves data like this:
id | name | region_id |
---|---|---|
3 | Denver | 2 |
1 | Boston | 1 |
Mission accomplished!
Just like last time, we can now encapsulate our lovely query in a method:
class Location < ActiveRecord::Base
def self.billable
joins(people: :role).where(roles: { billable: true }).distinct
end
end
In general, we want to avoid letting these ActiveRecord deeper queries move outside of our ActiveRecord objects. We could have queries like this scattered throughout our controllers, for example, but we should move them into a nicely named method or scope and then call that method from everywhere else.
Even in this simple example, we would probably define Role.billable
, and use
that to define People.billable
, and then only reference People.billable
in
Location.billable
. But we'll leave that as an exercise for the reader, since
it's easier to see here all in one place.
Now, we want to order the billable locations by region name, then by location
name. Which means we need a Region
model:
class Location < ActiveRecord::Base
belongs_to :region
end
class Region < ActiveRecord::Base
has_many :locations
end
We can start with things we've already learned to get the ordering part working for all locations:
Location.joins(:region).merge(Region.order(:name)).order(:name)
which generates SQL like this:
SELECT "locations".*
FROM "locations"
INNER JOIN "regions"
ON "regions"."id" = "locations"."region_id"
ORDER BY "regions"."name" ASC, "locations"."name" ASC;
and retrieves data like this:
locations | regions | |||
---|---|---|---|---|
id | name | region_id | id | name |
1 | Boston | 1 | 1 | East |
2 | New York | 1 | 1 | East |
3 | Denver | 2 | 2 | West |
We can see it joins the tables correctly, and merges the scopes correctly.
Let's pull this logic into its own method, by_region_and_location_name
:
class Location < ActiveRecord::Base
def self.billable
joins(people: :role).where(roles: { billable: true }).distinct
end
def self.by_region_and_location_name
joins(:region).merge(Region.order(:name)).order(:name)
end
end
It gets tricky when we want to combine our two Location
scopes. If we try
something like this:
Location.billable.by_region_and_location_name
which generates SQL like this:
SELECT DISTINCT "locations".*
FROM "locations"
INNER JOIN "people"
ON "people"."location_id" = "locations"."id"
INNER JOIN "roles"
ON "roles"."id" = "people"."role_id"
INNER JOIN "regions"
ON "regions"."id" = "locations"."region_id"
WHERE "roles"."billable" = 't'
ORDER BY "regions"."name" ASC, "locations"."name" ASC;
then Postgres throws an error:
PG::InvalidColumnReference: ERROR: for SELECT DISTINCT, ORDER BY expressions must
appear in select list
LINE 1: ...gion_id" WHERE "roles"."billable" = 't' ORDER BY "regions"....
This can be frustrating, but there are good reasons for it. Basically, it's an
order of operations issue: we have to make sure that when we eliminate some
rows with DISTINCT
, we don't lose important information that we needed for
ordering. So we need to be explicit about how to end up with distinct things
first, and then order them.
from
methodTo resolve this, we're first going to use [ActiveRecord's from
method][] to
create a sub-query that returns distinct billable locations:
Location.from(Location.billable, :locations)
which generates SQL like this:
SELECT "locations".*
FROM (
SELECT DISTINCT "locations".*
FROM "locations"
INNER JOIN "people"
ON "people"."location_id" = "locations"."id"
INNER JOIN "roles"
ON "roles"."id" = "people"."role_id"
WHERE "roles"."billable" = 't'
) locations;
and retrieves data like this:
id | name | region_id |
---|---|---|
3 | Denver | 2 |
1 | Boston | 1 |
The second argument to the from
method is the alias that we want for our
virtual table that results from the sub-query; we specify the same old
conventional name so that subsequent queries will get what they expect.
[ActiveRecord's from
method]: http://api.rubyonrails.org/classes/ActiveRecord/QueryMethods.html#method-i-from
Finally, we can put it all together:
Location.from(Location.billable, :locations).by_region_and_location_name
which generates SQL like this:
SELECT "locations".*
FROM (
SELECT DISTINCT "locations".*
FROM "locations"
INNER JOIN "people"
ON "people"."location_id" = "locations"."id"
INNER JOIN "roles"
ON "roles"."id" = "people"."role_id"
WHERE "roles"."billable" = 't'
) locations
INNER JOIN "regions"
ON "regions"."id" = "locations"."region_id"
ORDER BY "regions"."name" ASC, "locations"."name" ASC;
and retrieves data like this:
locations | regions | |||
---|---|---|---|---|
id | name | region_id | id | name |
1 | Boston | 1 | 1 | East |
3 | Denver | 2 | 2 | West |
Success!
We very frequently come across codebases that do this kind of work using plain old Ruby, and therefore make lots of extra database hits and use lots of unnecessary memory building unnecessary ActiveRecord objects.
Your database is amazingly good at querying and ordering your data: don't
reinvent the wheel if you don't have to. Just about anything that you can do
with Ruby's Enumerable
, the database can do better.
And that's it for querying has_many
associations! We have an exercise for you
to practice some of these techniques, and then we'll see you back for our next
adventure: Custom Joins with ActiveRecord.
Good luck!