Video

Want to see the full-length video right now for free?

Notes

In this lesson, we're going to learn advanced techniques for querying our has_many associations.

Our example's domain model

We'll start with the same domain model that we had in the previous lesson, but this time add Location into the mix:

class Person < ActiveRecord::Base
  belongs_to :location
  belongs_to :role
end

class Role < ActiveRecord::Base
  has_many :people
end

class Location < ActiveRecord::Base
  has_many :people
end

Locations and people have the same kind of 1-N relationship that we previously had between roles and people.

Here's the current state of affairs in our database:

Role.all
id name billable
1 Developer t
2 Manager f
3 Unassigned f
Location.all
id name billable
1 Boston 1
2 New York 1
3 Denver 2
Person.all
id name role_id location_id
1 Wendell 1 1
2 Christie 1 1
3 Sandy 1 3
4 Eve 2 2

Our first challenge

This time, we want to find all distinct locations with at least one person who belongs to a billable role.

In other words, find billable locations, where a billable location is a location that has at least one billable person, and a billable person is somebody whose role is billable. The word "distinct" is in there for reasons that will soon become clear.

Gluing tables together with the joins method

Just like with a belongs_to association, we can [tell ActiveRecord to join a has_many association][]:

Location.joins(:people)

which generates similar SQL (but with the primary and foreign keys flipped):

SELECT "locations".*
FROM "locations"
INNER JOIN "people"
  ON "people"."location_id" = "locations"."id";

and retrieves data like this:

locations people
id name region_id id name role_id location_id
1 Boston 1 1 Wendell 1 1
1 Boston 1 2 Christie 1 1
3 Denver 2 3 Sandy 1 3
2 New York 1 4 Eve 2 2

The output is similar, but the place where it gets interesting is that we end up with Boston more than once. This is because we joined all the people that belonged to a location onto the location, and there is more than one person with a location_id of 1, which corresponds to Boston.

If you're used to thinking about your associated collections more like a tree than a table, then this takes a little getting used to; but having everything in this two-dimensional world of joins and tables is actually very useful and lets us efficiently reason about and query our data.

[tell ActiveRecord to join a has_many association]: http://guides.rubyonrails.org/active_record_querying.html#using-array-hash-of-named-associations

On the fly has-many-through

However, in addition to joining a direct association, ActiveRecord allows us to go even further and join indirect associations. This is almost like doing a has_many/through on the fly:

Location.joins(people: :role)

which generates SQL like this:

SELECT "locations".*
FROM "locations"
INNER JOIN "people"
  ON "people"."location_id" = "locations"."id"
INNER JOIN "roles"
  ON "roles"."id" = "people"."role_id";
locations people roles
id name region_id id name role_id location_id id name billable
1 Boston 1 1 Wendell 1 1 1 Developer t
1 Boston 1 2 Christie 1 1 1 Developer t
3 Denver 2 3 Sandy 1 3 1 Developer t
2 New York 1 4 Eve 2 2 2 Manager f

You can see that we are now gluing three tables together: we join roles to people (which doesn't result in duplicated rows since each person only belongs to one role), and then we join role-enhanced people to each location.

So currently the result set would be four ActiveRecord Location objects, two of which are Boston; but we have all of the attributes of a location's people and their roles available to query against should we wish to.

Filtering with the where method

We can now filter the way we want to, with [ActiveRecord's where method][]:

Location.joins(people: :role).where(roles: { billable: true })

which generates SQL like this:

SELECT "locations".*
FROM "locations"
INNER JOIN "people"
  ON "people"."location_id" = "locations"."id"
INNER JOIN "roles"
  ON "roles"."id" = "people"."role_id"
WHERE "roles"."billable" = 't';

and retrieves data like this:

id name region_id id name role_id location_id id name billable
1 Boston 1 1 Wendell 1 1 1 Developer t
1 Boston 1 2 Christie 1 1 1 Developer t
3 Denver 2 3 Sandy 1 3 1 Developer t
people roles locations
id name region_id id name role_id location_id id name billable
1 Boston 1 1 Wendell 1 1 1 Developer t
1 Boston 1 2 Christie 1 1 1 Developer t
3 Denver 2 3 Sandy 1 3 1 Developer t

but, again, we're seeing Boston twice. Let's fix that.

[ActiveRecord's where method]: http://guides.rubyonrails.org/active_record_querying.html#specifying-conditions-on-the-joined-tables

Removing redundancies with the distinct method

Fortunately, ActiveRecord has a nifty method called distinct, which throws in the DISTINCT keyword (at least, in Postgres; other databases might work differently, but ActiveRecord will handle it):

Location.joins(people: :role).where(roles: { billable: true }).distinct

which generates SQL like this:

SELECT DISTINCT "locations".*
FROM "locations"
INNER JOIN "people"
  ON "people"."location_id" = "locations"."id"
INNER JOIN "roles"
  ON "roles"."id" = "people"."role_id"
WHERE "roles"."billable" = 't';

and retrieves data like this:

id name region_id
3 Denver 2
1 Boston 1

Mission accomplished!

Encapsulation of queries within ActiveRecord objects

Just like last time, we can now encapsulate our lovely query in a method:

class Location < ActiveRecord::Base
  def self.billable
    joins(people: :role).where(roles: { billable: true }).distinct
  end
end

In general, we want to avoid letting these ActiveRecord deeper queries move outside of our ActiveRecord objects. We could have queries like this scattered throughout our controllers, for example, but we should move them into a nicely named method or scope and then call that method from everywhere else.

Even in this simple example, we would probably define Role.billable, and use that to define People.billable, and then only reference People.billable in Location.billable. But we'll leave that as an exercise for the reader, since it's easier to see here all in one place.

Our next challenge

Now, we want to order the billable locations by region name, then by location name. Which means we need a Region model:

class Location < ActiveRecord::Base
  belongs_to :region
end

class Region < ActiveRecord::Base
  has_many :locations
end

We can start with things we've already learned to get the ordering part working for all locations:

Location.joins(:region).merge(Region.order(:name)).order(:name)

which generates SQL like this:

SELECT "locations".*
FROM "locations"
INNER JOIN "regions"
  ON "regions"."id" = "locations"."region_id"
ORDER BY "regions"."name" ASC, "locations"."name" ASC;

and retrieves data like this:

locations regions
id name region_id id name
1 Boston 1 1 East
2 New York 1 1 East
3 Denver 2 2 West

We can see it joins the tables correctly, and merges the scopes correctly. Let's pull this logic into its own method, by_region_and_location_name:

class Location < ActiveRecord::Base
  def self.billable
    joins(people: :role).where(roles: { billable: true }).distinct
  end

  def self.by_region_and_location_name
    joins(:region).merge(Region.order(:name)).order(:name)
  end
end

The problem with combining scopes

It gets tricky when we want to combine our two Location scopes. If we try something like this:

Location.billable.by_region_and_location_name

which generates SQL like this:

SELECT DISTINCT "locations".*
FROM "locations"
INNER JOIN "people"
  ON "people"."location_id" = "locations"."id"
INNER JOIN "roles"
  ON "roles"."id" = "people"."role_id"
INNER JOIN "regions"
  ON "regions"."id" = "locations"."region_id"
WHERE "roles"."billable" = 't'
ORDER BY "regions"."name" ASC, "locations"."name" ASC;

then Postgres throws an error:

PG::InvalidColumnReference: ERROR:  for SELECT DISTINCT, ORDER BY expressions must
appear in select list
LINE 1: ...gion_id" WHERE "roles"."billable" = 't'  ORDER BY "regions"....

This can be frustrating, but there are good reasons for it. Basically, it's an order of operations issue: we have to make sure that when we eliminate some rows with DISTINCT, we don't lose important information that we needed for ordering. So we need to be explicit about how to end up with distinct things first, and then order them.

Sub-querying with the from method

To resolve this, we're first going to use [ActiveRecord's from method][] to create a sub-query that returns distinct billable locations:

Location.from(Location.billable, :locations)

which generates SQL like this:

SELECT "locations".*
FROM (
  SELECT DISTINCT "locations".*
  FROM "locations"
  INNER JOIN "people"
    ON "people"."location_id" = "locations"."id"
  INNER JOIN "roles"
    ON "roles"."id" = "people"."role_id"
  WHERE "roles"."billable" = 't'
) locations;

and retrieves data like this:

id name region_id
3 Denver 2
1 Boston 1

The second argument to the from method is the alias that we want for our virtual table that results from the sub-query; we specify the same old conventional name so that subsequent queries will get what they expect.

[ActiveRecord's from method]: http://api.rubyonrails.org/classes/ActiveRecord/QueryMethods.html#method-i-from

Final solution

Finally, we can put it all together:

Location.from(Location.billable, :locations).by_region_and_location_name

which generates SQL like this:

SELECT "locations".*
FROM (
  SELECT DISTINCT "locations".*
  FROM "locations"
  INNER JOIN "people"
    ON "people"."location_id" = "locations"."id"
  INNER JOIN "roles"
    ON "roles"."id" = "people"."role_id"
  WHERE "roles"."billable" = 't'
) locations
INNER JOIN "regions"
  ON "regions"."id" = "locations"."region_id"
ORDER BY "regions"."name" ASC, "locations"."name" ASC;

and retrieves data like this:

locations regions
id name region_id id name
1 Boston 1 1 East
3 Denver 2 2 West

Success!

Databases are amazingly good at what they do

We very frequently come across codebases that do this kind of work using plain old Ruby, and therefore make lots of extra database hits and use lots of unnecessary memory building unnecessary ActiveRecord objects.

Your database is amazingly good at querying and ordering your data: don't reinvent the wheel if you don't have to. Just about anything that you can do with Ruby's Enumerable, the database can do better.

And that's it for querying has_many associations! We have an exercise for you to practice some of these techniques, and then we'll see you back for our next adventure: Custom Joins with ActiveRecord.

Good luck!