Want to see the full-length video right now for free?
To round off our discussion of Advanced ActiveRecord querying, we're going to discuss aggregations.
We'll be doing some counts, minimums, maximums, averages, and sums; how to perform these aggregations after grouping rows according to certain criteria; and how to filter based on the results of these calculations, all via the database (rather than doing it ourself with Ruby).
Our models will be the same as in the previous lesson, but we're going to add a salary
column on the people
table:
class Person < ActiveRecord::Base
belongs_to :manager, class_name: "Person", foreign_key: :manager_id
has_many :employees, class_name: "Person", foreign_key: :manager_id
belongs_to :location
belongs_to :role
end
People.all
id | name | role_id | location_id | manager_id | salary |
---|---|---|---|---|---|
1 | Eve | 2 | 2 | NULL | 50000 |
2 | Bill | 2 | 1 | NULL | 40000 |
3 | Wendell | 1 | 1 | 1 | 35000 |
4 | Christie | 1 | 1 | 1 | 30000 |
5 | Sandy | 1 | 3 | 2 | 45000 |
count
methodLet's start out really simple: count the number of people.
Person.count
SELECT COUNT(*)
FROM "people";
count |
---|
5 |
average
methodSlightly more complicated: find the average salary across all people.
Person.average(:salary)
SELECT AVG(salary)
FROM "people";
avg |
---|
40000.000000000000 |
You can also combine these methods with the other queries that we've been talking about:
Person.
joins(:role).
where(roles: { billable: false }).
sum(:salary)
SELECT SUM(people.salary)
FROM "people"
INNER JOIN "roles"
ON "roles"."id" = "people"."role_id"
WHERE "roles"."billable" = 'f';
sum |
---|
90000 |
group
methodLet's next find the average salary by role. We often see this kind of work done in pure Ruby, which wastes database hits and a lot of memory.
Instead, a neat thing you can do if you want the count or average of a bunch of things but broken down by category, is use [ActiveRecord's group
method][]:
Person.
joins(:role).
group("roles.name").
average(:salary)
which generates SQL like this:
SELECT roles.name, AVG(people.salary)
FROM "people"
INNER JOIN "roles"
ON "roles.id" = "people"."role_id"
GROUP BY roles.name;
and retrieves data like this:
name | avg |
---|---|
Manager | 45000.000000000000 |
Developer | 36666.666666666667 |
The database does all the work for us in one query.
[ActiveRecord's group
method]: http://guides.rubyonrails.org/active_record_querying.html#group
Next, let's count employees for each person:
Person.
joins(:employees).
group("people.name").
count("employees_people.id")
which generates SQL like this:
SELECT people.name, COUNT(employees_people.id)
FROM "people"
INNER JOIN "people" "employees_people"
ON "employees_people"."manager_id" = "people"."id"
GROUP BY people.name;
name | count |
---|---|
Eve | 2 |
Bill | 1 |
Wait, count employees_people.id
? Where did that come from? Well, since the
:employees
association is not named conventionally (i.e., it's not named after
the model), ActiveRecord makes up an alias for the table that it can guarantee
won't be ambiguous.
This naming is consistent, so you can rely on it without worrying too much that it will break your queries some day.
It's important to realize that the return value when you use group
is a hash, with the keys being the unique values you are grouping by, and the values are the aggregation.
count
, sum
, average
, etc just returned scalar values.
In either case, once you go down the road of aggregating, you can no longer continue to chain further relation methods on, since you've left ActiveRecord land.
However, we notice that in our last result, we don't have entries for people who have no employees. Why is that? Well, [it's the same issue we've had previously][] -- inner joins don't return rows that have no matches in the joined set.
Our solution will be the same, a custom left join:
Person.
joins("LEFT JOIN people employees ON employees.manager_id = people.id").
group("people.name").
count("employees.id")
which generates SQL like this:
SELECT people.name, COUNT(employees.id)
FROM "people"
LEFT JOIN people employees
ON employees.manager_id = people.id GROUP BY people.name;
and retrieves the data that we want:
name | count |
---|---|
Christie | 0 |
Sandy | 0 |
Wendell | 0 |
Eve | 2 |
Bill | 1 |
This also has the benefit of allowing us to choose our own name for the relation, rather than relying on ActiveRecord's conventional alias.
Let's get a little crazy and find people with lower than average salaries at their location:
Person.
joins(
"INNER JOIN (" +
Person.
select("location_id, AVG(salary) as average").
group("location_id").
to_sql
") salaries " \
"ON salaries.location_id = people.location_id"
).
where("people.salary < salaries.average")
Tricky! Let's jump into the middle of this query first,
Person.
select("location_id, AVG(salary) as average").
group("location_id").
to_sql
This is essentially the same thing as
Person.group("location_id").average(:salary)
, but this way we can call
to_sql
on it (since it is still a relation, rather than the scalar result).
We now have a virtual table of the average salary at each location, aliased as
salaries
, which we join on to the people
table (using their location_id
to
match up rows).
Finally, with the new columns available, we can use a simple where
clause to
filter our results down. The resulting SQL looks like this:
SELECT "people".*
FROM "people"
INNER JOIN (
SELECT location_id, AVG(salary) as average
FROM "people"
GROUP BY "people"."location_id"
) salaries
ON salaries.location_id = people.location_id
WHERE (people.salary < salaries.average)
and retrieves data like this:
people | salaries | ||||||
---|---|---|---|---|---|---|---|
id | name | role_id | location_id | manager_id | salary | location_id | average |
4 | Christie | 1 | 1 | 1 | 30000 | 1 | 35000.000000000000 |
Finally, let's find the highest salaried people, ordered by name. It would be relatively straightforward to find the highest salaried people, ordered by salary:
Person.order("salary DESC").limit(3)
but if we try to order
that by name, we'll be in trouble; and then it's tempting to fall back into Ruby and sort
. But again, we want the database to do as much work as possible.
So, we can use a window function:
Person.
joins(
"INNER JOIN (" +
Person.
select("id, rank() OVER (ORDER BY salary DESC)").
to_sql
") salaries " \
"ON salaries.id = people.id"
).
where("salaries.rank <= 3").
order(:name)
Again, we're creating a virtual table that has a person's ID and rank (by salary), and then joining that onto the people table. We can then filter and order as usual. This generates the following SQL:
SELECT "people".*
FROM "people"
INNER JOIN (
SELECT id, rank() OVER (ORDER BY salary DESC)
FROM "people"
) salaries
ON salaries.id = people.id
WHERE (salaries.rank <= 3)
ORDER BY "people"."name" ASC;
and retrieves the data we want:
people | salaries | ||||||
---|---|---|---|---|---|---|---|
id | name | role_id | location_id | manager_id | salary | id | rank |
2 | Bill | 2 | 1 | NULL | 40000 | 2 | 3 |
1 | Eve | 2 | 3 | NULL | 50000 | 1 | 1 |
5 | Sandy | 1 | 3 | 2 | 45000 | 5 | 2 |
Success!
So, it turns out that for almost every common use case where you need to query your data, ActiveRecord will either have a built-in method, or it will give you a way to plug in your own SQL.
We have a final exercise for you to get a taste of aggregations, but then that's it! We hope you've enjoyed learning these advanced techniques for querying your ActiveRecord models, and we'll see you in the forums!