We recently came across an interesting problem, which was discussed in a previous post, Postgres Window Functions:
We want to get each post’s three most recent comments.
As discussed, you can’t use simple eager loading:
Post.order(created_at: :desc).limit(5).includes(:comments)
This will load every comment for each post. When there are many comments per post, this quickly becomes unacceptable.
Starting Slow
It’s frequently easiest to start with a slow implementation and make it faster when necessary. In many cases, the slower (and possibly simpler) implementation will work just fine, and it’s best to deploy it as-is. Let’s look at a naive implementation of our list of posts and comments:
class PostsController < ApplicationController
def index
@posts = Post.order(created_at: :desc).limit(5)
end
end
class Post < ActiveRecord::Base
has_many :comments, dependent: :destroy
def latest_comments
comments.order(created_at: :desc).limit(3)
end
end
This will frequently do fine, but causes N+1 queries, which will look like this in your log:
Started GET "/" for 127.0.0.1 at 2014-09-18 11:36:18 -0400
Processing by PostsController#index as HTML
Post Load (0.4ms) SELECT "posts".* FROM "posts"
ORDER BY "posts"."created_at" DESC LIMIT 5
Comment Load (0.2ms) SELECT "comments".* FROM "comments"
WHERE "comments"."post_id" = $1
ORDER BY "comments"."created_at" DESC
LIMIT 3 [["post_id", 27]]
Comment Load (0.3ms) SELECT "comments".* FROM "comments"
WHERE "comments"."post_id" = $1
ORDER BY "comments"."created_at" DESC
LIMIT 3 [["post_id", 28]]
Comment Load (0.2ms) SELECT "comments".* FROM "comments"
WHERE "comments"."post_id" = $1
ORDER BY "comments"."created_at" DESC
LIMIT 3 [["post_id", 29]]
...
If you’re using New Relic like we do, you’ll know a slow transaction has an N+1 problem when it shows many queries to the same model or table in the transaction log. Once performance starts to suffer, you’ll want to consolidate those queries.
In the previous post, we described how you could use a Postgres Window Function to find the comments you want in one query:
SELECT * FROM (
SELECT comments.*, dense_rank() OVER (
PARTITION BY comments.post_id
ORDER BY comments.created_at DESC
) AS comment_rank
) AS ranked_comments
WHERE comment_rank < 4;
However, how can we plug this query into ActiveRecord such that we can use the data in our views?
It’s actually fairly easy. You need two new objects: a Query Object and a Decorator. Let’s refactor to introduce these objects, and then we’ll plug in our query.
The Query Object
We can perform the Extract Class refactoring and create a Feed
model:
class Feed
def initialize(posts:)
@posts = posts.order(created_at: :desc).limit(5)
end
def posts
@posts
end
end
class PostsController < ApplicationController
def index
@feed = Feed.new(posts: Post.all)
end
end
The Decorator
We can use SimpleDelegator
to create a quick decorator class for Post
:
class PostWithLatestComments < SimpleDelegator
def latest_comments
comments.order(created_at: :desc).limit(3)
end
end
We can apply this decorator to each Post
in the Feed
:
class Feed
def posts
@posts.map { |post| PostWithLatestComments.new(post) }
end
end
The SQL Query
At this point, we’ve done nothing except to introduce two new classes to our system. However, we’ve provided ourselves an opportunity.
We frequently use Query Objects to wrap up complex SQL without polluting models. In addition to encapsulating SQL, though, they can also hold context, empowering objects to remember the query from whence they came. We’ll use this property of query objects to plug our SQL into our application.
First, we’ll use the above SQL query to find the comments relevant to our posts:
# feed.rb
def comments
Comment.
select("*").
from(Arel.sql("(#{ranked_comments_query}) AS ranked_comments")).
where("comment_rank <= 3")
end
def ranked_comments_query
Comment.where(post_id: @posts.map(&:id)).select(<<-SQL).to_sql
comments.*,
dense_rank() OVER (
PARTITION BY comments.post_id
ORDER BY comments.created_at DESC
) AS comment_rank
SQL
end
Then, we’ll group those comments by post_id
into a Hash
:
class Feed
def initialize(posts:)
@posts = posts.order(created_at: :desc).limit(5)
@comment_cache = build_comment_cache
end
# ...
private
def build_comment_cache
comments.group_by(&:post_id)
end
# ...
end
Now, we pass that Hash
to our decorator:
# feed.rb
def posts
@posts.map { |post| PostWithLatestComments.new(post, @comment_cache) }
end
class PostWithLatestComments < SimpleDelegator
def initialize(post, comments_by_post_id)
super(post)
@comments_by_post_id = comments_by_post_id
end
def latest_comments
@comments_by_post_id[id] || []
end
end
The Result
Our Feed
class is now smart enough to perform two SQL queries:
- One query to
posts
to find the posts we care about. - One query to
comments
(using Postgres Window Functions) to find the latest three comments for each post.
It then decorates each post, providing the preloaded Hash
of comments to the
decorator. This allows the decorated posts to find their latest three comments
without performing an additional query.
The finished Feed
class looks like this:
class Feed
def initialize(posts:)
@posts = posts.order(created_at: :desc).limit(5)
@comment_cache = build_comment_cache
end
def posts
@posts.map { |post| PostWithLatestComments.new(post, @comment_cache) }
end
private
def build_comment_cache
comments.group_by(&:post_id)
end
def comments
Comment.
select("*").
from(Arel.sql("(#{ranked_comments_query}) AS ranked_comments")).
where("comment_rank <= 3")
end
def ranked_comments_query
Comment.where(post_id: @posts.map(&:id)).select(<<-SQL).to_sql
comments.*,
dense_rank() OVER (
PARTITION BY comments.post_id
ORDER BY comments.created_at DESC
) AS comment_rank
SQL
end
end
As you can see, most of the logic is concerned with generating that SQL query, and the machinery for plugging the results into our ActiveRecord models is very lightweight.
At this point, requests in our log look something like this:
Started GET "/" for 127.0.0.1 at 2014-09-18 13:53:39 -0400
Processing by PostsController#index as HTML
Post Load (0.4ms) SELECT "posts".* FROM "posts"
ORDER BY "posts"."created_at" DESC
LIMIT 5
Comment Load (0.5ms) SELECT * FROM (
SELECT comments.*,
dense_rank() OVER (
PARTITION BY comments.post_id
ORDER BY comments.created_at DESC
) AS comment_rank
FROM "comments"
WHERE "comments"."post_id" IN (154, 153)
) AS ranked_comments WHERE (comment_rank <= 3)
You can use this approach for many situations where it’s difficult to use simple eager loading.
What’s next
If you’re looking for more ActiveRecord magic, be sure to check out our Advanced ActiveRecord Querying course on Upcase, or learn how the query in this post works by reading about Postgres Window Functions.