Preloading Nested Associations with Ecto

For the purpose of this article, assume we are building a simple blog app – our users can write blog posts and make comments on them. We’re interested in the associations, or relationships, between the models. A single blog post belongs to a user and has many comments. Any given comment belongs to a user and a blog post. Finally, a user has many posts and has many comments.

Now, say we want to display a user with a list of their post titles and all of the comments made on those posts. In such a use case, posts are nested under the user and comments are nested under posts.

User -> Post -> Comment

We could start off by writing something along the lines of:

user = Blog.Repo.get(Blog.User, ^user_id)

# or via query expression

user = Blog.Repo.one from user in Blog.User,
  where: user.id == ^user_id

Trying to access posts on this user will give us an error:

cannot encode association :posts from Blog.User to JSON because the
association was not loaded. Please make sure you have preloaded the
association or remove it from the data to be encoded

It is letting us know that we must explicitly preload the data that we are accessing. This is good, as it prevents accidental N+1 queries from negatively affecting our performance. We can preload our associations with:

user = Blog.Repo.one from user in Blog.User,
  where: user.id == ^user_id,
  preload: [posts: :comments]

Our query works and we can access user’s posts and related comments. But, we are generating three queries – one to find our user, one to load its posts, and one for the comments.

SELECT u0."id", u0."name", u0."email", u0."inserted_at", u0."updated_at"
  FROM "users" AS u0
  WHERE (u0."id" = $1) [1]
SELECT p0."id", p0."title", p0."body", p0."user_id", p0."inserted_at", p0."updated_at", p0."user_id"
  FROM "posts" AS p0
  WHERE (p0."user_id" = $1)
  ORDER BY p0."user_id" [1]
SELECT c0."id", c0."body", c0."user_id", c0."post_id", c0."inserted_at", c0."updated_at", c0."post_id"
  FROM "comments" AS c0
  WHERE (c0."post_id" = $1)
  ORDER BY c0."post_id" [1]

This isn’t an N+1 problem, since it will only generate a query per preloaded association and not per each record. However, we’ve been taught that the less queries we make, the better it is. Could we have one query to load all the users plus their posts and comments? Yes, we can achieve that by using SQL joins and preloading our associations directly into the structs:

user = Blog.Repo.one from user in Blog.User,
  where: user.id == ^user_id,
  left_join: posts in assoc(user, :posts),
  left_join: comments in assoc(posts, :comments),
  preload: [posts: {posts, comments: comments}]

Success! We preloaded all of our data using SQL joins by generating a single query:

SELECT u0."id", u0."name", u0."email", u0."inserted_at", u0."updated_at", p1."id", p1."title", p1."body", p1."user_id", p1."inserted_at", p1."updated_at", c2."id", c2."body", c2."user_id", c2."post_id", c2."inserted_at", c2."updated_at" FROM "users" AS u0 LEFT OUTER JOIN "posts" AS p1 ON p1."user_id" = u0."id" LEFT OUTER JOIN "comments" AS c2 ON c2."post_id" = p1."id" WHERE (u0."id" = $1) [1]

However, it can be a bit cryptic trying to figure out what is going on in that Ecto expression. Let’s break it down:

First, we create a binding for the user’s data, in this case user (it can be named anything, like u or foo as long as it’s consistent in the query expression).
Next, we tell our left_join clause what tables to join – join the posts table using the user_id foreign key, then join the comments table using the post_id foreign key.
Finally, we preload the joined data into our posts and comments struct bindings. Again, these can be named anything, but must match throughout the entire expression.

The preload syntax for nested associations looks a bit odd at first, but it really is just a list of key-value pairs (where value can be an atom, a tuple or a list). Imagine if our users also had multiple file uploads and we wanted to preload those too, we’d write:

preload: [posts: {posts, comments: comments}, uploads: uploads]

Ecto provides two different types of syntax for queries: keyword-based and pipe-based (also called macro-based). So far, we’ve been using the keyword-based query syntax. Much of Elixir code is written with the help of the pipe |> operator, thus we’ll re-write our query using pipe-based expression which favors the pipe.

user = Blog.User
|> where([user], user.id == ^user_id)
|> join(:left, [user], posts in assoc(user, :posts))
|> join(:left, [user, posts], comments in assoc(posts, :comments))
|> preload([user, posts, comments], [posts: {posts, comments: comments}])
|> Blog.Repo.one

Looks nice, doesn’t it? But there’s a slight disadvantage of having to specify the bindings for every operation. Because we are chaining individual functions, the names of the bindings don’t have to be consistent across refinement queries, and we can write our expression as:

user = Blog.User
|> where([user], user.id == ^user_id)
|> join(:left, [u], _ in assoc(u, :posts))
|> join(:left, [_, posts], _ in assoc(posts, :comments))
|> preload([_, p, c], [posts: {p, comments: c}])
|> Blog.Repo.one

One last thing worth pointing out about bindings is that they are positional. We cannot re-arrange them like [comments, user, posts]. However, if we only need the first binding, we don’t have to specify the rest. Thus, in our join queries we only specify bindings for user and posts. Further, a ... operator allows us to avoid listing out bindings which we don’t care about. For example, say we only needed the first and the last binding in our query expression, we could write our bindings as [user, ..., comments], independently of how many bindings are in between (can be zero or many). Since we don’t need a binding for the user in our preload, we can write it as:

preload([..., posts, comments], [posts: {posts, comments: comments}])

You can read more about preloading associations in the official Ecto docs. This article was written using Ecto 2.1.2.

Sign up to receive a weekly recap from thoughtbot

About thoughtbot