For the purpose of this article, assume we are building a simple blog app – our users can write blog posts and make comments on them. We’re interested in the associations, or relationships, between the models. A single blog post belongs to a user and has many comments. Any given comment belongs to a user and a blog post. Finally, a user has many posts and has many comments.
Now, say we want to display a user with a list of their post titles and all of the comments made on those posts. In such a use case, posts are nested under the user and comments are nested under posts.
User -> Post -> Comment
We could start off by writing something along the lines of:
user = Blog.Repo.get(Blog.User, ^user_id)
# or via query expression
user = Blog.Repo.one from user in Blog.User,
where: user.id == ^user_id
Trying to access posts on this user will give us an error:
cannot encode association :posts from Blog.User to JSON because the
association was not loaded. Please make sure you have preloaded the
association or remove it from the data to be encoded
It is letting us know that we must explicitly preload the data that we are accessing. This is good, as it prevents accidental N+1 queries from negatively affecting our performance. We can preload our associations with:
user = Blog.Repo.one from user in Blog.User,
where: user.id == ^user_id,
preload: [posts: :comments]
Our query works and we can access user’s posts and related comments. But, we are generating three queries – one to find our user, one to load its posts, and one for the comments.
SELECT u0."id", u0."name", u0."email", u0."inserted_at", u0."updated_at"
FROM "users" AS u0
WHERE (u0."id" = $1) [1]
SELECT p0."id", p0."title", p0."body", p0."user_id", p0."inserted_at", p0."updated_at", p0."user_id"
FROM "posts" AS p0
WHERE (p0."user_id" = $1)
ORDER BY p0."user_id" [1]
SELECT c0."id", c0."body", c0."user_id", c0."post_id", c0."inserted_at", c0."updated_at", c0."post_id"
FROM "comments" AS c0
WHERE (c0."post_id" = $1)
ORDER BY c0."post_id" [1]
This isn’t an N+1 problem, since it will only generate a query per preloaded association and not per each record. However, we’ve been taught that the less queries we make, the better it is. Could we have one query to load all the users plus their posts and comments? Yes, we can achieve that by using SQL joins and preloading our associations directly into the structs:
user = Blog.Repo.one from user in Blog.User,
where: user.id == ^user_id,
left_join: posts in assoc(user, :posts),
left_join: comments in assoc(posts, :comments),
preload: [posts: {posts, comments: comments}]
Success! We preloaded all of our data using SQL joins by generating a single query:
SELECT u0."id", u0."name", u0."email", u0."inserted_at", u0."updated_at", p1."id", p1."title", p1."body", p1."user_id", p1."inserted_at", p1."updated_at", c2."id", c2."body", c2."user_id", c2."post_id", c2."inserted_at", c2."updated_at" FROM "users" AS u0 LEFT OUTER JOIN "posts" AS p1 ON p1."user_id" = u0."id" LEFT OUTER JOIN "comments" AS c2 ON c2."post_id" = p1."id" WHERE (u0."id" = $1) [1]
However, it can be a bit cryptic trying to figure out what is going on in that Ecto expression. Let’s break it down:
- First, we create a binding for the user’s data, in this case
user
(it can be named anything, likeu
orfoo
as long as it’s consistent in the query expression). - Next, we tell our
left_join
clause what tables to join – join theposts
table using theuser_id
foreign key, then join thecomments
table using thepost_id
foreign key. - Finally, we preload the joined data into our
posts
andcomments
struct bindings. Again, these can be named anything, but must match throughout the entire expression.
The preload syntax for nested associations looks a bit odd at first, but it really is just a list of key-value pairs (where value can be an atom, a tuple or a list). Imagine if our users also had multiple file uploads and we wanted to preload those too, we’d write:
preload: [posts: {posts, comments: comments}, uploads: uploads]
Ecto provides two different types of syntax for queries: keyword-based and
pipe-based (also called macro-based). So far, we’ve been using the keyword-based
query syntax. Much of Elixir code is written with the help of the pipe |>
operator, thus we’ll re-write our query using pipe-based expression which
favors the pipe.
user = Blog.User
|> where([user], user.id == ^user_id)
|> join(:left, [user], posts in assoc(user, :posts))
|> join(:left, [user, posts], comments in assoc(posts, :comments))
|> preload([user, posts, comments], [posts: {posts, comments: comments}])
|> Blog.Repo.one
Looks nice, doesn’t it? But there’s a slight disadvantage of having to specify the bindings for every operation. Because we are chaining individual functions, the names of the bindings don’t have to be consistent across refinement queries, and we can write our expression as:
user = Blog.User
|> where([user], user.id == ^user_id)
|> join(:left, [u], _ in assoc(u, :posts))
|> join(:left, [_, posts], _ in assoc(posts, :comments))
|> preload([_, p, c], [posts: {p, comments: c}])
|> Blog.Repo.one
One last thing worth pointing out about bindings is that they are positional.
We cannot re-arrange them like [comments, user, posts]
. However, if we only
need the first binding, we don’t have to specify the rest. Thus, in our join
queries we only specify bindings for user and posts. Further, a ...
operator
allows us to avoid listing out bindings which we don’t care about. For example,
say we only needed the first and the last binding in our query expression, we
could write our bindings as [user, ..., comments]
, independently of how many
bindings are in between (can be zero or many). Since we don’t need a binding for
the user in our preload
, we can write it as:
preload([..., posts, comments], [posts: {posts, comments: comments}])
You can read more about preloading associations in the official Ecto docs. This article was written using Ecto 2.1.2.