---
title: Why Ecto's Way of Storing Embedded Lists of Maps Makes Querying Hard
teaser: 'You can use PostgreSQL''s JSON data types to store embedded data on Ecto
  models. But the way Ecto tells you to store it might not be the best way.

  '
tags: elixir,ecto,postgresql,json
author: Jon Yurek
published_on: 2018-02-16
---

## The Problem

We have a bunch of embedded models on an Ecto model. They're self-contained and
don't need to be normalized or unique. They need to be flexible and lightweight
and not have a lot of overhead in either UX or code.

Here's a migration that defines the `embeds_many` column according to how [Ecto
tells you to].

```elixir
defmodule Grocer.Repo.Migrations.CreateFoods do
  use Ecto.Migration

  def change do
    create table("foods") do
      add :name, :string
      add :upc, :string
      add :ingredients, {:array, :map}, default: []
    end
  end
end
```

And the associated schemas:

```elixir
schema "foods" do
  field :name, :string
  field :upc, :string
  embeds_many :ingredients, Grocer.Ingredient
end
```

```elixir
embedded_schema do
  field :name, :string
  field :mg, :integer
end
```

Pretty standard stuff. Anyone who's defined an `embeds_many` has seen that
before. I've found that users of ORMs like Ecto, etc. don't always look at the
resulting database table, so let's look. When we go into `psql` and enter `\d
foods`, we'll see this:

                                       Table "public.foods"
       Column    |          Type          |                     Modifiers
    -------------+------------------------+----------------------------------------------------
     id          | bigint                 | not null default nextval('foods_id_seq'::regclass)
     name        | character varying(255) |
     upc         | character varying(255) |
     ingredients | jsonb[]                | default ARRAY[]::jsonb[]
    Indexes:
        "foods_pkey" PRIMARY KEY, btree (id)

Where Ecto's `{:array, :map}` column type gives us `jsonb[]` in the database.
This means "a Postgres array of `jsonb` values".

And there's the thing: Postgres has an [Array type] of its own ([and has for a
while]). This is well and good and well-supported, but they don't work like
JSON arrays do. But you know what does work like JSON? JSON.

## Getting our hands dirty

I wanted to see how well an array of JSON would stack up against a single JSON
document that contains an array. Since we have the `JSONB[]` column already,
let's start checking there. Important note: JSONB and JSON are functionally
equivalent, but have some [important under-the-hood differences]. Put briefly:
You want JSONB. You don't want JSON.

I inserted 1 million rows using [this quick script] which isn't relevant but
I'll link because the names it made entertained me greatly (and I want you to be
entertained, too). Through the magic of `EXPLAIN ANALYZE`, we can see how the
performance will be. Our SQL statement:

```sql
EXPLAIN ANALYZE SELECT name
FROM foods
WHERE ARRAY_TO_JSON(ingredients_::JSONB @> '[{"name": "Milk"}]');
```

We get these results:

                                                      QUERY PLAN
    ---------------------------------------------------------------------------------------------------------------
     Seq Scan on foods  (cost=0.00..50729.56 rows=1000 width=23) (actual
    time=0.134..4395.623 rows=287015 loops=1)
       Filter: ((array_to_json(ingredients))::jsonb @> '[{"name": "Milk"}]'::jsonb)
       Rows Removed by Filter: 712980
     Planning time: 0.040 ms
     Execution time: 4411.212 ms

4.4 seconds? [Dag, yo]. That's not really tenable.

Ok, so that's one option. The other option is simply `JSONB`. I changed the
column to real `:jsonb` in the migration (note, the default is `'[]'` and not
`[]` like it was before):

```elixir
add :ingredients, :jsonb, default: "[]"
```

And Postgres said the column type become this:

    ingredients | jsonb                  | default '[]'::jsonb

With no index on the column, the equivalent query from before, `EXPLAIN ANALYZE
SELECT * FROM foods WHERE ingredients @> '[{"name": "Milk"}]';`, looked like
this:

                                                      QUERY PLAN
    ---------------------------------------------------------------------------------------------------------------
     Seq Scan on foods  (cost=0.00..39618.01 rows=1000 width=187) (actual time=0.040..575.419 rows=208370 loops=1)
       Filter: (ingredients @> '[{"name": "Milk"}]'::jsonb)
       Rows Removed by Filter: 791631
     Planning time: 0.036 ms
     Execution time: 585.420 ms

Better! But, let's be honest, ~0.6 seconds still isn't great. However, a very
important part about JSONB columns is that you can index them. And you can't
really expect good performance out of a non-trivial query without good indexing.

To add an index, we can add `execute "CREATE INDEX ingredients_gin ON foods
USING GIN (ingredients);"` to our migration (and `execute "DROP INDEX
ingredeients_gin;"` to the down migration). This lets the same query as before
have these results:

                                                                QUERY PLAN
    -----------------------------------------------------------------------------------------------------------------------------------
     Bitmap Heap Scan on foods  (cost=27.75..3407.64 rows=1000 width=187) (actual time=42.464..248.331 rows=208370 loops=1)
       Recheck Cond: (ingredients @> '[{"name": "Milk"}]'::jsonb)
       Heap Blocks: exact=27113
       ->  Bitmap Index Scan on ingredients_gin  (cost=0.00..27.50 rows=1000 width=0) (actual time=37.778..37.778 rows=208370 loops=1)
             Index Cond: (ingredients @> '[{"name": "Milk"}]'::jsonb)
     Planning time: 0.054 ms
     Execution time: 258.965 ms

That's better. That's way better. It's still not 100% perfect, but that's
serviceable for columns you're not querying on in a tight loop. And it's quite a
bit better than the 4.4 seconds from the first example.

## Can I use this?

This is all fine here in a blog post. It's useless to me if I can't actually use
this in my code. Let's see it in action in Elixir and not just in SQL.

```elixir
iex(1)> %Grocer.Food{} |> Grocer.Food.changeset(%{name: "Egg", upc: "12345", ingredients: [%{name: "Egg", mg: 1}]}) |> Grocer.Repo.insert
[debug] QUERY OK db=42.6ms
INSERT INTO "foods" ("ingredients","name","upc") VALUES ($1,$2,$3) RETURNING "id" [[%{id: "fea00c41-7050-4795-9aca-5b5d39affa60", mg: 1, name: "Egg"}], "Egg", "12345"]
{:ok,
 %Grocer.Food{
   __meta__: #Ecto.Schema.Metadata<:loaded, "foods">,
   id: 1000002,
   ingredients: [
     %Grocer.Ingredient{
       id: "fea00c41-7050-4795-9aca-5b5d39affa60",
       mg: 1,
       name: "Egg"
     }
   ],
   name: "Egg",
   upc: "12345"
 }}
```

It certainly appears to work just as well as the recommended column type. What
does the database look like?

```psql
grocer_dev=# select * from foods where upc = '12345';
   id    | name |  upc  |                               ingredients
---------+------+-------+--------------------------------------------------------------------------
 1000002 | Egg  | 12345 | [{"id": "fea00c41-7050-4795-9aca-5b5d39affa60", "mg":
1, "name": "Egg"}]
```

That looks like it works just fine!

## So why do we use JSONB[]?

Postgres came out with JSON support in [version 9.2]. It was implemented as
little more than a text blob, though. [Version 9.4] has JSONB support, but that
didn't come out until the end of 2014. [Ecto's first commits for
`embeds_many`][embeds_many] came in mid-2015. It seems quite likely that the
concern at the time was that older versions of Postgres (which would still have
been very common) wouldn't have good JSONB support and would therefore be not
performant enough.

Since we're over 3 years out from the release of Postgres 9.4, I think it's
probably been long enough to consider switching your column types, especially on
new development and when you control your database or you're using something
like [Heroku] which keeps its databases nice and shiny.

[Ecto tells you to]:https://hexdocs.pm/ecto/Ecto.Schema.html#embeds_many/3
[this quick script]:https://gist.github.com/jyurek/c18149980e2e48baa96c072beaa08981
[our own post]:https://thoughtbot.com/blog/embedding-elixir-structs-in-ecto-models
[Array type]:https://www.postgresql.org/docs/current/static/arrays.html
[and has for a while]:https://www.postgresql.org/docs/7.1/static/arrays.html
[Dag, yo]:https://youtu.be/539zCn8ySbc?t=49s
[important under-the-hood differences]:https://www.citusdata.com/blog/2016/07/14/choosing-nosql-hstore-json-jsonb/
[version 9.2]:https://www.postgresql.org/docs/9.2/static/release-9-2.htm
[Version 9.4]:https://www.postgresql.org/docs/9.4/static/release-9-4.html
[embeds_many]:https://github.com/elixir-ecto/ecto/commit/b58c8cd1e31c27c675153c7eb57a379e5244ce3c
[Heroku]:https://www.heroku.com/
