3 ways to model an associated collection in Rails

Imagine we’re modeling an amusement park. It has attractions and guests.

We start with this database migration:

class InitialSchema < ActiveRecord::Migration[7.1]
  def change
    create_table :attractions do |t|
      t.string :name, null: false
      t.timestamps
    end

    create_table :guests do |t|
      t.timestamps
    end
  end
end

Guests have to wait in line before going on an attraction.

What are some fundamental properties of this? I can think of these:

  1. A guest can only physically queue at a single attraction.
  2. A guest can only be in the queue once.

As a last constraint, let’s say that our amusement park has only one queue per attraction.

Let’s see different ways we can represent the queue in a Rails application.

An array column

PosgreSQL has an array type and it is supported by ActiveRecord.

class CreateQueuedGuests < ActiveRecord::Migration[7.1]
  def change
    add_column :attractions, :queued_guests, :bigint, array: true, null: false, default: []
  end
end

There are two notable instructions. One is that the column type is an array of bigints. That is because we want to store the IDs of guest records.

The other notable option is the default value of [], i.e. empty array. Without specifying a default value, the, uh, default is null. You cannot add elements to null/nil:

nil << 2
# => undefined method `<<' for nil:NilClass (NoMethodError)

But you can add elements to an empty array:

[] << 2
# => [2]

In our case adding guests to the end of the queue is a common operation and we don’t want to think about initializing the array before adding the first item.

class AttractionTest < ActiveSupport::TestCase
  test "adding guests to the queue" do
    guest1 = Guest.create!
    guest2 = Guest.create!
    attraction = Attraction.create!(name: "Water ride")

    attraction.queued_guests << guest1.id
    attraction.queued_guests << guest2.id

    assert_equal [guest1.id, guest2.id], attraction.queued_guests
  end
end

What’s more, our rule of “an attraction has only one queue” is better served by an empty queue ([]) than an non-existent queue (null/nil).

Sometimes guests change their mind after waiting some time and they want to go to another attraction. How do we remove them from the queue? We use the delete method like we would for any array. A quick test to illustrate:

test "removing a guest from the queue" do
  guest = Guest.create!
  attraction = Attraction.create!(name: "Ferris wheel")
  attraction.queued_guests << guest.id
  assert_equal [guest.id], attraction.queued_guests

  attraction.queued_guests.delete(guest.id)

  assert_equal [], attraction.reload.queued_guests
end

We demonstrated adding and removing elements from the array column, but we’ve ignored the rules we’d agreed on at the start. Remember that we said we’re modeling standing in a single queue. We could check this in a test like the following:

test "a guest can only queue at one attraction" do
  guest1 = Guest.create!
  guest2 = Guest.create!
  attraction1 = Attraction.create!(name: "Water ride")
  attraction2 = Attraction.create!(name: "Ferris wheel")

  attraction1.queue_guest(guest1)
  attraction1.queue_guest(guest1)
  attraction1.queue_guest(guest2)
  attraction2.queue_guest(guest2)

  assert_equal [guest1.id, guest2.id], attraction1.queued_guests
  assert_equal [], attraction2.queued_guests
end

As far as I know, PostgreSQL can’t help us with uniqueness constraints when we’re using an array column. We need to check by ourselves if the guest has queue already before adding them to a queue:

class Attraction < ApplicationRecord
  def queue_guest(guest)
    Attraction.transaction do
      existing_queued_guests = Attraction.lock.all.pluck(:queued_guests).flatten
      if existing_queued_guests.none? { |id| id == guest.id }
        queued_guests << guest.id
        save
      end
    end
  end
end

Notice my attempt at locking all records before updating. That is because if the same code is running in two processes simultaneously, one of them might insert the record right after the one has checked that the record does not exist. I don’t even think the code is correct and I am not sure how to do it properly.

It also seems like we are doing a lot of manual work to do something at which the database is better.

A table for queues and a join table

This time we represent a queue as a row in a guest_queues table instead of a column on in the attractions table.

create_table :guest_queues do |t|
  t.references :attraction, null: false, foreign_key: true, index: true
  t.timestamps
end

We also introduce a join table between the queue and guests. I’ll call it queue_membership. It represents the fact of belonging to a queue. Adding a person is a matter of creating a QueueMembership. Removing a person of the queue is done by deleting the QueueMembership.

create_table :queue_memberships do |t|
  t.references :guest_queue, null: false, foreign_key: true, index: true
  t.references :guest, null: false, foreign_key: true, index: {unique: true}
  t.timestamps
  t.index :created_at
  t.index [:guest_queue_id, :guest_id], unique: true
end

The uniqueness constraints enforce our rules from the start: a guest can only be in a single queue and they appear at most once in a queue.

The queue order was previously provided by the array. Now we can use the order of arrival. This is why added a database index to the created_at column in the migration.

class GuestQueue < ApplicationRecord
  has_many :queue_memberships, -> { order(:created_at) }
  has_many :guests, through: :queue_memberships
end

Now that we’ve named the guest queue and queue membership concepts, here’s how we could add and remove guests:

class QueueMembershipsController < ApplicationController
  def create
    guest_queue = GuestQueue.find_or_create_by!(
      attraction_id: params.fetch(:attraction_id)
    )
    guest_queue.queue_memberships.find_or_create_by!(
      guest_id: params.fetch(:guest_id)
    )
  end

  def destroy
    QueueMembership.find_by(id: params[:id])&.destroy
  end
end

An annoying detail becomes apparent in this code: a queue record must exist before we add a guest to it. Here we’ve chosen to create it on demand. However, one could also accidentally create more than one queue, say through a bug in that GuestQueue.find_or_create_by call.

How can we make it so we only have a single queue and we don’t need to create it manually?

Just a join table

That guest_queues table in the previous example is not doing much work for us. It points to an attraction. Guests are queueing in the queue of the attraction. We can skip the indirection and say they are queueing at the attraction. In code that looks like the following.

create_table :queue_memberships do |t|
  t.references :attraction, null: false, foreign_key: true, index: true
  t.references :guest, null: false, foreign_key: true, index: {unique: true}
  t.timestamps
  t.index :created_at
  t.index [:attraction_id, :guest_id], unique: true
end

The attraction can retrieve the queued_guests directly, much like in the first example where that was an array column.

class Attraction < ApplicationRecord
  has_many :queue_memberships, -> { order(:created_at) }
  has_many :queued_guests, through: :queue_memberships, source: :guest
end

class QueueMembership < ApplicationRecord
  belongs_to :attraction
  belongs_to :guest
end

Now, the controller is also shorter:

class QueueMembershipsController < ApplicationController
  def create
    QueueMembership.create!(
      attraction_id: params.fetch(:attraction_id),
      guest_id: params.fetch(:guest_id)
    )
  end

  def destroy
    QueueMembership.find_by(id: params[:id])&.destroy
  end
end

Do we need a table for the collection?

We’ve seen that it’s possible to represent the queue without a queues table, so why would we add one?

When has_one becomes has_many

The amusement park management might decide to introduce a more expensive tickets that gives you access to an exclusive second queue with higher priority. Now we need to model multiple queues per attraction and perhaps give a name or type to each queue.

When you want to store other properties of the collection

Above, the GuestQueue model didn’t hold any information except the associated attraction. Say we want to model the maximum number of guests that can comfortably fit in the queue. A new capacity column on the guest_queues table would do the job well, because each queue has a different capacity.