Announcing Scenic - Versioned Database Views for Rails

Derek Prior

Database views are powerful SQL constructs that are unfortunately often overlooked by Rails application developers because views lack out-of-the-box support from Rails itself. Many of the more complex ActiveRecord scopes, relationships, and query objects we’ve come across could more clearly and composably be expressed as database views, and yet they seldom are.

We’ve used database views to generate complex reports, implement full-text search across multiple models, and promote concepts hidden in our data to proper domain objects in our Rails applications.

Don’t take our word for the importance of SQL views. Here’s what the PostgreSQL documentation itself has to say on the matter:

Making liberal use of views is a key aspect of good SQL database design. Views allow you to encapsulate the details of the structure of your tables, which might change as your application evolves, behind consistent interfaces.

Scenic adds first class support for database views to ActiveRecord migrations. Scenic goes beyond basic create_view and drop_view support to offer a view versioning system that makes maintaining views as they change as simple and robust as managing tables via migrations. It also ships with extensive support for materialized views.

scenic landscape

Scenic has been open source for quite some time now, but we kept it quiet while we dog-fooded internally and gathered feedback on its use from our early users. We released version 1.0.0 in late November and we’ve just recently shipped version 1.1.0.

Scenic by example

I recently worked with the Upcase team to improve site performance. One look at the data from Skylight showed several N+1 queries that were resistant to the typical prescription of eager loading. One such resistant N+1 query was in an action that showed users flashcards that they had flagged as needing further review.

The code for the relationships at play looked like this:

class Flashcard < ActiveRecord::Base
  has_many :attempts, -> { order(created_at: :desc) }

  def most_recent_attempt_for(user)
    attempts.where(user: user).first || NullAttempt.new
  end
end

class Attempt
  belongs_to :user
  belongs_to :flashcard
end

As you may have noticed, flashcards have many attempts by the same user. Attempts have a confidence attribute and the intent was to track how a user’s confidence in a topic moves over time. The most_recent_attempt_for method hints that there’s a concept that is hiding in our data, as we’re about to see.

The list of flashcards that a user did not have high confidence in was retrieved using this query object:

class FlashcardsNeedingReviewQuery
  CONFIDENCE_THRESHOLD = 4

  def initialize(user)
    @user = user
  end

  def run
    Flashcard.
      all.
      map { |q| q.most_recent_attempt_for(@user) }.
      select { |a| a.confidence > 0 && a.confidence < CONFIDENCE_THRESHOLD }.
      map(&:flashcard)
  end
end

For each flashcard in the system, we’re getting the user’s most recent attempt, filtering to those that do not meet our confidence threshold, and finally mapping back to the flashcard object itself. Not only is this an N+1 query, but we’re also generating thousands of short-lived ruby objects which increase garbage collection pressure, rearing its head as a performance problem in other unrelated actions.

The data we collect gives us a history of a user’s confidence level, but here our application only cares about the confidence level of their most recent attempt. Let’s promote the idea of a most recent attempt on a flashcard to a proper domain concept.

Enter Scenic

With Scenic added to our Gemfile and installed, we have access to some handy generators that are going to help us create a view to encapsulate this domain object.

$ rails generate scenic:model latest_attempt

This creates a few things for us:

  • A migration that runs create_view :latest_attempts.
  • A SQL file for our view definition at db/views/latest_attempts_v01.sql
  • A model that is backed by our view at app/models/latest_attempt.rb

Let’s start by defining our SQL view in latest_attempts_v01.sql. We want this view to return only the latest attempt for each user for each flashcard. Free of ActiveRecord, we can confidently use all of the features of SQL to do this. In this case, we’ll use DISTINCT ON, which keeps only the first row of each set of rows where the provided columns are equal.

SELECT DISTINCT ON (user_id, flashcard_id) *
FROM attempts
ORDER BY user_id, flashcard_id, updated_at DESC

With our view definition file saved, run rake db:migrate to create the view. Let’s skip the model code for a minute and update the query object to have the code we wish existed using this new LatestAttempt model. How about something like:

def run
  LatestAttempt.by(@user).
    confidence_below(CONFIDENCE_THRESHOLD).
    includes(:flashcard).
    map(&:flashcard)
end

That looks like a decently composable API for LatestAttempt, so let’s get to implementing it in our model. Our model is backed by a database view, but we’re still able to use our familiar ActiveRecord syntax for querying and relationships.

class LatestAttempt < ActiveRecord::Base
  belongs_to :user
  belongs_to :flashcard

  def self.by(user)
    where(user_id: user.id)
  end

  def self.confidence_below(level)
    where("confidence < ?", level)
  end
end

With all of that in place, our N+1 query is gone and we’ve significantly decreased garbage collection pressure. A benchmark of both methods with production data shows that the previously existing Ruby implementation is 85 times slower than our new implementation that makes use of the view. Equally important, we’ve promoted the previously hidden concept that flashcards have most recent attempts by each user to a proper domain object that can be reused elsewhere in our system.

Next steps

Please check out Scenic today. We’re confident you can find a use for database views in your application that will improve the performance and readability of your code.

If you’d like to hear more inside baseball about the development of Scenic, check out episode 45 of The Bike Shed, where Derek and Sean discuss the challenges encountered in developing and versioning Scenic.