---
title: Lean & Maintainable Reports with Heroku Dataclips
teaser: 'Write testable reports in SQL that don''t erode over time.

  '
tags: data,heroku,rails,web
author: Oli Peate
published_on: 2017-08-21
---

Many organizations are hungry for data to help the team understand how customers
are using their services. On a recent project the business owners needed a
report but intended to hold off building an admin-y report-packed hub until core
customer-facing functionality was complete.

![Pacman eating CSV files](https://images.thoughtbot.com/blog-vellum-image-uploads/R8KnSbwtS6aTRU6q2Bko_csv-pacman-ii.gif)

All reports take effort: Writing the database query, maintaining it as data
structures change, and allowing business users to customize the report all cost
time. Layer on supporting different output formats & security and one can see
the gulf between cobbling together a casual database query and building a
reporting tool with ongoing value.

The ideal solution for the client also needed to be self-service — let's face
it — no one wants to be the guardian of a jumble of SQL scripts and field
ad-hoc requests to run them.

## A solution

[Heroku Dataclips] is a data sharing tool that displays the results of a SQL
query run against your Heroku PostgreSQL database. Dataclips are awesome
because:

- Sharing is easy via hard-to-guess public URLs (or you can enable authentication).
- Export options include CSV, XLS, JSON.
- The CSV URL is embeddable in Google Sheets using [`ImportData("https://dataclips.heroku.com/<id>.csv")`][importdata].
- Business users can make adjustments to the SQL query if they need to.
- Results are cached to minimize load on your database.

[Heroku Dataclips]: https://blog.heroku.com/new-dataclips
[importdata]: https://support.google.com/docs/answer/3093335?hl=en

[![An example report of "Book purchases by month" with Heroku Dataclips](https://images.thoughtbot.com/blog-vellum-image-uploads/fPX6MGJS0qSr9NNNArDu_heroku-dataclips-screenshot.png)][dataclips-example]

[dataclips-example]: https://dataclips.heroku.com/vgyygvzqtezwpmwpcmmjlluamjlk

## Forge & forget?

Although the Dataclips interface features a handy built-in editor, one doesn't
have to immediately jump to writing SQL. One point to bear in mind is keeping
future maintenance costs small. If a database column is renamed during a
migration this will break your report and a CSV-starved colleague will seek you
out.

One approach to protect ourselves from brittle reports is to generate the SQL
query with an ORM and test it at the same time. This has a double benefit.
First, testing helps us verify a complex query returns the correct data. Second,
a breaking migration will cause the test to fail, in turn prompting our diligent
selves to update the query and the dataclip.

## Example time

Here's a simplified example which lists cinema bookings and information about
the associated user.

### Report

```ruby
# app/reports/reports/bookings.rb

module Reports
  class Bookings
    def to_sql
      query.to_sql
    end

    private

    def query
      Booking.
        joins(:user).
        select("bookings.id").
        select("users.name AS client_name").
        select("users.email AS client_email").
        select("bookings.price AS price_paid")
    end
  end
end
```

### Spec

```ruby
# spec/reports/reports/bookings_report_spec.rb

require "rails_helper"

RSpec.describe Reports::Bookings do
  describe "#to_sql" do
    subject(:query) { described_class.new.to_sql }

    let(:report) { ActiveRecord::Base.connection.execute(query) }

    it "includes the session details" do
      booking = create(:booking)
      row = report.first

      expect(row["id"]).to eq booking.id
      expect(row["client_name"]).to eq booking.user.name
      expect(row["client_email"]).to eq booking.user.email
      expect(row["price_paid"]).to eq booking.price.to_s
    end

    it "has a row per client booking" do
      user = create(:user)
      booking1 = create(:booking, user: user)
      booking2 = create(:booking, user: user)

      report_booking_ids = report.map { |row| row["id"] }

      expect(report_booking_ids).to contain_exactly booking1.id, booking2.id
    end

    # etc
  end
end
```

### Rake task

```ruby
# lib/tasks/reports.rake

namespace :reports do
  desc "Generate SQL queries for Heroku Dataclips"
  task generate_sql: :environment do
    reports_path = Rails.root.join("tmp", "sql")
    timestamp = Time.current.strftime("%F@%H-%M-%S")

    FileUtils.mkdir_p(reports_path)

    Dir.chdir(reports_path) do
      File.write("bookings.#{timestamp}.sql", Reports::Bookings.new.to_sql)
    end

    puts "Reports SQL saved to:", reports_path
  end
end
```

The public interface of `Reports::Bookings` deliberately exposes only `#to_sql`,
rather than the ActiveRecord Relation object from `#query`. That's because the
raw SQL is all that's required to satisfy the Rake task.

There are benefits to the spec using the raw SQL too. It helps us verify the
query can be executed standalone. Also the expectations are then written against
an array of hashes, which are closer to the rows of strings in a CSV output,
without the hassle of executing the Rake task and reading a CSV file in the
tests.

## Pulling it all together

Running the following command in the terminal creates the SQL files:

```sh
bundle exec rake reports:generate_sql
```

Which outputs the SQL files to `tmp/sql`, ready to paste into a dataclip:

```sql
SELECT bookings.id,
       users.name AS client_name,
       users.email AS client_email,
       bookings.price AS price_paid
FROM "bookings"
INNER JOIN "users" ON "users"."id" = "bookings"."user_id"
```

## A vapourware cherry on top

What do you think would make Dataclips even more useful? Top of my wishlist
would be **updating dataclip queries via the Heroku CLI**.

Picture your deployment script: Pushing changes to Heroku, running migrations,
and updating dataclips could all be tidily bundled together. The upside is that
reports and the database schema should remain in sync.

```sh
#!/bin/sh

# Run this script to deploy the app to Heroku.

set -eu

branch="$(git symbolic-ref HEAD --short)"
target="${1:-staging}"

# Backup
heroku pg:backups capture --remote "$target"

# Deploy
git push "$target" "$branch:master"

# Migrate
heroku run rails db:migrate --exit-code --remote "$target"
heroku restart --remote "$target"

# Upsert dataclips
bundle exec rake reports:generate_sql
heroku dataclips upsert --files tmp/sql --remote "$target"
```

Preferably the CLI would allow upserts — providing one combined command to
create dataclips for new queries, and update dataclips for modified queries.

A minor quandary would be associating SQL files with dataclips. Perhaps the
dataclip ID could be derived from the account name + SQL filename, or a YAML
manifest could map the two sides. Hopefully this is a stumbling block worth
navigating!
