Many organizations are hungry for data that helps the team understand how customers are using their services. On a recent project, the business owners needed a report but intended to hold off on building a report-packed admin hub until the core customer-facing functionality was complete.
All reports take effort: writing the database query, maintaining it as data structures change, and allowing business users to customize the report all cost time. Layer on support for different output formats and security, and one can see the gulf between cobbling together a casual database query and building a reporting tool with ongoing value.
The ideal solution for the client also needed to be self-service: let's face it, no one wants to be the guardian of a jumble of SQL scripts, fielding ad-hoc requests to run them.
A solution
Heroku Dataclips is a data sharing tool that displays the results of a SQL query run against your Heroku Postgres database. Dataclips are awesome because:
- Sharing is easy via hard-to-guess public URLs (or you can enable authentication).
- Export options include CSV, XLS, JSON.
- The CSV URL is embeddable in Google Sheets using ImportData("https://dataclips.heroku.com/<id>.csv").
- Business users can make adjustments to the SQL query if they need to.
- Results are cached to minimize load on your database.
Forge & forget?
Although the Dataclips interface features a handy built-in editor, one doesn't have to jump straight into writing SQL there. One point to bear in mind is keeping future maintenance costs small: if a database column is renamed during a migration, the report will break and a CSV-starved colleague will seek you out.
One approach to protect ourselves from brittle reports is to generate the SQL query with an ORM and test it at the same time. This has a double benefit. First, testing helps us verify a complex query returns the correct data. Second, a breaking migration will cause the test to fail, in turn prompting our diligent selves to update the query and the dataclip.
Example time
Here’s a simplified example which lists cinema bookings and information about the associated user.
Report
# app/reports/reports/bookings.rb
module Reports
  class Bookings
    def to_sql
      query.to_sql
    end

    private

    def query
      Booking.
        joins(:user).
        select("bookings.id").
        select("users.name AS client_name").
        select("users.email AS client_email").
        select("bookings.price AS price_paid")
    end
  end
end
Spec
# spec/reports/reports/bookings_report_spec.rb
require "rails_helper"

RSpec.describe Reports::Bookings do
  describe "#to_sql" do
    subject(:query) { described_class.new.to_sql }

    let(:report) { ActiveRecord::Base.connection.execute(query) }

    it "includes the session details" do
      booking = create(:booking)

      row = report.first

      expect(row["id"]).to eq booking.id
      expect(row["client_name"]).to eq booking.user.name
      expect(row["client_email"]).to eq booking.user.email
      expect(row["price_paid"]).to eq booking.price.to_s
    end

    it "has a row per client booking" do
      user = create(:user)
      booking1 = create(:booking, user: user)
      booking2 = create(:booking, user: user)

      report_booking_ids = report.map { |row| row["id"] }

      expect(report_booking_ids).to contain_exactly booking1.id, booking2.id
    end

    # etc
  end
end
Rake task
# lib/tasks/reports.rake
namespace :reports do
  desc "Generate SQL queries for Heroku Dataclips"
  task generate_sql: :environment do
    reports_path = Rails.root.join("tmp", "sql")
    timestamp = Time.current.strftime("%F@%H-%M-%S")

    FileUtils.mkdir_p(reports_path)

    Dir.chdir(reports_path) do
      File.write("bookings.#{timestamp}.sql", Reports::Bookings.new.to_sql)
    end

    puts "Reports SQL saved to:", reports_path
  end
end
The public interface of Reports::Bookings deliberately exposes only #to_sql, rather than the ActiveRecord Relation object from #query. That's because the raw SQL is all that's required to satisfy the Rake task.
There are benefits to the spec using the raw SQL too. It helps us verify the query can be executed standalone. Also the expectations are then written against an array of hashes, which are closer to the rows of strings in a CSV output, without the hassle of executing the Rake task and reading a CSV file in the tests.
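For a quick sanity check, the same SQL can be inspected and executed from a Rails console. This is a minimal sketch; the exact rows will of course depend on your data:

# Rails console (illustrative)
sql = Reports::Bookings.new.to_sql
puts sql

# Execute the raw SQL standalone, just as the spec does
rows = ActiveRecord::Base.connection.execute(sql)
rows.first # e.g. {"id" => 1, "client_name" => "Ada Lovelace", ...}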
Pulling it all together
Running the following command in the terminal creates the SQL files:
bundle exec rake reports:generate_sql
This outputs the SQL files to tmp/sql, ready to paste into a dataclip:
SELECT bookings.id,
       users.name AS client_name,
       users.email AS client_email,
       bookings.price AS price_paid
FROM "bookings"
INNER JOIN "users" ON "users"."id" = "bookings"."user_id"
A vapourware cherry on top
What do you think would make Dataclips even more useful? Top of my wishlist would be updating dataclip queries via the Heroku CLI.
Picture your deployment script: Pushing changes to Heroku, running migrations, and updating dataclips could all be tidily bundled together. The upside is that reports and the database schema should remain in sync.
#!/bin/sh
# Run this script to deploy the app to Heroku.
set -eu

branch="$(git symbolic-ref HEAD --short)"
target="${1:-staging}"

# Backup
heroku pg:backups capture --remote "$target"

# Deploy
git push "$target" "$branch:master"

# Migrate
heroku run rails db:migrate --exit-code --remote "$target"
heroku restart --remote "$target"

# Upsert dataclips
bundle exec rake reports:generate_sql
heroku dataclips upsert --files tmp/sql --remote "$target"
Ideally the CLI would allow upserts: one combined command that creates dataclips for new queries and updates dataclips for modified queries.
A minor quandary would be associating SQL files with dataclips. Perhaps the dataclip ID could be derived from the account name + SQL filename, or a YAML manifest could map the two sides. Hopefully this is a stumbling block worth navigating!
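For instance, a small manifest checked into the repo could map reports to dataclips. The file layout and lookup below are purely hypothetical, sketching how the Rake task might name its output after dataclip IDs:

# config/dataclips.yml (hypothetical) could map each report to a dataclip ID:
#
#   bookings: "0123456789abcdef"
#
# A future version of the Rake task could then name each SQL file after its
# dataclip ID, giving an upsert command everything it needs:
require "yaml"

manifest = YAML.safe_load(File.read(Rails.root.join("config", "dataclips.yml")))
FileUtils.mkdir_p(Rails.root.join("tmp", "sql"))

manifest.each do |report_name, dataclip_id|
  sql = "Reports::#{report_name.camelize}".constantize.new.to_sql
  File.write(Rails.root.join("tmp", "sql", "#{dataclip_id}.sql"), sql)
end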