A few weeks ago I had a really great bad idea: what if I tried to test SQL outside of a framework, using only SQL? Was it possible? Did it even make sense? I would soon accept what many developers come to terms with at some point in their careers: yes, my idea is possible, and yes, it's not a very good one. However, there was a silver lining! I learned that while it didn't quite make sense to test with SQL, I could test the performance of different queries that retrieved the same data. Enter PostgreSQL's EXPLAIN ANALYZE command.
EXPLAIN ANALYZE is somewhat self-explanatory: it breaks down how Postgres executes a given query. Because the ANALYZE option actually runs the query, the output includes the real execution time alongside the planner's cost estimates, and that measured timing is the most valuable part. So, while it might not make sense to test your big ol' SQL query outside of a development framework, getting dirty in the Postgres terminal with EXPLAIN could save you from painfully slow data retrieval.
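A quick sketch of the distinction in a psql session (using the same visits table we'll see below): plain EXPLAIN only plans the query, while adding ANALYZE actually executes it, so be careful with statements that modify data.

```sql
-- Plain EXPLAIN: plans the query but does not run it (estimates only).
EXPLAIN SELECT * FROM visits WHERE user_id = 1;

-- EXPLAIN ANALYZE: actually executes the query and reports real timings.
-- Wrap data-modifying statements in a transaction you roll back,
-- because ANALYZE really runs them.
BEGIN;
EXPLAIN ANALYZE DELETE FROM visits WHERE user_id = 1;
ROLLBACK;
```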
Let’s look at an example. The other day I wrote this fancy pants query using a common table expression and asked my colleagues if there was a less verbose way to write it:
WITH user_visits_table AS
(
  SELECT row_number() OVER (ORDER BY created_at DESC) AS visit_date, created_at
  FROM visits WHERE visits.user_id = 1
)
SELECT DISTINCT CAST(created_at AS DATE) FROM user_visits_table;
My colleague Summer ☀️ suggested the following:
SELECT COUNT(*) FROM (SELECT DISTINCT CAST(created_at AS DATE) FROM visits) dates;
Definitely less verbose! But do they perform the same? Let’s EXPLAIN ANALYZE and compare.
Query 1:
EXPLAIN ANALYZE WITH user_visits_table AS
(
  SELECT row_number() OVER (ORDER BY created_at DESC) AS visit_date, created_at
  FROM visits WHERE visits.user_id = 1
)
SELECT DISTINCT CAST(created_at AS DATE) FROM user_visits_table;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=11.34..11.36 rows=3 width=4) (actual time=0.017..0.018 rows=0 loops=1)
   ->  Sort  (cost=11.34..11.35 rows=3 width=4) (actual time=0.017..0.018 rows=0 loops=1)
         Sort Key: ((user_visits_table.created_at)::date)
         Sort Method: quicksort  Memory: 25kB
         ->  Subquery Scan on user_visits_table  (cost=4.17..11.32 rows=3 width=4) (actual time=0.014..0.014 rows=0 loops=1)
               ->  Bitmap Heap Scan on visits  (cost=4.17..11.28 rows=3 width=16) (actual time=0.013..0.014 rows=0 loops=1)
                     Recheck Cond: (user_id = 1)
                     ->  Bitmap Index Scan on index_visits_on_user_id  (cost=0.00..4.17 rows=3 width=0) (actual time=0.003..0.003 rows=0 loops=1)
                           Index Cond: (user_id = 1)
 Planning Time: 0.087 ms
 Execution Time: 0.044 ms
(11 rows)
Time: 0.893 ms
Query 2:
EXPLAIN ANALYZE SELECT COUNT(*) FROM (SELECT DISTINCT CAST(created_at AS DATE) FROM visits) dates;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=22.80..22.81 rows=1 width=8) (actual time=0.011..0.011 rows=1 loops=1)
   ->  HashAggregate  (cost=17.80..20.30 rows=200 width=4) (actual time=0.006..0.007 rows=0 loops=1)
         Group Key: (visits.created_at)::date
         ->  Seq Scan on visits  (cost=0.00..16.50 rows=520 width=4) (actual time=0.005..0.005 rows=0 loops=1)
 Planning Time: 0.700 ms
 Execution Time: 0.207 ms
(6 rows)
Time: 2.450 ms
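Worth noting: EXPLAIN accepts more options than ANALYZE alone. One handy variant (sketched here against the same visits table) is BUFFERS, which adds shared-buffer hit and read counts to each plan node, helping you distinguish cached reads from actual disk I/O:

```sql
-- BUFFERS reports shared-buffer activity per plan node.
-- It requires ANALYZE in PostgreSQL versions before 17.
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT CAST(created_at AS DATE)
FROM visits
WHERE visits.user_id = 1;
```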
I did not expect such a difference in performance between the two queries (the more verbose CTE version actually executed faster here), which goes to show that EXPLAIN ANALYZE may be a useful addition to your workflow when writing and refactoring SQL. Keep in mind that results from a toy-sized table can't always be extrapolated to a larger production table, but it's a good place to start. Please consult the official documentation for caveats and best practices. And if you prefer writing queries using Active Record, you can write explain queries in Rails as well.
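For the Rails crowd, here's a rough sketch (assuming a hypothetical Visit model backed by the visits table; the model name is my assumption):

```ruby
# In a Rails console. ActiveRecord::Relation#explain runs EXPLAIN
# on the SQL the relation would generate.
puts Visit.where(user_id: 1).explain

# Rails 7.1+ can pass options through to the database:
# puts Visit.where(user_id: 1).explain(:analyze)
```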