As web developers, we deal with SQL often. Perhaps it’s mostly abstracted away from us using an ORM like ActiveRecord. But sometimes we find ourselves trying to get more complex data and reach for SQL. We hit an error and scratch our heads. Without a proper mental model of SQL, it’s very difficult to figure out what to do. Its syntax doesn’t share as much with languages like Ruby or JavaScript as we might like. In this three part series, I hope to give you strong mental models for three aspects of SQL: WHERE and SELECT, JOIN, and GROUP BY.
Test Data
PostgreSQL provides some sample datasets that we can use. For this post, I’m using the world dataset. If you have a postgres server running, here’s how you can download, unpack, create, and import this dataset:
$ curl -O https://ftp.postgresql.org/pub/projects/pgFoundry/dbsamples/world/world-1.0/world-1.0.tar.gz
$ tar -zxvf world-1.0.tar.gz
$ createdb world
$ psql world < dbsamples-0.1/world/world.sql
Then, we can connect to our new database:
psql world
The dataset has three tables, city, country, and countrylanguage. city and country are lists of places around the world and countrylanguage lists the languages spoken in a given country. You can take a look around at what kind of data we have using \d or \d [table name].
WHERE is for Rows
When querying, we can use the WHERE clause to filter the rows we get back. To use WHERE we specify a condition like “population less than 1000”:
SELECT *
FROM city
WHERE population < 1000;
I visualize the result of this query by highlighting every row where the value is less than 1000:
SELECT is for Columns
Conversely, if we want to get back only certain pieces of data (i.e. columns) of each row, we use SELECT:
SELECT name, countrycode
FROM city;
Similarly, I like to visualize this by highlighting just the columns of the city table:
Both at the Same Time
And the real power of SQL is that we can combine both together:
SELECT name, countrycode
FROM city
WHERE population < 1000;
Interestingly, this selects two columns (name, countrycode), and uses a third column (population) to filter the data. This works because WHERE is run before SELECT.
Parting Thoughts
Maybe you already knew this but needed a bit of practice. Play around with the results a bit and try some other queries. You can do multiple WHERE queries by using AND like: WHERE population < 1000 AND countrycode = 'AIA'
In part 2, we’ll talk about combining two tables together with JOIN, and part 3 will cover GROUP BY.