Back to Basics: Writing SQL Queries

Almost all applications store data in one format or another somewhere. Us developers spend a lot of time thinking about our data. In a most cases we end up storing that data in a relational database. The advent of the ORM has made writing raw SQL much less common, but when we do it’s good to have an understanding of the fundamental elements of the language. Today we’ll go over some basic queries and the syntax required to make them.

Setup

For the sake of simplicity we’ll use SQLite3 for this blog. It’s important to note that none of the topics or commands we’ll look at are specific to SQLite3. All the query examples we’ll go over below are ISO 9705 compliant and will work in any of the major databases (Postgresql, MySql, Oracle, etc).

From the command line fire up SQLite3 and create a database named back_to_basics:

% sqlite3 back_to_basics

Now let’s create three tables. We’ll call them players, teams and players_teams:

SQLite3 version 3.7.12 2012-04-03 19:43:07
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> CREATE TABLE players (id INTEGER PRIMARY KEY ASC, name TEXT, seasons_played INTEGER);
sqlite> CREATE TABLE teams (id INTEGER PRIMARY KEY ASC, name TEXT);
sqlite> CREATE TABLE players_teams (player_id INTEGER, team_id INTEGER, won_championship BOOLEAN);
sqlite> .tables
players        players_teams  teams

Before we’re done with the setup stage let’s turn on explain mode. This will let us see our column names in query results:

sqlite> .explain on

Our Data

In order to write any queries we’ll need some data, which means we need some players and teams. We’ll use baseball since players tend to move around more in that sport.

Let’s enter information on three baseball players and the first two teams they played for in their careers (just to keep the data set small).

Player	First Team	Second Team
Nolan Ryan	New York Mets	California Angels
Jim Sundberg	Texas Rangers	Milwaukee Brewers
Ivan Rodriguez	Texas Rangers	Florida Marlins

INSERT

To get our player data into the database we’ll use the INSERT statement:

sqlite> INSERT INTO players (name, seasons_played) VALUES ('Nolan Ryan', 27);
sqlite> INSERT INTO players (name, seasons_played) VALUES ('Jim Sundberg', 16);
sqlite> INSERT INTO players (name, seasons_played) VALUES ('Ivan Rodriguez', 21);

SELECT

Now that we have data in our first table let’s run a query and make sure it looks right. We’ll use the SELECT statement to do this. For our first query we’ll just ask the database to return all rows and columns from our players table. We’ll use the * operator to do this:

sqlite> SELECT *
   ...> FROM players;
id    name           seasons_played
----  -------------  --------------
1     Nolan Ryan     27
2     Jim Sundberg   16
3     Ivan Rodriguez 21

In place of the * operator we can also give the SELECT statement a list of columns. This will limit the result set to just the columns we’re interested in:

sqlite> SELECT name
   ...> FROM players;
name
----
Nolan Ryan
Jim Sundberg
Ivan Rodriguez

ORDER BY

We can also order our results how we’d like. This is done by using the ORDER BY clause:

sqlite> SELECT *
   ...> FROM players
   ...> ORDER BY seasons_played;
id    name           seasons_played
----  -------------  --------------
2     Jim Sundberg   16
3     Ivan Rodriguez 21
1     Nolan Ryan     27

Previously our results were always ordered by id. Because we used the ORDER BY clause we get results ordered by the seasons_played column.

You can also specify if you would like to order results descending:

sqlite> SELECT *
   ...> FROM players
   ...> ORDER BY seasons_played DESC;
id    name           seasons_played
----  -------------  --------------
1     Nolan Ryan     27
3     Ivan Rodriguez 21
2     Jim Sundberg   16

A Little More Setup

Now that we know how to insert data, and query to ensure we entered it correctly, let’s add data to our teams table:

sqlite> INSERT INTO teams (name) VALUES ('Texas Rangers');
sqlite> INSERT INTO teams (name) VALUES ('Florida Marlins');
sqlite> INSERT INTO teams (name) VALUES ('New York Mets');
sqlite> INSERT INTO teams (name) VALUES ('California Angels');
sqlite> INSERT INTO teams (name) VALUES ('Milwaukee Brewers');
sqlite> INSERT INTO teams (name) VALUES ('New York Yankees');

Junction Table

Now we need to connect our players and teams. We’ll do this in our players_teams table. This is what is called a junction table. Junction tables are used to create many to many relationships in relational databases. They achieve this by combining common data from multiple tables. In our case we’re going to include the id columns from our teams and players tables thus allowing us to relate rows from one to the other.

We’ll start with Nolan. His first two teams were the New York Mets and the California Angels. First we need to get our ids:

sqlite> SELECT *
   ...> FROM players;
id    name           seasons_played
----  -------------  --------------
1     Nolan Ryan     27
2     Jim Sundberg   16
3     Ivan Rodriguez 21

Nolan’s id is 1. Let’s find out what the Mets and Angles ids are:

sqlite> SELECT *
   ...> FROM teams;
id    name
----  -------------
1     Texas Rangers
2     Florida Marlins
3     New York Mets
4     California Angels
5     Milwaukee Brewers
6     New York Yankees

The Mets have an id of 3 and the Angels have an id of 4. Nolan won a World Series with the Mets. We now have enough information to write our INSERT statements:

sqlite> INSERT INTO players_teams (player_id, team_id, won_championship) VALUES (1, 4, 0);
sqlite> INSERT INTO players_teams (player_id, team_id, won_championship) VALUES (1, 3, 1);

In the INSERT statements above we associate the player with id 1 (Nolan) to the teams with id 3 (Mets) and 4 (Angels) and provide a 0, which represents false, for the won_championship column for the Angels and a 1, for true, for the won_championship column for the Mets. As you can see we’re able to create relationships between tables by using the ids our database is generating for each of our rows. This is one of the corner stones of the relational database and is called a primary key.

Now we just need to finish up building our players_teams table.

Jim played for the Rangers and Brewers and didn’t win a championship with either:

sqlite> INSERT INTO players_teams (player_id, team_id, won_championship) VALUES (2, 1, 0);
sqlite> INSERT INTO players_teams (player_id, team_id, won_championship) VALUES (2, 5, 0);

Ivan played for the Rangers and the Marlins and won a championship with the Marlins:

sqlite> INSERT INTO players_teams (player_id, team_id, won_championship) VALUES (3, 1, 0);
sqlite> INSERT INTO players_teams (player_id, team_id, won_championship) VALUES (3, 2, 1);

INNER JOIN

Now that we have some data to play with let’s write some multi-table queries.

In order to this we’ll have to use joins. The most common join we have in our tool box is called an INNER JOIN. An INNER JOIN allows us to combine two tables based on a condition we provide. The only rows from both tables that remain after the join are the rows that satisfy the condition.

Let’s join our players table to our players_teams and take a look at the results on the conditions the players table id matches the players_teams player_id column:

sqlite> SELECT *
   ...> FROM players
   ...> INNER JOIN players_teams ON players.id = players_teams.player_id;
id    name           seasons_played  player_id  team_id  won_championship
----  -------------  --------------  ---------  -------  -------------
1     Nolan Ryan      27             1          4        0
1     Nolan Ryan      27             1          3        1
2     Jim Sundberg    16             2          1        0
2     Jim Sundberg    16             2          5        0
3     Ivan Rodriguez  21             3          1        0
3     Ivan Rodriguez  21             3          2        1

The condition we discussed above is what comes after the ON keyword.

Notice that the rows in the players table have all been doubled. This is because the INNER JOIN preserves the number of rows from both tables, so since there are two rows in the players_teams table for every one row in the players table (because we’re looking at each player’s first two teams) we see the rows from the players table twice.

Earlier when we inserted data into our teams table, we added a row for the New York Yankees, but we actually don’t have a player who spent one of their first two seasons with the Yankees.

Let’s see what happens if we INNER JOIN the teams table and players_teams table:

sqlite> SELECT *
   ...> FROM teams
   ...> INNER JOIN players_teams ON teams.id = players_teams.team_id;
id    name               player_id  team_id  won_championship
----  -------------      ---------  -------  ----------------
4     California Angels  1          4        0
3     New York Mets      1          3        1
1     Texas Rangers      2          1        0
5     Milwaukee Brewers  2          5        0
1     Texas Rangers      3          1        0
2     Florida Marlins    3          2        1

Because we used an INNER JOIN the only rows present are the rows that satisfy our condition and in this case the New York Yankees do not, so that row isn’t present in our result set.

We’re not limited to joining just two tables either. Let’s inner join all three of our tables together and see what that looks like:

sqlite> SELECT *
    ...> FROM players
    ...> INNER JOIN players_teams ON players.id = players_teams.player_id
    ...> INNER JOIN teams ON players_teams.team_id = teams.id;
id    name           seasons_played  player_id  team_id  won_championship  id  name
----  -------------  --------------  ---------  -------  ----------------  --  -------------
1     Nolan Ryan      27             1          4        0                 4   California Angels
1     Nolan Ryan      27             1          3        1                 3   New York Mets
2     Jim Sundberg    16             2          1        0                 1   Texas Rangers
2     Jim Sundberg    16             2          5        0                 5   Milwaukee Brewers
3     Ivan Rodriguez  21             3          1        0                 1   Texas Rangers
3     Ivan Rodriguez  21             3          2        1                 2   Florida Marlins

We can also take advantage of the SELECT statement we learned about above to create more readable results sets. Let’s look at a list of players and the teams they played on only:

sqlite> SELECT players.name, teams.name
   ...> FROM players
   ...> INNER JOIN players_teams ON players.id = players_teams.player_id
   ...> INNER JOIN teams ON players_teams.team_id = teams.id;
name            name
----            -------------
Nolan Ryan      California Angels
Nolan Ryan      New York Mets
Jim Sundberg    Texas Rangers
Jim Sundberg    Milwaukee Brewers
Ivan Rodriguez  Texas Rangers
Ivan Rodriguez  Florida Marlins

LEFT OUTER JOIN

A LEFT OUTER JOIN is very similar to an INNER JOIN with one big exception. If a row in the table being joined on does not match the condition specified in the join, the row still remains in the result set.

Let’s look at the query from above where we joined players_teams to teams again. As we recall, when we used an INNER JOIN on the two tables the Yankees were omitted form the result set. Let’s see what happens when we LEFT OUTER join the two tables:

sqlite> SELECT *
   ...> FROM teams
   ...> LEFT OUTER JOIN players_teams ON teams.id = players_teams.team_id;
id    name               player_id  team_id  won_championship
----  -------------      ---------  -------  ----------------
1     Texas Rangers      2          1        0
1     Texas Rangers      3          1        0
2     Florida Marlins    3          2        1
3     New York Mets      1          3        1
4     California Angels  1          4        0
5     Milwaukee Brewers  2          5        0
6     New York Yankees

The Yankees show up in our result set, but with no values in the columns associated with the players_teams table.

Like INNER JOIN we can also specify columns in the SELECT, we can LEFT OUTER JOIN multiple tables and we can also mix INNER JOIN and LEFT OUTER JOIN in the same query.

WHERE

The WHERE clause gives us the ability to specify a condition that will be applied to every row in our final result set. If the condition is not met the row will not remain part of the result set.

Let’s take a look at a list of all the players in our database that spent over 20 years in the league:

sqlite> SELECT *
   ...> FROM players
   ...> WHERE seasons_played > 20;
id    name           seasons_played
----  -------------  --------------
1     Nolan Ryan     27
3     Ivan Rodriguez 21

Our result set only includes the two players with over 20 years.

There are lots of different operators we can use in our WHERE clauses. Above we used the greater than operator. Let’s take a look at a few more of our options.

We can look for rows that meet equality conditions:

sqlite> SELECT *
   ...> FROM players
   ...> WHERE seasons_played = 16;
id    name           seasons_played
----  -------------  --------------
2     Jim Sundberg   16

We can look for rows that contain values between two values:

sqlite> SELECT *
   ...> FROM players
   ...> WHERE seasons_played BETWEEN 20 and 22;
id    name           seasons_played
----  -------------  --------------
3     Ivan Rodriguez 21

We can look for rows that contain a value in a set we provide:

sqlite> SELECT *
   ...> FROM players
   ...> WHERE seasons_played IN (16, 27);
id    name           seasons_played
----  -------------  --------------
1     Nolan Ryan     27
2     Jim Sundberg   16

We can also use the WHERE clause on result sets that have been created by joining multiple tables. Let’s look at a list of players that have won a championship:

sqlite> SELECT *
   ...> FROM players
   ...> INNER JOIN players_teams ON players.id = players_teams.player_id
   ...> WHERE won_championship = 1;
id    name           seasons_played  player_id  team_id  won_championship
----  -------------  --------------  ---------  -------  ----------------
1     Nolan Ryan      27             1          3        1
3     Ivan Rodriguez  21             3          2        1

We can also use the WHERE clause on two columns from two different tables by concatenating with an AND or OR:

sqlite> SELECT *
   ...> FROM players
   ...> INNER JOIN players_teams ON players.id = players_teams.player_id
   ...> WHERE players_teams.won_championship = 1
   ...> AND players.seasons_playerd > 21;
id    name           seasons_played  player_id  team_id  won_championship
----  -------------  --------------  ---------  -------  ----------------
1     Nolan Ryan      27             1          3        1

DELETE

Deleting from a table is done using the DELETE command. Above we made a mistake and inserted the New York Yankees into our teams table. None of the players we have in our database played for that team, so, like I wish we could in real life, we need to delete the Yankees.

First let’s get the Yankees’ id:

sqlite> SELECT *
   ...> FROM teams;
id    name
----  -------------
1     Texas Rangers
2     Florida Marlins
3     New York Mets
4     California Angels
5     Milwaukee Brewers
6     New York Yankees

We’ll use their id, 6, to safely delete them from the database:

sqlite> DELETE FROM teams WHERE id = 6;
sqlite> SELECT *
   ...> FROM teams;
id    name
----  -------------
1     Texas Rangers
2     Florida Marlins
3     New York Mets
4     California Angels
5     Milwaukee Brewers

I like that result set much better!

GROUP BY

We can also group results together and aggregate values. We’ll list our teams and aggregate the total number of players in our database that spent one of their first two seasons playing there. This query will require us to GROUP BY team name and aggregate the number of players using the COUNT function. We’ll count the number of players on a team by counting the number of players.name values in each group:

sqlite> SELECT teams.name, COUNT(players.name)
   ...> FROM teams
   ...> INNER JOIN players_teams ON teams.id = players_teams.team_id
   ...> INNER JOIN players ON players.id = players_teams.player_id
   ...> GROUP BY teams.name;
name               COUNT(players.name)
----               -----------------
California Angels  1
Florida Marlins    1
Milwaukee Brewers  1
New York Mets      1
Texas Rangers      2

Let’s look at what the result set would look like without grouping (we’ll order by team name to make the results more obvious):

sqlite> SELECT teams.name, players.name
   ...> FROM teams
   ...> INNER JOIN players_teams ON teams.id = players_teams.team_id
   ...> INNER JOIN players ON players.id = players_teams.player_id
   ...> ORDER BY teams.name;
name               players.name
----               -----------------
California Angels  Nolan Ryan
Florida Marlins    Ivan Rodriguez
Milwaukee Brewers  Jim Sundberg
New York Mets      Nolan Ryan
Texas Rangers      Jim Sundberg
Texas Rangers      Ivan Rodriguez

The Texas Rangers show up twice in the non-grouped result set.

One thing to note about the GROUP BY clause is anything we leave in the SELECT statement must either be aggregated, what we’re doing with the COUNT function, or in the GROUP BY clause, in this case teams.name.

There are many other types of aggregate functions we can use.Here is a list of aggregate functions available in Postgres.

HAVING

The HAVING clause works like a WHERE clause, but on grouped results sets. If we go back to our list of teams and player counts from above we can use a HAVING to limit the result set to only teams that have more than one player from our list on them in their first two seasons, or a COUNT(player.id) of greater than one:

sqlite> SELECT teams.name, COUNT(players.id)
   ...> FROM teams
   ...> INNER JOIN players_teams ON teams.id = players_teams.team_id
   ...> INNER JOIN players ON players.id = players_teams.player_id
   ...> GROUP BY teams.name
   ...> HAVING COUNT(players.id) > 1;
name           COUNT(players.id)
----           -----------------
Texas Rangers  2

Our result set has now been limited to only the teams we’re interested in.

Sub Queries

We can also embed queries in our queries to create more useful result sets. Let’s use a sub query to get a list of players who have won a championship:

sqlite> SELECT *
   ...> FROM players
   ...> INNER JOIN (
   ...>   SELECT player_id, MAX(won_championship)
   ...>   FROM players_teams
   ...>   GROUP BY player_id) sub_query_players_teams ON players.id = sub_query_players_teams.player_id;
id    name           seasons_played  player_id  MAX(won_championship)
----  -------------  --------------  ---------  ---------------------
1     Nolan Ryan      27             1          1
2     Jim Sundberg    16             2          0
3     Ivan Rodriguez  21             3          1

In this case we’re just doing an INNER JOIN, but instead of joining to an existing table, we’re creating a new result set and joining to that. Our subquery consists of only a list of player ids and the max of all of their won_championship columns. In the case they have the max will be one otherwise it will be zero. Taking advantage of the sub query we’re able to get a nice list of players and whether they won a championship (the last column in our result set). If we tried to do this without a subquery we would have to include all the information from our players_teams table, which as we recall from our inital INNER JOIN would lead to the players in the result set being doubled. In addition we couldn’t say definitively if a player had won or not. We would be forced to look at several different rows to deduce the information.

What’s next

If you found this useful, you might also enjoy:

Aggregate Functions

Sign up to receive a weekly recap from thoughtbot

Upgrade your codebase with a Code Audit