---
title: 'Back to Basics: Writing SQL Queries'
teaser: |
  An in-depth overview of INSERT, SELECT, UPDATE, and DELETE statements,
  ORDER BY, INNER JOIN, LEFT OUTER JOIN, WHERE, GROUP BY, and HAVING clauses,
  and topics such as join tables and sub-queries.
tags: web,back to basics,sql
author: Britt Ballard
published_on: 2014-04-21
---

Almost all applications store data in one format or another somewhere. Us
developers spend a lot of time thinking about our data. In a most cases we end
up storing that data in a relational database.  The advent of the
[ORM](http://en.wikipedia.org/wiki/Object-relational_mapping) has made writing
raw <abbr title="Structured Query Language">SQL</abbr> much less common, but
when we do it's good to have an understanding of the fundamental elements of the
language. Today we'll go over some basic queries and the syntax required to make
them.

## Setup

For the sake of simplicity we'll use [SQLite3](https://sqlite.org/) for this
blog. It's important to note that none of the topics or commands we'll look at
are specific to SQLite3. All the query examples we'll go over below are [ISO
9705](http://en.wikipedia.org/wiki/SQL:2008) compliant and will work in any of
the major databases
([Postgresql](http://www.postgresql.org/docs/9.3/static/features.html),
[MySql](https://dev.mysql.com/doc/refman/5.0/en/compatibility.html),
[Oracle](http://docs.oracle.com/cd/B28359_01/server.111/b28286/ap_standard_sql.htm#SQLRF019),
etc).

From the command line fire up SQLite3 and create a database named `back_to_basics`:

    % sqlite3 back_to_basics

Now let's create three tables. We'll call them `players`, `teams` and `players_teams`:

    SQLite3 version 3.7.12 2012-04-03 19:43:07
    Enter ".help" for instructions
    Enter SQL statements terminated with a ";"
    sqlite> CREATE TABLE players (id INTEGER PRIMARY KEY ASC, name TEXT, seasons_played INTEGER);
    sqlite> CREATE TABLE teams (id INTEGER PRIMARY KEY ASC, name TEXT);
    sqlite> CREATE TABLE players_teams (player_id INTEGER, team_id INTEGER, won_championship BOOLEAN);
    sqlite> .tables
    players        players_teams  teams

Before we're done with the setup stage let's turn on explain mode. This will
let us see our column names in query results:

    sqlite> .explain on

## Our Data

In order to write any queries we'll need some data, which means we need some
players and teams.  We'll use baseball since players tend to move around more
in that sport.

Let's enter information on three baseball players and the first two teams they
played for in their careers (just to keep the data set small).

| Player         | First Team    | Second Team       |
| -------------- | ------------- | ----------------- |
| Nolan Ryan     | New York Mets | California Angels |
| Jim Sundberg   | Texas Rangers | Milwaukee Brewers |
| Ivan Rodriguez | Texas Rangers | Florida Marlins   |

## INSERT

To get our player data into the database we'll use the
[`INSERT`](http://en.wikipedia.org/wiki/Insert_(SQL)) statement:

    sqlite> INSERT INTO players (name, seasons_played) VALUES ('Nolan Ryan', 27);
    sqlite> INSERT INTO players (name, seasons_played) VALUES ('Jim Sundberg', 16);
    sqlite> INSERT INTO players (name, seasons_played) VALUES ('Ivan Rodriguez', 21);

## SELECT

Now that we have data in our first table let's run a query and make sure it
looks right. We'll use the
[`SELECT`](http://en.wikipedia.org/wiki/Select_(SQL)) statement to do this. For
our first query we'll just ask the database to return all rows and columns from
our `players` table. We'll use the `*` operator to do this:

    sqlite> SELECT *
       ...> FROM players;
    id    name           seasons_played
    ----  -------------  --------------
    1     Nolan Ryan     27
    2     Jim Sundberg   16
    3     Ivan Rodriguez 21

In place of the `*` operator we can also give the `SELECT` statement a list of
columns. This will limit the result set to just the columns we're interested
in:

    sqlite> SELECT name
       ...> FROM players;
    name
    ----
    Nolan Ryan
    Jim Sundberg
    Ivan Rodriguez

## ORDER BY

We can also order our results how we'd like. This is done by using the [`ORDER
BY`](http://en.wikipedia.org/wiki/Order_by) clause:

    sqlite> SELECT *
       ...> FROM players
       ...> ORDER BY seasons_played;
    id    name           seasons_played
    ----  -------------  --------------
    2     Jim Sundberg   16
    3     Ivan Rodriguez 21
    1     Nolan Ryan     27

Previously our results were always ordered by `id`. Because we used the `ORDER
BY` clause we get results ordered by the `seasons_played` column.

You can also specify if you would like to order results descending:

    sqlite> SELECT *
       ...> FROM players
       ...> ORDER BY seasons_played DESC;
    id    name           seasons_played
    ----  -------------  --------------
    1     Nolan Ryan     27
    3     Ivan Rodriguez 21
    2     Jim Sundberg   16

## A Little More Setup

Now that we know how to insert data, and query to ensure we entered it correctly,
let's add data to our `teams` table:

    sqlite> INSERT INTO teams (name) VALUES ('Texas Rangers');
    sqlite> INSERT INTO teams (name) VALUES ('Florida Marlins');
    sqlite> INSERT INTO teams (name) VALUES ('New York Mets');
    sqlite> INSERT INTO teams (name) VALUES ('California Angels');
    sqlite> INSERT INTO teams (name) VALUES ('Milwaukee Brewers');
    sqlite> INSERT INTO teams (name) VALUES ('New York Yankees');

## Junction Table

Now we need to connect our players and teams. We'll do this in our
`players_teams` table. This is what is called a [junction
table](http://en.wikipedia.org/wiki/Junction_table). Junction tables are used
to create many to many relationships in relational databases. They achieve this
by combining common data from multiple tables. In our case we're going to include
the `id` columns from our `teams` and `players` tables thus allowing us to
relate rows from one to the other.

We'll start with Nolan. His first two teams were the New York
Mets and the California Angels. First we need to get our `ids`:

    sqlite> SELECT *
       ...> FROM players;
    id    name           seasons_played
    ----  -------------  --------------
    1     Nolan Ryan     27
    2     Jim Sundberg   16
    3     Ivan Rodriguez 21

Nolan's `id` is 1. Let's find out what the Mets and Angles ids are:

    sqlite> SELECT *
       ...> FROM teams;
    id    name
    ----  -------------
    1     Texas Rangers
    2     Florida Marlins
    3     New York Mets
    4     California Angels
    5     Milwaukee Brewers
    6     New York Yankees

The Mets have an `id` of 3 and the Angels have an `id` of 4. Nolan won a World
Series with the Mets. We now have enough information to write our `INSERT`
statements:

    sqlite> INSERT INTO players_teams (player_id, team_id, won_championship) VALUES (1, 4, 0);
    sqlite> INSERT INTO players_teams (player_id, team_id, won_championship) VALUES (1, 3, 1);

In the `INSERT` statements above we associate the player with `id` 1 (Nolan) to
the teams with `id` 3 (Mets) and 4 (Angels) and provide a 0, which represents
`false`, for the `won_championship` column for the Angels and a 1, for `true`, for
the `won_championship` column for the Mets. As you can see we're able to create
relationships between tables by using the `ids` our database is generating for
each of our rows. This is one of the corner stones of the relational database
and is called a [primary key](http://en.wikipedia.org/wiki/Primary_key).

Now we just need to finish up building our `players_teams` table.

Jim played for the Rangers and Brewers and didn't win a championship with either:

    sqlite> INSERT INTO players_teams (player_id, team_id, won_championship) VALUES (2, 1, 0);
    sqlite> INSERT INTO players_teams (player_id, team_id, won_championship) VALUES (2, 5, 0);

Ivan played for the Rangers and the Marlins and won a championship with the Marlins:

    sqlite> INSERT INTO players_teams (player_id, team_id, won_championship) VALUES (3, 1, 0);
    sqlite> INSERT INTO players_teams (player_id, team_id, won_championship) VALUES (3, 2, 1);

## INNER JOIN

Now that we have some data to play with let's write some multi-table queries.

In order to this we'll have to use joins. The most common join we have in our
tool box is called an [`INNER JOIN`][inner-join]. An `INNER JOIN` allows us to
combine two tables based on a condition we provide. The only rows from both
tables that remain after the join are the rows that satisfy the condition.

[inner-join]: http://en.wikipedia.org/wiki/Join_(SQL)#Inner_join

Let's join our `players` table to our `players_teams` and take a look at the
results on the conditions the `players` table `id` matches the `players_teams`
`player_id` column:

    sqlite> SELECT *
       ...> FROM players
       ...> INNER JOIN players_teams ON players.id = players_teams.player_id;
    id    name           seasons_played  player_id  team_id  won_championship
    ----  -------------  --------------  ---------  -------  -------------
    1     Nolan Ryan      27             1          4        0
    1     Nolan Ryan      27             1          3        1
    2     Jim Sundberg    16             2          1        0
    2     Jim Sundberg    16             2          5        0
    3     Ivan Rodriguez  21             3          1        0
    3     Ivan Rodriguez  21             3          2        1

The condition we discussed above is what comes after the `ON` keyword.

Notice that the rows in the `players` table have all been doubled.  This is
because the `INNER JOIN` preserves the number of rows from both tables, so since
there are two rows in the `players_teams` table for every one row in the
`players` table (because we're looking at each player's first two teams) we see
the rows from the `players` table twice.

Earlier when we inserted data into our `teams` table, we added a row for the New
York Yankees, but we actually don't have a player who spent one of their first
two seasons with the Yankees.

Let's see what happens if we `INNER JOIN` the `teams` table
and `players_teams` table:

    sqlite> SELECT *
       ...> FROM teams
       ...> INNER JOIN players_teams ON teams.id = players_teams.team_id;
    id    name               player_id  team_id  won_championship
    ----  -------------      ---------  -------  ----------------
    4     California Angels  1          4        0
    3     New York Mets      1          3        1
    1     Texas Rangers      2          1        0
    5     Milwaukee Brewers  2          5        0
    1     Texas Rangers      3          1        0
    2     Florida Marlins    3          2        1

Because we used an `INNER JOIN` the only rows present are the rows that satisfy
our condition and in this case the New York Yankees do not, so that row isn't
present in our result set.

We're not limited to joining just two tables either. Let's inner join all three
of our tables together and see what that looks like:

    sqlite> SELECT *
        ...> FROM players
        ...> INNER JOIN players_teams ON players.id = players_teams.player_id
        ...> INNER JOIN teams ON players_teams.team_id = teams.id;
    id    name           seasons_played  player_id  team_id  won_championship  id  name
    ----  -------------  --------------  ---------  -------  ----------------  --  -------------
    1     Nolan Ryan      27             1          4        0                 4   California Angels
    1     Nolan Ryan      27             1          3        1                 3   New York Mets
    2     Jim Sundberg    16             2          1        0                 1   Texas Rangers
    2     Jim Sundberg    16             2          5        0                 5   Milwaukee Brewers
    3     Ivan Rodriguez  21             3          1        0                 1   Texas Rangers
    3     Ivan Rodriguez  21             3          2        1                 2   Florida Marlins

We can also take advantage of the `SELECT` statement we learned about above to
create more readable results sets. Let's look at a list of players and the
teams they played on only:

    sqlite> SELECT players.name, teams.name
       ...> FROM players
       ...> INNER JOIN players_teams ON players.id = players_teams.player_id
       ...> INNER JOIN teams ON players_teams.team_id = teams.id;
    name            name
    ----            -------------
    Nolan Ryan      California Angels
    Nolan Ryan      New York Mets
    Jim Sundberg    Texas Rangers
    Jim Sundberg    Milwaukee Brewers
    Ivan Rodriguez  Texas Rangers
    Ivan Rodriguez  Florida Marlins

## LEFT OUTER JOIN

A [`LEFT OUTER JOIN`](http://en.wikipedia.org/wiki/Join_(SQL)#Left_outer_join)
is very similar to an `INNER JOIN` with one big exception. If a row in the table
being joined on does not match the condition specified in the join, the row
still remains in the result set.

Let's look at the query from above where we joined `players_teams` to `teams`
again. As we recall, when we used an `INNER JOIN` on the two tables the Yankees
were omitted form the result set. Let's see what happens when we `LEFT OUTER`
join the two tables:

    sqlite> SELECT *
       ...> FROM teams
       ...> LEFT OUTER JOIN players_teams ON teams.id = players_teams.team_id;
    id    name               player_id  team_id  won_championship
    ----  -------------      ---------  -------  ----------------
    1     Texas Rangers      2          1        0
    1     Texas Rangers      3          1        0
    2     Florida Marlins    3          2        1
    3     New York Mets      1          3        1
    4     California Angels  1          4        0
    5     Milwaukee Brewers  2          5        0
    6     New York Yankees

The Yankees show up in our result set, but with no values in the
columns associated with the `players_teams` table.

Like `INNER JOIN` we can also specify columns in the `SELECT`, we can `LEFT OUTER
JOIN` multiple tables and we can also mix `INNER JOIN` and `LEFT OUTER JOIN` in
the same query.

## WHERE

The [`WHERE`](http://en.wikipedia.org/wiki/Where_(SQL)) clause gives us the
ability to specify a condition that will be applied to every row in our final
result set.  If the condition is not met the row will not remain part of the
result set.

Let's take a look at a list of all the players in our database that spent over
20 years in the league:

    sqlite> SELECT *
       ...> FROM players
       ...> WHERE seasons_played > 20;
    id    name           seasons_played
    ----  -------------  --------------
    1     Nolan Ryan     27
    3     Ivan Rodriguez 21

Our result set only includes the two players with over 20 years.

There are lots of different operators we can use in our `WHERE` clauses. Above we
used the greater than operator. Let's take a look at a few more of our options.

We can look for rows that meet equality conditions:

    sqlite> SELECT *
       ...> FROM players
       ...> WHERE seasons_played = 16;
    id    name           seasons_played
    ----  -------------  --------------
    2     Jim Sundberg   16

We can look for rows that contain values between two values:

    sqlite> SELECT *
       ...> FROM players
       ...> WHERE seasons_played BETWEEN 20 and 22;
    id    name           seasons_played
    ----  -------------  --------------
    3     Ivan Rodriguez 21

We can look for rows that contain a value in a set we provide:

    sqlite> SELECT *
       ...> FROM players
       ...> WHERE seasons_played IN (16, 27);
    id    name           seasons_played
    ----  -------------  --------------
    1     Nolan Ryan     27
    2     Jim Sundberg   16

We can also use the `WHERE` clause on result sets that have been created by
joining multiple tables. Let's look at a list of players that have won a
championship:

    sqlite> SELECT *
       ...> FROM players
       ...> INNER JOIN players_teams ON players.id = players_teams.player_id
       ...> WHERE won_championship = 1;
    id    name           seasons_played  player_id  team_id  won_championship
    ----  -------------  --------------  ---------  -------  ----------------
    1     Nolan Ryan      27             1          3        1
    3     Ivan Rodriguez  21             3          2        1

We can also use the `WHERE` clause on two columns from two different tables by
concatenating with an `AND` or `OR`:

    sqlite> SELECT *
       ...> FROM players
       ...> INNER JOIN players_teams ON players.id = players_teams.player_id
       ...> WHERE players_teams.won_championship = 1
       ...> AND players.seasons_playerd > 21;
    id    name           seasons_played  player_id  team_id  won_championship
    ----  -------------  --------------  ---------  -------  ----------------
    1     Nolan Ryan      27             1          3        1

## DELETE

Deleting from a table is done using the `DELETE` command. Above we made a
mistake and inserted the New York Yankees into our `teams` table. None of the
players we have in our database played for that team, so, like I wish we could
in real life, we need to delete the Yankees.

First let's get the Yankees' id:

    sqlite> SELECT *
       ...> FROM teams;
    id    name
    ----  -------------
    1     Texas Rangers
    2     Florida Marlins
    3     New York Mets
    4     California Angels
    5     Milwaukee Brewers
    6     New York Yankees

We'll use their id, 6, to safely delete them from the database:

    sqlite> DELETE FROM teams WHERE id = 6;
    sqlite> SELECT *
       ...> FROM teams;
    id    name
    ----  -------------
    1     Texas Rangers
    2     Florida Marlins
    3     New York Mets
    4     California Angels
    5     Milwaukee Brewers

[I like that](https://thoughtbot.com/blog/opening-an-austin-office) result set
much better[!](http://texas.rangers.mlb.com/index.jsp?c_id=tex)

## GROUP BY

We can also group results together and aggregate values.  We'll list our teams
and aggregate the total number of players in our database that spent one of
their first two seasons playing there. This query will require us to `GROUP BY`
team name and aggregate the number of players using the `COUNT` function. We'll
count the number of players on a team by counting the number of `players.name`
values in each group:

    sqlite> SELECT teams.name, COUNT(players.name)
       ...> FROM teams
       ...> INNER JOIN players_teams ON teams.id = players_teams.team_id
       ...> INNER JOIN players ON players.id = players_teams.player_id
       ...> GROUP BY teams.name;
    name               COUNT(players.name)
    ----               -----------------
    California Angels  1
    Florida Marlins    1
    Milwaukee Brewers  1
    New York Mets      1
    Texas Rangers      2

Let's look at what the result set would look like without grouping (we'll order
by team name to make the results more obvious):

    sqlite> SELECT teams.name, players.name
       ...> FROM teams
       ...> INNER JOIN players_teams ON teams.id = players_teams.team_id
       ...> INNER JOIN players ON players.id = players_teams.player_id
       ...> ORDER BY teams.name;
    name               players.name
    ----               -----------------
    California Angels  Nolan Ryan
    Florida Marlins    Ivan Rodriguez
    Milwaukee Brewers  Jim Sundberg
    New York Mets      Nolan Ryan
    Texas Rangers      Jim Sundberg
    Texas Rangers      Ivan Rodriguez

The Texas Rangers show up twice in the non-grouped result set.

One thing to note about the `GROUP BY` clause is anything we leave in the
`SELECT` statement must either be aggregated, what we're doing with the `COUNT`
function, or in the `GROUP BY` clause, in this case `teams.name`.

There are many other types of aggregate functions we can use.Here is a list of
[aggregate functions](http://www.postgresql.org/docs/9.3/static/functions-aggregate.html)
available in Postgres.

## HAVING

The `HAVING` clause works like a `WHERE` clause, but on grouped results sets.
If we go back to our list of teams and player counts from above we can use a
`HAVING` to limit the result set to only teams that have more than one player
from our list on them in their first two seasons, or a `COUNT(player.id)` of
greater than one:

    sqlite> SELECT teams.name, COUNT(players.id)
       ...> FROM teams
       ...> INNER JOIN players_teams ON teams.id = players_teams.team_id
       ...> INNER JOIN players ON players.id = players_teams.player_id
       ...> GROUP BY teams.name
       ...> HAVING COUNT(players.id) > 1;
    name           COUNT(players.id)
    ----           -----------------
    Texas Rangers  2

Our result set has now been limited to only the teams we're interested in.

## Sub Queries

We can also embed queries in our queries to create more useful result sets.
Let's use a sub query to get a list of players who have won a championship:

    sqlite> SELECT *
       ...> FROM players
       ...> INNER JOIN (
       ...>   SELECT player_id, MAX(won_championship)
       ...>   FROM players_teams
       ...>   GROUP BY player_id) sub_query_players_teams ON players.id = sub_query_players_teams.player_id;
    id    name           seasons_played  player_id  MAX(won_championship)
    ----  -------------  --------------  ---------  ---------------------
    1     Nolan Ryan      27             1          1
    2     Jim Sundberg    16             2          0
    3     Ivan Rodriguez  21             3          1

In this case we're just doing an `INNER JOIN`, but instead of joining to an
existing table, we're creating a new result set and joining to that. Our
subquery consists of only a list of player ids and the max of all of their
`won_championship` columns. In the case they have the max will be one otherwise
it will be zero. Taking advantage of the sub query we're able to get a nice list
of players and whether they won a championship (the last column in our result
set). If we tried to do this without a subquery we would have to include all the
information from our `players_teams` table, which as we recall from our inital
`INNER JOIN` would lead to the players in the result set being doubled.  In
addition we couldn't say definitively if a player had won or not. We would be
forced to look at several different rows to deduce the information.

## What's next

If you found this useful, you might also enjoy:

* [Aggregate Functions](http://en.wikipedia.org/wiki/Aggregate_function)
