Almost all applications store data in one format or another somewhere. Us developers spend a lot of time thinking about our data. In a most cases we end up storing that data in a relational database. The advent of the ORM has made writing raw SQL much less common, but when we do it’s good to have an understanding of the fundamental elements of the language. Today we’ll go over some basic queries and the syntax required to make them.
Setup
For the sake of simplicity we’ll use SQLite3 for this blog. It’s important to note that none of the topics or commands we’ll look at are specific to SQLite3. All the query examples we’ll go over below are ISO 9705 compliant and will work in any of the major databases (Postgresql, MySql, Oracle, etc).
From the command line fire up SQLite3 and create a database named back_to_basics
:
% sqlite3 back_to_basics
Now let’s create three tables. We’ll call them players
, teams
and players_teams
:
SQLite3 version 3.7.12 2012-04-03 19:43:07
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> CREATE TABLE players (id INTEGER PRIMARY KEY ASC, name TEXT, seasons_played INTEGER);
sqlite> CREATE TABLE teams (id INTEGER PRIMARY KEY ASC, name TEXT);
sqlite> CREATE TABLE players_teams (player_id INTEGER, team_id INTEGER, won_championship BOOLEAN);
sqlite> .tables
players players_teams teams
Before we’re done with the setup stage let’s turn on explain mode. This will let us see our column names in query results:
sqlite> .explain on
Our Data
In order to write any queries we’ll need some data, which means we need some players and teams. We’ll use baseball since players tend to move around more in that sport.
Let’s enter information on three baseball players and the first two teams they played for in their careers (just to keep the data set small).
Player | First Team | Second Team |
---|---|---|
Nolan Ryan | New York Mets | California Angels |
Jim Sundberg | Texas Rangers | Milwaukee Brewers |
Ivan Rodriguez | Texas Rangers | Florida Marlins |
INSERT
To get our player data into the database we’ll use the
INSERT
statement:
sqlite> INSERT INTO players (name, seasons_played) VALUES ('Nolan Ryan', 27);
sqlite> INSERT INTO players (name, seasons_played) VALUES ('Jim Sundberg', 16);
sqlite> INSERT INTO players (name, seasons_played) VALUES ('Ivan Rodriguez', 21);
SELECT
Now that we have data in our first table let’s run a query and make sure it
looks right. We’ll use the
SELECT
statement to do this. For
our first query we’ll just ask the database to return all rows and columns from
our players
table. We’ll use the *
operator to do this:
sqlite> SELECT *
...> FROM players;
id name seasons_played
---- ------------- --------------
1 Nolan Ryan 27
2 Jim Sundberg 16
3 Ivan Rodriguez 21
In place of the *
operator we can also give the SELECT
statement a list of
columns. This will limit the result set to just the columns we’re interested
in:
sqlite> SELECT name
...> FROM players;
name
----
Nolan Ryan
Jim Sundberg
Ivan Rodriguez
ORDER BY
We can also order our results how we’d like. This is done by using the ORDER
BY
clause:
sqlite> SELECT *
...> FROM players
...> ORDER BY seasons_played;
id name seasons_played
---- ------------- --------------
2 Jim Sundberg 16
3 Ivan Rodriguez 21
1 Nolan Ryan 27
Previously our results were always ordered by id
. Because we used the ORDER
BY
clause we get results ordered by the seasons_played
column.
You can also specify if you would like to order results descending:
sqlite> SELECT *
...> FROM players
...> ORDER BY seasons_played DESC;
id name seasons_played
---- ------------- --------------
1 Nolan Ryan 27
3 Ivan Rodriguez 21
2 Jim Sundberg 16
A Little More Setup
Now that we know how to insert data, and query to ensure we entered it correctly,
let’s add data to our teams
table:
sqlite> INSERT INTO teams (name) VALUES ('Texas Rangers');
sqlite> INSERT INTO teams (name) VALUES ('Florida Marlins');
sqlite> INSERT INTO teams (name) VALUES ('New York Mets');
sqlite> INSERT INTO teams (name) VALUES ('California Angels');
sqlite> INSERT INTO teams (name) VALUES ('Milwaukee Brewers');
sqlite> INSERT INTO teams (name) VALUES ('New York Yankees');
Junction Table
Now we need to connect our players and teams. We’ll do this in our
players_teams
table. This is what is called a junction
table. Junction tables are used
to create many to many relationships in relational databases. They achieve this
by combining common data from multiple tables. In our case we’re going to include
the id
columns from our teams
and players
tables thus allowing us to
relate rows from one to the other.
We’ll start with Nolan. His first two teams were the New York
Mets and the California Angels. First we need to get our ids
:
sqlite> SELECT *
...> FROM players;
id name seasons_played
---- ------------- --------------
1 Nolan Ryan 27
2 Jim Sundberg 16
3 Ivan Rodriguez 21
Nolan’s id
is 1. Let’s find out what the Mets and Angles ids are:
sqlite> SELECT *
...> FROM teams;
id name
---- -------------
1 Texas Rangers
2 Florida Marlins
3 New York Mets
4 California Angels
5 Milwaukee Brewers
6 New York Yankees
The Mets have an id
of 3 and the Angels have an id
of 4. Nolan won a World
Series with the Mets. We now have enough information to write our INSERT
statements:
sqlite> INSERT INTO players_teams (player_id, team_id, won_championship) VALUES (1, 4, 0);
sqlite> INSERT INTO players_teams (player_id, team_id, won_championship) VALUES (1, 3, 1);
In the INSERT
statements above we associate the player with id
1 (Nolan) to
the teams with id
3 (Mets) and 4 (Angels) and provide a 0, which represents
false
, for the won_championship
column for the Angels and a 1, for true
, for
the won_championship
column for the Mets. As you can see we’re able to create
relationships between tables by using the ids
our database is generating for
each of our rows. This is one of the corner stones of the relational database
and is called a primary key.
Now we just need to finish up building our players_teams
table.
Jim played for the Rangers and Brewers and didn’t win a championship with either:
sqlite> INSERT INTO players_teams (player_id, team_id, won_championship) VALUES (2, 1, 0);
sqlite> INSERT INTO players_teams (player_id, team_id, won_championship) VALUES (2, 5, 0);
Ivan played for the Rangers and the Marlins and won a championship with the Marlins:
sqlite> INSERT INTO players_teams (player_id, team_id, won_championship) VALUES (3, 1, 0);
sqlite> INSERT INTO players_teams (player_id, team_id, won_championship) VALUES (3, 2, 1);
INNER JOIN
Now that we have some data to play with let’s write some multi-table queries.
In order to this we’ll have to use joins. The most common join we have in our
tool box is called an INNER JOIN
. An INNER JOIN
allows us to
combine two tables based on a condition we provide. The only rows from both
tables that remain after the join are the rows that satisfy the condition.
Let’s join our players
table to our players_teams
and take a look at the
results on the conditions the players
table id
matches the players_teams
player_id
column:
sqlite> SELECT *
...> FROM players
...> INNER JOIN players_teams ON players.id = players_teams.player_id;
id name seasons_played player_id team_id won_championship
---- ------------- -------------- --------- ------- -------------
1 Nolan Ryan 27 1 4 0
1 Nolan Ryan 27 1 3 1
2 Jim Sundberg 16 2 1 0
2 Jim Sundberg 16 2 5 0
3 Ivan Rodriguez 21 3 1 0
3 Ivan Rodriguez 21 3 2 1
The condition we discussed above is what comes after the ON
keyword.
Notice that the rows in the players
table have all been doubled. This is
because the INNER JOIN
preserves the number of rows from both tables, so since
there are two rows in the players_teams
table for every one row in the
players
table (because we’re looking at each player’s first two teams) we see
the rows from the players
table twice.
Earlier when we inserted data into our teams
table, we added a row for the New
York Yankees, but we actually don’t have a player who spent one of their first
two seasons with the Yankees.
Let’s see what happens if we INNER JOIN
the teams
table
and players_teams
table:
sqlite> SELECT *
...> FROM teams
...> INNER JOIN players_teams ON teams.id = players_teams.team_id;
id name player_id team_id won_championship
---- ------------- --------- ------- ----------------
4 California Angels 1 4 0
3 New York Mets 1 3 1
1 Texas Rangers 2 1 0
5 Milwaukee Brewers 2 5 0
1 Texas Rangers 3 1 0
2 Florida Marlins 3 2 1
Because we used an INNER JOIN
the only rows present are the rows that satisfy
our condition and in this case the New York Yankees do not, so that row isn’t
present in our result set.
We’re not limited to joining just two tables either. Let’s inner join all three of our tables together and see what that looks like:
sqlite> SELECT *
...> FROM players
...> INNER JOIN players_teams ON players.id = players_teams.player_id
...> INNER JOIN teams ON players_teams.team_id = teams.id;
id name seasons_played player_id team_id won_championship id name
---- ------------- -------------- --------- ------- ---------------- -- -------------
1 Nolan Ryan 27 1 4 0 4 California Angels
1 Nolan Ryan 27 1 3 1 3 New York Mets
2 Jim Sundberg 16 2 1 0 1 Texas Rangers
2 Jim Sundberg 16 2 5 0 5 Milwaukee Brewers
3 Ivan Rodriguez 21 3 1 0 1 Texas Rangers
3 Ivan Rodriguez 21 3 2 1 2 Florida Marlins
We can also take advantage of the SELECT
statement we learned about above to
create more readable results sets. Let’s look at a list of players and the
teams they played on only:
sqlite> SELECT players.name, teams.name
...> FROM players
...> INNER JOIN players_teams ON players.id = players_teams.player_id
...> INNER JOIN teams ON players_teams.team_id = teams.id;
name name
---- -------------
Nolan Ryan California Angels
Nolan Ryan New York Mets
Jim Sundberg Texas Rangers
Jim Sundberg Milwaukee Brewers
Ivan Rodriguez Texas Rangers
Ivan Rodriguez Florida Marlins
LEFT OUTER JOIN
A LEFT OUTER JOIN
is very similar to an INNER JOIN
with one big exception. If a row in the table
being joined on does not match the condition specified in the join, the row
still remains in the result set.
Let’s look at the query from above where we joined players_teams
to teams
again. As we recall, when we used an INNER JOIN
on the two tables the Yankees
were omitted form the result set. Let’s see what happens when we LEFT OUTER
join the two tables:
sqlite> SELECT *
...> FROM teams
...> LEFT OUTER JOIN players_teams ON teams.id = players_teams.team_id;
id name player_id team_id won_championship
---- ------------- --------- ------- ----------------
1 Texas Rangers 2 1 0
1 Texas Rangers 3 1 0
2 Florida Marlins 3 2 1
3 New York Mets 1 3 1
4 California Angels 1 4 0
5 Milwaukee Brewers 2 5 0
6 New York Yankees
The Yankees show up in our result set, but with no values in the
columns associated with the players_teams
table.
Like INNER JOIN
we can also specify columns in the SELECT
, we can LEFT OUTER
JOIN
multiple tables and we can also mix INNER JOIN
and LEFT OUTER JOIN
in
the same query.
WHERE
The WHERE
clause gives us the
ability to specify a condition that will be applied to every row in our final
result set. If the condition is not met the row will not remain part of the
result set.
Let’s take a look at a list of all the players in our database that spent over 20 years in the league:
sqlite> SELECT *
...> FROM players
...> WHERE seasons_played > 20;
id name seasons_played
---- ------------- --------------
1 Nolan Ryan 27
3 Ivan Rodriguez 21
Our result set only includes the two players with over 20 years.
There are lots of different operators we can use in our WHERE
clauses. Above we
used the greater than operator. Let’s take a look at a few more of our options.
We can look for rows that meet equality conditions:
sqlite> SELECT *
...> FROM players
...> WHERE seasons_played = 16;
id name seasons_played
---- ------------- --------------
2 Jim Sundberg 16
We can look for rows that contain values between two values:
sqlite> SELECT *
...> FROM players
...> WHERE seasons_played BETWEEN 20 and 22;
id name seasons_played
---- ------------- --------------
3 Ivan Rodriguez 21
We can look for rows that contain a value in a set we provide:
sqlite> SELECT *
...> FROM players
...> WHERE seasons_played IN (16, 27);
id name seasons_played
---- ------------- --------------
1 Nolan Ryan 27
2 Jim Sundberg 16
We can also use the WHERE
clause on result sets that have been created by
joining multiple tables. Let’s look at a list of players that have won a
championship:
sqlite> SELECT *
...> FROM players
...> INNER JOIN players_teams ON players.id = players_teams.player_id
...> WHERE won_championship = 1;
id name seasons_played player_id team_id won_championship
---- ------------- -------------- --------- ------- ----------------
1 Nolan Ryan 27 1 3 1
3 Ivan Rodriguez 21 3 2 1
We can also use the WHERE
clause on two columns from two different tables by
concatenating with an AND
or OR
:
sqlite> SELECT *
...> FROM players
...> INNER JOIN players_teams ON players.id = players_teams.player_id
...> WHERE players_teams.won_championship = 1
...> AND players.seasons_playerd > 21;
id name seasons_played player_id team_id won_championship
---- ------------- -------------- --------- ------- ----------------
1 Nolan Ryan 27 1 3 1
DELETE
Deleting from a table is done using the DELETE
command. Above we made a
mistake and inserted the New York Yankees into our teams
table. None of the
players we have in our database played for that team, so, like I wish we could
in real life, we need to delete the Yankees.
First let’s get the Yankees’ id:
sqlite> SELECT *
...> FROM teams;
id name
---- -------------
1 Texas Rangers
2 Florida Marlins
3 New York Mets
4 California Angels
5 Milwaukee Brewers
6 New York Yankees
We’ll use their id, 6, to safely delete them from the database:
sqlite> DELETE FROM teams WHERE id = 6;
sqlite> SELECT *
...> FROM teams;
id name
---- -------------
1 Texas Rangers
2 Florida Marlins
3 New York Mets
4 California Angels
5 Milwaukee Brewers
I like that result set much better!
GROUP BY
We can also group results together and aggregate values. We’ll list our teams
and aggregate the total number of players in our database that spent one of
their first two seasons playing there. This query will require us to GROUP BY
team name and aggregate the number of players using the COUNT
function. We’ll
count the number of players on a team by counting the number of players.name
values in each group:
sqlite> SELECT teams.name, COUNT(players.name)
...> FROM teams
...> INNER JOIN players_teams ON teams.id = players_teams.team_id
...> INNER JOIN players ON players.id = players_teams.player_id
...> GROUP BY teams.name;
name COUNT(players.name)
---- -----------------
California Angels 1
Florida Marlins 1
Milwaukee Brewers 1
New York Mets 1
Texas Rangers 2
Let’s look at what the result set would look like without grouping (we’ll order by team name to make the results more obvious):
sqlite> SELECT teams.name, players.name
...> FROM teams
...> INNER JOIN players_teams ON teams.id = players_teams.team_id
...> INNER JOIN players ON players.id = players_teams.player_id
...> ORDER BY teams.name;
name players.name
---- -----------------
California Angels Nolan Ryan
Florida Marlins Ivan Rodriguez
Milwaukee Brewers Jim Sundberg
New York Mets Nolan Ryan
Texas Rangers Jim Sundberg
Texas Rangers Ivan Rodriguez
The Texas Rangers show up twice in the non-grouped result set.
One thing to note about the GROUP BY
clause is anything we leave in the
SELECT
statement must either be aggregated, what we’re doing with the COUNT
function, or in the GROUP BY
clause, in this case teams.name
.
There are many other types of aggregate functions we can use.Here is a list of aggregate functions available in Postgres.
HAVING
The HAVING
clause works like a WHERE
clause, but on grouped results sets.
If we go back to our list of teams and player counts from above we can use a
HAVING
to limit the result set to only teams that have more than one player
from our list on them in their first two seasons, or a COUNT(player.id)
of
greater than one:
sqlite> SELECT teams.name, COUNT(players.id)
...> FROM teams
...> INNER JOIN players_teams ON teams.id = players_teams.team_id
...> INNER JOIN players ON players.id = players_teams.player_id
...> GROUP BY teams.name
...> HAVING COUNT(players.id) > 1;
name COUNT(players.id)
---- -----------------
Texas Rangers 2
Our result set has now been limited to only the teams we’re interested in.
Sub Queries
We can also embed queries in our queries to create more useful result sets. Let’s use a sub query to get a list of players who have won a championship:
sqlite> SELECT *
...> FROM players
...> INNER JOIN (
...> SELECT player_id, MAX(won_championship)
...> FROM players_teams
...> GROUP BY player_id) sub_query_players_teams ON players.id = sub_query_players_teams.player_id;
id name seasons_played player_id MAX(won_championship)
---- ------------- -------------- --------- ---------------------
1 Nolan Ryan 27 1 1
2 Jim Sundberg 16 2 0
3 Ivan Rodriguez 21 3 1
In this case we’re just doing an INNER JOIN
, but instead of joining to an
existing table, we’re creating a new result set and joining to that. Our
subquery consists of only a list of player ids and the max of all of their
won_championship
columns. In the case they have the max will be one otherwise
it will be zero. Taking advantage of the sub query we’re able to get a nice list
of players and whether they won a championship (the last column in our result
set). If we tried to do this without a subquery we would have to include all the
information from our players_teams
table, which as we recall from our inital
INNER JOIN
would lead to the players in the result set being doubled. In
addition we couldn’t say definitively if a player had won or not. We would be
forced to look at several different rows to deduce the information.
What’s next
If you found this useful, you might also enjoy: