Enough unix to get by

Mike Burns

Rather than go in depth on everything a unix-like operating system contains, this is instead a baseline of commands and concepts that you need to navigate your command-line shell in a professional environment.

You should know these concepts:

  • Terminals and shells.
  • Tab completion.
  • Globs.
  • Pipes.
  • Program success and failure.
  • Environment variables.
  • Searching shell history.

And when to use these commands:

  • cat
  • cd
  • cp
  • echo
  • file
  • grep
  • kill
  • less
  • ls
  • man
  • ping
  • ps
  • rm
  • ssh
  • sudo
  • tail

What follows is a brief, high-level guide to these concepts and commands.

Concepts

The command line

You have a program called a terminal emulator. On macOS it’s called Terminal. It emulates a physical machine from the 1970s and 80s called a terminal. The terminal was not a computer; it was a keyboard and a screen, one of many, connected to a central computer shared by all other terminals. Your terminal app emulates that.

(The terminals of the 80s actually mimicked earlier terminals, which were a keyboard and a printer. Screens were a new invention.)

The Web uses the same model as the terminals did: a request/response programming model (no AJAX for terminals). Each keypress was a request sent to the central computer – typically just the individual letter, but things like backspace or tab or newline were sent as special codes (0x08, 0x09, 0x0a, respectively, in case you’re curious). The computer would push responses that were mostly individual letters but also encoded additional things like ringing a bell (0x07) or ending transmission (0x04). Yes, it was a physical bell.
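
If you want to see those codes for yourself, here is a small sketch. It assumes the od program is installed (it usually is) and prints each byte of a short string in hexadecimal.

```shell
# Print "a", a tab, "b", and a newline, then dump every byte in hex.
# od -An hides the offset column; -tx1 prints one hex byte at a time.
bytes=$(printf 'a\tb\n' | od -An -tx1)
echo "$bytes"
```

In the output, 61 and 62 are the letters, 09 is the tab, and 0a is the newline.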

The terminal was like the Web browser of today, so it follows that you still needed to load a program to interact with. That brings us to the shell.

The shell is a simple program that allows you to run other programs conveniently. As time went on, shells got a little fancier. Many now have built-in keywords that look just like programs but actually direct the shell to perform its own action, such as if and while. You’ll be thrilled to know that cd and export are also shell built-ins, not programs. We’ll get into that later.

Tab completion

Your shell will attempt to complete what you’re typing when you press the TAB key. It primarily works on files.

To see the contents of the file browserlist:

cat br<TAB>

You should be able to tab your way through the majority of any file’s name. Between tab completion and globs (next section), you will rarely type a full file name.

Globs

One fancy shell feature is the ability to specify multiple files easily, using globs. Globs can appear anywhere in your command.

Note that globs are handled by the shell, not by the program. In the following examples, the ls program is given a set of file names; it never sees a * character.

If you want all files within a directory, use the * glob. In this example, list all the show.html.erb templates for all (non-nested) controllers:

ls app/views/*/show.html.erb

If you want all files within any set of subdirectories, ** will traverse everything. (This works out of the box in zsh; in bash you need to run shopt -s globstar first.) For example, list all base controllers across all namespaces:

ls app/controllers/**/base_controller.rb

To expand into multiple arbitrary strings, use the {} brace expansion. To rename a JavaScript file into TypeScript:

mv main.{js,ts}

That is the equivalent of:

mv main.js main.ts

To make a quick backup copy of a file:

cp test.log{,.bak}

That is the equivalent of:

cp test.log test.log.bak

There is more glob syntax such as ?, [...], [-], but the above is what you’ll typically need.
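
To convince yourself that the shell expands the glob before the program ever runs, here is a minimal sketch. It works in a throwaway directory, and the users and posts directory names are made up for the demonstration.

```shell
# Work in a temporary directory so nothing real is touched.
dir=$(mktemp -d)
cd "$dir"
mkdir -p app/views/users app/views/posts
touch app/views/users/show.html.erb app/views/posts/show.html.erb

# Unquoted: the shell replaces the glob with the matching file names
# before echo runs, so echo receives two arguments.
expanded=$(echo app/views/*/show.html.erb)

# Quoted: the shell leaves the * alone, so echo receives it literally.
literal=$(echo 'app/views/*/show.html.erb')

echo "$expanded"
echo "$literal"
```

The first echo prints both file names; the second prints the pattern itself, star and all.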

Pipes

By default, command-line programs interact with three streams of data: standard input (stdin), standard output (stdout), and standard error (stderr).

Stdin is typically your keyboard, and stdout and stderr are typically your terminal window – but this does not have to be the case.

First, you can connect stdout to a file using >. If you want all lines related to processing a request copied from the log into a file, for example:

grep Processing log/development.log > requests.log

The > syntax will overwrite any contents in the output file. To append, use >> instead.

Likewise, you can connect stdin to a file using <.

To work with stderr, you first need to know that every open file has a number (its file descriptor), and stdin, stdout, and stderr are files. Stdin = 0, stdout = 1, and stderr = 2.

The syntax for sending stderr (2) to stdout (1) is 2>&1. Redirections are applied left to right, so it goes after the > redirect. To create a file with a list of every file that matches a glob, and also any permission errors:

ls /Users/*/.ssh/id_rsa > rsa-keys.log 2>&1

You can connect the stdout from one program to the stdin to another program using the | character.

For example, to search running programs for Ruby, you can pipe ps into grep:

ps | grep ruby
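
Redirections are processed left to right, which makes their order matter. The sketch below shows the difference; the both function is a made-up helper that writes one line to each stream, and the log file names are arbitrary.

```shell
dir=$(mktemp -d)
cd "$dir"

# Hypothetical helper: one line to stdout, one line to stderr.
both() {
  echo "to stdout"
  echo "to stderr" >&2
}

# Send stdout to the file first, then point stderr at the same place.
# Both lines land in the file.
both > both.log 2>&1

# Reversed: stderr is pointed at wherever stdout goes right now (here,
# the $(...) capture), and only then is stdout moved to the file.
leaked=$(both 2>&1 > only-stdout.log)

cat both.log
echo "escaped the file: $leaked"
```

If a log file is mysteriously missing its errors, check the order of the redirects.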

Program success

Every program has an exit status that it returns to the shell. This is a number that is either zero, or non-zero (less than 256).

Zero means success. Non-zero means failure.

This is typically not visible to you, but it is available in the special $? variable:

echo $?

Shells include some syntax for combining programs based on success and failure. The syntaxes to look at are && and ||.

To delete a file so long as another file contains a string:

grep shhhh-secret Gemfile && rm Gemfile.lock

To re-run a command with privileges if needed:

rm /etc/passwd || sudo rm /etc/passwd
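
Here is a safer way to play with exit statuses, using the true and false programs, which do nothing except succeed and fail. The variable names are just for illustration.

```shell
# $? holds the exit status of the command that just ran.
true && status_of_true=$?     # reached because true succeeded
false || status_of_false=$?   # reached because false failed
echo "true exited with $status_of_true, false exited with $status_of_false"

# && runs the right-hand side only after success.
and_result=$(true && echo "ran")

# || runs the right-hand side only after failure.
or_result=$(false || echo "ran")

# Here the left-hand side fails, so && skips the echo. The trailing
# || true only keeps the line as a whole from counting as a failure.
skipped=$(false && echo "ran" || true)

echo "and: $and_result, or: $or_result, skipped: '$skipped'"
```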

Environment

Your operating system maintains a global dictionary called the environment. You can set and access environment variables to manipulate this.

(If you’re thinking “doesn’t a global dictionary interact poorly with threads?”, let me tell you: it also interacts poorly with multiple processes. It’s been a disaster since 1976.)

Your shell has its own private dictionary of variables, which you can then export into the environment. When you run a command from your shell, it does not gain access to the shell’s private variables. You must export each variable that you want the command to see.

For example, you can set DISABLE_SPRING from the shell:

DISABLE_SPRING=1

But Rails won’t see that until you export it:

export DISABLE_SPRING

You can combine this into one line:

export DISABLE_SPRING=1

Note that DISABLE_SPRING is the name of the variable but $DISABLE_SPRING is the value:

echo $DISABLE_SPRING

Different programs use environment variables in totally different ways, and you need to learn about them from their respective manuals. You set and export them from the shell, but they are used by other programs.
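
You can watch the difference between a private shell variable and an exported one by asking a child process what it sees. This is a sketch: GREETING is a made-up variable name, and sh -c simply starts a second shell to play the part of "some other program".

```shell
# Start clean in case the name is already in use.
unset GREETING

# Set the variable in the shell only.
GREETING=hello

# The child process does not see it, so it falls back to "unset".
before=$(sh -c 'echo "${GREETING:-unset}"')

export GREETING

# After the export, the child process receives it.
after=$(sh -c 'echo "${GREETING:-unset}"')

echo "before export: $before"
echo "after export: $after"
```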

One important environment variable is PATH. This is a colon-separated list of directories that your shell will look in to find programs.

If your PATH is:

PATH=/usr/bin:/usr/local/bin:bin

Then when you run webpack, your shell will look for /usr/bin/webpack, then /usr/local/bin/webpack, and finally bin/webpack.
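
To see the lookup order in action, the following sketch creates two programs with the same name in two temporary directories and puts both directories on the PATH. The name hello and the directory names are invented for the example.

```shell
dir=$(mktemp -d)
mkdir "$dir/first" "$dir/second"

# Two tiny hypothetical programs, both named hello.
printf '#!/bin/sh\necho "from first"\n' > "$dir/first/hello"
printf '#!/bin/sh\necho "from second"\n' > "$dir/second/hello"
chmod +x "$dir/first/hello" "$dir/second/hello"

# Put both directories at the front of PATH. The earlier entry wins.
PATH="$dir/first:$dir/second:$PATH"

# command -v reports which file the shell would run.
which_hello=$(command -v hello)
output=$(hello)

echo "$which_hello"
echo "$output"
```

When a program seems to be the wrong version, command -v is a quick way to find out which copy your PATH is picking.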

History

You can press the up arrow to bring back the prior command, then enter to re-run it. Excellent.

But instead, you can enter an interactive search for the command you want by pressing control-r. Start typing, and if you get the command you want then hit enter. Press control-r to keep searching back in history.

Press control-c to give up and exit the search.

To see the history, use the history built-in.

Commands

These are the commands that you will need in order to interact with files and your operating system:

cat

The cat program lists the contents of files. You can give multiple file names and it will concatenate them.

cat Procfile Rakefile

cd

Each shell instance has its own idea of the present working directory. When you use tab completion, your shell works off its idea of which directory you are “in”.

You can change that using the cd (change directory) built-in. This only changes the shell’s own idea of where it is, not anything global in the OS; that’s why it’s a shell built-in.

cd hub

The cd built-in accepts a single - as its argument, which causes it to change to the prior directory.

cd -
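
A short sketch of both ideas: cd - hops back, and a directory change in one shell instance does not leak into another. The parentheses in $( ... ) start a second shell instance to demonstrate the latter.

```shell
start=$PWD

# Go somewhere else, then hop back. cd - also prints the directory it
# returns to, which is discarded here.
cd /
cd - > /dev/null
back=$PWD

# A cd inside a second shell instance changes only that instance.
inside=$(cd / && pwd)
outside=$PWD

echo "started in $start, back in $back"
echo "inner shell was in $inside, outer shell is still in $outside"
```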

cp

The cp program copies a file. To copy production.rb to staging.rb:

cp config/environments/production.rb config/environments/staging.rb

Or, using brace expansion:

cp config/environments/{production,staging}.rb

If you pass more than two arguments, the final argument must be a directory. It will copy all the preceding arguments to that directory. To copy all Markdown files into the doc directory:

cp *.md doc

echo

The incredibly simple echo program prints its arguments to stdout. It’s a fun exercise to write this program yourself.

It is mainly used in scripts to show messages to the user:

echo "Loading production data, please wait ..."

Note that you can send the message to stderr using redirects:

echo "Failed to restore database." >&2
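
As for writing echo yourself, here is one minimal take as a shell function. The name my_echo is made up, and this sketch ignores the flags that the real echo accepts.

```shell
# Join all arguments with single spaces and finish with a newline.
# "$*" expands to the arguments joined by a space (the default separator).
my_echo() {
  printf '%s\n' "$*"
}

greeting=$(my_echo Loading production data)
my_echo "$greeting"
```

Writing it as a standalone program in your language of choice is the more interesting version of the exercise.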

file

The file program tells you what kind of file something is. You ever download a JPEG but it turns out to be a Webp? file will tell you that.

file hatchet.jpg

grep

The grep program searches files for a regular expression. With the advent of Ripgrep, the most common thing to grep is another program’s stdout, by way of a pipe.

ps | grep ruby

The pattern can be a complex regex, but note that you are fighting shell quoting the whole time. Use single quotes if you’re at all cautious:

ps | grep '[cC]hrome'

The grep program can display context around a match with the -C flag:

grep -C2 dependencies yarn.lock
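
To get a feel for what the context flag does, try it on a tiny file. This sketch writes a made-up five-line file to a temporary directory and asks for one line of context around the match.

```shell
dir=$(mktemp -d)
printf 'one\ntwo\nthree\nfour\nfive\n' > "$dir/numbers.txt"

# -C1 shows one line before and one line after each matching line.
context=$(grep -C1 three "$dir/numbers.txt")
echo "$context"
```

That prints two, three, and four, each on its own line.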

kill

The kill program sends a signal to another process.

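A unix OS has a basic inter-process communication system where you can send any of a couple dozen small numbered messages (signals) to a program. Some notable signals are HUP (1), INT (2), KILL (9), TERM (15), STOP, and CONT. The numbers for STOP and CONT differ between systems (19 and 18 on most Linux systems, 17 and 19 on macOS), so prefer the names.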

The TERM, INT, and KILL signals all tell the process to stop. The INT (interrupt) signal is the most polite, then TERM (terminate), then KILL (immediately stop). Programs can trap the INT and TERM signals to perform cleanup, but cannot trap the KILL signal.

The STOP signal will pause the program, and the CONT signal will unpause.

The HUP signal traditionally indicates that the long-running program should re-read its configuration file.

To work with the kill program, you need a process ID. You can get that using ps or pgrep. Once you have that, you can send a signal to the process.

To tell Postgres, at process ID 42069, to reload the server configuration files:

kill -HUP 42069
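
If you want to practice on something harmless, you can start your own process and signal it. This is a sketch that assumes bash or a similar sh; sleep 60 stands in for a real long-running program.

```shell
# Start a long-running process in the background. $! is its process ID.
sleep 60 &
pid=$!

# Ask it to terminate.
kill -TERM "$pid"

# wait reports how the process ended. bash reports a process ended by a
# signal as 128 plus the signal number, so TERM (15) shows up as 143.
status=0
wait "$pid" || status=$?

echo "process $pid ended with status $status"
```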

The pkill program works just like kill but takes a program name instead of a process ID. This is a more dangerous game unless you specifically know how many matching processes there are and how to best write a pattern for their name. Typically by the time you finish your investigation, you have a process ID in hand.

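To tell Postgres to cancel a running query (careful: the pattern also matches the main Postgres server process, which treats INT as a request to shut down):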

pkill -INT postgres

less

The less command paginates a file. Its name is a joke: there was originally a program called more, and some people wanted to improve it but refused to collaborate with the more developer. They called their improvement less. It was better, and now it’s all we use.

(There is now a replacement for less called most.)

Anyway, you can either use it in a pipe:

grep http Gemfile.lock | less

Or use it on a file directly:

less package.json

By default, less will show you the bytes in a file. This is typically fine – the file is a bunch of ASCII.

However, sometimes a file or stdout contains instructions for a terminal. For example, color codes are a feature from the terminals in the 80s that work by sending ESC (0x1b), then [, a number, and then m; the number determines the color to show. But if less is just showing you bytes, you’re going to see that ESC instead of the color.

The solution is to pass the -R flag. This will cause less to show the colors as colors:

less -R log/development.log
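
You can make your own colored text to try this with. In this sketch, \033 is ESC written in octal, 31 is the number for red, and 0 resets back to normal; the word error is just sample text.

```shell
# ESC [ 31 m switches to red, ESC [ 0 m switches back.
colored=$(printf '\033[31merror\033[0m')
printf '%s\n' "$colored"

# The escape codes are real bytes in the stream: 5 visible letters plus
# 9 bytes of color instructions.
length=$(printf '%s' "$colored" | wc -c | tr -d ' ')
echo "$length bytes"
```

Pipe the first printf into less and then into less -R to see the difference.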

ls

The ls program lists files. It comes in two flavors.

Basic:

ls bin

And long:

ls -l config

By default it will list files in the current working directory:

ls -l

It skips any file whose name begins with a dot (.). To see those, too, pass the -A flag:

ls -A

man

Unix programs ship with a manual installed on the computer. Back in the day this was called the online help. Now it’s called the offline docs. Words!

You can access the manual with the man program.

man ls

The manuals are categorized into sections. Section 1 is for normal programs. Section 8 is administration programs that can break your computer if you’re not careful. Section 7 contains overviews, tutorials, and concept explanations.

You’ll often see programs written as rm(1) or useradd(8). The number is the section.

On GNU/Linux we have a glob(3) and a glob(7). Section 3 is C functions. To access the manual glob(7):

man 7 glob

For more details, see man(1)!

ping

The ping command is a quick way to know whether your Internet connection is slow.

ping thoughtbot.com

Press control-c to end the pings by sending an INT signal.

ps

The ps program lists processes.

Every program is at least one process. Some programs, like Google Chrome, spawn multiple processes.

By default ps only shows processes that are in your current terminal session. That’s usually two: your shell, and ps itself.

I usually run ps ax. You can also give ps -e a try. Either way, you’ll want to pipe it to either grep or less.

The ps program can also tell you memory usage. To use this, first find the process ID you want using ps | grep, and then use the -o rss flag to see the Resident Set Size (RSS) in kilobytes for that process ID.

ps -o rss 12345

rm

The rm program removes files and directories. By default it will not remove a directory; pass the -r flag to recurse into it and remove everything inside.

rm foo
rm -r tmp/*

Unix hot tip: almost every program across all flavors of unix will stop interpreting command line flags once they encounter --. Anything after -- is treated as raw text, and nothing more.

So if you accidentally create a file named -rf, you can remove it like so:

rm -- -rf
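
You can rehearse this safely in a temporary directory. The sketch below creates a file named -rf on purpose and then removes it.

```shell
dir=$(mktemp -d)
cd "$dir"

# Create a file literally named -rf. The -- stops touch from treating
# the name as flags.
touch -- -rf
before=$(ls -A)

# Remove it the same way.
rm -- -rf
after=$(ls -A)

echo "before: '$before', after: '$after'"
```

Another way out is to spell the name as a path, such as ./-rf, so that it no longer starts with a dash.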

ssh

The ssh Secure SHell program allows you to connect to another computer over an encrypted connection.

To connect as ralph to thoughtbot.com:

ssh ralph@thoughtbot.com

The authentication is done via public/private keypair. You can run the ssh-keygen command to generate your keypair. You only need to run that once. This is a set of files named something like .ssh/id_rsa and .ssh/id_rsa.pub. The .pub file is your public key. Do not share the other file!

Under the hood, this is how git connects to GitHub. That’s why the clone command looks like:

git clone git@github.com:...

This is why GitHub wants your SSH public key.

SSH allows you to trust computers using the Trust On First Use (TOFU) principle: the first time you connect, it shows you the other computer’s fingerprint and asks whether you trust it. If you say yes, then it won’t ask again.

You can usually find the other computer’s fingerprint in the support docs for the service you’re connecting to, such as Heroku’s or GitHub’s.

sudo

Unix popularized the idea of one computer having multiple users. This is how the terminals worked, after all. The sudo program allows you to switch between these users for the duration of one command.

By default, it switches to the administrator user (root).

sudo vi /etc/passwd

Not baseline knowledge but if you find yourself redirecting to files while using sudo, look into tee.

tail

The tail program shows the last 10 lines of a file.

The -f flag will show the last 10 lines and then continue to show lines as they come in. Handy for logs:

tail -f log/test.log
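
To see the default of 10 lines for yourself, here is a sketch that writes 15 numbered lines to a made-up log file in a temporary directory. It also uses the -n flag, which picks a different line count.

```shell
dir=$(mktemp -d)
log="$dir/test.log"

# Write 15 numbered lines to the file.
i=1
while [ "$i" -le 15 ]; do
  echo "line $i"
  i=$((i + 1))
done > "$log"

# With no flags, tail prints the last 10 lines: line 6 through line 15.
first_shown=$(tail "$log" | head -n 1)

# The -n flag asks for a specific number of lines.
last_three=$(tail -n 3 "$log")

echo "default output starts at: $first_shown"
echo "$last_three"
```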
