---
title: CSVKit Brings the Unix Philosophy to CSV
teaser: Use CSVKit to parse CSV at the command line.
tags: unix
author: Gabe Berke-Williams
published_on: 2015-03-27
---

We had a lot of CSV data to parse and slice when we built our [end of year
report][eoy]. We couldn't easily parse CSV using `sed`, because a simple "split
on commas" strategy quickly fell down. CSV has a surprisingly [complicated
spec][csv-spec] that (for example) allows commas inside values. Instead, we
needed something that's built to parse CSV in all of its intricacy.

[eoy]: http://2014.thoughtbot.com/
[csv-spec]: https://tools.ietf.org/html/rfc4180

Enter [CSVkit], "a suite of utilities for converting to and working with CSV".
It's `cut`, `sort`, and `grep` for CSV.

[CSVkit]: https://csvkit.readthedocs.org/en/0.9.0/

Let's try out CSVkit's tools on this list of conferences in CSV format:

```csv
name,start_date,end_date,location,url
"Rubyfuza","02/06/2014","02/08/2014","Cape Town","http://www.rubyfuza.org/"
"Launch","02/26/2014","02/28/2014","SF","http://events.launch.co/festival"
"EmberConf","02/23/2014","03/24/2014","Portland","http://ember.com"
"NSConf","03/17/2014","03/19/2014","England","http://nsconference.com"
"RailsConf","04/22/2014","04/25/2014","Chicago","http://www.railsconf.com/"
```

## csvcut

Let's use [`csvcut`][csvcut] to grab the name and location of each conference:

[csvcut]: http://csvkit.readthedocs.org/en/0.9.0/scripts/csvcut.html

$ <kbd>csvcut --columns name,location conferences.csv</kbd>

    name,location
    Rubyfuza,Cape Town
    Launch,SF
    EmberConf,Portland
    NSConf,England
    RailsConf,Chicago

The `--columns` option determines which column the command is operating on. It's
used in nearly all of the CSVkit tools.

## csvsort

[`csvsort`][csvsort] can sort data by a specific column. Let's sort by name:

[csvsort]: https://csvkit.readthedocs.org/en/0.9.0/scripts/csvsort.html

$ <kbd>csvsort --columns name conferences.csv</kbd>

    name,start_date,end_date,location,url
    EmberConf,2014-02-23,2014-03-24,Portland,http://ember.com
    Launch,2014-02-26,2014-02-28,SF,http://events.launch.co/festival
    NSConf,2014-03-17,2014-03-19,England,http://nsconference.com
    RailsConf,2014-04-22,2014-04-25,Chicago,http://www.railsconf.com/
    Rubyfuza,2014-02-06,2014-02-08,Cape Town,http://www.rubyfuza.org/

Note that the output of `csvsort`, like the output from other CSVkit tools, is
itself valid CSV.  That means we can pipe it to other CSVkit tools, or take in
other tools' input.  Here are the names and locations of the conferences, sorted
by name:

$ <kbd>csvcut --columns name,location conferences.csv | csvsort --columns name</kbd>

    name,location
    EmberConf,Portland
    Launch,SF
    NSConf,England
    RailsConf,Chicago
    Rubyfuza,Cape Town

## csvgrep

[`csvgrep`][csvgrep] filters your data to only rows where a column matches a
specific value.

[csvgrep]: http://csvkit.readthedocs.org/en/0.9.0/scripts/csvgrep.html

Let's search the `name` column for values that match the word "Conf":

$ <kbd>csvgrep --columns name --match Conf conferences.csv</kbd>

    name,start_date,end_date,location,url
    EmberConf,02/23/2014,03/24/2014,Portland,http://ember.com
    NSConf,03/17/2014,03/19/2014,England,http://nsconference.com
    RailsConf,04/22/2014,04/25/2014,Chicago,http://www.railsconf.com/

We can also use regexes with the `--regex` option. Let's find conferences with
at least two capital letters in their name:

$ <kbd>csvgrep --columns name --regex "[A-Z].&#22;[A-Z]" conferences.csv</kbd>

    name,start_date,end_date,location,url
    EmberConf,02/23/2014,03/24/2014,Portland,http://ember.com
    NSConf,03/17/2014,03/19/2014,England,http://nsconference.com
    RailsConf,04/22/2014,04/25/2014,Chicago,http://www.railsconf.com/

Since CSVgrep is written in Python, it interprets regexes the same way [Python
does](https://docs.python.org/2/library/re.html). If it works in Python, it
should work when passed to `--regex`.

## Even more

This covers the 80% use case of CSVkit, but if there's something else you want
to do, I strongly recommend reading the [CSVkit] docs. They're exceptionally
well-written and have helpful real-life examples.
