---
title: Ruby's ARGF
teaser: How to write well-behaved data-processing Unix utilities in Ruby.
tags: ruby,unix
author: Calle Erlandsson
published_on: 2015-10-13
---

Many Unix utilities accept input both in the form of filenames passed as
command-line arguments and as data sent to the program's standard input stream.
If filenames are passed, the corresponding files will be read in sequence. If
not, the standard input stream will be read instead. This behavior makes
utilities like `cat`, `grep`, and `sed` versatile and easy to use.

In Ruby, a subset of `cat`'s features can be re-implemented with the following
code:

```ruby
# cat.rb

if ARGV.length > 0
  ARGV.each do |filename|
    puts File.read(filename)
  end
else
  puts STDIN.read
end
```

The implementation inspects the length of the `ARGV` array, which contains all
command-line arguments passed to the program. If any arguments are passed, they
are treated as filenames, and each file is read and printed. If no arguments are
passed, the standard input stream is read and printed instead.

The `cat` clone can then be used like this:

```sh
$ ruby cat.rb file1 file2
Contents of file1
Contents of file2
$ echo "Contents of standard input" | ruby cat.rb
Contents of standard input
```

It does what it's supposed to do, but the implementation is very concerned with
where its input is coming from. It also duplicates the output functionality in
both branches of the conditional. To solve both of these problems, Ruby provides
the `ARGF` stream.

Using the `ARGF` stream, the `cat` clone can be re-implemented like so:

```ruby
# argf.rb

puts ARGF.read
```

This implementation is oblivious to where its input is coming from and can
instead focus on what to do with it.

## Ruby's ARGF stream

So what is the `ARGF` stream? The Ruby core documentation describes it as
follows:

> `ARGF` is a stream designed for use in scripts that process files given as
> command-line arguments or passed in via `STDIN`.

`ARGF` will interpret all elements of the `ARGV` array as filenames and when
read will produce a concatenation of the contents of these files. If `ARGV` is
empty, then `ARGF` reads from standard input.

This means that if a program also accepts flags like `--color` or
`--line-buffered`, these flags will have to be shifted off the `ARGV` array
before `ARGF` is read in order to avoid unexpected "No such file or directory"
errors.
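As a minimal sketch of that idea, the standard library's `OptionParser` can strip flags from `ARGV` before `ARGF` is read. The `--number` flag, the temporary sample file, and the `ARGV.replace` call that simulates a command line are assumptions made for illustration only:

```ruby
require "optparse"
require "tempfile"

# Hypothetical sample file so the sketch runs on its own.
sample = Tempfile.new("sample")
sample.write("first\nsecond\n")
sample.close

# Simulates running: ruby number.rb --number <path to sample>
ARGV.replace(["--number", sample.path])

options = { number: false }
parser = OptionParser.new do |opts|
  # --number is a made-up flag used only for this illustration.
  opts.on("-n", "--number", "Prefix each line with its line number") do
    options[:number] = true
  end
end

# parse! removes the flags it recognizes from ARGV, so only filenames remain.
parser.parse!(ARGV)
remaining_args = ARGV.dup

# ARGF now sees nothing but filenames and can be read safely.
numbered = ARGF.each_line.with_index(1).map do |line, index|
  options[:number] ? "#{index}: #{line}" : line
end
puts numbered
```

Because the flag is gone from `ARGV` by the time `ARGF` is first read, `ARGF` never tries to open a file named `--number`.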

Filenames that are manually added to the `ARGV` array will also be read by
`ARGF`.

As `ARGF` opens each file for reading, its filename is automatically shifted
off the `ARGV` array.
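The following sketch illustrates both points under stated assumptions: the two temporary files are hypothetical stand-ins for `file1` and `file2`, and their paths are placed in `ARGV` by hand rather than passed on a real command line:

```ruby
require "tempfile"

# Two hypothetical input files so the sketch runs on its own.
first = Tempfile.new("first")
first.write("Contents of file1\n")
first.close

second = Tempfile.new("second")
second.write("Contents of file2\n")
second.close

# Add filenames to ARGV by hand, as if they had been passed as arguments.
ARGV.replace([first.path])
ARGV << second.path

# Reading ARGF yields the concatenated contents of both files.
contents = ARGF.read

# ARGF has consumed the filenames, so ARGV is empty at this point.
leftover = ARGV.dup

puts contents
p leftover
```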

Many Unix utilities, like `cat`, also support another helpful feature that
allows input to be sent **both** to the standard input stream **and** as
filenames passed as command-line arguments. This is done by passing the special
filename `-` as a command-line argument:

```sh
$ echo "Contents of standard input" | cat file1 - file2
Contents of file1
Contents of standard input
Contents of file2
```

Luckily, `ARGF` supports this as well:

```sh
$ echo "Contents of standard input" | ruby argf.rb file1 - file2
Contents of file1
Contents of standard input
Contents of file2
```

In addition to exposing an `IO`-like interface for reading the contents of
multiple files and streams, `ARGF` also provides handy methods for inspecting
and controlling which file or stream is currently being read.

To get the file that is currently being read, the `#file` method can be used. If
standard input is currently being read, this method returns an `IO` object
instead of a `File`:

```ruby
# file.rb

p ARGF.file
ARGF.read(ARGF.file.size + 1) # Request more bytes than the file holds so ARGF moves on to the next file
p ARGF.file
ARGF.read(ARGF.file.size + 1)
p ARGF.file
```

```sh
$ echo "Contents of standard input" | ruby file.rb file1 file2 -
#<File:file1>
#<File:file2>
#<IO:<STDIN>>
```

To get only the name of the file currently being read, we can use the
`#filename` method.
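As a rough sketch, `#filename` can be used to label each line with the file it came from, similar to what `grep` prints when given several files. The temporary directory, the file names, and the `labelled` array are assumptions made so the example runs on its own:

```ruby
require "tmpdir"

labelled = []

Dir.mktmpdir do |dir|
  # Hypothetical input files so the sketch runs on its own.
  paths = %w[file1 file2].map do |name|
    path = File.join(dir, name)
    File.write(path, "Contents of #{name}\n")
    path
  end

  # Simulates running: ruby label.rb file1 file2
  ARGV.replace(paths)

  ARGF.each_line do |line|
    # ARGF.filename is the path of the file the current line was read from.
    labelled << "#{File.basename(ARGF.filename)}: #{line}"
  end
end

puts labelled
```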

If our program only processes part of each file, for example the YAML front
matter of blog posts written in Markdown, the `#close` method can be used to
close the current file and skip to the next one:

```ruby
# front_matter.rb

ARGF.each_line do |line|
  if ARGF.lineno > 1 && line == "---\n"
    ARGF.close
  end
  puts line
end
```

```sh
$ ruby front_matter.rb post1.md post2.md
---
title: My First Blog Post
---
---
title: My Second Blog Post
---
```

`ARGF` is a great example of Ruby's way of promoting Unix tradition by making it
easy to write well-behaved Unix utilities.

Next time you write a Ruby program for processing data, give [`ARGF`][argf] a try!

[argf]: https://ruby-doc.org/core-2.5.1/ARGF.html
