Many Unix utilities accept input both in the form of filenames passed as
command-line arguments and as data sent to the program’s standard input stream.
If filenames are passed, the corresponding files will be read in sequence. If
not, the standard input stream will be read instead. This behavior makes
utilities like cat, grep, and sed versatile and easy to use.
In Ruby, a subset of cat’s features can be re-implemented with the following
code:
# cat.rb
if ARGV.length > 0
  ARGV.each do |filename|
    puts File.read(filename)
  end
else
  puts STDIN.read
end
The implementation inspects the length of the ARGV array, which contains all
command-line arguments passed to the program. If any arguments are passed, they
are interpreted as filenames, read, and output. If no arguments are passed, the
standard input stream is read and output instead.
The cat clone can then be used like this:
$ ruby cat.rb file1 file2
Contents of file1
Contents of file2
$ echo "Contents of standard input" | ruby cat.rb
Contents of standard input
It does what it’s supposed to do, but the implementation is very concerned with
where its input is coming from. It also duplicates the output functionality in
both branches of the conditional. To solve both of these problems, Ruby provides
the ARGF stream.
Using the ARGF stream, the cat clone can be re-implemented like so:
# argf.rb
puts ARGF.read
This implementation is oblivious to where its input is coming from and can instead focus on what to do with it.
Ruby’s ARGF stream
So what is the ARGF stream? The Ruby standard library documentation describes
it as such:
ARGF is a stream designed for use in scripts that process files given as command-line arguments or passed in via STDIN.
ARGF will interpret all elements of the ARGV array as filenames and, when
read, will produce a concatenation of the contents of these files. If ARGV is
empty, then ARGF reads from standard input.
This means that if a program also accepts flags like --color or
--line-buffered, these flags will have to be shifted off the ARGV array
before ARGF is read in order to avoid unexpected “No such file or directory”
errors.
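One way to strip such flags is Ruby’s standard OptionParser, whose parse! method deletes the options it recognizes from the array it is given. The sketch below is only an illustration: extract_options and its options hash are names invented here, and the two flags are simply the ones mentioned above.

```ruby
require "optparse"

# Hypothetical helper: removes known flags from argv in place and returns
# them as a hash, leaving only the filenames behind for ARGF to read.
def extract_options(argv)
  options = { color: false, line_buffered: false }

  parser = OptionParser.new do |opts|
    opts.on("--color") { options[:color] = true }
    opts.on("--line-buffered") { options[:line_buffered] = true }
  end

  # parse! is destructive: recognized flags are deleted from argv.
  parser.parse!(argv)
  options
end

# In a script, this would run before the first read from ARGF:
#   options = extract_options(ARGV)
#   puts ARGF.read
```

Because the flags are gone from ARGV by the time ARGF is first read, only the remaining arguments are treated as filenames.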
Filenames that are manually added to the ARGV array will also be read by ARGF.
After a file has been read using ARGF, its filename is automatically shifted
off the ARGV array.
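Both behaviors can be observed by placing filenames in ARGV by hand and inspecting the array around a read. This is a minimal sketch; argv_around_read is a name made up for illustration, and the filenames passed to it are assumed to exist.

```ruby
# Hypothetical helper: puts the given filenames into ARGV, reads them all
# through ARGF, and returns what ARGV looked like before and after.
def argv_around_read(filenames)
  ARGV.replace(filenames) # same effect as passing them on the command line
  before = ARGV.dup
  contents = ARGF.read    # concatenated contents of every listed file
  after = ARGV.dup
  { before: before, contents: contents, after: after }
end
```

Called with the names of two existing files, :before holds both names and :after is empty, since each filename is removed from ARGV as ARGF consumes it.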
Many Unix utilities, like cat, also support another helpful feature: a single
invocation can read both from the standard input stream and from files named
as command-line arguments. This is done by passing the special filename - as a
command-line argument:
$ echo "Contents of standard input" | cat file1 - file2
Contents of file1
Contents of standard input
Contents of file2
Luckily, ARGF supports this as well:
$ echo "Contents of standard input" | ruby argf.rb file1 - file2
Contents of file1
Contents of standard input
Contents of file2
In addition to exposing an IO-like interface for reading the contents of
multiple files and streams, ARGF also provides handy methods for controlling
which file or stream is currently being read.
To get the file that is currently being read, the #file method can be used. If
STDIN is currently being read, this method will return an IO object instead of
a File:
# file.rb
p ARGF.file
ARGF.read(ARGF.file.size + 1) # The extra byte read is EOF
p ARGF.file
ARGF.read(ARGF.file.size + 1)
p ARGF.file
$ echo "Contents of standard input" | ruby file.rb file1 file2 -
#<File:file1>
#<File:file2>
#<IO:<STDIN>>
To get only the name of the file currently being read, we can use the
#filename method.
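As an illustration of #filename, each line can be labeled with the file it came from, much like the output of grep -H. This is a rough sketch under the assumption that the listed files exist; lines_with_filenames is a hypothetical name.

```ruby
# Hypothetical helper: returns every input line prefixed with the name of
# the file it came from.
def lines_with_filenames(filenames)
  ARGV.replace(filenames) # stand-in for real command-line arguments
  labeled = []
  ARGF.each_line do |line|
    # ARGF.filename is the name of the file the current line was read from.
    labeled << "#{ARGF.filename}: #{line}"
  end
  labeled
end
```

When the current stream is standard input, ARGF.filename returns "-" rather than a path.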
If our program only processes partial files, for example the YAML front matter
of blog posts written in Markdown format, the #close method can be used to
close the current file and skip to the next file:
# front_matter.rb
ARGF.each_line do |line|
  if ARGF.lineno > 1 && line == "---\n"
    ARGF.close
  end
  puts line
end
$ ruby front_matter.rb post1.md post2.md
---
title: My First Blog Post
---
---
title: My Second Blog Post
---
ARGF is a great example of Ruby’s way of promoting Unix tradition by making it
easy to write well-behaved Unix utilities.
Next time you write a Ruby program for processing data, give ARGF a try!