Input/Output, generally referred to as I/O, is a term that covers the ways that a computer interacts with the world. Screens, keyboards, files, and networks are all forms of I/O. Data from these devices is sent to and from programs as a stream of characters/bytes.
Unix-like systems treat all external devices as files. We can see these under
the /dev directory. Read this list for a quick description of all the devices
we might find under /dev for OS X.
For example (truncated for brevity):
$ tree /dev
/dev
├── disk0
├── fd
│ ├── 0
│ ├── 1
│ ├── 2
│ └── 3 [error opening dir]
├── null
├── stderr -> fd/2
├── stdin -> fd/0
├── stdout -> fd/1
├── tty
└── zero
I/O streams are located under the /dev/fd directory. Files there are given a
number, known as a file descriptor. The operating system provides three streams
by default. They are:
- Standard input (/dev/fd/0)
- Standard output (/dev/fd/1)
- Standard error (/dev/fd/2)
They are often abbreviated to stdin, stdout, and stderr respectively. Standard
input will default to reading from the keyboard while standard output and
standard error both default to writing to the terminal. As can be seen above,
/dev/stdout, /dev/stdin, and /dev/stderr are just symlinks to the appropriate
file descriptor.
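As a minimal sketch of the same idea from inside Ruby (which exposes the three default streams as the constants STDIN, STDOUT, and STDERR), we can ask each stream for its file descriptor with IO#fileno. The hash name descriptors is only for illustration.

```ruby
# Each default stream is backed by a numbered file descriptor.
descriptors = {
  stdin:  STDIN.fileno,
  stdout: STDOUT.fileno,
  stderr: STDERR.fileno
}

# Print the /dev/fd path that corresponds to each stream.
descriptors.each { |name, fd| puts "#{name} -> /dev/fd/#{fd}" }
```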
The IO class
Ruby IO objects wrap Input/Output streams. The constants STDIN, STDOUT, and
STDERR point to IO objects wrapping the standard streams. By default the
global variables $stdin, $stdout, and $stderr point to their respective
constants. While the constants should always point to the default streams, the
globals can be overwritten to point to another I/O stream such as a file. IO
objects can be written to via puts and print.
$stdout.puts 'Hello World'
We’ve all written the shorthand version of this program:
puts 'Hello World'
The bare puts method is provided by Ruby’s Kernel module and is equivalent to
calling $stdout.puts. Similarly, IO objects can be read from via gets. The bare
gets provided by Kernel reads from $stdin when no file names were passed as
command-line arguments (strictly speaking, it reads from ARGF). By default,
$stdin is read-only while $stdout and $stderr are write-only.
[1] pry(main)> $stdin.puts 'foo'
IOError: not opened for writing
[2] pry(main)> $stdout.gets
IOError: not opened for reading
[3] pry(main)> $stderr.gets
IOError: not opened for reading
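Because the bare puts goes through the $stdout global, reassigning that global redirects the output. The sketch below shows one way this might look; capture_stdout is a hypothetical helper name, not something Ruby provides, and it uses StringIO (covered later in this article) as the replacement stream.

```ruby
require 'stringio'

# Hypothetical helper: temporarily point $stdout at an in-memory stream,
# run the block, and return whatever was written.
def capture_stdout
  original = $stdout
  $stdout = StringIO.new
  yield
  $stdout.string
ensure
  # Always restore the previous stream, even if the block raises.
  $stdout = original
end

captured = capture_stdout { puts 'Hello World' }
```

The STDOUT constant is untouched throughout, which is why it remains a safe way to reach the real standard output.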
To create a new IO object, we need a file descriptor. In this case, 1 (stdout).
[1] pry(main)> io = IO.new(1)
=> #<IO:fd 1>
[2] pry(main)> io.puts 'hello world'
hello world
=> nil
What about creating IOs to other streams? They don’t have constant file
descriptors so we first need to get one via IO.sysopen.
[1] pry(main)> fd = IO.sysopen('/dev/null', 'w+')
=> 8
[2] pry(main)> dev_null = IO.new(fd)
=> #<IO:fd 8>
[3] pry(main)> dev_null.puts 'hello'
=> nil
[4] pry(main)> dev_null.gets
=> nil
[5] pry(main)> dev_null.close
=> nil
/dev/null (sometimes referred to as the “bit bucket” or “black hole”) is the
null device on Unix-like systems. Writing to it does nothing and attempting to
read from it returns nothing (nil in Ruby).
First, we get a file descriptor for a stream that is read/write to the
/dev/null device. Then we create an IO object for this stream so we can
interact with it in Ruby. When writing to dev_null, the text no longer appears
on the screen. When reading from dev_null, we get nil.
Since everything on a Unix-like system is a file, we can open an IO stream to
a text file in the same way we would open a device. We just create a file
descriptor with the path to our file and then create an IO object for that
file descriptor. When we are done with it, we close the stream to flush Ruby’s
buffer and release the file descriptor back to the operating system. Attempting
to read or write from a closed stream will raise an IOError.
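The steps above can be sketched as a small script. The temporary directory and the file name notes.txt are assumptions made so the example can run anywhere; the mode is passed to IO.new explicitly to keep the sketch unambiguous.

```ruby
require 'tmpdir'

contents = nil
error = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, 'notes.txt')  # hypothetical file name

  fd = IO.sysopen(path, 'w+')         # ask the OS for a file descriptor
  io = IO.new(fd, 'w+')               # wrap the descriptor in an IO object
  io.puts 'hello'
  io.close                            # flush the buffer and release the descriptor

  contents = File.read(path)

  begin
    io.puts 'again'                   # the stream is closed, so this raises
  rescue IOError => e
    error = e
  end
end
```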
Position
When working with an IO, we have to keep position in mind. Given that we’ve
opened a stream to the following file:
Lorem ipsum
dolor
sit amet...
and we call gets on it:
[1] pry(main)> IO.sysopen '/Users/joelquenneville/Desktop/lorem.txt'
=> 8
[2] pry(main)> lorem = IO.new(8)
=> #<IO:fd 8>
[3] pry(main)> lorem.gets
=> "Lorem ipsum\n"
it returns the first line of the file and moves the cursor to the next line. If we check the position of the cursor:
[4] pry(main)> lorem.pos
=> 12
If we call gets a few more times:
[5] pry(main)> lorem.gets
=> "dolor\n"
[6] pry(main)> lorem.gets
=> "sit amet...\n"
[7] pry(main)> lorem.pos
=> 30
we can see Ruby’s “cursor” has moved. Now that we have read the whole file, what
happens if we try to call gets again?
[8] pry(main)> lorem.gets
=> nil
[9] pry(main)> lorem.eof?
=> true
We see that it returns nil. We can ask a stream if we have reached “end of
file” via eof?. To return to the beginning of the stream, we can call rewind.
[10] pry(main)> lorem.rewind
=> 0
[11] pry(main)> lorem.pos
=> 0
This can lead to surprises when writing to a stream.
[1] pry(main)> fd = IO.sysopen '/Users/joelquenneville/Desktop/test.txt', 'w+'
=> 8
[2] pry(main)> io = IO.new(fd)
=> #<IO:fd 8>
[3] pry(main)> io.puts 'hello world'
=> nil
[4] pry(main)> io.puts 'goodbye world'
=> nil
This stream has the lines “hello world” and “goodbye world”. If we were to attempt to read:
[5] pry(main)> io.gets
=> nil
[6] pry(main)> io.eof?
=> true
Our cursor is currently at the end of the file. In order to read we would need to first rewind.
[7] pry(main)> io.rewind
=> 0
[8] pry(main)> io.gets
=> "hello world\n"
Any write operations in the middle of a stream will overwrite the existing data:
[9] pry(main)> io.pos
=> 12
[10] pry(main)> io.puts "middle"
=> nil
[11] pry(main)> io.rewind
=> 0
[12] pry(main)> io.read
=> "hello world\nmiddle\n world\n"
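If the goal is to append rather than overwrite, one option is to move the cursor to the end of the stream before writing. The sketch below repeats the steps above but calls IO#seek with IO::SEEK_END first; the temporary directory and file name are assumptions for the sake of a runnable example.

```ruby
require 'tmpdir'

result = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, 'test.txt')   # hypothetical file name
  io = IO.new(IO.sysopen(path, 'w+'), 'w+')

  io.puts 'hello world'
  io.puts 'goodbye world'

  io.rewind
  io.gets                             # cursor now sits after the first line

  io.seek(0, IO::SEEK_END)            # jump to the end instead of writing mid-stream
  io.puts 'middle'

  io.rewind
  result = io.read
  io.close
end
```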
This kind of behavior is necessary because streams do not get loaded into
memory. Instead, only the lines being operated on are loaded. This is very
useful because some streams can point to very large files that would be
expensive to load in memory all at once. Streams can also be infinite. For
example, $stdin has no end. We can always read more data from it (when it
receives the message gets, it waits for the user to type something).
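A rough illustration of working through a stream one line at a time: IO#each_line yields each line in turn, so the script never needs to hold the whole file as a single string. The file name and the generated contents are made up for the example.

```ruby
require 'tmpdir'

line_count = 0

Dir.mktmpdir do |dir|
  path = File.join(dir, 'large.log')  # hypothetical file name
  # Generate a sample file with 1000 lines.
  File.write(path, (1..1000).map { |n| "entry #{n}\n" }.join)

  io = IO.new(IO.sysopen(path), 'r')
  # Only the current line is handed to the block on each iteration.
  io.each_line { |line| line_count += 1 if line.start_with?('entry') }
  io.close
end
```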
Sub-classes and Duck-types
Ruby gives us a couple of subclasses of IO that are more specialized for a
particular type of IO:
File
Probably the most well known IO subclass. File allows us to read/write files
without messing around with file descriptors. It also adds file-specific
convenience methods such as File#size, File#chmod, and File.path.
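For comparison with the IO.sysopen approach, here is a small sketch using File.open with a block, which closes the stream automatically when the block returns. The path is a throwaway file in a temporary directory, chosen only for illustration.

```ruby
require 'tmpdir'

size = nil
first_line = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, 'greeting.txt')  # hypothetical file name

  # No file descriptor handling: File.open opens, yields, and closes.
  File.open(path, 'w') { |file| file.puts 'hello world' }

  File.open(path) do |file|
    size = file.size        # file-specific convenience method
    first_line = file.gets  # inherited from IO
  end
end
```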
The Sockets
Socket docs:
Ruby’s various socket classes all ultimately inherit from IO.
For example, I have a server running on localhost:3000:
[1] pry(main)> require 'socket'
=> true
[2] pry(main)> socket = TCPSocket.new 'localhost', 3000
=> #<TCPSocket:fd 10>
[3] pry(main)> socket.puts 'GET "/"'
=> nil
[4] pry(main)> socket.gets
=> "HTTP/1.1 400 Bad Request \r\n"
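The session above depends on a server already listening on port 3000. As a self-contained sketch, the script below starts its own throwaway echo server on a port picked by the operating system and talks to it with the same puts and gets calls. The echo behavior is invented for this example.

```ruby
require 'socket'

server = TCPServer.new('127.0.0.1', 0)  # port 0 asks the OS for any free port
port = server.addr[1]

# Hypothetical echo server: reads one line and sends it back with a prefix.
server_thread = Thread.new do
  client = server.accept
  line = client.gets
  client.puts "echo: #{line}"
  client.close
end

socket = TCPSocket.new('127.0.0.1', port)
socket.puts 'hello'
reply = socket.gets
is_io = socket.is_a?(IO)                # sockets are IO objects

socket.close
server_thread.join
server.close
```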
StringIO
StringIO allows strings to behave like IOs. This is useful when we want to
pass strings into systems that consume streams. This is common in tests where
we might inject a StringIO instead of reading an actual file from disk.
Unlike previous classes showcased, StringIO does not inherit from IO.
[1] pry(main)> string_io = StringIO.new('hello world')
=> #<StringIO:0x007feacb0cd4e8>
[2] pry(main)> string_io.gets
=> "hello world"
[3] pry(main)> string_io.puts 'goodbye world'
=> nil
[4] pry(main)> string_io.rewind
=> 0
[5] pry(main)> string_io.read
=> "hello worldgoodbye world\n"
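A short sketch of the duck-typing point: a method written against the IO interface accepts a StringIO even though StringIO is not an IO. The count_lines helper is hypothetical and exists only for this example.

```ruby
require 'stringio'

# Hypothetical consumer: works with anything that responds to each_line.
def count_lines(io)
  io.each_line.count
end

string_io = StringIO.new("one\ntwo\nthree\n")

is_io = string_io.is_a?(IO)           # StringIO does not inherit from IO
line_total = count_lines(string_io)   # but it quacks like one
```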
Tempfile
Tempfile is another class that doesn’t inherit from IO. Instead, it
implements File’s interface and deals with temporary files. As such, it can be
passed to any object that consumes IO-like objects.
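A minimal sketch of a Tempfile being used like any other stream. The basename 'report' is an arbitrary choice for the example; closing and unlinking in an ensure block is one common way to clean up.

```ruby
require 'tempfile'

tempfile = Tempfile.new('report')  # hypothetical basename
begin
  tempfile.puts 'hello world'      # same interface as File and IO
  tempfile.rewind
  contents = tempfile.read
ensure
  tempfile.close
  tempfile.unlink                  # remove the temporary file from disk
end
```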
Putting it all together
Say we have the following class for some command-line program:
class SystemTask
  def execute
    puts "preparing to execute"
    puts "starting first task"
    first_task
    puts "starting second task"
    second_task
    puts "execution complete"
  end
end
Testing this class causes all these messages to be output, cluttering our
results. One approach to solving this problem would be to inject IO objects
instead of calling Kernel#puts and to pass in a null object in tests.
class SystemTask
  def initialize(io=$stdout)
    @io = io
  end
  def execute
    @io.puts "preparing to execute"
    @io.puts "starting first task"
    first_task
    @io.puts "starting second task"
    second_task
    @io.puts "execution complete"
  end
end
In production, we can still call SystemTask.new.execute as before.
Now we can pass in our own IO in tests. This could be a test double, a
StringIO, or a stream to /dev/null:
describe SystemTask do
  # test double
  it "executes tasks" do
    io = double("io", puts: nil)
    system_task = SystemTask.new(io)
    system_task.execute
    # expect things to have happened
    # if we care about the messages, we can also expect on the double
    expect(io).to have_received(:puts).with("preparing to execute")
  end
  # StringIO
  it "executes tasks" do
    io = StringIO.new
    system_task = SystemTask.new(io)
    system_task.execute
    # expect things to have happened
    # if we care about the messages read from the string io
    io.rewind
    expect(io.read).to eq "preparing to execute\nstarting first task\nstarting second task\nexecution complete\n"
  end
  # /dev/null
  it "executes tasks" do
    io = File.open(File::NULL, 'w')
    system_task = SystemTask.new(io)
    system_task.execute
    # expect things to have happened
    # only use /dev/null if we don't care about the messages
  end
end
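To try the injection idea without a test framework, here is a plain Ruby sketch. The article never shows first_task and second_task, so they are stubbed as empty placeholder methods here; that stubbing is an assumption, not the original implementation.

```ruby
require 'stringio'

class SystemTask
  def initialize(io = $stdout)
    @io = io
  end

  def execute
    @io.puts "preparing to execute"
    @io.puts "starting first task"
    first_task
    @io.puts "starting second task"
    second_task
    @io.puts "execution complete"
  end

  private

  # Placeholder tasks (assumed): the real work is not shown in the article.
  def first_task; end
  def second_task; end
end

io = StringIO.new
SystemTask.new(io).execute
output = io.string  # everything written, regardless of cursor position
```

Nothing reaches the terminal here, and output holds the four messages for any assertions we care to make.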
Working with disparate APIs
While working on a recent project that pulled reports from several APIs, we noticed some responses were strings, others were CSV documents, and still others only generated the report, after which we had to make a request to another endpoint to download it.
The solution was to create an adapter for each API that would get the data and return it in a
standard format wrapped in some type of IO-like object. A persistor object could
then process and persist any of the reports as long as they were formatted the
same way and were IO-like. For example:
class API1Report
  def fetch
    # fetch report (comes down as a CSV doc)
    # process it to get it in a standard format
    # return standardized report as a Tempfile object
  end
end
class API2Report
  def fetch
    # fetch report
    # returns it as a File object
  end
end
class Persistor
  def initialize(report)
    @report = report
  end
  def persist
    # process and persist the report
  end
end
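The outline above leaves the method bodies empty. The following runnable sketch shows the general shape under heavy assumptions: InMemoryReport, DownloadedReport, and RowCollector are invented names, the report body is passed in directly instead of being fetched over the network, and "persisting" is reduced to splitting each line into fields.

```ruby
require 'stringio'
require 'tempfile'

# Hypothetical adapter: the API hands back the report as a plain string.
class InMemoryReport
  def initialize(body)
    @body = body
  end

  def fetch
    StringIO.new(@body)
  end
end

# Hypothetical adapter: the report is written to a temporary file first.
class DownloadedReport
  def initialize(body)
    @body = body
  end

  def fetch
    file = Tempfile.new('report')
    file.write(@body)
    file.rewind  # hand the consumer a stream positioned at the start
    file
  end
end

# Hypothetical consumer: relies only on the report responding to each_line.
class RowCollector
  def initialize(report)
    @report = report
  end

  def rows
    @report.each_line.map { |line| line.chomp.split(',') }
  end
end

body = "id,total\n1,9.99\n"
from_string = RowCollector.new(InMemoryReport.new(body).fetch).rows
from_file   = RowCollector.new(DownloadedReport.new(body).fetch).rows
```

Both adapters produce the same rows, which is the property that lets a single consumer handle every report source.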
What’s next
Read an overview of 4.4 BSD’s I/O to develop a deeper understanding of Unix I/O, file descriptors, and devices.
Read the TTY system to understand the relationship between Unix jobs, processes, and I/O with the TTY device.
Practice Ruby I/O by cloning this repo.
Finally, go deeper into Ruby’s I/O in this chapter from Read Ruby.