This year, Whyday happened to fall on the first day of Capeco.de. I’d been interested in playing with C for a while, so I decided to sit down and start learning. The K&R helped a lot, but digging through the source of both Redis and Potion gave me much more insight into how C is actually used.
Since then, I’ve been trying to get some C in my Ruby by writing gems with C extensions. This isn’t hard to do, per se, but it was difficult to find a comprehensive list of requirements (as well as guidelines for organizing the code). I ended up looking at one of the most-used Ruby gems with a C extension: Nokogiri.
Where to Start
You’ll want a way to test the code, as well as the rake-compiler gem. I don’t test my C explicitly - think of it as a handful of private methods. You’ll want to test the public methods on whatever classes and modules you write, so a C testing framework is overkill. Finally, you’ll want to become acquainted with ruby.h.
I decided to start simple and write a Sieve of Eratosthenes. I’d written one in pure Ruby and wanted a basic translation to C. I was also interested in benchmarking the code since I knew C would be a lot faster in this instance.
The sieve gem can be found here and its source here.
Directory Structure
A C extension’s directory structure is very similar to other Ruby gems; the only addition is an ext directory that will store files necessary for generating a Makefile and compiling the code. This is where your C files and their headers will go. In my case, these files are located in ext/sieve.
extconf.rb
extconf.rb is what will generate your Makefile. You’ll need to require "mkmf" and then call create_makefile("your_gem/your_gem"). The mkmf documentation is an excellent resource if you want to include other libraries or customize anything. My gem is straightforward so all I did was ensure the Makefile was created.
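For a gem laid out like this one, the whole file can be as small as the sketch below. The "sieve/sieve" argument is inferred from the ext/sieve directory and the require "sieve/sieve" line in lib/sieve.rb; treat it as an assumption rather than a copy of the real file.

```ruby
# ext/sieve/extconf.rb (sketch, not the gem's actual file)
require "mkmf"

# Generates a Makefile that builds the extension as sieve/sieve.
# Assumption: the name matches the path later passed to require.
create_makefile("sieve/sieve")
```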
sieve.h
My sieve.h is very straightforward and doesn’t really need any explanation.
sieve.c
This is the meat and potatoes of the gem.
At the top of the file, you’ll need to #include <ruby.h> as well as any other headers you need.
You’ll also need an Init_your_gem() function, which Ruby calls when the extension is loaded, much as main() is the entry point of a C program. This is where I create the structure of my classes and modules for my sieve. I create a Sieve module, add a sieve instance method, and then have the Numeric class include Sieve.
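If you think in Ruby first, the structure built in that function can be pictured as the Ruby below. This is only an illustration of the resulting layout, not the gem's code: the method body reuses a pure-Ruby sieve as a stand-in for the C function, and the to_i conversion is an assumption of this sketch.

```ruby
# Ruby picture of what the Init function builds. A C extension would
# typically define these with rb_define_module, rb_define_method and
# rb_include_module instead.
module Sieve
  # Stand-in body: in the gem this method is implemented in C.
  def sieve
    limit = to_i # assumption: treat the receiver as an integer upper bound
    return [] if limit < 2

    numbers = (0..limit).to_a
    numbers[0] = numbers[1] = nil
    numbers.each do |num|
      next unless num
      break if num * num > limit
      (num * num).step(limit, num) { |idx| numbers[idx] = nil }
    end
    numbers.compact
  end
end

# Every Numeric (Integer, Float, ...) now responds to #sieve.
class Numeric
  include Sieve
end
```

With this in place, 30.sieve returns [2, 3, 5, 7, 11, 13, 17, 19, 23, 29].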
Finally, there’s the sieve function itself. It accepts a Ruby object (self) of type VALUE and returns a Ruby object, also of type VALUE. An important reminder: free any memory you allocate, just as you would in any C program, or your gem will leak memory.
The /lib directory
Although most of the actual work is done in C, we’ll want to have a bit of Ruby in the /lib directory. I’ve written a scaffold of the module at lib/sieve.rb, which has a require "sieve/sieve" at the top of the file (remember the string we passed to create_makefile in extconf.rb? That’s it).
The Rakefile
Being able to compile the gem and run the tests is important, which is why I mentioned the rake-compiler gem earlier. After requiring rubygems, rake, and your library, you’ll want to require "rake/extensiontask". That’ll give you a couple of handy rake tasks, namely clean and compile.
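With rake-compiler, the compile task is defined by instantiating an extension task. A minimal sketch for this layout might look like the following; the "sieve" name and the lib_dir value are assumptions based on the ext/sieve directory and the require "sieve/sieve" line, not a copy of the real Rakefile.

```ruby
# Rakefile (sketch)
require "rake/extensiontask"

# Defines a compile task for the extension found in ext/sieve.
Rake::ExtensionTask.new("sieve") do |ext|
  # Assumption: place the compiled library under lib/sieve so that
  # require "sieve/sieve" can find it.
  ext.lib_dir = "lib/sieve"
end
```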
I like to set my default task to run tests, but you’ll want a couple prerequisites to that task: clean and compile. This will ensure that you’re rebuilding your gem and running with the latest compiled version.
Since I’m using Cucumber, it looks like this:
require "cucumber/rake/task"
Cucumber::Rake::Task.new(:cucumber => [:clean, :compile]) do |t|
  t.rcov = true
end
task :default => :cucumber
I also have my benchmark task here so I can find out how much more performant this library is compared to a pure Ruby implementation.
Testing
You’ll want to test your C extension just like any other Ruby gem. I prefer Cucumber but anything will do. I used a scenario outline for some of the basic primes and then found a file of the first one million primes for some heavy-duty lifting. I also tested that if enough memory couldn’t be allocated, it would raise a Ruby NoMemoryError exception.
Since you’re going to want to run the features against the latest changes of your gem (and not a version of the gem that’s installed), you’ll want to modify the load path within your test helper (test/test_helper.rb, spec/spec_helper.rb, or features/support/env.rb). My env.rb looks like this:
$LOAD_PATH.unshift(File.dirname(__FILE__) + '/../../lib')
require "sieve"
require "spec/expectations"
Building the gem
The gemspec for a Ruby C extension is fairly straightforward. The only thing you’ll need to add is the spec’s extensions attribute, set to the path to the extconf.rb file.
Gem::Specification.new do |s|
  s.require_paths = ["lib"]
  s.extensions = ["ext/sieve/extconf.rb"]
  # ... rest of the gemspec
end
As with any gemspec, you’ll want to make sure that you list the .c and .h files within files.
Results
Armed with this, you should be able to go and write Ruby C extensions to your heart’s content. As for my Sieve experiment, here’s the pure-Ruby implementation of the sieve:
# usage:
# >> sieve 100
# => [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
def sieve(n)
  numbers = (0..n).map {|i| i }
  numbers[0] = numbers[1] = nil
  numbers.each do |num|
    next unless num
    break if num**2 > n
    (num**2).step(n, num) {|idx| numbers[idx] = nil }
  end
  numbers.compact
end
My benchmarks were running the sieve on numbers from zero to one million in steps of 100,000. No memoization is used for either form.
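A benchmark along those lines could be sketched with Ruby's standard Benchmark module. The helper name benchmark_sieve and its keyword arguments are hypothetical, chosen for illustration; this is not the gem's actual benchmark task. The C version is left as a comment because it needs the compiled extension.

```ruby
require "benchmark"

# Pure-Ruby sieve, repeated here so the sketch is self-contained.
def sieve(n)
  numbers = (0..n).to_a
  numbers[0] = numbers[1] = nil
  numbers.each do |num|
    next unless num
    break if num**2 > n
    (num**2).step(n, num) { |idx| numbers[idx] = nil }
  end
  numbers.compact
end

# Hypothetical helper: times the sieve for 0, step, 2 * step, ... up to max.
def benchmark_sieve(max: 1_000_000, step: 100_000)
  Benchmark.bm(14) do |bm|
    bm.report("sieve method") { 0.step(max, step) { |n| sieve(n) } }
    # With the compiled extension loaded, the C version would be added as:
    # bm.report("Numeric#sieve") { 0.step(max, step) { |n| n.sieve } }
  end
end
```

Calling benchmark_sieve with its defaults covers the zero-to-one-million range described above; smaller values such as benchmark_sieve(max: 1_000, step: 100) are handy for a quick check.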
On Ruby 1.8.7, here are the results from my benchmark:
                    user     system      total        real
sieve method    4.460000   0.060000   4.520000 (  4.522069)
Numeric#sieve   0.040000   0.000000   0.040000 (  0.046349)
Ruby 1.9.2 is significantly faster, but still doesn’t hold a candle to the C extension:
                    user     system      total        real
sieve method    2.410000   0.060000   2.470000 (  2.468430)
Numeric#sieve   0.050000   0.000000   0.050000 (  0.049053)
What I Learned
Writing C is fun, and it can improve the performance of number-crunching and other fun things. It has its place and is a great addition to any Rubyist’s toolbox. Have you written any C extensions purely for performance gains? If you’re open to sharing the context, I’d love to hear about it!