My Adventure With Async Ruby

Matheus Richard

I was working on an app that generated a Markdown article. The article content had some dynamic parts that were fetched via HTTP requests. While not a huge problem, this made the article generation slow.

Ruby 3.0 introduced the fiber scheduler interface, which is used by the async gem to run tasks concurrently. It’s particularly useful for I/O-bound workloads, so I decided to give it a try. This post is a summary of my journey in figuring out how to use it.

If you don’t care about the journey, skip to the “Thoughts on the Async gem” section.

The problem

The article generation code looked like this (I’m using sleep to simulate the time the HTTP requests take):

class Article
  def to_s
    <<~MARKDOWN
      # #{generate_title}

      #{generate_content}
    MARKDOWN
  end

  def generate_title
    sleep 2

    "A title"
  end

  def generate_content
    5.times.map { |i|
      generate_paragraph(i)
    }.join("\n")
  end

  private

  def generate_paragraph(i)
    sleep 1

    "Paragraph #{i}"
  end
end

t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
Article.new.to_s
t1 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
puts "Time: #{t1 - t0} seconds."

This takes about 7 seconds to run (1 second for each of the 5 paragraphs plus 2 seconds for the title).

The journey

After installing the async gem, the first thing I did was wrap the whole code in an Async block as all the examples did.

require "async"

class Article
  def to_s
    Async do
      <<~MARKDOWN
        # #{generate_title}

        #{generate_content}
      MARKDOWN
    end
  end
  # ...
end

When I re-ran the code, it still took seven seconds, and now, instead of the article body, I got back an Async::Task object.

If I want the result, I need to call #wait on the task.

def to_s
  Async do
    <<~MARKDOWN
      # #{generate_title}

      #{generate_content}
    MARKDOWN
  end.wait # <--- this
end

From the docs, it looks like I can replace this Async { }.wait pattern with Sync { }:

def to_s
  Sync do
    <<~MARKDOWN
      # #{generate_title}

      #{generate_content}
    MARKDOWN
  end
end

Nothing is running asynchronously yet, so let’s try starting with the paragraphs:

  def generate_content
    5.times.map { |i|
      Async { generate_paragraph(i) }
    }.join("\n")
  end

This makes each iteration async, and the code runs in 3 seconds. Again, though, we don’t get the paragraph values back, just ‘tasks’. Let’s add wait again:

  def generate_content
    5.times.map { |i|
      Async { generate_paragraph(i) }.wait
    }.join("\n")
  end

Waiting on each async does get the value back, but now everything is running synchronously again, i.e., in 7 seconds. What?!

The “How The Heck Do I Make This Work?” Section

I tried to wrap the whole thing in an Async + wait block with internal async tasks, but it also didn’t work.

  def generate_content
    Async do
      5.times.map { |i|
        Async { generate_paragraph(i) }
      }.join("\n")
    end.wait
  end

Ok, maybe the problem is using #join right after creating the tasks, which wouldn’t give them time to finish. Against my will, I iteratively built a list:

  def generate_content
    paragraphs = []

    Sync do
      5.times do |i|
        Async do
          paragraphs << generate_paragraph(i)
        end
      end
    end

    paragraphs.join("\n")
  end

I was surprised this didn’t work. For some reason, the paragraphs are empty! I thought the Sync block would wait for the internal Async blocks to finish, but it didn’t.

A Solution

After fighting with this for a while, reading the docs and the source code, I finally got it. I had to wait for the tasks after creating all of them, not right after creating one of them.

  def generate_content
    5.times.map { |i|
      Async do
        generate_paragraph(i)
      end
    }
    .map(&:wait) # <--- wait after creating all tasks
    .join("\n")
  end

It works! This takes 3 seconds to run, as expected (only the paragraph generation is async for now). Bonus points: #map kept working!

Here’s a visual representation of the difference between the two approaches:

A diagram displaying how waiting after each task is created leads to a total time greater than waiting after all the tasks are created.

We still have to change Article#to_s to generate the title and the content concurrently:

  def to_s
    Sync do
      # We cannot use `Sync` here, because the first task would block the following one
      title_task = Async { generate_title }
      content_task = Async { generate_content }

      title = title_task.wait
      content = content_task.wait

      <<~MARKDOWN
        # #{title}

        #{content}
      MARKDOWN
    end
  end

This is fully concurrent now. It takes only 2 seconds to run, a 3.5x speedup! It’s a bit tedious having to call #wait, but it’s not a big deal. Here’s the full diagram of the execution:

A diagram displaying the program execution flow. It shows how the title and content creation run in parallel and how the five paragraphs are also created at the same time.

Thoughts on the Async gem

That was an interesting experience for me. I’ve seldom used Threads in Ruby because they feel easy to mess up, so I was curious to see how Async would work. Here are a few thoughts on it:

The good

I didn’t have to change my code much to make it async.

Any other code using the Article class would work as before, without knowing it’s asynchronous. Is this what people mean by it having no function colors?

It’s very scalable.

Given that you have an I/O-bound problem, you can easily add more tasks to run concurrently and they’re lightweight (orders of magnitude lighter than Threads).

It is the “official” gem for this kind of problem.

It looks like Matz himself invited the gem to core Ruby, but I couldn’t find where or when this happened. Samuel Williams, the author, is a Ruby core contributor and merged the fiber scheduler interface into Ruby 3.0.

The bad

The docs are… lacking

Documentation and examples are scarce. The guides are brief, and this blog post was one of the few examples I could find.

Sometimes I needed to dig into the code to understand how to use it, which is not an unusual thing to do, but it’s not ideal for simple use cases. I often had questions I couldn’t find answers to in the docs, like:

  • Why use Async { }.wait vs Sync { }? The documentation says they’re “very similar”, which leaves me wondering where they would differ.
  • What’s the difference (if any) between nested Async blocks and using Async { |task| task.async { ... } }?
  • Why use Async::Barrier vs multiple waits?

There’s indeed a call for better docs in the repo.

It’s not compatible with every library

I’m lucky my use case was covered, but if this were a Rails app, for instance, it wouldn’t be possible to use Async to run queries in parallel. It does work with Sequel, though. It’s not a problem with the gem itself, but it’s something to keep in mind.

Wrapping up

All in all, this was a fun experiment and I did get a good speedup. I think this ecosystem is promising and I’m looking forward to seeing more libraries supporting it. The biggest “problem” I had was the lack of documentation, but this is something that we, as a community, can help with.