---
title: Pipelining without pipes
teaser: 'Ruby, functional programming, how to build operation pipelines, and the average
  programming language color.

  '
tags: ruby,functional programming,oop,refactoring
author: Matheus Richard
published_on: 2022-03-11
---

GitHub has a library to syntax highlight code snippets called [Linguist]. It has
an [extensive list] of languages and their characteristics (e.g., name, file
extensions, color, etc.). I had some free time, so I came up with this silly
question: _what would be the average programming language color?_

<aside class="info">
I'm talking about the color used in the "Languages" section of a repository:

  <img
    src="https://images.thoughtbot.com/blog-vellum-image-uploads/FFCyKPnsR62LjpxgnkUG_github-languages-section.png"
    alt="The Language section of a GitHub repository. There's a progress bar 78.6% covered in red representing Ruby, 20.8% light green representing Shell, and other small colors and percentages for the rest.">
</aside>

## Hacking it

We can get this done with a few lines of Ruby:

```ruby
require "net/http"
require "yaml"

# Some helper functions

def fetch_url(url)
  Net::HTTP.get(URI(url))
end

def parse_yaml(yaml_string)
  YAML.safe_load(yaml_string)
end

def hex_color_to_rgb(color)
  color.delete("#").scan(/../).map(&:hex)
end

def rgb_color_to_hex(color)
  color.map { |channel| channel.to_s(16).rjust(2, "0") }.join
end

GITHUB_LANGS_URL = "https://raw.githubusercontent.com/github/linguist/6b02d3bd769d07d1fdc661ba9e37aad0fd70e2ff/lib/linguist/languages.yml"

# Fetch and parse the language list YAML
langs_yaml = fetch_url(GITHUB_LANGS_URL)
langs = parse_yaml(langs_yaml)

# Calculate the average color of the programming languages
red_sum = 0
green_sum = 0
blue_sum = 0
color_count = 0

langs.each do |_name, details|
  next if details["type"] != "programming" # Skip "non-programming" languages
  next if details["color"].nil? # Skip languages without a color

  rgb = hex_color_to_rgb(details["color"])
  red_sum += rgb[0] ** 2
  green_sum += rgb[1] ** 2
  blue_sum += rgb[2] ** 2
  color_count += 1
end

average_red = Math.sqrt(red_sum / color_count).to_i
average_green = Math.sqrt(green_sum / color_count).to_i
average_blue = Math.sqrt(blue_sum / color_count).to_i

average_color = "##{rgb_color_to_hex([average_red, average_green, average_blue])}"

puts average_color
```

<aside class="info">
  While writing this article, I discovered at least two different approaches to averaging colors:

  <ol>
    <li>Sum individual color channels together and then divide it by the number of colors.</li>
    <li>
      <a href="https://scribe.rip/how-to-average-rgb-colors-together-6cd3ef1ff1e5">Sum the square</a> of each color
      channel, divide by the number of colors and take the square root of it.
    </li>
  </ol>

  I used the second method here because the resulting color looks nicer. For simplicity, I also left
  out details, like error handling.
</aside>
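To make the difference between the two approaches concrete, here's a minimal sketch that averages
black and white both ways. The helper names `arithmetic_mean` and `root_mean_square` are made up for
this illustration; they aren't part of the script above.

```ruby
# Approach 1: plain average of each channel (integer division, like the script above).
def arithmetic_mean(colors)
  (0..2).map { |i| colors.sum { |color| color[i] } / colors.size }
end

# Approach 2: average the squares of each channel, then take the square root.
def root_mean_square(colors)
  (0..2).map do |i|
    Math.sqrt(colors.sum { |color| color[i]**2 } / colors.size).to_i
  end
end

black = [0, 0, 0]
white = [255, 255, 255]

p arithmetic_mean([black, white])  # => [127, 127, 127]
p root_mean_square([black, white]) # => [180, 180, 180]
```

The second approach gives a noticeably lighter gray for the same two inputs.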

There you go! It prints out the average color. We can close our laptops and call it a day. There are
[few reasons](https://thoughtbot.com/blog/storytellers) to improve code that we won't touch again.

## I'm smelling something

Well, for a throwaway project, that solution is indeed enough. **Sometimes we have fun coding, and
that's it**, no worries. In this particular case, though, I thought it would be an excellent
exercise to practice functional programming. So, I decided to refactor it.

Some things bothered me in the original code. That `each` block does a lot of stuff, and, as
generally happens with things with lots of responsibilities, it doesn't do them well:

1. The logic to calculate the average color is split into different parts of the
  code. Some of it is inside the `each` block, and some of it is outside;
1. The code is fragile:
    - Updating `color_count` [**_has_**][] to happen after the `next if ...` calls;
    - It's easy to miss why `color_count` is necessary at all and, instead, use `langs.size` to
    calculate the average color, which would give us the wrong result.
1. The code is very procedural, and it feels weird in Ruby.
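
To illustrate the `langs.size` pitfall, here's a small sketch using a made-up hash in the same shape
as the parsed YAML (the entries are sample data, not the real file):

```ruby
# Illustrative sample: one eligible language, one non-programming language,
# and one programming language without a color.
langs = {
  "Ruby" => { "type" => "programming", "color" => "#701516" },
  "YAML" => { "type" => "data", "color" => "#cb171e" },
  "NoColorLang" => { "type" => "programming" }
}

eligible = langs.count do |_name, details|
  details["type"] == "programming" && details["color"]
end

puts langs.size # => 3
puts eligible   # => 1
```

Dividing by `langs.size` would count entries that never contributed to the sums.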

It seems like having `color_count` and the color sums as separate variables is causing some pain, so
we could change those variables to be a single array of colors and calculate the mean later.
Iteratively building a collection [is an anti-pattern], but it does shine a light on a direction we
can follow.

## Data transformation 🤝 Functional programming

Functional programming teaches us to think in terms of data transformation. Each function takes data
and returns it in a new form. We can compose several functions together and form a pipeline.
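
As a tiny sketch of that idea, here are three made-up lambdas composed with `Proc#>>`. Note that
this composes function objects into a new function, which is a bit different from piping a value
through named methods:

```ruby
# Three small transformations (hypothetical names, for illustration only).
strip_hash = ->(hex) { hex.delete("#") }
split_channels = ->(hex) { hex.scan(/../) }
to_integers = ->(pairs) { pairs.map(&:hex) }

# Proc#>> composes left to right: the output of one step feeds the next.
hex_to_rgb = strip_hash >> split_channels >> to_integers

p hex_to_rgb.call("#ff8000") # => [255, 128, 0]
```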

Let's walk our code and try to convert it into a pipeline. We can keep the
imports and helper functions, so let's skip to this part:

```ruby
# Fetch and parse the language list YAML
langs_yaml = fetch_url(GITHUB_LANGS_URL)
langs = parse_yaml(langs_yaml)
```

In a functional programming language like Elixir, this could be written as:

```elixir
GITHUB_LANGS_URL
|> fetch_url()
|> parse_yaml()
```

We hit our first roadblock: we have no pipe operator in Ruby! It is a common
feature of functional programming languages that passes the result of one
expression as an argument to the next function call.

<aside class="info">
For a while, Ruby <strong>had</strong> a pipeline operator, but <a href="https://bugs.ruby-lang.org/issues/15799#note-46">it was removed</a> since the way it
worked caused some controversy.
</aside>

So how can we do this in Ruby? We could write it as `parse_yaml(fetch_url(GITHUB_LANGS_URL))`, but
keeping this pattern leads to quite unreadable code. Ruby is an object-oriented language, so we have
to think in terms of objects and messages (i.e., methods).

We need something that passes the receiver to a given function, or, in other words, that _yields self
to a block_. Luckily, Ruby has a method that does exactly that: [`yield_self`], or its
nicer-sounding alias [`then`]. Here's how that code would look:

```ruby
GITHUB_LANGS_URL
  .then { |url| fetch_url(url) }
  .then { |languages_yaml| parse_yaml(languages_yaml) }
```
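
If you'd like to see `then` in isolation, here's a minimal sketch that doesn't touch the network.
The `double` and `describe` helpers are made up for this example:

```ruby
def double(number)
  number * 2
end

def describe(number)
  "The result is #{number}"
end

# `then` yields the receiver to the block and returns the block's result,
# so each step receives the output of the previous one.
message = 21
  .then { |number| double(number) }
  .then { |number| describe(number) }

puts message # => The result is 42
```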

Using Ruby's [numbered parameters], we can avoid having to name the block arguments:

```ruby
GITHUB_LANGS_URL
  .then { fetch_url _1 }
  .then { parse_yaml _1 }
```

Cool, that is pretty close to the Elixir code. Now, we have to transform that big `each` block into a
pipeline. In essence, that part of the code filters out non-programming languages and languages
without a color, then calculates the average color. Let's split those two parts into separate steps.
`parse_yaml` returns a hash, so we can use [`Enumerable#filter`] to select the languages we want.

```ruby
  # ...
  .then { parse_yaml _1 }
  .filter { |_lang_name, lang_details|
    lang_details["type"] == "programming" && lang_details["color"]
  }
```

Then, we get the colors of each language and convert them to RGB:

```ruby
  # ...
  .filter { |_lang_name, lang_details|
    lang_details["type"] == "programming" && lang_details["color"]
  }
  .map { |_lang_name, lang_details|
    hex_color_to_rgb(lang_details["color"])
  }
```

This code works, but alas, it iterates over the languages twice (first time on `filter` and the
other on `map`). We could use [`Enumerable#reduce`] to do this in a single pass, but that would be a
bit lengthy (and many folks don't know `Enumerable#reduce`). Again, Ruby has our back and provides a
[`Enumerable#filter_map`]. It calls the given block on each element of the enumerable and returns an
array containing the truthy values returned by the block. We can merge those two steps into one:

```ruby
  .filter_map { |_lang_name, lang_details|
    next if lang_details["type"] != "programming"
    next if lang_details["color"].nil?

    hex_color_to_rgb(lang_details["color"])
  }
```

<aside class="info">
I split the filter condition into two steps because I think it's easier to read. Also note that the
<code>if</code> conditions are now inverted.
</aside>
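
Here's a small sketch comparing both versions on made-up data in the same shape as the parsed YAML.
For brevity it keeps the hex strings instead of converting them to RGB:

```ruby
# Illustrative sample data, not the real languages.yml.
langs = {
  "Ruby" => { "type" => "programming", "color" => "#701516" },
  "YAML" => { "type" => "data", "color" => "#cb171e" },
  "NoColorLang" => { "type" => "programming" }
}

# Two passes: select the entries first, then extract the color.
two_passes = langs
  .filter { |_name, details| details["type"] == "programming" && details["color"] }
  .map { |_name, details| details["color"] }

# One pass: a nil (or false) block result drops the element.
one_pass = langs.filter_map do |_name, details|
  next if details["type"] != "programming"

  details["color"] # nil for NoColorLang, so it is dropped too
end

p two_passes # => ["#701516"]
p one_pass   # => ["#701516"]
```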

Now we have an array of colors, with each color as an array of red, green, and blue values. We need
to sum all red values together, then all green values, and all blue values. Let's reshape our data
representation to group values by color channel, so this will be easier:

```ruby
  .filter_map {
    # ...
  }
  .transpose
```
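
To see what `transpose` does to the data, here's a quick sketch with two sample colors (the values
are only illustrative):

```ruby
colors = [
  [112, 21, 22], # one [red, green, blue] triple per color
  [203, 23, 30]
]

# After transposing, each inner array holds all values of a single channel.
p colors.transpose
# => [[112, 203], [21, 23], [22, 30]]
```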

The pipeline is coming together, but we still have work to do. Calculating the average color now is
fairly simple using [`Enumerable#sum`] (can we get `Enumerable#mean`, tho? 😅):

```ruby
  .transpose
  .map { |channel_values|
    squared_average = channel_values.sum { |value| value ** 2 } / channel_values.size

    Math.sqrt(squared_average).to_i
  }
```

<aside class="warn">
  <strong>Readability, performance and balance</strong>

  <p>
    Those with sharp eyes will notice that we're still iterating over the values multiple times
    (<code>sum</code>, plus the calls to <code>filter_map</code> and <code>transpose</code>). Again,
    using <code>Enumerable#reduce</code> would be an option for a single-pass solution, but a single
    pass isn't a hard requirement for this exercise (the whole pipeline is still <code>O(n)</code>).
  </p>

  Also, the body of that <code>reduce</code> call could be hard to grasp, so I decided to sacrifice
  a bit of performance to ease reading/teaching. As developers, we constantly have to balance
  readability, performance, and maintainability.
</aside>
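
For the curious, here's a rough sketch of what a single-pass `reduce` could look like. It's one
possible shape, not the version used in this article, and it runs on two hard-coded sample colors:

```ruby
colors = [[0, 0, 0], [255, 255, 255]]

# One pass over the colors: accumulate the squared sums per channel
# together with the number of colors seen so far.
sums, count = colors.reduce([[0, 0, 0], 0]) do |(totals, seen), (red, green, blue)|
  [[totals[0] + red**2, totals[1] + green**2, totals[2] + blue**2], seen + 1]
end

# This last step only touches the three channel totals, not the original data.
average = sums.map { |total| Math.sqrt(total / count).to_i }

p average # => [180, 180, 180]
```

The nested destructuring in the block parameters is exactly the kind of thing that makes this
version harder to read.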

Lastly, we convert the color, represented as a 3-element array, to a hex string and print it. Here's the full solution:

```ruby
require "net/http"
require "yaml"

def fetch_url(url)
  Net::HTTP.get(URI(url))
end

def parse_yaml(yaml_string)
  YAML.safe_load(yaml_string)
end

def hex_color_to_rgb(color)
  color.delete("#").scan(/../).map(&:hex)
end

def rgb_color_to_hex(color)
  color.map { |channel| channel.to_s(16).rjust(2, "0") }.join
end

GITHUB_LANGS_URL = "https://raw.githubusercontent.com/github/linguist/6b02d3bd769d07d1fdc661ba9e37aad0fd70e2ff/lib/linguist/languages.yml"

GITHUB_LANGS_URL
  .then { fetch_url _1 }
  .then { parse_yaml _1 }
  .filter_map { |_lang_name, lang_details|
    next if lang_details["type"] != "programming"
    next if lang_details["color"].nil?

    hex_color_to_rgb(lang_details["color"])
  }
  .transpose
  .map { |channel_values|
    squared_average = channel_values.sum { |value| value ** 2 } / channel_values.size

    Math.sqrt(squared_average).to_i
  }
  .then { |average_color| puts "##{rgb_color_to_hex(average_color)}" }
```

One of the neat things about that pipeline is that we can extract any part of it into a separate
method, and it will still be chainable.
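
As a sketch of that idea, here's one way the averaging and formatting steps could be extracted. The
method names `channel_average`, `average_color`, and `to_hex` are hypothetical, and the input is a
pair of hard-coded sample colors so the example runs offline:

```ruby
# Hypothetical extraction of the per-channel math.
def channel_average(values)
  Math.sqrt(values.sum { |value| value**2 } / values.size).to_i
end

# Hypothetical extraction of the `transpose` + `map` steps.
def average_color(colors)
  colors.transpose.map { |channel_values| channel_average(channel_values) }
end

# Hypothetical extraction of the final formatting step.
def to_hex(color)
  "##{color.map { |channel| channel.to_s(16).rjust(2, "0") }.join}"
end

hex = [[0, 0, 0], [255, 255, 255]]
  .then { average_color _1 }
  .then { to_hex _1 }

puts hex # => #b4b4b4
```

Each extracted method takes data and returns data, so it slots into the chain with `then` just like
the helpers we started with.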

## Takeaways

Ruby is an <abbr title="Object-Oriented Programming">OOP</abbr> language, so thinking about objects
and methods is the natural way of programming. Whenever you can, use methods (like those on the
[`Enumerable`] module), or [create objects] that provide the ones you need.

Ruby also has good support for functional programming, and we can take advantage of that,
particularly when doing data transformation. Mixing
<abbr title="Object-Oriented Programming">OOP</abbr> and <abbr title="Functional Programming">FP</abbr>
[is not a sin](https://thoughtbot.com/blog/you-could-invent-objectoriented-programming), and Ruby has great features to support it.

Moreover, remember that it's okay to start with a simple solution and improve it later. That's the
natural flow when [doing TDD].

## Hey, wait! You forgot something!

What? Oh, the color! Here it is:

![A shade of puce or grayish mauve picked by a color selector containing a range of color shades and values](https://images.thoughtbot.com/blog-vellum-image-uploads/uoevh1ERTcapiwmkW2PK_average-programming-language-color.png)

[Linguist]: https://github.com/github/linguist
[extensive list]: https://raw.githubusercontent.com/github/linguist/6b02d3bd769d07d1fdc661ba9e37aad0fd70e2ff/lib/linguist/languages.yml
[**_has_**]: https://connascence.io/position.html
[is an anti-pattern]: https://thoughtbot.com/blog/iteration-as-an-anti-pattern
[`yield_self`]: https://thoughtbot.com/blog/using-yieldself-for-composable-activerecord-relations
[`then`]: https://docs.ruby-lang.org/en/3.1/Kernel.html#method-i-yield_self
[numbered parameters]: https://docs.ruby-lang.org/en/3.1/Proc.html#class-Proc-label-Numbered+parameters
[`Enumerable#filter`]: https://docs.ruby-lang.org/en/3.1/Enumerable.html#method-i-filter
[`Enumerable#reduce`]: https://docs.ruby-lang.org/en/3.1/Enumerable.html#method-i-reduce
[`Enumerable#filter_map`]: https://docs.ruby-lang.org/en/3.1/Enumerable.html#method-i-filter_map
[`Enumerable#sum`]: https://docs.ruby-lang.org/en/3.1/Enumerable.html#method-i-sum
[`Enumerable`]: https://docs.ruby-lang.org/en/3.1/Enumerable.html
[create objects]: https://thoughtbot.com/upcase/videos/extract-class
[doing TDD]: https://thoughtbot.com/upcase/videos/red-green-refactor-by-example
