GitHub has a library called Linguist that it uses to syntax highlight code snippets. It has an extensive list of languages and their characteristics (e.g., name, file extensions, color). I had some free time, so I came up with this silly question: what would be the average programming language color?
Hacking it
We can get this done with a few lines of Ruby:
require "net/http"
require "yaml"

# Some helper functions
def fetch_url(url)
  Net::HTTP.get(URI(url))
end

def parse_yaml(yaml_string)
  YAML.safe_load(yaml_string)
end

def hex_color_to_rgb(color)
  color.delete("#").scan(/../).map(&:hex)
end

def rgb_color_to_hex(color)
  color.map { |channel| channel.to_s(16).rjust(2, "0") }.join
end
GITHUB_LANGS_URL = "https://raw.githubusercontent.com/github/linguist/6b02d3bd769d07d1fdc661ba9e37aad0fd70e2ff/lib/linguist/languages.yml"
# Fetch and parse the language list YAML
langs_yaml = fetch_url(GITHUB_LANGS_URL)
langs = parse_yaml(langs_yaml)
# Calculate the average color of the programming languages
red_sum = 0
green_sum = 0
blue_sum = 0
color_count = 0
langs.each do |_name, details|
  next if details["type"] != "programming" # Skip "non-programming" languages
  next if details["color"].nil? # Skip languages without a color

  rgb = hex_color_to_rgb(details["color"])
  red_sum += rgb[0] ** 2
  green_sum += rgb[1] ** 2
  blue_sum += rgb[2] ** 2
  color_count += 1
end

average_red = Math.sqrt(red_sum / color_count).to_i
average_green = Math.sqrt(green_sum / color_count).to_i
average_blue = Math.sqrt(blue_sum / color_count).to_i
average_color = "##{rgb_color_to_hex([average_red, average_green, average_blue])}"
puts average_color
There you go! It prints out the average color. We can close our laptops and call it a day. There are few reasons to improve code that we won’t touch again.
I’m smelling something
Well, for a throwaway project, that solution is indeed enough. Sometimes we have fun coding, and that’s it, no worries. In this particular case, though, I thought it would be an excellent exercise to practice functional programming. So, I decided to refactor it.
Some things bothered me in the original code. That each block does a lot of stuff, and, as generally happens with things with lots of responsibilities, it doesn’t do them well:
- The logic to calculate the average color is split into different parts of the code. Some of it is inside the each block, and some of it is outside;
- The code is fragile:
  - Updating color_count has to happen after the next if ... calls;
  - It’s easy to miss why color_count is necessary at all and, instead, use langs.size to calculate the average color, which would give us the wrong result.
- The code is very procedural, and it feels weird in Ruby.
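To make the langs.size pitfall from the list above concrete, here is a minimal sketch on a tiny, made-up language list. The SAMPLE_LANGS hash, its names, and its colors are assumptions for illustration only; they are not taken from Linguist.

```ruby
# Hypothetical sample shaped like the parsed languages.yml entries.
SAMPLE_LANGS = {
  "LangA" => { "type" => "programming", "color" => "#ff0000" },
  "LangB" => { "type" => "prose",       "color" => "#00ff00" },
  "LangC" => { "type" => "programming" } # no color defined
}

# Only programming languages that define a color contribute to the sums,
# so only those should be counted when dividing.
counted = SAMPLE_LANGS.count do |_name, details|
  details["type"] == "programming" && details["color"]
end

puts counted           # 1
puts SAMPLE_LANGS.size # 3
```

Dividing the sums by 3 instead of 1 here would pull the average toward black, because the skipped entries would count as if they had contributed zeros.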
It seems like having color_count and the color sums as separate variables is causing some pain, so we could change those variables to be a single array of colors and calculate the mean later. Iteratively building a collection is an anti-pattern, but it does shine a light on a direction we can follow.
Data transformation 🤝 Functional programming
Functional programming teaches us to think in terms of data transformation. Each function takes data and returns it in a new form. We can compose several functions together and form a pipeline.
Let’s walk through our code and try to convert it into a pipeline. We can keep the imports and helper functions, so let’s skip to this part:
# Fetch and parse the language list YAML
langs_yaml = fetch_url(GITHUB_LANGS_URL)
langs = parse_yaml(langs_yaml)
In a functional programming language like Elixir, this could be written as:
GITHUB_LANGS_URL
|> fetch_url()
|> parse_yaml()
We hit our first roadblock: we have no pipe operator in Ruby! It is a common feature of functional programming languages that passes the result of one expression as an argument to the next function call (in Elixir, as the first argument).
So how can we do this in Ruby? We could write it as parse_yaml(fetch_url(GITHUB_LANGS_URL)), but keeping this pattern leads to quite unreadable code. Ruby is an object-oriented language, so we have to think in terms of objects and messages (i.e., methods).
We need something that passes the caller to a given function, or, in other words, that yields self to a block. Luckily, Ruby has a method that does exactly that: yield_self, or its nicer-sounding alias then. Here’s how that code would look:
GITHUB_LANGS_URL
.then { |url| fetch_url(url) }
.then { |languages_yaml| parse_yaml(languages_yaml) }
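As a self-contained illustration of the same idea, here is a small sketch that compares nested calls with a then chain. The double and describe helpers are made up for this example and simply stand in for fetch_url and parse_yaml.

```ruby
# Hypothetical helpers standing in for fetch_url and parse_yaml.
def double(number)
  number * 2
end

def describe(number)
  "The result is #{number}"
end

# Nested calls have to be read inside-out...
nested = describe(double(21))

# ...while a then chain reads top to bottom, like a pipeline.
piped = 21
  .then { |number| double(number) }
  .then { |number| describe(number) }

puts piped # The result is 42
```

Both forms compute the same value; the chain only changes the reading order.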
Using Ruby’s numbered parameters (available since Ruby 2.7), we can avoid having to name the block arguments:
GITHUB_LANGS_URL
.then { fetch_url _1 }
.then { parse_yaml _1 }
Cool, that is pretty close to the Elixir code. Now, we have to transform that big each block into a pipeline. In essence, that part of the code filters out non-programming languages and languages without a color, then calculates the average color. Let’s split those two parts into separate steps. parse_yaml returns a hash, so we can use Hash#filter to select the languages we want.
# ...
.then { parse_yaml _1 }
.filter { |_lang_name, lang_details|
  lang_details["type"] == "programming" && lang_details["color"]
}
Then, we get the colors of each language and convert them to RGB:
# ...
.filter { |_lang_name, lang_details|
  lang_details["type"] == "programming" && lang_details["color"]
}
.map { |_lang_name, lang_details|
  hex_color_to_rgb(lang_details["color"])
}
This code works, but alas, it iterates over the languages twice (once in filter and again in map). We could use Enumerable#reduce to do this in a single pass, but that would be a bit lengthy (and many folks don’t know Enumerable#reduce). Again, Ruby has our back and provides Enumerable#filter_map. It calls the given block on each element of the enumerable and returns an array containing the truthy values returned by the block. We can merge those two steps into one:
.filter_map { |_lang_name, lang_details|
  next if lang_details["type"] != "programming"
  next if lang_details["color"].nil?

  hex_color_to_rgb(lang_details["color"])
}
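To see filter_map in isolation, here is a minimal sketch on made-up data. The sample_langs hash is an assumption for illustration and only mimics the shape of the parsed YAML.

```ruby
# Hypothetical sample data; names and colors are made up.
sample_langs = {
  "LangA" => { "type" => "programming", "color" => "#ff0000" },
  "LangB" => { "type" => "data",        "color" => "#00ff00" },
  "LangC" => { "type" => "programming" }
}

colors = sample_langs.filter_map do |_name, details|
  next if details["type"] != "programming" # next yields nil, so the entry is dropped
  next if details["color"].nil?

  details["color"] # truthy, so it ends up in the result
end

p colors # ["#ff0000"]
```

Because a bare next makes the block return nil, the guard clauses from the original each block carry over unchanged.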
Now we have an array of colors, with each color as an array of red, green, and blue values. We need to sum all red values together, then all green values, and all blue values. Let’s reshape our data representation to group values by color channel, so this will be easier:
.filter_map {
# ...
}
.transpose
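A quick sketch of what transpose does to the data at this point, using two made-up RGB triplets instead of real language colors:

```ruby
# Each inner array is one color: [red, green, blue]. Values are made up.
colors = [
  [10, 20, 30],
  [40, 50, 60]
]

# After transposing, each inner array holds one channel across all colors.
channels = colors.transpose

p channels # [[10, 40], [20, 50], [30, 60]]
```

Rows become columns, so the first inner array now contains every red value, the second every green value, and the third every blue value.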
The pipeline is coming together, but we still have work to do. Calculating the average color now is fairly simple using Enumerable#sum (can we get Enumerable#mean, tho? 😅):
.transpose
.map { |channel_values|
  squared_average = channel_values.sum { |value| value ** 2 } / channel_values.size
  Math.sqrt(squared_average).to_i
}
Lastly, we convert the color, represented as a 3-element array, to a hex string and print it. Here’s the full solution:
require "net/http"
require "yaml"

def fetch_url(url)
  Net::HTTP.get(URI(url))
end

def parse_yaml(yaml_string)
  YAML.safe_load(yaml_string)
end

def hex_color_to_rgb(color)
  color.delete("#").scan(/../).map(&:hex)
end

def rgb_color_to_hex(color)
  color.map { |channel| channel.to_s(16).rjust(2, "0") }.join
end

GITHUB_LANGS_URL = "https://raw.githubusercontent.com/github/linguist/6b02d3bd769d07d1fdc661ba9e37aad0fd70e2ff/lib/linguist/languages.yml"

GITHUB_LANGS_URL
  .then { fetch_url _1 }
  .then { parse_yaml _1 }
  .filter_map { |_lang_name, lang_details|
    # Keep only programming languages that define a color
    next if lang_details["type"] != "programming"
    next if lang_details["color"].nil?

    hex_color_to_rgb(lang_details["color"])
  }
  .transpose # Group the values by color channel
  .map { |channel_values|
    squared_average = channel_values.sum { |value| value ** 2 } / channel_values.size
    Math.sqrt(squared_average).to_i
  }
  .then { |average_color| puts "##{rgb_color_to_hex(average_color)}" }
One of the neat things about that pipeline is that we can extract any part of it into a separate method, and it will still be chainable.
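As a rough sketch of that idea, the averaging steps could move into their own method and be plugged back in with then. The average_color method name and the sample_colors input are made up for this example; the input stands in for the output of the filter_map step.

```ruby
# Hypothetical extraction of the transpose and map steps from the pipeline.
def average_color(colors)
  colors.transpose.map do |channel_values|
    squared_average = channel_values.sum { |value| value ** 2 } / channel_values.size
    Math.sqrt(squared_average).to_i
  end
end

# Made-up input: black and white as RGB triplets.
sample_colors = [[0, 0, 0], [255, 255, 255]]

result = sample_colors
  .then { average_color _1 }
  .then { |rgb| "##{rgb.map { |channel| channel.to_s(16).rjust(2, "0") }.join}" }

puts result # #b4b4b4
```

The extracted method takes the data in and returns the transformed data, so it slots into the chain like any other step.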
Takeaways
Ruby is an OOP language, so thinking about objects and methods is the natural way of programming. Whenever you can, use methods (like those on the Enumerable module), or create objects that provide the ones you need.
Ruby also has good support for functional programming, and we can take advantage of that, particularly when doing data transformation. Mixing OOP and FP is not a sin, and Ruby has great features to support it.
Moreover, remember that it’s okay to start with a simple solution and improve it later. That’s the natural flow when doing TDD.
Hey, wait! You forgot something!
What? Oh, the color! Here it is: