---
title: Ruby HTML Sanitization with Loofah
teaser: Need to do some HTML sanitization, but Rails ActionView Sanitize Helpers are
  not good enough? Read this post to learn how to do it with Loofah.
tags: ruby,web,development
author: Stefanni Brasil
published_on: 2023-11-24
---

_This post was originally published on the [hexdevs blog](https://www.hexdevs.com/posts/sanitize-html-ruby-loofah/)._

----

As a Rails developer, when you want to sanitize some user's HTML input, you just write `<%= sanitize some_user_provided_string %>` and call it a day.

But sometimes this helper isn't what you need. You may have to sanitize user-provided HTML outside of a Rails view or template: for example, when the unsafe HTML comes from an integration and you need to clean it before storing it.

That's when the Loofah gem shines. It is a Ruby library for HTML/XML transformation and sanitization, built on top of Nokogiri. Fun fact: you can provide a [custom Loofah scrubber](https://github.com/rails/rails-html-sanitizer/blob/main/README.md?plain=1#L149) to the Rails sanitize method 💡

## Ruby HTML Sanitization with the Loofah gem

Loofah sanitizes HTML with objects it calls "scrubbers". It ships with several built-in scrubbers that do a lot of the work for you:

```ruby
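require "loofah"

input = "<p>Hello <script>alert('XSS')</script>world</p>"

Loofah.html5_document(input).scrub!(:strip) # removes unknown/unsafe tags but keeps their contents
Loofah.html5_document(input).scrub!(:prune) # removes unknown/unsafe tags together with their children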
```

Cool, right? Even cooler: you can write your own scrubbers from scratch, or combine Loofah's built-in scrubbers with your own.

## Combine built-in and custom Loofah Scrubbers

Time to scrub some HTML clean. `Html::Sanitizer` combines built-in scrubbers with a custom one:

```ruby
require "loofah" # :targetblank requires loofah >= 2.22.0

module Html
  class Sanitizer
    class InvalidHTMLError < StandardError; end

    def self.clean(content)
      sanitized_html = Loofah.fragment(content)
                             .scrub!(:prune)
                             .scrub!(:noopener)
                             .scrub!(:nofollow)
                             .scrub!(:targetblank)
                             .scrub!(:unprintable)
                             .scrub!(CoolScrubber.new)
                             .to_s

      # Nothing survived scrubbing (or the input was empty): reject it
      raise InvalidHTMLError, "Invalid HTML received" if sanitized_html.empty?

      sanitized_html
    end
  end

  class CoolScrubber < Loofah::Scrubber
    def scrub(node)
      # custom HTML sanitization and transformation
      CONTINUE
    end
  end
end
```

Let's break it down.

### Loofah built-in scrubbers

First, we parse the HTML content with Loofah's `fragment` method:

```ruby
sanitized_html = Loofah.fragment(content)
```

With an HTML fragment in hand, we chain a mix of HTML transformation and sanitization scrubbers to clean the content:

```ruby
.scrub!(:prune) # => prunes unsafe tags and their subtrees, removing all traces that they ever existed
.scrub!(:noopener) # => adds rel="noopener" attribute to links
.scrub!(:nofollow) # => adds rel="nofollow" attribute to links
.scrub!(:targetblank) # => adds target="_blank" attribute to links
.scrub!(:unprintable) # => removes unprintable characters from text nodes
.scrub!(CoolScrubber.new).to_s # => custom scrubber, see next section
```

Lastly, we chain a custom scrubber to apply some business logic 👔

### Loofah Custom Scrubbers

We needed this custom scrubber to check for some HTML elements that we don't accept. Here is what it looks like:

```ruby
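class CoolScrubber < Loofah::Scrubber
  def scrub(node)
    # handle_not_allowed_nodes and handle_method_elements are
    # project-specific private helpers (not shown here)
    handle_not_allowed_nodes(node)
    handle_method_elements(node, %w[href src data srcset])
    # do whatever else you need beyond what Loofah gives you
  end
end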
```

And to make sure we are sanitizing our HTML as we expect, this class has an extensive test suite.

### Test Ruby HTML Sanitization with RSpec

When I was working on this feature, I followed OWASP's cheat sheets to write the tests. It was my first time doing this, so having a guide to verify the HTML cleaning was helpful. Here are some example tests:

```ruby
it "removes malicious CSS attributes while retaining safe ones, if safe" do
  html = "<p style=\"display: block; background-image:url('http://www.ragingplatypus.com/i/cam-full.jpg'); background-color: blue;\"></p>"

  result = Html::Sanitizer.clean(html)

  expect(result).to eq "<p style=\"display:block;background-color:blue;\"></p>"
end

it "raises an InvalidHTMLError error message, if there are malicious attributes from different elements" do
  html = "<div style='background-image:url(javascript:alert('XSS'))'>" \
         "<input type='image' src='javascript:alert('XSS');''></div>" \
         "<div style='width: expression(alert('XSS'));'></div>"

  expect do
    Html::Sanitizer.clean(html)
  end.to raise_error(Html::Sanitizer::InvalidHTMLError, /Invalid HTML received/)
end

it "adds a target=_blank to all links even if they already have a target value" do
  html = "<a href=\"www.example.com/event-1\" target=\"_top\">Community Gathering Event</a>"

  result = Html::Sanitizer.clean(html)

  expect(result).to eq "<a href=\"www.example.com/event-1\" target=\"_blank\" rel=\"noopener nofollow\">Community Gathering Event</a>"
end
```

## Resources for HTML sanitization and transformation

These are the resources I used for this work and recommend checking out:

- [Loofah](https://github.com/flavorjones/loofah)
- [Cross Site Scripting Prevention Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.html)
- [HTML5 Security Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/HTML5_Security_Cheat_Sheet.html)

## Contributing to Loofah is fun

While learning more about the Loofah gem, I found an opportunity to contribute to the project. I was adding `target=_blank` to all links manually in my project, and I thought: "if I have this problem, others might have it too, and they could benefit from having this feature available in the library."

I co-authored [this PR to add target=_blank to all links as a built-in scrubber](https://github.com/flavorjones/loofah/pull/275), which was a great way to give back to the gem. It's available as the `:targetblank` scrubber in Loofah 2.22.0 and later.
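If you want to use `:targetblank` in your own project, make sure you depend on a Loofah version that includes it, for example with a Gemfile constraint like this:

```ruby
# Gemfile
gem "loofah", ">= 2.22.0"
```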

This was my first time doing HTML Sanitization and it was a great learning opportunity. I had the chance to meet [@flavorjones](https://github.com/flavorjones) at RubyConf 2023, which was really nice :)

-----------

What about you? Have you had to sanitize HTML before? How did you go about it? What other tools have you used?
