---
title: What you see may not be what you get
teaser: The characters you see hide a more complex reality.
tags: programming,inclusivity
author: Matheus Richard
published_on: 2025-08-06
---

Say you have a string like `"💖 Love"`, and you want to extract the emoji. Your
first instinct might be to try something like:

```rb
string = "💖 Love"
emoji = string[0] # => "💖"
```

While that works in simple cases, it's unreliable. If you try the same thing
with a more complex emoji, you'll only get a piece of it:

```rb
string = "👨‍👩‍👧 Family"
emoji = string[0] # => "👨"
```

Here's the catch: that emoji is composed of multiple [Unicode][] code points.
Even though we see a single symbol, it's actually three emojis (man, woman, and
girl) glued together by invisible zero-width joiners, and `string[0]` only
returns the first code point.
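
To see this for yourself, here's a small sketch that inspects the code points
of that emoji. The string is spelled out with escape sequences (my choice, not
something the example above requires) so the invisible joiners show up in the
source, and the variable name `family` is just illustrative:

```rb
# MAN + ZERO WIDTH JOINER + WOMAN + ZERO WIDTH JOINER + GIRL
family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}"

family.length # => 5

# Print each code point in the usual U+XXXX notation
family.codepoints.map { |cp| format("U+%04X", cp) }
# => ["U+1F468", "U+200D", "U+1F469", "U+200D", "U+1F467"]

# Indexing works per code point, so we only get the first piece
family[0] # => "👨"
```

`String#length` and `String#[]` both count code points, which is why a single
visible emoji reports a length of 5.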

Because **not every _user-perceived_ character corresponds to a single
code point**, we need to use a different "unit" of measurement to accurately walk
through the string.

That's where [Grapheme Clusters][] come in. They provide a more accurate way to
represent user-perceived characters.

<aside class="info">
  <p>
    Note that user-perceived characters might be ambiguous depending on several
    factors, including language and context.
  </p>
  <p>
    Grapheme Clusters are a best-effort approximation of user-perceived
    characters that can be identified programmatically in a consistent way.
  </p>
</aside>

Going back to the initial example, let's extract the emoji using Grapheme
Clusters. Luckily, Ruby has [built-in support] for them:

```rb
string = "💖 Love"
emoji = string.grapheme_clusters[0] # => "💖"

string = "👨‍👩‍👧 Family"
emoji = string.grapheme_clusters[0] # => "👨‍👩‍👧"
```

And... done!
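
The same idea applies when you need to count or walk through the whole string,
not just grab the first element. Here's a rough sketch using
`String#each_grapheme_cluster`, which returns an enumerator when called without
a block. The emoji is written with escapes here only to make the joiners
explicit:

```rb
string = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467} Family"

# Code points: 5 for the emoji, 1 for the space, 6 for "Family"
string.length # => 12

# User-perceived characters: the emoji counts as one
string.grapheme_clusters.length # => 8

# Walk the string lazily, one grapheme cluster at a time
string.each_grapheme_cluster.first(2) # => ["👨‍👩‍👧", " "]
```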

## Why should you care?

Even if you're not dealing with emojis, understanding Grapheme Clusters is
useful for handling any text with “complex” characters.

If you're working with languages that have accents, ligatures, or other
composite characters (things like g̈ or กำ), being aware of Grapheme Clusters
can help you avoid subtle bugs, such as truncating or reversing a string in the
middle of a character. This matters most when you're handling user data like
names, addresses, and other personal information.
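
As an illustration of how this can go wrong with a name, here's a minimal
sketch. It assumes the name arrives with a combining accent ("e" followed by
U+0301) rather than the precomposed "é", which is something you can't rule out
with user input. The `name` and `g_umlaut` variables are made up for the
example:

```rb
# "José", written as "e" + COMBINING ACUTE ACCENT
name = "Jose\u0301"

name.length                   # => 5
name.grapheme_clusters.length # => 4

# Truncating by code points silently drops the accent
name[0, 4] # => "Jose"

# Truncating by grapheme clusters keeps the character intact
name.grapheme_clusters.first(4).join # => "José"

# Reversing by code points detaches the accent from its letter
name.reverse                        # => "\u0301esoJ"
name.grapheme_clusters.reverse.join # => "e\u0301soJ"

# The g̈ from above has no precomposed form: it's always "g" + U+0308
g_umlaut = "g\u0308"
g_umlaut.length                   # => 2
g_umlaut.grapheme_clusters.length # => 1
```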

Now you know about Grapheme Clusters. Use that power well.

[Unicode]: https://home.unicode.org/
[Grapheme Clusters]:
    https://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries
[built-in support]:
    https://docs.ruby-lang.org/en/master/String.html#method-i-grapheme_clusters
